Re: [VOTE][SPARK-27396] SPIP: Public APIs for extended Columnar Processing Support

2019-04-22 Thread Reynold Xin
"if others think it would be helpful, we can cancel this vote, update the SPIP to clarify exactly what I am proposing, and then restart the vote after we have gotten more agreement on what APIs should be exposed" That'd be very useful. At least I was confused by what the SPIP was about. No

Re: Understanding "shared" memory implications

2016-03-19 Thread Reynold Xin
I always thought Arrow was just an in-memory format, and that it is up to whoever else wants to use it to carry those responsibilities out, because depending on workloads, different frameworks might pick very different applications. Otherwise it seems to be doing too much and having

Re: Comparing with Parquet

2016-02-25 Thread Reynold Xin
To put it in even more layman's terms, on-disk formats are typically designed for more permanent storage on disks/SSDs, and as a result the format would want to reduce the size, because: 1. Some clusters are bottlenecked by the amount of disk space available. In these cases, you'd want to compress