Re: [VOTE][SPARK-27396] SPIP: Public APIs for extended Columnar Processing Support

2019-05-29 Thread Mridul Muralidharan
ynold Xin > Cc: Bobby Evans; DB Tsai; Dongjoon Hyun; Imran Rashid; Jason Lowe; Matei > Zaharia; Thomas graves; Xiangrui Meng; Xiangrui Meng; dev > Subject: Re: [VOTE][SPARK-27396] SPIP: Public APIs for extended Columnar > Processing Support > > More feedback would be great, this has

Re: [VOTE][SPARK-27396] SPIP: Public APIs for extended Columnar Processing Support

2019-05-29 Thread Tom Graves
n; Imran Rashid; Jason Lowe; Matei Zaharia; Thomas graves; Xiangrui Meng; Xiangrui Meng; dev Subject: Re: [VOTE][SPARK-27396] SPIP: Public APIs for extended Columnar Processing Support More feedback would be great, this has been open a long time though, let's extend til Wednesday the 29th and

Re: [VOTE][SPARK-27396] SPIP: Public APIs for extended Columnar Processing Support

2019-05-27 Thread Felix Cheung
: Public APIs for extended Columnar Processing Support More feedback would be great, this has been open a long time though, let's extend til Wednesday the 29th and see where we are at. Tom

Re: [VOTE][SPARK-27396] SPIP: Public APIs for extended Columnar Processing Support

2019-05-26 Thread Tom Graves
More feedback would be great, this has been open a long time though, let's extend til Wednesday the 29th and see where we are at. Tom Sent from Yahoo Mail on Android On Sat, May 25, 2019 at 6:28 PM, Holden Karau wrote: Same I meant to catch up after kubecon but had some unexpected trave

Re: [VOTE][SPARK-27396] SPIP: Public APIs for extended Columnar Processing Support

2019-05-25 Thread Holden Karau
Same, I meant to catch up after KubeCon but had some unexpected travels. On Sat, May 25, 2019 at 10:56 PM Reynold Xin wrote: > Can we push this to June 1st? I have been meaning to read it but > unfortunately keep traveling... > > On Sat, May 25, 2019 at 8:31 PM Dongjoon Hyun > wrote: > >> +1 >>

Re: [VOTE][SPARK-27396] SPIP: Public APIs for extended Columnar Processing Support

2019-05-25 Thread Reynold Xin
Can we push this to June 1st? I have been meaning to read it but unfortunately keep traveling... On Sat, May 25, 2019 at 8:31 PM Dongjoon Hyun wrote: > +1 > > Thanks, > Dongjoon. > > On Fri, May 24, 2019 at 17:03 DB Tsai wrote: > >> +1 on exposing the APIs for columnar processing support. >> >

Re: [VOTE][SPARK-27396] SPIP: Public APIs for extended Columnar Processing Support

2019-05-25 Thread Dongjoon Hyun
+1 Thanks, Dongjoon. On Fri, May 24, 2019 at 17:03 DB Tsai wrote: > +1 on exposing the APIs for columnar processing support. > > I understand that the scope of this SPIP doesn't cover AI / ML > use-cases. But I saw a good performance gain when I converted data > from rows to columns to leverage

Re: [VOTE][SPARK-27396] SPIP: Public APIs for extended Columnar Processing Support

2019-05-24 Thread DB Tsai
+1 on exposing the APIs for columnar processing support. I understand that the scope of this SPIP doesn't cover AI / ML use-cases. But I saw a good performance gain when I converted data from rows to columns to leverage SIMD architectures in a POC ML application. With the exposed columnar proc

Re: [VOTE][SPARK-27396] SPIP: Public APIs for extended Columnar Processing Support

2019-05-15 Thread Bobby Evans
It would allow for the columnar processing to be extended through the shuffle. So if I were doing, say, an FPGA-accelerated extension, it could replace the ShuffleExchangeExec with one that can take a ColumnarBatch as input instead of a Row. The extended version of the ShuffleExchangeExec could then

Re: [VOTE][SPARK-27396] SPIP: Public APIs for extended Columnar Processing Support

2019-05-15 Thread Imran Rashid
Sorry I am late to the discussion here -- the JIRA mentions using these extensions for dealing with shuffles, can you explain that part? I don't see how you would use this to change shuffle behavior at all. On Tue, May 14, 2019 at 10:59 AM Thomas graves wrote: > Thanks for replying, I'll extend

Re: [VOTE][SPARK-27396] SPIP: Public APIs for extended Columnar Processing Support

2019-05-14 Thread Thomas graves
Thanks for replying, I'll extend the vote til May 26th to allow feedback from you and other people who haven't had time to look at it. Tom On Mon, May 13, 2019 at 4:43 PM Holden Karau wrote: > > I’d like to ask this vote period to be extended, I’m interested but I don’t > have the cycles to review

Re: [VOTE][SPARK-27396] SPIP: Public APIs for extended Columnar Processing Support

2019-05-13 Thread Holden Karau
I’d like to ask this vote period to be extended, I’m interested but I don’t have the cycles to review it in detail and make an informed vote until the 25th. On Tue, May 14, 2019 at 1:49 AM Xiangrui Meng wrote: > My vote is 0. Since the updated SPIP focuses on ETL use cases, I don't > feel strong

Re: [VOTE][SPARK-27396] SPIP: Public APIs for extended Columnar Processing Support

2019-05-13 Thread Xiangrui Meng
My vote is 0. Since the updated SPIP focuses on ETL use cases, I don't feel strongly about it. I would still suggest doing the following: 1. Link the POC mentioned in Q4. So people can verify the POC result. 2. List public APIs we plan to expose in Appendix A. I did a quick check. Beside ColumnarB

Re: [VOTE][SPARK-27396] SPIP: Public APIs for extended Columnar Processing Support

2019-05-13 Thread Thomas graves
It would be nice to get feedback from people who responded on the other vote thread - Reynold, Matei, Xiangrui, does the new version look good? Thanks, Tom On Mon, May 13, 2019 at 8:22 AM Jason Lowe wrote: > > +1 (non-binding) > > Jason > > On Tue, May 7, 2019 at 1:37 PM Thomas graves wrote: >>

Re: [VOTE][SPARK-27396] SPIP: Public APIs for extended Columnar Processing Support

2019-05-13 Thread Jason Lowe
+1 (non-binding) Jason On Tue, May 7, 2019 at 1:37 PM Thomas graves wrote: > Hi everyone, > > I'd like to call for another vote on SPARK-27396 - SPIP: Public APIs > for extended Columnar Processing Support. The proposal is to extend > the support to allow for more columnar processing. We had

RE: [VOTE][SPARK-27396] SPIP: Public APIs for extended Columnar Processing Support

2019-05-12 Thread tcondie
+1 (non-binding) Tyson Condie From: Kazuaki Ishizaki Sent: Thursday, May 9, 2019 9:17 AM To: Bryan Cutler Cc: Bobby Evans ; Spark dev list ; Thomas graves Subject: Re: [VOTE][SPARK-27396] SPIP: Public APIs for extended Columnar Processing Support +1 (non-binding) Kazuaki Ishizaki

Re: [VOTE][SPARK-27396] SPIP: Public APIs for extended Columnar Processing Support

2019-05-09 Thread Kazuaki Ishizaki
+1 (non-binding) Kazuaki Ishizaki From: Bryan Cutler To: Bobby Evans Cc: Thomas graves , Spark dev list Date: 2019/05/09 03:20 Subject:Re: [VOTE][SPARK-27396] SPIP: Public APIs for extended Columnar Processing Support +1 (non-binding) On Tue, May 7, 2019 at 12:04

Re: [VOTE][SPARK-27396] SPIP: Public APIs for extended Columnar Processing Support

2019-05-08 Thread Bryan Cutler
+1 (non-binding) On Tue, May 7, 2019 at 12:04 PM Bobby Evans wrote: > I am +1 > > On Tue, May 7, 2019 at 1:37 PM Thomas graves wrote: > >> Hi everyone, >> >> I'd like to call for another vote on SPARK-27396 - SPIP: Public APIs >> for extended Columnar Processing Support. The proposal is to ext

Re: [VOTE][SPARK-27396] SPIP: Public APIs for extended Columnar Processing Support

2019-05-07 Thread Bobby Evans
I am +1 On Tue, May 7, 2019 at 1:37 PM Thomas graves wrote: > Hi everyone, > > I'd like to call for another vote on SPARK-27396 - SPIP: Public APIs > for extended Columnar Processing Support. The proposal is to extend > the support to allow for more columnar processing. We had previous > vote

Re: [VOTE][SPARK-27396] SPIP: Public APIs for extended Columnar Processing Support

2019-05-02 Thread Bryan Cutler
bby Evans > wrote: > > > > > > > > > > I think you misunderstood the point of this SPIP. I responded to > > your comments in the SPIP JIRA. > > > > > > > > > > On Sat, Apr 20, 2019 at 12:52 AM Xiangrui Meng > > wrote:

Re: [VOTE][SPARK-27396] SPIP: Public APIs for extended Columnar Processing Support

2019-04-30 Thread Bobby Evans
> 2. ML/DL systems that can benefit from columnar format are mostly > in Python. > > > > 3. Simple operations, though they benefit from vectorization, might not be > worth the data exchange overhead. > > > > > > > > So would an improved Pandas UDF API

Re: [VOTE][SPARK-27396] SPIP: Public APIs for extended Columnar Processing Support

2019-04-23 Thread Matei Zaharia
> > 2. ML/DL systems that can benefit from columnar format are mostly in > > > Python. > > > 3. Simple operations, though they benefit from vectorization, might not be worth > > > the data exchange overhead. > > > > > > So would an improved Pandas UDF API be goo

Re: [VOTE][SPARK-27396] SPIP: Public APIs for extended Columnar Processing Support

2019-04-22 Thread Tom Graves
umnar data processing support. From: Jules Damji Sent: Friday, April 19, 2019 12:21 PM To: Bryan Cutler Cc: Dev Subject: Re: [VOTE][SPARK-27396] SPIP: Public APIs for extended Columnar Processing Support + (non-binding) Sent from my iPhone Pardon the dumb thumb typos :) On Apr 19, 2019,

Re: [VOTE][SPARK-27396] SPIP: Public APIs for extended Columnar Processing Support

2019-04-22 Thread Bobby Evans
Spark could >> just expose byte arrays directly and work on those if the API is not >> guaranteed to stay stable (that is, we’d still use our own classes to >> manipulate the data internally, and end users could use the Arrow library >> if they want it). >> >> Matei >

Re: [VOTE][SPARK-27396] SPIP: Public APIs for extended Columnar Processing Support

2019-04-22 Thread Reynold Xin
>> keep a storage format backward-compatible: just document the format and >>> extend it only in ways that don’t break the meaning of old data (for >>> example, add new version numbers or field types that are read in a >>> d

Re: [VOTE][SPARK-27396] SPIP: Public APIs for extended Columnar Processing Support

2019-04-22 Thread Bobby Evans
by Evans wrote: >> > > >> > > I think you misunderstood the point of this SPIP. I responded to your >> comments in the SPIP JIRA. >> > > >> > > On Sat, Apr 20, 2019 at 12:52 AM Xiangrui Meng >> wrote: >> > > I posted my comment

Re: [VOTE][SPARK-27396] SPIP: Public APIs for extended Columnar Processing Support

2019-04-22 Thread Xiangrui Meng
at can benefit from columnar format are mostly in > Python. > > > 3. Simple operations, though they benefit from vectorization, might not be > worth the data exchange overhead. > > > > > > So would an improved Pandas UDF API be good enough? For example, > SPARK-264

Re: [VOTE][SPARK-27396] SPIP: Public APIs for extended Columnar Processing Support

2019-04-22 Thread Tom Graves
o would an improved Pandas UDF API be good enough? For example, > > SPARK-26412 (UDF that takes an iterator of Arrow batches). > > > > Sorry that I should have joined the discussion earlier! Hope it is not too late:) > > > > On Fri, Apr 19, 2

Re: [VOTE][SPARK-27396] SPIP: Public APIs for extended Columnar Processing Support

2019-04-22 Thread Bobby Evans
risky. Arrow might have >> 1.0 release someday. >> > > 2. ML/DL systems that can benefit from columnar format are mostly in >> Python. >> > > 3. Simple operations, though they benefit from vectorization, might not be >> worth the data exchange overhead. >> > >

Re: [VOTE][SPARK-27396] SPIP: Public APIs for extended Columnar Processing Support

2019-04-20 Thread Bryan Cutler
terator of Arrow batches). > > > > > > Sorry that I should have joined the discussion earlier! Hope it is not too > late:) > > > > > > On Fri, Apr 19, 2019 at 1:20 PM wrote: > > > +1 (non-binding) for better columnar data processing support. > >

Re: [VOTE][SPARK-27396] SPIP: Public APIs for extended Columnar Processing Support

2019-04-20 Thread Matei Zaharia
> So would an improved Pandas UDF API be good enough? For example, > > SPARK-26412 (UDF that takes an iterator of Arrow batches). > > > > Sorry that I should have joined the discussion earlier! Hope it is not too late:) > > > > On Fri, Apr 19, 2019 at 1:2

Re: [VOTE][SPARK-27396] SPIP: Public APIs for extended Columnar Processing Support

2019-04-20 Thread Bobby Evans
ARK-26412 (UDF that takes an iterator of Arrow batches). > > > > Sorry that I should have joined the discussion earlier! Hope it is not too > late:) > > > > On Fri, Apr 19, 2019 at 1:20 PM wrote: > > +1 (non-binding) for better columnar data processing support. >

Re: [VOTE][SPARK-27396] SPIP: Public APIs for extended Columnar Processing Support

2019-04-20 Thread Matei Zaharia
columnar data processing support. > > > > From: Jules Damji > Sent: Friday, April 19, 2019 12:21 PM > To: Bryan Cutler > Cc: Dev > Subject: Re: [VOTE][SPARK-27396] SPIP: Public APIs for extended Columnar > Processing Support > > > > + (non-bind

Re: [VOTE][SPARK-27396] SPIP: Public APIs for extended Columnar Processing Support

2019-04-20 Thread Bobby Evans
t:* Friday, April 19, 2019 12:21 PM >> *To:* Bryan Cutler >> *Cc:* Dev >> *Subject:* Re: [VOTE][SPARK-27396] SPIP: Public APIs for extended >> Columnar Processing Support >> >> >> >> + (non-binding) >> >> Sent from my iPhone >> >

Re: [VOTE][SPARK-27396] SPIP: Public APIs for extended Columnar Processing Support

2019-04-19 Thread Xiangrui Meng
> *From:* Jules Damji > *Sent:* Friday, April 19, 2019 12:21 PM > *To:* Bryan Cutler > *Cc:* Dev > *Subject:* Re: [VOTE][SPARK-27396] SPIP: Public APIs for extended > Columnar Processing Support > > > > + (non-binding) > > Sent from my iPhone > > Pardon th

RE: [VOTE][SPARK-27396] SPIP: Public APIs for extended Columnar Processing Support

2019-04-19 Thread tcondie
+1 (non-binding) for better columnar data processing support. From: Jules Damji Sent: Friday, April 19, 2019 12:21 PM To: Bryan Cutler Cc: Dev Subject: Re: [VOTE][SPARK-27396] SPIP: Public APIs for extended Columnar Processing Support + (non-binding) Sent from my iPhone Pardon the

Re: [VOTE][SPARK-27396] SPIP: Public APIs for extended Columnar Processing Support

2019-04-19 Thread Jules Damji
+ (non-binding) Sent from my iPhone Pardon the dumb thumb typos :) > On Apr 19, 2019, at 10:30 AM, Bryan Cutler wrote: > > +1 (non-binding) > >> On Thu, Apr 18, 2019 at 11:41 AM Jason Lowe wrote: >> +1 (non-binding). Looking forward to seeing better support for processing >> columnar data.

Re: [VOTE][SPARK-27396] SPIP: Public APIs for extended Columnar Processing Support

2019-04-19 Thread Bryan Cutler
+1 (non-binding) On Thu, Apr 18, 2019 at 11:41 AM Jason Lowe wrote: > +1 (non-binding). Looking forward to seeing better support for processing > columnar data. > > Jason > > On Tue, Apr 16, 2019 at 10:38 AM Tom Graves > wrote: > >> Hi everyone, >> >> I'd like to call for a vote on SPARK-27396

Re: [VOTE][SPARK-27396] SPIP: Public APIs for extended Columnar Processing Support

2019-04-18 Thread Jason Lowe
+1 (non-binding). Looking forward to seeing better support for processing columnar data. Jason On Tue, Apr 16, 2019 at 10:38 AM Tom Graves wrote: > Hi everyone, > > I'd like to call for a vote on SPARK-27396 - SPIP: Public APIs for > extended Columnar Processing Support. The proposal is to ex

Re: [VOTE][SPARK-27396] SPIP: Public APIs for extended Columnar Processing Support

2019-04-16 Thread Bobby Evans
I am +1, I better be because I am proposing the SPIP. Thanks, Bobby On Tue, Apr 16, 2019 at 10:38 AM Tom Graves wrote: > Hi everyone, > > I'd like to call for a vote on SPARK-27396 - SPIP: Public APIs for > extended Columnar Processing Support. The proposal is to extend the > support to allow