Re: [build system] meet your build engineer @ the sparkAI summit!

2019-04-22 Thread shane knapp
just a quick reminder: if you want to confirm that i am, indeed, a living breathing human being, and you will be at the spark/ai summit this week, stop on by the riselab booth and say hi! ;) we'll also have a couple of researchers from the lab present, and will be doing some project demos

Re: [VOTE] Release Apache Spark 2.4.2

2019-04-22 Thread Shixiong(Ryan) Zhu
+1. I have tested it and it looks good! Best Regards, Ryan On Sun, Apr 21, 2019 at 8:49 PM Wenchen Fan wrote: > Yea these should be mentioned in the 2.4.1 release notes. > > It seems we only have one ticket that is labeled as "release-notes" for > 2.4.2:

Re: Is there a way to read a Parquet File as ColumnarBatch?

2019-04-22 Thread Jacek Laskowski
Hi Priyanka, I've been exploring this part of Spark SQL and could help a little bit. > but for some reason it never hit the breakpoints I placed in these classes. Was this for local[*]? I ran "SPARK_SUBMIT_OPTS="-agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=5005"
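
A minimal sketch of that remote-debugging setup, assuming the default JDWP port 5005 and a local[*] spark-shell (both illustrative, matching the snippet above):

    # Launch spark-shell with the JDWP agent so a remote debugger can attach on port 5005;
    # set breakpoints in VectorizedParquetRecordReader, then trigger a read.
    SPARK_SUBMIT_OPTS="-agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=5005" \
      ./bin/spark-shell --master "local[*]"

With suspend=y instead of suspend=n, the JVM waits for the debugger to attach before running anything, which helps when the breakpoints sit in code exercised by the very first read.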

Re: Spark jdbc update SaveMode

2019-04-22 Thread NuthanReddy
Hi Maciej Bryński, Did you happen to finish that or is there a way to do it?

Re: [VOTE][SPARK-27396] SPIP: Public APIs for extended Columnar Processing Support

2019-04-22 Thread Reynold Xin
"if others think it would be helpful, we can cancel this vote, update the SPIP to clarify exactly what I am proposing, and then restart the vote after we have gotten more agreement on what APIs should be exposed" That'd be very useful. At least I was confused by what the SPIP was about. No

Re: [VOTE][SPARK-27396] SPIP: Public APIs for extended Columnar Processing Support

2019-04-22 Thread Tom Graves
Ok, I'm cancelling the vote for now then, and we will make some updates to the SPIP to try to clarify. Tom. On Monday, April 22, 2019, 1:07:25 PM CDT, Reynold Xin wrote: "if others think it would be helpful, we can cancel this vote, update the SPIP to clarify exactly what I am

Re: [VOTE][SPARK-27396] SPIP: Public APIs for extended Columnar Processing Support

2019-04-22 Thread Bobby Evans
Agreed. Tom, could you cancel the vote? On Mon, Apr 22, 2019 at 1:07 PM Reynold Xin wrote: > "if others think it would be helpful, we can cancel this vote, update the > SPIP to clarify exactly what I am proposing, and then restart the vote > after we have gotten more agreement on what APIs

Re: [VOTE][SPARK-27396] SPIP: Public APIs for extended Columnar Processing Support

2019-04-22 Thread Bobby Evans
Yes, it is technically possible for the layout to change. No, it is not going to happen. It is already baked into several different official libraries which are widely used, not just for holding and processing the data, but also for transfer of the data between the various implementations.

Re: [VOTE][SPARK-27396] SPIP: Public APIs for extended Columnar Processing Support

2019-04-22 Thread Tom Graves
Given that there is still discussion and Spark Summit is this week, I'm going to extend the vote until Friday the 26th. Tom. On Monday, April 22, 2019, 8:44:00 AM CDT, Bobby Evans wrote: Yes, it is technically possible for the layout to change. No, it is not going to happen. It is

Is there a way to read a Parquet File as ColumnarBatch?

2019-04-22 Thread Priyanka Gomatam
Hi, I am new to Spark and have been playing around with the Parquet reader code. I have two questions: 1. I saw the code path that starts in the DataSourceScanExec class, moves on to the ParquetFileFormat class, and uses a VectorizedParquetRecordReader. I tried doing a spark.read.parquet(...) and
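
A minimal Scala sketch of the usual entry point, assuming a hypothetical Parquet path; the vectorized reader that builds ColumnarBatch internally is controlled by the config shown (on by default) and only runs once an action forces the scan:

    import org.apache.spark.sql.SparkSession

    object ParquetColumnarDemo {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .master("local[*]")
          // The vectorized Parquet reader is what produces ColumnarBatch under the hood;
          // it is enabled by default and shown here only to make the knob explicit.
          .config("spark.sql.parquet.enableVectorizedReader", "true")
          .appName("parquet-columnar-demo")
          .getOrCreate()

        // Hypothetical path; any flat-schema Parquet file works.
        val df = spark.read.parquet("/tmp/example.parquet")

        // spark.read.parquet is lazy: breakpoints in VectorizedParquetRecordReader
        // only fire once an action (show, count, collect, ...) forces the scan.
        println(df.queryExecution.executedPlan)
        df.show()

        spark.stop()
      }
    }

The df.show() at the end matters: without an action nothing is actually read, which is the most common reason breakpoints in the reader classes never fire.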

Re: [VOTE][SPARK-27396] SPIP: Public APIs for extended Columnar Processing Support

2019-04-22 Thread Xiangrui Meng
Per Robert's comment on the JIRA, ETL is the main use case for the SPIP. I think the SPIP should list a concrete ETL use case (from POC?) that can benefit from this *public Java/Scala API*, does *vectorization*, and significantly *boosts the performance* even with data conversion overhead. The