Re: Java - Spark dataframe to Arrow format

2019-12-11 Thread Micah Kornfield
... Liya Fan. On Fri, Dec 6, 2019 at 2:14 AM Chen Li wrote: > We have a similar use case, and we use ArrowConverters.scala mentioned by Wes. However, the overhead of the conversion is kinda high. ...

Re: Java - Spark dataframe to Arrow format

2019-12-06 Thread GaoXiang Wang
...rs.scala mentioned by Wes. However, the overhead of the conversion is kinda high. -- From: Wes McKinney; Sent: Thursday, December 5, 2019 6:53 AM; To: dev; Cc: Fan Liya; jeetendra.jais...@impetus.co.in.invalid ...

Re: Java - Spark dataframe to Arrow format

2019-12-06 Thread Fan Liya
...ney; Sent: Thursday, December 5, 2019 6:53 AM; To: dev; Cc: Fan Liya; jeetendra.jais...@impetus.co.in.invalid; Subject: Re: Java - Spark dataframe to Arrow format > hi folks, I understand the question to be about serialization. see ...

Re: Java - Spark dataframe to Arrow format

2019-12-05 Thread Chen Li
Subject: Re: Java - Spark dataframe to Arrow format
hi folks, I understand the question to be about serialization. see
* https://github.com/apache/spark/blob/master/sql/catalyst/src/main/java/org/apache/spark/sql/vectorized/ArrowColumnVector.java
* https://github.com/apache/spark/blob/master/sql

Re: Java - Spark dataframe to Arrow format

2019-12-05 Thread Wes McKinney
hi folks, I understand the question to be about serialization. see
* https://github.com/apache/spark/blob/master/sql/catalyst/src/main/java/org/apache/spark/sql/vectorized/ArrowColumnVector.java
* https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/executio

Re: Java - Spark dataframe to Arrow format

2019-12-05 Thread GaoXiang Wang
Hi Jeetendra and Liya, I actually have a similar use case. We have some data stored in *Parquet format in HDFS* and would like to make use of Apache Arrow to improve compute performance if possible. Right now, I don't see a direct way to do this in Java with Spark. I have searched the Spa

Re: Java - Spark dataframe to Arrow format

2019-12-05 Thread Fan Liya
Hi Jeetendra, I am not sure if I understand your question correctly. Arrow is an in-memory columnar data format, and Spark has its own in-memory data format for DataFrame, which is invisible to end users. So the Spark user has no control over the underlying in-memory layout. If you really want t
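
Editor's note: the point above, that a row-oriented DataFrame and a columnar format are different in-memory layouts, can be illustrated with a minimal sketch in plain Java. This is not Spark's or Arrow's API; the class `RowToColumnSketch`, the type `IntColumn`, and the method `toColumns` are hypothetical names invented for illustration. The sketch assumes each row is an `Integer[]` with nulls allowed, and stores every column as a dense value array plus a validity bitmap, which is roughly how a columnar layout tracks nulls. Because every cell is copied once, it also hints at why the conversion overhead mentioned elsewhere in this thread can be noticeable.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.BitSet;
import java.util.List;

// Hypothetical illustration: these types are NOT part of Spark or Arrow.
public class RowToColumnSketch {

    // One nullable int column: dense values plus a validity bitmap
    // (bit i set means row i holds a non-null value).
    public static final class IntColumn {
        public final int[] values;
        public final BitSet validity;

        IntColumn(int rowCount) {
            this.values = new int[rowCount];
            this.validity = new BitSet(rowCount);
        }

        public boolean isNull(int row) {
            return !validity.get(row);
        }
    }

    // Pivot row-oriented records (one Integer[] per row) into columns.
    // Every cell is visited once, so the cost grows with rows * columns.
    public static List<IntColumn> toColumns(List<Integer[]> rows, int columnCount) {
        List<IntColumn> columns = new ArrayList<>();
        for (int c = 0; c < columnCount; c++) {
            columns.add(new IntColumn(rows.size()));
        }
        for (int r = 0; r < rows.size(); r++) {
            Integer[] row = rows.get(r);
            for (int c = 0; c < columnCount; c++) {
                if (row[c] != null) {
                    IntColumn column = columns.get(c);
                    column.values[r] = row[c];
                    column.validity.set(r);
                }
            }
        }
        return columns;
    }

    public static void main(String[] args) {
        List<Integer[]> rows = Arrays.asList(
                new Integer[] {1, 10},
                new Integer[] {null, 20},
                new Integer[] {3, 30});
        List<IntColumn> columns = toColumns(rows, 2);
        // Print the first column's raw values and whether row 1 is null.
        System.out.println(Arrays.toString(columns.get(0).values));
        System.out.println(columns.get(0).isNull(1));
    }
}
```

For real workloads, the ArrowConverters.scala and ArrowColumnVector.java sources referenced in this thread are the places to look; the sketch only shows the shape of the row-to-column pivot.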

Java - Spark dataframe to Arrow format

2019-12-05 Thread Jeetendra Kumar Jaiswal
Hi Dev Team, Can someone please let me know how to convert a Spark DataFrame to Arrow format? I am coding in Java. The Java documentation for Arrow only has function API information. It is a little hard to develop without proper documentation. Is there a way to directly convert a Spark DataFrame to Arro