Is this for PySpark? This might be a better question for the Spark mailing list as far as the mechanics go, but I think the only way to do this efficiently is to go through the DataSource/DataSourceV2 APIs in Scala/Java. I believe Ryan Murray prototyped something along this path a while ago [1], but I'm not sure of its current state.
[1] https://github.com/rymurr/flight-spark-source/blob/master/src/main/java/org/apache/arrow/flight/spark/FlightDataReader.java

On Fri, Mar 25, 2022 at 12:01 PM James Duong <[email protected]> wrote:

> Most of the examples I've seen for loading data into Spark convert a
> FlightStreamReader to pandas using to_pandas().
>
> However, this seems to load the entire contents of the stream into memory.
> Is there an easy way to load individual chunks from the FlightStream into
> dataframes? I figure you could call get_chunk() to get a record batch, then
> convert the batch to a dataframe.
>
> --
> *James Duong*
> Lead Software Developer
> Bit Quill Technologies Inc.
> Direct: +1.604.562.6082 | [email protected]
> https://www.bitquilltech.com
