Is this for PySpark? This might be a better question for the Spark mailing list as far as the mechanics go, but I think the only way to do this efficiently is to go through the DataSource/DataSourceV2 APIs in Scala/Java. I believe Ryan Murray prototyped something along this path a while ago [1], but I'm not sure of its current state.
[1] https://github.com/rymurr/flight-spark-source/blob/master/src/main/java/org/apache/arrow/flight/spark/FlightDataReader.java

On Fri, Mar 25, 2022 at 12:01 PM James Duong <[email protected]> wrote:

> Most of the examples I've seen for loading data into Spark convert a
> FlightStreamReader to pandas using to_pandas().
>
> However, this seems to load the entire contents of the stream into memory.
> Is there an easy way to load individual chunks from the FlightStream into
> dataframes? I figure you could call get_chunk() to get a record batch, then
> convert the batch to a dataframe.
>
> --
> *James Duong*
> Lead Software Developer
> Bit Quill Technologies Inc.
> Direct: +1.604.562.6082 | [email protected]
> https://www.bitquilltech.com
