Hi all,

I have a small question; I hope you can help me.

In this code snippet<https://gist.github.com/linar-jether/7dd61ed6fa89098ab9c58a1ab428b2b5>, Jether converts an RDD of pd.DataFrame objects (prdd) to Arrow RecordBatches (slices) and finally to a Spark DataFrame. Similarly, the code in Scala<https://github.com/apache/spark/blob/65a189c7a1ddceb8ab482ccc60af5350b8da5ea5/sql/core/src/main/scala/org/apache/spark/sql/execution/arrow/ArrowConverters.scala#L192-L206> converts a JavaRDD to a Spark DataFrame.
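
As far as I understand that pattern, the first hop (pandas DataFrames to Arrow RecordBatches per partition) looks roughly like the sketch below; it only uses public pyarrow/PySpark APIs, and the variable names (prdd, ardd) just mirror the ones above:

    import pandas as pd
    import pyarrow as pa
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    # prdd: an RDD of pd.DataFrame objects, as in the gist
    prdd = spark.sparkContext.parallelize([
        pd.DataFrame({"x": [1, 2], "y": ["a", "b"]}),
        pd.DataFrame({"x": [3], "y": ["c"]}),
    ])

    def pdfs_to_batches(pdfs):
        # Turn every pandas DataFrame in the partition into one or more
        # Arrow RecordBatches (the "slices" mentioned above).
        for pdf in pdfs:
            for batch in pa.Table.from_pandas(pdf).to_batches():
                yield batch

    # ardd: an RDD of pa.RecordBatch objects
    ardd = prdd.mapPartitions(pdfs_to_batches)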

If I already have an RDD of pa.RecordBatch objects (ardd), how can I convert it directly to a Spark DataFrame in PySpark, without going through pandas? Thanks.
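
To make the question concrete: the only version-independent step I can see on my side is serializing each pa.RecordBatch into Arrow IPC stream bytes per partition, which I assume is roughly the RDD-of-bytes shape the Scala converter above works on. The sketch below (continuing from ardd above) stops exactly where I am stuck, since the call that hands the serialized RDD to the JVM side seems to be a private API that differs between Spark versions:

    import pyarrow as pa

    def batches_to_ipc_bytes(batches):
        # Serialize each Arrow RecordBatch in the partition into a
        # standalone Arrow IPC stream (schema + batch) as raw bytes.
        for batch in batches:
            sink = pa.BufferOutputStream()
            writer = pa.RecordBatchStreamWriter(sink, batch.schema)
            writer.write_batch(batch)
            writer.close()
            yield sink.getvalue().to_pybytes()

    bytes_rdd = ardd.mapPartitions(batches_to_ipc_bytes)
    # Missing piece: how to pass bytes_rdd (plus a schema) to the
    # JVM-side converter from PySpark without a pandas round trip.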


Regards,
Tanveer Ahmad
