Hi, I have a DataFrame that I need to convert into a JavaRDD and back to a
DataFrame. I have the following code:

DataFrame sourceFrame = hiveContext.read().format("orc").load("/path/to/orc/file");

// I do an order by on sourceFrame above and then convert it into a JavaRDD
JavaRDD<Row> modifiedRDD = sourceFrame.toJavaRDD().map(new Function<Row, Row>() {
    @Override
    public Row call(Row row) throws Exception {
        if (row != null) {
            // update the row by creating a new Row
            return RowFactory.create(updateRow);
        }
        return null;
    }
});

// now I convert the JavaRDD<Row> above into a DataFrame as follows
DataFrame modifiedFrame = sqlContext.createDataFrame(modifiedRDD, schema);

sourceFrame and modifiedFrame have the same schema. When I call
sourceFrame.show(), the output is as expected: every column has its
corresponding value and no column is empty. But when I call
modifiedFrame.show(), all the column values are merged into the first
column. For example, assume the source DataFrame has 3 columns as shown below:

_col1  _col2  _col3
ABC    10     DEF
GHI    20     JKL

When I print modifiedFrame, which I created from the JavaRDD, it shows the
following:

_col1       _col2  _col3
ABC,10,DEF
GHI,20,JKL

As shown above, _col1 holds all the values, and _col2 and _col3 are empty.
I don't know what I am doing wrong. Please guide me; I am new to Spark.
Thanks in advance.
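
One thing that might be worth checking, sketched below under the assumption that updateRow is a single comma-joined String (the code above does not show how it is built): RowFactory.create takes varargs (Object... values), so a single String argument would give a Row with one column. The create method in this sketch is only a hypothetical stand-in for RowFactory.create so that it runs without Spark; it mimics the varargs behaviour and nothing else.

```java
import java.util.Arrays;

public class VarargsRowDemo {
    // Hypothetical stand-in for RowFactory.create(Object... values):
    // each varargs element corresponds to one column of the row.
    public static Object[] create(Object... values) {
        return values;
    }

    public static void main(String[] args) {
        // A single comma-joined String is one argument, hence one column.
        Object[] merged = create("ABC,10,DEF");

        // An Object[] is passed through as the varargs array, hence three columns.
        Object[] fields = new Object[] {"ABC", 10, "DEF"};
        Object[] separate = create(fields);

        System.out.println(merged.length + " column(s): " + Arrays.toString(merged));
        System.out.println(separate.length + " column(s): " + Arrays.toString(separate));
    }
}
```

If that assumption holds, passing the individual field values (or an Object[] of them) instead of one joined String would probably keep the columns separate.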