Looks like parallelizing into an RDD was the step I was missing.

JavaRDD<Row> jsonRDD =
        new JavaSparkContext(sparkSession.sparkContext()).parallelize(results);

Then I created a schema:

List<StructField> fields = new ArrayList<StructField>();
fields.add(DataTypes.createStructField("column_name1", DataTypes.StringType, true));
fields.add(...);
StructType schema = DataTypes.createStructType(fields);

and then, voilà! I have my Dataset without any NullPointerExceptions :)

Dataset<Row> resultDataset = sparkSession.createDataFrame(jsonRDD, schema);
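
For anyone who finds this thread later, here is a minimal end-to-end sketch of the whole approach. The column names (column_name1, column_name2), the sample rows, and the final groupBy are placeholders I made up for illustration, not the real calculation:

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructField;
import org.apache.spark.sql.types.StructType;

public class RowsToDataset {
    public static void main(String[] args) {
        SparkSession sparkSession = SparkSession.builder()
                .appName("Rows2DataSet")
                .master("local")
                .getOrCreate();

        // Placeholder results; in the real code these come from the calculation loop.
        List<Row> results = new ArrayList<>(Arrays.asList(
                RowFactory.create("a", 1),
                RowFactory.create("b", 2),
                RowFactory.create("a", 3)));

        // Distribute the local list as an RDD.
        JavaRDD<Row> jsonRDD =
                new JavaSparkContext(sparkSession.sparkContext()).parallelize(results);

        // Explicit schema: one StructField per column, in the same order
        // as the values passed to RowFactory.create.
        List<StructField> fields = new ArrayList<>();
        fields.add(DataTypes.createStructField("column_name1", DataTypes.StringType, true));
        fields.add(DataTypes.createStructField("column_name2", DataTypes.IntegerType, true));
        StructType schema = DataTypes.createStructType(fields);

        Dataset<Row> resultDataset = sparkSession.createDataFrame(jsonRDD, schema);

        // Groupings and sortings now work as usual.
        resultDataset.groupBy("column_name1").count().orderBy("column_name1").show();

        sparkSession.stop();
    }
}

The explicit schema seems to be what makes the difference: a plain Row carries no field names or types for Spark to reflect on, which, as far as I can tell, is why the bean-style createDataFrame(..., Row.class) overload blew up for me.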

Thanks a lot!!
Have a nice day,
Karin

On Wed, Mar 29, 2017 at 4:17 AM, Richard Xin <richardxin...@yahoo.com>
wrote:

> Maybe you could try something like this:
>         SparkSession sparkSession = SparkSession
>                 .builder()
>                 .appName("Rows2DataSet")
>                 .master("local")
>                 .getOrCreate();
>
>         List<Row> results = new LinkedList<Row>();
>
>         JavaRDD<Row> jsonRDD =
>                 new JavaSparkContext(sparkSession.sparkContext()).parallelize(results);
>
>         Dataset<Row> peopleDF = sparkSession.createDataFrame(jsonRDD, Row.class);
>
> Richard Xin
>
>
> On Tuesday, March 28, 2017 7:51 AM, Karin Valisova <ka...@datapine.com>
> wrote:
>
>
> Hello!
>
> I am running Spark with Java and have bumped into a problem I can't solve or
> find anything helpful about among answered questions, so I would really
> appreciate your help.
>
> I am running some calculations, creating rows for each result:
>
> List<Row> results = new LinkedList<Row>();
>
> for (/* each result */) {
>     results.add(RowFactory.create(someStringVariable, someIntegerVariable));
> }
>
> Now I have ended up with a list of Rows that I need to turn into a DataFrame
> to perform some Spark SQL operations on, like groupings and sorting. I would
> like to keep the data types.
>
> I tried:
>
> Dataset<Row> toShow = spark.createDataFrame(results, Row.class);
>
> but it throws a NullPointerException (spark being my SparkSession). Is my
> logic wrong somewhere? Should this operation be possible and produce what I
> want?
> Or do I have to create a custom class which implements Serializable and
> create a list of those objects rather than Rows? Would I be able to perform
> SQL queries on a Dataset consisting of custom class objects rather than Rows?
>
> I'm sorry if this is a duplicate question.
> Thank you for your help!
> Karin
>


-- 

datapine GmbH
Skalitzer Straße 33
10999 Berlin

email: ka...@datapine.com
