Looks like parallelizing into an RDD was the step I was omitting:

JavaRDD<Row> jsonRDD =
    new JavaSparkContext(sparkSession.sparkContext()).parallelize(results);
Then I created a schema:

List<StructField> fields = new ArrayList<StructField>();
fields.add(DataTypes.createStructField("column_name1", DataTypes.StringType, true));
fields.add(...);
StructType schema = DataTypes.createStructType(fields);

and then, voilà! I have my Dataset without any NullPointerExceptions :)

Dataset<Row> resultDataset = spark.createDataFrame(rdd, schema);

Thanks a lot!! Have a nice day,
Karin

On Wed, Mar 29, 2017 at 4:17 AM, Richard Xin <richardxin...@yahoo.com> wrote:
> Maybe you could try something like this:
>
> SparkSession sparkSession = SparkSession
>     .builder()
>     .appName("Rows2DataSet")
>     .master("local")
>     .getOrCreate();
>
> List<Row> results = new LinkedList<Row>();
> JavaRDD<Row> jsonRDD =
>     new JavaSparkContext(sparkSession.sparkContext()).parallelize(results);
>
> Dataset<Row> peopleDF = sparkSession.createDataFrame(jsonRDD, Row.class);
>
> Richard Xin
>
>
> On Tuesday, March 28, 2017 7:51 AM, Karin Valisova <ka...@datapine.com> wrote:
>
> Hello!
>
> I am running Spark on Java and bumped into a problem I can't solve or find
> anything helpful among answered questions, so I would really appreciate
> your help.
>
> I am running some calculations, creating rows for each result:
>
> List<Row> results = new LinkedList<Row>();
>
> for (something) {
>     results.add(RowFactory.create(someStringVariable, someIntegerVariable));
> }
>
> Now I have ended up with a list of rows that I need to turn into a DataFrame
> to perform some Spark SQL operations on them, like groupings and sorting,
> and I would like to keep the data types. I tried:
>
> Dataset<Row> toShow = spark.createDataFrame(results, Row.class);
>
> but it throws a NullPointerException (spark being the SparkSession). Is my
> logic wrong there somewhere? Should this operation be possible, resulting
> in what I want? Or do I have to create a custom class which extends
> Serializable and create a list of those objects rather than Rows?
> Will I be able to perform SQL queries on a Dataset consisting of custom
> class objects rather than Rows?
>
> I'm sorry if this is a duplicate question.
> Thank you for your help!
> Karin
>
>
> --
> datapine GmbH
> Skalitzer Straße 33
> 10999 Berlin
> email: ka...@datapine.com
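For reference, the working approach from this thread can be put together into one minimal, self-contained sketch. The column names and the sample row values here are hypothetical stand-ins for Karin's actual results; the key point is that Row.class carries no schema (which is why createDataFrame(results, Row.class) throws a NullPointerException), so an explicit StructType must be supplied:

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructField;
import org.apache.spark.sql.types.StructType;

public class Rows2Dataset {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("Rows2DataSet")
                .master("local")
                .getOrCreate();

        // Build the rows locally (hypothetical data in place of the real results).
        List<Row> results = new LinkedList<Row>();
        results.add(RowFactory.create("alpha", 1));
        results.add(RowFactory.create("beta", 2));

        // Parallelize the local list into an RDD of Rows.
        JavaRDD<Row> rowRDD =
                new JavaSparkContext(spark.sparkContext()).parallelize(results);

        // Describe the columns explicitly; a Row by itself has no schema,
        // so createDataFrame needs this StructType to build the Dataset.
        List<StructField> fields = new ArrayList<StructField>();
        fields.add(DataTypes.createStructField("column_name1", DataTypes.StringType, true));
        fields.add(DataTypes.createStructField("column_name2", DataTypes.IntegerType, true));
        StructType schema = DataTypes.createStructType(fields);

        Dataset<Row> resultDataset = spark.createDataFrame(rowRDD, schema);
        resultDataset.show();

        spark.stop();
    }
}
```

Note that recent Spark versions also accept the List<Row> directly, i.e. spark.createDataFrame(results, schema), so the explicit parallelize step is optional when the data already fits in driver memory.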