Could you have a DataFrame with a column that stores JSON (as a string)? Alternatively, you could use an array-type column that holds all of the cities matching your query.
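Something like the untested sketch below might work. It assumes your documents expose a `city` field (adjust that to your actual mapping), and it loads the index once on the driver and joins, because esDF cannot be called from inside foreach: the closure runs on the executors, where the SQLContext is null, which is where your NullPointerException comes from.

import org.apache.spark.sql.functions.{collect_list, struct, to_json}
import org.elasticsearch.spark.sql._

// esDF must run on the driver, not inside foreach on the executors.
// Load the index once, then join against the list of cities.
import sqlContext.implicits._
val cities = Seq("New York").toDF("name")
val docs = sqlContext.esDF("cities/docs")

// Option 1: one row per city, with an array column of all matches.
// `city` is an assumed field name; change it to match your index.
val asArray = docs
  .join(cities, docs("city") === cities("name"))
  .groupBy(cities("name"))
  .agg(collect_list(docs("city")).as("matching_cities"))

// Option 2: one row per match, with the document serialized
// to a JSON string column.
val asJson = docs
  .join(cities, docs("city") === cities("name"))
  .select(cities("name"), to_json(struct(docs("city"))).as("doc_json"))

If the index is too large to load and join like this, you could instead collect the city names to the driver, call esDF once per city, and union the results, but the join keeps the work distributed.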
On Fri, Dec 28, 2018 at 2:48 AM <em...@yeikel.com> wrote:
> Hi community,
>
> As shown in other answers online, Spark does not support nesting one DataFrame inside another, but what are the options?
>
> I have the following scenario:
>
> dataFrame1 = a list of cities
>
> dataFrame2 = created by searching Elasticsearch for each city in dataFrame1
>
> I've tried:
>
> val cities = sc.parallelize(Seq("New York")).toDF()
>
> cities.foreach(r => {
>   val cityName = r.getString(0)
>   println(cityName)
>   // returns a DataFrame of all the cities matching the entry in `cities`
>   val dfs = sqlContext.esDF("cities/docs", "?q=" + cityName)
> })
>
> Which triggers the expected NullPointerException:
>
> java.lang.NullPointerException
>     at org.elasticsearch.spark.sql.EsSparkSQL$.esDF(EsSparkSQL.scala:53)
>     at org.elasticsearch.spark.sql.EsSparkSQL$.esDF(EsSparkSQL.scala:51)
>     at org.elasticsearch.spark.sql.package$SQLContextFunctions.esDF(package.scala:37)
>     at Main$$anonfun$main$1.apply(Main.scala:43)
>     at Main$$anonfun$main$1.apply(Main.scala:39)
>     at scala.collection.Iterator$class.foreach(Iterator.scala:742)
>     at scala.collection.AbstractIterator.foreach(Iterator.scala:1194)
>     at org.apache.spark.rdd.RDD$$anonfun$foreach$1$$anonfun$apply$28.apply(RDD.scala:921)
>     at org.apache.spark.rdd.RDD$$anonfun$foreach$1$$anonfun$apply$28.apply(RDD.scala:921)
>     at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2067)
>     at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2067)
>     at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
>     at org.apache.spark.scheduler.Task.run(Task.scala:109)
>     at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:748)
>
> 2018-12-28 02:01:00 ERROR TaskSetManager:70 - Task 7 in stage 0.0 failed 1 times; aborting job
> Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 7 in stage 0.0 failed 1 times, most recent failure: Lost task 7.0 in stage 0.0 (TID 7, localhost, executor driver): java.lang.NullPointerException
>
> What options do I have?
>
> Thank you.