Re: Null pointer exception with RDD while computing a method, creating dataframe.

2016-12-21 Thread Liang-Chi Hsieh
Your sample code first selects the distinct zip codes, and then saves the rows for each distinct zip code into a separate parquet file. So I think you can simply partition your data using the `DataFrameWriter.partitionBy` API, e.g.,

    df.repartition("zip_code").write.partitionBy("zip_code").parquet(.)
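To illustrate what that buys you, here is a conceptual sketch in plain Scala (no Spark dependency; the `Row` case class, field names, and zip code values are made up for illustration). `partitionBy("zip_code")` produces Hive-style partitioned output: rows are grouped by the partition column and each group lands under a directory named `zip_code=<value>`:

```scala
// Conceptual sketch of Hive-style partitioned output, as produced by
// DataFrameWriter.partitionBy("zip_code"). All names here are hypothetical.
case class Row(zipCode: String, name: String)

// Group rows the way partitionBy would: one output directory per value.
def partitionDirs(rows: Seq[Row]): Map[String, Seq[Row]] =
  rows.groupBy(r => s"zip_code=${r.zipCode}")

val rows = Seq(Row("94105", "a"), Row("10001", "b"), Row("94105", "c"))
val dirs = partitionDirs(rows)
// dirs.keySet == Set("zip_code=94105", "zip_code=10001")
```

In the real job this grouping happens inside Spark in a single write, which avoids launching one query and one write per zip code.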

Re: Null pointer exception with RDD while computing a method, creating dataframe.

2016-12-21 Thread satyajit vegesna
Hi,

    val df = spark.read.parquet()
    df.registerTempTable("df")
    val zip = df.select("zip_code").distinct().as[String].rdd
    def comp(zipcode: String): Unit = {
      val zipval = "SELECT * FROM df WHERE zip_code='$zipvalrepl'".replace("$zipvalrepl", zipcode)
      val data = spark.sql(zipval)
      data.write.parquet(
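The query-template substitution inside `comp` can be checked in isolation (plain Scala; the zip code value is made up):

```scala
// The template/replace pattern from the snippet above, in isolation.
val template = "SELECT * FROM df WHERE zip_code='$zipvalrepl'"

def buildQuery(zipcode: String): String =
  template.replace("$zipvalrepl", zipcode)

val q = buildQuery("94105")
// q == "SELECT * FROM df WHERE zip_code='94105'"
```

The string building itself is fine; the NPE in this thread would come instead from where `comp` is called. If it is invoked from an executor-side closure (e.g. something like `zip.foreach(comp)`), the `spark` session referenced inside it is not available on executors.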

Re: Null pointer exception with RDD while computing a method, creating dataframe.

2016-12-20 Thread Liang-Chi Hsieh
Hi, You can't invoke any RDD actions/transformations inside another transformation; they must be invoked by the driver. If I understand your purpose correctly, you can partition your data (i.e., `partitionBy`) when writing out to parquet files. - Liang-Chi Hsieh | @viirya Spark Technolo
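The failure mode can be mimicked without Spark (a toy `Session` class below, not Spark's real API): the `SparkSession` lives only on the driver, so a closure that drags a reference to it onto an executor finds that reference null there and throws:

```scala
// Toy stand-in for SparkSession; not Spark's real API.
class Session { def sql(q: String): String = s"result of $q" }

// Simulates calling through a reference that did not survive shipping
// the closure to an executor, where the driver-only session is null.
def runWith(session: Session): Boolean =
  try { session.sql("SELECT 1"); true }
  catch { case _: NullPointerException => false }

val onDriver   = runWith(new Session) // true: live session on the driver
val onExecutor = runWith(null)        // false: NPE, as in this thread
```

That is why the fix is either to loop over collected values on the driver, or better, to let a single `partitionBy` write do the splitting.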