Null pointer exception reading Parquet

2015-03-18 Thread sprookie
Hi All,

I am using Spark version 1.2 running locally. When I try to read a Parquet
file I get the exception below. What might be the issue?
Any help will be appreciated. This is the simplest operation/action on a
Parquet file.


//code snippet//


  import org.apache.spark.{SparkConf, SparkContext}

  val sparkConf = new SparkConf().setAppName("Testing").setMaster("local[10]")
  val sc = new SparkContext(sparkConf)
  val sqlContext = new org.apache.spark.sql.SQLContext(sc)
  sqlContext.setConf("spark.sql.parquet.binaryAsString", "true")

  import sqlContext._
  val temp = "local path to file"   // placeholder: local path to the Parquet file
  val temp2 = sqlContext.parquetFile(temp)

temp2.printSchema


//end code snippet



//Exception trace

Exception in thread "main" java.lang.NullPointerException
 at parquet.format.converter.ParquetMetadataConverter.fromParquetStatistics(ParquetMetadataConverter.java:249)
 at parquet.format.converter.ParquetMetadataConverter.fromParquetMetadata(ParquetMetadataConverter.java:543)
 at parquet.format.converter.ParquetMetadataConverter.readParquetMetadata(ParquetMetadataConverter.java:520)
 at parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:426)
 at parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:389)
 at org.apache.spark.sql.parquet.ParquetTypesConverter$$anonfun$readMetaData$3.apply(ParquetTypes.scala:457)
 at org.apache.spark.sql.parquet.ParquetTypesConverter$$anonfun$readMetaData$3.apply(ParquetTypes.scala:457)
 at scala.Option.map(Option.scala:145)
 at org.apache.spark.sql.parquet.ParquetTypesConverter$.readMetaData(ParquetTypes.scala:457)
 at org.apache.spark.sql.parquet.ParquetTypesConverter$.readSchemaFromFile(ParquetTypes.scala:477)
 at org.apache.spark.sql.parquet.ParquetRelation.init(ParquetRelation.scala:65)
 at org.apache.spark.sql.SQLContext.parquetFile(SQLContext.scala:165)

//End Exception trace
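

Since the NPE surfaces inside ParquetFileReader.readFooter, here is a minimal
diagnostic sketch (an assumption on my part, not a fix): read the footer
directly with the parquet-mr API that Spark 1.2 already bundles, bypassing
Spark SQL, to see whether the file's own metadata reproduces the error. The
path is a placeholder.

//diagnostic sketch//

  import org.apache.hadoop.conf.Configuration
  import org.apache.hadoop.fs.Path
  import parquet.hadoop.ParquetFileReader

  // Read only the file footer; if this also throws the NullPointerException,
  // the problem is in the file's metadata/statistics rather than in the
  // Spark SQL code above.
  val footer = ParquetFileReader.readFooter(
    new Configuration(), new Path("local path to file"))  // placeholder path
  println(footer.getFileMetaData.getSchema)

//end diagnostic sketch//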





LeftOuter Join issue

2015-01-27 Thread sprookie
I have about 15-20 joins to perform. Each of these tables is on the order
of 6 million to 66 million rows, and the number of columns ranges from 20
to 400.

I read the Parquet files and obtain SchemaRDDs, then use the join
functionality on two SchemaRDDs at a time, joining each previous join
result with the next SchemaRDD (a rough sketch of this pattern is below).
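
A minimal sketch of that pattern, assuming Spark 1.2 and hypothetical table
names (t1, t2, t3) and column names (id, a, b, c); each intermediate result
is cached and registered as a temp table so the next left outer join can be
expressed in SQL without recomputing the previous step:

//sketch, hypothetical names//

  // Register each Parquet-backed table; paths and table names are placeholders.
  Seq("t1", "t2", "t3").foreach { name =>
    sqlContext.parquetFile(s"/path/to/$name").registerTempTable(name)
  }

  // First left outer join; cache it so the next join does not recompute it.
  val step1 = sqlContext.sql(
    "SELECT t1.id, t1.a, t2.b FROM t1 LEFT OUTER JOIN t2 ON t1.id = t2.id")
  step1.cache()
  step1.registerTempTable("step1")

  // Join the previous result with the next table, and so on for the rest.
  val step2 = sqlContext.sql(
    "SELECT step1.id, step1.a, step1.b, t3.c FROM step1 LEFT OUTER JOIN t3 ON step1.id = t3.id")

//end sketch//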

Any ideas on how to deal with such a join-intensive Spark SQL process?
Any advice on how to handle the joins in a better way?

I will appreciate all the inputs.

Thanks!



