Hi V, I am assuming that each of the three .parquet paths you mentioned has multiple part files in it.
For example: [/dataset/city=London/data.parquet/part-r-0.parquet, /dataset/city=London/data.parquet/part-r-1.parquet]

I haven't personally used this with HDFS, but I've worked with a similar file structure (with '=' in the paths) on S3. The way I get around it is by building a single string of all the file paths separated by commas, with NO spaces in between, and then passing that string as the file path parameter. I think the following adaptation of the S3 access pattern to HDFS would work.

If I want to load 1 file:

  sqlContext.parquetFile("hdfs://<some-ip>:8029/dataset/city=London/data.parquet")

If I want to load multiple files (let's say all 3 of them):

  sqlContext.parquetFile("hdfs://<some-ip>:8029/dataset/city=London/data.parquet,hdfs://<some-ip>:8029/dataset/city=NewYork/data.parquet,hdfs://<some-ip>:8029/dataset/city=Paris/data.parquet")

*** In the multiple-file case, all of the files must share the same schema.

I hope you can use this S3 pattern with HDFS and that it works for you!

Thanks,
in4
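P.S. In case it helps, here's a rough, untested Scala sketch of building that comma-separated path string programmatically. The namenode address, city list, and app name are placeholders, and it leans on Spark 1.3's SQLContext.parquetFile, which also accepts several paths as separate (vararg) arguments, so both variants are shown:

  import org.apache.spark.{SparkConf, SparkContext}
  import org.apache.spark.sql.SQLContext

  val sc = new SparkContext(new SparkConf().setAppName("load-partitions"))
  val sqlContext = new SQLContext(sc)

  // Placeholder HDFS location and partition values; adjust to your cluster.
  val base = "hdfs://<some-ip>:8029/dataset"
  val cities = Seq("London", "NewYork", "Paris")
  val paths = cities.map(c => s"$base/city=$c/data.parquet")

  // Variant 1: one comma-separated string, NO spaces between the paths
  // (the pattern described above).
  val df1 = sqlContext.parquetFile(paths.mkString(","))

  // Variant 2: parquetFile(paths: String*) is varargs in Spark 1.3,
  // so each path can also be passed as its own argument.
  val df2 = sqlContext.parquetFile(paths: _*)

  df2.printSchema()

Either way the result is a single DataFrame, which is why all three files need the same schema.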