Hello, I have two HDFS directories, each containing multiple Avro files, and I want to specify both of them as input. In the Hadoop world one can specify a comma-separated list of directories as the input path (sketch below), but in Spark that does not work.
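For reference, this is the Hadoop-side behavior I mean: FileInputFormat splits a comma-separated string into multiple input paths. A minimal sketch with a hypothetical Job, not my actual code:

    import org.apache.hadoop.mapreduce.Job
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat

    // setInputPaths splits the comma-separated string, so both
    // directories become input paths of the MapReduce job.
    val job = Job.getInstance()
    FileInputFormat.setInputPaths(job,
      "/user/dvasthimal/epdatasets_small/exptsession/2015/04/06," +
      "/user/dvasthimal/epdatasets_small/exptsession/2015/04/07")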
Logs
====
15/04/07 21:10:11 INFO storage.BlockManagerMaster: Updated info of block broadcast_2_piece0
15/04/07 21:10:11 INFO spark.SparkContext: Created broadcast 2 from sequenceFile at DataUtil.scala:120
15/04/07 21:10:11 ERROR yarn.ApplicationMaster: User class threw exception: Input path does not exist: hdfs://namenode_host_name:8020/user/dvasthimal/epdatasets_small/exptsession/2015/04/06,/user/dvasthimal/epdatasets_small/exptsession/2015/04/07
org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: hdfs://namenode_host_name:8020/user/dvasthimal/epdatasets_small/exptsession/2015/04/06,/user/dvasthimal/epdatasets_small/exptsession/2015/04/07
        at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:320)
        at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:263)
====

Input code:

    sc.newAPIHadoopFile[AvroKey[GenericRecord], NullWritable, AvroKeyInputFormat[GenericRecord]](path)

Here path is:

    /user/dvasthimal/epdatasets_small/exptsession/2015/04/06,/user/dvasthimal/epdatasets_small/exptsession/2015/04/07

Note that the exception treats the entire comma-separated string as a single path.

-- Deepak
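PS: One workaround I am considering, assuming newAPIHadoopFile really does treat the whole string as a single path: read each directory into its own RDD and union them. A rough, untested sketch (sc is the SparkContext):

    import org.apache.avro.generic.GenericRecord
    import org.apache.avro.mapred.AvroKey
    import org.apache.avro.mapreduce.AvroKeyInputFormat
    import org.apache.hadoop.io.NullWritable

    // Read each directory as a separate RDD, then union them into one.
    val paths = Seq(
      "/user/dvasthimal/epdatasets_small/exptsession/2015/04/06",
      "/user/dvasthimal/epdatasets_small/exptsession/2015/04/07")
    val rdd = paths
      .map(p => sc.newAPIHadoopFile[AvroKey[GenericRecord], NullWritable,
        AvroKeyInputFormat[GenericRecord]](p))
      .reduce(_ union _)

Is this the recommended approach, or is comma-separated input supposed to work here?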