Re: SparkR in Spark 1.5.2 jsonFile Bug Found

2015-12-04 Thread Yanbo Liang
I have created SPARK-12146 to track this issue.

2015-12-04 9:16 GMT+08:00 Felix Cheung :

> It looks like this has been broken since around Spark 1.5.
>
> Please see JIRA SPARK-10185. This has been fixed in PySpark, but
> unfortunately SparkR was missed. I have confirmed this is still broken in
> Spark 1.6.
>
> Could you please open a JIRA?


Re: SparkR in Spark 1.5.2 jsonFile Bug Found

2015-12-03 Thread Felix Cheung
It looks like this has been broken since around Spark 1.5.
Please see JIRA SPARK-10185. This has been fixed in PySpark, but
unfortunately SparkR was missed. I have confirmed this is still broken in
Spark 1.6.
Could you please open a JIRA?
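
In the meantime, a workaround is to read each directory separately and
combine the results. A minimal sketch, assuming the directories share a
compatible schema (it only uses the stock jsonFile and unionAll APIs, and
is untested here):

    # Read each path on its own, then union the resulting DataFrames.
    path <- c("/path/to/dir1", "/path/to/dir2")
    dfs <- lapply(path, function(p) jsonFile(sqlContext, p))
    combined <- Reduce(unionAll, dfs)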
On Thu, Dec 3, 2015 at 2:08 PM -0800, "tomasr3"  wrote:

Hello,

I believe I have encountered a bug in Spark 1.5.2. I am using RStudio and
SparkR to read JSON files with jsonFile(sqlContext, "path"). If "path" is
a single path (e.g., "/path/to/dir0"), it works fine; but when "path" is a
vector of paths, e.g. path <- c("/path/to/dir1", "/path/to/dir2"), I get
the following error message:

> raw.terror <- jsonFile(sqlContext, path)
15/12/03 15:59:55 ERROR RBackendHandler: jsonFile on 1 failed
Error in invokeJava(isStatic = FALSE, objId$id, methodName, ...) :
  java.io.IOException: No input paths specified in job
  at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:201)
  at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:313)
  at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:207)
  at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
  at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
  at scala.Option.getOrElse(Option.scala:120)
  at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
  at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
  at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
  at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
  at scala.Option.getOrElse(Option.scala:120)
  at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
  at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
  at org.apache.spark.rdd.RDD$$anonfun$partitions$2

Note that passing a vector of paths works just fine in Spark 1.4.1. Any
help is greatly appreciated in case this is not a bug but rather an
environment or configuration issue.

Best,
T



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/SparkR-in-Spark-1-5-2-jsonFile-Bug-Found-tp25560.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org