[jira] [Closed] (SPARK-10798) JsonMappingException with Spark Context Parallelize
[ https://issues.apache.org/jira/browse/SPARK-10798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dev Lakhani closed SPARK-10798.
-------------------------------
    Resolution: Cannot Reproduce

> JsonMappingException with Spark Context Parallelize
> ---------------------------------------------------
>
>                 Key: SPARK-10798
>                 URL: https://issues.apache.org/jira/browse/SPARK-10798
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 1.5.0
>         Environment: Linux, Java 1.8.45
>            Reporter: Dev Lakhani
>
> When trying to create an RDD of Rows using a Java Spark Context, if I serialize the rows with Kryo first, the SparkContext fails.
>
> byte[] data = Kryo.serialize(List<Row>)
> List<Row> fromKryoRows = Kryo.unserialize(data)
>
> List<Row> rows = new Vector<Row>(); // using a new set of data
> rows.add(RowFactory.create("test"));
>
> javaSparkContext.parallelize(rows);
> OR
> javaSparkContext.parallelize(fromKryoRows); // using deserialized rows
>
> I get:
>
> com.fasterxml.jackson.databind.JsonMappingException: (None,None) (of class scala.Tuple2) (through reference chain: org.apache.spark.rdd.RDDOperationScope["parent"])
>   at com.fasterxml.jackson.databind.JsonMappingException.wrapWithPath(JsonMappingException.java:210)
>   at com.fasterxml.jackson.databind.JsonMappingException.wrapWithPath(JsonMappingException.java:177)
>   at com.fasterxml.jackson.databind.ser.std.StdSerializer.wrapAndThrow(StdSerializer.java:187)
>   at com.fasterxml.jackson.databind.ser.std.BeanSerializerBase.serializeFields(BeanSerializerBase.java:647)
>   at com.fasterxml.jackson.databind.ser.BeanSerializer.serialize(BeanSerializer.java:152)
>   at com.fasterxml.jackson.databind.ser.DefaultSerializerProvider.serializeValue(DefaultSerializerProvider.java:128)
>   at com.fasterxml.jackson.databind.ObjectMapper._configAndWriteValue(ObjectMapper.java:2881)
>   at com.fasterxml.jackson.databind.ObjectMapper.writeValueAsString(ObjectMapper.java:2338)
>   at org.apache.spark.rdd.RDDOperationScope.toJson(RDDOperationScope.scala:50)
>   at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:141)
>   at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:108)
>   at org.apache.spark.SparkContext.withScope(SparkContext.scala:700)
>   at org.apache.spark.SparkContext.parallelize(SparkContext.scala:714)
>   at org.apache.spark.api.java.JavaSparkContext.parallelize(JavaSparkContext.scala:145)
>   at org.apache.spark.api.java.JavaSparkContext.parallelize(JavaSparkContext.scala:157)
>   ...
> Caused by: scala.MatchError: (None,None) (of class scala.Tuple2)
>   at com.fasterxml.jackson.module.scala.ser.OptionSerializer$$anonfun$serialize$1.apply$mcV$sp(OptionSerializerModule.scala:32)
>   at com.fasterxml.jackson.module.scala.ser.OptionSerializer$$anonfun$serialize$1.apply(OptionSerializerModule.scala:32)
>   at com.fasterxml.jackson.module.scala.ser.OptionSerializer$$anonfun$serialize$1.apply(OptionSerializerModule.scala:32)
>   at scala.Option.getOrElse(Option.scala:120)
>   at com.fasterxml.jackson.module.scala.ser.OptionSerializer.serialize(OptionSerializerModule.scala:31)
>   at com.fasterxml.jackson.module.scala.ser.OptionSerializer.serialize(OptionSerializerModule.scala:22)
>   at com.fasterxml.jackson.databind.ser.BeanPropertyWriter.serializeAsField(BeanPropertyWriter.java:505)
>   at com.fasterxml.jackson.module.scala.ser.OptionPropertyWriter.serializeAsField(OptionSerializerModule.scala:128)
>   at com.fasterxml.jackson.databind.ser.std.BeanSerializerBase.serializeFields(BeanSerializerBase.java:639)
>   ... 19 more
>
> I've tried updating jackson-module-scala to 2.6.1 but see the same issue. This happens in local mode with Java 1.8_45. I searched the web and this JIRA for similar issues but found nothing of interest.
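A self-contained Java sketch of the reproduction described above, with the report's Kryo.serialize / Kryo.unserialize shorthand expanded into plain Kryo calls; the class name, the local master URL, and the stream handling are illustrative assumptions, not part of the original report:

    import com.esotericsoftware.kryo.Kryo;
    import com.esotericsoftware.kryo.io.Input;
    import com.esotericsoftware.kryo.io.Output;
    import org.apache.spark.api.java.JavaSparkContext;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.RowFactory;

    import java.io.ByteArrayInputStream;
    import java.io.ByteArrayOutputStream;
    import java.util.List;
    import java.util.Vector;

    public class ParallelizeRepro {
        public static void main(String[] args) {
            JavaSparkContext jsc = new JavaSparkContext("local[*]", "SPARK-10798-repro");

            // Build a row list and round-trip it through Kryo, as the report describes.
            List<Row> rows = new Vector<Row>();
            rows.add(RowFactory.create("test"));

            Kryo kryo = new Kryo();
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            Output output = new Output(buffer);
            kryo.writeClassAndObject(output, rows);
            output.close();

            Input input = new Input(new ByteArrayInputStream(buffer.toByteArray()));
            @SuppressWarnings("unchecked")
            List<Row> fromKryoRows = (List<Row>) kryo.readClassAndObject(input);
            input.close();

            // Either call reportedly failed inside RDDOperationScope.toJson with
            // the JsonMappingException shown in the stack trace above.
            jsc.parallelize(rows);
            jsc.parallelize(fromKryoRows);

            jsc.stop();
        }
    }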
[jira] [Commented] (SPARK-10798) JsonMappingException with Spark Context Parallelize
[ https://issues.apache.org/jira/browse/SPARK-10798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15055130#comment-15055130 ]

Dev Lakhani commented on SPARK-10798:
--------------------------------------

byte[] data = Kryo.serialize(List<Row>) is just shorthand for new Kryo().serialize(). I think this was a classpath issue; I was not able to reproduce it, but if it reappears I will reopen the ticket.
[jira] [Commented] (SPARK-10798) JsonMappingException with Spark Context Parallelize
[ https://issues.apache.org/jira/browse/SPARK-10798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14942651#comment-14942651 ]

Dev Lakhani commented on SPARK-10798:
--------------------------------------

Hi Miao, I will create a GitHub project/fork for this to give you the full sample soon. Thanks, Dev
[jira] [Updated] (SPARK-10798) JsonMappingException with Spark Context Parallelize
[ https://issues.apache.org/jira/browse/SPARK-10798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dev Lakhani updated SPARK-10798:
--------------------------------
    Environment: Linux, Java 1.8.45  (was: Linux, Java 1.8.40)
[jira] [Updated] (SPARK-10798) JsonMappingException with Spark Context Parallelize
[ https://issues.apache.org/jira/browse/SPARK-10798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dev Lakhani updated SPARK-10798:
--------------------------------
    Description:

When trying to create an RDD of Rows using a Java Spark Context, if I serialize the rows with Kryo first, the SparkContext fails.

byte[] data = Kryo.serialize(List<Row>)
List<Row> fromKryoRows = Kryo.unserialize(data)

List<Row> rows = new Vector<Row>(); // using a new set of data
rows.add(RowFactory.create("test"));

javaSparkContext.parallelize(rows);
OR
javaSparkContext.parallelize(fromKryoRows); // using deserialized rows

I get:

com.fasterxml.jackson.databind.JsonMappingException: (None,None) (of class scala.Tuple2) (through reference chain: org.apache.spark.rdd.RDDOperationScope["parent"])
  ...
Caused by: scala.MatchError: (None,None) (of class scala.Tuple2)
  ... 19 more

I've tried updating jackson-module-scala to 2.6.1 but see the same issue. This happens in local mode with Java 1.8_45. I searched the web and this JIRA for similar issues but found nothing of interest.
was:

When trying to create an RDD of Rows using a Java Spark Context:

List<Row> rows = new Vector<Row>();
rows.add(RowFactory.create("test"));
javaSparkContext.parallelize(rows);

I get:

com.fasterxml.jackson.databind.JsonMappingException: (None,None) (of class scala.Tuple2) (through reference chain: org.apache.spark.rdd.RDDOperationScope["parent"])
  ...
[jira] [Updated] (SPARK-10798) JsonMappingException with Spark Context Parallelize
[ https://issues.apache.org/jira/browse/SPARK-10798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dev Lakhani updated SPARK-10798:
--------------------------------
    Description:

When trying to create an RDD of Rows using a Java Spark Context:

List<Row> rows = new Vector<Row>();
rows.add(RowFactory.create("test"));
javaSparkContext.parallelize(rows);

I get:

com.fasterxml.jackson.databind.JsonMappingException: (None,None) (of class scala.Tuple2) (through reference chain: org.apache.spark.rdd.RDDOperationScope["parent"])
  ...
Caused by: scala.MatchError: (None,None) (of class scala.Tuple2)
  ... 19 more

I've tried updating jackson-module-scala to 2.6.1 but see the same issue. This happens in local mode with Java 1.8_40. I searched the web and this JIRA for similar issues but found nothing of interest.
was:

When trying to create an RDD of Rows using a Java Spark Context:

List<Row> rows = new Vector<Row>();
rows.add(RowFactory.create("test"));
javaSparkContext.parallelize(rows);

I get:

com.fasterxml.jackson.databind.JsonMappingException: (None,None) (of class scala.Tuple2) (through reference chain: org.apache.spark.rdd.RDDOperationScope["parent"])
  ...
[jira] [Created] (SPARK-10798) JsonMappingException with Spark Context Parallelize
Dev Lakhani created SPARK-10798:
-----------------------------------

             Summary: JsonMappingException with Spark Context Parallelize
                 Key: SPARK-10798
                 URL: https://issues.apache.org/jira/browse/SPARK-10798
             Project: Spark
          Issue Type: Bug
          Components: Spark Core
    Affects Versions: 1.5.0
         Environment: Linux, Java 1.8.40
            Reporter: Dev Lakhani

When trying to create an RDD of Rows using a Java Spark Context:

List<Row> rows = new Vector<Row>();
rows.add(RowFactory.create("test"));
javaSparkContext.parallelize(rows);

I get:

com.fasterxml.jackson.databind.JsonMappingException: (None,None) (of class scala.Tuple2) (through reference chain: org.apache.spark.rdd.RDDOperationScope["parent"])
  ...
Caused by: scala.MatchError: (None,None) (of class scala.Tuple2)
  ... 19 more

I've tried updating jackson-module-scala to 2.6.1 but see the same issue. This happens in local mode with Java 1.8_40.
[jira] [Updated] (SPARK-10700) Spark R Documentation not available
[ https://issues.apache.org/jira/browse/SPARK-10700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dev Lakhani updated SPARK-10700:
--------------------------------
    Description:

The documentation at https://spark.apache.org/docs/latest/api/R/glm.html, referred to in https://spark.apache.org/docs/latest/sparkr.html, is not available. I searched this JIRA site for sparkr.html and SparkR Documentation and do not think anyone else has raised this.

was:

The documentation at https://spark.apache.org/docs/latest/sparkr.html is not available. I searched this JIRA site for sparkr.html and SparkR Documentation and do not think anyone else has raised this.
[jira] [Created] (SPARK-10700) Spark R Documentation not available
Dev Lakhani created SPARK-10700:
-----------------------------------

             Summary: Spark R Documentation not available
                 Key: SPARK-10700
                 URL: https://issues.apache.org/jira/browse/SPARK-10700
             Project: Spark
          Issue Type: Bug
          Components: Documentation
    Affects Versions: 1.5.0
            Reporter: Dev Lakhani
            Priority: Minor

The documentation at https://spark.apache.org/docs/latest/sparkr.html is not available. I searched this JIRA site for sparkr.html and SparkR Documentation and do not think anyone else has raised this.
[jira] [Commented] (SPARK-1867) Spark Documentation Error causes java.lang.IllegalStateException: unread block data
[ https://issues.apache.org/jira/browse/SPARK-1867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14602669#comment-14602669 ]

Dev Lakhani commented on SPARK-1867:
-------------------------------------

[~marcreichman], [~meiyoula], [~srowen], [~sam]: at a minimum, isn't it worth upvoting the JDK bug https://bugs.openjdk.java.net/browse/JDK-7172206, as this seems to be part of the problem?

Spark Documentation Error causes java.lang.IllegalStateException: unread block data
-----------------------------------------------------------------------------------

                Key: SPARK-1867
                URL: https://issues.apache.org/jira/browse/SPARK-1867
            Project: Spark
         Issue Type: Bug
         Components: Spark Core
           Reporter: sam

I've employed two System Administrators on a contract basis (for quite a bit of money), and both contractors have independently hit the following exception. What we are doing is:

1. Installing Spark 0.9.1 according to the documentation on the website, along with CDH4 (and another cluster with CDH5) distros of hadoop/hdfs.
2. Building a fat jar with a Spark app with sbt, then trying to run it on the cluster.

I've also included code snippets and sbt deps at the bottom. When I've Googled this, there seem to be two somewhat vague responses:

a) Mismatching spark versions on nodes/user code
b) Need to add more jars to the SparkConf

Now I know that (b) is not the problem, having successfully run the same code on other clusters while only including one jar (it's a fat jar). But I have no idea how to check for (a) - it appears Spark doesn't have any version checks or anything - it would be nice if it checked versions and threw a mismatching-version exception: "you have user code using version X and node Y has version Z". I would be very grateful for advice on this.

The exception:

Exception in thread "main" org.apache.spark.SparkException: Job aborted: Task 0.0:1 failed 32 times (most recent failure: Exception failure: java.lang.IllegalStateException: unread block data)
  at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$abortStage$1.apply(DAGScheduler.scala:1020)
  at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$abortStage$1.apply(DAGScheduler.scala:1018)
  at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
  at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
  at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$abortStage(DAGScheduler.scala:1018)
  at org.apache.spark.scheduler.DAGScheduler$$anonfun$processEvent$10.apply(DAGScheduler.scala:604)
  at org.apache.spark.scheduler.DAGScheduler$$anonfun$processEvent$10.apply(DAGScheduler.scala:604)
  at scala.Option.foreach(Option.scala:236)
  at org.apache.spark.scheduler.DAGScheduler.processEvent(DAGScheduler.scala:604)
  at org.apache.spark.scheduler.DAGScheduler$$anonfun$start$1$$anon$2$$anonfun$receive$1.applyOrElse(DAGScheduler.scala:190)
  at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
  at akka.actor.ActorCell.invoke(ActorCell.scala:456)
  at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
  at akka.dispatch.Mailbox.run(Mailbox.scala:219)
  at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
  at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
  at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
  at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
  at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
14/05/16 18:05:31 INFO scheduler.TaskSetManager: Loss was due to java.lang.IllegalStateException: unread block data [duplicate 59]

My code snippet:

val conf = new SparkConf()
  .setMaster(clusterMaster)
  .setAppName(appName)
  .setSparkHome(sparkHome)
  .setJars(SparkContext.jarOfClass(this.getClass))
println("count = " + new SparkContext(conf).textFile(someHdfsPath).count())

My SBT dependencies:

// relevant
"org.apache.spark" % "spark-core_2.10" % "0.9.1",
"org.apache.hadoop" % "hadoop-client" % "2.3.0-mr1-cdh5.0.0",
// standard, probably unrelated
"com.github.seratch" %% "awscala" % "[0.2,)",
"org.scalacheck" %% "scalacheck" % "1.10.1" % "test",
"org.specs2" %% "specs2" % "1.14" % "test",
"org.scala-lang" % "scala-reflect" % "2.10.3",
"org.scalaz" %% "scalaz-core" % "7.0.5",
"net.minidev" % "json-smart" % "1.2"
[jira] [Created] (SPARK-8395) spark-submit documentation is incorrect
Dev Lakhani created SPARK-8395:
----------------------------------

             Summary: spark-submit documentation is incorrect
                 Key: SPARK-8395
                 URL: https://issues.apache.org/jira/browse/SPARK-8395
             Project: Spark
          Issue Type: Improvement
          Components: Documentation
    Affects Versions: 1.4.0
            Reporter: Dev Lakhani
            Priority: Minor

Using a fresh checkout of 1.4.0-bin-hadoop2.6, if you run:

./start-slave.sh 1 spark://localhost:7077

you get:

failed to launch org.apache.spark.deploy.worker.Worker:
  Default is conf/spark-defaults.conf.
15/06/16 13:11:08 INFO Utils: Shutdown hook called

It seems the worker number is not being accepted as described here: https://spark.apache.org/docs/latest/spark-standalone.html

The documentation says:

./sbin/start-slave.sh <worker#> <master-spark-URL>

but the start-slave.sh script states:

usage="Usage: start-slave.sh <spark-master-URL>"

where <spark-master-URL> is like spark://localhost:7077. I have checked for similar issues using: https://issues.apache.org/jira/browse/SPARK-6552?jql=text%20~%20%22start-slave%22 and found nothing similar, so am raising this as an issue.
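A side-by-side sketch of the two invocations at issue (the master URL follows the report's example; which form a given release accepts depends on the version of the script, per the report):

    # Form given in the standalone-mode docs (worker number as first argument):
    ./sbin/start-slave.sh 1 spark://localhost:7077

    # Form the 1.4.0 start-slave.sh script itself documents (master URL only):
    ./sbin/start-slave.sh spark://localhost:7077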
[jira] [Resolved] (SPARK-8143) Spark application history cannot be found even for finished jobs
[ https://issues.apache.org/jira/browse/SPARK-8143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dev Lakhani resolved SPARK-8143.
--------------------------------
       Resolution: Fixed
    Fix Version/s: 1.4.0

Verified: history for killed jobs is now available in the web UI.

Spark application history cannot be found even for finished jobs
----------------------------------------------------------------

                Key: SPARK-8143
                URL: https://issues.apache.org/jira/browse/SPARK-8143
            Project: Spark
         Issue Type: Bug
         Components: Spark Core
   Affects Versions: 1.3.0, 1.3.1
           Reporter: Dev Lakhani
            Fix For: 1.4.0

Whenever a job is killed or finished, because of an application error or otherwise, and I then click on "Application Detail UI", even though the job state is FINISHED, I get no log results and the message states:

Application history not found for (app-xyz-abc)
Application ABC is still in progress.

And no logs are presented. I'm using spark.eventLog.enabled=true and spark.eventLog.dir=/tmp/spark, under which I see lots of files named app-2015xyz-abc.inprogress even though the job has failed or finished.
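A minimal sketch of the event-log configuration the report describes, as it would appear in conf/spark-defaults.conf; the /tmp/spark directory comes from the report, while pointing a history server at the same directory is an added assumption:

    # conf/spark-defaults.conf
    spark.eventLog.enabled   true
    spark.eventLog.dir       file:///tmp/spark

    # conf/spark-env.sh (assumption: a history server reading the same directory)
    export SPARK_HISTORY_OPTS="-Dspark.history.fs.logDirectory=file:///tmp/spark"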
[jira] [Commented] (SPARK-8142) Spark Job Fails with ResultTask ClassCastException
[ https://issues.apache.org/jira/browse/SPARK-8142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14580689#comment-14580689 ]

Dev Lakhani commented on SPARK-8142:
-------------------------------------

To clarify, [~srowen]:

1) I meant the other way around: if we choose to use Apache Spark, which provides Apache Hadoop libs, and we then choose a Cloudera Hadoop distribution on (the rest of) our cluster and use Cloudera Hadoop clients in the application code, Spark will provide Apache Hadoop libs whereas our cluster will be cdh5. Is there any issue in doing this? We choose to use Apache Spark because CDH is a version behind the official Spark release and we don't want to wait for, say, DataFrames support.

2) If I mark my spark-core as provided right now, my code compiles, but when I run my application in my IDE using Spark local I get:

java.lang.NoClassDefFoundError: org/apache/spark/api/java/function/Function

This is why I am suggesting we may need Maven profiles, one for local testing and one for deployment (a sketch follows this message).

So, getting back to the issue raised in this JIRA, which we seem to be ignoring: even when Hadoop and Spark are provided and the HBase client/protocol/server jars are packaged, we run into SPARK-1867, which at its latest comment suggests a dependency is missing, and this results in the obscure exception. Whether this is on the Hadoop side or the Spark side is not known, but as that JIRA suggests, it was caused by a missing dependency. I cannot see this missing class/dependency exception anywhere in the Spark logs. This suggests that if anyone using Spark sets any of the userClassPath* options and misses a primary, secondary or tertiary dependency, they will encounter SPARK-1867. Therefore we are stuck; any suggestions are welcome to overcome this. Either ChildFirstURLClassLoader needs to ignore the Spark and Hadoop libs, or Spark needs to log what's causing SPARK-1867.

Spark Job Fails with ResultTask ClassCastException
--------------------------------------------------

                Key: SPARK-8142
                URL: https://issues.apache.org/jira/browse/SPARK-8142
            Project: Spark
         Issue Type: Bug
         Components: Spark Core
   Affects Versions: 1.3.1
           Reporter: Dev Lakhani

When running a Spark job, I get no failures in the application code whatsoever, but a weird ResultTask class exception. In my job, I create an RDD from HBase and for each partition do a REST call on an API, using a REST client. This has worked in IntelliJ, but when I deploy to a cluster using spark-submit.sh I get:

org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 3, host): java.lang.ClassCastException: org.apache.spark.scheduler.ResultTask cannot be cast to org.apache.spark.scheduler.Task
  at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:185)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
  at java.lang.Thread.run(Thread.java:745)

These are the configs I set to override the Spark classpath, because I want to use my own Glassfish Jersey version:

sparkConf.set("spark.driver.userClassPathFirst", "true");
sparkConf.set("spark.executor.userClassPathFirst", "true");

I see no other warnings or errors in any of the logs. Unfortunately I cannot post my code, but please ask me questions that will help debug the issue. Using Spark 1.3.1, Hadoop 2.6.
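A hedged sketch of the two-profile Maven arrangement mentioned in point 2 above, with spark-core provided for cluster deployment but compile-scoped for local IDE runs; the profile names and version are illustrative assumptions:

    <!-- pom.xml (fragment) -->
    <profiles>
      <profile>
        <id>local</id>
        <properties>
          <spark.scope>compile</spark.scope>
        </properties>
      </profile>
      <profile>
        <id>deploy</id>
        <activation><activeByDefault>true</activeByDefault></activation>
        <properties>
          <spark.scope>provided</spark.scope>
        </properties>
      </profile>
    </profiles>

    <dependencies>
      <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_2.10</artifactId>
        <version>1.3.1</version>
        <!-- scope resolved per active profile -->
        <scope>${spark.scope}</scope>
      </dependency>
    </dependencies>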
[jira] [Commented] (SPARK-8142) Spark Job Fails with ResultTask ClassCastException
[ https://issues.apache.org/jira/browse/SPARK-8142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14580730#comment-14580730 ]

Dev Lakhani commented on SPARK-8142:
-------------------------------------

Hi [~vanzin]

bq. if you want to use the glassfish jersey version, you shouldn't need to do this, right? Spark depends on the old one that is under com.sun.*, IIRC.

Yes, I need to make use of Glassfish Jersey 2.x in my application, not the com.sun.* one provided, but this could apply to any other dependency that needs to supersede Spark's.

bq. marking all dependencies (including hbase) as provided and using spark.{driver,executor}.extraClassPath might be the easiest way out if you really need to use userClassPathFirst.

This is an option, but it might be a challenge to scale if we have different folder layouts for the extraClassPath on different clusters/nodes for the HBase and Hadoop installs. This can be (and usually is) the case when new servers are added to existing ones, for example. If one node has /disk4/path/to/hbase/libs and another has /disk3/another/path/to/hbase/libs and so on, then the extraClassPath will need to include both of these and will grow significantly, and the spark-submit args along with it. Also, whenever we update HBase we have to change this classpath each time.

Maybe the ideal way is to have, as you suggest, a blacklist which would contain the Spark and Hadoop libs. Then we could put whatever we wanted into one uber/fat jar, and it wouldn't matter where HBase and Hadoop are installed or what's provided and compiled; we let Spark work it out.

These are just my thoughts; I'm sure others will have different preferences and/or better approaches. Thanks anyway for your input on this JIRA.
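A hedged sketch of the provided-plus-extraClassPath arrangement suggested above; the /opt/hbase/lib path, main class, and jar name are illustrative assumptions, and the directory wildcard relies on the JVM's classpath wildcard expansion:

    spark-submit \
      --class com.example.MyJob \
      --conf spark.driver.extraClassPath='/opt/hbase/lib/*' \
      --conf spark.executor.extraClassPath='/opt/hbase/lib/*' \
      --conf spark.driver.userClassPathFirst=true \
      --conf spark.executor.userClassPathFirst=true \
      my-job-without-hbase-and-hadoop-deps.jar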
[jira] [Commented] (SPARK-1867) Spark Documentation Error causes java.lang.IllegalStateException: unread block data
[ https://issues.apache.org/jira/browse/SPARK-1867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14581130#comment-14581130 ]

Dev Lakhani commented on SPARK-1867:
-------------------------------------

Worth a new JIRA to suggest this?
[jira] [Commented] (SPARK-8142) Spark Job Fails with ResultTask ClassCastException
[ https://issues.apache.org/jira/browse/SPARK-8142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14580632#comment-14580632 ]

Dev Lakhani commented on SPARK-8142:
-------------------------------------

[~vanzin] [~srowen] Since this has been verified independently, there appears to be a limitation in the ChildFirstURLClassLoader class which may be causing this issue. The approach of marking the Spark/Hadoop deps as provided may not be ideal because:

1) it requires a Maven profile for compilation/testing and another for deployment;
2) if we run into SPARK-1867, there is no easy way to spot missing dependencies;
3) if we are using cdh* versions of Hadoop (client/server), then Spark's provided Hadoop versions will differ from the CDH client being used.
[jira] [Commented] (SPARK-1867) Spark Documentation Error causes java.lang.IllegalStateException: unread block data
[ https://issues.apache.org/jira/browse/SPARK-1867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14578745#comment-14578745 ]

Dev Lakhani commented on SPARK-1867:
-------------------------------------

Although this has been marked as not an issue, I agree with [~marcreichman]: it is a very misleading error, and there's often no way to figure out which classes are missing. There should be an explicit ClassNotFoundException or some other check or warning. Whenever dependencies are missing, it needs to be actionable.
[jira] [Commented] (SPARK-8142) Spark Job Fails with ResultTask ClassCastException
[ https://issues.apache.org/jira/browse/SPARK-8142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14578916#comment-14578916 ]

Dev Lakhani commented on SPARK-8142:
-------------------------------------

Any suggestion on this? To summarise: with Spark and Hadoop marked as provided, we are now running into https://issues.apache.org/jira/browse/SPARK-1867 - some missing dependency is causing this, and there is no indication what it is. This is becoming a blocker for our organisation.
[jira] [Commented] (SPARK-8142) Spark Job Fails with ResultTask ClassCastException
[ https://issues.apache.org/jira/browse/SPARK-8142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14576816#comment-14576816 ] Dev Lakhani commented on SPARK-8142: I can also confirm that without userClassPathFirst the job runs, but it fails at a different point, where my Jersey version clashes with my application code. So it seems to be an issue with the userClassPathFirst setting: when it is set, I get the ClassCastException.
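For completeness, these settings can also be passed at submit time instead of via SparkConf in code; spark.driver.userClassPathFirst and spark.executor.userClassPathFirst are real configuration keys in Spark 1.3+, while the class and jar names below are hypothetical:

spark-submit \
  --class com.example.MyJob \
  --conf spark.driver.userClassPathFirst=true \
  --conf spark.executor.userClassPathFirst=true \
  my-uber.jar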
[jira] [Commented] (SPARK-8142) Spark Job Fails with ResultTask ClassCastException
[ https://issues.apache.org/jira/browse/SPARK-8142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14577230#comment-14577230 ] Dev Lakhani commented on SPARK-8142: I thought it was only Spark deps, but I've now removed all HBase and Hadoop deps from my uber jar. Now when I run the job it cannot locate the relevant HBase client classes and deps without specifying each HBase client/server/protocol etc. jar using --driver-class-path. Is there some Spark env variable I can set to point to all jars under a folder, or will I have to add all 20+ HBase libs using the driver class path option? I know about SPARK_CLASSPATH, but I need a more elegant solution than referencing all the HBase and Hadoop jars myself. HADOOP_HOME, HADOOP_CLASSPATH and HADOOP_CONF_DIR are already set.
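One commonly used workaround (a sketch; /opt/hbase/lib is a hypothetical path) is to build the comma-separated --jars list from a shell glob rather than enumerating each HBase jar by hand; a JVM-style classpath wildcard also works for --driver-class-path:

# turn a directory of jars into the comma-separated list spark-submit expects
HBASE_JARS=$(echo /opt/hbase/lib/*.jar | tr ' ' ',')
spark-submit \
  --jars "$HBASE_JARS" \
  --driver-class-path '/opt/hbase/lib/*' \
  my-app.jar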
[jira] [Commented] (SPARK-8142) Spark Job Fails with ResultTask ClassCastException
[ https://issues.apache.org/jira/browse/SPARK-8142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14577262#comment-14577262 ] Dev Lakhani commented on SPARK-8142: HBase kept; Hadoop and Spark removed. Now I get: java.lang.ClassCastException: org.apache.hadoop.hbase.mapreduce.TableSplit cannot be cast to org.apache.hadoop.hbase.mapreduce.InputSplit at NewHadoopRDD.scala:115. This used to work when I had all the Spark and Hadoop dependencies added in the uber jar.
[jira] [Commented] (SPARK-8142) Spark Job Fails with ResultTask ClassCastException
[ https://issues.apache.org/jira/browse/SPARK-8142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14577265#comment-14577265 ] Dev Lakhani commented on SPARK-8142: To be specific: when I say removed, I mean marked as provided.
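For clarity, "marked as provided" means the Maven scope below: the dependency is compiled against but left out of the shaded jar, so the cluster's own copy is the only one on the classpath:

<!-- pom.xml (sketch): compile against Spark but do not package it -->
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-core_2.10</artifactId>
  <version>1.3.1</version>
  <scope>provided</scope>
</dependency>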
[jira] [Commented] (SPARK-8142) Spark Job Fails with ResultTask ClassCastException
[ https://issues.apache.org/jira/browse/SPARK-8142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14577108#comment-14577108 ] Dev Lakhani commented on SPARK-8142: I am not bundling Spark. Again, I have pre-built binaries downloaded from the Spark website, deployed on my cluster: version 1.3.1, Hadoop 2.6. I am using the Java/Maven client dependency for org.apache.spark spark-core_2.10 version 1.3.1 in my application, added as a Maven dependency. I package my application using the Maven Shade plugin; this builds me a jar with my application's code, Glassfish Jersey 2.7 deps and Spark 1.3.1 core deps. I then submit my job jar using spark-submit (from Spark 1.3.1), and the ClassCastException occurs if I have userClassPathFirst set to true as described above. If I don't, my REST client tries to do a GET operation using the Glassfish Jersey 2.7 API, but this conflicts with com.sun.jersey 1.9, which comes with Spark. I am using one version of Spark, 1.3.1-hadoop2.6, on my cluster; I have no other versions of Spark on that cluster. The RELEASE file states 1.3.1 (git revision 908a0bf) built for Hadoop 2.6.0. Confirmed on all nodes. I ran a Maven dependency tree on my application code and the only Spark version is 1.3.1 in all of the Maven dependencies that I use: Spark SQL 1.3.1, Spark Core 1.3.1, Spark Network Common 1.3.1. I've been using this version fine for all other non-REST-based operations and other Spark operations; it's only when I set userClassPathFirst that I get this error.
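An alternative to userClassPathFirst that avoids flipping the classloader order entirely is to relocate the conflicting Jersey packages inside the shaded jar. This is a sketch, not a verified configuration: Jersey 2 also discovers providers via META-INF/services, so a ServicesResourceTransformer (or similar) may be needed as well:

<!-- maven-shade-plugin (sketch): rename our Jersey so it cannot collide with Spark's -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <configuration>
    <relocations>
      <relocation>
        <pattern>org.glassfish.jersey</pattern>
        <shadedPattern>shaded.org.glassfish.jersey</shadedPattern>
      </relocation>
    </relocations>
  </configuration>
</plugin>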
[jira] [Commented] (SPARK-8143) Spark application history cannot be found even for finished jobs
[ https://issues.apache.org/jira/browse/SPARK-8143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14577130#comment-14577130 ] Dev Lakhani commented on SPARK-8143: I cannot try it against master because we are restricted to official releases only; we are not allowed external git access due to organisational constraints. If you expect users to continually build from master to verify the existence of bugs and JIRAs, perhaps you need to mandate that process in https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark#ContributingtoSpark-JIRA to be explicit, as different projects have different rules. In this case, I accept that I did not research this issue before posting it; if that is the case, close this as a duplicate, since I cannot verify against master.
[jira] [Commented] (SPARK-8142) Spark Job Fails with ResultTask ClassCastException
[ https://issues.apache.org/jira/browse/SPARK-8142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14577155#comment-14577155 ] Dev Lakhani commented on SPARK-8142: spark-core and spark-sql marked as provided; same error.
[jira] [Commented] (SPARK-8142) Spark Job Fails with ResultTask ClassCastException
[ https://issues.apache.org/jira/browse/SPARK-8142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14577371#comment-14577371 ] Dev Lakhani commented on SPARK-8142: Update: having resolved some dependency issues, the current state is this:

hadoop-common 2.6.0 - provided
hadoop-client 2.6.0 - provided
hadoop-hdfs 2.6.0 - provided
spark-sql_2.10 - provided
spark-core_2.10 - provided
hbase-client 1.1.0 - included/packaged
hbase-protocol 1.1.0 - included/packaged
hbase-server 1.1.0 - included/packaged

I run the job and run into this: https://issues.apache.org/jira/browse/SPARK-1867, which suggests a class is missing, but how do I find which one? There is no ClassNotFoundException, yet something might be missing; how can I find this out?
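One way to answer "which class is missing" when no ClassNotFoundException surfaces is to turn on JVM class-loading tracing on the executors and compare against the jars being shipped; the paths and class name below are hypothetical:

# log every class the executor JVM loads (appears in executor stdout)
spark-submit --conf "spark.executor.extraJavaOptions=-verbose:class" my-app.jar

# check which local jar (if any) actually contains a suspect class
for j in /opt/hbase/lib/*.jar; do
  unzip -l "$j" | grep -q 'TableSplit.class' && echo "$j"
done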
[jira] [Updated] (SPARK-8142) Spark Job Fails with ResultTask Class Cast Exception
[ https://issues.apache.org/jira/browse/SPARK-8142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dev Lakhani updated SPARK-8142: --- Summary: Spark Job Fails with ResultTask Class Cast Exception (was: Spark Job Fails with ResultTask Class Exception)
[jira] [Created] (SPARK-8142) Spark Job Fails with ResultTask Class Exception
Dev Lakhani created SPARK-8142: --- Summary: Spark Job Fails with ResultTask Class Exception Key: SPARK-8142 URL: https://issues.apache.org/jira/browse/SPARK-8142 Project: Spark Issue Type: Bug Affects Versions: 1.3.1 Reporter: Dev Lakhani

When running a Spark Job, I get no failures in the application code whatsoever, but a weird ResultTask class cast exception. In my job I create an RDD from HBase and for each partition do a REST call on an API, using a REST client. This has worked in IntelliJ, but when I deploy to a cluster using spark-submit.sh I get:

org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 3, host): java.lang.ClassCastException: org.apache.spark.scheduler.ResultTask cannot be cast to org.apache.spark.scheduler.Task
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:185)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)

These are the configs I set to override the Spark classpath, because I want to use my own Glassfish Jersey version:

sparkConf.set("spark.driver.userClassPathFirst", "true");
sparkConf.set("spark.executor.userClassPathFirst", "true");

I see no other warnings or errors in any of the logs. Unfortunately I cannot post my code, but please ask me questions that will help debug the issue. Using Spark 1.3, Hadoop 2.6.
[jira] [Created] (SPARK-8143) Spark application history cannot be found even for finished jobs
Dev Lakhani created SPARK-8143: --- Summary: Spark application history cannot be found even for finished jobs Key: SPARK-8143 URL: https://issues.apache.org/jira/browse/SPARK-8143 Project: Spark Issue Type: Bug Affects Versions: 1.3.1, 1.3.0 Reporter: Dev Lakhani

Whenever a job is killed or finished, because of an application error or otherwise, and I then click on Application Detail UI, even though the job state is FINISHED, I get no log results and the message states: Application history not found for (app-xyz-abc); Application ABC is still in progress. And no logs are presented. I'm using spark.eventLog.enabled=true and spark.eventLog.dir=/tmp/spark, under which I see lots of files app-2015xyz-abc.inprogress, even though the job has failed or finished.
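For reference, a minimal event-log / history-server setup matching the configuration described above (paths are illustrative); with this in place, finished applications are served by the history server, which listens on port 18080 by default:

# conf/spark-defaults.conf
spark.eventLog.enabled           true
spark.eventLog.dir               file:/tmp/spark
spark.history.fs.logDirectory    file:/tmp/spark

# then start the history server
./sbin/start-history-server.sh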
[jira] [Commented] (SPARK-8142) Spark Job Fails with ResultTask ClassCastException
[ https://issues.apache.org/jira/browse/SPARK-8142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14575958#comment-14575958 ] Dev Lakhani commented on SPARK-8142: I'm using pre-compiled binaries for Spark 1.3.1, Hadoop 2.6, and checked the Spark versions against my application. As I mention, I need to use my classpath, not Spark's, hence the setting. If I don't, Spark makes use of <jersey.version>1.9</jersey.version> from https://github.com/apache/spark/blob/master/yarn/pom.xml, which is not compatible with my application code. I know the userClassPathFirst settings are experimental, but they are required for my use case.
[jira] [Commented] (SPARK-8143) Spark application history cannot be found even for finished jobs
[ https://issues.apache.org/jira/browse/SPARK-8143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14575959#comment-14575959 ] Dev Lakhani commented on SPARK-8143: OK, will wait for the official 1.4 release and will confirm.
[jira] [Updated] (SPARK-8142) Spark Job Fails with ResultTask Class Cast Exception
[ https://issues.apache.org/jira/browse/SPARK-8142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dev Lakhani updated SPARK-8142: --- Description: corrected the Spark version from 1.3 to 1.3.1; otherwise unchanged.
[jira] [Updated] (SPARK-8142) Spark Job Fails with ResultTask ClassCastException
[ https://issues.apache.org/jira/browse/SPARK-8142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dev Lakhani updated SPARK-8142: --- Summary: Spark Job Fails with ResultTask ClassCastException (was: Spark Job Fails with ResultTask Class Cast Exception)
[jira] [Updated] (SPARK-8142) Spark Job Fails with ResultTask ClassCastException
[ https://issues.apache.org/jira/browse/SPARK-8142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dev Lakhani updated SPARK-8142: --- Description: reworded the opening paragraph ("In my job I run create an RDD" is now "In my job, I create a RDD"); otherwise unchanged.
[jira] [Commented] (SPARK-6846) Stage kill URL easy to accidentally trigger and possibility for security issue.
[ https://issues.apache.org/jira/browse/SPARK-6846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14543970#comment-14543970 ] Dev Lakhani commented on SPARK-6846: As a more complex solution, would it be possible to have a unique stage id and use that in the URL? http://localhost:4040/stages/kill/?id=0&stage-id=UNIQUE-STAGE-ID&terminate=true A simple auto-complete of a previous kill command in Chrome, followed by an Enter, can kill hours' worth of work. Or any other ideas?

Stage kill URL easy to accidentally trigger and possibility for security issue. --- Key: SPARK-6846 URL: https://issues.apache.org/jira/browse/SPARK-6846 Project: Spark Issue Type: Improvement Components: Web UI Affects Versions: 1.3.0 Reporter: Dev Lakhani Assignee: Sean Owen Priority: Minor Fix For: 1.4.0

On a similar note: when the kill link is cached in the browser bar, it's easy to accidentally kill a job just by pressing Enter. For example: you press the kill-stage button and get the prompt asking whether you want to kill the stage. You launch a new job and start typing http://localhost:4040/ and Chrome, for example, starts auto-completing with http://localhost:4040/stages/kill/?id=0&terminate=true. If you accidentally press Enter, it will kill the current stage without any prompts. I think it's also a bit of a security issue that from any host you can curl/wget/issue http://localhost:4040/stages/kill/?id=0&terminate=true and it will kill the current stage without prompting.
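To illustrate the suggestion, a small Scala sketch of a one-time token per stage (purely hypothetical; this is not how Spark's UI is implemented): a kill URL would only work if it carries the token issued when the page was rendered, so a replayed or auto-completed URL fails.

import java.util.UUID
import scala.collection.concurrent.TrieMap

object KillTokens {
  // stageId -> token issued when the kill link was rendered
  private val tokens = TrieMap.empty[Int, String]

  def issue(stageId: Int): String =
    tokens.getOrElseUpdate(stageId, UUID.randomUUID().toString)

  // a kill request like /stages/kill/?id=0&token=<uuid>&terminate=true
  // must present the matching token; it is consumed on success
  def validate(stageId: Int, token: String): Boolean =
    tokens.get(stageId).exists(_ == token) && { tokens.remove(stageId); true }
}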
[jira] [Commented] (SPARK-6846) Stage kill URL easy to accidentally trigger and possibility for security issue.
[ https://issues.apache.org/jira/browse/SPARK-6846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14494012#comment-14494012 ] Dev Lakhani commented on SPARK-6846: [~srowen] please go ahead, I won't have time for this this week.
[jira] [Updated] (SPARK-5273) Improve documentation examples for LinearRegression
[ https://issues.apache.org/jira/browse/SPARK-5273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dev Lakhani updated SPARK-5273: --- Affects Version/s: (was: 1.2.0)
[jira] [Updated] (SPARK-5273) Improve documentation examples for LinearRegression
[ https://issues.apache.org/jira/browse/SPARK-5273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dev Lakhani updated SPARK-5273: --- Affects Version/s: 1.2.0
[jira] [Created] (SPARK-5273) Improve documentation examples for LinearRegression
Dev Lakhani created SPARK-5273: --- Summary: Improve documentation examples for LinearRegression Key: SPARK-5273 URL: https://issues.apache.org/jira/browse/SPARK-5273 Project: Spark Issue Type: Improvement Components: Documentation Reporter: Dev Lakhani Priority: Minor

In the document https://spark.apache.org/docs/1.1.1/mllib-linear-methods.html, under "Linear least squares, Lasso, and ridge regression", the suggested method is LinearRegressionWithSGD.train():

// Building the model
val numIterations = 100
val model = LinearRegressionWithSGD.train(parsedData, numIterations)

This is not ideal even for simple examples such as y=x. It should be replaced with more real-world parameters, including a step size:

val lr = new LinearRegressionWithSGD()
lr.optimizer.setStepSize(0.0001)
lr.optimizer.setNumIterations(100)

or

LinearRegressionWithSGD.train(input, 100, 0.0001)

to produce a reasonable MSE. It took me a while, using the dev forum, to learn that the step size should be really small. This might save someone the same effort when learning MLlib.
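A fuller, self-contained version of the suggested example (a sketch assuming a Spark shell where sc is in scope; the data is a toy y=x set, and 100/0.0001 are the parameters recommended above):

import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.{LabeledPoint, LinearRegressionWithSGD}

// toy dataset: the label equals the single feature, i.e. y = x
val data = sc.parallelize((1 to 100).map(i => LabeledPoint(i.toDouble, Vectors.dense(i.toDouble))))

// train(input, numIterations, stepSize): the small step size is the point of this issue
val model = LinearRegressionWithSGD.train(data, 100, 0.0001)

// mean squared error over the training set
val mse = data.map { p =>
  val err = model.predict(p.features) - p.label
  err * err
}.mean()
println(s"MSE = $mse")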
[jira] [Commented] (SPARK-576) Design and develop a more precise progress estimator
[ https://issues.apache.org/jira/browse/SPARK-576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14175023#comment-14175023 ] Dev Lakhani commented on SPARK-576: --- I've created a PR for this: https://github.com/apache/spark/pull/2837/

Design and develop a more precise progress estimator --- Key: SPARK-576 URL: https://issues.apache.org/jira/browse/SPARK-576 Project: Spark Issue Type: Improvement Reporter: Mosharaf Chowdhury

In addition to task_completed/total_tasks, we need to have something that says estimated_time_remaining.
[jira] [Commented] (SPARK-2321) Design a proper progress reporting event listener API
[ https://issues.apache.org/jira/browse/SPARK-2321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14173619#comment-14173619 ] Dev Lakhani commented on SPARK-2321: There are some issues and bugs under the webui component that are active. Should we incorporate these into this JIRA, or is it best to work on them separately and then merge these (2321) changes later? https://issues.apache.org/jira/browse/SPARK/component/12322616

Design a proper progress reporting event listener API --- Key: SPARK-2321 URL: https://issues.apache.org/jira/browse/SPARK-2321 Project: Spark Issue Type: Improvement Components: Java API, Spark Core Affects Versions: 1.0.0 Reporter: Reynold Xin Assignee: Josh Rosen Priority: Critical

This is a ticket to track progress on redesigning the SparkListener and JobProgressListener API. There are multiple problems with the current design, including:
0. I'm not sure if the API is usable in Java (there are at least some enums we used in Scala and a bunch of case classes that might complicate things).
1. The whole API is marked as DeveloperApi, because we haven't paid a lot of attention to it yet. Something as important as progress reporting deserves a more stable API.
2. There is no easy way to connect jobs with stages. Similarly, there is no easy way to connect job groups with jobs / stages.
3. JobProgressListener itself has no encapsulation at all. States can be arbitrarily mutated by external programs. Variable names are sort of randomly decided and inconsistent.
We should just revisit these and propose a new, concrete design.
[jira] [Commented] (SPARK-3957) Broadcast variable memory usage not reflected in UI
[ https://issues.apache.org/jira/browse/SPARK-3957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14173926#comment-14173926 ] Dev Lakhani commented on SPARK-3957: Here are my thoughts on a possible approach. Hi All. The broadcast occurs from the SparkContext to the BroadcastManager and its newBroadcast method. In the first instance, the broadcast data is stored in the BlockManager of the executor (see HttpBroadcast). Any tracking of broadcast variables must be referenced by the BlockManagerSlaveActor and BlockManagerMasterActor; in particular, UpdateBlockInfo and RemoveBroadcast should update the total memory used by blocks when blocks are added and removed. These can then be hooked up to the UI using a new page like ExecutorsPage and by defining new methods in the relevant listener, such as StorageStatusListener. These are my initial thoughts as someone new to these components; any other ideas or approaches?

Broadcast variable memory usage not reflected in UI --- Key: SPARK-3957 URL: https://issues.apache.org/jira/browse/SPARK-3957 Project: Spark Issue Type: Bug Components: Block Manager, Web UI Affects Versions: 1.0.2, 1.1.0 Reporter: Shivaram Venkataraman Assignee: Nan Zhu

Memory used by broadcast variables is not reflected in the memory usage reported in the Web UI. For example, the executors tab shows memory used in each executor, but this number doesn't include memory used by broadcast variables. Similarly, the storage tab only shows the list of cached RDDs and how much memory they use. We should add a separate column / tab for broadcast variables to make it easier to debug.
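As a starting point for the listener side, a rough Scala sketch against the 1.x storage API (StorageStatus and BroadcastBlockId are DeveloperApi types, so treat this as an assumption-laden illustration rather than a final design):

import org.apache.spark.storage.{BroadcastBlockId, StorageStatus}

// sum the in-memory bytes held by broadcast blocks across all executors,
// e.g. over the storageStatusList of a StorageStatusListener
def broadcastMemoryUsed(statuses: Seq[StorageStatus]): Long =
  statuses.flatMap(_.blocks).collect {
    case (BroadcastBlockId(_, _), status) => status.memSize
  }.sum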
[jira] [Commented] (SPARK-3957) Broadcast variable memory usage not reflected in UI
[ https://issues.apache.org/jira/browse/SPARK-3957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14174107#comment-14174107 ] Dev Lakhani commented on SPARK-3957: Hi. For now I am happy for [~CodingCat] to take this on; maybe once there are some commits I can help with the UI side, but for now I'll hold back.
[jira] [Commented] (SPARK-3644) REST API for Spark application info (jobs / stages / tasks / storage info)
[ https://issues.apache.org/jira/browse/SPARK-3644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14165293#comment-14165293 ] Dev Lakhani commented on SPARK-3644: Hi, I am doing some work on the REST/JSON aspects and will be happy to take this on. Can someone assign it to me and/or help me get started? We need to first draft out the various endpoints and document them somewhere. Thanks, Dev

REST API for Spark application info (jobs / stages / tasks / storage info) --- Key: SPARK-3644 URL: https://issues.apache.org/jira/browse/SPARK-3644 Project: Spark Issue Type: Bug Components: Spark Core, Web UI Reporter: Josh Rosen

This JIRA is a forum to draft a design proposal for a REST interface for accessing information about Spark applications, such as job / stage / task / storage status. There have been a number of proposals to serve JSON representations of the information displayed in Spark's web UI. Given that we might redesign the pages of the web UI (and possibly re-implement the UI as a client of a REST API), the API endpoints and their responses should be independent of what we choose to display on particular web UI pages / layouts. Let's start a discussion of what a good REST API would look like from first principles. We can discuss which URLs / endpoints expose access to data, how our JSON responses will be formatted, how fields will be named, how the API will be documented and tested, etc. Some links for inspiration:
https://developer.github.com/v3/
http://developer.netflix.com/docs/REST_API_Reference
https://helloreverb.com/developers/swagger
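As a conversation starter, one hypothetical shape such an API could take (the endpoint paths and field names below are invented for illustration, not something the project has agreed on):

GET /api/applications                      -> list of applications (id, name, start/end time)
GET /api/applications/{appId}/jobs         -> jobs with status and associated stage ids
GET /api/applications/{appId}/stages/{id}  -> stage detail, including task summaries
GET /api/applications/{appId}/storage      -> persisted RDD / memory usage info

// example job payload (illustrative field names only)
{ "jobId": 0, "status": "RUNNING", "numTasks": 200, "numCompletedTasks": 120 }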