[jira] [Closed] (SPARK-10798) JsonMappingException with Spark Context Parallelize
[ https://issues.apache.org/jira/browse/SPARK-10798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dev Lakhani closed SPARK-10798.
-------------------------------
    Resolution: Cannot Reproduce

> JsonMappingException with Spark Context Parallelize
> ---------------------------------------------------
>
>                 Key: SPARK-10798
>                 URL: https://issues.apache.org/jira/browse/SPARK-10798
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 1.5.0
>         Environment: Linux, Java 1.8.45
>            Reporter: Dev Lakhani
>
> When trying to create an RDD of Rows using a Java Spark Context, if I serialize the rows with Kryo first, the SparkContext fails.
>
> byte[] data = Kryo.serialize(List<Row>)
> List<Row> fromKryoRows = Kryo.unserialize(data)
>
> List<Row> rows = new Vector<Row>(); // using a new set of data
> rows.add(RowFactory.create("test"));
>
> javaSparkContext.parallelize(rows);
> OR
> javaSparkContext.parallelize(fromKryoRows); // using deserialized rows
>
> I get:
>
> com.fasterxml.jackson.databind.JsonMappingException: (None,None) (of class scala.Tuple2) (through reference chain: org.apache.spark.rdd.RDDOperationScope["parent"])
>   at com.fasterxml.jackson.databind.JsonMappingException.wrapWithPath(JsonMappingException.java:210)
>   at com.fasterxml.jackson.databind.JsonMappingException.wrapWithPath(JsonMappingException.java:177)
>   at com.fasterxml.jackson.databind.ser.std.StdSerializer.wrapAndThrow(StdSerializer.java:187)
>   at com.fasterxml.jackson.databind.ser.std.BeanSerializerBase.serializeFields(BeanSerializerBase.java:647)
>   at com.fasterxml.jackson.databind.ser.BeanSerializer.serialize(BeanSerializer.java:152)
>   at com.fasterxml.jackson.databind.ser.DefaultSerializerProvider.serializeValue(DefaultSerializerProvider.java:128)
>   at com.fasterxml.jackson.databind.ObjectMapper._configAndWriteValue(ObjectMapper.java:2881)
>   at com.fasterxml.jackson.databind.ObjectMapper.writeValueAsString(ObjectMapper.java:2338)
>   at org.apache.spark.rdd.RDDOperationScope.toJson(RDDOperationScope.scala:50)
>   at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:141)
>   at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:108)
>   at org.apache.spark.SparkContext.withScope(SparkContext.scala:700)
>   at org.apache.spark.SparkContext.parallelize(SparkContext.scala:714)
>   at org.apache.spark.api.java.JavaSparkContext.parallelize(JavaSparkContext.scala:145)
>   at org.apache.spark.api.java.JavaSparkContext.parallelize(JavaSparkContext.scala:157)
>   ...
> Caused by: scala.MatchError: (None,None) (of class scala.Tuple2)
>   at com.fasterxml.jackson.module.scala.ser.OptionSerializer$$anonfun$serialize$1.apply$mcV$sp(OptionSerializerModule.scala:32)
>   at com.fasterxml.jackson.module.scala.ser.OptionSerializer$$anonfun$serialize$1.apply(OptionSerializerModule.scala:32)
>   at com.fasterxml.jackson.module.scala.ser.OptionSerializer$$anonfun$serialize$1.apply(OptionSerializerModule.scala:32)
>   at scala.Option.getOrElse(Option.scala:120)
>   at com.fasterxml.jackson.module.scala.ser.OptionSerializer.serialize(OptionSerializerModule.scala:31)
>   at com.fasterxml.jackson.module.scala.ser.OptionSerializer.serialize(OptionSerializerModule.scala:22)
>   at com.fasterxml.jackson.databind.ser.BeanPropertyWriter.serializeAsField(BeanPropertyWriter.java:505)
>   at com.fasterxml.jackson.module.scala.ser.OptionPropertyWriter.serializeAsField(OptionSerializerModule.scala:128)
>   at com.fasterxml.jackson.databind.ser.std.BeanSerializerBase.serializeFields(BeanSerializerBase.java:639)
>   ... 19 more
>
> I've tried updating jackson-module-scala to 2.6.1 but see the same issue. This happens in local mode with Java 1.8_45. I searched the web and this JIRA for similar issues but found nothing of interest.
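A self-contained Java sketch of the reproduction described above, with the report's Kryo.serialize / Kryo.unserialize shorthand expanded into plain Kryo calls; the class name, the local master URL, and the stream handling are illustrative assumptions, not part of the original report:

    import com.esotericsoftware.kryo.Kryo;
    import com.esotericsoftware.kryo.io.Input;
    import com.esotericsoftware.kryo.io.Output;
    import org.apache.spark.api.java.JavaSparkContext;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.RowFactory;

    import java.io.ByteArrayInputStream;
    import java.io.ByteArrayOutputStream;
    import java.util.List;
    import java.util.Vector;

    public class ParallelizeRepro {
        public static void main(String[] args) {
            JavaSparkContext jsc = new JavaSparkContext("local[*]", "SPARK-10798-repro");

            // Build a row list and round-trip it through Kryo, as the report describes.
            List<Row> rows = new Vector<Row>();
            rows.add(RowFactory.create("test"));

            Kryo kryo = new Kryo();
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            Output output = new Output(buffer);
            kryo.writeClassAndObject(output, rows);
            output.close();

            Input input = new Input(new ByteArrayInputStream(buffer.toByteArray()));
            @SuppressWarnings("unchecked")
            List<Row> fromKryoRows = (List<Row>) kryo.readClassAndObject(input);
            input.close();

            // Either call reportedly failed inside RDDOperationScope.toJson with
            // the JsonMappingException shown in the stack trace above.
            jsc.parallelize(rows);
            jsc.parallelize(fromKryoRows);

            jsc.stop();
        }
    }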
[jira] [Commented] (SPARK-10798) JsonMappingException with Spark Context Parallelize
[ https://issues.apache.org/jira/browse/SPARK-10798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15055130#comment-15055130 ]

Dev Lakhani commented on SPARK-10798:
--------------------------------------

byte[] data = Kryo.serialize(List<Row>) is just shorthand for new Kryo().serialize(). I think this was a classpath issue; I was not able to reproduce it, but if it reappears I will reopen the ticket.
[jira] [Commented] (SPARK-10798) JsonMappingException with Spark Context Parallelize
[ https://issues.apache.org/jira/browse/SPARK-10798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14942651#comment-14942651 ]

Dev Lakhani commented on SPARK-10798:
--------------------------------------

Hi Miao, I will create a GitHub project/fork for this to give you the full sample soon. Thanks, Dev
[jira] [Updated] (SPARK-10798) JsonMappingException with Spark Context Parallelize
[ https://issues.apache.org/jira/browse/SPARK-10798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dev Lakhani updated SPARK-10798:
--------------------------------
    Environment: Linux, Java 1.8.45  (was: Linux, Java 1.8.40)
[jira] [Updated] (SPARK-10798) JsonMappingException with Spark Context Parallelize
[ https://issues.apache.org/jira/browse/SPARK-10798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dev Lakhani updated SPARK-10798:
--------------------------------
    Description:

When trying to create an RDD of Rows using a Java Spark Context, if I serialize the rows with Kryo first, the SparkContext fails.

byte[] data = Kryo.serialize(List<Row>)
List<Row> fromKryoRows = Kryo.unserialize(data)

List<Row> rows = new Vector<Row>(); // using a new set of data
rows.add(RowFactory.create("test"));

javaSparkContext.parallelize(rows);
OR
javaSparkContext.parallelize(fromKryoRows); // using deserialized rows

I get:

com.fasterxml.jackson.databind.JsonMappingException: (None,None) (of class scala.Tuple2) (through reference chain: org.apache.spark.rdd.RDDOperationScope["parent"])
  ...
Caused by: scala.MatchError: (None,None) (of class scala.Tuple2)
  ... 19 more

I've tried updating jackson-module-scala to 2.6.1 but see the same issue. This happens in local mode with Java 1.8_45. I searched the web and this JIRA for similar issues but found nothing of interest.
was:

When trying to create an RDD of Rows using a Java Spark Context:

List<Row> rows = new Vector<Row>();
rows.add(RowFactory.create("test"));
javaSparkContext.parallelize(rows);

I get:

com.fasterxml.jackson.databind.JsonMappingException: (None,None) (of class scala.Tuple2) (through reference chain: org.apache.spark.rdd.RDDOperationScope["parent"])
  ...
[jira] [Updated] (SPARK-10798) JsonMappingException with Spark Context Parallelize
[ https://issues.apache.org/jira/browse/SPARK-10798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dev Lakhani updated SPARK-10798:
--------------------------------
    Description:

When trying to create an RDD of Rows using a Java Spark Context:

List<Row> rows = new Vector<Row>();
rows.add(RowFactory.create("test"));
javaSparkContext.parallelize(rows);

I get:

com.fasterxml.jackson.databind.JsonMappingException: (None,None) (of class scala.Tuple2) (through reference chain: org.apache.spark.rdd.RDDOperationScope["parent"])
  ...
Caused by: scala.MatchError: (None,None) (of class scala.Tuple2)
  ... 19 more

I've tried updating jackson-module-scala to 2.6.1 but see the same issue. This happens in local mode with Java 1.8_40. I searched the web and this JIRA for similar issues but found nothing of interest.
was:

When trying to create an RDD of Rows using a Java Spark Context:

List<Row> rows = new Vector<Row>();
rows.add(RowFactory.create("test"));
javaSparkContext.parallelize(rows);

I get:

com.fasterxml.jackson.databind.JsonMappingException: (None,None) (of class scala.Tuple2) (through reference chain: org.apache.spark.rdd.RDDOperationScope["parent"])
  ...
[jira] [Created] (SPARK-10798) JsonMappingException with Spark Context Parallelize
Dev Lakhani created SPARK-10798:
-----------------------------------

             Summary: JsonMappingException with Spark Context Parallelize
                 Key: SPARK-10798
                 URL: https://issues.apache.org/jira/browse/SPARK-10798
             Project: Spark
          Issue Type: Bug
          Components: Spark Core
    Affects Versions: 1.5.0
         Environment: Linux, Java 1.8.40
            Reporter: Dev Lakhani

When trying to create an RDD of Rows using a Java Spark Context:

List<Row> rows = new Vector<Row>();
rows.add(RowFactory.create("test"));
javaSparkContext.parallelize(rows);

I get:

com.fasterxml.jackson.databind.JsonMappingException: (None,None) (of class scala.Tuple2) (through reference chain: org.apache.spark.rdd.RDDOperationScope["parent"])
  ...
Caused by: scala.MatchError: (None,None) (of class scala.Tuple2)
  ... 19 more

I've tried updating jackson-module-scala to 2.6.1 but see the same issue. This happens in local mode with Java 1.8_40.
[jira] [Updated] (SPARK-10700) Spark R Documentation not available
[ https://issues.apache.org/jira/browse/SPARK-10700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dev Lakhani updated SPARK-10700:
--------------------------------
    Description:

The documentation at https://spark.apache.org/docs/latest/api/R/glm.html, referred to in https://spark.apache.org/docs/latest/sparkr.html, is not available. I searched this JIRA site for sparkr.html and SparkR Documentation and do not think anyone else has raised this.

was:

The documentation at https://spark.apache.org/docs/latest/sparkr.html is not available. I searched this JIRA site for sparkr.html and SparkR Documentation and do not think anyone else has raised this.
[jira] [Created] (SPARK-10700) Spark R Documentation not available
Dev Lakhani created SPARK-10700:
-----------------------------------

             Summary: Spark R Documentation not available
                 Key: SPARK-10700
                 URL: https://issues.apache.org/jira/browse/SPARK-10700
             Project: Spark
          Issue Type: Bug
          Components: Documentation
    Affects Versions: 1.5.0
            Reporter: Dev Lakhani
            Priority: Minor

The documentation at https://spark.apache.org/docs/latest/sparkr.html is not available. I searched this JIRA site for sparkr.html and SparkR Documentation and do not think anyone else has raised this.
[jira] [Commented] (SPARK-1867) Spark Documentation Error causes java.lang.IllegalStateException: unread block data
[ https://issues.apache.org/jira/browse/SPARK-1867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14602669#comment-14602669 ]

Dev Lakhani commented on SPARK-1867:
-------------------------------------

[~marcreichman], [~meiyoula], [~srowen], [~sam]: at a minimum, isn't it worth upvoting the JDK bug https://bugs.openjdk.java.net/browse/JDK-7172206, as this seems to be part of the problem?

Spark Documentation Error causes java.lang.IllegalStateException: unread block data
-----------------------------------------------------------------------------------

                Key: SPARK-1867
                URL: https://issues.apache.org/jira/browse/SPARK-1867
            Project: Spark
         Issue Type: Bug
         Components: Spark Core
           Reporter: sam

I've employed two System Administrators on a contract basis (for quite a bit of money), and both contractors have independently hit the following exception. What we are doing is:

1. Installing Spark 0.9.1 according to the documentation on the website, along with CDH4 (and another cluster with CDH5) distros of hadoop/hdfs.
2. Building a fat jar with a Spark app with sbt, then trying to run it on the cluster.

I've also included code snippets and sbt deps at the bottom. When I've Googled this, there seem to be two somewhat vague responses:

a) Mismatching spark versions on nodes/user code
b) Need to add more jars to the SparkConf

Now I know that (b) is not the problem, having successfully run the same code on other clusters while only including one jar (it's a fat jar). But I have no idea how to check for (a) - it appears Spark doesn't have any version checks or anything - it would be nice if it checked versions and threw a mismatching-version exception: "you have user code using version X and node Y has version Z". I would be very grateful for advice on this.

The exception:

Exception in thread "main" org.apache.spark.SparkException: Job aborted: Task 0.0:1 failed 32 times (most recent failure: Exception failure: java.lang.IllegalStateException: unread block data)
  at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$abortStage$1.apply(DAGScheduler.scala:1020)
  at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$abortStage$1.apply(DAGScheduler.scala:1018)
  at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
  at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
  at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$abortStage(DAGScheduler.scala:1018)
  at org.apache.spark.scheduler.DAGScheduler$$anonfun$processEvent$10.apply(DAGScheduler.scala:604)
  at org.apache.spark.scheduler.DAGScheduler$$anonfun$processEvent$10.apply(DAGScheduler.scala:604)
  at scala.Option.foreach(Option.scala:236)
  at org.apache.spark.scheduler.DAGScheduler.processEvent(DAGScheduler.scala:604)
  at org.apache.spark.scheduler.DAGScheduler$$anonfun$start$1$$anon$2$$anonfun$receive$1.applyOrElse(DAGScheduler.scala:190)
  at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
  at akka.actor.ActorCell.invoke(ActorCell.scala:456)
  at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
  at akka.dispatch.Mailbox.run(Mailbox.scala:219)
  at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
  at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
  at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
  at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
  at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
14/05/16 18:05:31 INFO scheduler.TaskSetManager: Loss was due to java.lang.IllegalStateException: unread block data [duplicate 59]

My code snippet:

val conf = new SparkConf()
  .setMaster(clusterMaster)
  .setAppName(appName)
  .setSparkHome(sparkHome)
  .setJars(SparkContext.jarOfClass(this.getClass))
println("count = " + new SparkContext(conf).textFile(someHdfsPath).count())

My SBT dependencies:

// relevant
"org.apache.spark" % "spark-core_2.10" % "0.9.1",
"org.apache.hadoop" % "hadoop-client" % "2.3.0-mr1-cdh5.0.0",
// standard, probably unrelated
"com.github.seratch" %% "awscala" % "[0.2,)",
"org.scalacheck" %% "scalacheck" % "1.10.1" % "test",
"org.specs2" %% "specs2" % "1.14" % "test",
"org.scala-lang" % "scala-reflect" % "2.10.3",
"org.scalaz" %% "scalaz-core" % "7.0.5",
"net.minidev" % "json-smart" % "1.2"
[jira] [Created] (SPARK-8395) spark-submit documentation is incorrect
Dev Lakhani created SPARK-8395:
----------------------------------

             Summary: spark-submit documentation is incorrect
                 Key: SPARK-8395
                 URL: https://issues.apache.org/jira/browse/SPARK-8395
             Project: Spark
          Issue Type: Improvement
          Components: Documentation
    Affects Versions: 1.4.0
            Reporter: Dev Lakhani
            Priority: Minor

Using a fresh checkout of 1.4.0-bin-hadoop2.6, if you run:

./start-slave.sh 1 spark://localhost:7077

you get:

failed to launch org.apache.spark.deploy.worker.Worker:
  Default is conf/spark-defaults.conf.
15/06/16 13:11:08 INFO Utils: Shutdown hook called

It seems the worker number is not being accepted as described here: https://spark.apache.org/docs/latest/spark-standalone.html

The documentation says:

./sbin/start-slave.sh <worker#> <master-spark-URL>

but the start-slave.sh script states:

usage="Usage: start-slave.sh <spark-master-URL>"

where <spark-master-URL> is like spark://localhost:7077. I have checked for similar issues using: https://issues.apache.org/jira/browse/SPARK-6552?jql=text%20~%20%22start-slave%22 and found nothing similar, so am raising this as an issue.
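A side-by-side sketch of the two invocations at issue (the master URL follows the report's example; which form a given release accepts depends on the version of the script, per the report):

    # Form given in the standalone-mode docs (worker number as first argument):
    ./sbin/start-slave.sh 1 spark://localhost:7077

    # Form the 1.4.0 start-slave.sh script itself documents (master URL only):
    ./sbin/start-slave.sh spark://localhost:7077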
[jira] [Resolved] (SPARK-8143) Spark application history cannot be found even for finished jobs
[ https://issues.apache.org/jira/browse/SPARK-8143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dev Lakhani resolved SPARK-8143.
--------------------------------
       Resolution: Fixed
    Fix Version/s: 1.4.0

Verified: history for killed jobs is now available in the web UI.

Spark application history cannot be found even for finished jobs
----------------------------------------------------------------

                Key: SPARK-8143
                URL: https://issues.apache.org/jira/browse/SPARK-8143
            Project: Spark
         Issue Type: Bug
         Components: Spark Core
   Affects Versions: 1.3.0, 1.3.1
           Reporter: Dev Lakhani
            Fix For: 1.4.0

Whenever a job is killed or finished, because of an application error or otherwise, and I then click on "Application Detail UI", even though the job state is FINISHED, I get no log results and the message states:

Application history not found for (app-xyz-abc)
Application ABC is still in progress.

And no logs are presented. I'm using spark.eventLog.enabled=true and spark.eventLog.dir=/tmp/spark, under which I see lots of files named app-2015xyz-abc.inprogress even though the job has failed or finished.
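A minimal sketch of the event-log configuration the report describes, as it would appear in conf/spark-defaults.conf; the /tmp/spark directory comes from the report, while pointing a history server at the same directory is an added assumption:

    # conf/spark-defaults.conf
    spark.eventLog.enabled   true
    spark.eventLog.dir       file:///tmp/spark

    # conf/spark-env.sh (assumption: a history server reading the same directory)
    export SPARK_HISTORY_OPTS="-Dspark.history.fs.logDirectory=file:///tmp/spark"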
[jira] [Commented] (SPARK-8142) Spark Job Fails with ResultTask ClassCastException
[ https://issues.apache.org/jira/browse/SPARK-8142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14580689#comment-14580689 ]

Dev Lakhani commented on SPARK-8142:
-------------------------------------

To clarify, [~srowen]:

1) I meant the other way around: if we choose to use Apache Spark, which provides Apache Hadoop libs, and we then choose a Cloudera Hadoop distribution on (the rest of) our cluster and use Cloudera Hadoop clients in the application code, Spark will provide Apache Hadoop libs whereas our cluster will be cdh5. Is there any issue in doing this? We choose to use Apache Spark because CDH is a version behind the official Spark release and we don't want to wait for, say, DataFrames support.

2) If I mark my spark-core as provided right now, my code compiles, but when I run my application in my IDE using Spark local I get:

java.lang.NoClassDefFoundError: org/apache/spark/api/java/function/Function

This is why I am suggesting we may need Maven profiles, one for local testing and one for deployment (a sketch follows this message).

So, getting back to the issue raised in this JIRA, which we seem to be ignoring: even when Hadoop and Spark are provided and the HBase client/protocol/server jars are packaged, we run into SPARK-1867, which at its latest comment suggests a dependency is missing, and this results in the obscure exception. Whether this is on the Hadoop side or the Spark side is not known, but as that JIRA suggests, it was caused by a missing dependency. I cannot see this missing class/dependency exception anywhere in the Spark logs. This suggests that if anyone using Spark sets any of the userClassPath* options and misses a primary, secondary or tertiary dependency, they will encounter SPARK-1867. Therefore we are stuck; any suggestions are welcome to overcome this. Either ChildFirstURLClassLoader needs to ignore the Spark and Hadoop libs, or Spark needs to log what's causing SPARK-1867.

Spark Job Fails with ResultTask ClassCastException
--------------------------------------------------

                Key: SPARK-8142
                URL: https://issues.apache.org/jira/browse/SPARK-8142
            Project: Spark
         Issue Type: Bug
         Components: Spark Core
   Affects Versions: 1.3.1
           Reporter: Dev Lakhani

When running a Spark job, I get no failures in the application code whatsoever, but a weird ResultTask class exception. In my job, I create an RDD from HBase and for each partition do a REST call on an API, using a REST client. This has worked in IntelliJ, but when I deploy to a cluster using spark-submit.sh I get:

org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 3, host): java.lang.ClassCastException: org.apache.spark.scheduler.ResultTask cannot be cast to org.apache.spark.scheduler.Task
  at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:185)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
  at java.lang.Thread.run(Thread.java:745)

These are the configs I set to override the Spark classpath, because I want to use my own Glassfish Jersey version:

sparkConf.set("spark.driver.userClassPathFirst", "true");
sparkConf.set("spark.executor.userClassPathFirst", "true");

I see no other warnings or errors in any of the logs. Unfortunately I cannot post my code, but please ask me questions that will help debug the issue. Using Spark 1.3.1, Hadoop 2.6.
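A hedged sketch of the two-profile Maven arrangement mentioned in point 2 above, with spark-core provided for cluster deployment but compile-scoped for local IDE runs; the profile names and version are illustrative assumptions:

    <!-- pom.xml (fragment) -->
    <profiles>
      <profile>
        <id>local</id>
        <properties>
          <spark.scope>compile</spark.scope>
        </properties>
      </profile>
      <profile>
        <id>deploy</id>
        <activation><activeByDefault>true</activeByDefault></activation>
        <properties>
          <spark.scope>provided</spark.scope>
        </properties>
      </profile>
    </profiles>

    <dependencies>
      <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_2.10</artifactId>
        <version>1.3.1</version>
        <!-- scope resolved per active profile -->
        <scope>${spark.scope}</scope>
      </dependency>
    </dependencies>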
[jira] [Commented] (SPARK-8142) Spark Job Fails with ResultTask ClassCastException
[ https://issues.apache.org/jira/browse/SPARK-8142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14580730#comment-14580730 ]

Dev Lakhani commented on SPARK-8142:
-------------------------------------

Hi [~vanzin]

bq. if you want to use the glassfish jersey version, you shouldn't need to do this, right? Spark depends on the old one that is under com.sun.*, IIRC.

Yes, I need to make use of Glassfish Jersey 2.x in my application, not the com.sun.* one provided, but this could apply to any other dependency that needs to supersede Spark's.

bq. marking all dependencies (including hbase) as provided and using spark.{driver,executor}.extraClassPath might be the easiest way out if you really need to use userClassPathFirst.

This is an option, but it might be a challenge to scale if we have different folder layouts for the extraClassPath on different clusters/nodes for the HBase and Hadoop installs. This can be (and usually is) the case when new servers are added to existing ones, for example. If one node has /disk4/path/to/hbase/libs and another has /disk3/another/path/to/hbase/libs and so on, then the extraClassPath will need to include both of these and will grow significantly, and the spark-submit args along with it. Also, whenever we update HBase we have to change this classpath each time.

Maybe the ideal way is to have, as you suggest, a blacklist which would contain the Spark and Hadoop libs. Then we could put whatever we wanted into one uber/fat jar, and it wouldn't matter where HBase and Hadoop are installed or what's provided and compiled; we let Spark work it out.

These are just my thoughts; I'm sure others will have different preferences and/or better approaches. Thanks anyway for your input on this JIRA.
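A hedged sketch of the provided-plus-extraClassPath arrangement suggested above; the /opt/hbase/lib path, main class, and jar name are illustrative assumptions, and the directory wildcard relies on the JVM's classpath wildcard expansion:

    spark-submit \
      --class com.example.MyJob \
      --conf spark.driver.extraClassPath='/opt/hbase/lib/*' \
      --conf spark.executor.extraClassPath='/opt/hbase/lib/*' \
      --conf spark.driver.userClassPathFirst=true \
      --conf spark.executor.userClassPathFirst=true \
      my-job-without-hbase-and-hadoop-deps.jar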
[jira] [Commented] (SPARK-1867) Spark Documentation Error causes java.lang.IllegalStateException: unread block data
[ https://issues.apache.org/jira/browse/SPARK-1867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14581130#comment-14581130 ]

Dev Lakhani commented on SPARK-1867:
-------------------------------------

Worth a new JIRA to suggest this?
[jira] [Commented] (SPARK-8142) Spark Job Fails with ResultTask ClassCastException
[ https://issues.apache.org/jira/browse/SPARK-8142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14580632#comment-14580632 ]

Dev Lakhani commented on SPARK-8142:
-------------------------------------

[~vanzin] [~srowen] Since this has been verified independently, there appears to be a limitation in the ChildFirstURLClassLoader class which may be causing this issue. The approach of marking the Spark/Hadoop deps as provided may not be ideal because:

1) it requires a Maven profile for compilation/testing and another for deployment;
2) if we run into SPARK-1867, there is no easy way to spot missing dependencies;
3) if we are using cdh* versions of Hadoop (client/server), then Spark's provided Hadoop versions will differ from the CDH client being used.
[jira] [Commented] (SPARK-1867) Spark Documentation Error causes java.lang.IllegalStateException: unread block data
[ https://issues.apache.org/jira/browse/SPARK-1867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14578745#comment-14578745 ]

Dev Lakhani commented on SPARK-1867:
-------------------------------------

Although this has been marked as not an issue, I agree with [~marcreichman]: it is a very misleading error, and there's often no way to figure out which classes are missing. There should be an explicit ClassNotFoundException or some other check or warning. Whenever dependencies are missing, it needs to be actionable.
[jira] [Commented] (SPARK-8142) Spark Job Fails with ResultTask ClassCastException
[ https://issues.apache.org/jira/browse/SPARK-8142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14578916#comment-14578916 ]

Dev Lakhani commented on SPARK-8142:
-------------------------------------

Any suggestion on this? To summarise: with Spark and Hadoop marked as provided, we are now running into https://issues.apache.org/jira/browse/SPARK-1867 - some missing dependency is causing this, and there is no indication what it is. This is becoming a blocker for our organisation.
[jira] [Commented] (SPARK-8142) Spark Job Fails with ResultTask ClassCastException
[ https://issues.apache.org/jira/browse/SPARK-8142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14576816#comment-14576816 ] Dev Lakhani commented on SPARK-8142: I can also confirm that without userClassPathFirst the job runs, but it fails at a different point, where my Jersey version clashes with my application code. So it seems to be an issue with the userClassPathFirst setting: when it is set, I get the ClassCastException.
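For completeness, these settings can also be passed at submit time instead of via SparkConf in code; spark.driver.userClassPathFirst and spark.executor.userClassPathFirst are real configuration keys in Spark 1.3+, while the class and jar names below are hypothetical:

spark-submit \
  --class com.example.MyJob \
  --conf spark.driver.userClassPathFirst=true \
  --conf spark.executor.userClassPathFirst=true \
  my-uber.jar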
[jira] [Commented] (SPARK-8142) Spark Job Fails with ResultTask ClassCastException
[ https://issues.apache.org/jira/browse/SPARK-8142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14577230#comment-14577230 ] Dev Lakhani commented on SPARK-8142: I thought it was only Spark deps, but I've now removed all HBase and Hadoop deps from my uber jar. Now when I run the job it cannot locate the relevant HBase client classes and deps without specifying each HBase client/server/protocol etc. jar using --driver-class-path. Is there some Spark env variable I can set to point to all jars under a folder, or will I have to add all 20+ HBase libs using the driver class path option? I know about SPARK_CLASSPATH, but I need a more elegant solution than referencing all the HBase and Hadoop jars myself. HADOOP_HOME, HADOOP_CLASSPATH and HADOOP_CONF_DIR are already set.
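One commonly used workaround (a sketch; /opt/hbase/lib is a hypothetical path) is to build the comma-separated --jars list from a shell glob rather than enumerating each HBase jar by hand; a JVM-style classpath wildcard also works for --driver-class-path:

# turn a directory of jars into the comma-separated list spark-submit expects
HBASE_JARS=$(echo /opt/hbase/lib/*.jar | tr ' ' ',')
spark-submit \
  --jars "$HBASE_JARS" \
  --driver-class-path '/opt/hbase/lib/*' \
  my-app.jar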
[jira] [Commented] (SPARK-8142) Spark Job Fails with ResultTask ClassCastException
[ https://issues.apache.org/jira/browse/SPARK-8142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14577262#comment-14577262 ] Dev Lakhani commented on SPARK-8142: HBase kept; Hadoop and Spark removed. Now I get: java.lang.ClassCastException: org.apache.hadoop.hbase.mapreduce.TableSplit cannot be cast to org.apache.hadoop.hbase.mapreduce.InputSplit at NewHadoopRDD.scala:115. This used to work when I had all the Spark and Hadoop dependencies added in the uber jar.
[jira] [Commented] (SPARK-8142) Spark Job Fails with ResultTask ClassCastException
[ https://issues.apache.org/jira/browse/SPARK-8142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14577265#comment-14577265 ] Dev Lakhani commented on SPARK-8142: To be specific: when I say removed, I mean marked as provided.
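For clarity, "marked as provided" means the Maven scope below: the dependency is compiled against but left out of the shaded jar, so the cluster's own copy is the only one on the classpath:

<!-- pom.xml (sketch): compile against Spark but do not package it -->
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-core_2.10</artifactId>
  <version>1.3.1</version>
  <scope>provided</scope>
</dependency>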
[jira] [Commented] (SPARK-8142) Spark Job Fails with ResultTask ClassCastException
[ https://issues.apache.org/jira/browse/SPARK-8142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14577108#comment-14577108 ] Dev Lakhani commented on SPARK-8142: I am not bundling Spark. Again, I have pre-built binaries downloaded from the Spark website, deployed on my cluster: version 1.3.1, Hadoop 2.6. I am using the Java/Maven client dependency for org.apache.spark spark-core_2.10 version 1.3.1 in my application, added as a Maven dependency. I package my application using the Maven Shade plugin; this builds me a jar with my application's code, Glassfish Jersey 2.7 deps and Spark 1.3.1 core deps. I then submit my job jar using spark-submit (from Spark 1.3.1), and the ClassCastException occurs if I have userClassPathFirst set to true as described above. If I don't, my REST client tries to do a GET operation using the Glassfish Jersey 2.7 API, but this conflicts with com.sun.jersey 1.9, which comes with Spark. I am using one version of Spark, 1.3.1-hadoop2.6, on my cluster; I have no other versions of Spark on that cluster. The RELEASE file states 1.3.1 (git revision 908a0bf) built for Hadoop 2.6.0. Confirmed on all nodes. I ran a Maven dependency tree on my application code and the only Spark version is 1.3.1 in all of the Maven dependencies that I use: Spark SQL 1.3.1, Spark Core 1.3.1, Spark Network Common 1.3.1. I've been using this version fine for all other non-REST-based operations and other Spark operations; it's only when I set userClassPathFirst that I get this error.
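An alternative to userClassPathFirst that avoids flipping the classloader order entirely is to relocate the conflicting Jersey packages inside the shaded jar. This is a sketch, not a verified configuration: Jersey 2 also discovers providers via META-INF/services, so a ServicesResourceTransformer (or similar) may be needed as well:

<!-- maven-shade-plugin (sketch): rename our Jersey so it cannot collide with Spark's -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <configuration>
    <relocations>
      <relocation>
        <pattern>org.glassfish.jersey</pattern>
        <shadedPattern>shaded.org.glassfish.jersey</shadedPattern>
      </relocation>
    </relocations>
  </configuration>
</plugin>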
[jira] [Commented] (SPARK-8143) Spark application history cannot be found even for finished jobs
[ https://issues.apache.org/jira/browse/SPARK-8143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14577130#comment-14577130 ] Dev Lakhani commented on SPARK-8143: I cannot try it against master because we are restricted to official releases only; we are not allowed external git access due to organisational constraints. If you expect users to continually build from master to verify the existence of bugs and JIRAs, perhaps you need to mandate that process in https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark#ContributingtoSpark-JIRA to be explicit, as different projects have different rules. In this case, I accept that I did not research this issue before posting it; if that is the case, close this as a duplicate, since I cannot verify against master.
[jira] [Commented] (SPARK-8142) Spark Job Fails with ResultTask ClassCastException
[ https://issues.apache.org/jira/browse/SPARK-8142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14577155#comment-14577155 ] Dev Lakhani commented on SPARK-8142: spark-core and spark-sql marked as provided; same error.
[jira] [Commented] (SPARK-8142) Spark Job Fails with ResultTask ClassCastException
[ https://issues.apache.org/jira/browse/SPARK-8142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14577371#comment-14577371 ] Dev Lakhani commented on SPARK-8142: Update: having resolved some dependency issues, the current state is this:

hadoop-common 2.6.0 - provided
hadoop-client 2.6.0 - provided
hadoop-hdfs 2.6.0 - provided
spark-sql_2.10 - provided
spark-core_2.10 - provided
hbase-client 1.1.0 - included/packaged
hbase-protocol 1.1.0 - included/packaged
hbase-server 1.1.0 - included/packaged

I run the job and run into this: https://issues.apache.org/jira/browse/SPARK-1867, which suggests a class is missing, but how do I find which one? There is no ClassNotFoundException, yet something might be missing; how can I find this out?
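One way to answer "which class is missing" when no ClassNotFoundException surfaces is to turn on JVM class-loading tracing on the executors and compare against the jars being shipped; the paths and class name below are hypothetical:

# log every class the executor JVM loads (appears in executor stdout)
spark-submit --conf "spark.executor.extraJavaOptions=-verbose:class" my-app.jar

# check which local jar (if any) actually contains a suspect class
for j in /opt/hbase/lib/*.jar; do
  unzip -l "$j" | grep -q 'TableSplit.class' && echo "$j"
done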
[jira] [Updated] (SPARK-8142) Spark Job Fails with ResultTask Class Cast Exception
[ https://issues.apache.org/jira/browse/SPARK-8142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dev Lakhani updated SPARK-8142: --- Summary: Spark Job Fails with ResultTask Class Cast Exception (was: Spark Job Fails with ResultTask Class Exception)
[jira] [Created] (SPARK-8142) Spark Job Fails with ResultTask Class Exception
Dev Lakhani created SPARK-8142: --- Summary: Spark Job Fails with ResultTask Class Exception Key: SPARK-8142 URL: https://issues.apache.org/jira/browse/SPARK-8142 Project: Spark Issue Type: Bug Affects Versions: 1.3.1 Reporter: Dev Lakhani

When running a Spark Job, I get no failures in the application code whatsoever, but a weird ResultTask class cast exception. In my job I create an RDD from HBase and for each partition do a REST call on an API, using a REST client. This has worked in IntelliJ, but when I deploy to a cluster using spark-submit.sh I get:

org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 3, host): java.lang.ClassCastException: org.apache.spark.scheduler.ResultTask cannot be cast to org.apache.spark.scheduler.Task
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:185)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)

These are the configs I set to override the Spark classpath, because I want to use my own Glassfish Jersey version:

sparkConf.set("spark.driver.userClassPathFirst", "true");
sparkConf.set("spark.executor.userClassPathFirst", "true");

I see no other warnings or errors in any of the logs. Unfortunately I cannot post my code, but please ask me questions that will help debug the issue. Using Spark 1.3, Hadoop 2.6.
[jira] [Created] (SPARK-8143) Spark application history cannot be found even for finished jobs
Dev Lakhani created SPARK-8143: --- Summary: Spark application history cannot be found even for finished jobs Key: SPARK-8143 URL: https://issues.apache.org/jira/browse/SPARK-8143 Project: Spark Issue Type: Bug Affects Versions: 1.3.1, 1.3.0 Reporter: Dev Lakhani

Whenever a job is killed or finished, because of an application error or otherwise, and I then click on Application Detail UI, even though the job state is FINISHED, I get no log results and the message states: Application history not found for (app-xyz-abc); Application ABC is still in progress. And no logs are presented. I'm using spark.eventLog.enabled=true and spark.eventLog.dir=/tmp/spark, under which I see lots of files app-2015xyz-abc.inprogress, even though the job has failed or finished.
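For reference, a minimal event-log / history-server setup matching the configuration described above (paths are illustrative); with this in place, finished applications are served by the history server, which listens on port 18080 by default:

# conf/spark-defaults.conf
spark.eventLog.enabled           true
spark.eventLog.dir               file:/tmp/spark
spark.history.fs.logDirectory    file:/tmp/spark

# then start the history server
./sbin/start-history-server.sh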
[jira] [Commented] (SPARK-8142) Spark Job Fails with ResultTask ClassCastException
[ https://issues.apache.org/jira/browse/SPARK-8142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14575958#comment-14575958 ] Dev Lakhani commented on SPARK-8142: I'm using pre-compiled binaries for Spark 1.3.1, Hadoop 2.6, and checked the Spark versions against my application. As I mention, I need to use my classpath, not Spark's, hence the setting. If I don't, Spark makes use of <jersey.version>1.9</jersey.version> from https://github.com/apache/spark/blob/master/yarn/pom.xml, which is not compatible with my application code. I know the userClassPathFirst settings are experimental, but they are required for my use case.
[jira] [Commented] (SPARK-8143) Spark application history cannot be found even for finished jobs
[ https://issues.apache.org/jira/browse/SPARK-8143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14575959#comment-14575959 ] Dev Lakhani commented on SPARK-8143: OK, will wait for the official 1.4 release and will confirm.
[jira] [Updated] (SPARK-8142) Spark Job Fails with ResultTask Class Cast Exception
[ https://issues.apache.org/jira/browse/SPARK-8142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dev Lakhani updated SPARK-8142: --- Description: corrected the Spark version from 1.3 to 1.3.1; otherwise unchanged.
[jira] [Updated] (SPARK-8142) Spark Job Fails with ResultTask ClassCastException
[ https://issues.apache.org/jira/browse/SPARK-8142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dev Lakhani updated SPARK-8142: --- Summary: Spark Job Fails with ResultTask ClassCastException (was: Spark Job Fails with ResultTask Class Cast Exception)
[jira] [Updated] (SPARK-8142) Spark Job Fails with ResultTask ClassCastException
[ https://issues.apache.org/jira/browse/SPARK-8142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dev Lakhani updated SPARK-8142: --- Description: reworded the opening paragraph ("In my job I run create an RDD" is now "In my job, I create a RDD"); otherwise unchanged.
[jira] [Commented] (SPARK-6846) Stage kill URL easy to accidentally trigger and possibility for security issue.
[ https://issues.apache.org/jira/browse/SPARK-6846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14543970#comment-14543970 ] Dev Lakhani commented on SPARK-6846: As a more complex solution, would it be possible to have a unique stage id and use that in the URL? http://localhost:4040/stages/kill/?id=0&stage-id=UNIQUE-STAGE-ID&terminate=true A simple auto-complete of a previous kill command in Chrome, followed by an Enter, can kill hours' worth of work. Or any other ideas?

Stage kill URL easy to accidentally trigger and possibility for security issue. --- Key: SPARK-6846 URL: https://issues.apache.org/jira/browse/SPARK-6846 Project: Spark Issue Type: Improvement Components: Web UI Affects Versions: 1.3.0 Reporter: Dev Lakhani Assignee: Sean Owen Priority: Minor Fix For: 1.4.0

On a similar note: when the kill link is cached in the browser bar, it's easy to accidentally kill a job just by pressing Enter. For example: you press the kill-stage button and get the prompt asking whether you want to kill the stage. You launch a new job and start typing http://localhost:4040/ and Chrome, for example, starts auto-completing with http://localhost:4040/stages/kill/?id=0&terminate=true. If you accidentally press Enter, it will kill the current stage without any prompts. I think it's also a bit of a security issue that from any host you can curl/wget/issue http://localhost:4040/stages/kill/?id=0&terminate=true and it will kill the current stage without prompting.
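To illustrate the suggestion, a small Scala sketch of a one-time token per stage (purely hypothetical; this is not how Spark's UI is implemented): a kill URL would only work if it carries the token issued when the page was rendered, so a replayed or auto-completed URL fails.

import java.util.UUID
import scala.collection.concurrent.TrieMap

object KillTokens {
  // stageId -> token issued when the kill link was rendered
  private val tokens = TrieMap.empty[Int, String]

  def issue(stageId: Int): String =
    tokens.getOrElseUpdate(stageId, UUID.randomUUID().toString)

  // a kill request like /stages/kill/?id=0&token=<uuid>&terminate=true
  // must present the matching token; it is consumed on success
  def validate(stageId: Int, token: String): Boolean =
    tokens.get(stageId).exists(_ == token) && { tokens.remove(stageId); true }
}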
[jira] [Commented] (SPARK-6846) Stage kill URL easy to accidentally trigger and possibility for security issue.
[ https://issues.apache.org/jira/browse/SPARK-6846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14494012#comment-14494012 ] Dev Lakhani commented on SPARK-6846: [~srowen] please go ahead, I won't have time for this this week.
[jira] [Updated] (SPARK-5273) Improve documentation examples for LinearRegression
[ https://issues.apache.org/jira/browse/SPARK-5273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dev Lakhani updated SPARK-5273: --- Affects Version/s: (was: 1.2.0)
[jira] [Updated] (SPARK-5273) Improve documentation examples for LinearRegression
[ https://issues.apache.org/jira/browse/SPARK-5273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dev Lakhani updated SPARK-5273: --- Affects Version/s: 1.2.0
[jira] [Created] (SPARK-5273) Improve documentation examples for LinearRegression
Dev Lakhani created SPARK-5273: --- Summary: Improve documentation examples for LinearRegression Key: SPARK-5273 URL: https://issues.apache.org/jira/browse/SPARK-5273 Project: Spark Issue Type: Improvement Components: Documentation Reporter: Dev Lakhani Priority: Minor

In the document https://spark.apache.org/docs/1.1.1/mllib-linear-methods.html, under "Linear least squares, Lasso, and ridge regression", the suggested method is LinearRegressionWithSGD.train():

// Building the model
val numIterations = 100
val model = LinearRegressionWithSGD.train(parsedData, numIterations)

This is not ideal even for simple examples such as y=x. It should be replaced with more real-world parameters, including a step size:

val lr = new LinearRegressionWithSGD()
lr.optimizer.setStepSize(0.0001)
lr.optimizer.setNumIterations(100)

or

LinearRegressionWithSGD.train(input, 100, 0.0001)

to produce a reasonable MSE. It took me a while, using the dev forum, to learn that the step size should be really small. This might save someone the same effort when learning MLlib.
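A fuller, self-contained version of the suggested example (a sketch assuming a Spark shell where sc is in scope; the data is a toy y=x set, and 100/0.0001 are the parameters recommended above):

import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.{LabeledPoint, LinearRegressionWithSGD}

// toy dataset: the label equals the single feature, i.e. y = x
val data = sc.parallelize((1 to 100).map(i => LabeledPoint(i.toDouble, Vectors.dense(i.toDouble))))

// train(input, numIterations, stepSize): the small step size is the point of this issue
val model = LinearRegressionWithSGD.train(data, 100, 0.0001)

// mean squared error over the training set
val mse = data.map { p =>
  val err = model.predict(p.features) - p.label
  err * err
}.mean()
println(s"MSE = $mse")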
[jira] [Commented] (SPARK-576) Design and develop a more precise progress estimator
[ https://issues.apache.org/jira/browse/SPARK-576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14175023#comment-14175023 ] Dev Lakhani commented on SPARK-576: --- I've created a PR for this: https://github.com/apache/spark/pull/2837/

Design and develop a more precise progress estimator --- Key: SPARK-576 URL: https://issues.apache.org/jira/browse/SPARK-576 Project: Spark Issue Type: Improvement Reporter: Mosharaf Chowdhury

In addition to task_completed/total_tasks, we need to have something that says estimated_time_remaining.
[jira] [Commented] (SPARK-2321) Design a proper progress reporting event listener API
[ https://issues.apache.org/jira/browse/SPARK-2321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14173619#comment-14173619 ] Dev Lakhani commented on SPARK-2321: There are some issues and bugs under the webui component that are active. Should we incorporate these into this JIRA, or is it best to work on them separately and then merge these (2321) changes later? https://issues.apache.org/jira/browse/SPARK/component/12322616

Design a proper progress reporting event listener API --- Key: SPARK-2321 URL: https://issues.apache.org/jira/browse/SPARK-2321 Project: Spark Issue Type: Improvement Components: Java API, Spark Core Affects Versions: 1.0.0 Reporter: Reynold Xin Assignee: Josh Rosen Priority: Critical

This is a ticket to track progress on redesigning the SparkListener and JobProgressListener API. There are multiple problems with the current design, including:
0. I'm not sure if the API is usable in Java (there are at least some enums we used in Scala and a bunch of case classes that might complicate things).
1. The whole API is marked as DeveloperApi, because we haven't paid a lot of attention to it yet. Something as important as progress reporting deserves a more stable API.
2. There is no easy way to connect jobs with stages. Similarly, there is no easy way to connect job groups with jobs / stages.
3. JobProgressListener itself has no encapsulation at all. States can be arbitrarily mutated by external programs. Variable names are sort of randomly decided and inconsistent.
We should just revisit these and propose a new, concrete design.
[jira] [Commented] (SPARK-3957) Broadcast variable memory usage not reflected in UI
[ https://issues.apache.org/jira/browse/SPARK-3957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14173926#comment-14173926 ] Dev Lakhani commented on SPARK-3957: Here are my thoughts on a possible approach. Hi All. The broadcast occurs from the SparkContext to the BroadcastManager and its newBroadcast method. In the first instance, the broadcast data is stored in the BlockManager of the executor (see HttpBroadcast). Any tracking of broadcast variables must be referenced by the BlockManagerSlaveActor and BlockManagerMasterActor; in particular, UpdateBlockInfo and RemoveBroadcast should update the total memory used by blocks when blocks are added and removed. These can then be hooked up to the UI using a new page like ExecutorsPage and by defining new methods in the relevant listener, such as StorageStatusListener. These are my initial thoughts as someone new to these components; any other ideas or approaches?

Broadcast variable memory usage not reflected in UI --- Key: SPARK-3957 URL: https://issues.apache.org/jira/browse/SPARK-3957 Project: Spark Issue Type: Bug Components: Block Manager, Web UI Affects Versions: 1.0.2, 1.1.0 Reporter: Shivaram Venkataraman Assignee: Nan Zhu

Memory used by broadcast variables is not reflected in the memory usage reported in the Web UI. For example, the executors tab shows memory used in each executor, but this number doesn't include memory used by broadcast variables. Similarly, the storage tab only shows the list of cached RDDs and how much memory they use. We should add a separate column / tab for broadcast variables to make it easier to debug.
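As a starting point for the listener side, a rough Scala sketch against the 1.x storage API (StorageStatus and BroadcastBlockId are DeveloperApi types, so treat this as an assumption-laden illustration rather than a final design):

import org.apache.spark.storage.{BroadcastBlockId, StorageStatus}

// sum the in-memory bytes held by broadcast blocks across all executors,
// e.g. over the storageStatusList of a StorageStatusListener
def broadcastMemoryUsed(statuses: Seq[StorageStatus]): Long =
  statuses.flatMap(_.blocks).collect {
    case (BroadcastBlockId(_, _), status) => status.memSize
  }.sum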
[jira] [Commented] (SPARK-3957) Broadcast variable memory usage not reflected in UI
[ https://issues.apache.org/jira/browse/SPARK-3957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14174107#comment-14174107 ] Dev Lakhani commented on SPARK-3957: Hi. For now I am happy for [~CodingCat] to take this on; maybe once there are some commits I can help with the UI side, but for now I'll hold back.
[jira] [Commented] (SPARK-3644) REST API for Spark application info (jobs / stages / tasks / storage info)
[ https://issues.apache.org/jira/browse/SPARK-3644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14165293#comment-14165293 ] Dev Lakhani commented on SPARK-3644: Hi, I am doing some work on the REST/JSON aspects and will be happy to take this on. Can someone assign it to me and/or help me get started? We need to first draft out the various endpoints and document them somewhere. Thanks, Dev

REST API for Spark application info (jobs / stages / tasks / storage info) --- Key: SPARK-3644 URL: https://issues.apache.org/jira/browse/SPARK-3644 Project: Spark Issue Type: Bug Components: Spark Core, Web UI Reporter: Josh Rosen

This JIRA is a forum to draft a design proposal for a REST interface for accessing information about Spark applications, such as job / stage / task / storage status. There have been a number of proposals to serve JSON representations of the information displayed in Spark's web UI. Given that we might redesign the pages of the web UI (and possibly re-implement the UI as a client of a REST API), the API endpoints and their responses should be independent of what we choose to display on particular web UI pages / layouts. Let's start a discussion of what a good REST API would look like from first principles. We can discuss which URLs / endpoints expose access to data, how our JSON responses will be formatted, how fields will be named, how the API will be documented and tested, etc. Some links for inspiration:
https://developer.github.com/v3/
http://developer.netflix.com/docs/REST_API_Reference
https://helloreverb.com/developers/swagger
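As a conversation starter, one hypothetical shape such an API could take (the endpoint paths and field names below are invented for illustration, not something the project has agreed on):

GET /api/applications                      -> list of applications (id, name, start/end time)
GET /api/applications/{appId}/jobs         -> jobs with status and associated stage ids
GET /api/applications/{appId}/stages/{id}  -> stage detail, including task summaries
GET /api/applications/{appId}/storage      -> persisted RDD / memory usage info

// example job payload (illustrative field names only)
{ "jobId": 0, "status": "RUNNING", "numTasks": 200, "numCompletedTasks": 120 }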