[jira] [Created] (SPARK-2013) Add Python pickleFile to programming guide
Matei Zaharia created SPARK-2013: Summary: Add Python pickleFile to programming guide Key: SPARK-2013 URL: https://issues.apache.org/jira/browse/SPARK-2013 Project: Spark Issue Type: Documentation Components: Documentation, PySpark Reporter: Matei Zaharia Priority: Trivial Fix For: 1.1.0 Should be added in the Python version of http://spark.apache.org/docs/latest/programming-guide.html#external-datasets. -- This message was sent by Atlassian JIRA (v6.2#6252)
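A stdlib-only sketch of the round trip that the PySpark `saveAsPickleFile` / `pickleFile` pair performs, for the docs entry this ticket asks for. The helper names here are hypothetical simplifications: real pickle files are SequenceFiles of batched pickled objects, not plain local files.

```python
import os
import pickle
import tempfile

# Hypothetical stand-ins for rdd.saveAsPickleFile(path) / sc.pickleFile(path):
# write each record as a pickle frame, then read frames back until EOF.
def save_as_pickle_file(records, path):
    with open(path, "wb") as f:
        for rec in records:
            pickle.dump(rec, f)

def read_pickle_file(path):
    records = []
    with open(path, "rb") as f:
        while True:
            try:
                records.append(pickle.load(f))
            except EOFError:
                return records

path = os.path.join(tempfile.mkdtemp(), "part-00000")
save_as_pickle_file([("a", 1), ("b", 2)], path)
print(read_pickle_file(path))  # [('a', 1), ('b', 2)]
```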
[jira] [Created] (SPARK-2014) Make PySpark store RDDs in MEMORY_ONLY_SER with compression by default
Matei Zaharia created SPARK-2014: Summary: Make PySpark store RDDs in MEMORY_ONLY_SER with compression by default Key: SPARK-2014 URL: https://issues.apache.org/jira/browse/SPARK-2014 Project: Spark Issue Type: Improvement Components: PySpark Reporter: Matei Zaharia Since the data is serialized on the Python side, there's not much point in keeping it as byte arrays in Java, or even in skipping compression. We should make cache() in PySpark use MEMORY_ONLY_SER and turn on spark.rdd.compress for it. -- This message was sent by Atlassian JIRA (v6.2#6252)
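The rationale can be illustrated without Spark: cached PySpark partitions are already pickled byte strings on the JVM side, and pickled records are usually redundant enough that compression (what `spark.rdd.compress` would enable) shrinks them substantially. Illustrative data only, not Spark code:

```python
import pickle
import zlib

# 10,000 similar records, pickled to one byte string, then compressed the
# way a serialized-and-compressed cache block would be.
records = [("user-%05d" % i, i % 10) for i in range(10000)]
raw = pickle.dumps(records)
compressed = zlib.compress(raw)
print(len(compressed) < len(raw))  # True: the pickled bytes compress well
```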
[jira] [Commented] (SPARK-1977) mutable.BitSet in ALS not serializable with KryoSerializer
[ https://issues.apache.org/jira/browse/SPARK-1977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14017477#comment-14017477 ] Xiangrui Meng commented on SPARK-1977: -- This is more likely a version conflict in your dependencies. From the Spark WebUI, you can find the system classpath in the environment tab. Please verify that you don't have two different versions of spark, kryo, or any other related library. Classes may hide inside an assembly jar. mutable.BitSet in ALS not serializable with KryoSerializer -- Key: SPARK-1977 URL: https://issues.apache.org/jira/browse/SPARK-1977 Project: Spark Issue Type: Bug Components: MLlib Affects Versions: 1.0.0 Reporter: Neville Li Priority: Minor OutLinkBlock in ALS.scala has an Array[mutable.BitSet] member. KryoSerializer uses AllScalaRegistrar from Twitter chill but it doesn't register mutable.BitSet. Right now we have to register mutable.BitSet manually. A proper fix would be using immutable.BitSet in ALS or register mutable.BitSet in upstream chill. 
{code}
Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 1724.0:9 failed 4 times, most recent failure: Exception failure in TID 68548 on host lon4-hadoopslave-b232.lon4.spotify.net: com.esotericsoftware.kryo.KryoException: java.lang.ArrayStoreException: scala.collection.mutable.HashSet
Serialization trace:
shouldSend (org.apache.spark.mllib.recommendation.OutLinkBlock)
        com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.read(FieldSerializer.java:626)
        com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:221)
        com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:732)
        com.twitter.chill.Tuple2Serializer.read(TupleSerializers.scala:43)
        com.twitter.chill.Tuple2Serializer.read(TupleSerializers.scala:34)
        com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:732)
        org.apache.spark.serializer.KryoDeserializationStream.readObject(KryoSerializer.scala:115)
        org.apache.spark.serializer.DeserializationStream$$anon$1.getNext(Serializer.scala:125)
        org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:71)
        org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
        org.apache.spark.rdd.CoGroupedRDD$$anonfun$compute$4.apply(CoGroupedRDD.scala:155)
        org.apache.spark.rdd.CoGroupedRDD$$anonfun$compute$4.apply(CoGroupedRDD.scala:154)
        scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
        scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
        org.apache.spark.rdd.CoGroupedRDD.compute(CoGroupedRDD.scala:154)
        org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
        org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
        org.apache.spark.rdd.MappedValuesRDD.compute(MappedValuesRDD.scala:31)
        org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
        org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
        org.apache.spark.rdd.FlatMappedValuesRDD.compute(FlatMappedValuesRDD.scala:31)
        org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
        org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
        org.apache.spark.rdd.FlatMappedRDD.compute(FlatMappedRDD.scala:33)
        org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
        org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:77)
        org.apache.spark.rdd.RDD.iterator(RDD.scala:227)
        org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:111)
        org.apache.spark.scheduler.Task.run(Task.scala:51)
        org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:187)
        java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        java.lang.Thread.run(Thread.java:662)
Driver stacktrace:
        at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1033)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1017)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1015)
        at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
        at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
        at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1015)
{code}
[jira] [Created] (SPARK-2016) rdd in-memory storage UI becomes unresponsive when the number of RDD partitions is large
Reynold Xin created SPARK-2016: -- Summary: rdd in-memory storage UI becomes unresponsive when the number of RDD partitions is large Key: SPARK-2016 URL: https://issues.apache.org/jira/browse/SPARK-2016 Project: Spark Issue Type: Sub-task Reporter: Reynold Xin Try running {code} sc.parallelize(1 to 1000000, 1000000).cache().count() {code} and open the storage UI for this RDD. It takes forever to load the page. When the number of partitions is very large, I think there are a few alternatives: 0. Only show the top 1000. 1. Pagination. 2. Instead of grouping by RDD blocks, group by executors. -- This message was sent by Atlassian JIRA (v6.2#6252)
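Alternative 2 from the ticket (aggregate per executor instead of rendering one table row per RDD block) can be sketched in a few lines. The `block_info` triples are made-up sample data, not Spark's internal representation:

```python
from collections import defaultdict

# (block name, hosting executor, size in bytes) -- hypothetical sample rows.
block_info = [
    ("rdd_0_0", "executor-1", 1024),
    ("rdd_0_1", "executor-2", 2048),
    ("rdd_0_2", "executor-1", 512),
]

# Collapse millions of per-block rows into one summary row per executor,
# which keeps the rendered table size bounded by the cluster size.
per_executor = defaultdict(lambda: {"blocks": 0, "bytes": 0})
for _block, executor, size in block_info:
    per_executor[executor]["blocks"] += 1
    per_executor[executor]["bytes"] += size
print(sorted(per_executor.items()))
```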
[jira] [Created] (SPARK-2017) web ui stage page becomes unresponsive when the number of tasks is large
Reynold Xin created SPARK-2017: -- Summary: web ui stage page becomes unresponsive when the number of tasks is large Key: SPARK-2017 URL: https://issues.apache.org/jira/browse/SPARK-2017 Project: Spark Issue Type: Sub-task Reporter: Reynold Xin {code} sc.parallelize(1 to 1000000, 1000000).count() {code} The above code creates one million tasks to be executed. The stage detail web ui page takes forever to load (if it ever completes). There are again a few different alternatives: 0. Limit the number of tasks we show. 1. Pagination. 2. By default only show the aggregate metrics and failed tasks, and hide the successful ones. -- This message was sent by Atlassian JIRA (v6.2#6252)
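Alternative 1 from the ticket, server-side pagination, amounts to slicing the task list before rendering. The function name and page size below are illustrative, not the Spark UI's actual API:

```python
def paginate(items, page, page_size=100):
    """Return the 1-indexed `page` of `items`, at most `page_size` rows."""
    start = (page - 1) * page_size
    return items[start:start + page_size]

# Stand-in for a million task rows; only one page is ever rendered at a time.
tasks = ["task-%d" % i for i in range(1, 1001)]
print(paginate(tasks, 1)[0], paginate(tasks, 10)[-1])  # task-1 task-1000
```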
[jira] [Updated] (SPARK-2016) rdd in-memory storage UI becomes unresponsive when the number of RDD partitions is large
[ https://issues.apache.org/jira/browse/SPARK-2016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin updated SPARK-2016: --- Labels: starter (was: ) rdd in-memory storage UI becomes unresponsive when the number of RDD partitions is large Key: SPARK-2016 URL: https://issues.apache.org/jira/browse/SPARK-2016 Project: Spark Issue Type: Sub-task Reporter: Reynold Xin Labels: starter Try running {code} sc.parallelize(1 to 1000000, 1000000).cache().count() {code} and open the storage UI for this RDD. It takes forever to load the page. When the number of partitions is very large, I think there are a few alternatives: 0. Only show the top 1000. 1. Pagination. 2. Instead of grouping by RDD blocks, group by executors. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Comment Edited] (SPARK-1977) mutable.BitSet in ALS not serializable with KryoSerializer
[ https://issues.apache.org/jira/browse/SPARK-1977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14017508#comment-14017508 ] Neville Li edited comment on SPARK-1977 at 6/4/14 8:45 AM: --- We submit one spark-assembly and one job assembly jar via spark-submit, and there are no other obvious Scala/Spark/Kryo jars on the global classpath. I can reproduce the same exception locally with the following snippet when kryo.register() is commented out. I just added mutable.BitSet to Twitter chill: https://github.com/twitter/chill/pull/185

{code}
import com.twitter.chill._
import org.apache.spark.serializer.{KryoSerializer, KryoRegistrator}
import org.apache.spark.SparkConf

import scala.collection.mutable

class MyRegistrator extends KryoRegistrator {
  override def registerClasses(kryo: Kryo) {
    // kryo.register(classOf[mutable.BitSet])
  }
}

case class OutLinkBlock(elementIds: Array[Int], shouldSend: Array[mutable.BitSet])

object KryoTest {
  def main(args: Array[String]) {
    println("hello")
    val conf = new SparkConf()
      .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
      .set("spark.kryo.registrator", classOf[MyRegistrator].getName)
    val serializer = new KryoSerializer(conf).newInstance()
    val bytes = serializer.serialize(OutLinkBlock(Array(1, 2, 3), Array(mutable.BitSet(2, 4, 6))))
    serializer.deserialize(bytes).asInstanceOf[OutLinkBlock]
  }
}
{code}

was (Author: sinisa_lyh): We submit one spark-assembly and one job assembly jar via spark-submit, and there are no other obvious Scala/Spark/Kryo jars on the global classpath. I can reproduce the same exception locally with the same snippet when kryo.register() is commented out. I just added mutable.BitSet to Twitter chill: https://github.com/twitter/chill/pull/185 mutable.BitSet in ALS not serializable with KryoSerializer -- Key: SPARK-1977 URL: https://issues.apache.org/jira/browse/SPARK-1977 Project: Spark Issue Type: Bug Components: MLlib Affects Versions: 1.0.0 Reporter: Neville Li Priority: Minor OutLinkBlock in ALS.scala has an Array[mutable.BitSet] member. KryoSerializer uses AllScalaRegistrar from Twitter chill, but it doesn't register mutable.BitSet. Right now we have to register mutable.BitSet manually. A proper fix would be using immutable.BitSet in ALS or registering mutable.BitSet in upstream chill.
{code} Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 1724.0:9 failed 4 times, most recent failure: Exception failure in TID 68548 on host lon4-hadoopslave-b232.lon4.spotify.net: com.esotericsoftware.kryo.KryoException: java.lang.ArrayStoreException: scala.collection.mutable.HashSet Serialization trace: shouldSend (org.apache.spark.mllib.recommendation.OutLinkBlock) com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.read(FieldSerializer.java:626) com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:221) com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:732) com.twitter.chill.Tuple2Serializer.read(TupleSerializers.scala:43) com.twitter.chill.Tuple2Serializer.read(TupleSerializers.scala:34) com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:732) org.apache.spark.serializer.KryoDeserializationStream.readObject(KryoSerializer.scala:115) org.apache.spark.serializer.DeserializationStream$$anon$1.getNext(Serializer.scala:125) org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:71) org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39) org.apache.spark.rdd.CoGroupedRDD$$anonfun$compute$4.apply(CoGroupedRDD.scala:155) org.apache.spark.rdd.CoGroupedRDD$$anonfun$compute$4.apply(CoGroupedRDD.scala:154)
[jira] [Commented] (SPARK-1977) mutable.BitSet in ALS not serializable with KryoSerializer
[ https://issues.apache.org/jira/browse/SPARK-1977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14017508#comment-14017508 ] Neville Li commented on SPARK-1977: --- We submit one spark-assembly and one job assembly jar via spark-submit, and there are no other obvious Scala/Spark/Kryo jars on the global classpath. I can reproduce the same exception locally with the following snippet when kryo.register() is commented out. I just added mutable.BitSet to Twitter chill: https://github.com/twitter/chill/pull/185

{code}
import com.twitter.chill._
import org.apache.spark.serializer.{KryoSerializer, KryoRegistrator}
import org.apache.spark.SparkConf

import scala.collection.mutable

class MyRegistrator extends KryoRegistrator {
  override def registerClasses(kryo: Kryo) {
    // kryo.register(classOf[mutable.BitSet])
  }
}

case class OutLinkBlock(elementIds: Array[Int], shouldSend: Array[mutable.BitSet])

object KryoTest {
  def main(args: Array[String]) {
    println("hello")
    val conf = new SparkConf()
      .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
      .set("spark.kryo.registrator", classOf[MyRegistrator].getName)
    val serializer = new KryoSerializer(conf).newInstance()
    val bytes = serializer.serialize(OutLinkBlock(Array(1, 2, 3), Array(mutable.BitSet(2, 4, 6))))
    serializer.deserialize(bytes).asInstanceOf[OutLinkBlock]
  }
}
{code}

mutable.BitSet in ALS not serializable with KryoSerializer -- Key: SPARK-1977 URL: https://issues.apache.org/jira/browse/SPARK-1977 Project: Spark Issue Type: Bug Components: MLlib Affects Versions: 1.0.0 Reporter: Neville Li Priority: Minor OutLinkBlock in ALS.scala has an Array[mutable.BitSet] member. KryoSerializer uses AllScalaRegistrar from Twitter chill, but it doesn't register mutable.BitSet. Right now we have to register mutable.BitSet manually. A proper fix would be using immutable.BitSet in ALS or registering mutable.BitSet in upstream chill.
{code} Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 1724.0:9 failed 4 times, most recent failure: Exception failure in TID 68548 on host lon4-hadoopslave-b232.lon4.spotify.net: com.esotericsoftware.kryo.KryoException: java.lang.ArrayStoreException: scala.collection.mutable.HashSet Serialization trace: shouldSend (org.apache.spark.mllib.recommendation.OutLinkBlock) com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.read(FieldSerializer.java:626) com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:221) com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:732) com.twitter.chill.Tuple2Serializer.read(TupleSerializers.scala:43) com.twitter.chill.Tuple2Serializer.read(TupleSerializers.scala:34) com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:732) org.apache.spark.serializer.KryoDeserializationStream.readObject(KryoSerializer.scala:115) org.apache.spark.serializer.DeserializationStream$$anon$1.getNext(Serializer.scala:125) org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:71) org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39) org.apache.spark.rdd.CoGroupedRDD$$anonfun$compute$4.apply(CoGroupedRDD.scala:155) org.apache.spark.rdd.CoGroupedRDD$$anonfun$compute$4.apply(CoGroupedRDD.scala:154) scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47) org.apache.spark.rdd.CoGroupedRDD.compute(CoGroupedRDD.scala:154) org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262) org.apache.spark.rdd.RDD.iterator(RDD.scala:229) org.apache.spark.rdd.MappedValuesRDD.compute(MappedValuesRDD.scala:31) org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262) org.apache.spark.rdd.RDD.iterator(RDD.scala:229) org.apache.spark.rdd.FlatMappedValuesRDD.compute(FlatMappedValuesRDD.scala:31) org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262) 
org.apache.spark.rdd.RDD.iterator(RDD.scala:229) org.apache.spark.rdd.FlatMappedRDD.compute(FlatMappedRDD.scala:33) org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262) org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:77) org.apache.spark.rdd.RDD.iterator(RDD.scala:227) org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:111) org.apache.spark.scheduler.Task.run(Task.scala:51) org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:187)
[jira] [Issue Comment Deleted] (SPARK-1999) UI : StorageLevel in storage tab and RDD Storage Info never changes
[ https://issues.apache.org/jira/browse/SPARK-1999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chen Chao updated SPARK-1999: - Comment: was deleted (was: https://github.com/apache/spark/pull/950 sorry,i will repost soon, the above link will be invalid.) UI : StorageLevel in storage tab and RDD Storage Info never changes Key: SPARK-1999 URL: https://issues.apache.org/jira/browse/SPARK-1999 Project: Spark Issue Type: Bug Components: Web UI Affects Versions: 1.0.0 Reporter: Chen Chao StorageLevel in 'storage tab' and 'RDD Storage Info' never changes even if you call rdd.unpersist() and then you give the rdd another different storage level. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (SPARK-1999) UI : StorageLevel in storage tab and RDD Storage Info never changes
[ https://issues.apache.org/jira/browse/SPARK-1999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14017519#comment-14017519 ] Chen Chao commented on SPARK-1999: -- PR:https://github.com/apache/spark/pull/968 UI : StorageLevel in storage tab and RDD Storage Info never changes Key: SPARK-1999 URL: https://issues.apache.org/jira/browse/SPARK-1999 Project: Spark Issue Type: Bug Components: Web UI Affects Versions: 1.0.0 Reporter: Chen Chao StorageLevel in 'storage tab' and 'RDD Storage Info' never changes even if you call rdd.unpersist() and then you give the rdd another different storage level. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Issue Comment Deleted] (SPARK-1999) UI : StorageLevel in storage tab and RDD Storage Info never changes
[ https://issues.apache.org/jira/browse/SPARK-1999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chen Chao updated SPARK-1999: - Comment: was deleted (was: I have fixed and tested fine. Please assign it to me , I will post a PR soon!) UI : StorageLevel in storage tab and RDD Storage Info never changes Key: SPARK-1999 URL: https://issues.apache.org/jira/browse/SPARK-1999 Project: Spark Issue Type: Bug Components: Web UI Affects Versions: 1.0.0 Reporter: Chen Chao StorageLevel in 'storage tab' and 'RDD Storage Info' never changes even if you call rdd.unpersist() and then you give the rdd another different storage level. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (SPARK-2018) Big-Endian (IBM Power7) Spark Serialization issue
[ https://issues.apache.org/jira/browse/SPARK-2018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14017528#comment-14017528 ] Sean Owen commented on SPARK-2018: -- The meaning of the error is that Java thinks two serializable classes are not mutually compatible, because two different serialVersionUIDs get computed for two copies of what may be the same class. If I understand you correctly, you are communicating between different JVM versions, or reading one's output from the other? I don't think it's guaranteed that the auto-generated serialVersionUID will be the same. If so, it's nothing to do with big-endian-ness per se. Does it happen entirely within the same machine / JVM? Big-Endian (IBM Power7) Spark Serialization issue -- Key: SPARK-2018 URL: https://issues.apache.org/jira/browse/SPARK-2018 Project: Spark Issue Type: Bug Affects Versions: 1.0.0 Environment: hardware: IBM Power7 OS: Linux version 2.6.32-358.el6.ppc64 (mockbu...@ppc-017.build.eng.bos.redhat.com) (gcc version 4.4.7 20120313 (Red Hat 4.4.7-3) (GCC) ) #1 SMP Tue Jan 29 11:43:27 EST 2013 JDK: Java(TM) SE Runtime Environment (build pxp6470sr5-20130619_01(SR5)) IBM J9 VM (build 2.6, JRE 1.7.0 Linux ppc64-64 Compressed References 20130617_152572 (JIT enabled, AOT enabled) Hadoop: Hadoop-0.2.3-CDH5.0 Spark: Spark-1.0.0 or Spark-0.9.1 spark-env.sh: export JAVA_HOME=/opt/ibm/java-ppc64-70/ export SPARK_MASTER_IP=9.114.34.69 export SPARK_WORKER_MEMORY=1m export SPARK_CLASSPATH=/home/test1/spark-1.0.0-bin-hadoop2/lib export STANDALONE_SPARK_MASTER_HOST=9.114.34.69 #export SPARK_JAVA_OPTS=' -Xdebug -Xrunjdwp:transport=dt_socket,address=9,server=y,suspend=n ' Reporter: Yanjie Gao We have an application running on Spark on a Power7 system, but we hit an important serialization issue. The example HdfsWordCount reproduces the problem. ./bin/run-example org.apache.spark.examples.streaming.HdfsWordCount localdir We used Power7 (Big-Endian arch) and Redhat 6.4.
Big-Endian is the main cause since the example ran successfully in another Power-based Little Endian setup. here is the exception stack and log: Spark Executor Command: /opt/ibm/java-ppc64-70//bin/java -cp /home/test1/spark-1.0.0-bin-hadoop2/lib::/home/test1/src/spark-1.0.0-bin-hadoop2/conf:/home/test1/src/spark-1.0.0-bin-hadoop2/lib/spark-assembly-1.0.0-hadoop2.2.0.jar:/home/test1/src/spark-1.0.0-bin-hadoop2/lib/datanucleus-rdbms-3.2.1.jar:/home/test1/src/spark-1.0.0-bin-hadoop2/lib/datanucleus-api-jdo-3.2.1.jar:/home/test1/src/spark-1.0.0-bin-hadoop2/lib/datanucleus-core-3.2.2.jar:/home/test1/src/hadoop-2.3.0-cdh5.0.0/etc/hadoop/:/home/test1/src/hadoop-2.3.0-cdh5.0.0/etc/hadoop/ -XX:MaxPermSize=128m -Xdebug -Xrunjdwp:transport=dt_socket,address=9,server=y,suspend=n -Xms512M -Xmx512M org.apache.spark.executor.CoarseGrainedExecutorBackend akka.tcp://spark@9.186.105.141:60253/user/CoarseGrainedScheduler 2 p7hvs7br16 4 akka.tcp://sparkWorker@p7hvs7br16:59240/user/Worker app-20140604023054- 14/06/04 02:31:20 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... 
using builtin-java classes where applicable 14/06/04 02:31:21 INFO spark.SecurityManager: Changing view acls to: test1,yifeng 14/06/04 02:31:21 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(test1, yifeng) 14/06/04 02:31:22 INFO slf4j.Slf4jLogger: Slf4jLogger started 14/06/04 02:31:22 INFO Remoting: Starting remoting 14/06/04 02:31:22 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkExecutor@p7hvs7br16:39658] 14/06/04 02:31:22 INFO Remoting: Remoting now listens on addresses: [akka.tcp://sparkExecutor@p7hvs7br16:39658] 14/06/04 02:31:22 INFO executor.CoarseGrainedExecutorBackend: Connecting to driver: akka.tcp://spark@9.186.105.141:60253/user/CoarseGrainedScheduler 14/06/04 02:31:22 INFO worker.WorkerWatcher: Connecting to worker akka.tcp://sparkWorker@p7hvs7br16:59240/user/Worker 14/06/04 02:31:23 INFO worker.WorkerWatcher: Successfully connected to akka.tcp://sparkWorker@p7hvs7br16:59240/user/Worker 14/06/04 02:31:24 INFO executor.CoarseGrainedExecutorBackend: Successfully registered with driver 14/06/04 02:31:24 INFO spark.SecurityManager: Changing view acls to: test1,yifeng 14/06/04 02:31:24 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(test1, yifeng) 14/06/04 02:31:24 INFO slf4j.Slf4jLogger: Slf4jLogger started 14/06/04 02:31:24 INFO Remoting: Starting remoting 14/06/04 02:31:24 INFO Remoting: Remoting started; listening on addresses
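The serialVersionUID handshake Sean Owen describes above can be illustrated with a stdlib-only Python analogy (hypothetical helpers, not JDK or Spark API): a version stamp is written alongside the payload and checked on read, and two JVMs that compute different stamps for the "same" class fail exactly this check with an InvalidClassException.

```python
import pickle

VERSION = 1  # stand-in for a computed serialVersionUID

def dumps(obj, version=VERSION):
    # Write the version stamp with the payload, like Java's stream header.
    return pickle.dumps({"v": version, "payload": obj})

def loads(data, expected=VERSION):
    # Refuse to deserialize when the stamps disagree.
    record = pickle.loads(data)
    if record["v"] != expected:
        raise ValueError("stream version %d != local version %d"
                         % (record["v"], expected))
    return record["payload"]

print(loads(dumps([1, 2, 3])))  # [1, 2, 3]
```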
[jira] [Created] (SPARK-2019) Spark workers die/disappear when job fails for nearly any reason
sam created SPARK-2019: -- Summary: Spark workers die/disappear when job fails for nearly any reason Key: SPARK-2019 URL: https://issues.apache.org/jira/browse/SPARK-2019 Project: Spark Issue Type: Bug Reporter: sam We either have to reboot all the nodes, or run 'sudo service spark-worker restart' across our cluster. I don't think this should happen - the job failures are often not even that bad. There is a 5 upvoted SO question here: http://stackoverflow.com/questions/22031006/spark-0-9-0-worker-keeps-dying-in-standalone-mode-when-job-fails We shouldn't be giving restart privileges to our devs, and therefore our sysadm has to frequently restart the workers. When the sysadm is not around, there is nothing our devs can do. Many thanks -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (SPARK-1520) Assembly Jar with more than 65536 files won't work when compiled on JDK7 and run on JDK6
[ https://issues.apache.org/jira/browse/SPARK-1520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14017753#comment-14017753 ] Qiuzhuang Lian commented on SPARK-1520: --- I can run the assembly jar via bin\spark-shell.cmd, but couldn't run the example LocalKMeans in IntelliJ IDEA, which throws {{Exception in thread "main" java.lang.NoClassDefFoundError: breeze/linalg/Vector}}. Can somebody suggest a fix, since I prefer coding in IntelliJ IDEA? Thanks. Assembly Jar with more than 65536 files won't work when compiled on JDK7 and run on JDK6 - Key: SPARK-1520 URL: https://issues.apache.org/jira/browse/SPARK-1520 Project: Spark Issue Type: Bug Components: MLlib, Spark Core Reporter: Patrick Wendell Assignee: Xiangrui Meng Priority: Blocker Fix For: 1.0.0 This is a real doozie - when compiling a Spark assembly with JDK7, the produced jar does not work well with JRE6. I confirmed the byte code being produced is JDK 6 compatible (major version 50). What happens is that, silently, the JRE will not load any class files from the assembled jar.
{code}
$ sbt/sbt assembly/assembly
$ /usr/lib/jvm/java-1.7.0-openjdk-amd64/bin/java -cp /home/patrick/Documents/spark/assembly/target/scala-2.10/spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar org.apache.spark.ui.UIWorkloadGenerator
usage: ./bin/spark-class org.apache.spark.ui.UIWorkloadGenerator [master] [FIFO|FAIR]
$ /usr/lib/jvm/java-1.6.0-openjdk-amd64/bin/java -cp /home/patrick/Documents/spark/assembly/target/scala-2.10/spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar org.apache.spark.ui.UIWorkloadGenerator
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spark/ui/UIWorkloadGenerator
Caused by: java.lang.ClassNotFoundException: org.apache.spark.ui.UIWorkloadGenerator
        at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:323)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:268)
Could not find the main class: org.apache.spark.ui.UIWorkloadGenerator. Program will exit.
{code}
I also noticed that if the jar is unzipped, and the classpath set to the current directory, it just works. Finally, if the assembly jar is compiled with JDK6, it also works. The error is seen with any class, not just UIWorkloadGenerator. Also, this error doesn't exist in branch 0.9, only in master. h1. Isolation and Cause The package-time behavior of Java 6 and 7 differs with respect to the format used for jar files:

||Number of entries||JDK 6||JDK 7||
|< 65536|zip|zip|
|> 65536|zip*|zip64|

zip* is a workaround for the original zip format described in [JDK-6828461|https://bugs.openjdk.java.net/browse/JDK-4828461] that allows some versions of Java 6 to support larger assembly jars. The Scala libraries we depend on have added a large number of classes, which bumped us over the limit.
This causes the Java 7 packaging to not work with Java 6. We can probably go back under the limit by clearing out some accidental inclusion of FastUtil, but eventually we'll go over again. The real answer is to force people to build with JDK 6 if they want to run Spark on JRE 6. -I've found that if I just unpack and re-pack the jar (using `jar`) it always works:- {code} $ cd assembly/target/scala-2.10/ $ /usr/lib/jvm/java-1.6.0-openjdk-amd64/bin/java -cp ./spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar org.apache.spark.ui.UIWorkloadGenerator # fails $ jar xvf spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar $ jar cvf spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar * $ /usr/lib/jvm/java-1.6.0-openjdk-amd64/bin/java -cp ./spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar org.apache.spark.ui.UIWorkloadGenerator # succeeds {code} -I also noticed something of note. The Breeze package contains single directories that have huge numbers of files in them (e.g. 2000+ class files in one directory). It's possible we are hitting some weird bugs/corner cases with compatibility of the internal storage format of the jar itself.- -I narrowed this down specifically to the inclusion of the breeze library. Just adding breeze to an older (unaffected) build triggered the issue.- -I ran a git bisection and this appeared after the MLLib sparse vector patch was merged:- https://github.com/apache/spark/commit/80c29689ae3b589254a571da3ddb5f9c866ae534 SPARK-1212 -- This message was sent by Atlassian JIRA (v6.2#6252)
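The 65535-entry boundary described above is the classic zip central directory limit, and Python's `zipfile` module (stdlib) performs the same classic-to-ZIP64 switch that a JDK 7 build does, so the failure mode can be reproduced without a jar at all:

```python
import io
import zipfile

def build(n_entries, allow_zip64):
    """Build an in-memory archive with n_entries empty files."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", allowZip64=allow_zip64) as zf:
        for i in range(n_entries):
            zf.writestr("f%d" % i, b"")
    return buf

# One entry past the classic limit: without ZIP64 the central directory
# cannot describe the archive, and zipfile refuses to write it.
try:
    build(65536, allow_zip64=False)
except zipfile.LargeZipFile:
    print("65536 entries require ZIP64 extensions")
```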
[jira] [Commented] (SPARK-2019) Spark workers die/disappear when job fails for nearly any reason
[ https://issues.apache.org/jira/browse/SPARK-2019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14017776#comment-14017776 ] Mark Hamstra commented on SPARK-2019: - Please don't leave the Affects Version/s selector on None. As with the SO question, is this an issue that you are seeing with Spark 0.9.0? If so, then the version of Spark that you are using is significantly out of date even on the 0.9 branch. Several bug fixes are present in the 0.9.1 release of Spark, which has been available for almost two months. There are a few more in the current 0.9.2-SNAPSHOT code, and many more in the recent 1.0.0 release. Spark workers die/disappear when job fails for nearly any reason Key: SPARK-2019 URL: https://issues.apache.org/jira/browse/SPARK-2019 Project: Spark Issue Type: Bug Reporter: sam We either have to reboot all the nodes, or run 'sudo service spark-worker restart' across our cluster. I don't think this should happen - the job failures are often not even that bad. There is a 5 upvoted SO question here: http://stackoverflow.com/questions/22031006/spark-0-9-0-worker-keeps-dying-in-standalone-mode-when-job-fails We shouldn't be giving restart privileges to our devs, and therefore our sysadm has to frequently restart the workers. When the sysadm is not around, there is nothing our devs can do. Many thanks -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (SPARK-1817) RDD zip erroneous when partitions do not divide RDD count
[ https://issues.apache.org/jira/browse/SPARK-1817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14017812#comment-14017812 ] Kan Zhang commented on SPARK-1817: -- There are two issues related to this bug. One is that we partition numeric ranges (e.g., Long and Double ranges) differently from other types of sequences (i.e., at different indexes). This causes elements to be dropped when zipping with numeric ranges, since we zip by partition and partitions for numeric ranges may have different sizes from other sequences (even if the total length and the number of partitions are the same). This is fixed in SPARK-1837. One caveat is that partitioning Double ranges currently still doesn't work properly, due to a Scala bug that breaks {{take}} and {{drop}} on Double ranges (https://issues.scala-lang.org/browse/SI-8518). The other issue is that instead of dropping elements silently, we should throw an error during zipping when we find that partition sizes are not the same between the two sequences. This is fixed by https://github.com/apache/spark/pull/944 RDD zip erroneous when partitions do not divide RDD count - Key: SPARK-1817 URL: https://issues.apache.org/jira/browse/SPARK-1817 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 0.9.0, 1.0.0 Reporter: Michael Malak Assignee: Kan Zhang Fix For: 1.1.0 Example: {code} scala> sc.parallelize(1L to 2L, 4).zip(sc.parallelize(11 to 12, 4)).collect res1: Array[(Long, Int)] = Array((2,11)) {code} But more generally, it happens whenever the number of partitions does not evenly divide the total number of elements in the RDD. See https://groups.google.com/forum/#!msg/spark-users/demrmjHFnoc/Ek3ijiXHr2MJ -- This message was sent by Atlassian JIRA (v6.2#6252)
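The per-partition zip behavior described in the comment above can be sketched with plain Python. The two-partition layouts are hypothetical, not Spark's actual range partitioning: two collections of the same total length are split at different indexes, and zipping partition-by-partition silently drops the overhang.

```python
# Same 2 elements on each side, but split at different indexes.
left = [[1], [2]]        # partitions of sizes 1 + 1
right = [[11, 12], []]   # partitions of sizes 2 + 0

# Zip corresponding partitions; Python's zip, like the pre-fix RDD.zip,
# truncates each pair of partitions to the shorter one.
zipped = [pair for l, r in zip(left, right) for pair in zip(l, r)]
print(zipped)  # [(1, 11)] -- the pair (2, 12) is silently dropped
```

This is exactly why the fix raises an error on mismatched partition sizes instead of truncating.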
[jira] [Issue Comment Deleted] (SPARK-1817) RDD zip erroneous when partitions do not divide RDD count
[ https://issues.apache.org/jira/browse/SPARK-1817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kan Zhang updated SPARK-1817: - Comment: was deleted (was: PR: https://github.com/apache/spark/pull/760) RDD zip erroneous when partitions do not divide RDD count - Key: SPARK-1817 URL: https://issues.apache.org/jira/browse/SPARK-1817 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 0.9.0, 1.0.0 Reporter: Michael Malak Assignee: Kan Zhang Fix For: 1.1.0 -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (SPARK-2013) Add Python pickleFile to programming guide
[ https://issues.apache.org/jira/browse/SPARK-2013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia updated SPARK-2013: - Assignee: Kan Zhang Add Python pickleFile to programming guide -- Key: SPARK-2013 URL: https://issues.apache.org/jira/browse/SPARK-2013 Project: Spark Issue Type: Documentation Components: Documentation, PySpark Reporter: Matei Zaharia Assignee: Kan Zhang Priority: Trivial Fix For: 1.1.0 Should be added in the Python version of http://spark.apache.org/docs/latest/programming-guide.html#external-datasets. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Resolved] (SPARK-1973) Add randomSplit to JavaRDD (with tests, and tidy Java tests)
[ https://issues.apache.org/jira/browse/SPARK-1973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiangrui Meng resolved SPARK-1973. -- Resolution: Implemented PR: https://github.com/apache/spark/pull/919 Add randomSplit to JavaRDD (with tests, and tidy Java tests) Key: SPARK-1973 URL: https://issues.apache.org/jira/browse/SPARK-1973 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 1.0.0 Reporter: Sean Owen Assignee: Sean Owen Priority: Minor Fix For: 1.1.0 I'd like to use randomSplit through the Java API, and would like to add a convenience wrapper for this method to JavaRDD. This is fairly trivial. (In fact, is the intent that JavaRDD not wrap every RDD method? and that sometimes users should just use JavaRDD.wrapRDD()?) Along the way, I added tests for it, and also touched up the Java API test style and behavior. This is maybe the more useful part of this small change. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (SPARK-1973) Add randomSplit to JavaRDD (with tests, and tidy Java tests)
[ https://issues.apache.org/jira/browse/SPARK-1973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiangrui Meng updated SPARK-1973: - Assignee: Sean Owen Add randomSplit to JavaRDD (with tests, and tidy Java tests) Key: SPARK-1973 URL: https://issues.apache.org/jira/browse/SPARK-1973 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 1.0.0 Reporter: Sean Owen Assignee: Sean Owen Priority: Minor Fix For: 1.1.0 -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Issue Comment Deleted] (SPARK-1704) java.lang.AssertionError: assertion failed: No plan for ExplainCommand (Project [*])
[ https://issues.apache.org/jira/browse/SPARK-1704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yin Huai updated SPARK-1704: Comment: was deleted (was: [~marmbrus] I am attaching the link to the PR.) java.lang.AssertionError: assertion failed: No plan for ExplainCommand (Project [*]) Key: SPARK-1704 URL: https://issues.apache.org/jira/browse/SPARK-1704 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.0.0 Environment: linux Reporter: Yangjp Labels: sql Fix For: 1.1.0 Original Estimate: 612h Remaining Estimate: 612h 14/05/03 22:08:40 INFO ParseDriver: Parsing command: explain select * from src 14/05/03 22:08:40 INFO ParseDriver: Parse Completed 14/05/03 22:08:40 WARN LoggingFilter: EXCEPTION : java.lang.AssertionError: assertion failed: No plan for ExplainCommand (Project [*]) at scala.Predef$.assert(Predef.scala:179) at org.apache.spark.sql.catalyst.planning.QueryPlanner.apply(QueryPlanner.scala:59) at org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan$lzycompute(SQLContext.scala:263) at org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan(SQLContext.scala:263) at org.apache.spark.sql.SQLContext$QueryExecution.executedPlan$lzycompute(SQLContext.scala:264) at org.apache.spark.sql.SQLContext$QueryExecution.executedPlan(SQLContext.scala:264) at org.apache.spark.sql.hive.HiveContext$QueryExecution.toRdd$lzycompute(HiveContext.scala:260) at org.apache.spark.sql.hive.HiveContext$QueryExecution.toRdd(HiveContext.scala:248) at org.apache.spark.sql.hive.api.java.JavaHiveContext.hql(JavaHiveContext.scala:39) at org.apache.spark.examples.TimeServerHandler.messageReceived(TimeServerHandler.java:72) at org.apache.mina.core.filterchain.DefaultIoFilterChain$TailFilter.messageReceived(DefaultIoFilterChain.java:690) at org.apache.mina.core.filterchain.DefaultIoFilterChain.callNextMessageReceived(DefaultIoFilterChain.java:417) at org.apache.mina.core.filterchain.DefaultIoFilterChain.access$1200(DefaultIoFilterChain.java:47) at 
org.apache.mina.core.filterchain.DefaultIoFilterChain$EntryImpl$1.messageReceived(DefaultIoFilterChain.java:765) at org.apache.mina.filter.codec.ProtocolCodecFilter$ProtocolDecoderOutputImpl.flush(ProtocolCodecFilter.java:407) at org.apache.mina.filter.codec.ProtocolCodecFilter.messageReceived(ProtocolCodecFilter.java:236) at org.apache.mina.core.filterchain.DefaultIoFilterChain.callNextMessageReceived(DefaultIoFilterChain.java:417) at org.apache.mina.core.filterchain.DefaultIoFilterChain.access$1200(DefaultIoFilterChain.java:47) at org.apache.mina.core.filterchain.DefaultIoFilterChain$EntryImpl$1.messageReceived(DefaultIoFilterChain.java:765) at org.apache.mina.filter.logging.LoggingFilter.messageReceived(LoggingFilter.java:208) at org.apache.mina.core.filterchain.DefaultIoFilterChain.callNextMessageReceived(DefaultIoFilterChain.java:417) at org.apache.mina.core.filterchain.DefaultIoFilterChain.access$1200(DefaultIoFilterChain.java:47) at org.apache.mina.core.filterchain.DefaultIoFilterChain$EntryImpl$1.messageReceived(DefaultIoFilterChain.java:765) at org.apache.mina.core.filterchain.IoFilterAdapter.messageReceived(IoFilterAdapter.java:109) at org.apache.mina.core.filterchain.DefaultIoFilterChain.callNextMessageReceived(DefaultIoFilterChain.java:417) at org.apache.mina.core.filterchain.DefaultIoFilterChain.fireMessageReceived(DefaultIoFilterChain.java:410) at org.apache.mina.core.polling.AbstractPollingIoProcessor.read(AbstractPollingIoProcessor.java:710) at org.apache.mina.core.polling.AbstractPollingIoProcessor.process(AbstractPollingIoProcessor.java:664) at org.apache.mina.core.polling.AbstractPollingIoProcessor.process(AbstractPollingIoProcessor.java:653) at org.apache.mina.core.polling.AbstractPollingIoProcessor.access$600(AbstractPollingIoProcessor.java:67) at org.apache.mina.core.polling.AbstractPollingIoProcessor$Processor.run(AbstractPollingIoProcessor.java:1124) at org.apache.mina.util.NamePreservingRunnable.run(NamePreservingRunnable.java:64) at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:701) -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (SPARK-2020) Spark 1.0.0 fails to run in coarse-grained mesos mode
Ajay Viswanathan created SPARK-2020: --- Summary: Spark 1.0.0 fails to run in coarse-grained mesos mode Key: SPARK-2020 URL: https://issues.apache.org/jira/browse/SPARK-2020 Project: Spark Issue Type: Bug Components: Mesos Affects Versions: 1.0.0 Environment: Ubuntu 14.04, 64-bit, 8GB RAM Reporter: Ajay Viswanathan I am using Mesos to run Spark applications on a cluster. Earlier, in Spark 0.9.1 and below, I could run tasks in coarse-grained mode on the workers; but now, when I try to do the same in Spark 1.0.0, I get an exception preventing me from running the tasks. Fine-grained mode works fine in Spark 1.0.0, though. Snippet of stderr - Executor registered on slave Exception in thread "main" java.lang.NumberFormatException: For input string: ip at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65) at java.lang.Integer.parseInt(Integer.java:492) at java.lang.Integer.parseInt(Integer.java:527) at scala.collection.immutable.StringLike$class.toInt(StringLike.scala:229) at scala.collection.immutable.StringOps.toInt(StringOps.scala:31) at org.apache.spark.executor.CoarseGrainedExecutorBackend$.main(CoarseGrainedExecutorBackend.scala:135) at org.apache.spark.executor.CoarseGrainedExecutorBackend.main(CoarseGrainedExecutorBackend.scala) Running the Spark application connected to the Mesos master throws an error - Is Spark installed on it? -- This message was sent by Atlassian JIRA (v6.2#6252)
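The stack trace suggests a positional-argument mix-up: the executor backend calls toInt on an argument that turned out to be a hostname. A hypothetical pure-Python sketch of that failure mode (the argument layout below is an assumption for illustration, not Spark's actual one):

```python
# CoarseGrainedExecutorBackend.main parses a numeric argument at a fixed
# position; if the launcher passes arguments in a shifted order, a hostname
# like "ip-..." lands where the number should be and parsing fails -- the
# Python analogue of Java's NumberFormatException.

def parse_backend_args(args):
    # assumed positional layout, for illustration only
    driver_url, executor_id, hostname, cores = args
    return driver_url, executor_id, hostname, int(cores)  # fails if args shift

try:
    # the numeric value and the hostname have swapped positions:
    parse_backend_args(["driver-url", "exec-1", "3", "ip-10-0-0-1"])
except ValueError as err:
    print(err)  # invalid literal for int() with base 10: 'ip-10-0-0-1'
```

Under this reading, the fix is in whichever side builds the launch command, not in the parsing itself.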
[jira] [Commented] (SPARK-2020) Spark 1.0.0 fails to run in coarse-grained mesos mode
[ https://issues.apache.org/jira/browse/SPARK-2020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14018022#comment-14018022 ] Ajay Viswanathan commented on SPARK-2020: - Do I have to use Java 8 to rectify this error? Spark 1.0.0 fails to run in coarse-grained mesos mode - Key: SPARK-2020 URL: https://issues.apache.org/jira/browse/SPARK-2020 Project: Spark Issue Type: Bug Components: Mesos Affects Versions: 1.0.0 Environment: Ubuntu 14.04, 64-bit 8GB RAM Reporter: Ajay Viswanathan -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (SPARK-2019) Spark workers die/disappear when job fails for nearly any reason
[ https://issues.apache.org/jira/browse/SPARK-2019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14018048#comment-14018048 ] sam commented on SPARK-2019: Sorry, it's 0.9.1. Spark workers die/disappear when job fails for nearly any reason Key: SPARK-2019 URL: https://issues.apache.org/jira/browse/SPARK-2019 Project: Spark Issue Type: Bug Reporter: sam -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (SPARK-1508) Add support for reading from SparkConf
[ https://issues.apache.org/jira/browse/SPARK-1508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14018043#comment-14018043 ] Zongheng Yang commented on SPARK-1508: -- WIP PR: https://github.com/apache/spark/pull/956 We'd want to support: (1) API calls on SQLConf objects to get/set properties. (2) SQL/HiveQL SET commands of various kinds, e.g. SET key=val, bare SET, and SET key, in the sense that these should be reflected in / go through SQLConf objects. (3) Make sql("SET ...").collect() (or perhaps also some other operations; also for hql()) return expected results, i.e. the key/val pairs. To do this there are some necessary refactorings in the QueryExecution pipeline. Add support for reading from SparkConf -- Key: SPARK-1508 URL: https://issues.apache.org/jira/browse/SPARK-1508 Project: Spark Issue Type: Improvement Components: SQL Reporter: Michael Armbrust Assignee: Zongheng Yang Fix For: 1.1.0 Right now we have no ability to configure things in Spark SQL. A good start would be passing a SparkConf through the planner such that users could override the number of partitions used during an Exchange. Note that while current spark confs are immutable after the context is created, we want some ability to change settings on a per-query basis. -- This message was sent by Atlassian JIRA (v6.2#6252)
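The three requirements in the comment can be sketched as a tiny mutable key/value store shared by API calls and SET statements (a hypothetical miniature, not Spark SQL's actual SQLConf; all names are illustrative):

```python
# One per-context settings object: programmatic get/set plus a handler for
# the three SET forms -- "SET key=val", "SET key", and bare "SET" -- each
# returning the key/val rows the SQL command should produce.

class MiniSQLConf:
    def __init__(self):
        self._settings = {}

    def set(self, key, value):
        self._settings[key] = value

    def get(self, key, default=None):
        return self._settings.get(key, default)

    def run_set_command(self, command):
        """Dispatch on the SET form and return (key, value) rows."""
        body = command.strip()[len("SET"):].strip()
        if not body:                        # bare SET: list all pairs
            return sorted(self._settings.items())
        if "=" in body:                     # SET key=val: update, echo the pair
            key, _, value = body.partition("=")
            self.set(key.strip(), value.strip())
            return [(key.strip(), value.strip())]
        return [(body, self.get(body))]     # SET key: look up one pair

conf = MiniSQLConf()
conf.run_set_command("SET spark.sql.shuffle.partitions=10")
print(conf.run_set_command("SET spark.sql.shuffle.partitions"))
# [('spark.sql.shuffle.partitions', '10')]
```

Making the settings mutable per query is the point of requirement (1); routing the SQL forms through the same dictionary covers (2) and (3).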
[jira] [Commented] (SPARK-1508) Add support for reading from SparkConf
[ https://issues.apache.org/jira/browse/SPARK-1508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14018053#comment-14018053 ] Michael Armbrust commented on SPARK-1508: - It is likely we will fix this issue through the solution here. Add support for reading from SparkConf -- Key: SPARK-1508 URL: https://issues.apache.org/jira/browse/SPARK-1508 Project: Spark Issue Type: Improvement Components: SQL Reporter: Michael Armbrust Assignee: Zongheng Yang Fix For: 1.1.0 -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (SPARK-2019) Spark workers die/disappear when job fails for nearly any reason
[ https://issues.apache.org/jira/browse/SPARK-2019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Hamstra updated SPARK-2019: Affects Version/s: 0.9.1 Spark workers die/disappear when job fails for nearly any reason Key: SPARK-2019 URL: https://issues.apache.org/jira/browse/SPARK-2019 Project: Spark Issue Type: Bug Affects Versions: 0.9.1 Reporter: sam -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (SPARK-2019) Spark workers die/disappear when job fails for nearly any reason
[ https://issues.apache.org/jira/browse/SPARK-2019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-2019: --- Fix Version/s: 0.9.2 Spark workers die/disappear when job fails for nearly any reason Key: SPARK-2019 URL: https://issues.apache.org/jira/browse/SPARK-2019 Project: Spark Issue Type: Bug Affects Versions: 0.9.1 Reporter: sam Priority: Critical Fix For: 0.9.2 -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (SPARK-2019) Spark workers die/disappear when job fails for nearly any reason
[ https://issues.apache.org/jira/browse/SPARK-2019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14018086#comment-14018086 ] Patrick Wendell commented on SPARK-2019: We should dig into this and figure out what's going on. Spark workers die/disappear when job fails for nearly any reason Key: SPARK-2019 URL: https://issues.apache.org/jira/browse/SPARK-2019 Project: Spark Issue Type: Bug Affects Versions: 0.9.1 Reporter: sam Priority: Critical Fix For: 0.9.2 -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (SPARK-2019) Spark workers die/disappear when job fails for nearly any reason
[ https://issues.apache.org/jira/browse/SPARK-2019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-2019: --- Priority: Critical (was: Major) Spark workers die/disappear when job fails for nearly any reason Key: SPARK-2019 URL: https://issues.apache.org/jira/browse/SPARK-2019 Project: Spark Issue Type: Bug Affects Versions: 0.9.1 Reporter: sam Priority: Critical Fix For: 0.9.2 -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (SPARK-1977) mutable.BitSet in ALS not serializable with KryoSerializer
[ https://issues.apache.org/jira/browse/SPARK-1977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14018155#comment-14018155 ] Xiangrui Meng commented on SPARK-1977: -- In our example code, we only register `Rating` and it works. Could you try adding the following: {code} kryo.register(classOf[Rating]) {code} I need to reproduce this problem with `ALS.train`. mutable.BitSet in ALS not serializable with KryoSerializer -- Key: SPARK-1977 URL: https://issues.apache.org/jira/browse/SPARK-1977 Project: Spark Issue Type: Bug Components: MLlib Affects Versions: 1.0.0 Reporter: Neville Li Priority: Minor OutLinkBlock in ALS.scala has an Array[mutable.BitSet] member. KryoSerializer uses AllScalaRegistrar from Twitter chill but it doesn't register mutable.BitSet. Right now we have to register mutable.BitSet manually. A proper fix would be using immutable.BitSet in ALS or register mutable.BitSet in upstream chill. {code} Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 1724.0:9 failed 4 times, most recent failure: Exception failure in TID 68548 on host lon4-hadoopslave-b232.lon4.spotify.net: com.esotericsoftware.kryo.KryoException: java.lang.ArrayStoreException: scala.collection.mutable.HashSet Serialization trace: shouldSend (org.apache.spark.mllib.recommendation.OutLinkBlock) com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.read(FieldSerializer.java:626) com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:221) com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:732) com.twitter.chill.Tuple2Serializer.read(TupleSerializers.scala:43) com.twitter.chill.Tuple2Serializer.read(TupleSerializers.scala:34) com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:732) org.apache.spark.serializer.KryoDeserializationStream.readObject(KryoSerializer.scala:115) org.apache.spark.serializer.DeserializationStream$$anon$1.getNext(Serializer.scala:125) 
org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:71) org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39) org.apache.spark.rdd.CoGroupedRDD$$anonfun$compute$4.apply(CoGroupedRDD.scala:155) org.apache.spark.rdd.CoGroupedRDD$$anonfun$compute$4.apply(CoGroupedRDD.scala:154) scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47) org.apache.spark.rdd.CoGroupedRDD.compute(CoGroupedRDD.scala:154) org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262) org.apache.spark.rdd.RDD.iterator(RDD.scala:229) org.apache.spark.rdd.MappedValuesRDD.compute(MappedValuesRDD.scala:31) org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262) org.apache.spark.rdd.RDD.iterator(RDD.scala:229) org.apache.spark.rdd.FlatMappedValuesRDD.compute(FlatMappedValuesRDD.scala:31) org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262) org.apache.spark.rdd.RDD.iterator(RDD.scala:229) org.apache.spark.rdd.FlatMappedRDD.compute(FlatMappedRDD.scala:33) org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262) org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:77) org.apache.spark.rdd.RDD.iterator(RDD.scala:227) org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:111) org.apache.spark.scheduler.Task.run(Task.scala:51) org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:187) java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) java.lang.Thread.run(Thread.java:662) Driver stacktrace: at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1033) at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1017) at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1015) at 
scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47) at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1015) at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:633) at
[jira] [Updated] (SPARK-1912) Compression memory issue during reduce
[ https://issues.apache.org/jira/browse/SPARK-1912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia updated SPARK-1912: - Target Version/s: 0.9.2, 1.0.1, 1.1.0 (was: 0.9.2, 1.0.1) Compression memory issue during reduce -- Key: SPARK-1912 URL: https://issues.apache.org/jira/browse/SPARK-1912 Project: Spark Issue Type: Bug Components: Spark Core Reporter: Wenchen Fan Assignee: Wenchen Fan Fix For: 1.1.0 When we need to read a compressed block, we first create a compression stream instance (LZF or Snappy) and use it to wrap that block. Say a reducer task needs to read 1000 local shuffle blocks: it first prepares to read all 1000 blocks, which means creating 1000 compression stream instances to wrap them. But initializing a compression instance allocates some memory, and having many compression instances alive at the same time is a problem. Since the reducer actually reads the shuffle blocks one by one, why create all the compression instances up front? We could do it lazily, creating the compression instance for a block only when that block is first read. -- This message was sent by Atlassian JIRA (v6.2#6252)
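The lazy-initialization idea in the report can be sketched in pure Python (illustrative only, using zlib as a stand-in for LZF/Snappy; not Spark's implementation):

```python
# Wrap each block so its memory-hungry decompression stream is created only
# on first read, instead of allocating one stream per pending block up front.

import zlib

class LazyCompressedBlock:
    streams_created = 0  # count allocations, for illustration

    def __init__(self, raw_bytes):
        self._raw = raw_bytes
        self._stream = None  # no decompressor allocated yet

    def read(self):
        if self._stream is None:  # allocate on first access only
            LazyCompressedBlock.streams_created += 1
            self._stream = zlib.decompressobj()
        return self._stream.decompress(self._raw)

# 1000 pending blocks, but zero decompression streams so far:
blocks = [LazyCompressedBlock(zlib.compress(b"block-%d" % i)) for i in range(1000)]
print(LazyCompressedBlock.streams_created)  # 0

blocks[0].read()
print(LazyCompressedBlock.streams_created)  # 1 -- created on demand
```

Peak memory then scales with the number of blocks being read concurrently, not with the number of blocks queued.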
[jira] [Updated] (SPARK-1912) Compression memory issue during reduce
[ https://issues.apache.org/jira/browse/SPARK-1912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia updated SPARK-1912: - Target Version/s: 0.9.2, 1.0.1 Compression memory issue during reduce -- Key: SPARK-1912 URL: https://issues.apache.org/jira/browse/SPARK-1912 Project: Spark Issue Type: Bug Components: Spark Core Reporter: Wenchen Fan Assignee: Wenchen Fan Fix For: 1.1.0 -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (SPARK-1977) mutable.BitSet in ALS not serializable with KryoSerializer
[ https://issues.apache.org/jira/browse/SPARK-1977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14018254#comment-14018254 ] Neville Li commented on SPARK-1977: --- Yes, we did register Rating, and we had to register(classOf[mutable.BitSet]) in addition to make it work. mutable.BitSet in ALS not serializable with KryoSerializer -- Key: SPARK-1977 URL: https://issues.apache.org/jira/browse/SPARK-1977 Project: Spark Issue Type: Bug Components: MLlib Affects Versions: 1.0.0 Reporter: Neville Li Priority: Minor
[jira] [Created] (SPARK-2023) PySpark reduce does a map side reduce and then sends the results to the driver for final reduce, instead do this more like Scala Spark.
holdenk created SPARK-2023: -- Summary: PySpark reduce does a map side reduce and then sends the results to the driver for final reduce, instead do this more like Scala Spark. Key: SPARK-2023 URL: https://issues.apache.org/jira/browse/SPARK-2023 Project: Spark Issue Type: Improvement Components: PySpark Reporter: holdenk PySpark reduce does a map side reduce and then sends the results to the driver for final reduce, instead do this more like Scala Spark. The current implementation could be a bottleneck. -- This message was sent by Atlassian JIRA (v6.2#6252)
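The two-phase behavior described in the issue can be sketched in pure Python (illustrative only, not PySpark's actual implementation): each partition is reduced locally, then every partial result is shipped to one place for the final combine, which is the step that can bottleneck on the driver.

```python
# Map-side reduce per partition, then a single final reduce over the
# partials -- mirroring "reduce on each partition, finish on the driver".

from functools import reduce

def rdd_style_reduce(partitions, f):
    partials = [reduce(f, part) for part in partitions if part]  # map side
    return reduce(f, partials)                                   # "driver" side

partitions = [[1, 2, 3], [4, 5], [6]]
print(rdd_style_reduce(partitions, lambda a, b: a + b))  # 21
```

With many partitions the final reduce processes one partial per partition in a single process; the issue suggests doing that combine in a more distributed, tree-like fashion as Scala Spark does.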
[jira] [Created] (SPARK-2024) Add saveAsSequenceFile to PySpark
Matei Zaharia created SPARK-2024: Summary: Add saveAsSequenceFile to PySpark Key: SPARK-2024 URL: https://issues.apache.org/jira/browse/SPARK-2024 Project: Spark Issue Type: New Feature Components: PySpark Reporter: Matei Zaharia After SPARK-1414 we will be able to read SequenceFiles from Python, but it remains to write them. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (SPARK-1790) Update EC2 scripts to support r3 instance types
[ https://issues.apache.org/jira/browse/SPARK-1790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-1790: --- Fix Version/s: 1.1.0, 0.9.2 Update EC2 scripts to support r3 instance types --- Key: SPARK-1790 URL: https://issues.apache.org/jira/browse/SPARK-1790 Project: Spark Issue Type: Improvement Components: EC2 Affects Versions: 0.9.0, 0.9.1, 1.0.0 Reporter: Matei Zaharia Assignee: Sujeet Varakhedi Labels: Starter Fix For: 0.9.2, 1.0.1, 1.1.0 These were recently added by Amazon as a cheaper high-memory option -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Resolved] (SPARK-1790) Update EC2 scripts to support r3 instance types
[ https://issues.apache.org/jira/browse/SPARK-1790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell resolved SPARK-1790. Resolution: Fixed Update EC2 scripts to support r3 instance types --- Key: SPARK-1790 URL: https://issues.apache.org/jira/browse/SPARK-1790 Project: Spark Issue Type: Improvement Components: EC2 Fix For: 0.9.2, 1.0.1, 1.1.0 -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (SPARK-2011) Eliminate duplicate join in Pregel
[ https://issues.apache.org/jira/browse/SPARK-2011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14018327#comment-14018327 ] Tim Weninger commented on SPARK-2011: - I also think there is a memory leak related to this. When I run a Pregel script/function, it creates and holds an EdgeRDD (visible in the Storage tab of the WebUI) that is never released. So after 15 Pregel iterations, I'll have 15 extra EdgeRDDs taking up space. Is this related, or should I file a new bug report? (Affects the 1.0.1 snapshot.) TW Eliminate duplicate join in Pregel -- Key: SPARK-2011 URL: https://issues.apache.org/jira/browse/SPARK-2011 Project: Spark Issue Type: Improvement Components: GraphX Reporter: Ankur Dave Assignee: Ankur Dave In the iteration loop, Pregel currently performs an innerJoin to apply messages to vertices, followed by an outerJoinVertices to join the resulting subset of vertices back to the graph. These two operations could be merged into a single call to joinVertices, which should be reimplemented in a more efficient manner. This would allow us to examine only the vertices that received messages. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (SPARK-1977) mutable.BitSet in ALS not serializable with KryoSerializer
[ https://issues.apache.org/jira/browse/SPARK-1977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14018328#comment-14018328 ] Shuo Xiang commented on SPARK-1977: --- Hi [~neville], I just ran the MovieLens example on my YARN cluster (hadoop-2.0.5-alpha) with kryo enabled and it works. I used the following command: bin/spark-submit --master yarn-cluster --class org.apache.spark.examples.mllib.MovieLensALS --num-executors ** --driver-memory ** --executor-memory ** --executor-cores 1 spark-examples-1.0.0-hadoop2.0.5-alpha.jar --rank 5 --numIterations 20 --lambda 1.0 --kryo /path/to/sample_movielens_data.txt mutable.BitSet in ALS not serializable with KryoSerializer -- Key: SPARK-1977 URL: https://issues.apache.org/jira/browse/SPARK-1977 Project: Spark Issue Type: Bug Components: MLlib Affects Versions: 1.0.0 Reporter: Neville Li Priority: Minor OutLinkBlock in ALS.scala has an Array[mutable.BitSet] member. KryoSerializer uses AllScalaRegistrar from Twitter chill but it doesn't register mutable.BitSet. Right now we have to register mutable.BitSet manually. A proper fix would be using immutable.BitSet in ALS or registering mutable.BitSet in upstream chill. 
{code} Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 1724.0:9 failed 4 times, most recent failure: Exception failure in TID 68548 on host lon4-hadoopslave-b232.lon4.spotify.net: com.esotericsoftware.kryo.KryoException: java.lang.ArrayStoreException: scala.collection.mutable.HashSet Serialization trace: shouldSend (org.apache.spark.mllib.recommendation.OutLinkBlock) com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.read(FieldSerializer.java:626) com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:221) com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:732) com.twitter.chill.Tuple2Serializer.read(TupleSerializers.scala:43) com.twitter.chill.Tuple2Serializer.read(TupleSerializers.scala:34) com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:732) org.apache.spark.serializer.KryoDeserializationStream.readObject(KryoSerializer.scala:115) org.apache.spark.serializer.DeserializationStream$$anon$1.getNext(Serializer.scala:125) org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:71) org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39) org.apache.spark.rdd.CoGroupedRDD$$anonfun$compute$4.apply(CoGroupedRDD.scala:155) org.apache.spark.rdd.CoGroupedRDD$$anonfun$compute$4.apply(CoGroupedRDD.scala:154) scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47) org.apache.spark.rdd.CoGroupedRDD.compute(CoGroupedRDD.scala:154) org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262) org.apache.spark.rdd.RDD.iterator(RDD.scala:229) org.apache.spark.rdd.MappedValuesRDD.compute(MappedValuesRDD.scala:31) org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262) org.apache.spark.rdd.RDD.iterator(RDD.scala:229) org.apache.spark.rdd.FlatMappedValuesRDD.compute(FlatMappedValuesRDD.scala:31) org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262) 
org.apache.spark.rdd.RDD.iterator(RDD.scala:229) org.apache.spark.rdd.FlatMappedRDD.compute(FlatMappedRDD.scala:33) org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262) org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:77) org.apache.spark.rdd.RDD.iterator(RDD.scala:227) org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:111) org.apache.spark.scheduler.Task.run(Task.scala:51) org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:187) java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) java.lang.Thread.run(Thread.java:662) Driver stacktrace: at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1033) at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1017) at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1015) at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47) at
[jira] [Created] (SPARK-2025) EdgeRDD persists after pregel iteration
Tim Weninger created SPARK-2025: --- Summary: EdgeRDD persists after pregel iteration Key: SPARK-2025 URL: https://issues.apache.org/jira/browse/SPARK-2025 Project: Spark Issue Type: Bug Components: GraphX Affects Versions: 1.0.0, 1.0.1 Environment: RHEL6 on local and on spark cluster Reporter: Tim Weninger Symptoms: During execution of a Pregel script/function, a copy of an intermediate EdgeRDD persists after each iteration, as shown in the Spark WebUI storage tab. This is effectively a memory leak in the Pregel function. For example, after the first iteration I will have one extra EdgeRDD in addition to the EdgeRDD and VertexRDD that are kept for the next iteration. After 15 iterations I will have 15 EdgeRDDs in addition to the current/correct state of a single EdgeRDD and VertexRDD. At the end of a Pregel loop the old EdgeRDD and VertexRDD are unpersisted, but there seems to be another EdgeRDD created somewhere that does not get unpersisted. I _think_ this is from the replicateVertex function, but I cannot be sure. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (SPARK-2025) EdgeRDD persists after pregel iteration
[ https://issues.apache.org/jira/browse/SPARK-2025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Weninger updated SPARK-2025: Description: Symptoms: During execution of a Pregel script/function, a copy of an intermediate EdgeRDD persists after each iteration, as shown in the Spark WebUI storage tab. This is effectively a memory leak in the Pregel function. For example, after the first iteration I will have one extra EdgeRDD in addition to the EdgeRDD and VertexRDD that are kept for the next iteration. After 15 iterations I will have 15 EdgeRDDs in addition to the current/correct state of a single EdgeRDD and VertexRDD. At the end of a Pregel loop the old EdgeRDD and VertexRDD are unpersisted, but there seems to be another EdgeRDD created somewhere that does not get unpersisted. I _think_ this is from the replicateVertex function, but I cannot be sure. Update - Ankur Dave says, in comments on SPARK-2011: {quote} ... is a bug introduced by https://github.com/apache/spark/pull/497. It occurs because unpersistVertices used to unpersist both the vertices and the replicated vertices, but after unifying replicated vertices with edges, there was no way to unpersist only one of them. I think the solution is just to unpersist both the vertices and the edges in Pregel.{quote} was: the same description without the Update paragraph. EdgeRDD persists after pregel iteration --- Key: SPARK-2025 URL: https://issues.apache.org/jira/browse/SPARK-2025 Project: Spark Issue Type: Bug Components: GraphX Affects Versions: 1.0.0, 1.0.1 Environment: RHEL6 on local and on spark cluster Reporter: Tim Weninger Labels: Pregel -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (SPARK-2026) Maven hadoop* Profiles Should Set the expected Hadoop Version.
Bernardo Gomez Palacio created SPARK-2026: - Summary: Maven hadoop* Profiles Should Set the expected Hadoop Version. Key: SPARK-2026 URL: https://issues.apache.org/jira/browse/SPARK-2026 Project: Spark Issue Type: Improvement Components: Build Affects Versions: 1.0.0 Reporter: Bernardo Gomez Palacio The Maven profiles that refer to _hadoopX_, e.g. hadoop-2.4, should set the expected _hadoop.version_. Currently the profile reads:
{code}
<profile>
  <id>hadoop-2.4</id>
  <properties>
    <protobuf.version>2.5.0</protobuf.version>
    <jets3t.version>0.9.0</jets3t.version>
  </properties>
</profile>
{code}
whereas the suggested form is:
{code}
<profile>
  <id>hadoop-2.4</id>
  <properties>
    <hadoop.version>2.4.0</hadoop.version>
    <yarn.version>${hadoop.version}</yarn.version>
    <protobuf.version>2.5.0</protobuf.version>
    <jets3t.version>0.9.0</jets3t.version>
  </properties>
</profile>
{code}
Builds can still override with the -Dhadoop.version option, but the profile will then correctly default the Hadoop version to the one expected for that profile, e.g.
{code}
$ mvn -P hadoop-2.4,yarn clean compile
{code}
-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (SPARK-2027) spark-ec2 puts Hadoop's log4j ahead of Spark's in classpath
Aaron Davidson created SPARK-2027: - Summary: spark-ec2 puts Hadoop's log4j ahead of Spark's in classpath Key: SPARK-2027 URL: https://issues.apache.org/jira/browse/SPARK-2027 Project: Spark Issue Type: Bug Affects Versions: 1.0.0 Reporter: Aaron Davidson Assignee: Aaron Davidson -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Assigned] (SPARK-2025) EdgeRDD persists after pregel iteration
[ https://issues.apache.org/jira/browse/SPARK-2025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ankur Dave reassigned SPARK-2025: - Assignee: Ankur Dave EdgeRDD persists after pregel iteration --- Key: SPARK-2025 URL: https://issues.apache.org/jira/browse/SPARK-2025 Project: Spark Issue Type: Bug Components: GraphX Affects Versions: 1.0.0, 1.0.1 Reporter: Tim Weninger Assignee: Ankur Dave Labels: Pregel -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (SPARK-2025) EdgeRDD persists after pregel iteration
[ https://issues.apache.org/jira/browse/SPARK-2025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14018389#comment-14018389 ] Tim Weninger commented on SPARK-2025: - Adding {{prevG.edges.unpersist(blocking=false)}} after line 152 in Pregel.scala fixes the issue. EdgeRDD persists after pregel iteration --- Key: SPARK-2025 URL: https://issues.apache.org/jira/browse/SPARK-2025 Project: Spark Issue Type: Bug Components: GraphX Affects Versions: 1.0.0, 1.0.1 Reporter: Tim Weninger Assignee: Ankur Dave Labels: Pregel -- This message was sent by Atlassian JIRA (v6.2#6252)
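The leak-and-release pattern behind this fix can be mimicked outside Spark. The sketch below is a pure-Python mock (MockRDD and both functions are invented for illustration, not GraphX code): each iteration caches a fresh edge set and must explicitly release the previous one, which is the role of the prevG.edges.unpersist(blocking=false) call the comment proposes.

```python
class MockRDD:
    """Tiny stand-in for a cached RDD; `live` plays the role of the
    WebUI storage tab. Invented for illustration -- not Spark code."""
    live = []

    def __init__(self, name):
        self.name = name

    def cache(self):
        MockRDD.live.append(self)   # pin the "blocks" in storage
        return self

    def unpersist(self):
        MockRDD.live.remove(self)   # release them again


def pregel_like_loop(iterations, release_previous=True):
    """Cache a fresh edge set each iteration. Without releasing the
    previous one (the SPARK-2025 symptom), stale copies accumulate."""
    prev = MockRDD("edges-0").cache()
    for i in range(1, iterations + 1):
        cur = MockRDD("edges-%d" % i).cache()
        if release_previous:
            prev.unpersist()        # the proposed fix for the leak
        prev = cur
    return [r.name for r in MockRDD.live]
```

Running the loop with release_previous=False leaves one cached edge set per iteration behind, mirroring the 15 extra EdgeRDDs the reporter observed; with the release step, only the current edge set stays cached.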
[jira] [Commented] (SPARK-1977) mutable.BitSet in ALS not serializable with KryoSerializer
[ https://issues.apache.org/jira/browse/SPARK-1977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14018390#comment-14018390 ] Neville Li commented on SPARK-1977: --- Our YARN cluster runs 2.2.0. We built spark-assembly and spark-examples jars with 1.0.0 release source and the bundled make_distribution.sh. And here's my command: {code} spark-submit --master yarn-cluster --class org.apache.spark.examples.mllib.MovieLensALS --num-executors 2 --executor-memory 2g --driver-memory 2g dist/lib/spark-examples-1.0.0-hadoop2.2.0.jar --kryo --implicitPrefs sample_movielens_data.txt {code} Here's a complete list of classpath from the environment tab. {code} /etc/hadoop/conf /usr/lib/hadoop-hdfs/hadoop-hdfs-2.2.0.2.0.6.0-76-tests.jar /usr/lib/hadoop-hdfs/hadoop-hdfs-2.2.0.2.0.6.0-76.jar /usr/lib/hadoop-hdfs/hadoop-hdfs-nfs-2.2.0.2.0.6.0-76.jar /usr/lib/hadoop-hdfs/lib/asm-3.2.jar /usr/lib/hadoop-hdfs/lib/commons-cli-1.2.jar /usr/lib/hadoop-hdfs/lib/commons-codec-1.4.jar /usr/lib/hadoop-hdfs/lib/commons-daemon-1.0.13.jar /usr/lib/hadoop-hdfs/lib/commons-el-1.0.jar /usr/lib/hadoop-hdfs/lib/commons-io-2.1.jar /usr/lib/hadoop-hdfs/lib/commons-lang-2.5.jar /usr/lib/hadoop-hdfs/lib/commons-logging-1.1.1.jar /usr/lib/hadoop-hdfs/lib/guava-11.0.2.jar /usr/lib/hadoop-hdfs/lib/jackson-core-asl-1.8.8.jar /usr/lib/hadoop-hdfs/lib/jackson-mapper-asl-1.8.8.jar /usr/lib/hadoop-hdfs/lib/jasper-runtime-5.5.23.jar /usr/lib/hadoop-hdfs/lib/jersey-core-1.9.jar /usr/lib/hadoop-hdfs/lib/jersey-server-1.9.jar /usr/lib/hadoop-hdfs/lib/jetty-6.1.26.jar /usr/lib/hadoop-hdfs/lib/jetty-util-6.1.26.jar /usr/lib/hadoop-hdfs/lib/jsp-api-2.1.jar /usr/lib/hadoop-hdfs/lib/jsr305-1.3.9.jar /usr/lib/hadoop-hdfs/lib/log4j-1.2.17.jar /usr/lib/hadoop-hdfs/lib/netty-3.6.2.Final.jar /usr/lib/hadoop-hdfs/lib/protobuf-java-2.5.0.jar /usr/lib/hadoop-hdfs/lib/servlet-api-2.5.jar /usr/lib/hadoop-hdfs/lib/xmlenc-0.52.jar /usr/lib/hadoop-mapreduce/hadoop-archives-2.2.0.2.0.6.0-76.jar 
/usr/lib/hadoop-mapreduce/hadoop-datajoin-2.2.0.2.0.6.0-76.jar /usr/lib/hadoop-mapreduce/hadoop-distcp-2.2.0.2.0.6.0-76.jar /usr/lib/hadoop-mapreduce/hadoop-extras-2.2.0.2.0.6.0-76.jar /usr/lib/hadoop-mapreduce/hadoop-gridmix-2.2.0.2.0.6.0-76.jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-client-app-2.2.0.2.0.6.0-76.jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-client-common-2.2.0.2.0.6.0-76.jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-client-core-2.2.0.2.0.6.0-76.jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-client-hs-2.2.0.2.0.6.0-76.jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-client-hs-plugins-2.2.0.2.0.6.0-76.jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-client-jobclient-2.2.0.2.0.6.0-76-tests.jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-client-jobclient-2.2.0.2.0.6.0-76.jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-client-shuffle-2.2.0.2.0.6.0-76.jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples-2.2.0.2.0.6.0-76.jar /usr/lib/hadoop-mapreduce/hadoop-rumen-2.2.0.2.0.6.0-76.jar /usr/lib/hadoop-mapreduce/hadoop-streaming-2.2.0.2.0.6.0-76.jar /usr/lib/hadoop-mapreduce/lib/aopalliance-1.0.jar /usr/lib/hadoop-mapreduce/lib/asm-3.2.jar /usr/lib/hadoop-mapreduce/lib/avro-1.7.4.jar /usr/lib/hadoop-mapreduce/lib/commons-compress-1.4.1.jar /usr/lib/hadoop-mapreduce/lib/commons-io-2.1.jar /usr/lib/hadoop-mapreduce/lib/guice-3.0.jar /usr/lib/hadoop-mapreduce/lib/guice-servlet-3.0.jar /usr/lib/hadoop-mapreduce/lib/hamcrest-core-1.1.jar /usr/lib/hadoop-mapreduce/lib/jackson-core-asl-1.8.8.jar /usr/lib/hadoop-mapreduce/lib/jackson-mapper-asl-1.8.8.jar /usr/lib/hadoop-mapreduce/lib/javax.inject-1.jar /usr/lib/hadoop-mapreduce/lib/jersey-core-1.9.jar /usr/lib/hadoop-mapreduce/lib/jersey-guice-1.9.jar /usr/lib/hadoop-mapreduce/lib/jersey-server-1.9.jar /usr/lib/hadoop-mapreduce/lib/junit-4.10.jar /usr/lib/hadoop-mapreduce/lib/log4j-1.2.17.jar /usr/lib/hadoop-mapreduce/lib/netty-3.6.2.Final.jar /usr/lib/hadoop-mapreduce/lib/paranamer-2.3.jar 
/usr/lib/hadoop-mapreduce/lib/protobuf-java-2.5.0.jar /usr/lib/hadoop-mapreduce/lib/snappy-java-1.0.4.1.jar /usr/lib/hadoop-mapreduce/lib/xz-1.0.jar /usr/lib/hadoop-yarn/hadoop-yarn-api-2.2.0.2.0.6.0-76.jar /usr/lib/hadoop-yarn/hadoop-yarn-applications-distributedshell-2.2.0.2.0.6.0-76.jar /usr/lib/hadoop-yarn/hadoop-yarn-applications-unmanaged-am-launcher-2.2.0.2.0.6.0-76.jar /usr/lib/hadoop-yarn/hadoop-yarn-client-2.2.0.2.0.6.0-76.jar /usr/lib/hadoop-yarn/hadoop-yarn-common-2.2.0.2.0.6.0-76.jar /usr/lib/hadoop-yarn/hadoop-yarn-server-common-2.2.0.2.0.6.0-76.jar /usr/lib/hadoop-yarn/hadoop-yarn-server-nodemanager-2.2.0.2.0.6.0-76.jar /usr/lib/hadoop-yarn/hadoop-yarn-server-resourcemanager-2.2.0.2.0.6.0-76.jar /usr/lib/hadoop-yarn/hadoop-yarn-server-tests-2.2.0.2.0.6.0-76.jar /usr/lib/hadoop-yarn/hadoop-yarn-server-web-proxy-2.2.0.2.0.6.0-76.jar /usr/lib/hadoop-yarn/hadoop-yarn-site-2.2.0.2.0.6.0-76.jar /usr/lib/hadoop-yarn/lib/aopalliance-1.0.jar
[jira] [Commented] (SPARK-2025) EdgeRDD persists after pregel iteration
[ https://issues.apache.org/jira/browse/SPARK-2025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14018391#comment-14018391 ] Tim Weninger commented on SPARK-2025: - I'll leave it to you to make the bug fix. You seem to be a pro. EdgeRDD persists after pregel iteration --- Key: SPARK-2025 URL: https://issues.apache.org/jira/browse/SPARK-2025 Project: Spark Issue Type: Bug Components: GraphX Affects Versions: 1.0.0, 1.0.1 Reporter: Tim Weninger Assignee: Ankur Dave Labels: Pregel -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (SPARK-2025) EdgeRDD persists after pregel iteration
[ https://issues.apache.org/jira/browse/SPARK-2025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14018402#comment-14018402 ] Ankur Dave commented on SPARK-2025: --- Proposed fix: https://github.com/apache/spark/pull/972 EdgeRDD persists after pregel iteration --- Key: SPARK-2025 URL: https://issues.apache.org/jira/browse/SPARK-2025 Project: Spark Issue Type: Bug Components: GraphX Affects Versions: 1.0.0, 1.0.1 Reporter: Tim Weninger Assignee: Ankur Dave Labels: Pregel -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (SPARK-1988) Enable storing edges out-of-core
[ https://issues.apache.org/jira/browse/SPARK-1988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ankur Dave updated SPARK-1988: -- Priority: Minor (was: Major) Enable storing edges out-of-core Key: SPARK-1988 URL: https://issues.apache.org/jira/browse/SPARK-1988 Project: Spark Issue Type: Improvement Components: GraphX Reporter: Ankur Dave Assignee: Ankur Dave Priority: Minor A graph's edges are usually the largest component of the graph, and a cluster may not have enough memory to hold them. For example, a graph with 20 billion edges requires at least 400 GB of memory, because each edge takes 20 bytes. GraphX only ever accesses the edges using full table scans or cluster scans using the clustered index on source vertex ID. The edges are therefore amenable to being stored on disk. EdgePartition should provide the option of storing edges on disk transparently and streaming through them as needed. -- This message was sent by Atlassian JIRA (v6.2#6252)
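The 400 GB figure in the description follows directly from the stated 20 bytes per edge; a quick arithmetic sketch (the helper function is ours, not a Spark API):

```python
def edge_storage_gb(num_edges, bytes_per_edge=20):
    """Back-of-the-envelope in-memory footprint of an edge table,
    using the 20-bytes-per-edge figure from the issue description."""
    return num_edges * bytes_per_edge / 1e9

# 20 billion edges at 20 bytes each -> 400 GB, matching the issue text
print(edge_storage_gb(20_000_000_000))
```

Any cluster with less aggregate memory than this must either spill edges to disk or fail, which is the motivation for streaming them from disk in EdgePartition.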
[jira] [Commented] (SPARK-2018) Big-Endian (IBM Power7) Spark Serialization issue
[ https://issues.apache.org/jira/browse/SPARK-2018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14018409#comment-14018409 ] Yanjie Gao commented on SPARK-2018: --- Thanks for your quick reply! I believe they use the same JVM. Do you think this may have another cause? How can I debug it to find the reason? Best regards! Yanjie Gao Here is the ps -aux | grep java log: test1 349 0.5 3.7 2945280 195456 pts/7 Sl 02:30 0:22 /opt/ibm/java-ppc64-70//bin/java -cp /home/test1/spark-1.0.0-bin-hadoop2/lib::/home/test1/src/spark-1.0.0-bin-hadoop2/conf:/home/test1/src/spark-1.0.0-bin-hadoop2/lib/spark-assembly-1.0.0-hadoop2.2.0.jar:/home/test1/src/spark-1.0.0-bin-hadoop2/lib/datanucleus-rdbms-3.2.1.jar:/home/test1/src/spark-1.0.0-bin-hadoop2/lib/datanucleus-api-jdo-3.2.1.jar:/home/test1/src/spark-1.0.0-bin-hadoop2/lib/datanucleus-core-3.2.2.jar:/home/test1/src/hadoop-2.3.0-cdh5.0.0/etc/hadoop/:/home/test1/src/hadoop-2.3.0-cdh5.0.0/etc/hadoop/ -XX:MaxPermSize=128m -Dspark.akka.logLifecycleEvents=true -Xms512m -Xmx512m org.apache.spark.deploy.master.Master --ip 9.114.34.69 --port 7077 --webui-port 8080 test1 492 0.4 3.7 2946496 194432 ? Sl 02:30 0:19 /opt/ibm/java-ppc64-70//bin/java -cp /home/test1/spark-1.0.0-bin-hadoop2/lib::/home/test1/src/spark-1.0.0-bin-hadoop2/conf:/home/test1/src/spark-1.0.0-bin-hadoop2/lib/spark-assembly-1.0.0-hadoop2.2.0.jar:/home/test1/src/spark-1.0.0-bin-hadoop2/lib/datanucleus-rdbms-3.2.1.jar:/home/test1/src/spark-1.0.0-bin-hadoop2/lib/datanucleus-api-jdo-3.2.1.jar:/home/test1/src/spark-1.0.0-bin-hadoop2/lib/datanucleus-core-3.2.2.jar:/home/test1/src/hadoop-2.3.0-cdh5.0.0/etc/hadoop/:/home/test1/src/hadoop-2.3.0-cdh5.0.0/etc/hadoop/ -XX:MaxPermSize=128m -Dspark.akka.logLifecycleEvents=true -Xms512m -Xmx512m org.apache.spark.deploy.worker.Worker spark://9.114.34.69:7077 test1 3160 0.0 0.0 104832 2816 pts/10 S+ 03:40 0:00 grep java test1 13163 0.1 2.7 1631232 144256 ? 
Sl Jun02 2:00 /opt/ibm/java-ppc64-70/bin/java -Dproc_namenode -Xmx1000m -Djava.net.preferIPv4Stack=true -Dhadoop.log.dir=/home/test1/src/hadoop-2.3.0-cdh5.0.0/logs -Dhadoop.log.file=hadoop.log -Dhadoop.home.dir=/home/test1/src/hadoop-2.3.0-cdh5.0.0 -Dhadoop.id.str=test1 -Dhadoop.root.logger=INFO,console -Djava.library.path=/home/test1/src/hadoop-2.3.0-cdh5.0.0/lib/native -Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true -Djava.net.preferIPv4Stack=true -Djava.net.preferIPv4Stack=true -Dhadoop.log.dir=/home/test1/src/hadoop-2.3.0-cdh5.0.0/logs -Dhadoop.log.file=hadoop-test1-namenode-p7hvs7br16.log -Dhadoop.home.dir=/home/test1/src/hadoop-2.3.0-cdh5.0.0 -Dhadoop.id.str=test1 -Dhadoop.root.logger=INFO,RFA -Djava.library.path=/home/test1/src/hadoop-2.3.0-cdh5.0.0/lib/native -Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true -Dhadoop.security.logger=INFO,RFAS -Dhdfs.audit.logger=INFO,NullAppender -Dhadoop.security.logger=INFO,RFAS -Dhdfs.audit.logger=INFO,NullAppender -Dhadoop.security.logger=INFO,RFAS -Dhdfs.audit.logger=INFO,NullAppender -Dhadoop.security.logger=INFO,RFAS org.apache.hadoop.hdfs.server.namenode.NameNode test113328 0.0 2.1 1636160 113152 ? 
Sl Jun02 1:39 /opt/ibm/java-ppc64-70/bin/java -Dproc_datanode -Xmx1000m -Djava.net.preferIPv4Stack=true -Dhadoop.log.dir=/home/test1/src/hadoop-2.3.0-cdh5.0.0/logs -Dhadoop.log.file=hadoop.log -Dhadoop.home.dir=/home/test1/src/hadoop-2.3.0-cdh5.0.0 -Dhadoop.id.str=test1 -Dhadoop.root.logger=INFO,console -Djava.library.path=/home/test1/src/hadoop-2.3.0-cdh5.0.0/lib/native -Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true -Djava.net.preferIPv4Stack=true -Djava.net.preferIPv4Stack=true -Dhadoop.log.dir=/home/test1/src/hadoop-2.3.0-cdh5.0.0/logs -Dhadoop.log.file=hadoop-test1-datanode-p7hvs7br16.log -Dhadoop.home.dir=/home/test1/src/hadoop-2.3.0-cdh5.0.0 -Dhadoop.id.str=test1 -Dhadoop.root.logger=INFO,RFA -Djava.library.path=/home/test1/src/hadoop-2.3.0-cdh5.0.0/lib/native -Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true -server -Dhadoop.security.logger=ERROR,RFAS -Dhadoop.security.logger=ERROR,RFAS -Dhadoop.security.logger=ERROR,RFAS -Dhadoop.security.logger=INFO,RFAS org.apache.hadoop.hdfs.server.datanode.DataNode test113474 0.0 2.1 1624960 113408 ? Sl Jun02 0:35 /opt/ibm/java-ppc64-70/bin/java -Dproc_secondarynamenode -Xmx1000m -Djava.net.preferIPv4Stack=true -Dhadoop.log.dir=/home/test1/src/hadoop-2.3.0-cdh5.0.0/logs -Dhadoop.log.file=hadoop.log -Dhadoop.home.dir=/home/test1/src/hadoop-2.3.0-cdh5.0.0 -Dhadoop.id.str=test1 -Dhadoop.root.logger=INFO,console -Djava.library.path=/home/test1/src/hadoop-2.3.0-cdh5.0.0/lib/native -Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true -Djava.net.preferIPv4Stack=true -Djava.net.preferIPv4Stack=true
[jira] [Created] (SPARK-2028) Users of HadoopRDD cannot access the partition InputSplits
Aaron Davidson created SPARK-2028: - Summary: Users of HadoopRDD cannot access the partition InputSplits Key: SPARK-2028 URL: https://issues.apache.org/jira/browse/SPARK-2028 Project: Spark Issue Type: Bug Reporter: Aaron Davidson Assignee: Aaron Davidson If a user creates a HadoopRDD (e.g., via textFile), there is no way to find out which file it came from, though this information is contained in the InputSplit within the RDD. We should find a way to expose this publicly. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (SPARK-2028) Users of HadoopRDD cannot access the partition InputSplits
[ https://issues.apache.org/jira/browse/SPARK-2028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-2028: --- Issue Type: New Feature (was: Bug) Users of HadoopRDD cannot access the partition InputSplits -- Key: SPARK-2028 URL: https://issues.apache.org/jira/browse/SPARK-2028 Project: Spark Issue Type: New Feature Reporter: Aaron Davidson Assignee: Aaron Davidson If a user creates a HadoopRDD (e.g., via textFile), there is no way to find out which file it came from, though this information is contained in the InputSplit within the RDD. We should find a way to expose this publicly.
[jira] [Commented] (SPARK-2028) Let users of HadoopRDD access the partition InputSplits
[ https://issues.apache.org/jira/browse/SPARK-2028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14018474#comment-14018474 ] Patrick Wendell commented on SPARK-2028: I wantonly changed this from a Bug to a New Feature. We just never supported this before, but it would be nice to support in the future. Let users of HadoopRDD access the partition InputSplits --- Key: SPARK-2028 URL: https://issues.apache.org/jira/browse/SPARK-2028 Project: Spark Issue Type: New Feature Reporter: Aaron Davidson Assignee: Aaron Davidson If a user creates a HadoopRDD (e.g., via textFile), there is no way to find out which file it came from, though this information is contained in the InputSplit within the RDD. We should find a way to expose this publicly.
[jira] [Updated] (SPARK-2027) spark-ec2 puts Hadoop's log4j ahead of Spark's in classpath
[ https://issues.apache.org/jira/browse/SPARK-2027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-2027: --- Component/s: EC2 spark-ec2 puts Hadoop's log4j ahead of Spark's in classpath --- Key: SPARK-2027 URL: https://issues.apache.org/jira/browse/SPARK-2027 Project: Spark Issue Type: Bug Components: EC2 Affects Versions: 1.0.0 Reporter: Aaron Davidson Assignee: Aaron Davidson
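The bug above comes down to classpath ordering: on the JVM, when two jars ship a resource with the same name (here, log4j.properties), whichever jar appears first on the classpath wins, so putting Hadoop's jars ahead of Spark's silently swaps the logging config. Python's sys.path resolves names the same way, which allows a small stand-alone demonstration of the shadowing effect (the module name `shadowme` and the helper below are made up for illustration; this is not Spark or Hadoop code):

```python
import importlib
import os
import sys
import tempfile

def first_on_path_wins(module_name, search_dirs):
    """Import module_name with search_dirs prepended to sys.path and
    report which copy was loaded -- the one from the earliest directory,
    just as the JVM takes a resource from the first jar that has it."""
    sys.modules.pop(module_name, None)   # force a fresh lookup
    importlib.invalidate_caches()        # drop cached directory listings
    old_path = sys.path[:]
    sys.path[:0] = search_dirs
    try:
        mod = importlib.import_module(module_name)
        return mod.ORIGIN
    finally:
        sys.path[:] = old_path

if __name__ == "__main__":
    d1, d2 = tempfile.mkdtemp(), tempfile.mkdtemp()
    for d, origin in [(d1, "hadoop"), (d2, "spark")]:
        with open(os.path.join(d, "shadowme.py"), "w") as f:
            f.write(f"ORIGIN = {origin!r}\n")
    # Whichever directory comes first shadows the other copy entirely.
    assert first_on_path_wins("shadowme", [d1, d2]) == "hadoop"
    assert first_on_path_wins("shadowme", [d2, d1]) == "spark"
```

The fix on the Spark side is correspondingly an ordering fix: ensure Spark's jars (and thus its log4j.properties) precede Hadoop's on the classpath assembled by spark-ec2.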
[jira] [Updated] (SPARK-2028) Let users of HadoopRDD access the partition InputSplits
[ https://issues.apache.org/jira/browse/SPARK-2028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-2028: --- Summary: Let users of HadoopRDD access the partition InputSplits (was: Users of HadoopRDD cannot access the partition InputSplits) Let users of HadoopRDD access the partition InputSplits --- Key: SPARK-2028 URL: https://issues.apache.org/jira/browse/SPARK-2028 Project: Spark Issue Type: New Feature Reporter: Aaron Davidson Assignee: Aaron Davidson If a user creates a HadoopRDD (e.g., via textFile), there is no way to find out which file it came from, though this information is contained in the InputSplit within the RDD. We should find a way to expose this publicly.
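What exposing the InputSplit would buy users is provenance: each record could be traced back to the file (and byte range) it was read from. The pattern can be sketched outside Spark in a few lines of plain Python; the function name here is hypothetical and not part of any Spark API, it only mirrors the shape of a record-with-origin iterator:

```python
import os
import tempfile

def read_with_provenance(paths):
    """Yield (source_path, line) pairs, mimicking what access to a
    HadoopRDD partition's InputSplit would let users recover: which
    input file each record came from."""
    for path in paths:
        with open(path) as f:
            for line in f:
                yield (path, line.rstrip("\n"))

if __name__ == "__main__":
    tmp = tempfile.mkdtemp()
    files = []
    for name, text in [("a.txt", "alpha\nbeta"), ("b.txt", "gamma")]:
        p = os.path.join(tmp, name)
        with open(p, "w") as f:
            f.write(text)
        files.append(p)
    records = list(read_with_provenance(files))
    # Every record carries its origin file alongside its payload.
    assert records == [(files[0], "alpha"), (files[0], "beta"),
                       (files[1], "gamma")]
```

In Spark terms, textFile today returns only the line payloads; the feature request is to make the left half of these pairs (the split, hence the file) reachable from user code as well.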
[jira] [Commented] (SPARK-2024) Add saveAsSequenceFile to PySpark
[ https://issues.apache.org/jira/browse/SPARK-2024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14018485#comment-14018485 ] Kan Zhang commented on SPARK-2024: -- You meant SPARK-1416? Add saveAsSequenceFile to PySpark - Key: SPARK-2024 URL: https://issues.apache.org/jira/browse/SPARK-2024 Project: Spark Issue Type: New Feature Components: PySpark Reporter: Matei Zaharia After SPARK-1414 we will be able to read SequenceFiles from Python, but not yet write them.
[jira] [Created] (SPARK-2029) Bump pom.xml version number of master branch to 1.1.0-SNAPSHOT.
Takuya Ueshin created SPARK-2029: Summary: Bump pom.xml version number of master branch to 1.1.0-SNAPSHOT. Key: SPARK-2029 URL: https://issues.apache.org/jira/browse/SPARK-2029 Project: Spark Issue Type: Bug Reporter: Takuya Ueshin Bump pom.xml version number of master branch to 1.1.0-SNAPSHOT.
[jira] [Commented] (SPARK-2029) Bump pom.xml version number of master branch to 1.1.0-SNAPSHOT.
[ https://issues.apache.org/jira/browse/SPARK-2029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14018492#comment-14018492 ] Takuya Ueshin commented on SPARK-2029: -- PRed: https://github.com/apache/spark/pull/974 Bump pom.xml version number of master branch to 1.1.0-SNAPSHOT. --- Key: SPARK-2029 URL: https://issues.apache.org/jira/browse/SPARK-2029 Project: Spark Issue Type: Bug Reporter: Takuya Ueshin Bump pom.xml version number of master branch to 1.1.0-SNAPSHOT.
[jira] [Created] (SPARK-2030) Bump SparkBuild.scala version number of branch-1.0 to 1.0.1-SNAPSHOT.
Takuya Ueshin created SPARK-2030: Summary: Bump SparkBuild.scala version number of branch-1.0 to 1.0.1-SNAPSHOT. Key: SPARK-2030 URL: https://issues.apache.org/jira/browse/SPARK-2030 Project: Spark Issue Type: Bug Reporter: Takuya Ueshin Bump SparkBuild.scala version number of branch-1.0 to 1.0.1-SNAPSHOT.
[jira] [Commented] (SPARK-2030) Bump SparkBuild.scala version number of branch-1.0 to 1.0.1-SNAPSHOT.
[ https://issues.apache.org/jira/browse/SPARK-2030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14018493#comment-14018493 ] Takuya Ueshin commented on SPARK-2030: -- PRed: https://github.com/apache/spark/pull/975 Bump SparkBuild.scala version number of branch-1.0 to 1.0.1-SNAPSHOT. - Key: SPARK-2030 URL: https://issues.apache.org/jira/browse/SPARK-2030 Project: Spark Issue Type: Bug Reporter: Takuya Ueshin Bump SparkBuild.scala version number of branch-1.0 to 1.0.1-SNAPSHOT.
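Both version bumps are mechanical edits. On the Maven side (SPARK-2029) the change amounts to updating the version element in each pom.xml; the fragment below is a representative sketch, not copied from the actual pull requests:

```xml
<project>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-parent</artifactId>
  <!-- master branch moves to the next development iteration -->
  <version>1.1.0-SNAPSHOT</version>
</project>
```

The SBT build (SPARK-2030) carries its own version string in SparkBuild.scala, which is why branch-1.0 needed a separate bump to 1.0.1-SNAPSHOT.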