[jira] [Commented] (SPARK-2026) Maven hadoop* Profiles Should Set the expected Hadoop Version.
[ https://issues.apache.org/jira/browse/SPARK-2026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14018512#comment-14018512 ] Sean Owen commented on SPARK-2026: -- A few people have mentioned and asked for this, especially as it helps the build work cleanly in IntelliJ. FWIW I would like this change too. Do you have a PR? Maven hadoop* Profiles Should Set the expected Hadoop Version. Key: SPARK-2026 URL: https://issues.apache.org/jira/browse/SPARK-2026 Project: Spark Issue Type: Improvement Components: Build Affects Versions: 1.0.0 Reporter: Bernardo Gomez Palacio The Maven profiles that refer to _hadoopX_, e.g. hadoop-2.4, should set the expected _hadoop.version_. The current
{code}
<profile>
  <id>hadoop-2.4</id>
  <properties>
    <protobuf.version>2.5.0</protobuf.version>
    <jets3t.version>0.9.0</jets3t.version>
  </properties>
</profile>
{code}
should become, as suggested,
{code}
<profile>
  <id>hadoop-2.4</id>
  <properties>
    <hadoop.version>2.4.0</hadoop.version>
    <yarn.version>${hadoop.version}</yarn.version>
    <protobuf.version>2.5.0</protobuf.version>
    <jets3t.version>0.9.0</jets3t.version>
  </properties>
</profile>
{code}
Builds can still define the -Dhadoop.version option, but this will correctly default the Hadoop version to the one expected according to the selected profile, e.g.
{code}
$ mvn -P hadoop-2.4,yarn clean compile
{code}
-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (SPARK-2031) DAGScheduler supports pluggable clock
Chen Chao created SPARK-2031: Summary: DAGScheduler supports pluggable clock Key: SPARK-2031 URL: https://issues.apache.org/jira/browse/SPARK-2031 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 1.0.0, 0.9.1 Reporter: Chen Chao DAGScheduler should support a pluggable clock, as TaskSetManager does. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (SPARK-2031) DAGScheduler supports pluggable clock
[ https://issues.apache.org/jira/browse/SPARK-2031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14018533#comment-14018533 ] Chen Chao commented on SPARK-2031: -- PR https://github.com/apache/spark/pull/976 DAGScheduler supports pluggable clock - Key: SPARK-2031 URL: https://issues.apache.org/jira/browse/SPARK-2031 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 0.9.1, 1.0.0 Reporter: Chen Chao DAGScheduler should support a pluggable clock, as TaskSetManager does. -- This message was sent by Atlassian JIRA (v6.2#6252)
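A minimal sketch of what a pluggable clock could look like, mirroring the pattern TaskSetManager already follows; the names below are illustrative rather than the actual Spark classes.
{code}
// Illustrative sketch only: a clock abstraction the scheduler depends on,
// so tests can substitute a manual clock for System.currentTimeMillis().
trait Clock {
  def getTime(): Long
}

object SystemClock extends Clock {
  override def getTime(): Long = System.currentTimeMillis()
}

// A manual clock lets unit tests advance time deterministically.
class ManualClock(private var now: Long = 0L) extends Clock {
  override def getTime(): Long = now
  def advance(ms: Long): Unit = { now += ms }
}
{code}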
[jira] [Created] (SPARK-2032) Add an RDD.samplePartitions method for partition-level sampling
Matei Zaharia created SPARK-2032: Summary: Add an RDD.samplePartitions method for partition-level sampling Key: SPARK-2032 URL: https://issues.apache.org/jira/browse/SPARK-2032 Project: Spark Issue Type: Improvement Components: Spark Core Reporter: Matei Zaharia This would allow us to sample a percent of the partitions and not have to materialize all of them. It's less uniform but much faster and may be useful for quickly exploring data. -- This message was sent by Atlassian JIRA (v6.2#6252)
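A rough sketch of one way such a method could work, built on the existing PartitionPruningRDD so that unselected partitions are never computed; the method name and signature here are hypothetical, not an agreed API.
{code}
import scala.util.Random
import org.apache.spark.rdd.{PartitionPruningRDD, RDD}

// Hypothetical helper: keep each partition with probability `fraction`.
// Pruned partitions are never materialized, which is the point of the proposal.
def samplePartitions[T](rdd: RDD[T], fraction: Double, seed: Long = 42L): RDD[T] = {
  val rng = new Random(seed)
  val keep = (0 until rdd.partitions.length)
    .filter(_ => rng.nextDouble() < fraction)
    .toSet
  PartitionPruningRDD.create(rdd, keep.contains)
}
{code}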
[jira] [Updated] (SPARK-2032) Add an RDD.samplePartitions method for partition-level sampling
[ https://issues.apache.org/jira/browse/SPARK-2032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia updated SPARK-2032: - Priority: Minor (was: Major) Add an RDD.samplePartitions method for partition-level sampling --- Key: SPARK-2032 URL: https://issues.apache.org/jira/browse/SPARK-2032 Project: Spark Issue Type: Improvement Components: Spark Core Reporter: Matei Zaharia Priority: Minor This would allow us to sample a percent of the partitions and not have to materialize all of them. It's less uniform but much faster and may be useful for quickly exploring data. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Closed] (SPARK-1228) confusion matrix
[ https://issues.apache.org/jira/browse/SPARK-1228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiangrui Meng closed SPARK-1228. Resolution: Implemented Fix Version/s: 1.0.0 Assignee: Xiangrui Meng Confusion matrix was added in v1.0 as part of binary classification model evaluation. confusion matrix Key: SPARK-1228 URL: https://issues.apache.org/jira/browse/SPARK-1228 Project: Spark Issue Type: Story Components: MLlib Reporter: Arshak Navruzyan Assignee: Xiangrui Meng Labels: classification Fix For: 1.0.0 utility that prints a confusion matrix for multi-class classification, including precision and recall -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (SPARK-2033) Automatically cleanup checkpoint
Guoqiang Li created SPARK-2033: -- Summary: Automatically cleanup checkpoint Key: SPARK-2033 URL: https://issues.apache.org/jira/browse/SPARK-2033 Project: Spark Issue Type: New Feature Components: Spark Core Reporter: Guoqiang Li We now use ContextCleaner to asynchronously clean up RDDs, shuffles, and broadcasts, but not checkpoints. -- This message was sent by Atlassian JIRA (v6.2#6252)
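For context, a short sketch of the gap being described; the checkpoint files written below are not removed by ContextCleaner when the RDD is no longer referenced (the master, app name, and directory path are illustrative).
{code}
import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setMaster("local[*]").setAppName("checkpoint-demo"))
sc.setCheckpointDir("/tmp/spark-checkpoints")

val rdd = sc.parallelize(1 to 1000).map(_ * 2)
rdd.checkpoint()   // mark for checkpointing
rdd.count()        // materializes the checkpoint files on disk

// ContextCleaner later cleans up shuffle and broadcast state tied to `rdd`,
// but the checkpoint files written above stay behind.
{code}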
[jira] [Updated] (SPARK-2019) Spark workers die/disappear when job fails for nearly any reason
[ https://issues.apache.org/jira/browse/SPARK-2019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sam updated SPARK-2019: --- Affects Version/s: (was: 0.9.1) 0.9.0 Spark workers die/disappear when job fails for nearly any reason Key: SPARK-2019 URL: https://issues.apache.org/jira/browse/SPARK-2019 Project: Spark Issue Type: Bug Affects Versions: 0.9.0 Reporter: sam Priority: Critical We either have to reboot all the nodes, or run 'sudo service spark-worker restart' across our cluster. I don't think this should happen - the job failures are often not even that bad. There is a 5 upvoted SO question here: http://stackoverflow.com/questions/22031006/spark-0-9-0-worker-keeps-dying-in-standalone-mode-when-job-fails We shouldn't be giving restart privileges to our devs, and therefore our sysadm has to frequently restart the workers. When the sysadm is not around, there is nothing our devs can do. Many thanks -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (SPARK-2019) Spark workers die/disappear when job fails for nearly any reason
[ https://issues.apache.org/jira/browse/SPARK-2019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sam updated SPARK-2019: --- Fix Version/s: (was: 0.9.2) Spark workers die/disappear when job fails for nearly any reason Key: SPARK-2019 URL: https://issues.apache.org/jira/browse/SPARK-2019 Project: Spark Issue Type: Bug Affects Versions: 0.9.0 Reporter: sam Priority: Critical We either have to reboot all the nodes, or run 'sudo service spark-worker restart' across our cluster. I don't think this should happen - the job failures are often not even that bad. There is a 5 upvoted SO question here: http://stackoverflow.com/questions/22031006/spark-0-9-0-worker-keeps-dying-in-standalone-mode-when-job-fails We shouldn't be giving restart privileges to our devs, and therefore our sysadm has to frequently restart the workers. When the sysadm is not around, there is nothing our devs can do. Many thanks -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (SPARK-2033) Automatically cleanup checkpoint
[ https://issues.apache.org/jira/browse/SPARK-2033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14018621#comment-14018621 ] Guoqiang Li commented on SPARK-2033: The PR: https://github.com/apache/spark/pull/855 Automatically cleanup checkpoint - Key: SPARK-2033 URL: https://issues.apache.org/jira/browse/SPARK-2033 Project: Spark Issue Type: New Feature Components: Spark Core Reporter: Guoqiang Li Assignee: Guoqiang Li We now use ContextCleaner to asynchronously clean up RDDs, shuffles, and broadcasts, but not checkpoints. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Comment Edited] (SPARK-2019) Spark workers die/disappear when job fails for nearly any reason
[ https://issues.apache.org/jira/browse/SPARK-2019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14018048#comment-14018048 ] sam edited comment on SPARK-2019 at 6/5/14 9:47 AM: Sorry. Its -0.9.1- 0.9.0 was (Author: sams): Sorry. Its 0.9.1 Spark workers die/disappear when job fails for nearly any reason Key: SPARK-2019 URL: https://issues.apache.org/jira/browse/SPARK-2019 Project: Spark Issue Type: Bug Affects Versions: 0.9.0 Reporter: sam Priority: Critical We either have to reboot all the nodes, or run 'sudo service spark-worker restart' across our cluster. I don't think this should happen - the job failures are often not even that bad. There is a 5 upvoted SO question here: http://stackoverflow.com/questions/22031006/spark-0-9-0-worker-keeps-dying-in-standalone-mode-when-job-fails We shouldn't be giving restart privileges to our devs, and therefore our sysadm has to frequently restart the workers. When the sysadm is not around, there is nothing our devs can do. Many thanks -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (SPARK-2019) Spark workers die/disappear when job fails for nearly any reason
[ https://issues.apache.org/jira/browse/SPARK-2019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sam updated SPARK-2019: --- Priority: Major (was: Critical) Spark workers die/disappear when job fails for nearly any reason Key: SPARK-2019 URL: https://issues.apache.org/jira/browse/SPARK-2019 Project: Spark Issue Type: Bug Affects Versions: 0.9.0 Reporter: sam We either have to reboot all the nodes, or run 'sudo service spark-worker restart' across our cluster. I don't think this should happen - the job failures are often not even that bad. There is a 5 upvoted SO question here: http://stackoverflow.com/questions/22031006/spark-0-9-0-worker-keeps-dying-in-standalone-mode-when-job-fails We shouldn't be giving restart privileges to our devs, and therefore our sysadm has to frequently restart the workers. When the sysadm is not around, there is nothing our devs can do. Many thanks -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (SPARK-2019) Spark workers die/disappear when job fails for nearly any reason
[ https://issues.apache.org/jira/browse/SPARK-2019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14018756#comment-14018756 ] sam commented on SPARK-2019: [~srowen] so when will CDH package up and distribute spark 1.0.0?? Currently they only distribute 0.9.0. Thanks. We seem to be hitting a few bugs with 0.9.0 - particularly we know that the s3 jets3t problem is 0.9.0 specific and crops its head when we add s3 creds to our hdfs-site.xml. Spark workers die/disappear when job fails for nearly any reason Key: SPARK-2019 URL: https://issues.apache.org/jira/browse/SPARK-2019 Project: Spark Issue Type: Bug Affects Versions: 0.9.0 Reporter: sam We either have to reboot all the nodes, or run 'sudo service spark-worker restart' across our cluster. I don't think this should happen - the job failures are often not even that bad. There is a 5 upvoted SO question here: http://stackoverflow.com/questions/22031006/spark-0-9-0-worker-keeps-dying-in-standalone-mode-when-job-fails We shouldn't be giving restart privileges to our devs, and therefore our sysadm has to frequently restart the workers. When the sysadm is not around, there is nothing our devs can do. Many thanks -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (SPARK-2019) Spark workers die/disappear when job fails for nearly any reason
[ https://issues.apache.org/jira/browse/SPARK-2019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14018760#comment-14018760 ] Sean Owen commented on SPARK-2019: -- I believe that's coming with 5.1 but I don't know when that is scheduled. We can talk about issues like this offline -- really your best bet is support anyway. Spark workers die/disappear when job fails for nearly any reason Key: SPARK-2019 URL: https://issues.apache.org/jira/browse/SPARK-2019 Project: Spark Issue Type: Bug Affects Versions: 0.9.0 Reporter: sam We either have to reboot all the nodes, or run 'sudo service spark-worker restart' across our cluster. I don't think this should happen - the job failures are often not even that bad. There is a 5 upvoted SO question here: http://stackoverflow.com/questions/22031006/spark-0-9-0-worker-keeps-dying-in-standalone-mode-when-job-fails We shouldn't be giving restart privileges to our devs, and therefore our sysadm has to frequently restart the workers. When the sysadm is not around, there is nothing our devs can do. Many thanks -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (SPARK-2035) Make a stage's call stack available on the UI
[ https://issues.apache.org/jira/browse/SPARK-2035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Darabos updated SPARK-2035: -- Attachment: example-html.tgz I've sent a pull request (https://github.com/apache/spark/pull/981), and here is an example of the resulting HTML. It is the worst possible example, because I used `spark-shell`, but it's hopefully enough to demo the idea. Make a stage's call stack available on the UI - Key: SPARK-2035 URL: https://issues.apache.org/jira/browse/SPARK-2035 Project: Spark Issue Type: Improvement Components: Web UI Reporter: Daniel Darabos Priority: Minor Attachments: example-html.tgz Currently the stage table displays the file name and line number that is the call site that triggered the given stage. This is enormously useful for understanding the execution. But once a project adds utility classes and other indirections, the call site can become less meaningful, because the interesting line is further up the stack. An idea to fix this is to display the entire call stack that triggered the stage. It would be collapsed by default and could be revealed with a click. I have started working on this. It is a good way to learn about how the RDD interface ties into the UI. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Issue Comment Deleted] (SPARK-2024) Add saveAsSequenceFile to PySpark
[ https://issues.apache.org/jira/browse/SPARK-2024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kan Zhang updated SPARK-2024: - Comment: was deleted (was: You meant SPARK-1416?) Add saveAsSequenceFile to PySpark - Key: SPARK-2024 URL: https://issues.apache.org/jira/browse/SPARK-2024 Project: Spark Issue Type: New Feature Components: PySpark Reporter: Matei Zaharia After SPARK-1416 we will be able to read SequenceFiles from Python, but it remains to write them. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (SPARK-2036) CaseConversionExpression should check if the evaluated value is null.
Takuya Ueshin created SPARK-2036: Summary: CaseConversionExpression should check if the evaluated value is null. Key: SPARK-2036 URL: https://issues.apache.org/jira/browse/SPARK-2036 Project: Spark Issue Type: Bug Components: SQL Reporter: Takuya Ueshin {{CaseConversionExpression}} should check if the evaluated value is {{null}}. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (SPARK-2036) CaseConversionExpression should check if the evaluated value is null.
[ https://issues.apache.org/jira/browse/SPARK-2036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14018928#comment-14018928 ] Takuya Ueshin commented on SPARK-2036: -- PRed: https://github.com/apache/spark/pull/982 CaseConversionExpression should check if the evaluated value is null. - Key: SPARK-2036 URL: https://issues.apache.org/jira/browse/SPARK-2036 Project: Spark Issue Type: Bug Components: SQL Reporter: Takuya Ueshin {{CaseConversionExpression}} should check if the evaluated value is {{null}}. -- This message was sent by Atlassian JIRA (v6.2#6252)
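A standalone illustration of the proposed fix in plain Scala (not the actual Catalyst expression classes): evaluate the child value first and return null instead of applying the conversion when the value is null.
{code}
// Toy stand-in for a case-conversion expression's eval: null-safe conversion.
def toUpperNullSafe(value: Any): Any =
  if (value == null) null else value.toString.toUpperCase

assert(toUpperNullSafe(null) == null)        // without the check this path could NPE
assert(toUpperNullSafe("spark") == "SPARK")
{code}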
[jira] [Created] (SPARK-2037) yarn client mode doesn't support spark.yarn.max.executor.failures
Thomas Graves created SPARK-2037: Summary: yarn client mode doesn't support spark.yarn.max.executor.failures Key: SPARK-2037 URL: https://issues.apache.org/jira/browse/SPARK-2037 Project: Spark Issue Type: Bug Components: YARN Affects Versions: 1.0.0 Reporter: Thomas Graves yarn client mode doesn't support the config spark.yarn.max.executor.failures. We should investigate if we need it. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (SPARK-2019) Spark workers die/disappear when job fails for nearly any reason
[ https://issues.apache.org/jira/browse/SPARK-2019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-2019: --- Description: We either have to reboot all the nodes, or run 'sudo service spark-worker restart' across our cluster. I don't think this should happen - the job failures are often not even that bad. There is a 5 upvoted SO question here: http://stackoverflow.com/questions/22 We shouldn't be giving restart privileges to our devs, and therefore our sysadm has to frequently restart the workers. When the sysadm is not around, there is nothing our devs can do. Many thanks was: We either have to reboot all the nodes, or run 'sudo service spark-worker restart' across our cluster. I don't think this should happen - the job failures are often not even that bad. There is a 5 upvoted SO question here: http://stackoverflow.com/questions/22Hey @sam031006/spark-0-9-0-worker-keeps-dying-in-standalone-mode-when-job-fails We shouldn't be giving restart privileges to our devs, and therefore our sysadm has to frequently restart the workers. When the sysadm is not around, there is nothing our devs can do. Many thanks Spark workers die/disappear when job fails for nearly any reason Key: SPARK-2019 URL: https://issues.apache.org/jira/browse/SPARK-2019 Project: Spark Issue Type: Bug Affects Versions: 0.9.0 Reporter: sam We either have to reboot all the nodes, or run 'sudo service spark-worker restart' across our cluster. I don't think this should happen - the job failures are often not even that bad. There is a 5 upvoted SO question here: http://stackoverflow.com/questions/22 We shouldn't be giving restart privileges to our devs, and therefore our sysadm has to frequently restart the workers. When the sysadm is not around, there is nothing our devs can do. Many thanks -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Resolved] (SPARK-2029) Bump pom.xml version number of master branch to 1.1.0-SNAPSHOT.
[ https://issues.apache.org/jira/browse/SPARK-2029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell resolved SPARK-2029. Resolution: Fixed Fix Version/s: 1.1.0 Issue resolved by pull request 974 [https://github.com/apache/spark/pull/974] Bump pom.xml version number of master branch to 1.1.0-SNAPSHOT. --- Key: SPARK-2029 URL: https://issues.apache.org/jira/browse/SPARK-2029 Project: Spark Issue Type: Bug Reporter: Takuya Ueshin Assignee: Takuya Ueshin Fix For: 1.1.0 Bump pom.xml version number of master branch to 1.1.0-SNAPSHOT. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Resolved] (SPARK-2030) Bump SparkBuild.scala version number of branch-1.0 to 1.0.1-SNAPSHOT.
[ https://issues.apache.org/jira/browse/SPARK-2030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell resolved SPARK-2030. Resolution: Fixed Fix Version/s: 1.0.1 Issue resolved by pull request 975 [https://github.com/apache/spark/pull/975] Bump SparkBuild.scala version number of branch-1.0 to 1.0.1-SNAPSHOT. - Key: SPARK-2030 URL: https://issues.apache.org/jira/browse/SPARK-2030 Project: Spark Issue Type: Bug Reporter: Takuya Ueshin Fix For: 1.0.1 Bump SparkBuild.scala version number of branch-1.0 to 1.0.1-SNAPSHOT. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Resolved] (SPARK-1677) Allow users to avoid Hadoop output checks if desired
[ https://issues.apache.org/jira/browse/SPARK-1677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell resolved SPARK-1677. Resolution: Fixed Fix Version/s: 1.1.0 1.0.1 Issue resolved by pull request 947 [https://github.com/apache/spark/pull/947] Allow users to avoid Hadoop output checks if desired Key: SPARK-1677 URL: https://issues.apache.org/jira/browse/SPARK-1677 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 1.0.0 Reporter: Patrick Wendell Assignee: Nan Zhu Fix For: 1.0.1, 1.1.0 For compatibility with older versions of Spark it would be nice to have an option `spark.hadoop.validateOutputSpecs` (default true) and a description If set to true, validates the output specification used in saveAsHadoopFile and other variants. This can be disabled to silence exceptions due to pre-existing output directories. This would just wrap the checking done in this PR: https://issues.apache.org/jira/browse/SPARK-1100 https://github.com/apache/spark/pull/11 By first checking the spark conf. -- This message was sent by Atlassian JIRA (v6.2#6252)
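A short usage sketch of the option described above; the master, app name, and output path are illustrative.
{code}
import org.apache.spark.{SparkConf, SparkContext}

// Disable the Hadoop output-spec check so saveAsTextFile / saveAsHadoopFile
// will not fail when the output directory already exists.
val conf = new SparkConf()
  .setMaster("local[*]")
  .setAppName("validate-output-specs-demo")
  .set("spark.hadoop.validateOutputSpecs", "false")

val sc = new SparkContext(conf)
sc.parallelize(1 to 10).saveAsTextFile("/tmp/existing-output-dir")
{code}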
[jira] [Created] (SPARK-2039) Run hadoop output checks for all formats
Patrick Wendell created SPARK-2039: -- Summary: Run hadoop output checks for all formats Key: SPARK-2039 URL: https://issues.apache.org/jira/browse/SPARK-2039 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 1.0.0 Reporter: Patrick Wendell Assignee: Nan Zhu Now that SPARK-1677 allows users to disable output checks, we should just run them for all types of output formats. I'm not sure why we didn't do this originally but it might have been out of defensiveness since we weren't sure what all implementations did. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (SPARK-2040) Support cross-building with Scala 2.11
Patrick Wendell created SPARK-2040: -- Summary: Support cross-building with Scala 2.11 Key: SPARK-2040 URL: https://issues.apache.org/jira/browse/SPARK-2040 Project: Spark Issue Type: Improvement Components: Build, Spark Core Reporter: Patrick Wendell Assignee: Prashant Sharma Since Scala 2.10/2.11 are source compatible, we should be able to cross-build for both versions. From what I understand there are basically two things we need to figure out: 1. Have two versions of our dependency graph, one that uses 2.11 dependencies and the other that uses 2.10 dependencies. 2. Figure out how to publish different POMs for 2.10 and 2.11. I think (1) can be accomplished by having a Scala 2.11 profile. (2) isn't really well supported by Maven since published POMs aren't generated dynamically. But we can probably script around it to make it work. I've done some initial sanity checks with a simple build here: https://github.com/pwendell/scala-maven-crossbuild -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (SPARK-1812) Support cross-building with Scala 2.11
[ https://issues.apache.org/jira/browse/SPARK-1812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-1812: --- Description: Since Scala 2.10/2.11 are source compatible, we should be able to cross-build for both versions. From what I understand there are basically two things we need to figure out: 1. Have two versions of our dependency graph, one that uses 2.11 dependencies and the other that uses 2.10 dependencies. 2. Figure out how to publish different POMs for 2.10 and 2.11. I think (1) can be accomplished by having a Scala 2.11 profile. (2) isn't really well supported by Maven since published POMs aren't generated dynamically. But we can probably script around it to make it work. I've done some initial sanity checks with a simple build here: https://github.com/pwendell/scala-maven-crossbuild was:We should cross-build for this in addition to 2.10. Support cross-building with Scala 2.11 -- Key: SPARK-1812 URL: https://issues.apache.org/jira/browse/SPARK-1812 Project: Spark Issue Type: New Feature Components: Build, Spark Core Reporter: Matei Zaharia Assignee: Prashant Sharma Since Scala 2.10/2.11 are source compatible, we should be able to cross-build for both versions. From what I understand there are basically two things we need to figure out: 1. Have two versions of our dependency graph, one that uses 2.11 dependencies and the other that uses 2.10 dependencies. 2. Figure out how to publish different POMs for 2.10 and 2.11. I think (1) can be accomplished by having a Scala 2.11 profile. (2) isn't really well supported by Maven since published POMs aren't generated dynamically. But we can probably script around it to make it work. I've done some initial sanity checks with a simple build here: https://github.com/pwendell/scala-maven-crossbuild -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (SPARK-1812) Support Scala 2.11
[ https://issues.apache.org/jira/browse/SPARK-1812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-1812: --- Assignee: Prashant Sharma Support Scala 2.11 -- Key: SPARK-1812 URL: https://issues.apache.org/jira/browse/SPARK-1812 Project: Spark Issue Type: New Feature Components: Build, Spark Core Reporter: Matei Zaharia Assignee: Prashant Sharma We should cross-build for this in addition to 2.10. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (SPARK-1812) Support cross-building with Scala 2.11
[ https://issues.apache.org/jira/browse/SPARK-1812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-1812: --- Summary: Support cross-building with Scala 2.11 (was: Support Scala 2.11) Support cross-building with Scala 2.11 -- Key: SPARK-1812 URL: https://issues.apache.org/jira/browse/SPARK-1812 Project: Spark Issue Type: New Feature Components: Build, Spark Core Reporter: Matei Zaharia Assignee: Prashant Sharma We should cross-build for this in addition to 2.10. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (SPARK-1749) DAGScheduler supervisor strategy broken with Mesos
[ https://issues.apache.org/jira/browse/SPARK-1749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-1749: --- Target Version/s: 1.0.1, 1.1.0 (was: 1.0.1) DAGScheduler supervisor strategy broken with Mesos -- Key: SPARK-1749 URL: https://issues.apache.org/jira/browse/SPARK-1749 Project: Spark Issue Type: Bug Components: Mesos, Spark Core Affects Versions: 1.0.0 Reporter: Bouke van der Bijl Assignee: Mark Hamstra Priority: Blocker Labels: mesos, scheduler, scheduling Any bad Python code will trigger this bug, for example `sc.parallelize(range(100)).map(lambda n: undefined_variable * 2).collect()` will cause a `undefined_variable isn't defined`, which will cause spark to try to kill the task, resulting in the following stacktrace: java.lang.UnsupportedOperationException at org.apache.spark.scheduler.SchedulerBackend$class.killTask(SchedulerBackend.scala:32) at org.apache.spark.scheduler.cluster.mesos.MesosSchedulerBackend.killTask(MesosSchedulerBackend.scala:41) at org.apache.spark.scheduler.TaskSchedulerImpl$$anonfun$cancelTasks$3$$anonfun$apply$1.apply$mcVJ$sp(TaskSchedulerImpl.scala:184) at org.apache.spark.scheduler.TaskSchedulerImpl$$anonfun$cancelTasks$3$$anonfun$apply$1.apply(TaskSchedulerImpl.scala:182) at org.apache.spark.scheduler.TaskSchedulerImpl$$anonfun$cancelTasks$3$$anonfun$apply$1.apply(TaskSchedulerImpl.scala:182) at scala.collection.mutable.HashSet.foreach(HashSet.scala:79) at org.apache.spark.scheduler.TaskSchedulerImpl$$anonfun$cancelTasks$3.apply(TaskSchedulerImpl.scala:182) at org.apache.spark.scheduler.TaskSchedulerImpl$$anonfun$cancelTasks$3.apply(TaskSchedulerImpl.scala:175) at scala.Option.foreach(Option.scala:236) at org.apache.spark.scheduler.TaskSchedulerImpl.cancelTasks(TaskSchedulerImpl.scala:175) at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages$1.apply$mcVI$sp(DAGScheduler.scala:1058) at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages$1.apply(DAGScheduler.scala:1045) at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages$1.apply(DAGScheduler.scala:1045) at scala.collection.mutable.HashSet.foreach(HashSet.scala:79) at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1045) at org.apache.spark.scheduler.DAGScheduler.handleJobCancellation(DAGScheduler.scala:998) at org.apache.spark.scheduler.DAGScheduler$$anonfun$doCancelAllJobs$1.apply$mcVI$sp(DAGScheduler.scala:499) at org.apache.spark.scheduler.DAGScheduler$$anonfun$doCancelAllJobs$1.apply(DAGScheduler.scala:499) at org.apache.spark.scheduler.DAGScheduler$$anonfun$doCancelAllJobs$1.apply(DAGScheduler.scala:499) at scala.collection.mutable.HashSet.foreach(HashSet.scala:79) at org.apache.spark.scheduler.DAGScheduler.doCancelAllJobs(DAGScheduler.scala:499) at org.apache.spark.scheduler.DAGSchedulerActorSupervisor$$anonfun$2.applyOrElse(DAGScheduler.scala:1151) at org.apache.spark.scheduler.DAGSchedulerActorSupervisor$$anonfun$2.applyOrElse(DAGScheduler.scala:1147) at akka.actor.SupervisorStrategy.handleFailure(FaultHandling.scala:295) at akka.actor.dungeon.FaultHandling$class.handleFailure(FaultHandling.scala:253) at akka.actor.ActorCell.handleFailure(ActorCell.scala:338) at akka.actor.ActorCell.invokeAll$1(ActorCell.scala:423) at akka.actor.ActorCell.systemInvoke(ActorCell.scala:447) 
at akka.dispatch.Mailbox.processAllSystemMessages(Mailbox.scala:262) at akka.dispatch.Mailbox.run(Mailbox.scala:218) at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386) at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) This is because killTask isn't implemented for the MesosSchedulerBackend. I assume this isn't pyspark-specific, as there will be other instances where you might want to kill the task -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (SPARK-2041) Exception when querying when tableName == columnName
Michael Armbrust created SPARK-2041: --- Summary: Exception when querying when tableName == columnName Key: SPARK-2041 URL: https://issues.apache.org/jira/browse/SPARK-2041 Project: Spark Issue Type: Bug Components: SQL Reporter: Michael Armbrust Assignee: Michael Armbrust {code} [info] java.util.NoSuchElementException: next on empty iterator [info] at scala.collection.Iterator$$anon$2.next(Iterator.scala:39) [info] at scala.collection.Iterator$$anon$2.next(Iterator.scala:37) [info] at scala.collection.IndexedSeqLike$Elements.next(IndexedSeqLike.scala:64) [info] at scala.collection.IterableLike$class.head(IterableLike.scala:91) [info] at scala.collection.mutable.ArrayOps$ofRef.scala$collection$IndexedSeqOptimized$$super$head(ArrayOps.scala:108) [info] at scala.collection.IndexedSeqOptimized$class.head(IndexedSeqOptimized.scala:120) [info] at scala.collection.mutable.ArrayOps$ofRef.head(ArrayOps.scala:108) [info] at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$2.apply(LogicalPlan.scala:68) [info] at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$2.apply(LogicalPlan.scala:65) [info] at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:251) [info] at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:251) [info] at scala.collection.immutable.List.foreach(List.scala:318) [info] at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:251) [info] at scala.collection.AbstractTraversable.flatMap(Traversable.scala:105) [info] at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolve(LogicalPlan.scala:65) [info] at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences$$anonfun$apply$3$$anonfun$applyOrElse$2.applyOrElse(Analyzer.scala:100) [info] at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences$$anonfun$apply$3$$anonfun$applyOrElse$2.applyOrElse(Analyzer.scala:97) [info] at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:165) [info] at org.apache.spark.sql.catalyst.plans.QueryPlan.org$apache$spark$sql$catalyst$plans$QueryPlan$$transformExpressionDown$1(QueryPlan.scala:51) [info] at org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$1$$anonfun$apply$1.apply(QueryPlan.scala:65) [info] at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244) [info] at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244) {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (SPARK-2010) Support for nested data in PySpark SQL
[ https://issues.apache.org/jira/browse/SPARK-2010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-2010: Assignee: Kan Zhang (was: Michael Armbrust) Support for nested data in PySpark SQL -- Key: SPARK-2010 URL: https://issues.apache.org/jira/browse/SPARK-2010 Project: Spark Issue Type: Improvement Components: SQL Reporter: Michael Armbrust Assignee: Kan Zhang Fix For: 1.1.0 -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Assigned] (SPARK-2010) Support for nested data in PySpark SQL
[ https://issues.apache.org/jira/browse/SPARK-2010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust reassigned SPARK-2010: --- Assignee: Michael Armbrust Support for nested data in PySpark SQL -- Key: SPARK-2010 URL: https://issues.apache.org/jira/browse/SPARK-2010 Project: Spark Issue Type: Improvement Components: SQL Reporter: Michael Armbrust Assignee: Michael Armbrust Fix For: 1.1.0 -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (SPARK-2026) Maven hadoop* Profiles Should Set the expected Hadoop Version.
[ https://issues.apache.org/jira/browse/SPARK-2026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14019279#comment-14019279 ] Bernardo Gomez Palacio commented on SPARK-2026: --- I'll submit a PR [~srowen]. I am not using Hadoop 0.23 but my guess is that using 0.23.10 as default will suffice. Maven hadoop* Profiles Should Set the expected Hadoop Version. Key: SPARK-2026 URL: https://issues.apache.org/jira/browse/SPARK-2026 Project: Spark Issue Type: Improvement Components: Build Affects Versions: 1.0.0 Reporter: Bernardo Gomez Palacio The Maven profiles that refer to _hadoopX_, e.g. hadoop-2.4, should set the expected _hadoop.version_. The current
{code}
<profile>
  <id>hadoop-2.4</id>
  <properties>
    <protobuf.version>2.5.0</protobuf.version>
    <jets3t.version>0.9.0</jets3t.version>
  </properties>
</profile>
{code}
should become, as suggested,
{code}
<profile>
  <id>hadoop-2.4</id>
  <properties>
    <hadoop.version>2.4.0</hadoop.version>
    <yarn.version>${hadoop.version}</yarn.version>
    <protobuf.version>2.5.0</protobuf.version>
    <jets3t.version>0.9.0</jets3t.version>
  </properties>
</profile>
{code}
Builds can still define the -Dhadoop.version option, but this will correctly default the Hadoop version to the one expected according to the selected profile, e.g.
{code}
$ mvn -P hadoop-2.4,yarn clean compile
{code}
-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (SPARK-2042) Take triggers unneeded shuffle.
Michael Armbrust created SPARK-2042: --- Summary: Take triggers unneeded shuffle. Key: SPARK-2042 URL: https://issues.apache.org/jira/browse/SPARK-2042 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 1.0.0 Reporter: Michael Armbrust This query really shouldn't trigger a shuffle: {code} sql(SELECT * FROM src LIMIT 10).take(5) {code} One fix would be to make the following changes: * Fix take to insert a logical limit and then collect() * Add a rule for collapsing adjacent limits -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Issue Comment Deleted] (SPARK-937) Executors that exit cleanly should not have KILLED status
[ https://issues.apache.org/jira/browse/SPARK-937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kan Zhang updated SPARK-937: Comment: was deleted (was: Hi Aaron, are you still working on this one? If not, could you assign it to me? I have a PR for SPARK-1118 (closed as a duplicate of this JIRA) that I could re-submit for this one. If you are still working on it or plan to, feel free to pick whatever might be useful to you https://github.com/apache/spark/pull/306) Executors that exit cleanly should not have KILLED status - Key: SPARK-937 URL: https://issues.apache.org/jira/browse/SPARK-937 Project: Spark Issue Type: Improvement Affects Versions: 0.7.3 Reporter: Aaron Davidson Assignee: Kan Zhang Priority: Critical Fix For: 1.1.0 This is an unintuitive and overloaded status message when Executors are killed during normal termination of an application. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (SPARK-937) Executors that exit cleanly should not have KILLED status
[ https://issues.apache.org/jira/browse/SPARK-937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14019309#comment-14019309 ] Kan Zhang commented on SPARK-937: - PR: https://github.com/apache/spark/pull/306 Executors that exit cleanly should not have KILLED status - Key: SPARK-937 URL: https://issues.apache.org/jira/browse/SPARK-937 Project: Spark Issue Type: Improvement Affects Versions: 0.7.3 Reporter: Aaron Davidson Assignee: Kan Zhang Priority: Critical Fix For: 1.1.0 This is an unintuitive and overloaded status message when Executors are killed during normal termination of an application. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (SPARK-2043) ExternalAppendOnlyMap doesn't always find matching keys
Matei Zaharia created SPARK-2043: Summary: ExternalAppendOnlyMap doesn't always find matching keys Key: SPARK-2043 URL: https://issues.apache.org/jira/browse/SPARK-2043 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 1.0.0, 0.9.1, 0.9.0 Reporter: Matei Zaharia Assignee: Matei Zaharia Priority: Blocker The current implementation reads one key with the next hash code as it finishes reading the keys with the current hash code, which may cause it to miss some matches of the next key. This can cause operations like join to give the wrong result when reduce tasks spill to disk and there are hash collisions, as values won't be matched together. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (SPARK-2017) web ui stage page becomes unresponsive when the number of tasks is large
[ https://issues.apache.org/jira/browse/SPARK-2017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14019394#comment-14019394 ] Mridul Muralidharan commented on SPARK-2017: Currently, for our jobs, I run with spark.ui.retainedStages=3 (so that there is some visibility into past stages): this is to prevent OOMs in the master when the number of tasks per stage is not low (50k for example is not very high imo). The stage details UI becomes very sluggish to pretty much unresponsive for our jobs where tasks > 30k ... though that might also be a browser issue (firefox/chrome)? web ui stage page becomes unresponsive when the number of tasks is large Key: SPARK-2017 URL: https://issues.apache.org/jira/browse/SPARK-2017 Project: Spark Issue Type: Sub-task Reporter: Reynold Xin Labels: starter {code} sc.parallelize(1 to 1000000, 1000000).count() {code} The above code creates one million tasks to be executed. The stage detail web ui page takes forever to load (if it ever completes). There are again a few different alternatives: 0. Limit the number of tasks we show. 1. Pagination 2. By default only show the aggregate metrics and failed tasks, and hide the successful ones. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (SPARK-2045) Sort-based shuffle implementation
Matei Zaharia created SPARK-2045: Summary: Sort-based shuffle implementation Key: SPARK-2045 URL: https://issues.apache.org/jira/browse/SPARK-2045 Project: Spark Issue Type: New Feature Reporter: Matei Zaharia Building on the pluggability in SPARK-2044, a sort-based shuffle implementation that takes advantage of an Ordering for keys (or just sorts by hashcode for keys that don't have it) would likely improve performance and memory usage in very large shuffles. Our current hash-based shuffle needs an open file for each reduce task, which can fill up a lot of memory for compression buffers and cause inefficient IO. This would avoid both of those issues. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (SPARK-2011) Eliminate duplicate join in Pregel
[ https://issues.apache.org/jira/browse/SPARK-2011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ankur Dave updated SPARK-2011: -- Priority: Minor (was: Major) Eliminate duplicate join in Pregel -- Key: SPARK-2011 URL: https://issues.apache.org/jira/browse/SPARK-2011 Project: Spark Issue Type: Improvement Components: GraphX Reporter: Ankur Dave Assignee: Ankur Dave Priority: Minor In the iteration loop, Pregel currently performs an innerJoin to apply messages to vertices followed by an outerJoinVertices to join the resulting subset of vertices back to the graph. These two operations could be merged into a single call to joinVertices, which should be reimplemented in a more efficient manner. This would allow us to examine only the vertices that received messages. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (SPARK-2019) Spark workers die/disappear when job fails for nearly any reason
[ https://issues.apache.org/jira/browse/SPARK-2019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14019449#comment-14019449 ] Patrick Wendell commented on SPARK-2019: Hey @sams - I'm going to temporarily close this until you get a bit more information. But please do re-open this and/or open other JIRA's if you have any specific issues with 0.9.1 or 1.0.0 that you'd like to report. Spark workers die/disappear when job fails for nearly any reason Key: SPARK-2019 URL: https://issues.apache.org/jira/browse/SPARK-2019 Project: Spark Issue Type: Bug Affects Versions: 0.9.0 Reporter: sam We either have to reboot all the nodes, or run 'sudo service spark-worker restart' across our cluster. I don't think this should happen - the job failures are often not even that bad. There is a 5 upvoted SO question here: http://stackoverflow.com/questions/22031006/spark-0-9-0-worker-keeps-dying-in-standalone-mode-when-job-fails We shouldn't be giving restart privileges to our devs, and therefore our sysadm has to frequently restart the workers. When the sysadm is not around, there is nothing our devs can do. Many thanks -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Resolved] (SPARK-2019) Spark workers die/disappear when job fails for nearly any reason
[ https://issues.apache.org/jira/browse/SPARK-2019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell resolved SPARK-2019. Resolution: Incomplete Spark workers die/disappear when job fails for nearly any reason Key: SPARK-2019 URL: https://issues.apache.org/jira/browse/SPARK-2019 Project: Spark Issue Type: Bug Affects Versions: 0.9.0 Reporter: sam We either have to reboot all the nodes, or run 'sudo service spark-worker restart' across our cluster. I don't think this should happen - the job failures are often not even that bad. There is a 5 upvoted SO question here: http://stackoverflow.com/questions/22031006/spark-0-9-0-worker-keeps-dying-in-standalone-mode-when-job-fails We shouldn't be giving restart privileges to our devs, and therefore our sysadm has to frequently restart the workers. When the sysadm is not around, there is nothing our devs can do. Many thanks -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (SPARK-2046) Support config properties that are changeable across tasks/stages within a job
Zongheng Yang created SPARK-2046: Summary: Support config properties that are changeable across tasks/stages within a job Key: SPARK-2046 URL: https://issues.apache.org/jira/browse/SPARK-2046 Project: Spark Issue Type: Improvement Components: Spark Core Reporter: Zongheng Yang Suppose an application consists of multiple stages, where some stages contain computation-intensive tasks, and other stages contain less computation-intensive (or otherwise ordinary) tasks. For such a job to run efficiently, it might make sense to provide the user a way to set spark.task.cpus to a high number right before the computation-intensive stages/tasks are generated in the user code, and set the property to a lower number for other stages/tasks. As a first step, supporting this feature across stages instead of at the more fine-grained task level might suffice. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (SPARK-2046) Support config properties that are changeable across tasks/stages within a job
[ https://issues.apache.org/jira/browse/SPARK-2046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14019457#comment-14019457 ] Zongheng Yang commented on SPARK-2046: -- [~shivaram] Support config properties that are changeable across tasks/stages within a job -- Key: SPARK-2046 URL: https://issues.apache.org/jira/browse/SPARK-2046 Project: Spark Issue Type: Improvement Components: Spark Core Reporter: Zongheng Yang Suppose an application consists of multiple stages, where some stages contain computation-intensive tasks, and other stages contain less computation-intensive (or otherwise ordinary) tasks. For such a job to run efficiently, it might make sense to provide the user a way to set spark.task.cpus to a high number right before the computation-intensive stages/tasks are generated in the user code, and set the property to a lower number for other stages/tasks. As a first step, supporting this feature across stages instead of at the more fine-grained task level might suffice. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (SPARK-2046) Support config properties that are changeable across tasks/stages within a job
[ https://issues.apache.org/jira/browse/SPARK-2046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14019458#comment-14019458 ] Shivaram Venkataraman commented on SPARK-2046: -- FWIW I have an older implementation that did this using LocalProperties in SparkContext. https://github.com/shivaram/spark-1/commit/256a34c12d4f3c8ed1a09174f331868a7bf30e11 I haven't tested it in a setting with multiple jobs running at the same time though. Support config properties that are changeable across tasks/stages within a job -- Key: SPARK-2046 URL: https://issues.apache.org/jira/browse/SPARK-2046 Project: Spark Issue Type: Improvement Components: Spark Core Reporter: Zongheng Yang Suppose an application consists of multiple stages, where some stages contain computation-intensive tasks, and other stages contain less computation-intensive (or otherwise ordinary) tasks. For such a job to run efficiently, it might make sense to provide the user a way to set spark.task.cpus to a high number right before the computation-intensive stages/tasks are generated in the user code, and set the property to a lower number for other stages/tasks. As a first step, supporting this feature across stages instead of at the more fine-grained task level might suffice. -- This message was sent by Atlassian JIRA (v6.2#6252)
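A rough sketch of the idea using SparkContext local properties, in the spirit of the older implementation mentioned above; note that spark.task.cpus is not actually read per-stage today, so this only shows how such a knob could be set from user code.
{code}
import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setMaster("local[*]").setAppName("per-stage-props"))

// Hypothetical usage: raise the per-task CPU request for a heavy stage...
sc.setLocalProperty("spark.task.cpus", "4")
val heavy = sc.parallelize(1 to 1000000).map(x => math.pow(x, 3)).sum()

// ...and lower it again for ordinary stages.
sc.setLocalProperty("spark.task.cpus", "1")
val light = sc.parallelize(1 to 1000000).count()
{code}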
[jira] [Created] (SPARK-2047) Use less memory in AppendOnlyMap.destructiveSortedIterator
Matei Zaharia created SPARK-2047: Summary: Use less memory in AppendOnlyMap.destructiveSortedIterator Key: SPARK-2047 URL: https://issues.apache.org/jira/browse/SPARK-2047 Project: Spark Issue Type: Improvement Components: Spark Core Reporter: Matei Zaharia This method tries to sort the key-value pairs in the map in-place but ends up allocating a Tuple2 object for each one, which allocates a nontrivial amount of memory (32 or more bytes per entry on a 64-bit JVM). We could instead try to sort the objects in-place within the data array, or allocate an int array with the indices and sort those using a custom comparator. The latter is probably easiest to begin with. -- This message was sent by Atlassian JIRA (v6.2#6252)
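A standalone sketch of the second approach (sorting an index array with a custom comparator); the flattened layout below mirrors how AppendOnlyMap stores alternating keys and values, but the code itself is illustrative, not the actual implementation.
{code}
import java.util.{Arrays, Comparator}

// Entries flattened as (k0, v0, k1, v1, ...), as in AppendOnlyMap's data array.
val data: Array[AnyRef] = Array("banana", "2", "apple", "1", "cherry", "3")
val numEntries = data.length / 2

// Sort an array of entry indices by key hash code instead of allocating a
// Tuple2 per entry.
val indices: Array[Integer] = Array.tabulate(numEntries)(i => Integer.valueOf(i))
val byKeyHash = new Comparator[Integer] {
  override def compare(a: Integer, b: Integer): Int =
    Integer.compare(data(2 * a.intValue).hashCode, data(2 * b.intValue).hashCode)
}
Arrays.sort(indices, byKeyHash)

// Iterate entries in sorted order without building intermediate tuples.
indices.foreach(i => println(s"${data(2 * i.intValue)} -> ${data(2 * i.intValue + 1)}"))
{code}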
[jira] [Updated] (SPARK-2047) Use less memory in AppendOnlyMap.destructiveSortedIterator
[ https://issues.apache.org/jira/browse/SPARK-2047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia updated SPARK-2047: - Priority: Minor (was: Major) Use less memory in AppendOnlyMap.destructiveSortedIterator -- Key: SPARK-2047 URL: https://issues.apache.org/jira/browse/SPARK-2047 Project: Spark Issue Type: Improvement Components: Spark Core Reporter: Matei Zaharia Priority: Minor This method tries to sort the key-value pairs in the map in-place but ends up allocating a Tuple2 object for each one, which allocates a nontrivial amount of memory (32 or more bytes per entry on a 64-bit JVM). We could instead try to sort the objects in-place within the data array, or allocate an int array with the indices and sort those using a custom comparator. The latter is probably easiest to begin with. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (SPARK-2047) Use less memory in AppendOnlyMap.destructiveSortedIterator
[ https://issues.apache.org/jira/browse/SPARK-2047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia updated SPARK-2047: - Priority: Major (was: Minor) Use less memory in AppendOnlyMap.destructiveSortedIterator -- Key: SPARK-2047 URL: https://issues.apache.org/jira/browse/SPARK-2047 Project: Spark Issue Type: Improvement Components: Spark Core Reporter: Matei Zaharia This method tries to sort the key-value pairs in the map in-place but ends up allocating a Tuple2 object for each one, which allocates a nontrivial amount of memory (32 or more bytes per entry on a 64-bit JVM). We could instead try to sort the objects in-place within the data array, or allocate an int array with the indices and sort those using a custom comparator. The latter is probably easiest to begin with. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (SPARK-2043) ExternalAppendOnlyMap doesn't always find matching keys
[ https://issues.apache.org/jira/browse/SPARK-2043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14019482#comment-14019482 ] Matei Zaharia commented on SPARK-2043: -- https://github.com/apache/spark/pull/986 ExternalAppendOnlyMap doesn't always find matching keys --- Key: SPARK-2043 URL: https://issues.apache.org/jira/browse/SPARK-2043 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 0.9.0, 0.9.1, 1.0.0 Reporter: Matei Zaharia Assignee: Matei Zaharia Priority: Blocker The current implementation reads one key with the next hash code as it finishes reading the keys with the current hash code, which may cause it to miss some matches of the next key. This can cause operations like join to give the wrong result when reduce tasks spill to disk and there are hash collisions, as values won't be matched together. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (SPARK-2049) avg function in aggregation may cause overflow
egraldlo created SPARK-2049: --- Summary: avg function in aggregation may cause overflow Key: SPARK-2049 URL: https://issues.apache.org/jira/browse/SPARK-2049 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.0.0 Reporter: egraldlo https://github.com/apache/spark/pull/978 Taking the avg of 2147483644 and 2147483646 will cause overflow in the current implementation. Maybe this is a problem. -- This message was sent by Atlassian JIRA (v6.2#6252)
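A quick demonstration with those two values: summing the Ints before dividing wraps around, while widening to Long gives the expected average.
{code}
val a = 2147483644
val b = 2147483646

println((a + b) / 2)                 // -3: a + b overflows Int before the division
println((a.toLong + b.toLong) / 2)   // 2147483645: the expected average
{code}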
[jira] [Resolved] (SPARK-1988) Enable storing edges out-of-core
[ https://issues.apache.org/jira/browse/SPARK-1988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ankur Dave resolved SPARK-1988. --- Resolution: Fixed This is mitigated by SPARK-1991, because the user can increase the number of edge partitions so that each edge partition individually fits in memory, then set the storage level of the edges to MEMORY_AND_DISK. Enable storing edges out-of-core Key: SPARK-1988 URL: https://issues.apache.org/jira/browse/SPARK-1988 Project: Spark Issue Type: Improvement Components: GraphX Reporter: Ankur Dave Assignee: Ankur Dave Priority: Minor A graph's edges are usually the largest component of the graph, and a cluster may not have enough memory to hold them. For example, a graph with 20 billion edges requires at least 400 GB of memory, because each edge takes 20 bytes. GraphX only ever accesses the edges using full table scans or cluster scans using the clustered index on source vertex ID. The edges are therefore amenable to being stored on disk. EdgePartition should provide the option of storing edges on disk transparently and streaming through them as needed. -- This message was sent by Atlassian JIRA (v6.2#6252)
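A sketch of the mitigation described above, assuming the per-level storage and partition-count parameters added by SPARK-1991 (the parameter names are as I recall them and should be checked against the GraphLoader API; the file path is hypothetical).
{code}
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.GraphLoader
import org.apache.spark.storage.StorageLevel

val sc = new SparkContext(new SparkConf().setMaster("local[*]").setAppName("edges-on-disk"))

// Many smaller edge partitions, each allowed to spill to disk when memory is tight.
val graph = GraphLoader.edgeListFile(
  sc,
  "hdfs:///data/edges.txt",
  numEdgePartitions = 1024,
  edgeStorageLevel = StorageLevel.MEMORY_AND_DISK,
  vertexStorageLevel = StorageLevel.MEMORY_AND_DISK)

println(graph.edges.count())
{code}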
[jira] [Updated] (SPARK-2042) Take triggers unneeded shuffle.
[ https://issues.apache.org/jira/browse/SPARK-2042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-2042: Assignee: Sameer Agarwal Take triggers unneeded shuffle. --- Key: SPARK-2042 URL: https://issues.apache.org/jira/browse/SPARK-2042 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 1.0.0 Reporter: Michael Armbrust Assignee: Sameer Agarwal This query really shouldn't trigger a shuffle: {code} sql("SELECT * FROM src LIMIT 10").take(5) {code} One fix would be to make the following changes: * Fix take to insert a logical limit and then collect() * Add a rule for collapsing adjacent limits -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (SPARK-2044) Pluggable interface for shuffles
[ https://issues.apache.org/jira/browse/SPARK-2044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14019511#comment-14019511 ] Saisai Shao commented on SPARK-2044: Hi Matei, it's great to see you guys have plans for shuffle. We also implemented a pluggable shuffle manager and are planning to submit a PR. I think the basic idea is quite the same; would you mind taking a look at our implementation (https://github.com/jerryshao/apache-spark/tree/shuffle-write-improvement/core/src/main/scala/org/apache/spark/storage/shuffle)? Also I'm wondering if I can contribute my efforts to this proposal or have chances to cooperate. Thanks a lot. Pluggable interface for shuffles Key: SPARK-2044 URL: https://issues.apache.org/jira/browse/SPARK-2044 Project: Spark Issue Type: Improvement Components: Shuffle, Spark Core Reporter: Matei Zaharia Assignee: Matei Zaharia Attachments: Pluggableshuffleproposal.pdf Given that a lot of the current activity in Spark Core is in shuffles, I wanted to propose factoring out shuffle implementations in a way that will make experimentation easier. Ideally we will converge on one implementation, but for a while, this could also be used to have several implementations coexist. I'm suggesting this because I am aware of at least three efforts to look at shuffle (from Yahoo!, Intel and Databricks). Some of the things people are investigating are: * Push-based shuffle where data moves directly from mappers to reducers * Sorting-based instead of hash-based shuffle, to create fewer files (helps a lot with file handles and memory usage on large shuffles) * External spilling within a key * Changing the level of parallelism or even algorithm for downstream stages at runtime based on statistics of the map output (this is a thing we had prototyped in the Shark research project but never merged in core) I've attached a design doc with a proposed interface. It's not too crazy because the interface between shuffles and the rest of the code is already pretty narrow (just some iterators for reading data and a writer interface for writing it). Bigger changes will be needed in the interaction with DAGScheduler and BlockManager for some of the ideas above, but we can handle those separately, and this interface will allow us to experiment with some short-term stuff sooner. If things go well I'd also like to send a sort-based shuffle implementation for 1.1, but we'll see how the timing on that works out. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Comment Edited] (SPARK-2044) Pluggable interface for shuffles
[ https://issues.apache.org/jira/browse/SPARK-2044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14019531#comment-14019531 ] Raymond Liu edited comment on SPARK-2044 at 6/6/14 3:11 AM: Hi Matei, regarding the changes to the block manager: That will allow ShuffleManagers to reuse a common block manager. However the interface also allows ShuffleManagers to try new approaches. Have you figured out what that interface should look like? I see the shuffle writer/reader interface is generalized to be a Product2, while eventually the specific shuffle module will interact with the disk and go through the block manager. Do you expect it to be a Product2 when talking to the DiskBlockManager, or to keep the current implementation using Files, where a lot of shortcuts are involved in various components (shuffle, spill, etc.)? Or something else, like a buffer, an iterator, etc.? Since we also have pluggable storage support in mind (SPARK-1733), the actual IO for a store, even a disk store, might not always go through a File interface, so I have this question. Pluggable interface for shuffles Key: SPARK-2044 URL: https://issues.apache.org/jira/browse/SPARK-2044 Project: Spark Issue Type: Improvement Components: Shuffle, Spark Core Reporter: Matei Zaharia Assignee: Matei Zaharia Attachments: Pluggableshuffleproposal.pdf Given that a lot of the current activity in Spark Core is in shuffles, I wanted to propose factoring out shuffle implementations in a way that will make experimentation easier. Ideally we will converge on one implementation, but for a while, this could also be used to have several implementations coexist. I'm suggesting this because I'm aware of at least three efforts to look at shuffle (from Yahoo!, Intel and Databricks). Some of the things people are investigating are: * Push-based shuffle where data moves directly from mappers to reducers * Sorting-based instead of hash-based shuffle, to create fewer files (helps a lot with file handles and memory usage on large shuffles) * External spilling within a key * Changing the level of parallelism or even algorithm for downstream stages at runtime based on statistics of the map output (this is a thing we had prototyped in the Shark research project but never merged in core) I've attached a design doc with a proposed interface. It's not too crazy because the interface between shuffles and the rest of the code is already pretty narrow (just some iterators for reading data and a writer interface for writing it).
Bigger changes will be needed in the interaction with DAGScheduler and BlockManager for some of the ideas above, but we can handle those separately, and this interface will allow us to experiment with some short-term stuff sooner. If things go well I'd also like to send a sort-based shuffle implementation for 1.1, but we'll see how the timing on that works out. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (SPARK-2051) In yarn.ClientBase spark.yarn.dist.* do not work
Guoqiang Li created SPARK-2051: -- Summary: In yarn.ClientBase spark.yarn.dist.* do not work Key: SPARK-2051 URL: https://issues.apache.org/jira/browse/SPARK-2051 Project: Spark Issue Type: Bug Components: YARN Reporter: Guoqiang Li Spark configuration {{conf/spark-defaults.conf}}: {quote} spark.yarn.dist.archives /toona/conf spark.executor.extraClassPath ./conf spark.driver.extraClassPath ./conf {quote} HDFS directory {{hadoop dfs -cat /toona/conf/toona.conf}} : {quote} redis.num=4 {quote} The following command execution fails {code} YARN_CONF_DIR=/etc/hadoop/conf ./bin/spark-submit --num-executors 2 --driver-memory 2g --executor-memory 2g --master yarn-cluster --class toona.DeployTest toona-assembly.jar {code} The following is the test code {code} package toona import com.typesafe.config.Config import com.typesafe.config.ConfigFactory object DeployTest { def main(args: Array[String]) { val conf = ConfigFactory.load("toona.conf") val redisNum = conf.getInt("redis.num") // This will throw a ConfigException assert(redisNum == 4) } } {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (SPARK-2051) In yarn.ClientBase spark.yarn.dist.* do not work
[ https://issues.apache.org/jira/browse/SPARK-2051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guoqiang Li updated SPARK-2051: --- Description: Spark configuration {{conf/spark-defaults.conf}}: {quote} spark.yarn.dist.archives /toona/conf spark.executor.extraClassPath ./conf spark.driver.extraClassPath ./conf {quote} HDFS directory {{hadoop dfs -cat /toona/conf/toona.conf}} : {quote} redis.num=4 {quote} The following command execution fails {code} YARN_CONF_DIR=/etc/hadoop/conf ./bin/spark-submit --num-executors 2 --driver-memory 2g --executor-memory 2g --master yarn-cluster --class toona.DeployTest toona-assembly.jar {code} The following is the test code {code} package toona import com.typesafe.config.Config import com.typesafe.config.ConfigFactory object DeployTest { def main(args: Array[String]) { val conf = ConfigFactory.load("toona.conf") val redisNum = conf.getInt("redis.num") // This will throw a ConfigException assert(redisNum == 4) } } {code} was: Spark configuration {{conf/spark-defaults.conf}}: {quote} spark.yarn.dist.archives /toona/conf spark.executor.extraClassPath ./conf spark.driver.extraClassPath ./conf {quote} HDFS directory {{hadoop dfs -cat /toona/conf/toona.conf}} : {quote} redis.num=4 {quote} The following command execution fails {code} YARN_CONF_DIR=/etc/hadoop/conf ./bin/spark-submit --num-executors 2 --driver-memory 2g --executor-memory 2g --master yarn-cluster --class toona.DeployTest toona-assembly.jar {code} The following is testing the code {code} package toona import com.typesafe.config.Config import com.typesafe.config.ConfigFactory object DeployTest { def main(args: Array[String]) { val conf = ConfigFactory.load("toona.conf") val redisNum = conf.getInt("redis.num") // This will throw a ConfigException assert(redisNum == 4) } } {code} In yarn.ClientBase spark.yarn.dist.* do not work Key: SPARK-2051 URL: https://issues.apache.org/jira/browse/SPARK-2051 Project: Spark Issue Type: Bug Components: YARN Reporter: Guoqiang Li Spark configuration {{conf/spark-defaults.conf}}: {quote} spark.yarn.dist.archives /toona/conf spark.executor.extraClassPath ./conf spark.driver.extraClassPath ./conf {quote} HDFS directory {{hadoop dfs -cat /toona/conf/toona.conf}} : {quote} redis.num=4 {quote} The following command execution fails {code} YARN_CONF_DIR=/etc/hadoop/conf ./bin/spark-submit --num-executors 2 --driver-memory 2g --executor-memory 2g --master yarn-cluster --class toona.DeployTest toona-assembly.jar {code} The following is the test code {code} package toona import com.typesafe.config.Config import com.typesafe.config.ConfigFactory object DeployTest { def main(args: Array[String]) { val conf = ConfigFactory.load("toona.conf") val redisNum = conf.getInt("redis.num") // This will throw a ConfigException assert(redisNum == 4) } } {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (SPARK-2051) In yarn.ClientBase spark.yarn.dist.* do not work
[ https://issues.apache.org/jira/browse/SPARK-2051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14019560#comment-14019560 ] Guoqiang Li commented on SPARK-2051: The PR: https://github.com/apache/spark/pull/969 In yarn.ClientBase spark.yarn.dist.* do not work Key: SPARK-2051 URL: https://issues.apache.org/jira/browse/SPARK-2051 Project: Spark Issue Type: Bug Components: YARN Reporter: Guoqiang Li Spark configuration {{conf/spark-defaults.conf}}: {quote} spark.yarn.dist.archives /toona/conf spark.executor.extraClassPath ./conf spark.driver.extraClassPath ./conf {quote} HDFS directory {{hadoop dfs -cat /toona/conf/toona.conf}} : {quote} redis.num=4 {quote} The following command execution fails {code} YARN_CONF_DIR=/etc/hadoop/conf ./bin/spark-submit --num-executors 2 --driver-memory 2g --executor-memory 2g --master yarn-cluster --class toona.DeployTest toona-assembly.jar {code} The following is the test code {code} package toona import com.typesafe.config.Config import com.typesafe.config.ConfigFactory object DeployTest { def main(args: Array[String]) { val conf = ConfigFactory.load("toona.conf") val redisNum = conf.getInt("redis.num") // This will throw a ConfigException assert(redisNum == 4) } } {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (SPARK-2052) Add optimization for CaseConversionExpression's.
Takuya Ueshin created SPARK-2052: Summary: Add optimization for CaseConversionExpression's. Key: SPARK-2052 URL: https://issues.apache.org/jira/browse/SPARK-2052 Project: Spark Issue Type: Improvement Components: SQL Reporter: Takuya Ueshin Add optimization for {{CaseConversionExpression}}'s. -- This message was sent by Atlassian JIRA (v6.2#6252)
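One plausible form of such an optimization is collapsing nested case conversions, since only the outermost Upper/Lower determines the result. The rule below is a sketch in Catalyst style under that assumption, not necessarily what the eventual patch does:
{code}
import org.apache.spark.sql.catalyst.expressions.{Lower, Upper}
import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
import org.apache.spark.sql.catalyst.rules.Rule

// Drop redundant inner case conversions: Upper(Lower(x)) == Upper(x), etc.
object SimplifyCaseConversionExpressions extends Rule[LogicalPlan] {
  def apply(plan: LogicalPlan): LogicalPlan = plan transformAllExpressions {
    case Upper(Upper(child)) => Upper(child)
    case Upper(Lower(child)) => Upper(child)
    case Lower(Upper(child)) => Lower(child)
    case Lower(Lower(child)) => Lower(child)
  }
}
{code}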
[jira] [Commented] (SPARK-2052) Add optimization for CaseConversionExpression's.
[ https://issues.apache.org/jira/browse/SPARK-2052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14019578#comment-14019578 ] Takuya Ueshin commented on SPARK-2052: -- PRed: https://github.com/apache/spark/pull/990 Add optimization for CaseConversionExpression's. Key: SPARK-2052 URL: https://issues.apache.org/jira/browse/SPARK-2052 Project: Spark Issue Type: Improvement Components: SQL Reporter: Takuya Ueshin Add optimization for {{CaseConversionExpression}}'s. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (SPARK-1704) java.lang.AssertionError: assertion failed: No plan for ExplainCommand (Project [*])
[ https://issues.apache.org/jira/browse/SPARK-1704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14019599#comment-14019599 ] Reynold Xin commented on SPARK-1704: Explain should probably just print out the sql query plan in Spark SQL instead of delegating to Hive ... java.lang.AssertionError: assertion failed: No plan for ExplainCommand (Project [*]) Key: SPARK-1704 URL: https://issues.apache.org/jira/browse/SPARK-1704 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.0.0 Environment: linux Reporter: Yangjp Labels: sql Fix For: 1.1.0 Original Estimate: 612h Remaining Estimate: 612h 14/05/03 22:08:40 INFO ParseDriver: Parsing command: explain select * from src 14/05/03 22:08:40 INFO ParseDriver: Parse Completed 14/05/03 22:08:40 WARN LoggingFilter: EXCEPTION : java.lang.AssertionError: assertion failed: No plan for ExplainCommand (Project [*]) at scala.Predef$.assert(Predef.scala:179) at org.apache.spark.sql.catalyst.planning.QueryPlanner.apply(QueryPlanner.scala:59) at org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan$lzycompute(SQLContext.scala:263) at org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan(SQLContext.scala:263) at org.apache.spark.sql.SQLContext$QueryExecution.executedPlan$lzycompute(SQLContext.scala:264) at org.apache.spark.sql.SQLContext$QueryExecution.executedPlan(SQLContext.scala:264) at org.apache.spark.sql.hive.HiveContext$QueryExecution.toRdd$lzycompute(HiveContext.scala:260) at org.apache.spark.sql.hive.HiveContext$QueryExecution.toRdd(HiveContext.scala:248) at org.apache.spark.sql.hive.api.java.JavaHiveContext.hql(JavaHiveContext.scala:39) at org.apache.spark.examples.TimeServerHandler.messageReceived(TimeServerHandler.java:72) at org.apache.mina.core.filterchain.DefaultIoFilterChain$TailFilter.messageReceived(DefaultIoFilterChain.java:690) at org.apache.mina.core.filterchain.DefaultIoFilterChain.callNextMessageReceived(DefaultIoFilterChain.java:417) at org.apache.mina.core.filterchain.DefaultIoFilterChain.access$1200(DefaultIoFilterChain.java:47) at org.apache.mina.core.filterchain.DefaultIoFilterChain$EntryImpl$1.messageReceived(DefaultIoFilterChain.java:765) at org.apache.mina.filter.codec.ProtocolCodecFilter$ProtocolDecoderOutputImpl.flush(ProtocolCodecFilter.java:407) at org.apache.mina.filter.codec.ProtocolCodecFilter.messageReceived(ProtocolCodecFilter.java:236) at org.apache.mina.core.filterchain.DefaultIoFilterChain.callNextMessageReceived(DefaultIoFilterChain.java:417) at org.apache.mina.core.filterchain.DefaultIoFilterChain.access$1200(DefaultIoFilterChain.java:47) at org.apache.mina.core.filterchain.DefaultIoFilterChain$EntryImpl$1.messageReceived(DefaultIoFilterChain.java:765) at org.apache.mina.filter.logging.LoggingFilter.messageReceived(LoggingFilter.java:208) at org.apache.mina.core.filterchain.DefaultIoFilterChain.callNextMessageReceived(DefaultIoFilterChain.java:417) at org.apache.mina.core.filterchain.DefaultIoFilterChain.access$1200(DefaultIoFilterChain.java:47) at org.apache.mina.core.filterchain.DefaultIoFilterChain$EntryImpl$1.messageReceived(DefaultIoFilterChain.java:765) at org.apache.mina.core.filterchain.IoFilterAdapter.messageReceived(IoFilterAdapter.java:109) at org.apache.mina.core.filterchain.DefaultIoFilterChain.callNextMessageReceived(DefaultIoFilterChain.java:417) at org.apache.mina.core.filterchain.DefaultIoFilterChain.fireMessageReceived(DefaultIoFilterChain.java:410) at 
org.apache.mina.core.polling.AbstractPollingIoProcessor.read(AbstractPollingIoProcessor.java:710) at org.apache.mina.core.polling.AbstractPollingIoProcessor.process(AbstractPollingIoProcessor.java:664) at org.apache.mina.core.polling.AbstractPollingIoProcessor.process(AbstractPollingIoProcessor.java:653) at org.apache.mina.core.polling.AbstractPollingIoProcessor.access$600(AbstractPollingIoProcessor.java:67) at org.apache.mina.core.polling.AbstractPollingIoProcessor$Processor.run(AbstractPollingIoProcessor.java:1124) at org.apache.mina.util.NamePreservingRunnable.run(NamePreservingRunnable.java:64) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:701) -- This message was sent by Atlassian JIRA (v6.2#6252)