[jira] [Commented] (SPARK-3324) YARN module has nonstandard structure which cause compile error In IntelliJ
[ https://issues.apache.org/jira/browse/SPARK-3324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14116667#comment-14116667 ] Sean Owen commented on SPARK-3324: -- Let me try to sketch what's funky about the structure. We have yarn/alpha, yarn/common, yarn/stable. Understanding the purpose, I would expect each to be a module, and that each has a src/ directory, and that alpha and stable depend on common, and the Spark parent activates either yarn/alpha or yarn/stable depending on profiles. IntelliJ is fine with that. However what we have is that yarn/ is a module. But its source is in yarn/common. But it's a pom-only module. And yarn/alpha and yarn/stable list it as the parent and inherit all of their source directory info and dependencies from yarn/, which is not itself a module of code. So each compiles two source directories defined in different places. This plus profiles confused IntelliJ and required manual intervention. Maybe I overlook a reason this had to be done, but rejiggering this as three simple modules should work. Again I imagine the question is, is it worth it versus removing yarn/alpha at some point in the future? Because it's trivial to fix how IntelliJ reads the POMs once by hand in the IDE. 
YARN module has nonstandard structure which cause compile error In IntelliJ --- Key: SPARK-3324 URL: https://issues.apache.org/jira/browse/SPARK-3324 Project: Spark Issue Type: Bug Components: YARN Affects Versions: 1.1.0 Environment: Mac OS: 10.9.4 IntelliJ IDEA: 13.1.4 Scala Plugins: 0.41.2 Maven: 3.0.5 Reporter: Yi Tian Priority: Minor Labels: intellij, maven, yarn
The YARN module has a nonstandard path structure like:
{code}
${SPARK_HOME}
  |--yarn
     |--alpha (contains yarn api support for 0.23 and 2.0.x)
     |--stable (contains yarn api support for 2.2 and later)
     |    |--pom.xml (spark-yarn)
     |--common (Common codes not depending on specific version of Hadoop)
     |--pom.xml (yarn-parent)
{code}
When we use Maven to compile the yarn module, Maven imports the 'alpha' or 'stable' module according to the profile setting. A submodule like 'stable' uses the build property defined in yarn/pom.xml to add the common code to its sourcePath. As a result, IntelliJ can't directly recognize the sources in the common directory as part of the sourcePath. I think we should change the yarn module to a unified Maven jar project, and specify the different versions of the yarn api via Maven profile settings. That would resolve the compile error in IntelliJ and make the yarn module simpler and clearer.
-- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-3328) --with-tachyon build is broken
Elijah Epifanov created SPARK-3328: -- Summary: --with-tachyon build is broken Key: SPARK-3328 URL: https://issues.apache.org/jira/browse/SPARK-3328 Project: Spark Issue Type: Bug Components: Build Affects Versions: 1.1.0 Reporter: Elijah Epifanov cp: tachyon-0.5.0/core/src/main/java/tachyon/web/resources: No such file or directory
[jira] [Updated] (SPARK-3328) --with-tachyon build is broken
[ https://issues.apache.org/jira/browse/SPARK-3328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Elijah Epifanov updated SPARK-3328: --- Description: cp: tachyon-0.5.0/target/tachyon-0.5.0-jar-with-dependencies.jar: No such file or directory was: cp: tachyon-0.5.0/core/src/main/java/tachyon/web/resources: No such file or directory
[jira] [Commented] (SPARK-3324) YARN module has nonstandard structure which cause compile error In IntelliJ
[ https://issues.apache.org/jira/browse/SPARK-3324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14116782#comment-14116782 ] Yi Tian commented on SPARK-3324: Thanks [~srowen] for explaining that. My initial idea was to remove pom.xml from yarn/stable and yarn/alpha, and change the packaging property in yarn/pom.xml from pom to jar. When building this module, the pom.xml would dynamically add common/src and either yarn/alpha/src or yarn/stable/src to sourcePath.
[jira] [Commented] (SPARK-3324) YARN module has nonstandard structure which cause compile error In IntelliJ
[ https://issues.apache.org/jira/browse/SPARK-3324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14116785#comment-14116785 ] Yi Tian commented on SPARK-3324: BTW, [~pwendell], there is another problem with IntelliJ. The spark-streaming-flume-sink module needs the avro plugin to compile some avro files, but the outputDirectory of the generated scala source is under the target path, so IntelliJ can't recognize the generated sources and throws an error when compiling the spark project. May I make a PR to fix these problems?
[jira] [Commented] (SPARK-3324) YARN module has nonstandard structure which cause compile error In IntelliJ
[ https://issues.apache.org/jira/browse/SPARK-3324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14116801#comment-14116801 ] Sean Owen commented on SPARK-3324: -- [~tianyi] I seem to remember having a similar problem. I think that is straightforward to fix. It's a separate issue. But FWIW I would like to see that improved.
[jira] [Created] (SPARK-3329) HiveQuerySuite SET tests depend on map orderings
William Benton created SPARK-3329: - Summary: HiveQuerySuite SET tests depend on map orderings Key: SPARK-3329 URL: https://issues.apache.org/jira/browse/SPARK-3329 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.0.2, 1.1.0 Reporter: William Benton Priority: Trivial The SET tests in HiveQuerySuite that return multiple values depend on the ordering in which map pairs are returned from Hive and can fail spuriously if this changes due to environment or library changes.
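A minimal sketch of the kind of fix this report points at, written in plain Python rather than the Scala of HiveQuerySuite. The helper `set_results_match` and the sample key/value strings are hypothetical and are not taken from the test suite. It compares the returned pairs after sorting them, so the check no longer depends on the order in which the map happens to be iterated.

```python
def set_results_match(expected, actual):
    """Compare two collections of "key=value" strings while ignoring order."""
    return sorted(expected) == sorted(actual)


# Hypothetical SET output: the same pairs returned in two different orders.
run_a = ["spark.sql.key1=val1", "spark.sql.key2=val2"]
run_b = ["spark.sql.key2=val2", "spark.sql.key1=val1"]

ordered_equal = run_a == run_b                      # order-sensitive comparison
unordered_equal = set_results_match(run_a, run_b)   # order-insensitive comparison
print(ordered_equal, unordered_equal)
```

Sorting keeps duplicates significant, which a conversion to a set would not; whether that matters for the real tests is an open assumption here.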
[jira] [Commented] (SPARK-3329) HiveQuerySuite SET tests depend on map orderings
[ https://issues.apache.org/jira/browse/SPARK-3329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14116821#comment-14116821 ] Apache Spark commented on SPARK-3329: - User 'willb' has created a pull request for this issue: https://github.com/apache/spark/pull/2220
[jira] [Created] (SPARK-3330) Successive test runs with different profiles fail SparkSubmitSuite
Sean Owen created SPARK-3330: Summary: Successive test runs with different profiles fail SparkSubmitSuite Key: SPARK-3330 URL: https://issues.apache.org/jira/browse/SPARK-3330 Project: Spark Issue Type: Bug Components: Build Affects Versions: 1.0.2 Reporter: Sean Owen Maven-based Jenkins builds have been failing for a while: https://amplab.cs.berkeley.edu/jenkins/view/Spark/job/Spark-Master-Maven-with-YARN/480/HADOOP_PROFILE=hadoop-2.4,label=centos/console One common cause is that on the second and subsequent runs of mvn clean test, at least two assembly JARs will exist in assembly/target. Because assembly is not a submodule of parent, mvn clean is not invoked for assembly. The presence of two assembly jars causes spark-submit to fail.
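The failure mode described above can be illustrated with a small Python sketch. It is only a simulation under stated assumptions: the JAR file names, the `spark-assembly*.jar` pattern, and both helper functions are hypothetical, and the code does not reproduce how spark-submit actually locates the assembly. It shows why a stale JAR left behind by an earlier profile makes a "there must be exactly one assembly JAR" check fail.

```python
import glob
import os
import tempfile


def find_assembly_jars(target_dir):
    """Return the sorted assembly JAR paths found directly under target_dir."""
    return sorted(glob.glob(os.path.join(target_dir, "spark-assembly*.jar")))


def check_single_assembly(target_dir):
    """Raise if target_dir does not hold exactly one assembly JAR."""
    jars = find_assembly_jars(target_dir)
    if len(jars) != 1:
        raise RuntimeError(
            "expected exactly one assembly JAR in %s, found %d" % (target_dir, len(jars))
        )
    return jars[0]


# Simulate two builds with different Hadoop profiles and no clean in between.
with tempfile.TemporaryDirectory() as target:
    for name in ("spark-assembly-hadoop2.3.0.jar", "spark-assembly-hadoop2.4.0.jar"):
        open(os.path.join(target, name), "w").close()
    stale_count = len(find_assembly_jars(target))
    try:
        check_single_assembly(target)
        failed = False
    except RuntimeError:
        failed = True

print(stale_count, failed)
```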
[jira] [Created] (SPARK-3331) PEP8 tests fail in release because they check unzipped py4j code
Sean Owen created SPARK-3331: Summary: PEP8 tests fail in release because they check unzipped py4j code Key: SPARK-3331 URL: https://issues.apache.org/jira/browse/SPARK-3331 Project: Spark Issue Type: Bug Components: Build Affects Versions: 1.0.2 Reporter: Sean Owen Priority: Minor PEP8 tests run on files under ./python, but in the release packaging, py4j code is present in ./python/build/py4j. Py4J code fails style checks and thus release fails ./dev/run-tests now.
[jira] [Updated] (SPARK-3331) PEP8 tests fail because they check unzipped py4j code
[ https://issues.apache.org/jira/browse/SPARK-3331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-3331: - Summary: PEP8 tests fail because they check unzipped py4j code (was: PEP8 tests fail in release because they check unzipped py4j code)
[jira] [Updated] (SPARK-3331) PEP8 tests fail because they check unzipped py4j code
[ https://issues.apache.org/jira/browse/SPARK-3331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-3331: - Description: PEP8 tests run on files under ./python, but unzipped py4j code is found at ./python/build/py4j. Py4J code fails style checks and can fail ./dev/run-tests if this code is present locally. (was: PEP8 tests run on files under ./python, but in the release packaging, py4j code is present in ./python/build/py4j. Py4J code fails style checks and thus release fails ./dev/run-tests now.)
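One way to picture a fix is to skip the build directory when collecting files for the style check. The sketch below is a hedged illustration in plain Python; `python_files_to_check`, the `excluded_dirs` parameter, and the two sample file names are hypothetical and do not describe what ./dev/run-tests or the linked pull request actually does.

```python
import os
import tempfile


def python_files_to_check(root, excluded_dirs=("build",)):
    """Collect .py files under root, skipping any directory named in excluded_dirs."""
    selected = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune excluded directories in place so os.walk never descends into them.
        dirnames[:] = [d for d in dirnames if d not in excluded_dirs]
        for filename in filenames:
            if filename.endswith(".py"):
                selected.append(os.path.relpath(os.path.join(dirpath, filename), root))
    return sorted(selected)


# Build a throwaway tree that mimics ./python with unzipped py4j code in build/.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "pyspark"))
    os.makedirs(os.path.join(root, "build", "py4j"))
    open(os.path.join(root, "pyspark", "rdd.py"), "w").close()
    open(os.path.join(root, "build", "py4j", "java_gateway.py"), "w").close()
    checked = python_files_to_check(root)

print(checked)
```

Note that this prunes every directory named `build` at any depth, which is broader than excluding only ./python/build.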
[jira] [Commented] (SPARK-3330) Successive test runs with different profiles fail SparkSubmitSuite
[ https://issues.apache.org/jira/browse/SPARK-3330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14116829#comment-14116829 ] Apache Spark commented on SPARK-3330: - User 'srowen' has created a pull request for this issue: https://github.com/apache/spark/pull/2221
[jira] [Commented] (SPARK-3331) PEP8 tests fail because they check unzipped py4j code
[ https://issues.apache.org/jira/browse/SPARK-3331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14116830#comment-14116830 ] Apache Spark commented on SPARK-3331: - User 'srowen' has created a pull request for this issue: https://github.com/apache/spark/pull/
[jira] [Commented] (SPARK-2870) Thorough schema inference directly on RDDs of Python dictionaries
[ https://issues.apache.org/jira/browse/SPARK-2870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14116868#comment-14116868 ] Nicholas Chammas commented on SPARK-2870: - [~marmbrus], [~davies], [~yhuai] - We discussed this feature on the user list some weeks back. Just pinging you here to make sure this feature request is on your radar. Thorough schema inference directly on RDDs of Python dictionaries - Key: SPARK-2870 URL: https://issues.apache.org/jira/browse/SPARK-2870 Project: Spark Issue Type: Improvement Components: PySpark, SQL Reporter: Nicholas Chammas
h4. Background
I love the {{SQLContext.jsonRDD()}} and {{SQLContext.jsonFile()}} methods. They process JSON text directly and infer a schema that covers the entire source data set. This is very important with semi-structured data like JSON since individual elements in the data set are free to have different structures. Matching fields across elements may even have different value types. For example:
{code}
{"a": 5}
{"a": "cow"}
{code}
To get a queryable schema that covers the whole data set, you need to infer a schema by looking at the whole data set. The aforementioned {{SQLContext.json...()}} methods do this very well.
h4. Feature Request
What we need is for {{SQLContext.inferSchema()}} to do this, too. Alternatively, we need a new {{SQLContext}} method that works on RDDs of Python dictionaries and does something functionally equivalent to this:
{code}
SQLContext.jsonRDD(RDD[dict].map(lambda x: json.dumps(x)))
{code}
As of 1.0.2, [{{inferSchema()}}|http://spark.apache.org/docs/latest/api/python/pyspark.sql.SQLContext-class.html#inferSchema] just looks at the first element in the data set. This won't help much when the structure of the elements in the target RDD is variable.
h4. Example Use Case
* You have some JSON text data that you want to analyze using Spark SQL.
* You would use one of the {{SQLContext.json...()}} methods, but you need to do some filtering on the data first to remove bad elements--basically, some minimal schema validation.
* You deserialize the JSON objects to Python {{dict}} s and filter out the bad ones. You now have an RDD of dictionaries.
* From this RDD, you want a SchemaRDD that captures the schema for the whole data set.
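To make the difference between first-element and whole-data-set inference concrete, here is a small sketch in plain Python. It does not use or imitate any Spark API: `infer_schema` is a hypothetical helper that simply records every Python type name seen per field, and it makes no claim about how {{SQLContext.jsonRDD()}} resolves conflicting types.

```python
def infer_schema(records):
    """Map each field name to the set of Python type names seen across all records."""
    schema = {}
    for record in records:
        for field, value in record.items():
            schema.setdefault(field, set()).add(type(value).__name__)
    return schema


# Elements with differing structure, as in the {"a": 5} / {"a": "cow"} example.
records = [{"a": 5}, {"a": "cow"}, {"b": 1.5}]

first_only = infer_schema(records[:1])  # looks at the first element only
whole_set = infer_schema(records)       # looks at every element

print(first_only)
print(whole_set)
```

Looking at only the first element misses both the string values of `a` and the field `b`, which is the gap the feature request describes.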
[jira] [Created] (SPARK-3332) Tags shouldn't be the main strategy for machine membership on clusters
Allan Douglas R. de Oliveira created SPARK-3332: --- Summary: Tags shouldn't be the main strategy for machine membership on clusters Key: SPARK-3332 URL: https://issues.apache.org/jira/browse/SPARK-3332 Project: Spark Issue Type: Bug Components: EC2 Reporter: Allan Douglas R. de Oliveira The implementation for SPARK-2333 changed the machine membership mechanism from security groups to tags. This is a fundamentally flawed strategy, as there is no guarantee at all that the machines will have a tag (even with a retry mechanism). For instance, if the script is killed after launching the instances but before setting the tags, the machines will be invisible to a destroy command, leaving an unmanageable cluster behind. The initial proposal is to go back to the previous behavior in all cases except when the new flag (--security-group-prefix) is used. It is also worth mentioning that SPARK-3180 introduced the --additional-security-group flag, which is a reasonable solution to SPARK-2333 (but isn't a full replacement for all use cases of --security-group-prefix).
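The scenario in this report can be modelled with a few lines of plain Python. This is a toy simulation, not the spark-ec2 script and not an EC2 client: the instance dictionaries, the `cluster` tag key, the `-slaves` group suffix, and all three functions are hypothetical. The one modelling assumption is that the security group is attached when an instance is created, while tags are written in a separate, later step that may never run.

```python
def launch_cluster(cluster, count, killed_before_tagging=False):
    """Simulate a launch: the group is assigned at creation, tags in a later step."""
    instances = [
        {"id": "i-%d" % n, "groups": [cluster + "-slaves"], "tags": {}}
        for n in range(count)
    ]
    if not killed_before_tagging:
        for instance in instances:
            instance["tags"]["cluster"] = cluster
    return instances


def members_by_tag(instances, cluster):
    """Membership lookup that relies on the tag written after launch."""
    return [i["id"] for i in instances if i["tags"].get("cluster") == cluster]


def members_by_group(instances, cluster):
    """Membership lookup that relies on the group assigned at launch."""
    return [i["id"] for i in instances if cluster + "-slaves" in i["groups"]]


# The script dies after launching two instances but before tagging them.
instances = launch_cluster("demo", 2, killed_before_tagging=True)
found_by_tag = members_by_tag(instances, "demo")
found_by_group = members_by_group(instances, "demo")

print(found_by_tag, found_by_group)
```

In this model a destroy command that looks up members by tag sees nothing, while a lookup by security group still finds both machines.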
[jira] [Commented] (SPARK-3332) Tags shouldn't be the main strategy for machine membership on clusters
[ https://issues.apache.org/jira/browse/SPARK-3332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14116890#comment-14116890 ] Apache Spark commented on SPARK-3332: - User 'douglaz' has created a pull request for this issue: https://github.com/apache/spark/pull/2223
[jira] [Updated] (SPARK-3010) fix redundant conditional
[ https://issues.apache.org/jira/browse/SPARK-3010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia updated SPARK-3010: - Assignee: wangfei fix redundant conditional - Key: SPARK-3010 URL: https://issues.apache.org/jira/browse/SPARK-3010 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 1.0.2 Reporter: wangfei Assignee: wangfei Fix For: 1.1.0
There are some redundant conditionals in Spark, such as:
1. private[spark] def codegenEnabled: Boolean = if (getConf(CODEGEN_ENABLED, false) == true) true else false
2. x = if (x == 2) true else false
... etc
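The same simplification can be sketched in Python (the snippets in the issue are Scala). The function names and the `codegen.enabled` key below are hypothetical stand-ins for illustration; the point is only that a boolean expression can be returned directly instead of being wrapped in a conditional that yields true or false.

```python
def codegen_enabled_redundant(conf):
    # Redundant: the comparison already yields a boolean.
    return True if conf.get("codegen.enabled", False) == True else False


def codegen_enabled_simplified(conf):
    # Equivalent when the setting is a boolean: return the value directly.
    return conf.get("codegen.enabled", False)


def is_two_redundant(x):
    # Redundant: wraps a boolean expression in a conditional.
    return True if x == 2 else False


def is_two_simplified(x):
    # The comparison itself is the result.
    return x == 2


print(codegen_enabled_simplified({"codegen.enabled": True}), is_two_simplified(3))
```

The first pair is only equivalent for boolean settings; a non-boolean value would pass through unchanged in the simplified form.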
[jira] [Resolved] (SPARK-3010) fix redundant conditional
[ https://issues.apache.org/jira/browse/SPARK-3010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia resolved SPARK-3010. -- Resolution: Fixed Fix Version/s: (was: 1.1.0) 1.2.0 Target Version/s: 1.2.0 (was: 1.1.0)
[jira] [Updated] (SPARK-3010) fix redundant conditional
[ https://issues.apache.org/jira/browse/SPARK-3010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia updated SPARK-3010: - Priority: Trivial (was: Major)
[jira] [Created] (SPARK-3333) Large number of partitions causes OOM
Nicholas Chammas created SPARK-: --- Summary: Large number of partitions causes OOM Key: SPARK- URL: https://issues.apache.org/jira/browse/SPARK- Project: Spark Issue Type: Bug Components: PySpark Affects Versions: 1.1.0 Environment: EC2; 1 master; 1 slave; {{m3.xlarge}} instances Reporter: Nicholas Chammas
Here’s a repro for PySpark:
{code}
a = sc.parallelize(["Nick", "John", "Bob"])
a = a.repartition(24000)
a.keyBy(lambda x: len(x)).reduceByKey(lambda x,y: x + y).take(1)
{code}
This code runs fine on 1.0.2. It returns the following result in just over a minute:
{code}
[(4, 'NickJohn')]
{code}
However, when I try this with 1.1.0-rc3 on an identically-sized cluster, it runs for a very, very long time and then fails with {{java.lang.OutOfMemoryError: Java heap space}}. Here is a stack trace taken from a run on 1.1.0-rc2:
{code}
a = sc.parallelize(["Nick", "John", "Bob"])
a = a.repartition(24000)
a.keyBy(lambda x: len(x)).reduceByKey(lambda x,y: x + y).take(1)
14/08/29 21:53:40 WARN BlockManagerMasterActor: Removing BlockManager BlockManagerId(0, ip-10-138-29-167.ec2.internal, 46252, 0) with no recent heart beats: 175143ms exceeds 45000ms 14/08/29 21:53:50 WARN BlockManagerMasterActor: Removing BlockManager BlockManagerId(10, ip-10-138-18-106.ec2.internal, 33711, 0) with no recent heart beats: 175359ms exceeds 45000ms 14/08/29 21:54:02 WARN BlockManagerMasterActor: Removing BlockManager BlockManagerId(19, ip-10-139-36-207.ec2.internal, 52208, 0) with no recent heart beats: 173061ms exceeds 45000ms 14/08/29 21:54:13 WARN BlockManagerMasterActor: Removing BlockManager BlockManagerId(5, ip-10-73-142-70.ec2.internal, 56162, 0) with no recent heart beats: 176816ms exceeds 45000ms 14/08/29 21:54:22 WARN BlockManagerMasterActor: Removing BlockManager BlockManagerId(7, ip-10-236-145-200.ec2.internal, 40959, 0) with no recent heart beats: 182241ms exceeds 45000ms 14/08/29 21:54:40 WARN BlockManagerMasterActor: Removing BlockManager BlockManagerId(4, ip-10-139-1-195.ec2.internal, 49221,
0) with no recent heart beats: 178406ms exceeds 45000ms 14/08/29 21:54:41 ERROR Utils: Uncaught exception in thread Result resolver thread-3 java.lang.OutOfMemoryError: Java heap space at com.esotericsoftware.kryo.io.Input.readBytes(Input.java:296) at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ByteArraySerializer.read(DefaultArraySerializers.java:35) at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ByteArraySerializer.read(DefaultArraySerializers.java:18) at com.esotericsoftware.kryo.Kryo.readObjectOrNull(Kryo.java:699) at com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.read(FieldSerializer.java:611) at com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:221) at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:729) at org.apache.spark.serializer.KryoSerializerInstance.deserialize(KryoSerializer.scala:162) at org.apache.spark.scheduler.DirectTaskResult.value(TaskResult.scala:79) at org.apache.spark.scheduler.TaskSetManager.handleSuccessfulTask(TaskSetManager.scala:514) at org.apache.spark.scheduler.TaskSchedulerImpl.handleSuccessfulTask(TaskSchedulerImpl.scala:355) at org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply$mcV$sp(TaskResultGetter.scala:68) at org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply(TaskResultGetter.scala:47) at org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply(TaskResultGetter.scala:47) at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1311) at org.apache.spark.scheduler.TaskResultGetter$$anon$2.run(TaskResultGetter.scala:46) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) Exception in thread Result resolver thread-3 14/08/29 21:56:26 ERROR SendingConnection: Exception while reading SendingConnection to 
ConnectionManagerId(ip-10-73-142-223.ec2.internal,54014) java.nio.channels.ClosedChannelException at sun.nio.ch.SocketChannelImpl.ensureReadOpen(SocketChannelImpl.java:252) at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:295) at org.apache.spark.network.SendingConnection.read(Connection.scala:390) at org.apache.spark.network.ConnectionManager$$anon$7.run(ConnectionManager.scala:199) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) java.lang.OutOfMemoryError: Java heap space at
[jira] [Commented] (SPARK-3333) Large number of partitions causes OOM
[ https://issues.apache.org/jira/browse/SPARK-?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14116895#comment-14116895 ] Nicholas Chammas commented on SPARK-: - Note: I have not yet confirmed that 1.1.0-rc3 yields the exact same stack trace as the one provided above (which is for 1.1.0-rc2), though I expect them to be the same. I _can_ confirm that it takes a very, very long time to run, as it is running right now on rc3 and has been for about 45 minutes. Since I have to be offline for a bit, I thought I'd report this issue ASAP with the rc2 stack trace and update it later with a stack trace from rc3. Large number of partitions causes OOM - Key: SPARK- URL: https://issues.apache.org/jira/browse/SPARK- Project: Spark Issue Type: Bug Components: PySpark Affects Versions: 1.1.0 Environment: EC2; 1 master; 1 slave; {{m3.xlarge}} instances Reporter: Nicholas Chammas
Here’s a repro for PySpark:
{code}
a = sc.parallelize(["Nick", "John", "Bob"])
a = a.repartition(24000)
a.keyBy(lambda x: len(x)).reduceByKey(lambda x,y: x + y).take(1)
{code}
This code runs fine on 1.0.2. It returns the following result in just over a minute:
{code}
[(4, 'NickJohn')]
{code}
However, when I try this with 1.1.0-rc3 on an identically-sized cluster, it runs for a very, very long time (at least 45 min) and then fails with {{java.lang.OutOfMemoryError: Java heap space}}.
Here is a stack trace taken from a run on 1.1.0-rc2:
{code}
a = sc.parallelize(["Nick", "John", "Bob"])
a = a.repartition(24000)
a.keyBy(lambda x: len(x)).reduceByKey(lambda x, y: x + y).take(1)
14/08/29 21:53:40 WARN BlockManagerMasterActor: Removing BlockManager BlockManagerId(0, ip-10-138-29-167.ec2.internal, 46252, 0) with no recent heart beats: 175143ms exceeds 45000ms
14/08/29 21:53:50 WARN BlockManagerMasterActor: Removing BlockManager BlockManagerId(10, ip-10-138-18-106.ec2.internal, 33711, 0) with no recent heart beats: 175359ms exceeds 45000ms
14/08/29 21:54:02 WARN BlockManagerMasterActor: Removing BlockManager BlockManagerId(19, ip-10-139-36-207.ec2.internal, 52208, 0) with no recent heart beats: 173061ms exceeds 45000ms
14/08/29 21:54:13 WARN BlockManagerMasterActor: Removing BlockManager BlockManagerId(5, ip-10-73-142-70.ec2.internal, 56162, 0) with no recent heart beats: 176816ms exceeds 45000ms
14/08/29 21:54:22 WARN BlockManagerMasterActor: Removing BlockManager BlockManagerId(7, ip-10-236-145-200.ec2.internal, 40959, 0) with no recent heart beats: 182241ms exceeds 45000ms
14/08/29 21:54:40 WARN BlockManagerMasterActor: Removing BlockManager BlockManagerId(4, ip-10-139-1-195.ec2.internal, 49221, 0) with no recent heart beats: 178406ms exceeds 45000ms
14/08/29 21:54:41 ERROR Utils: Uncaught exception in thread Result resolver thread-3
java.lang.OutOfMemoryError: Java heap space
    at com.esotericsoftware.kryo.io.Input.readBytes(Input.java:296)
    at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ByteArraySerializer.read(DefaultArraySerializers.java:35)
    at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ByteArraySerializer.read(DefaultArraySerializers.java:18)
    at com.esotericsoftware.kryo.Kryo.readObjectOrNull(Kryo.java:699)
    at com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.read(FieldSerializer.java:611)
    at com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:221)
    at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:729)
    at org.apache.spark.serializer.KryoSerializerInstance.deserialize(KryoSerializer.scala:162)
    at org.apache.spark.scheduler.DirectTaskResult.value(TaskResult.scala:79)
    at org.apache.spark.scheduler.TaskSetManager.handleSuccessfulTask(TaskSetManager.scala:514)
    at org.apache.spark.scheduler.TaskSchedulerImpl.handleSuccessfulTask(TaskSchedulerImpl.scala:355)
    at org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply$mcV$sp(TaskResultGetter.scala:68)
    at org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply(TaskResultGetter.scala:47)
    at org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply(TaskResultGetter.scala:47)
    at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1311)
    at org.apache.spark.scheduler.TaskResultGetter$$anon$2.run(TaskResultGetter.scala:46)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Exception in thread "Result resolver thread-3" 14/08/29 21:56:26 ERROR SendingConnection: Exception while reading SendingConnection to
[jira] [Updated] (SPARK-3333) Large number of partitions causes OOM
[ https://issues.apache.org/jira/browse/SPARK-3333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nicholas Chammas updated SPARK-3333:

Description:
Here’s a repro for PySpark:
{code}
a = sc.parallelize(["Nick", "John", "Bob"])
a = a.repartition(24000)
a.keyBy(lambda x: len(x)).reduceByKey(lambda x, y: x + y).take(1)
{code}
This code runs fine on 1.0.2. It returns the following result in just over a minute:
{code}
[(4, 'NickJohn')]
{code}
However, when I try this with 1.1.0-rc3 on an identically-sized cluster, it runs for a very, very long time (at least 45 min) and then fails with {{java.lang.OutOfMemoryError: Java heap space}}.

Here is a stack trace taken from a run on 1.1.0-rc2:
{code}
a = sc.parallelize(["Nick", "John", "Bob"])
a = a.repartition(24000)
a.keyBy(lambda x: len(x)).reduceByKey(lambda x, y: x + y).take(1)
14/08/29 21:53:40 WARN BlockManagerMasterActor: Removing BlockManager BlockManagerId(0, ip-10-138-29-167.ec2.internal, 46252, 0) with no recent heart beats: 175143ms exceeds 45000ms
14/08/29 21:53:50 WARN BlockManagerMasterActor: Removing BlockManager BlockManagerId(10, ip-10-138-18-106.ec2.internal, 33711, 0) with no recent heart beats: 175359ms exceeds 45000ms
14/08/29 21:54:02 WARN BlockManagerMasterActor: Removing BlockManager BlockManagerId(19, ip-10-139-36-207.ec2.internal, 52208, 0) with no recent heart beats: 173061ms exceeds 45000ms
14/08/29 21:54:13 WARN BlockManagerMasterActor: Removing BlockManager BlockManagerId(5, ip-10-73-142-70.ec2.internal, 56162, 0) with no recent heart beats: 176816ms exceeds 45000ms
14/08/29 21:54:22 WARN BlockManagerMasterActor: Removing BlockManager BlockManagerId(7, ip-10-236-145-200.ec2.internal, 40959, 0) with no recent heart beats: 182241ms exceeds 45000ms
14/08/29 21:54:40 WARN BlockManagerMasterActor: Removing BlockManager BlockManagerId(4, ip-10-139-1-195.ec2.internal, 49221, 0) with no recent heart beats: 178406ms exceeds 45000ms
14/08/29 21:54:41 ERROR Utils: Uncaught exception in thread Result resolver thread-3
java.lang.OutOfMemoryError: Java heap space
    at com.esotericsoftware.kryo.io.Input.readBytes(Input.java:296)
    at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ByteArraySerializer.read(DefaultArraySerializers.java:35)
    at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ByteArraySerializer.read(DefaultArraySerializers.java:18)
    at com.esotericsoftware.kryo.Kryo.readObjectOrNull(Kryo.java:699)
    at com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.read(FieldSerializer.java:611)
    at com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:221)
    at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:729)
    at org.apache.spark.serializer.KryoSerializerInstance.deserialize(KryoSerializer.scala:162)
    at org.apache.spark.scheduler.DirectTaskResult.value(TaskResult.scala:79)
    at org.apache.spark.scheduler.TaskSetManager.handleSuccessfulTask(TaskSetManager.scala:514)
    at org.apache.spark.scheduler.TaskSchedulerImpl.handleSuccessfulTask(TaskSchedulerImpl.scala:355)
    at org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply$mcV$sp(TaskResultGetter.scala:68)
    at org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply(TaskResultGetter.scala:47)
    at org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply(TaskResultGetter.scala:47)
    at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1311)
    at org.apache.spark.scheduler.TaskResultGetter$$anon$2.run(TaskResultGetter.scala:46)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Exception in thread "Result resolver thread-3" 14/08/29 21:56:26 ERROR SendingConnection: Exception while reading SendingConnection to ConnectionManagerId(ip-10-73-142-223.ec2.internal,54014)
java.nio.channels.ClosedChannelException
    at sun.nio.ch.SocketChannelImpl.ensureReadOpen(SocketChannelImpl.java:252)
    at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:295)
    at org.apache.spark.network.SendingConnection.read(Connection.scala:390)
    at org.apache.spark.network.ConnectionManager$$anon$7.run(ConnectionManager.scala:199)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
java.lang.OutOfMemoryError: Java heap space
    at com.esotericsoftware.kryo.io.Input.readBytes(Input.java:296)
    at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ByteArraySerializer.read(DefaultArraySerializers.java:35)
    at
[jira] [Commented] (SPARK-3333) Large number of partitions causes OOM
[ https://issues.apache.org/jira/browse/SPARK-3333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14116902#comment-14116902 ]

Patrick Wendell commented on SPARK-3333:

Hey [~nchammas] - I don't think anything relevant to this issue has changed between RC2 and RC3, so the RC2 trace is probably sufficient.
[jira] [Updated] (SPARK-3332) Tagging is not atomic with launching instances on EC2
[ https://issues.apache.org/jira/browse/SPARK-3332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Patrick Wendell updated SPARK-3332:

Summary: Tagging is not atomic with launching instances on EC2 (was: Tags shouldn't be the main strategy for machine membership on clusters)

Tagging is not atomic with launching instances on EC2
Key: SPARK-3332
URL: https://issues.apache.org/jira/browse/SPARK-3332
Project: Spark
Issue Type: Bug
Components: EC2
Reporter: Allan Douglas R. de Oliveira

The implementation for SPARK-2333 changed the machine membership mechanism from security groups to tags. This is a fundamentally flawed strategy, as there is no guarantee that all the machines will have a tag (even with a retry mechanism). For instance, if the script is killed after launching the instances but before setting the tags, the machines will be invisible to a destroy command, leaving an unmanageable cluster behind.

The initial proposal is to go back to the previous behavior in all cases except when the new flag (--security-group-prefix) is used.

It is also worth mentioning that SPARK-3180 introduced the --additional-security-group flag, which is a reasonable solution to SPARK-2333 (but isn't a full replacement for all use cases of --security-group-prefix).

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-3332) Tagging is not atomic with launching instances on EC2
[ https://issues.apache.org/jira/browse/SPARK-3332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14116906#comment-14116906 ]

Patrick Wendell commented on SPARK-3332:

I changed the title slightly - I think the underlying problem here is that tagging is not atomic with launching instances. Removing the use of tagging entirely is one potential solution for this. Another is that we just print better errors if tagging does not succeed and explain there might be orphaned instances. The reason why SPARK-2333 was added is that the current approach can lead to too many security groups, so we can't just revert this without any cost.

In the meantime I'd like to revert SPARK-2333 in branch-1.1 so we can defer the design decision to the 1.2 release timeframe. Thanks [~douglaz] for highlighting this issue.
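To make the failure mode concrete, here is a minimal, self-contained sketch in plain Python. `FakeEC2`, `Killed`, `launch_cluster`, `find_by_tag` and `find_by_group` are hypothetical names invented for this illustration; they are not boto or spark_ec2 calls. The only point is that launching and tagging are two separate steps, so an interruption between them leaves instances that a tag-based lookup cannot see, while a lookup by the security group supplied at launch time still finds them.

```python
class Killed(Exception):
    """Stands in for the launch script being interrupted."""


class FakeEC2:
    """Hypothetical in-memory stand-in for an EC2 API (not boto)."""

    def __init__(self):
        self.instances = {}  # instance id -> {"group": str, "tags": dict}
        self._next_id = 0

    def run_instances(self, count, group):
        # The security group is part of the launch request itself.
        ids = []
        for _ in range(count):
            iid = "i-%04d" % self._next_id
            self._next_id += 1
            self.instances[iid] = {"group": group, "tags": {}}
            ids.append(iid)
        return ids

    def create_tags(self, ids, tags):
        # Tagging is a second, separate request.
        for iid in ids:
            self.instances[iid]["tags"].update(tags)

    def find_by_tag(self, key, value):
        return sorted(i for i, d in self.instances.items()
                      if d["tags"].get(key) == value)

    def find_by_group(self, group):
        return sorted(i for i, d in self.instances.items()
                      if d["group"] == group)


def launch_cluster(ec2, name, count, die_before_tagging=False):
    ids = ec2.run_instances(count, group=name + "-slaves")
    if die_before_tagging:
        # Simulates the script being killed between the two steps.
        raise Killed("killed after launch, before tagging")
    ec2.create_tags(ids, {"cluster": name})
    return ids


ec2 = FakeEC2()
try:
    launch_cluster(ec2, "demo", count=3, die_before_tagging=True)
except Killed:
    pass

visible_by_tag = ec2.find_by_tag("cluster", "demo")
visible_by_group = ec2.find_by_group("demo-slaves")
print("visible via tag:", visible_by_tag)
print("visible via security group:", visible_by_group)
```

In this toy model a destroy command that enumerates machines by tag would find nothing to terminate, which is the orphaned-instance situation described in the report.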
[jira] [Updated] (SPARK-3332) Tagging is not atomic with launching instances on EC2
[ https://issues.apache.org/jira/browse/SPARK-3332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Patrick Wendell updated SPARK-3332:

Target Version/s: 1.2.0
[jira] [Commented] (SPARK-3332) Tagging is not atomic with launching instances on EC2
[ https://issues.apache.org/jira/browse/SPARK-3332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14116910#comment-14116910 ]

Apache Spark commented on SPARK-3332:

User 'JoshRosen' has created a pull request for this issue: https://github.com/apache/spark/pull/2225
[jira] [Updated] (SPARK-2870) Thorough schema inference directly on RDDs of Python dictionaries
[ https://issues.apache.org/jira/browse/SPARK-2870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Armbrust updated SPARK-2870:

Target Version/s: 1.2.0

Thorough schema inference directly on RDDs of Python dictionaries
Key: SPARK-2870
URL: https://issues.apache.org/jira/browse/SPARK-2870
Project: Spark
Issue Type: Improvement
Components: PySpark, SQL
Reporter: Nicholas Chammas

h4. Background

I love the {{SQLContext.jsonRDD()}} and {{SQLContext.jsonFile()}} methods. They process JSON text directly and infer a schema that covers the entire source data set. This is very important with semi-structured data like JSON, since individual elements in the data set are free to have different structures. Matching fields across elements may even have different value types. For example:
{code}
{"a": 5}
{"a": "cow"}
{code}
To get a queryable schema that covers the whole data set, you need to infer a schema by looking at the whole data set. The aforementioned {{SQLContext.json...()}} methods do this very well.

h4. Feature Request

What we need is for {{SQLContext.inferSchema()}} to do this, too. Alternatively, we need a new {{SQLContext}} method that works on RDDs of Python dictionaries and does something functionally equivalent to this:
{code}
SQLContext.jsonRDD(RDD[dict].map(lambda x: json.dumps(x)))
{code}
As of 1.0.2, [{{inferSchema()}}|http://spark.apache.org/docs/latest/api/python/pyspark.sql.SQLContext-class.html#inferSchema] just looks at the first element in the data set. This won't help much when the structure of the elements in the target RDD is variable.

h4. Example Use Case

* You have some JSON text data that you want to analyze using Spark SQL.
* You would use one of the {{SQLContext.json...()}} methods, but you need to do some filtering on the data first to remove bad elements--basically, some minimal schema validation.
* You deserialize the JSON objects to Python {{dict}}s and filter out the bad ones. You now have an RDD of dictionaries.
* From this RDD, you want a SchemaRDD that captures the schema for the whole data set.
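What "a schema that covers the whole data set" means can be sketched without Spark at all. The snippet below is only an illustration: `infer_field_types` and `merge_schemas` are hypothetical helper names, and the rule that conflicting types widen to a string is an assumption made for the example, not a description of how {{SQLContext.jsonRDD()}} resolves conflicts.

```python
from functools import reduce


def infer_field_types(record):
    """Map each field of one dict to the name of its Python type."""
    return {key: type(value).__name__ for key, value in record.items()}


def merge_schemas(left, right):
    """Union two per-record schemas.

    Assumed rule for this sketch: when the same field has two different
    types, fall back to 'str'.
    """
    merged = dict(left)
    for key, type_name in right.items():
        if key in merged and merged[key] != type_name:
            merged[key] = "str"
        else:
            merged[key] = type_name
    return merged


records = [{"a": 5}, {"a": "cow"}, {"b": 1.5}]

# Looking only at the first element misses field "b" and the type conflict on "a".
first_only = infer_field_types(records[0])

# Folding over every element yields a schema that covers the whole data set.
whole_set = reduce(merge_schemas, map(infer_field_types, records), {})

print(first_only)
print(whole_set)
```

On an RDD of dictionaries the same two helpers could presumably be applied as `rdd.map(infer_field_types).reduce(merge_schemas)`, since the merge does not depend on the order in which records are combined.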
[jira] [Commented] (SPARK-2870) Thorough schema inference directly on RDDs of Python dictionaries
[ https://issues.apache.org/jira/browse/SPARK-2870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14116915#comment-14116915 ]

Michael Armbrust commented on SPARK-2870:

Yeah, thanks for pinging me. I've targeted this JIRA for 1.2.
[jira] [Closed] (SPARK-3317) The loss of regularization in Updater should use the oldWeights
[ https://issues.apache.org/jira/browse/SPARK-3317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

DB Tsai closed SPARK-3317.

Resolution: Won't Fix

The loss of regularization in Updater should use the oldWeights
Key: SPARK-3317
URL: https://issues.apache.org/jira/browse/SPARK-3317
Project: Spark
Issue Type: Bug
Components: MLlib
Reporter: DB Tsai

The current loss of the regularization is computed from the newWeights which is not correct. The loss, R(w) = 1/2 ||w||^2 should be computed with the oldWeights.
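For readers unfamiliar with the distinction, the sketch below performs one gradient step with L2 regularization in plain Python and evaluates R(w) = 1/2 ||w||^2 at both the old and the updated weights. The function names, step size and regularization parameter are made up for illustration; this is not MLlib's Updater code, and it takes no position on which value the Updater ought to report.

```python
def l2_loss(weights):
    """R(w) = 1/2 * ||w||^2."""
    return 0.5 * sum(w * w for w in weights)


def l2_step(old_weights, gradient, step_size, reg_param):
    """One gradient step on loss + reg_param * R(w); returns the new weights."""
    return [
        w - step_size * (g + reg_param * w)
        for w, g in zip(old_weights, gradient)
    ]


old_weights = [3.0, 4.0]
gradient = [1.0, -2.0]
new_weights = l2_step(old_weights, gradient, step_size=0.5, reg_param=1.0)

print("new weights:", new_weights)
print("R(old weights):", l2_loss(old_weights))
print("R(new weights):", l2_loss(new_weights))
```

With these made-up numbers the two candidate regularization values are 12.5 (old weights) and 5.0 (new weights), so the choice visibly changes the reported loss for that iteration.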
[jira] [Commented] (SPARK-3333) Large number of partitions causes OOM
[ https://issues.apache.org/jira/browse/SPARK-3333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14116923#comment-14116923 ]

Matei Zaharia commented on SPARK-3333:

The slowdown might be partly due to adding external spilling in Python, but it's weird that this would crash the driver.
[jira] [Resolved] (SPARK-3320) Batched in-memory column buffer building doesn't work for SchemaRDDs with empty partitions
[ https://issues.apache.org/jira/browse/SPARK-3320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Armbrust resolved SPARK-3320.

Resolution: Fixed
Fix Version/s: 1.1.0
Assignee: Cheng Lian

Batched in-memory column buffer building doesn't work for SchemaRDDs with empty partitions
Key: SPARK-3320
URL: https://issues.apache.org/jira/browse/SPARK-3320
Project: Spark
Issue Type: Bug
Components: SQL
Affects Versions: 1.0.2
Reporter: Cheng Lian
Assignee: Cheng Lian
Priority: Blocker
Fix For: 1.1.0

The empty partition iterator is not properly handled in [#1880|https://github.com/apache/spark/pull/1880/files#diff-b47dac3d98014877d5879f5cf37ab0d1R115], which throws an exception when an empty partition of the target SchemaRDD is accessed.
[jira] [Comment Edited] (SPARK-3333) Large number of partitions causes OOM
[ https://issues.apache.org/jira/browse/SPARK-3333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14116935#comment-14116935 ]

Davies Liu edited comment on SPARK-3333 at 8/31/14 11:26 PM:

[~matei] I think this is not related to external spilling in Python, because the dataset is too small to trigger spilling in Python. Also, this slowdown can be reproduced in Scala, for example:

val rdd = sc.parallelize(1 to 10)
rdd.repartition(24000).keyBy(x => x).reduceByKey(_ + _).collect()

The second stage (24000 tasks) will take 105 mins (not finished yet).

PS: I am running on master.

was (Author: davies):
[~matei] I think this is not related to external spilling in Python, because the dataset is too small to trigger spilling in Python. Also, this slowdown can be reproduced in Scala, for example:

val rdd = sc.parallelize(1 to 10)
rdd.repartition(24000).keyBy(x => x).reduceByKey(_ + _).collect()

The second stage (24000 tasks) will take 105 mins (not finished yet).

Large number of partitions causes OOM
Key: SPARK-3333
URL: https://issues.apache.org/jira/browse/SPARK-3333
Project: Spark
Issue Type: Bug
Components: PySpark
Affects Versions: 1.1.0
Environment: EC2; 1 master; 1 slave; {{m3.xlarge}} instances
Reporter: Nicholas Chammas

Here’s a repro for PySpark:
{code}
a = sc.parallelize(["Nick", "John", "Bob"])
a = a.repartition(24000)
a.keyBy(lambda x: len(x)).reduceByKey(lambda x, y: x + y).take(1)
{code}
This code runs fine on 1.0.2. It returns the following result in just over a minute:
{code}
[(4, 'NickJohn')]
{code}
However, when I try this with 1.1.0-rc3 on an identically-sized cluster, it runs for a very, very long time (at least 45 min) and then fails with {{java.lang.OutOfMemoryError: Java heap space}}.
Here is a stack trace taken from a run on 1.1.0-rc2: {code} a = sc.parallelize(["Nick", "John", "Bob"]) a = a.repartition(24000) a.keyBy(lambda x: len(x)).reduceByKey(lambda x,y: x + y).take(1) 14/08/29 21:53:40 WARN BlockManagerMasterActor: Removing BlockManager BlockManagerId(0, ip-10-138-29-167.ec2.internal, 46252, 0) with no recent heart beats: 175143ms exceeds 45000ms 14/08/29 21:53:50 WARN BlockManagerMasterActor: Removing BlockManager BlockManagerId(10, ip-10-138-18-106.ec2.internal, 33711, 0) with no recent heart beats: 175359ms exceeds 45000ms 14/08/29 21:54:02 WARN BlockManagerMasterActor: Removing BlockManager BlockManagerId(19, ip-10-139-36-207.ec2.internal, 52208, 0) with no recent heart beats: 173061ms exceeds 45000ms 14/08/29 21:54:13 WARN BlockManagerMasterActor: Removing BlockManager BlockManagerId(5, ip-10-73-142-70.ec2.internal, 56162, 0) with no recent heart beats: 176816ms exceeds 45000ms 14/08/29 21:54:22 WARN BlockManagerMasterActor: Removing BlockManager BlockManagerId(7, ip-10-236-145-200.ec2.internal, 40959, 0) with no recent heart beats: 182241ms exceeds 45000ms 14/08/29 21:54:40 WARN BlockManagerMasterActor: Removing BlockManager BlockManagerId(4, ip-10-139-1-195.ec2.internal, 49221, 0) with no recent heart beats: 178406ms exceeds 45000ms 14/08/29 21:54:41 ERROR Utils: Uncaught exception in thread Result resolver thread-3 java.lang.OutOfMemoryError: Java heap space at com.esotericsoftware.kryo.io.Input.readBytes(Input.java:296) at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ByteArraySerializer.read(DefaultArraySerializers.java:35) at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ByteArraySerializer.read(DefaultArraySerializers.java:18) at com.esotericsoftware.kryo.Kryo.readObjectOrNull(Kryo.java:699) at com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.read(FieldSerializer.java:611) at com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:221) at 
com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:729) at org.apache.spark.serializer.KryoSerializerInstance.deserialize(KryoSerializer.scala:162) at org.apache.spark.scheduler.DirectTaskResult.value(TaskResult.scala:79) at org.apache.spark.scheduler.TaskSetManager.handleSuccessfulTask(TaskSetManager.scala:514) at org.apache.spark.scheduler.TaskSchedulerImpl.handleSuccessfulTask(TaskSchedulerImpl.scala:355) at org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply$mcV$sp(TaskResultGetter.scala:68) at org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply(TaskResultGetter.scala:47) at org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply(TaskResultGetter.scala:47) at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1311) at
[jira] [Comment Edited] (SPARK-3333) Large number of partitions causes OOM
[ https://issues.apache.org/jira/browse/SPARK-3333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14116935#comment-14116935 ] Davies Liu edited comment on SPARK-3333 at 8/31/14 11:27 PM: - [~matei] I think this is not related to external spilling in Python, because the dataset is too small to trigger spilling in Python. Also, this slowdown can be reproduced in Scala, for example: val rdd = sc.parallelize(1 to 10); rdd.repartition(24000).keyBy(x => x).reduceByKey(_ + _).collect() The second stage (24000 tasks) has been running for 105 mins (not finished yet). The CPU usage of the JVM is 250% and its memory usage is about 655M; it may trigger an OOM somewhere. PS: I am running on master. was (Author: davies): [~matei] I think this is not related to external spilling in Python, because the dataset is too small to trigger spilling in Python. Also, this slowdown can be reproduced in Scala, for example: val rdd = sc.parallelize(1 to 10); rdd.repartition(24000).keyBy(x => x).reduceByKey(_ + _).collect() The second stage (24000 tasks) has been running for 105 mins (not finished yet). PS: I am running on master.
[jira] [Commented] (SPARK-3333) Large number of partitions causes OOM
[ https://issues.apache.org/jira/browse/SPARK-3333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14116936#comment-14116936 ] Josh Rosen commented on SPARK-3333: --- I agree with Davies; I think this is a more general Spark issue, perhaps related to {{repartition()}}. I just tried testing this locally with commit eff9714e1c88e39e28317358ca9ec87677f121dc, which is the commit immediately prior to [14174abd421318e71c16edd24224fd5094bdfed4|https://github.com/apache/spark/commit/14174abd421318e71c16edd24224fd5094bdfed4], Davies' patch that adds hash-based disk spilling aggregation to PySpark, and I still saw the same slowdown there.
[jira] [Commented] (SPARK-3333) Large number of partitions causes OOM
[ https://issues.apache.org/jira/browse/SPARK-3333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14116942#comment-14116942 ] Matei Zaharia commented on SPARK-3333: -- I see, that makes sense. Large number of partitions causes OOM - Key: SPARK-3333 URL: https://issues.apache.org/jira/browse/SPARK-3333 Project: Spark Issue Type: Bug Components: PySpark Affects Versions: 1.1.0 Environment: EC2; 1 master; 1 slave; {{m3.xlarge}} instances Reporter: Nicholas Chammas Here’s a repro for PySpark: {code} a = sc.parallelize(["Nick", "John", "Bob"]) a = a.repartition(24000) a.keyBy(lambda x: len(x)).reduceByKey(lambda x,y: x + y).take(1) {code} This code runs fine on 1.0.2. It returns the following result in just over a minute: {code} [(4, 'NickJohn')] {code} However, when I try this with 1.1.0-rc3 on an identically-sized cluster, it runs for a very, very long time (at least 45 min) and then fails with {{java.lang.OutOfMemoryError: Java heap space}}. Here is a stack trace taken from a run on 1.1.0-rc2: {code} a = sc.parallelize(["Nick", "John", "Bob"]) a = a.repartition(24000) a.keyBy(lambda x: len(x)).reduceByKey(lambda x,y: x + y).take(1) 14/08/29 21:53:40 WARN BlockManagerMasterActor: Removing BlockManager BlockManagerId(0, ip-10-138-29-167.ec2.internal, 46252, 0) with no recent heart beats: 175143ms exceeds 45000ms 14/08/29 21:53:50 WARN BlockManagerMasterActor: Removing BlockManager BlockManagerId(10, ip-10-138-18-106.ec2.internal, 33711, 0) with no recent heart beats: 175359ms exceeds 45000ms 14/08/29 21:54:02 WARN BlockManagerMasterActor: Removing BlockManager BlockManagerId(19, ip-10-139-36-207.ec2.internal, 52208, 0) with no recent heart beats: 173061ms exceeds 45000ms 14/08/29 21:54:13 WARN BlockManagerMasterActor: Removing BlockManager BlockManagerId(5, ip-10-73-142-70.ec2.internal, 56162, 0) with no recent heart beats: 176816ms exceeds 45000ms 14/08/29 21:54:22 WARN BlockManagerMasterActor: Removing BlockManager BlockManagerId(7, 
ip-10-236-145-200.ec2.internal, 40959, 0) with no recent heart beats: 182241ms exceeds 45000ms 14/08/29 21:54:40 WARN BlockManagerMasterActor: Removing BlockManager BlockManagerId(4, ip-10-139-1-195.ec2.internal, 49221, 0) with no recent heart beats: 178406ms exceeds 45000ms 14/08/29 21:54:41 ERROR Utils: Uncaught exception in thread Result resolver thread-3 java.lang.OutOfMemoryError: Java heap space at com.esotericsoftware.kryo.io.Input.readBytes(Input.java:296) at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ByteArraySerializer.read(DefaultArraySerializers.java:35) at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ByteArraySerializer.read(DefaultArraySerializers.java:18) at com.esotericsoftware.kryo.Kryo.readObjectOrNull(Kryo.java:699) at com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.read(FieldSerializer.java:611) at com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:221) at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:729) at org.apache.spark.serializer.KryoSerializerInstance.deserialize(KryoSerializer.scala:162) at org.apache.spark.scheduler.DirectTaskResult.value(TaskResult.scala:79) at org.apache.spark.scheduler.TaskSetManager.handleSuccessfulTask(TaskSetManager.scala:514) at org.apache.spark.scheduler.TaskSchedulerImpl.handleSuccessfulTask(TaskSchedulerImpl.scala:355) at org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply$mcV$sp(TaskResultGetter.scala:68) at org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply(TaskResultGetter.scala:47) at org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply(TaskResultGetter.scala:47) at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1311) at org.apache.spark.scheduler.TaskResultGetter$$anon$2.run(TaskResultGetter.scala:46) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) Exception in thread Result resolver thread-3 14/08/29 21:56:26 ERROR SendingConnection: Exception while reading SendingConnection to ConnectionManagerId(ip-10-73-142-223.ec2.internal,54014) java.nio.channels.ClosedChannelException at sun.nio.ch.SocketChannelImpl.ensureReadOpen(SocketChannelImpl.java:252) at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:295) at org.apache.spark.network.SendingConnection.read(Connection.scala:390) at org.apache.spark.network.ConnectionManager$$anon$7.run(ConnectionManager.scala:199) at
[jira] [Commented] (SPARK-3333) Large number of partitions causes OOM
[ https://issues.apache.org/jira/browse/SPARK-3333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14116948#comment-14116948 ] Josh Rosen commented on SPARK-3333: --- I am doing some manual bisecting to find the patch that introduced the slowdown. It's still slow as early as 8d338f64c4eda45d22ae33f61ef7928011cc2846.
[jira] [Commented] (SPARK-3332) Tagging is not atomic with launching instances on EC2
[ https://issues.apache.org/jira/browse/SPARK-3332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14116950#comment-14116950 ] Allan Douglas R. de Oliveira commented on SPARK-3332: - [~pwendell], yes, this is a good reword and I agree to revert it for now. You mentioned the two potential solutions, but I think a good compromise is the one implemented by the PR, which keeps the flag allowing reuse of the same security group while also allowing the machines to be matched by security group in the other cases. Tagging is not atomic with launching instances on EC2 - Key: SPARK-3332 URL: https://issues.apache.org/jira/browse/SPARK-3332 Project: Spark Issue Type: Bug Components: EC2 Reporter: Allan Douglas R. de Oliveira The implementation for SPARK-2333 changed the machine membership mechanism from security groups to tags. This is a fundamentally flawed strategy, as there is no guarantee that all the machines will have a tag (even with a retry mechanism). For instance, if the script is killed after launching the instances but before setting the tags, the machines will be invisible to a destroy command, leaving an unmanageable cluster behind. The initial proposal is to go back to the previous behavior in all cases except when the new flag (--security-group-prefix) is used. It is also worth mentioning that SPARK-3180 introduced the --additional-security-group flag, which is a reasonable solution to SPARK-2333 (but isn't a full replacement for all use cases of --security-group-prefix).
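The failure mode described in the issue can be illustrated with a small in-memory model. This is a sketch only: `Instance`, `tag_all`, `find_by_tag`, and `find_by_security_group` are hypothetical names invented for illustration and do not correspond to the spark-ec2 script or to any EC2 client API. The group name `my-cluster-slaves` and the `cluster` tag key are likewise assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Instance:
    instance_id: str
    security_groups: list
    tags: dict = field(default_factory=dict)


def launch(n, group):
    # Security group membership is fixed at launch time.
    return [Instance("i-%04d" % i, [group]) for i in range(n)]


def tag_all(instances, cluster, fail_after=None):
    # Tagging is a separate step after launch; fail_after simulates
    # the script being killed before every machine has been tagged.
    for idx, inst in enumerate(instances):
        if fail_after is not None and idx >= fail_after:
            return
        inst.tags["cluster"] = cluster


def find_by_tag(instances, cluster):
    return [i for i in instances if i.tags.get("cluster") == cluster]


def find_by_security_group(instances, group):
    return [i for i in instances if group in i.security_groups]


fleet = launch(4, "my-cluster-slaves")
tag_all(fleet, "my-cluster", fail_after=1)  # killed after tagging one machine

by_tag = find_by_tag(fleet, "my-cluster")
by_group = find_by_security_group(fleet, "my-cluster-slaves")
print(len(by_tag), len(by_group))
```

In this model a destroy command that looks machines up by tag sees only one of the four instances, while a lookup by security group still sees all four, because the group is assigned in the same step as the launch.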
[jira] [Comment Edited] (SPARK-3332) Tagging is not atomic with launching instances on EC2
[ https://issues.apache.org/jira/browse/SPARK-3332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14116950#comment-14116950 ] Allan Douglas R. de Oliveira edited comment on SPARK-3332 at 9/1/14 12:49 AM: -- [~pwendell], yes, this is a good reword and I agree to revert it for now. You mentioned the two potential solutions, but I think a good compromise is the one implemented by the PR, which keeps the flag allowing reuse of the same security group while also allowing the machines to be matched by security group in the other cases. Perhaps more messages could be added when the flag has been used and the tagging failed. was (Author: douglaz): [~pwendell], yes, this is a good reword and I agree to revert it for now. You mentioned the two potential solutions, but I think a good compromise is the one implemented by the PR, which keeps the flag allowing reuse of the same security group while also allowing the machines to be matched by security group in the other cases.
[jira] [Commented] (SPARK-3333) Large number of partitions causes OOM
[ https://issues.apache.org/jira/browse/SPARK-3333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14116969#comment-14116969 ] Josh Rosen commented on SPARK-3333: --- Looks like the issue was introduced somewhere between 273afcb and 62d4a0f.
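The manual bisection mentioned in this thread amounts to a binary search over the commit history. Below is a minimal Python sketch under stated assumptions: the intermediate commit names (`c1` to `c4`) and the `is_slow` predicate are hypothetical placeholders, since in practice each probe means building that commit and timing the repro. Only the endpoints 273afcb and 62d4a0f come from the comment.

```python
def first_bad(commits, is_bad):
    """Return the first commit for which is_bad is true.

    Assumes commits are in chronological order, commits[0] is known good,
    commits[-1] is known bad, and the regression persists once introduced.
    """
    lo, hi = 0, len(commits) - 1  # invariant: lo is good, hi is bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid
        else:
            lo = mid
    return commits[hi]


# Hypothetical history: only the two endpoints appear in the thread.
history = ["273afcb", "c1", "c2", "c3", "c4", "62d4a0f"]
slow = {"c3", "c4", "62d4a0f"}  # hypothetical test outcomes


def is_slow(commit):
    return commit in slow


culprit = first_bad(history, is_slow)
print(culprit)
```

Each probe halves the remaining range, so a range of N commits needs about log2(N) builds. `git bisect` automates the same search.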
[jira] [Commented] (SPARK-3168) The ServletContextHandler of webui lacks a SessionManager
[ https://issues.apache.org/jira/browse/SPARK-3168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14116974#comment-14116974 ] meiyoula commented on SPARK-3168: - To use CAS for single sign-on, I added some web UI filters to the configuration. For the History Server, I set the following configuration: export SPARK_HISTORY_OPTS="$SPARK_HISTORY_OPTS -Dspark.ui.filters=org.jasig.cas.client.authentication.Saml11AuthenticationFilter,org.jasig.cas.client.validation.Saml11TicketValidationFilter,org.jasig.cas.client.util.HttpServletRequestWrapperFilter -Dspark.org.jasig.cas.client.authentication.Saml11AuthenticationFilter.params=casServerLoginUrl=https://9.91.11.120:8443/cas/login,serverName=http://9.91.11.171:18080 -Dspark.org.jasig.cas.client.validation.Saml11TicketValidationFilter.params=casServerUrlPrefix=https://9.91.11.120:8443/cas/,serverName=http://9.91.11.171:18080,hostnameVerifier=org.jasig.cas.client.ssl.AnyHostnameVerifier" The ServletContextHandler of webui lacks a SessionManager - Key: SPARK-3168 URL: https://issues.apache.org/jira/browse/SPARK-3168 Project: Spark Issue Type: Bug Components: Spark Core Environment: CAS Reporter: meiyoula When I use CAS to implement single sign-on for the web UI, an exception occurs: {code} WARN [qtp1076146544-24] / org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:561) java.lang.IllegalStateException: No SessionManager at org.eclipse.jetty.server.Request.getSession(Request.java:1269) at org.eclipse.jetty.server.Request.getSession(Request.java:1248) at org.jasig.cas.client.validation.AbstractTicketValidationFilter.doFilter(AbstractTicketValidationFilter.java:178) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1467) at org.jasig.cas.client.authentication.AuthenticationFilter.doFilter(AuthenticationFilter.java:116) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1467) at 
org.jasig.cas.client.session.SingleSignOutFilter.doFilter(SingleSignOutFilter.java:76) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1467) at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:499) at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1086) at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:428) at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1020) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135) at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255) at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116) at org.eclipse.jetty.server.Server.handle(Server.java:370) at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:494) at org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:971) at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1033) at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:644) at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235) at org.eclipse.jetty.server.AsyncHttpConnection.handle(AsyncHttpConnection.java:82) at org.eclipse.jetty.io.nio.SelectChannelEndPoint.handle(SelectChannelEndPoint.java:667) at org.eclipse.jetty.io.nio.SelectChannelEndPoint$1.run(SelectChannelEndPoint.java:52) at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608) at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543) at java.lang.Thread.run(Thread.java:744) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
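The options in the SPARK-3168 comment follow a pattern: {{spark.ui.filters}} takes a comma-separated list of filter class names, and each filter's settings go into a {{spark.<filter class>.params}} property as comma-separated key=value pairs. A minimal Python sketch of assembling such an option string might look like this. The helper name build_filter_opts and the example.com hosts are hypothetical; only the property naming is taken from the comment.

```python
def build_filter_opts(filters):
    """Build JVM -D options for Spark web UI servlet filters.

    filters: list of (filter_class_name, params_dict) pairs, in filter order.
    """
    # One option listing all filter classes, comma-separated.
    opts = ["-Dspark.ui.filters=" + ",".join(name for name, _ in filters)]
    # One option per filter that has parameters:
    # spark.<class>.params=key1=value1,key2=value2
    for name, params in filters:
        if params:
            joined = ",".join("{}={}".format(k, v) for k, v in params.items())
            opts.append("-Dspark.{}.params={}".format(name, joined))
    return " ".join(opts)


# Hypothetical hosts; substitute the real CAS server and History Server URLs.
cas_filters = [
    ("org.jasig.cas.client.authentication.Saml11AuthenticationFilter",
     {"casServerLoginUrl": "https://cas.example.com:8443/cas/login",
      "serverName": "http://history.example.com:18080"}),
    ("org.jasig.cas.client.util.HttpServletRequestWrapperFilter", {}),
]
history_opts = build_filter_opts(cas_filters)
print(history_opts)
```

The resulting string would be appended to SPARK_HISTORY_OPTS inside double quotes so that the shell treats all of the options as one value.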
[jira] [Commented] (SPARK-3333) Large number of partitions causes OOM
[ https://issues.apache.org/jira/browse/SPARK-?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14116978#comment-14116978 ]

Nicholas Chammas commented on SPARK-:

For the record, I got the OOM in a relatively short amount of time (less than 30 min) on a 1.1.0-rc2 EC2 cluster with 20 {{m1.xlarge}} slaves. Perhaps one of y'all can replicate the OOM with that kind of environment.

Large number of partitions causes OOM

Key: SPARK-
URL: https://issues.apache.org/jira/browse/SPARK-
Project: Spark
Issue Type: Bug
Components: PySpark
Affects Versions: 1.1.0
Environment: EC2; 1 master; 1 slave; {{m3.xlarge}} instances
Reporter: Nicholas Chammas

Here’s a repro for PySpark:
{code}
a = sc.parallelize(["Nick", "John", "Bob"])
a = a.repartition(24000)
a.keyBy(lambda x: len(x)).reduceByKey(lambda x,y: x + y).take(1)
{code}
This code runs fine on 1.0.2. It returns the following result in just over a minute:
{code}
[(4, 'NickJohn')]
{code}
However, when I try this with 1.1.0-rc3 on an identically-sized cluster, it runs for a very, very long time (at least 45 min) and then fails with {{java.lang.OutOfMemoryError: Java heap space}}.

Here is a stack trace taken from a run on 1.1.0-rc2:
{code}
a = sc.parallelize(["Nick", "John", "Bob"])
a = a.repartition(24000)
a.keyBy(lambda x: len(x)).reduceByKey(lambda x,y: x + y).take(1)
14/08/29 21:53:40 WARN BlockManagerMasterActor: Removing BlockManager BlockManagerId(0, ip-10-138-29-167.ec2.internal, 46252, 0) with no recent heart beats: 175143ms exceeds 45000ms
14/08/29 21:53:50 WARN BlockManagerMasterActor: Removing BlockManager BlockManagerId(10, ip-10-138-18-106.ec2.internal, 33711, 0) with no recent heart beats: 175359ms exceeds 45000ms
14/08/29 21:54:02 WARN BlockManagerMasterActor: Removing BlockManager BlockManagerId(19, ip-10-139-36-207.ec2.internal, 52208, 0) with no recent heart beats: 173061ms exceeds 45000ms
14/08/29 21:54:13 WARN BlockManagerMasterActor: Removing BlockManager BlockManagerId(5, ip-10-73-142-70.ec2.internal, 56162, 0) with no recent heart beats: 176816ms exceeds 45000ms
14/08/29 21:54:22 WARN BlockManagerMasterActor: Removing BlockManager BlockManagerId(7, ip-10-236-145-200.ec2.internal, 40959, 0) with no recent heart beats: 182241ms exceeds 45000ms
14/08/29 21:54:40 WARN BlockManagerMasterActor: Removing BlockManager BlockManagerId(4, ip-10-139-1-195.ec2.internal, 49221, 0) with no recent heart beats: 178406ms exceeds 45000ms
14/08/29 21:54:41 ERROR Utils: Uncaught exception in thread Result resolver thread-3
java.lang.OutOfMemoryError: Java heap space
    at com.esotericsoftware.kryo.io.Input.readBytes(Input.java:296)
    at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ByteArraySerializer.read(DefaultArraySerializers.java:35)
    at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ByteArraySerializer.read(DefaultArraySerializers.java:18)
    at com.esotericsoftware.kryo.Kryo.readObjectOrNull(Kryo.java:699)
    at com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.read(FieldSerializer.java:611)
    at com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:221)
    at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:729)
    at org.apache.spark.serializer.KryoSerializerInstance.deserialize(KryoSerializer.scala:162)
    at org.apache.spark.scheduler.DirectTaskResult.value(TaskResult.scala:79)
    at org.apache.spark.scheduler.TaskSetManager.handleSuccessfulTask(TaskSetManager.scala:514)
    at org.apache.spark.scheduler.TaskSchedulerImpl.handleSuccessfulTask(TaskSchedulerImpl.scala:355)
    at org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply$mcV$sp(TaskResultGetter.scala:68)
    at org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply(TaskResultGetter.scala:47)
    at org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply(TaskResultGetter.scala:47)
    at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1311)
    at org.apache.spark.scheduler.TaskResultGetter$$anon$2.run(TaskResultGetter.scala:46)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Exception in thread Result resolver thread-3
14/08/29 21:56:26 ERROR SendingConnection: Exception while reading SendingConnection to ConnectionManagerId(ip-10-73-142-223.ec2.internal,54014)
java.nio.channels.ClosedChannelException
    at sun.nio.ch.SocketChannelImpl.ensureReadOpen(SocketChannelImpl.java:252)
    at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:295)
    at org.apache.spark.network.SendingConnection.read(Connection.scala:390)
{code}
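The repro's logic can be sanity-checked without a cluster. What follows is a plain-Python model of what keyBy and reduceByKey compute for the three names, not Spark code: key_by and reduce_by_key are hypothetical stand-ins written for illustration, and they ignore partitioning entirely.

```python
def key_by(items, key_func):
    """Pair every item with its key, like RDD.keyBy does conceptually."""
    return [(key_func(x), x) for x in items]


def reduce_by_key(pairs, func):
    """Merge the values of equal keys with func, keeping first-seen key order."""
    merged = {}
    for key, value in pairs:
        merged[key] = func(merged[key], value) if key in merged else value
    return list(merged.items())


names = ["Nick", "John", "Bob"]
result = reduce_by_key(key_by(names, len), lambda x, y: x + y)
print(result)  # [(4, 'NickJohn'), (3, 'Bob')]
```

The job itself is tiny: two keys and three values. That suggests the slowdown comes from handling 24000 partitions rather than from the data.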
[jira] [Comment Edited] (SPARK-3333) Large number of partitions causes OOM
[ https://issues.apache.org/jira/browse/SPARK-?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14116978#comment-14116978 ]

Nicholas Chammas edited comment on SPARK- at 9/1/14 2:09 AM:

For the record, I got the OOM in [my original report|http://apache-spark-user-list.1001560.n3.nabble.com/PySpark-large-of-partitions-causes-OOM-td13155.html] (which I duplicated here in this JIRA) in a relatively short amount of time (less than 30 min) on a 1.1.0-rc2 EC2 cluster with 20 {{m1.xlarge}} slaves. Perhaps one of y'all can replicate the OOM with that kind of environment.
[jira] [Commented] (SPARK-3333) Large number of partitions causes OOM
[ https://issues.apache.org/jira/browse/SPARK-?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14116985#comment-14116985 ]

Josh Rosen commented on SPARK-:

I'll resume work on this later tonight, but just wanted to note that things run fast as recently as commit 5ad5e34 and slow down as long ago as 6587ef7.
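Narrowing a regression between a known-fast and a known-slow commit is a binary search over the history. The sketch below assumes an ordered commit list (oldest first) and that the job stays slow once it has become slow. Only the two endpoint hashes come from the comment; the intermediate commit names and the is_slow check are placeholders.

```python
def first_slow_commit(commits, is_slow):
    """Return the earliest slow commit.

    Assumes commits[0] is known fast, commits[-1] is known slow, and the
    job never becomes fast again after the regression was introduced.
    """
    lo, hi = 0, len(commits) - 1  # lo: known fast, hi: known slow
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_slow(commits[mid]):
            hi = mid  # regression is at mid or earlier
        else:
            lo = mid  # regression is after mid
    return commits[hi]


# Placeholder history between the two commits named in the comment.
history = ["5ad5e34", "placeholder-1", "placeholder-2", "placeholder-3", "6587ef7"]
slow_commits = {"placeholder-2", "placeholder-3", "6587ef7"}
culprit = first_slow_commit(history, lambda c: c in slow_commits)
print(culprit)  # placeholder-2
```

In practice is_slow would build the commit and time the repro job; here it is a set lookup so the example runs on its own.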
[jira] [Updated] (SPARK-3334) Spark causes mesos-master memory leak
[ https://issues.apache.org/jira/browse/SPARK-3334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Iven Hsu updated SPARK-3334:

Description: The {{akkaFrameSize}} is set to {{Long.MaxValue}} in MesosBackend to work around SPARK-1112. This causes all serialized task results to be sent using Mesos TaskStatus. mesos-master stores TaskStatus in memory, so when running Spark its memory grows very fast and it eventually gets OOM-killed. See MESOS-1746 for more. I've tried setting {{akkaFrameSize}} to 0: mesos-master is no longer killed, but the driver blocks after the job succeeds unless I call {{sc.stop()}} to quit it manually. Not sure if it's related to SPARK-1112.

Spark causes mesos-master memory leak

Key: SPARK-3334
URL: https://issues.apache.org/jira/browse/SPARK-3334
Project: Spark
Issue Type: Bug
Components: Mesos
Affects Versions: 1.0.2
Environment: Mesos 0.16.0/0.19.0 CentOS 6.4
Reporter: Iven Hsu
[jira] [Created] (SPARK-3334) Spark causes mesos-master memory leak
Iven Hsu created SPARK-3334:

Summary: Spark causes mesos-master memory leak
Key: SPARK-3334
URL: https://issues.apache.org/jira/browse/SPARK-3334
Project: Spark
Issue Type: Bug
Components: Mesos
Affects Versions: 1.0.2
Environment: Mesos 0.16.0/0.19.0 CentOS 6.4
Reporter: Iven Hsu

The {{akkaFrameSize}} is set to {{Long.MaxValue}} in MesosBackend to work around SPARK-1112. This causes all serialized task results to be sent using Mesos TaskStatus. mesos-master stores TaskStatus in memory, so when running Spark its memory grows very fast and it eventually gets OOM-killed. See MESOS-1746 for more. I've tried setting {{akkaFrameSize}} to 0: mesos-master is no longer killed, but the driver blocks after the job succeeds unless I call {{sc.stop()}} to quit it manually. Not sure if it's related to SPARK-1112.
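The effect of the frame size described in SPARK-3334 can be illustrated with a small threshold model. This is a sketch under an assumption, not Spark's actual code: it assumes the executor sends a result inline in the status update when its serialized size is below the frame size, and otherwise stores it and sends only a reference. The names result_route and LONG_MAX are hypothetical.

```python
LONG_MAX = 2**63 - 1  # stands in for Scala's Long.MaxValue


def result_route(serialized_size, frame_size):
    """Decide how a task result travels back to the driver.

    'direct'   : the full result is embedded in the status update.
    'indirect' : the result is stored elsewhere; only a reference is sent.
    """
    return "direct" if serialized_size < frame_size else "indirect"


sizes = (1024, 50 * 1024 * 1024)  # a 1 KiB result and a 50 MiB result
for frame in (LONG_MAX, 10 * 1024 * 1024, 0):
    print(frame, [result_route(size, frame) for size in sizes])
```

Under this model a frame size of Long.MaxValue sends every result inline, which matches the reported memory growth in mesos-master, while a frame size of 0 sends none inline, which matches the observation that mesos-master is no longer killed.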
[jira] [Updated] (SPARK-3334) Spark causes mesos-master memory leak
[ https://issues.apache.org/jira/browse/SPARK-3334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Iven Hsu updated SPARK-3334:

Description: The {{akkaFrameSize}} is set to {{Long.MaxValue}} in MesosBackend to work around SPARK-1112. This causes all serialized task results to be sent using Mesos TaskStatus. mesos-master stores TaskStatus in memory, so when running Spark its memory grows very fast and it eventually gets OOM-killed. See MESOS-1746 for more. I've tried setting {{akkaFrameSize}} to 0: mesos-master is no longer killed, but the driver blocks after the job succeeds unless I call {{sc.stop()}} to quit it manually. Not sure if it's related to SPARK-1112.

Spark causes mesos-master memory leak

Key: SPARK-3334
URL: https://issues.apache.org/jira/browse/SPARK-3334
Project: Spark
Issue Type: Bug
Components: Mesos
Affects Versions: 1.0.2
Environment: Mesos 0.16.0/0.19.0 CentOS 6.4
Reporter: Iven Hsu
[jira] [Assigned] (SPARK-3324) YARN module has nonstandard structure which cause compile error In IntelliJ
[ https://issues.apache.org/jira/browse/SPARK-3324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Patrick Wendell reassigned SPARK-3324:

Assignee: Patrick Wendell

YARN module has nonstandard structure which cause compile error In IntelliJ

Key: SPARK-3324
URL: https://issues.apache.org/jira/browse/SPARK-3324
Project: Spark
Issue Type: Bug
Components: YARN
Affects Versions: 1.1.0
Environment: Mac OS: 10.9.4 IntelliJ IDEA: 13.1.4 Scala Plugins: 0.41.2 Maven: 3.0.5
Reporter: Yi Tian
Assignee: Patrick Wendell
Priority: Minor
Labels: intellij, maven, yarn

The YARN module has a nonstandard path structure like:
{code}
${SPARK_HOME}
  |--yarn
     |--alpha (contains yarn api support for 0.23 and 2.0.x)
     |--stable (contains yarn api support for 2.2 and later)
     |     |--pom.xml (spark-yarn)
     |--common (Common codes not depending on specific version of Hadoop)
     |--pom.xml (yarn-parent)
{code}
When we use Maven to compile the yarn module, Maven imports the 'alpha' or 'stable' module according to the profile setting. A submodule such as 'stable' uses the build properties defined in yarn/pom.xml to add the common code to its sourcePath. As a result, IntelliJ cannot directly recognize the sources in the common directory as a sourcePath. I think we should change the yarn module to a unified Maven jar project and select the version of the YARN API via the Maven profile setting. That would resolve the compile error in IntelliJ and make the yarn module simpler and clearer.
[jira] [Commented] (SPARK-3333) Large number of partitions causes OOM
[ https://issues.apache.org/jira/browse/SPARK-?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14117016#comment-14117016 ]

Josh Rosen commented on SPARK-:

It looks like https://github.com/apache/spark/pull/1138 may be the culprit, since this job runs quickly immediately prior to that commit.
[jira] [Commented] (SPARK-3333) Large number of partitions causes OOM
[ https://issues.apache.org/jira/browse/SPARK-?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14117019#comment-14117019 ]

Davies Liu commented on SPARK-:

@joserosen This should not be the culprit; it just exposes the problem in PySpark. Before it, the default number of partitions for reduceByKey() could be something much smaller, such as 4. The root cause should be inside Scala, so you should test it with the Scala version.
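A rough calculation shows why the default partition count of reduceByKey() matters so much here. The sketch assumes a shuffle in which every map-side partition produces one block (or at least one bookkeeping entry) per reduce-side partition. That is a simplification for illustration, and shuffle_block_count is a hypothetical helper, not a Spark function.

```python
def shuffle_block_count(map_partitions, reduce_partitions):
    """Blocks to track if each map partition writes one block per reducer."""
    return map_partitions * reduce_partitions


# 24000 map partitions come from a.repartition(24000) in the repro.
inherited = shuffle_block_count(24000, 24000)  # reduce side inherits 24000
small = shuffle_block_count(24000, 4)          # reduce side defaults to 4
print(inherited, small, inherited // small)    # 576000000 96000 6000
```

PySpark's reduceByKey accepts an optional numPartitions argument, so calling it with a small explicit value (for example 4, the figure mentioned in the comment) may be one way to check whether the partition count is what triggers the slowdown. That is a suggestion, not something verified in this thread.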
[jira] [Commented] (SPARK-3333) Large number of partitions causes OOM
[ https://issues.apache.org/jira/browse/SPARK-?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14117021#comment-14117021 ] Josh Rosen commented on SPARK-: --- Good point. I guess "culprit" was the wrong word, but that commit helps to narrow down the problem. Large number of partitions causes OOM - Key: SPARK- URL: https://issues.apache.org/jira/browse/SPARK- Project: Spark Issue Type: Bug Components: PySpark Affects Versions: 1.1.0 Environment: EC2; 1 master; 1 slave; {{m3.xlarge}} instances Reporter: Nicholas Chammas Here’s a repro for PySpark:
{code}
a = sc.parallelize(["Nick", "John", "Bob"])
a = a.repartition(24000)
a.keyBy(lambda x: len(x)).reduceByKey(lambda x,y: x + y).take(1)
{code}
This code runs fine on 1.0.2. It returns the following result in just over a minute:
{code}
[(4, 'NickJohn')]
{code}
However, when I try this with 1.1.0-rc3 on an identically-sized cluster, it runs for a very, very long time (at least 45 min) and then fails with {{java.lang.OutOfMemoryError: Java heap space}}.
Here is a stack trace taken from a run on 1.1.0-rc2:
{code}
a = sc.parallelize(["Nick", "John", "Bob"])
a = a.repartition(24000)
a.keyBy(lambda x: len(x)).reduceByKey(lambda x,y: x + y).take(1)
14/08/29 21:53:40 WARN BlockManagerMasterActor: Removing BlockManager BlockManagerId(0, ip-10-138-29-167.ec2.internal, 46252, 0) with no recent heart beats: 175143ms exceeds 45000ms
14/08/29 21:53:50 WARN BlockManagerMasterActor: Removing BlockManager BlockManagerId(10, ip-10-138-18-106.ec2.internal, 33711, 0) with no recent heart beats: 175359ms exceeds 45000ms
14/08/29 21:54:02 WARN BlockManagerMasterActor: Removing BlockManager BlockManagerId(19, ip-10-139-36-207.ec2.internal, 52208, 0) with no recent heart beats: 173061ms exceeds 45000ms
14/08/29 21:54:13 WARN BlockManagerMasterActor: Removing BlockManager BlockManagerId(5, ip-10-73-142-70.ec2.internal, 56162, 0) with no recent heart beats: 176816ms exceeds 45000ms
14/08/29 21:54:22 WARN BlockManagerMasterActor: Removing BlockManager BlockManagerId(7, ip-10-236-145-200.ec2.internal, 40959, 0) with no recent heart beats: 182241ms exceeds 45000ms
14/08/29 21:54:40 WARN BlockManagerMasterActor: Removing BlockManager BlockManagerId(4, ip-10-139-1-195.ec2.internal, 49221, 0) with no recent heart beats: 178406ms exceeds 45000ms
14/08/29 21:54:41 ERROR Utils: Uncaught exception in thread Result resolver thread-3
java.lang.OutOfMemoryError: Java heap space
    at com.esotericsoftware.kryo.io.Input.readBytes(Input.java:296)
    at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ByteArraySerializer.read(DefaultArraySerializers.java:35)
    at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ByteArraySerializer.read(DefaultArraySerializers.java:18)
    at com.esotericsoftware.kryo.Kryo.readObjectOrNull(Kryo.java:699)
    at com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.read(FieldSerializer.java:611)
    at com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:221)
    at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:729)
    at org.apache.spark.serializer.KryoSerializerInstance.deserialize(KryoSerializer.scala:162)
    at org.apache.spark.scheduler.DirectTaskResult.value(TaskResult.scala:79)
    at org.apache.spark.scheduler.TaskSetManager.handleSuccessfulTask(TaskSetManager.scala:514)
    at org.apache.spark.scheduler.TaskSchedulerImpl.handleSuccessfulTask(TaskSchedulerImpl.scala:355)
    at org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply$mcV$sp(TaskResultGetter.scala:68)
    at org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply(TaskResultGetter.scala:47)
    at org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply(TaskResultGetter.scala:47)
    at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1311)
    at org.apache.spark.scheduler.TaskResultGetter$$anon$2.run(TaskResultGetter.scala:46)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Exception in thread Result resolver thread-3
14/08/29 21:56:26 ERROR SendingConnection: Exception while reading SendingConnection to ConnectionManagerId(ip-10-73-142-223.ec2.internal,54014)
java.nio.channels.ClosedChannelException
    at sun.nio.ch.SocketChannelImpl.ensureReadOpen(SocketChannelImpl.java:252)
    at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:295)
    at org.apache.spark.network.SendingConnection.read(Connection.scala:390)
    at
[jira] [Resolved] (SPARK-2536) Update the MLlib page of Spark website
[ https://issues.apache.org/jira/browse/SPARK-2536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiangrui Meng resolved SPARK-2536. -- Resolution: Done Update the MLlib page of Spark website -- Key: SPARK-2536 URL: https://issues.apache.org/jira/browse/SPARK-2536 Project: Spark Issue Type: Sub-task Components: Documentation, MLlib Reporter: Xiangrui Meng Assignee: Xiangrui Meng It still shows v0.9. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-3205) input format for text records saved with in-record delimiter and newline characters escaped
[ https://issues.apache.org/jira/browse/SPARK-3205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiangrui Meng resolved SPARK-3205. -- Resolution: Later Moved the implementation to https://github.com/mengxr/redshift-input-format. If people feel this input format is common, we can move it to Spark Core later. input format for text records saved with in-record delimiter and newline characters escaped --- Key: SPARK-3205 URL: https://issues.apache.org/jira/browse/SPARK-3205 Project: Spark Issue Type: New Feature Components: Spark Core, SQL Reporter: Xiangrui Meng Assignee: Xiangrui Meng Text records may contain in-record delimiter or newline characters. In such cases, we can either encode them or escape them. The latter is simpler and used by Redshift's UNLOAD with the ESCAPE option. The problem is that a record will span multiple lines. We need an input format for it.
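To illustrate why escaped records cannot be read line by line, here is a minimal plain-Python sketch of the splitting logic. It is not the code from the linked repository and not a Hadoop input format; the function name `read_escaped_records`, the `|` delimiter, and the backslash escape character are assumptions chosen for illustration. A backslash makes the following character literal, so only an unescaped newline ends a record.

```python
from typing import Iterator, List


def read_escaped_records(
    text: str, delimiter: str = "|", escape: str = "\\"
) -> Iterator[List[str]]:
    """Yield records (lists of fields) from escaped text.

    Hypothetical helper: the delimiter and escape defaults are assumptions.
    """
    fields: List[str] = []
    current: List[str] = []
    chars = iter(text)
    for ch in chars:
        if ch == escape:
            # The character after an escape is taken literally.
            # A lone escape at the very end of the input is kept as-is.
            current.append(next(chars, escape))
        elif ch == delimiter:
            # An unescaped delimiter closes the current field.
            fields.append("".join(current))
            current = []
        elif ch == "\n":
            # An unescaped newline closes the current record.
            fields.append("".join(current))
            yield fields
            fields, current = [], []
        else:
            current.append(ch)
    # Emit a final record that is not newline-terminated.
    if current or fields:
        fields.append("".join(current))
        yield fields


sample = "1|a\\|b|line one\\\nline two\n2|c|d\n"
for record in read_escaped_records(sample):
    print(record)
```

The first record in `sample` occupies two physical lines, which is the case a plain line-based reader would split incorrectly.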
[jira] [Commented] (SPARK-3090) Avoid not stopping SparkContext with YARN Client mode
[ https://issues.apache.org/jira/browse/SPARK-3090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14117036#comment-14117036 ] Kousuke Saruta commented on SPARK-3090: --- It sounds like a good idea for SparkContext to register a shutdown hook itself. Avoid not stopping SparkContext with YARN Client mode -- Key: SPARK-3090 URL: https://issues.apache.org/jira/browse/SPARK-3090 Project: Spark Issue Type: Bug Components: Spark Core, YARN Affects Versions: 1.1.0 Reporter: Kousuke Saruta In YARN cluster mode, the ApplicationMaster registers a shutdown hook that stops the SparkContext. Thanks to this, the SparkContext is stopped even if the application forgets to stop it. Unfortunately, YARN client mode has no such mechanism.
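As a rough illustration of the idea in the comment, the sketch below shows a context-like object getting a stop hook registered at process exit. It is written in Python with the standard `atexit` module and is not Spark's implementation; `StoppableContext` is a hypothetical stand-in for SparkContext, and `register_stop_hook` is an invented helper name.

```python
import atexit
from typing import Callable


class StoppableContext:
    """Hypothetical stand-in for SparkContext; only stop() matters here."""

    def __init__(self) -> None:
        self.stopped = False
        self.stop_calls = 0

    def stop(self) -> None:
        self.stopped = True
        self.stop_calls += 1


def register_stop_hook(
    ctx: StoppableContext, register: Callable = atexit.register
) -> Callable[[], None]:
    """Register a hook that calls ctx.stop() at interpreter exit, at most once."""
    done = {"value": False}

    def hook() -> None:
        # Guard so that running the hook more than once stops the context once.
        if not done["value"]:
            done["value"] = True
            ctx.stop()

    register(hook)
    return hook


ctx = StoppableContext()
hook = register_stop_hook(ctx)
# If the application never calls ctx.stop(), the hook still runs at exit.
```

Registering the hook where the context is created, rather than in a mode-specific component, is what would make the behavior the same in cluster and client mode.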