[jira] [Commented] (SPARK-3324) YARN module has nonstandard structure which cause compile error In IntelliJ

2014-08-31 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14116667#comment-14116667
 ] 

Sean Owen commented on SPARK-3324:
--

Let me try to sketch what's funky about the structure. We have yarn/alpha, 
yarn/common, yarn/stable. Understanding the purpose, I would expect each to be 
a module, and that each has a src/ directory, and that alpha and stable depend 
on common, and the Spark parent activates either yarn/alpha or yarn/stable 
depending on profiles. IntelliJ is fine with that.

However what we have is that yarn/ is a module. But its source is in 
yarn/common. But it's a pom-only module. And yarn/alpha and yarn/stable list it 
as the parent and inherit all of their source directory info and dependencies 
from yarn/, which is not itself a module of code. So each compiles two source 
directories defined in different places. This plus profiles confused IntelliJ 
and required manual intervention.

Maybe I'm overlooking a reason this had to be done, but rejiggering this as three 
simple modules should work.
Again, I imagine the question is whether it's worth it versus removing yarn/alpha at 
some point in the future, because it's trivial to fix how IntelliJ reads the 
POMs once by hand in the IDE.
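
For illustration, the three-module layout sketched above might look like this (a 
hypothetical layout, not the actual Spark POMs):

{code}
yarn/pom.xml           packaging=pom; modules: common, plus alpha or stable chosen by profile
yarn/common/pom.xml    packaging=jar; owns common/src
yarn/alpha/pom.xml     packaging=jar; owns alpha/src; depends on yarn-common
yarn/stable/pom.xml    packaging=jar; owns stable/src; depends on yarn-common
{code}

Each module then compiles exactly one source directory declared in its own POM, 
which is the shape IDEs expect.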

 YARN module has nonstandard structure which cause compile error In IntelliJ
 ---

 Key: SPARK-3324
 URL: https://issues.apache.org/jira/browse/SPARK-3324
 Project: Spark
  Issue Type: Bug
  Components: YARN
Affects Versions: 1.1.0
 Environment: Mac OS: 10.9.4
 IntelliJ IDEA: 13.1.4
 Scala Plugins: 0.41.2
 Maven: 3.0.5
Reporter: Yi Tian
Priority: Minor
  Labels: intellij, maven, yarn

 The YARN module has a nonstandard path structure:
 {code}
 ${SPARK_HOME}
   |--yarn
      |--alpha (contains yarn api support for 0.23 and 2.0.x)
      |--stable (contains yarn api support for 2.2 and later)
      |  |--pom.xml (spark-yarn)
      |--common (common code not depending on a specific version of Hadoop)
      |--pom.xml (yarn-parent)
 {code}
 When we use maven to compile the yarn module, maven imports either the 'alpha' 
 or the 'stable' module according to the profile setting.
 A submodule like 'stable' then uses the build properties defined in yarn/pom.xml 
 to add the common code to its source path.
 As a result, IntelliJ can't directly recognize the sources in the common 
 directory as a source path.
 I think we should change the yarn module to a single maven jar project and 
 specify the different versions of the yarn api via maven profile settings.
 That would resolve the compile error in IntelliJ and make the yarn module 
 simpler and clearer.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-3328) --with-tachyon build is broken

2014-08-31 Thread Elijah Epifanov (JIRA)
Elijah Epifanov created SPARK-3328:
--

 Summary: --with-tachyon build is broken
 Key: SPARK-3328
 URL: https://issues.apache.org/jira/browse/SPARK-3328
 Project: Spark
  Issue Type: Bug
  Components: Build
Affects Versions: 1.1.0
Reporter: Elijah Epifanov


cp: tachyon-0.5.0/core/src/main/java/tachyon/web/resources: No such file or 
directory







[jira] [Updated] (SPARK-3328) --with-tachyon build is broken

2014-08-31 Thread Elijah Epifanov (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Elijah Epifanov updated SPARK-3328:
---

Description: 
cp: tachyon-0.5.0/target/tachyon-0.5.0-jar-with-dependencies.jar: No such file 
or directory


  was:
cp: tachyon-0.5.0/core/src/main/java/tachyon/web/resources: No such file or 
directory






[jira] [Commented] (SPARK-3324) YARN module has nonstandard structure which cause compile error In IntelliJ

2014-08-31 Thread Yi Tian (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14116782#comment-14116782
 ] 

Yi Tian commented on SPARK-3324:


Thanks [~srowen] for explaining that. 
My initial idea was to remove the pom.xml from yarn/stable and yarn/alpha, and 
change the packaging property in yarn/pom.xml from pom to jar. 
When building this module, the pom.xml would then dynamically add common/src and 
either yarn/alpha/src or yarn/stable/src to the source path. 
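
One way to do that dynamic source-path wiring from a single pom is the 
build-helper-maven-plugin's add-source goal. A rough sketch; the profile id and 
directory names here are illustrative, not the actual Spark configuration:

{code}
<profiles>
  <profile>
    <id>hadoop-stable</id>
    <build>
      <plugins>
        <plugin>
          <groupId>org.codehaus.mojo</groupId>
          <artifactId>build-helper-maven-plugin</artifactId>
          <executions>
            <execution>
              <id>add-yarn-sources</id>
              <phase>generate-sources</phase>
              <goals><goal>add-source</goal></goals>
              <configuration>
                <sources>
                  <source>common/src/main/scala</source>
                  <source>stable/src/main/scala</source>
                </sources>
              </configuration>
            </execution>
          </executions>
        </plugin>
      </plugins>
    </build>
  </profile>
</profiles>
{code}

A matching profile would add alpha/src/main/scala instead. IDEs that import 
Maven projects generally understand add-source, which is the point of the exercise.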




[jira] [Commented] (SPARK-3324) YARN module has nonstandard structure which cause compile error In IntelliJ

2014-08-31 Thread Yi Tian (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14116785#comment-14116785
 ] 

Yi Tian commented on SPARK-3324:


BTW, [~pwendell] there is another problem for IntelliJ. 
The spark-streaming-flume-sink module needs the avro plugin to compile some avro 
files, but the outputDirectory of the generated scala sources is under the target 
path, which IntelliJ can't recognize, so it throws errors when compiling the 
spark project.

May I make a PR to fix these problems?
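
A common workaround for generated sources landing under target/ is to register 
the generated directory as an extra source root so the IDE treats it as code. A 
sketch using the build-helper-maven-plugin; the path is illustrative:

{code}
<plugin>
  <groupId>org.codehaus.mojo</groupId>
  <artifactId>build-helper-maven-plugin</artifactId>
  <executions>
    <execution>
      <id>add-avro-sources</id>
      <phase>generate-sources</phase>
      <goals><goal>add-source</goal></goals>
      <configuration>
        <sources>
          <source>${project.build.directory}/generated-sources/avro</source>
        </sources>
      </configuration>
    </execution>
  </executions>
</plugin>
{code}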




[jira] [Commented] (SPARK-3324) YARN module has nonstandard structure which cause compile error In IntelliJ

2014-08-31 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14116801#comment-14116801
 ] 

Sean Owen commented on SPARK-3324:
--

[~tianyi] I seem to remember having a similar problem. I think that is 
straightforward to fix. It's a separate issue. But FWIW I would like to see 
that improved.




[jira] [Created] (SPARK-3329) HiveQuerySuite SET tests depend on map orderings

2014-08-31 Thread William Benton (JIRA)
William Benton created SPARK-3329:
-

 Summary: HiveQuerySuite SET tests depend on map orderings
 Key: SPARK-3329
 URL: https://issues.apache.org/jira/browse/SPARK-3329
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.0.2, 1.1.0
Reporter: William Benton
Priority: Trivial


The SET tests in HiveQuerySuite that return multiple values depend on the 
ordering in which map pairs are returned from Hive and can fail spuriously if 
this changes due to environment or library changes.
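
A hedged sketch of an order-insensitive comparison such tests could use instead 
(plain Python for illustration; the actual HiveQuerySuite code is Scala, and 
assert_same_pairs is a hypothetical helper):

```python
def assert_same_pairs(actual, expected):
    # Sort both sides so the comparison does not depend on the order
    # in which Hive happens to return the key/value pairs.
    assert sorted(actual) == sorted(expected), (actual, expected)

# Passes regardless of the order Hive returned the pairs in:
assert_same_pairs([("hive.b", "2"), ("hive.a", "1")],
                  [("hive.a", "1"), ("hive.b", "2")])
```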






[jira] [Commented] (SPARK-3329) HiveQuerySuite SET tests depend on map orderings

2014-08-31 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14116821#comment-14116821
 ] 

Apache Spark commented on SPARK-3329:
-

User 'willb' has created a pull request for this issue:
https://github.com/apache/spark/pull/2220




[jira] [Created] (SPARK-3330) Successive test runs with different profiles fail SparkSubmitSuite

2014-08-31 Thread Sean Owen (JIRA)
Sean Owen created SPARK-3330:


 Summary: Successive test runs with different profiles fail 
SparkSubmitSuite
 Key: SPARK-3330
 URL: https://issues.apache.org/jira/browse/SPARK-3330
 Project: Spark
  Issue Type: Bug
  Components: Build
Affects Versions: 1.0.2
Reporter: Sean Owen


Maven-based Jenkins builds have been failing for a while:
https://amplab.cs.berkeley.edu/jenkins/view/Spark/job/Spark-Master-Maven-with-YARN/480/HADOOP_PROFILE=hadoop-2.4,label=centos/console

One common cause is that on the second and subsequent runs of mvn clean test, 
at least two assembly JARs will exist in assembly/target. Because assembly is 
not a submodule of parent, mvn clean is not invoked for assembly. The 
presence of two assembly jars causes spark-submit to fail.
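
The failure mode can be sketched in a few lines; a hypothetical pre-flight check 
(the glob pattern and error message are illustrative, not what spark-submit 
actually does):

```python
import glob
import os

def find_assembly_jars(target_dir):
    # spark-submit needs to resolve exactly one assembly jar;
    # two or more (from successive profile builds) is ambiguous.
    return sorted(glob.glob(os.path.join(target_dir, "*assembly*.jar")))

def check_single_assembly(target_dir):
    jars = find_assembly_jars(target_dir)
    if len(jars) > 1:
        raise RuntimeError("multiple assembly jars found: %s" % ", ".join(jars))
    return jars
```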






[jira] [Created] (SPARK-3331) PEP8 tests fail in release because they check unzipped py4j code

2014-08-31 Thread Sean Owen (JIRA)
Sean Owen created SPARK-3331:


 Summary: PEP8 tests fail in release because they check unzipped 
py4j code
 Key: SPARK-3331
 URL: https://issues.apache.org/jira/browse/SPARK-3331
 Project: Spark
  Issue Type: Bug
  Components: Build
Affects Versions: 1.0.2
Reporter: Sean Owen
Priority: Minor


PEP8 tests run on files under ./python, but in the release packaging, py4j 
code is present in ./python/build/py4j. Py4J code fails style checks and thus 
release fails ./dev/run-tests now.
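
A minimal sketch of the obvious fix: exclude the build/ directory from the files 
handed to the style checker. Plain Python; the directory names come from the 
description above, while the function itself is hypothetical:

```python
import os

def python_files_to_check(root):
    # Walk ./python but prune build/, where unzipped py4j code lives,
    # so third-party code is never style-checked.
    matches = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames[:] = [d for d in dirnames if d != "build"]  # prune in place
        matches.extend(os.path.join(dirpath, f)
                       for f in filenames if f.endswith(".py"))
    return sorted(matches)
```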






[jira] [Updated] (SPARK-3331) PEP8 tests fail because they check unzipped py4j code

2014-08-31 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-3331:
-
Summary: PEP8 tests fail because they check unzipped py4j code  (was: PEP8 
tests fail in release because they check unzipped py4j code)




[jira] [Updated] (SPARK-3331) PEP8 tests fail because they check unzipped py4j code

2014-08-31 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-3331:
-
Description: PEP8 tests run on files under ./python, but unzipped py4j 
code is found at ./python/build/py4j. Py4J code fails style checks and can 
fail ./dev/run-tests if this code is present locally.  (was: PEP8 tests run on 
files under ./python, but in the release packaging, py4j code is present in 
./python/build/py4j. Py4J code fails style checks and thus release fails 
./dev/run-tests now.)




[jira] [Commented] (SPARK-3330) Successive test runs with different profiles fail SparkSubmitSuite

2014-08-31 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14116829#comment-14116829
 ] 

Apache Spark commented on SPARK-3330:
-

User 'srowen' has created a pull request for this issue:
https://github.com/apache/spark/pull/2221




[jira] [Commented] (SPARK-3331) PEP8 tests fail because they check unzipped py4j code

2014-08-31 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14116830#comment-14116830
 ] 

Apache Spark commented on SPARK-3331:
-

User 'srowen' has created a pull request for this issue:
https://github.com/apache/spark/pull/




[jira] [Commented] (SPARK-2870) Thorough schema inference directly on RDDs of Python dictionaries

2014-08-31 Thread Nicholas Chammas (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14116868#comment-14116868
 ] 

Nicholas Chammas commented on SPARK-2870:
-

[~marmbrus], [~davies], [~yhuai] - We discussed this feature on the user list 
some weeks back. Just pinging you here to make sure this feature request is on 
your radar.

 Thorough schema inference directly on RDDs of Python dictionaries
 -

 Key: SPARK-2870
 URL: https://issues.apache.org/jira/browse/SPARK-2870
 Project: Spark
  Issue Type: Improvement
  Components: PySpark, SQL
Reporter: Nicholas Chammas

 h4. Background
 I love the {{SQLContext.jsonRDD()}} and {{SQLContext.jsonFile()}} methods. 
 They process JSON text directly and infer a schema that covers the entire 
 source data set. 
 This is very important with semi-structured data like JSON since individual 
 elements in the data set are free to have different structures. Matching 
 fields across elements may even have different value types.
 For example:
 {code}
 {"a": 5}
 {"a": "cow"}
 {code}
 To get a queryable schema that covers the whole data set, you need to infer a 
 schema by looking at the whole data set. The aforementioned 
 {{SQLContext.json...()}} methods do this very well. 
 h4. Feature Request
 What we need is for {{SQLContext.inferSchema()}} to do this, too. 
 Alternatively, we need a new {{SQLContext}} method that works on RDDs of 
 Python dictionaries and does something functionally equivalent to this:
 {code}
 SQLContext.jsonRDD(RDD[dict].map(lambda x: json.dumps(x)))
 {code}
 As of 1.0.2, 
 [{{inferSchema()}}|http://spark.apache.org/docs/latest/api/python/pyspark.sql.SQLContext-class.html#inferSchema]
  just looks at the first element in the data set. This won't help much when 
 the structure of the elements in the target RDD is variable.
 h4. Example Use Case
 * You have some JSON text data that you want to analyze using Spark SQL. 
 * You would use one of the {{SQLContext.json...()}} methods, but you need to 
 do some filtering on the data first to remove bad elements--basically, some 
 minimal schema validation.
 * You deserialize the JSON objects to Python {{dict}} s and filter out the 
 bad ones. You now have an RDD of dictionaries.
 * From this RDD, you want a SchemaRDD that captures the schema for the whole 
 data set.
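
 The whole-data-set inference requested above can be sketched in plain Python. 
 The merge_type helper and its widen-to-string rule are illustrative assumptions, 
 not Spark's actual inference logic:

```python
def merge_type(a, b):
    # Unify two inferred field types; conflicting types widen to "string"
    # (an assumed rule, mirroring what jsonRDD-style inference might do).
    if a is None:
        return b
    if b is None:
        return a
    return a if a == b else "string"

def infer_schema(records):
    # Look at EVERY record, unlike inferSchema() in 1.0.2,
    # which only inspects the first element.
    schema = {}
    for record in records:
        for key, value in record.items():
            schema[key] = merge_type(schema.get(key), type(value).__name__)
    return schema
```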






[jira] [Created] (SPARK-3332) Tags shouldn't be the main strategy for machine membership on clusters

2014-08-31 Thread Allan Douglas R. de Oliveira (JIRA)
Allan Douglas R. de Oliveira created SPARK-3332:
---

 Summary: Tags shouldn't be the main strategy for machine 
membership on clusters
 Key: SPARK-3332
 URL: https://issues.apache.org/jira/browse/SPARK-3332
 Project: Spark
  Issue Type: Bug
  Components: EC2
Reporter: Allan Douglas R. de Oliveira


The implementation for SPARK-2333 changed the machine membership mechanism from 
security groups to tags.

This is a fundamentally flawed strategy, as there is no guarantee at all that the 
machines will have a tag (even with a retry mechanism).

For instance, if the script is killed after launching the instances but before 
setting the tags, the machines will be invisible to a destroy command, leaving 
an unmanageable cluster behind.

The initial proposal is to go back to the previous behavior in all cases except 
when the new flag (--security-group-prefix) is used.

Also, it's worth mentioning that SPARK-3180 introduced the 
--additional-security-group flag, which is a reasonable solution to SPARK-2333 
(but isn't a full replacement for all use cases of --security-group-prefix).






[jira] [Commented] (SPARK-3332) Tags shouldn't be the main strategy for machine membership on clusters

2014-08-31 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14116890#comment-14116890
 ] 

Apache Spark commented on SPARK-3332:
-

User 'douglaz' has created a pull request for this issue:
https://github.com/apache/spark/pull/2223




[jira] [Updated] (SPARK-3010) fix redundant conditional

2014-08-31 Thread Matei Zaharia (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matei Zaharia updated SPARK-3010:
-
Assignee: wangfei

 fix redundant conditional
 -

 Key: SPARK-3010
 URL: https://issues.apache.org/jira/browse/SPARK-3010
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 1.0.2
Reporter: wangfei
Assignee: wangfei
 Fix For: 1.1.0


 there are some redundant conditionals in spark, such as:
 {code}
 // 1.
 private[spark] def codegenEnabled: Boolean =
   if (getConf(CODEGEN_ENABLED, false) == true) true else false
 // 2.
 x = if (x == 2) true else false
 {code}
 ... etc
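
 A Python rendering of the pattern and its fix (the Scala above is the real 
 code; this analog, with an assumed config key, just shows that the conditional 
 reduces to the condition itself):

```python
def codegen_enabled_redundant(conf):
    # Mirrors the redundant Scala: if (cond == true) true else false
    return True if conf.get("spark.sql.codegen", False) == True else False

def codegen_enabled(conf):
    # Equivalent, minus the redundant conditional.
    return bool(conf.get("spark.sql.codegen", False))
```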






[jira] [Resolved] (SPARK-3010) fix redundant conditional

2014-08-31 Thread Matei Zaharia (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matei Zaharia resolved SPARK-3010.
--
  Resolution: Fixed
   Fix Version/s: (was: 1.1.0)
  1.2.0
Target Version/s: 1.2.0  (was: 1.1.0)




[jira] [Updated] (SPARK-3010) fix redundant conditional

2014-08-31 Thread Matei Zaharia (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matei Zaharia updated SPARK-3010:
-
Priority: Trivial  (was: Major)




[jira] [Created] (SPARK-3333) Large number of partitions causes OOM

2014-08-31 Thread Nicholas Chammas (JIRA)
Nicholas Chammas created SPARK-3333:
---

 Summary: Large number of partitions causes OOM
 Key: SPARK-3333
 URL: https://issues.apache.org/jira/browse/SPARK-3333
 Project: Spark
  Issue Type: Bug
  Components: PySpark
Affects Versions: 1.1.0
 Environment: EC2; 1 master; 1 slave; {{m3.xlarge}} instances
Reporter: Nicholas Chammas


Here’s a repro for PySpark:

{code}
a = sc.parallelize(["Nick", "John", "Bob"])
a = a.repartition(24000)
a.keyBy(lambda x: len(x)).reduceByKey(lambda x,y: x + y).take(1)
{code}

This code runs fine on 1.0.2. It returns the following result in just over a 
minute:

{code}
[(4, 'NickJohn')]
{code}

However, when I try this with 1.1.0-rc3 on an identically-sized cluster, it 
runs for a very, very long time and then fails with 
{{java.lang.OutOfMemoryError: Java heap space}}.

Here is a stack trace taken from a run on 1.1.0-rc2:

{code}
 a = sc.parallelize(["Nick", "John", "Bob"])
 a = a.repartition(24000)
 a.keyBy(lambda x: len(x)).reduceByKey(lambda x,y: x + y).take(1)
14/08/29 21:53:40 WARN BlockManagerMasterActor: Removing BlockManager 
BlockManagerId(0, ip-10-138-29-167.ec2.internal, 46252, 0) with no recent heart 
beats: 175143ms exceeds 45000ms
14/08/29 21:53:50 WARN BlockManagerMasterActor: Removing BlockManager 
BlockManagerId(10, ip-10-138-18-106.ec2.internal, 33711, 0) with no recent 
heart beats: 175359ms exceeds 45000ms
14/08/29 21:54:02 WARN BlockManagerMasterActor: Removing BlockManager 
BlockManagerId(19, ip-10-139-36-207.ec2.internal, 52208, 0) with no recent 
heart beats: 173061ms exceeds 45000ms
14/08/29 21:54:13 WARN BlockManagerMasterActor: Removing BlockManager 
BlockManagerId(5, ip-10-73-142-70.ec2.internal, 56162, 0) with no recent heart 
beats: 176816ms exceeds 45000ms
14/08/29 21:54:22 WARN BlockManagerMasterActor: Removing BlockManager 
BlockManagerId(7, ip-10-236-145-200.ec2.internal, 40959, 0) with no recent 
heart beats: 182241ms exceeds 45000ms
14/08/29 21:54:40 WARN BlockManagerMasterActor: Removing BlockManager 
BlockManagerId(4, ip-10-139-1-195.ec2.internal, 49221, 0) with no recent heart 
beats: 178406ms exceeds 45000ms
14/08/29 21:54:41 ERROR Utils: Uncaught exception in thread Result resolver 
thread-3
java.lang.OutOfMemoryError: Java heap space
at com.esotericsoftware.kryo.io.Input.readBytes(Input.java:296)
at 
com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ByteArraySerializer.read(DefaultArraySerializers.java:35)
at 
com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ByteArraySerializer.read(DefaultArraySerializers.java:18)
at com.esotericsoftware.kryo.Kryo.readObjectOrNull(Kryo.java:699)
at 
com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.read(FieldSerializer.java:611)
at 
com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:221)
at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:729)
at 
org.apache.spark.serializer.KryoSerializerInstance.deserialize(KryoSerializer.scala:162)
at org.apache.spark.scheduler.DirectTaskResult.value(TaskResult.scala:79)
at 
org.apache.spark.scheduler.TaskSetManager.handleSuccessfulTask(TaskSetManager.scala:514)
at 
org.apache.spark.scheduler.TaskSchedulerImpl.handleSuccessfulTask(TaskSchedulerImpl.scala:355)
at 
org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply$mcV$sp(TaskResultGetter.scala:68)
at 
org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply(TaskResultGetter.scala:47)
at 
org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply(TaskResultGetter.scala:47)
at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1311)
at 
org.apache.spark.scheduler.TaskResultGetter$$anon$2.run(TaskResultGetter.scala:46)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Exception in thread Result resolver thread-3 14/08/29 21:56:26 ERROR 
SendingConnection: Exception while reading SendingConnection to 
ConnectionManagerId(ip-10-73-142-223.ec2.internal,54014)
java.nio.channels.ClosedChannelException
at sun.nio.ch.SocketChannelImpl.ensureReadOpen(SocketChannelImpl.java:252)
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:295)
at org.apache.spark.network.SendingConnection.read(Connection.scala:390)
at 
org.apache.spark.network.ConnectionManager$$anon$7.run(ConnectionManager.scala:199)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
java.lang.OutOfMemoryError: Java heap space
at 

[jira] [Commented] (SPARK-3333) Large number of partitions causes OOM

2014-08-31 Thread Nicholas Chammas (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14116895#comment-14116895
 ] 

Nicholas Chammas commented on SPARK-3333:
-----------------------------------------

Note: I have not yet confirmed that 1.1.0-rc3 yields the exact same stack trace 
as the one provided above (which is for 1.1.0-rc2), though I expect them to be 
the same. I _can_ confirm that it takes a very, very long time to run, as it is 
running right now on rc3 and has been for about 45 minutes. Since I have to be 
offline for a bit, I thought I'd report this issue ASAP with the rc2 stack 
trace and update it later with a stack trace from rc3.

 Large number of partitions causes OOM
 -

 Key: SPARK-3333
 URL: https://issues.apache.org/jira/browse/SPARK-3333
 Project: Spark
  Issue Type: Bug
  Components: PySpark
Affects Versions: 1.1.0
 Environment: EC2; 1 master; 1 slave; {{m3.xlarge}} instances
Reporter: Nicholas Chammas

 Here’s a repro for PySpark:
 {code}
 a = sc.parallelize(["Nick", "John", "Bob"])
 a = a.repartition(24000)
 a.keyBy(lambda x: len(x)).reduceByKey(lambda x,y: x + y).take(1)
 {code}
 This code runs fine on 1.0.2. It returns the following result in just over a 
 minute:
 {code}
 [(4, 'NickJohn')]
 {code}
 However, when I try this with 1.1.0-rc3 on an identically-sized cluster, it 
 runs for a very, very long time (at least 45 min) and then fails with 
 {{java.lang.OutOfMemoryError: Java heap space}}.

[jira] [Updated] (SPARK-3333) Large number of partitions causes OOM

2014-08-31 Thread Nicholas Chammas (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicholas Chammas updated SPARK-3333:

Description: 
Here’s a repro for PySpark:

{code}
a = sc.parallelize(["Nick", "John", "Bob"])
a = a.repartition(24000)
a.keyBy(lambda x: len(x)).reduceByKey(lambda x,y: x + y).take(1)
{code}

This code runs fine on 1.0.2. It returns the following result in just over a 
minute:

{code}
[(4, 'NickJohn')]
{code}

However, when I try this with 1.1.0-rc3 on an identically-sized cluster, it 
runs for a very, very long time (at least 45 min) and then fails with 
{{java.lang.OutOfMemoryError: Java heap space}}.

Here is a stack trace taken from a run on 1.1.0-rc2:

{code}
 a = sc.parallelize(["Nick", "John", "Bob"])
 a = a.repartition(24000)
 a.keyBy(lambda x: len(x)).reduceByKey(lambda x,y: x + y).take(1)
14/08/29 21:53:40 WARN BlockManagerMasterActor: Removing BlockManager 
BlockManagerId(0, ip-10-138-29-167.ec2.internal, 46252, 0) with no recent heart 
beats: 175143ms exceeds 45000ms
14/08/29 21:53:50 WARN BlockManagerMasterActor: Removing BlockManager 
BlockManagerId(10, ip-10-138-18-106.ec2.internal, 33711, 0) with no recent 
heart beats: 175359ms exceeds 45000ms
14/08/29 21:54:02 WARN BlockManagerMasterActor: Removing BlockManager 
BlockManagerId(19, ip-10-139-36-207.ec2.internal, 52208, 0) with no recent 
heart beats: 173061ms exceeds 45000ms
14/08/29 21:54:13 WARN BlockManagerMasterActor: Removing BlockManager 
BlockManagerId(5, ip-10-73-142-70.ec2.internal, 56162, 0) with no recent heart 
beats: 176816ms exceeds 45000ms
14/08/29 21:54:22 WARN BlockManagerMasterActor: Removing BlockManager 
BlockManagerId(7, ip-10-236-145-200.ec2.internal, 40959, 0) with no recent 
heart beats: 182241ms exceeds 45000ms
14/08/29 21:54:40 WARN BlockManagerMasterActor: Removing BlockManager 
BlockManagerId(4, ip-10-139-1-195.ec2.internal, 49221, 0) with no recent heart 
beats: 178406ms exceeds 45000ms
14/08/29 21:54:41 ERROR Utils: Uncaught exception in thread Result resolver 
thread-3
java.lang.OutOfMemoryError: Java heap space
at com.esotericsoftware.kryo.io.Input.readBytes(Input.java:296)
at 
com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ByteArraySerializer.read(DefaultArraySerializers.java:35)
at 
com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ByteArraySerializer.read(DefaultArraySerializers.java:18)
at com.esotericsoftware.kryo.Kryo.readObjectOrNull(Kryo.java:699)
at 
com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.read(FieldSerializer.java:611)
at 
com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:221)
at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:729)
at 
org.apache.spark.serializer.KryoSerializerInstance.deserialize(KryoSerializer.scala:162)
at org.apache.spark.scheduler.DirectTaskResult.value(TaskResult.scala:79)
at 
org.apache.spark.scheduler.TaskSetManager.handleSuccessfulTask(TaskSetManager.scala:514)
at 
org.apache.spark.scheduler.TaskSchedulerImpl.handleSuccessfulTask(TaskSchedulerImpl.scala:355)
at 
org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply$mcV$sp(TaskResultGetter.scala:68)
at 
org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply(TaskResultGetter.scala:47)
at 
org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply(TaskResultGetter.scala:47)
at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1311)
at 
org.apache.spark.scheduler.TaskResultGetter$$anon$2.run(TaskResultGetter.scala:46)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Exception in thread Result resolver thread-3 14/08/29 21:56:26 ERROR 
SendingConnection: Exception while reading SendingConnection to 
ConnectionManagerId(ip-10-73-142-223.ec2.internal,54014)
java.nio.channels.ClosedChannelException
at sun.nio.ch.SocketChannelImpl.ensureReadOpen(SocketChannelImpl.java:252)
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:295)
at org.apache.spark.network.SendingConnection.read(Connection.scala:390)
at 
org.apache.spark.network.ConnectionManager$$anon$7.run(ConnectionManager.scala:199)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
java.lang.OutOfMemoryError: Java heap space
at com.esotericsoftware.kryo.io.Input.readBytes(Input.java:296)
at 
com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ByteArraySerializer.read(DefaultArraySerializers.java:35)
at 

[jira] [Commented] (SPARK-3333) Large number of partitions causes OOM

2014-08-31 Thread Patrick Wendell (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14116902#comment-14116902
 ] 

Patrick Wendell commented on SPARK-3333:


Hey [~nchammas] - I don't think anything relevant to this issue has changed 
between RC2 and RC3, so the RC2 trace is probably sufficient.

 Large number of partitions causes OOM
 -

 Key: SPARK-3333
 URL: https://issues.apache.org/jira/browse/SPARK-3333
 Project: Spark
  Issue Type: Bug
  Components: PySpark
Affects Versions: 1.1.0
 Environment: EC2; 1 master; 1 slave; {{m3.xlarge}} instances
Reporter: Nicholas Chammas

 Here’s a repro for PySpark:
 {code}
 a = sc.parallelize(["Nick", "John", "Bob"])
 a = a.repartition(24000)
 a.keyBy(lambda x: len(x)).reduceByKey(lambda x,y: x + y).take(1)
 {code}
 This code runs fine on 1.0.2. It returns the following result in just over a 
 minute:
 {code}
 [(4, 'NickJohn')]
 {code}
 However, when I try this with 1.1.0-rc3 on an identically-sized cluster, it 
 runs for a very, very long time (at least 45 min) and then fails with 
 {{java.lang.OutOfMemoryError: Java heap space}}.

[jira] [Updated] (SPARK-3332) Tagging is not atomic with launching instances on EC2

2014-08-31 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-3332:
---
Summary: Tagging is not atomic with launching instances on EC2  (was: Tags 
shouldn't be the main strategy for machine membership on clusters)

 Tagging is not atomic with launching instances on EC2
 -

 Key: SPARK-3332
 URL: https://issues.apache.org/jira/browse/SPARK-3332
 Project: Spark
  Issue Type: Bug
  Components: EC2
Reporter: Allan Douglas R. de Oliveira

 The implementation for SPARK-2333 changed the machine membership mechanism 
 from security groups to tags.
 This is a fundamentally flawed strategy, as there is no guarantee at all that 
 the machines will have a tag (even with a retry mechanism).
 For instance, if the script is killed after launching the instances but 
 before setting the tags, the machines will be invisible to a "destroy" 
 command, leaving an unmanageable cluster behind.
 The initial proposal is to go back to the previous behavior for all cases but 
 when the new flag (--security-group-prefix) is used.
 Also it's worthwhile to mention that SPARK-3180 introduced the 
 --additional-security-group flag which is a reasonable solution to SPARK-2333 
 (but isn't a full replacement to all use cases of --security-group-prefix).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-3332) Tagging is not atomic with launching instances on EC2

2014-08-31 Thread Patrick Wendell (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14116906#comment-14116906
 ] 

Patrick Wendell commented on SPARK-3332:


I changed the title slightly - I think the underlying problem here is that 
tagging is not atomic with launching instances. Removing the use of tagging 
entirely is one potential solution for this. Another is that we just print 
better errors if tagging does not succeed and explain there might be orphaned 
instances.

The reason why SPARK-2333 was added is that the current approach can lead to 
too many security groups, so we can't just revert this without any cost.

In the meantime I'd like to revert SPARK-2333 in branch-1.1 so we can defer 
the design decision to the 1.2 release timeframe. Thanks [~douglaz] for 
highlighting this issue.
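The race being described (instances launched first, tagged in a separate call) can be sketched with hypothetical helpers; `launch_instances` and `tag_instances` below are stand-ins for the EC2 RunInstances/CreateTags calls, not the actual spark-ec2 code:

```python
import time

def launch_instances(n):
    # Stand-in for EC2 RunInstances: returns the new instance ids.
    return ["i-%04d" % i for i in range(n)]

def tag_instances(ids, cluster, registry):
    # Stand-in for EC2 CreateTags: records the cluster tag per instance.
    for instance_id in ids:
        registry[instance_id] = cluster

def launch_cluster(n, cluster, registry, retries=3):
    ids = launch_instances(n)
    # Non-atomic window: if the script dies between launch and tagging,
    # the instances exist but carry no tag, so a tag-based "destroy"
    # cannot find them. Retrying narrows the window but never closes it.
    for _ in range(retries):
        try:
            tag_instances(ids, cluster, registry)
            break
        except Exception:
            time.sleep(1)
    return ids

registry = {}
ids = launch_cluster(2, "spark-cluster-1", registry)
members = [i for i in registry if registry[i] == "spark-cluster-1"]
```

Selecting members by security group instead makes membership a property of the launch call itself, which is why reverting to that behavior closes the orphan window.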

 Tagging is not atomic with launching instances on EC2
 -

 Key: SPARK-3332
 URL: https://issues.apache.org/jira/browse/SPARK-3332
 Project: Spark
  Issue Type: Bug
  Components: EC2
Reporter: Allan Douglas R. de Oliveira

 The implementation for SPARK-2333 changed the machine membership mechanism 
 from security groups to tags.
 This is a fundamentally flawed strategy, as there is no guarantee at all that 
 the machines will have a tag (even with a retry mechanism).
 For instance, if the script is killed after launching the instances but 
 before setting the tags, the machines will be invisible to a "destroy" 
 command, leaving an unmanageable cluster behind.
 The initial proposal is to go back to the previous behavior for all cases but 
 when the new flag (--security-group-prefix) is used.
 Also it's worthwhile to mention that SPARK-3180 introduced the 
 --additional-security-group flag which is a reasonable solution to SPARK-2333 
 (but isn't a full replacement to all use cases of --security-group-prefix).






[jira] [Updated] (SPARK-3332) Tagging is not atomic with launching instances on EC2

2014-08-31 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-3332:
---
Target Version/s: 1.2.0

 Tagging is not atomic with launching instances on EC2
 -

 Key: SPARK-3332
 URL: https://issues.apache.org/jira/browse/SPARK-3332
 Project: Spark
  Issue Type: Bug
  Components: EC2
Reporter: Allan Douglas R. de Oliveira

 The implementation for SPARK-2333 changed the machine membership mechanism 
 from security groups to tags.
 This is a fundamentally flawed strategy, as there is no guarantee at all that 
 the machines will have a tag (even with a retry mechanism).
 For instance, if the script is killed after launching the instances but 
 before setting the tags, the machines will be invisible to a "destroy" 
 command, leaving an unmanageable cluster behind.
 The initial proposal is to go back to the previous behavior for all cases but 
 when the new flag (--security-group-prefix) is used.
 Also it's worthwhile to mention that SPARK-3180 introduced the 
 --additional-security-group flag which is a reasonable solution to SPARK-2333 
 (but isn't a full replacement to all use cases of --security-group-prefix).






[jira] [Commented] (SPARK-3332) Tagging is not atomic with launching instances on EC2

2014-08-31 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14116910#comment-14116910
 ] 

Apache Spark commented on SPARK-3332:
-

User 'JoshRosen' has created a pull request for this issue:
https://github.com/apache/spark/pull/2225

 Tagging is not atomic with launching instances on EC2
 -

 Key: SPARK-3332
 URL: https://issues.apache.org/jira/browse/SPARK-3332
 Project: Spark
  Issue Type: Bug
  Components: EC2
Reporter: Allan Douglas R. de Oliveira

 The implementation for SPARK-2333 changed the machine membership mechanism 
 from security groups to tags.
 This is a fundamentally flawed strategy as there aren't guarantees at all the 
 machines will have a tag (even with a retry mechanism).
 For instance, if the script is killed after launching the instances but 
 before setting the tags the machines will be invisible to a destroy 
 command, leaving a unmanageable cluster behind.
 The initial proposal is to go back to the previous behavior for all cases but 
 when the new flag (--security-group-prefix) is used.
 Also it's worthwhile to mention that SPARK-3180 introduced the 
 --additional-security-group flag which is a reasonable solution to SPARK-2333 
 (but isn't a full replacement to all use cases of --security-group-prefix).






[jira] [Updated] (SPARK-2870) Thorough schema inference directly on RDDs of Python dictionaries

2014-08-31 Thread Michael Armbrust (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-2870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Armbrust updated SPARK-2870:

Target Version/s: 1.2.0

 Thorough schema inference directly on RDDs of Python dictionaries
 -

 Key: SPARK-2870
 URL: https://issues.apache.org/jira/browse/SPARK-2870
 Project: Spark
  Issue Type: Improvement
  Components: PySpark, SQL
Reporter: Nicholas Chammas

 h4. Background
 I love the {{SQLContext.jsonRDD()}} and {{SQLContext.jsonFile()}} methods. 
 They process JSON text directly and infer a schema that covers the entire 
 source data set. 
 This is very important with semi-structured data like JSON since individual 
 elements in the data set are free to have different structures. Matching 
 fields across elements may even have different value types.
 For example:
 {code}
 {"a": 5}
 {"a": "cow"}
 {code}
 To get a queryable schema that covers the whole data set, you need to infer a 
 schema by looking at the whole data set. The aforementioned 
 {{SQLContext.json...()}} methods do this very well. 
 h4. Feature Request
 What we need is for {{SQLContext.inferSchema()}} to do this, too. 
 Alternatively, we need a new {{SQLContext}} method that works on RDDs of 
 Python dictionaries and does something functionally equivalent to this:
 {code}
 SQLContext.jsonRDD(RDD[dict].map(lambda x: json.dumps(x)))
 {code}
 As of 1.0.2, 
 [{{inferSchema()}}|http://spark.apache.org/docs/latest/api/python/pyspark.sql.SQLContext-class.html#inferSchema]
  just looks at the first element in the data set. This won't help much when 
 the structure of the elements in the target RDD is variable.
 h4. Example Use Case
 * You have some JSON text data that you want to analyze using Spark SQL. 
 * You would use one of the {{SQLContext.json...()}} methods, but you need to 
 do some filtering on the data first to remove bad elements--basically, some 
 minimal schema validation.
 * You deserialize the JSON objects to Python {{dict}} s and filter out the 
 bad ones. You now have an RDD of dictionaries.
 * From this RDD, you want a SchemaRDD that captures the schema for the whole 
 data set.
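The request above amounts to folding type information over every element instead of only the first. A minimal pure-Python sketch of that idea (not the eventual Spark SQL implementation):

```python
def infer_field_types(records):
    # Merge the observed value types per key across *all* records,
    # the way jsonRDD/jsonFile do, rather than sampling one element.
    schema = {}
    for record in records:
        for key, value in record.items():
            schema.setdefault(key, set()).add(type(value).__name__)
    return schema

data = [{"a": 5}, {"a": "cow"}, {"b": 1.5}]
schema = infer_field_types(data)
# "a" is seen as both int and str, so a covering schema must widen it;
# "b" is absent from some records, so its column must be nullable.
```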






[jira] [Commented] (SPARK-2870) Thorough schema inference directly on RDDs of Python dictionaries

2014-08-31 Thread Michael Armbrust (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14116915#comment-14116915
 ] 

Michael Armbrust commented on SPARK-2870:
-

Yeah, thanks for pinging me.  I've targeted this JIRA for 1.2.

 Thorough schema inference directly on RDDs of Python dictionaries
 -

 Key: SPARK-2870
 URL: https://issues.apache.org/jira/browse/SPARK-2870
 Project: Spark
  Issue Type: Improvement
  Components: PySpark, SQL
Reporter: Nicholas Chammas

 h4. Background
 I love the {{SQLContext.jsonRDD()}} and {{SQLContext.jsonFile()}} methods. 
 They process JSON text directly and infer a schema that covers the entire 
 source data set. 
 This is very important with semi-structured data like JSON since individual 
 elements in the data set are free to have different structures. Matching 
 fields across elements may even have different value types.
 For example:
 {code}
 {"a": 5}
 {"a": "cow"}
 {code}
 To get a queryable schema that covers the whole data set, you need to infer a 
 schema by looking at the whole data set. The aforementioned 
 {{SQLContext.json...()}} methods do this very well. 
 h4. Feature Request
 What we need is for {{SQLContext.inferSchema()}} to do this, too. 
 Alternatively, we need a new {{SQLContext}} method that works on RDDs of 
 Python dictionaries and does something functionally equivalent to this:
 {code}
 SQLContext.jsonRDD(RDD[dict].map(lambda x: json.dumps(x)))
 {code}
 As of 1.0.2, 
 [{{inferSchema()}}|http://spark.apache.org/docs/latest/api/python/pyspark.sql.SQLContext-class.html#inferSchema]
  just looks at the first element in the data set. This won't help much when 
 the structure of the elements in the target RDD is variable.
 h4. Example Use Case
 * You have some JSON text data that you want to analyze using Spark SQL. 
 * You would use one of the {{SQLContext.json...()}} methods, but you need to 
 do some filtering on the data first to remove bad elements--basically, some 
 minimal schema validation.
 * You deserialize the JSON objects to Python {{dict}} s and filter out the 
 bad ones. You now have an RDD of dictionaries.
 * From this RDD, you want a SchemaRDD that captures the schema for the whole 
 data set.






[jira] [Closed] (SPARK-3317) The loss of regularization in Updater should use the oldWeights

2014-08-31 Thread DB Tsai (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

DB Tsai closed SPARK-3317.
--
Resolution: Won't Fix

 The loss of regularization in Updater should use the oldWeights
 ---

 Key: SPARK-3317
 URL: https://issues.apache.org/jira/browse/SPARK-3317
 Project: Spark
  Issue Type: Bug
  Components: MLlib
Reporter: DB Tsai

 The current loss of the regularization is computed from the newWeights, which 
 is not correct. The loss, R(w) = 1/2 ||w||^2, should be computed with the 
 oldWeights.
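For readers following along, the distinction is where the L2 penalty R(w) = 1/2 ||w||^2 is evaluated: at the weights the gradient step started from, or at the freshly updated ones. A minimal sketch in plain Python (not MLlib's Updater code):

```python
def l2_loss(w):
    # R(w) = 1/2 * ||w||^2
    return 0.5 * sum(x * x for x in w)

old_weights = [3.0, 4.0]
new_weights = [o - s for o, s in zip(old_weights, [1.0, 0.0])]  # one step

# The issue argues the reported regularization term should pair with
# old_weights (the point the rest of the loss was computed at):
loss_at_old = l2_loss(old_weights)  # 0.5 * (9 + 16) = 12.5
loss_at_new = l2_loss(new_weights)  # 0.5 * (4 + 16) = 10.0
```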






[jira] [Commented] (SPARK-3333) Large number of partitions causes OOM

2014-08-31 Thread Matei Zaharia (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14116923#comment-14116923
 ] 

Matei Zaharia commented on SPARK-3333:
--

The slowdown might be partly due to adding external spilling in Python, but 
it's weird that this would crash the driver.
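One way to see how partition count alone can pressure the driver: even for {{take(1)}}, the preceding {{reduceByKey}} stage runs one task per partition, and every task result arrives at the driver with a fixed envelope (serialized value, metrics, accumulator updates) that the result-resolver threads deserialize. A back-of-envelope with an assumed, not measured, per-result overhead:

```python
num_tasks = 24000              # one result per partition
per_result_bytes = 50 * 1024   # assumed ~50 KB envelope per task result
total_mb = num_tasks * per_result_bytes / (1024 ** 2)
# total_mb is on the order of 1 GB held by the driver at once, which
# would be consistent with an OOM in TaskResultGetter on a default heap.
```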

 Large number of partitions causes OOM
 -

 Key: SPARK-3333
 URL: https://issues.apache.org/jira/browse/SPARK-3333
 Project: Spark
  Issue Type: Bug
  Components: PySpark
Affects Versions: 1.1.0
 Environment: EC2; 1 master; 1 slave; {{m3.xlarge}} instances
Reporter: Nicholas Chammas

 Here’s a repro for PySpark:
 {code}
 a = sc.parallelize(["Nick", "John", "Bob"])
 a = a.repartition(24000)
 a.keyBy(lambda x: len(x)).reduceByKey(lambda x,y: x + y).take(1)
 {code}
 This code runs fine on 1.0.2. It returns the following result in just over a 
 minute:
 {code}
 [(4, 'NickJohn')]
 {code}
 However, when I try this with 1.1.0-rc3 on an identically-sized cluster, it 
 runs for a very, very long time (at least 45 min) and then fails with 
 {{java.lang.OutOfMemoryError: Java heap space}}.

[jira] [Resolved] (SPARK-3320) Batched in-memory column buffer building doesn't work for SchemaRDDs with empty partitions

2014-08-31 Thread Michael Armbrust (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Armbrust resolved SPARK-3320.
-
   Resolution: Fixed
Fix Version/s: 1.1.0
 Assignee: Cheng Lian

 Batched in-memory column buffer building doesn't work for SchemaRDDs with 
 empty partitions
 --

 Key: SPARK-3320
 URL: https://issues.apache.org/jira/browse/SPARK-3320
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.0.2
Reporter: Cheng Lian
Assignee: Cheng Lian
Priority: Blocker
 Fix For: 1.1.0


 Empty partition iterator is not properly handled in 
 [#1880|https://github.com/apache/spark/pull/1880/files#diff-b47dac3d98014877d5879f5cf37ab0d1R115],
  and throws exception when accessing empty partition of the target SchemaRDD



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-3333) Large number of partitions causes OOM

2014-08-31 Thread Davies Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14116935#comment-14116935
 ] 

Davies Liu edited comment on SPARK-3333 at 8/31/14 11:26 PM:
-

[~matei] I think this is not related to external spilling in Python, because 
the dataset is so small that it will not trigger spilling in Python. 

Also, this slowdown can be reproduced in Scala, for example:

val rdd = sc.parallelize(1 to 10)
rdd.repartition(24000).keyBy(x => x).reduceByKey(_ + _).collect()

The second stage (24000 tasks) will take >105 mins (not finished yet). 

PS: I am running on master.



was (Author: davies):
[~matei] I think this is not related to external spilling in Python, because 
the dataset is so small that it will not trigger spilling in Python. 

Also, this slowdown can be reproduced in Scala, for example:

val rdd = sc.parallelize(1 to 10)
rdd.repartition(24000).keyBy(x => x).reduceByKey(_ + _).collect()

The second stage (24000 tasks) will take >105 mins (not finished yet). 


 Large number of partitions causes OOM
 -

 Key: SPARK-3333
 URL: https://issues.apache.org/jira/browse/SPARK-3333
 Project: Spark
  Issue Type: Bug
  Components: PySpark
Affects Versions: 1.1.0
 Environment: EC2; 1 master; 1 slave; {{m3.xlarge}} instances
Reporter: Nicholas Chammas

 Here’s a repro for PySpark:
 {code}
 a = sc.parallelize(["Nick", "John", "Bob"])
 a = a.repartition(24000)
 a.keyBy(lambda x: len(x)).reduceByKey(lambda x,y: x + y).take(1)
 {code}
 This code runs fine on 1.0.2. It returns the following result in just over a 
 minute:
 {code}
 [(4, 'NickJohn')]
 {code}
 However, when I try this with 1.1.0-rc3 on an identically-sized cluster, it 
 runs for a very, very long time (at least 45 min) and then fails with 
 {{java.lang.OutOfMemoryError: Java heap space}}.
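 For reference, the expected result can be verified without a cluster, since 
 the job just groups the names by length and concatenates within each group. 
 A plain-Python sketch of that computation (illustration only, not Spark code):

```python
from functools import reduce

# Plain-Python model of keyBy(len) followed by reduceByKey(concatenation).
data = ["Nick", "John", "Bob"]
grouped = {}
for name in data:
    grouped.setdefault(len(name), []).append(name)   # keyBy(lambda x: len(x))
# reduceByKey(lambda x, y: x + y): concatenate names within each length group
result = {k: reduce(lambda x, y: x + y, v) for k, v in grouped.items()}
print(result)  # {4: 'NickJohn', 3: 'Bob'}
```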
 Here is a stack trace taken from a run on 1.1.0-rc2:
 {code}
  >>> a = sc.parallelize(["Nick", "John", "Bob"])
  >>> a = a.repartition(24000)
  >>> a.keyBy(lambda x: len(x)).reduceByKey(lambda x,y: x + y).take(1)
 14/08/29 21:53:40 WARN BlockManagerMasterActor: Removing BlockManager 
 BlockManagerId(0, ip-10-138-29-167.ec2.internal, 46252, 0) with no recent 
 heart beats: 175143ms exceeds 45000ms
 14/08/29 21:53:50 WARN BlockManagerMasterActor: Removing BlockManager 
 BlockManagerId(10, ip-10-138-18-106.ec2.internal, 33711, 0) with no recent 
 heart beats: 175359ms exceeds 45000ms
 14/08/29 21:54:02 WARN BlockManagerMasterActor: Removing BlockManager 
 BlockManagerId(19, ip-10-139-36-207.ec2.internal, 52208, 0) with no recent 
 heart beats: 173061ms exceeds 45000ms
 14/08/29 21:54:13 WARN BlockManagerMasterActor: Removing BlockManager 
 BlockManagerId(5, ip-10-73-142-70.ec2.internal, 56162, 0) with no recent 
 heart beats: 176816ms exceeds 45000ms
 14/08/29 21:54:22 WARN BlockManagerMasterActor: Removing BlockManager 
 BlockManagerId(7, ip-10-236-145-200.ec2.internal, 40959, 0) with no recent 
 heart beats: 182241ms exceeds 45000ms
 14/08/29 21:54:40 WARN BlockManagerMasterActor: Removing BlockManager 
 BlockManagerId(4, ip-10-139-1-195.ec2.internal, 49221, 0) with no recent 
 heart beats: 178406ms exceeds 45000ms
 14/08/29 21:54:41 ERROR Utils: Uncaught exception in thread Result resolver 
 thread-3
 java.lang.OutOfMemoryError: Java heap space
 at com.esotericsoftware.kryo.io.Input.readBytes(Input.java:296)
 at 
 com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ByteArraySerializer.read(DefaultArraySerializers.java:35)
 at 
 com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ByteArraySerializer.read(DefaultArraySerializers.java:18)
 at com.esotericsoftware.kryo.Kryo.readObjectOrNull(Kryo.java:699)
 at 
 com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.read(FieldSerializer.java:611)
 at 
 com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:221)
 at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:729)
 at 
 org.apache.spark.serializer.KryoSerializerInstance.deserialize(KryoSerializer.scala:162)
 at org.apache.spark.scheduler.DirectTaskResult.value(TaskResult.scala:79)
 at 
 org.apache.spark.scheduler.TaskSetManager.handleSuccessfulTask(TaskSetManager.scala:514)
 at 
 org.apache.spark.scheduler.TaskSchedulerImpl.handleSuccessfulTask(TaskSchedulerImpl.scala:355)
 at 
 org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply$mcV$sp(TaskResultGetter.scala:68)
 at 
 org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply(TaskResultGetter.scala:47)
 at 
 org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply(TaskResultGetter.scala:47)
 at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1311)
 at 
 

[jira] [Comment Edited] (SPARK-3333) Large number of partitions causes OOM

2014-08-31 Thread Davies Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14116935#comment-14116935
 ] 

Davies Liu edited comment on SPARK-3333 at 8/31/14 11:27 PM:
-

[~matei] I think this is not related to external spilling in Python, because 
the dataset is so small that it will not trigger spilling in Python. 

Also, this slowdown can be reproduced in Scala, for example:

val rdd = sc.parallelize(1 to 10)
rdd.repartition(24000).keyBy(x => x).reduceByKey(_ + _).collect()

The second stage (24000 tasks) will take >105 mins (not finished yet). 

The CPU usage of the JVM is 250% and memory is about 655M; it may trigger OOM 
somewhere. 

PS: I am running on master.



was (Author: davies):
[~matei] I think this is not related to external spilling in Python, because 
the dataset is so small that it will not trigger spilling in Python. 

Also, this slowdown can be reproduced in Scala, for example:

val rdd = sc.parallelize(1 to 10)
rdd.repartition(24000).keyBy(x => x).reduceByKey(_ + _).collect()

The second stage (24000 tasks) will take >105 mins (not finished yet). 

PS: I am running on master.


 Large number of partitions causes OOM
 -

 Key: SPARK-3333
 URL: https://issues.apache.org/jira/browse/SPARK-3333
 Project: Spark
  Issue Type: Bug
  Components: PySpark
Affects Versions: 1.1.0
 Environment: EC2; 1 master; 1 slave; {{m3.xlarge}} instances
Reporter: Nicholas Chammas

 Here’s a repro for PySpark:
 {code}
 a = sc.parallelize(["Nick", "John", "Bob"])
 a = a.repartition(24000)
 a.keyBy(lambda x: len(x)).reduceByKey(lambda x,y: x + y).take(1)
 {code}
 This code runs fine on 1.0.2. It returns the following result in just over a 
 minute:
 {code}
 [(4, 'NickJohn')]
 {code}
 However, when I try this with 1.1.0-rc3 on an identically-sized cluster, it 
 runs for a very, very long time (at least 45 min) and then fails with 
 {{java.lang.OutOfMemoryError: Java heap space}}.
 Here is a stack trace taken from a run on 1.1.0-rc2:
 {code}
  >>> a = sc.parallelize(["Nick", "John", "Bob"])
  >>> a = a.repartition(24000)
  >>> a.keyBy(lambda x: len(x)).reduceByKey(lambda x,y: x + y).take(1)
 14/08/29 21:53:40 WARN BlockManagerMasterActor: Removing BlockManager 
 BlockManagerId(0, ip-10-138-29-167.ec2.internal, 46252, 0) with no recent 
 heart beats: 175143ms exceeds 45000ms
 14/08/29 21:53:50 WARN BlockManagerMasterActor: Removing BlockManager 
 BlockManagerId(10, ip-10-138-18-106.ec2.internal, 33711, 0) with no recent 
 heart beats: 175359ms exceeds 45000ms
 14/08/29 21:54:02 WARN BlockManagerMasterActor: Removing BlockManager 
 BlockManagerId(19, ip-10-139-36-207.ec2.internal, 52208, 0) with no recent 
 heart beats: 173061ms exceeds 45000ms
 14/08/29 21:54:13 WARN BlockManagerMasterActor: Removing BlockManager 
 BlockManagerId(5, ip-10-73-142-70.ec2.internal, 56162, 0) with no recent 
 heart beats: 176816ms exceeds 45000ms
 14/08/29 21:54:22 WARN BlockManagerMasterActor: Removing BlockManager 
 BlockManagerId(7, ip-10-236-145-200.ec2.internal, 40959, 0) with no recent 
 heart beats: 182241ms exceeds 45000ms
 14/08/29 21:54:40 WARN BlockManagerMasterActor: Removing BlockManager 
 BlockManagerId(4, ip-10-139-1-195.ec2.internal, 49221, 0) with no recent 
 heart beats: 178406ms exceeds 45000ms
 14/08/29 21:54:41 ERROR Utils: Uncaught exception in thread Result resolver 
 thread-3
 java.lang.OutOfMemoryError: Java heap space
 at com.esotericsoftware.kryo.io.Input.readBytes(Input.java:296)
 at 
 com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ByteArraySerializer.read(DefaultArraySerializers.java:35)
 at 
 com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ByteArraySerializer.read(DefaultArraySerializers.java:18)
 at com.esotericsoftware.kryo.Kryo.readObjectOrNull(Kryo.java:699)
 at 
 com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.read(FieldSerializer.java:611)
 at 
 com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:221)
 at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:729)
 at 
 org.apache.spark.serializer.KryoSerializerInstance.deserialize(KryoSerializer.scala:162)
 at org.apache.spark.scheduler.DirectTaskResult.value(TaskResult.scala:79)
 at 
 org.apache.spark.scheduler.TaskSetManager.handleSuccessfulTask(TaskSetManager.scala:514)
 at 
 org.apache.spark.scheduler.TaskSchedulerImpl.handleSuccessfulTask(TaskSchedulerImpl.scala:355)
 at 
 org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply$mcV$sp(TaskResultGetter.scala:68)
 at 
 org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply(TaskResultGetter.scala:47)
 at 
 org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply(TaskResultGetter.scala:47)
 at 

[jira] [Commented] (SPARK-3333) Large number of partitions causes OOM

2014-08-31 Thread Josh Rosen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14116936#comment-14116936
 ] 

Josh Rosen commented on SPARK-3333:
---

I agree with Davies; I think this is a more general Spark issue, perhaps 
related to {{repartition()}}.

I just tried testing this locally with commit 
eff9714e1c88e39e28317358ca9ec87677f121dc, which is the commit immediately prior 
to 
[14174abd421318e71c16edd24224fd5094bdfed4|https://github.com/apache/spark/commit/14174abd421318e71c16edd24224fd5094bdfed4],
 Davies' patch that adds hash-based disk spilling aggregation to PySpark, and I 
still saw the same slowdown there.



 Large number of partitions causes OOM
 -

 Key: SPARK-3333
 URL: https://issues.apache.org/jira/browse/SPARK-3333
 Project: Spark
  Issue Type: Bug
  Components: PySpark
Affects Versions: 1.1.0
 Environment: EC2; 1 master; 1 slave; {{m3.xlarge}} instances
Reporter: Nicholas Chammas

 Here’s a repro for PySpark:
 {code}
 a = sc.parallelize(["Nick", "John", "Bob"])
 a = a.repartition(24000)
 a.keyBy(lambda x: len(x)).reduceByKey(lambda x,y: x + y).take(1)
 {code}
 This code runs fine on 1.0.2. It returns the following result in just over a 
 minute:
 {code}
 [(4, 'NickJohn')]
 {code}
 However, when I try this with 1.1.0-rc3 on an identically-sized cluster, it 
 runs for a very, very long time (at least 45 min) and then fails with 
 {{java.lang.OutOfMemoryError: Java heap space}}.
 Here is a stack trace taken from a run on 1.1.0-rc2:
 {code}
  >>> a = sc.parallelize(["Nick", "John", "Bob"])
  >>> a = a.repartition(24000)
  >>> a.keyBy(lambda x: len(x)).reduceByKey(lambda x,y: x + y).take(1)
 14/08/29 21:53:40 WARN BlockManagerMasterActor: Removing BlockManager 
 BlockManagerId(0, ip-10-138-29-167.ec2.internal, 46252, 0) with no recent 
 heart beats: 175143ms exceeds 45000ms
 14/08/29 21:53:50 WARN BlockManagerMasterActor: Removing BlockManager 
 BlockManagerId(10, ip-10-138-18-106.ec2.internal, 33711, 0) with no recent 
 heart beats: 175359ms exceeds 45000ms
 14/08/29 21:54:02 WARN BlockManagerMasterActor: Removing BlockManager 
 BlockManagerId(19, ip-10-139-36-207.ec2.internal, 52208, 0) with no recent 
 heart beats: 173061ms exceeds 45000ms
 14/08/29 21:54:13 WARN BlockManagerMasterActor: Removing BlockManager 
 BlockManagerId(5, ip-10-73-142-70.ec2.internal, 56162, 0) with no recent 
 heart beats: 176816ms exceeds 45000ms
 14/08/29 21:54:22 WARN BlockManagerMasterActor: Removing BlockManager 
 BlockManagerId(7, ip-10-236-145-200.ec2.internal, 40959, 0) with no recent 
 heart beats: 182241ms exceeds 45000ms
 14/08/29 21:54:40 WARN BlockManagerMasterActor: Removing BlockManager 
 BlockManagerId(4, ip-10-139-1-195.ec2.internal, 49221, 0) with no recent 
 heart beats: 178406ms exceeds 45000ms
 14/08/29 21:54:41 ERROR Utils: Uncaught exception in thread Result resolver 
 thread-3
 java.lang.OutOfMemoryError: Java heap space
 at com.esotericsoftware.kryo.io.Input.readBytes(Input.java:296)
 at 
 com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ByteArraySerializer.read(DefaultArraySerializers.java:35)
 at 
 com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ByteArraySerializer.read(DefaultArraySerializers.java:18)
 at com.esotericsoftware.kryo.Kryo.readObjectOrNull(Kryo.java:699)
 at 
 com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.read(FieldSerializer.java:611)
 at 
 com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:221)
 at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:729)
 at 
 org.apache.spark.serializer.KryoSerializerInstance.deserialize(KryoSerializer.scala:162)
 at org.apache.spark.scheduler.DirectTaskResult.value(TaskResult.scala:79)
 at 
 org.apache.spark.scheduler.TaskSetManager.handleSuccessfulTask(TaskSetManager.scala:514)
 at 
 org.apache.spark.scheduler.TaskSchedulerImpl.handleSuccessfulTask(TaskSchedulerImpl.scala:355)
 at 
 org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply$mcV$sp(TaskResultGetter.scala:68)
 at 
 org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply(TaskResultGetter.scala:47)
 at 
 org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply(TaskResultGetter.scala:47)
 at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1311)
 at 
 org.apache.spark.scheduler.TaskResultGetter$$anon$2.run(TaskResultGetter.scala:46)
 at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 at java.lang.Thread.run(Thread.java:745)
 Exception in thread Result resolver thread-3 14/08/29 21:56:26 ERROR 
 SendingConnection: Exception while reading 

[jira] [Commented] (SPARK-3333) Large number of partitions causes OOM

2014-08-31 Thread Matei Zaharia (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14116942#comment-14116942
 ] 

Matei Zaharia commented on SPARK-3333:
--

I see, that makes sense.

 Large number of partitions causes OOM
 -

 Key: SPARK-3333
 URL: https://issues.apache.org/jira/browse/SPARK-3333
 Project: Spark
  Issue Type: Bug
  Components: PySpark
Affects Versions: 1.1.0
 Environment: EC2; 1 master; 1 slave; {{m3.xlarge}} instances
Reporter: Nicholas Chammas

 Here’s a repro for PySpark:
 {code}
 a = sc.parallelize(["Nick", "John", "Bob"])
 a = a.repartition(24000)
 a.keyBy(lambda x: len(x)).reduceByKey(lambda x,y: x + y).take(1)
 {code}
 This code runs fine on 1.0.2. It returns the following result in just over a 
 minute:
 {code}
 [(4, 'NickJohn')]
 {code}
 However, when I try this with 1.1.0-rc3 on an identically-sized cluster, it 
 runs for a very, very long time (at least 45 min) and then fails with 
 {{java.lang.OutOfMemoryError: Java heap space}}.
 Here is a stack trace taken from a run on 1.1.0-rc2:
 {code}
  >>> a = sc.parallelize(["Nick", "John", "Bob"])
  >>> a = a.repartition(24000)
  >>> a.keyBy(lambda x: len(x)).reduceByKey(lambda x,y: x + y).take(1)
 14/08/29 21:53:40 WARN BlockManagerMasterActor: Removing BlockManager 
 BlockManagerId(0, ip-10-138-29-167.ec2.internal, 46252, 0) with no recent 
 heart beats: 175143ms exceeds 45000ms
 14/08/29 21:53:50 WARN BlockManagerMasterActor: Removing BlockManager 
 BlockManagerId(10, ip-10-138-18-106.ec2.internal, 33711, 0) with no recent 
 heart beats: 175359ms exceeds 45000ms
 14/08/29 21:54:02 WARN BlockManagerMasterActor: Removing BlockManager 
 BlockManagerId(19, ip-10-139-36-207.ec2.internal, 52208, 0) with no recent 
 heart beats: 173061ms exceeds 45000ms
 14/08/29 21:54:13 WARN BlockManagerMasterActor: Removing BlockManager 
 BlockManagerId(5, ip-10-73-142-70.ec2.internal, 56162, 0) with no recent 
 heart beats: 176816ms exceeds 45000ms
 14/08/29 21:54:22 WARN BlockManagerMasterActor: Removing BlockManager 
 BlockManagerId(7, ip-10-236-145-200.ec2.internal, 40959, 0) with no recent 
 heart beats: 182241ms exceeds 45000ms
 14/08/29 21:54:40 WARN BlockManagerMasterActor: Removing BlockManager 
 BlockManagerId(4, ip-10-139-1-195.ec2.internal, 49221, 0) with no recent 
 heart beats: 178406ms exceeds 45000ms
 14/08/29 21:54:41 ERROR Utils: Uncaught exception in thread Result resolver 
 thread-3
 java.lang.OutOfMemoryError: Java heap space
 at com.esotericsoftware.kryo.io.Input.readBytes(Input.java:296)
 at 
 com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ByteArraySerializer.read(DefaultArraySerializers.java:35)
 at 
 com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ByteArraySerializer.read(DefaultArraySerializers.java:18)
 at com.esotericsoftware.kryo.Kryo.readObjectOrNull(Kryo.java:699)
 at 
 com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.read(FieldSerializer.java:611)
 at 
 com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:221)
 at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:729)
 at 
 org.apache.spark.serializer.KryoSerializerInstance.deserialize(KryoSerializer.scala:162)
 at org.apache.spark.scheduler.DirectTaskResult.value(TaskResult.scala:79)
 at 
 org.apache.spark.scheduler.TaskSetManager.handleSuccessfulTask(TaskSetManager.scala:514)
 at 
 org.apache.spark.scheduler.TaskSchedulerImpl.handleSuccessfulTask(TaskSchedulerImpl.scala:355)
 at 
 org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply$mcV$sp(TaskResultGetter.scala:68)
 at 
 org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply(TaskResultGetter.scala:47)
 at 
 org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply(TaskResultGetter.scala:47)
 at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1311)
 at 
 org.apache.spark.scheduler.TaskResultGetter$$anon$2.run(TaskResultGetter.scala:46)
 at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 at java.lang.Thread.run(Thread.java:745)
 Exception in thread Result resolver thread-3 14/08/29 21:56:26 ERROR 
 SendingConnection: Exception while reading SendingConnection to 
 ConnectionManagerId(ip-10-73-142-223.ec2.internal,54014)
 java.nio.channels.ClosedChannelException
 at sun.nio.ch.SocketChannelImpl.ensureReadOpen(SocketChannelImpl.java:252)
 at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:295)
 at org.apache.spark.network.SendingConnection.read(Connection.scala:390)
 at 
 org.apache.spark.network.ConnectionManager$$anon$7.run(ConnectionManager.scala:199)
 at 
 

[jira] [Commented] (SPARK-3333) Large number of partitions causes OOM

2014-08-31 Thread Josh Rosen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14116948#comment-14116948
 ] 

Josh Rosen commented on SPARK-3333:
---

Working on doing some manual bisecting to find the patch that introduced the 
slowdown.  It's still slow as early as 8d338f64c4eda45d22ae33f61ef7928011cc2846.

 Large number of partitions causes OOM
 -

 Key: SPARK-3333
 URL: https://issues.apache.org/jira/browse/SPARK-3333
 Project: Spark
  Issue Type: Bug
  Components: PySpark
Affects Versions: 1.1.0
 Environment: EC2; 1 master; 1 slave; {{m3.xlarge}} instances
Reporter: Nicholas Chammas

 Here’s a repro for PySpark:
 {code}
 a = sc.parallelize(["Nick", "John", "Bob"])
 a = a.repartition(24000)
 a.keyBy(lambda x: len(x)).reduceByKey(lambda x,y: x + y).take(1)
 {code}
 This code runs fine on 1.0.2. It returns the following result in just over a 
 minute:
 {code}
 [(4, 'NickJohn')]
 {code}
 However, when I try this with 1.1.0-rc3 on an identically-sized cluster, it 
 runs for a very, very long time (at least 45 min) and then fails with 
 {{java.lang.OutOfMemoryError: Java heap space}}.
 Here is a stack trace taken from a run on 1.1.0-rc2:
 {code}
  >>> a = sc.parallelize(["Nick", "John", "Bob"])
  >>> a = a.repartition(24000)
  >>> a.keyBy(lambda x: len(x)).reduceByKey(lambda x,y: x + y).take(1)
 14/08/29 21:53:40 WARN BlockManagerMasterActor: Removing BlockManager 
 BlockManagerId(0, ip-10-138-29-167.ec2.internal, 46252, 0) with no recent 
 heart beats: 175143ms exceeds 45000ms
 14/08/29 21:53:50 WARN BlockManagerMasterActor: Removing BlockManager 
 BlockManagerId(10, ip-10-138-18-106.ec2.internal, 33711, 0) with no recent 
 heart beats: 175359ms exceeds 45000ms
 14/08/29 21:54:02 WARN BlockManagerMasterActor: Removing BlockManager 
 BlockManagerId(19, ip-10-139-36-207.ec2.internal, 52208, 0) with no recent 
 heart beats: 173061ms exceeds 45000ms
 14/08/29 21:54:13 WARN BlockManagerMasterActor: Removing BlockManager 
 BlockManagerId(5, ip-10-73-142-70.ec2.internal, 56162, 0) with no recent 
 heart beats: 176816ms exceeds 45000ms
 14/08/29 21:54:22 WARN BlockManagerMasterActor: Removing BlockManager 
 BlockManagerId(7, ip-10-236-145-200.ec2.internal, 40959, 0) with no recent 
 heart beats: 182241ms exceeds 45000ms
 14/08/29 21:54:40 WARN BlockManagerMasterActor: Removing BlockManager 
 BlockManagerId(4, ip-10-139-1-195.ec2.internal, 49221, 0) with no recent 
 heart beats: 178406ms exceeds 45000ms
 14/08/29 21:54:41 ERROR Utils: Uncaught exception in thread Result resolver 
 thread-3
 java.lang.OutOfMemoryError: Java heap space
 at com.esotericsoftware.kryo.io.Input.readBytes(Input.java:296)
 at 
 com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ByteArraySerializer.read(DefaultArraySerializers.java:35)
 at 
 com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ByteArraySerializer.read(DefaultArraySerializers.java:18)
 at com.esotericsoftware.kryo.Kryo.readObjectOrNull(Kryo.java:699)
 at 
 com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.read(FieldSerializer.java:611)
 at 
 com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:221)
 at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:729)
 at 
 org.apache.spark.serializer.KryoSerializerInstance.deserialize(KryoSerializer.scala:162)
 at org.apache.spark.scheduler.DirectTaskResult.value(TaskResult.scala:79)
 at 
 org.apache.spark.scheduler.TaskSetManager.handleSuccessfulTask(TaskSetManager.scala:514)
 at 
 org.apache.spark.scheduler.TaskSchedulerImpl.handleSuccessfulTask(TaskSchedulerImpl.scala:355)
 at 
 org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply$mcV$sp(TaskResultGetter.scala:68)
 at 
 org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply(TaskResultGetter.scala:47)
 at 
 org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply(TaskResultGetter.scala:47)
 at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1311)
 at 
 org.apache.spark.scheduler.TaskResultGetter$$anon$2.run(TaskResultGetter.scala:46)
 at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 at java.lang.Thread.run(Thread.java:745)
 Exception in thread Result resolver thread-3 14/08/29 21:56:26 ERROR 
 SendingConnection: Exception while reading SendingConnection to 
 ConnectionManagerId(ip-10-73-142-223.ec2.internal,54014)
 java.nio.channels.ClosedChannelException
 at sun.nio.ch.SocketChannelImpl.ensureReadOpen(SocketChannelImpl.java:252)
 at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:295)
 at 

[jira] [Commented] (SPARK-3332) Tagging is not atomic with launching instances on EC2

2014-08-31 Thread Allan Douglas R. de Oliveira (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14116950#comment-14116950
 ] 

Allan Douglas R. de Oliveira commented on SPARK-3332:
-

[~pwendell], yes this is a good reword and I agree to revert it for now. You 
mentioned the two potential solutions, but I think a good compromise is the one 
implemented by the PR, which keeps the flag allowing reuse of the same security 
group while still matching the machines by security group in the 
other cases.

 Tagging is not atomic with launching instances on EC2
 -

 Key: SPARK-3332
 URL: https://issues.apache.org/jira/browse/SPARK-3332
 Project: Spark
  Issue Type: Bug
  Components: EC2
Reporter: Allan Douglas R. de Oliveira

 The implementation for SPARK-2333 changed the machine membership mechanism 
 from security groups to tags.
 This is a fundamentally flawed strategy as there aren't guarantees at all the 
 machines will have a tag (even with a retry mechanism).
 For instance, if the script is killed after launching the instances but 
 before setting the tags the machines will be invisible to a destroy 
 command, leaving a unmanageable cluster behind.
 The initial proposal is to go back to the previous behavior for all cases but 
 when the new flag (--security-group-prefix) is used.
 Also it's worthwhile to mention that SPARK-3180 introduced the 
 --additional-security-group flag which is a reasonable solution to SPARK-2333 
 (but isn't a full replacement to all use cases of --security-group-prefix).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-3332) Tagging is not atomic with launching instances on EC2

2014-08-31 Thread Allan Douglas R. de Oliveira (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14116950#comment-14116950
 ] 

Allan Douglas R. de Oliveira edited comment on SPARK-3332 at 9/1/14 12:49 AM:
--

[~pwendell], yes this is a good reword and I agree to revert it for now. You 
mentioned the two potential solutions, but I think a good compromise is the one 
implemented by the PR, which keeps the flag allowing reuse of the same security 
group while still matching the machines by security group in the 
other cases.

Perhaps more messages could be added when the flag has been used and the 
tagging failed.


was (Author: douglaz):
[~pwendell], yes this is a good reword and I agree to revert it for now. You 
mentioned the two potential solutions, but I think a good compromise is the one 
implemented by the PR, which keeps the flag allowing reuse of the same security 
group while still matching the machines by security group in the 
other cases.

 Tagging is not atomic with launching instances on EC2
 -

 Key: SPARK-3332
 URL: https://issues.apache.org/jira/browse/SPARK-3332
 Project: Spark
  Issue Type: Bug
  Components: EC2
Reporter: Allan Douglas R. de Oliveira

 The implementation for SPARK-2333 changed the machine membership mechanism 
 from security groups to tags.
 This is a fundamentally flawed strategy, as there is no guarantee at all 
 that the machines will have a tag (even with a retry mechanism).
 For instance, if the script is killed after launching the instances but 
 before setting the tags, the machines will be invisible to a "destroy" 
 command, leaving an unmanageable cluster behind.
 The initial proposal is to go back to the previous behavior in all cases 
 except when the new flag (--security-group-prefix) is used.
 Also it's worthwhile to mention that SPARK-3180 introduced the 
 --additional-security-group flag which is a reasonable solution to SPARK-2333 
 (but isn't a full replacement to all use cases of --security-group-prefix).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-3333) Large number of partitions causes OOM

2014-08-31 Thread Josh Rosen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14116969#comment-14116969
 ] 

Josh Rosen commented on SPARK-3333:
---

Looks like the issue was introduced somewhere between 273afcb and 62d4a0f.
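
To illustrate the bisection (a generic sketch, not Spark tooling): with commits 
ordered from the known-good 273afcb to the known-bad 62d4a0f, and assuming the 
slowdown persists once introduced, a binary search pinpoints the first slow 
commit in O(log n) builds. The intermediate commit names and the is_bad 
predicate below are hypothetical stand-ins for "check out the commit, rebuild, 
and time the repro".

```python
def first_bad(commits, is_bad):
    """Binary-search an ordered commit list for the first 'bad' commit.

    Assumes commits[0] is known-good, commits[-1] is known-bad, and badness
    is monotone (every commit after the regression is also bad).
    """
    lo, hi = 0, len(commits) - 1   # invariant: commits[lo] good, commits[hi] bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid
        else:
            lo = mid
    return commits[hi]

# Hypothetical commit range between the two endpoints named above; pretend
# the regression landed in "c3".
commits = ["273afcb", "c1", "c2", "c3", "c4", "62d4a0f"]
bad_from = {"c3", "c4", "62d4a0f"}
print(first_bad(commits, lambda c: c in bad_from))  # c3
```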

 Large number of partitions causes OOM
 -

 Key: SPARK-3333
 URL: https://issues.apache.org/jira/browse/SPARK-3333
 Project: Spark
  Issue Type: Bug
  Components: PySpark
Affects Versions: 1.1.0
 Environment: EC2; 1 master; 1 slave; {{m3.xlarge}} instances
Reporter: Nicholas Chammas

 Here’s a repro for PySpark:
 {code}
 a = sc.parallelize(["Nick", "John", "Bob"])
 a = a.repartition(24000)
 a.keyBy(lambda x: len(x)).reduceByKey(lambda x,y: x + y).take(1)
 {code}
 This code runs fine on 1.0.2. It returns the following result in just over a 
 minute:
 {code}
 [(4, 'NickJohn')]
 {code}
 However, when I try this with 1.1.0-rc3 on an identically-sized cluster, it 
 runs for a very, very long time (at least 45 min) and then fails with 
 {{java.lang.OutOfMemoryError: Java heap space}}.
 Here is a stack trace taken from a run on 1.1.0-rc2:
 {code}
  >>> a = sc.parallelize(["Nick", "John", "Bob"])
  >>> a = a.repartition(24000)
  >>> a.keyBy(lambda x: len(x)).reduceByKey(lambda x,y: x + y).take(1)
 14/08/29 21:53:40 WARN BlockManagerMasterActor: Removing BlockManager 
 BlockManagerId(0, ip-10-138-29-167.ec2.internal, 46252, 0) with no recent 
 heart beats: 175143ms exceeds 45000ms
 14/08/29 21:53:50 WARN BlockManagerMasterActor: Removing BlockManager 
 BlockManagerId(10, ip-10-138-18-106.ec2.internal, 33711, 0) with no recent 
 heart beats: 175359ms exceeds 45000ms
 14/08/29 21:54:02 WARN BlockManagerMasterActor: Removing BlockManager 
 BlockManagerId(19, ip-10-139-36-207.ec2.internal, 52208, 0) with no recent 
 heart beats: 173061ms exceeds 45000ms
 14/08/29 21:54:13 WARN BlockManagerMasterActor: Removing BlockManager 
 BlockManagerId(5, ip-10-73-142-70.ec2.internal, 56162, 0) with no recent 
 heart beats: 176816ms exceeds 45000ms
 14/08/29 21:54:22 WARN BlockManagerMasterActor: Removing BlockManager 
 BlockManagerId(7, ip-10-236-145-200.ec2.internal, 40959, 0) with no recent 
 heart beats: 182241ms exceeds 45000ms
 14/08/29 21:54:40 WARN BlockManagerMasterActor: Removing BlockManager 
 BlockManagerId(4, ip-10-139-1-195.ec2.internal, 49221, 0) with no recent 
 heart beats: 178406ms exceeds 45000ms
 14/08/29 21:54:41 ERROR Utils: Uncaught exception in thread Result resolver 
 thread-3
 java.lang.OutOfMemoryError: Java heap space
 at com.esotericsoftware.kryo.io.Input.readBytes(Input.java:296)
 at 
 com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ByteArraySerializer.read(DefaultArraySerializers.java:35)
 at 
 com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ByteArraySerializer.read(DefaultArraySerializers.java:18)
 at com.esotericsoftware.kryo.Kryo.readObjectOrNull(Kryo.java:699)
 at 
 com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.read(FieldSerializer.java:611)
 at 
 com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:221)
 at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:729)
 at 
 org.apache.spark.serializer.KryoSerializerInstance.deserialize(KryoSerializer.scala:162)
 at org.apache.spark.scheduler.DirectTaskResult.value(TaskResult.scala:79)
 at 
 org.apache.spark.scheduler.TaskSetManager.handleSuccessfulTask(TaskSetManager.scala:514)
 at 
 org.apache.spark.scheduler.TaskSchedulerImpl.handleSuccessfulTask(TaskSchedulerImpl.scala:355)
 at 
 org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply$mcV$sp(TaskResultGetter.scala:68)
 at 
 org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply(TaskResultGetter.scala:47)
 at 
 org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply(TaskResultGetter.scala:47)
 at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1311)
 at 
 org.apache.spark.scheduler.TaskResultGetter$$anon$2.run(TaskResultGetter.scala:46)
 at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 at java.lang.Thread.run(Thread.java:745)
 Exception in thread "Result resolver thread-3" 14/08/29 21:56:26 ERROR 
 SendingConnection: Exception while reading SendingConnection to 
 ConnectionManagerId(ip-10-73-142-223.ec2.internal,54014)
 java.nio.channels.ClosedChannelException
 at sun.nio.ch.SocketChannelImpl.ensureReadOpen(SocketChannelImpl.java:252)
 at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:295)
 at org.apache.spark.network.SendingConnection.read(Connection.scala:390)
 at 
 

[jira] [Commented] (SPARK-3168) The ServletContextHandler of webui lacks a SessionManager

2014-08-31 Thread meiyoula (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14116974#comment-14116974
 ] 

meiyoula commented on SPARK-3168:
-

To use CAS for single sign-on, I added some web UI filters to the 
configuration. For the History Server, I set the following (quoting and line 
continuations reconstructed; JIRA appears to have stripped them):
{code}
export SPARK_HISTORY_OPTS="$SPARK_HISTORY_OPTS \
  -Dspark.ui.filters=org.jasig.cas.client.authentication.Saml11AuthenticationFilter,org.jasig.cas.client.validation.Saml11TicketValidationFilter,org.jasig.cas.client.util.HttpServletRequestWrapperFilter \
  -Dspark.org.jasig.cas.client.authentication.Saml11AuthenticationFilter.params=casServerLoginUrl=https://9.91.11.120:8443/cas/login,serverName=http://9.91.11.171:18080 \
  -Dspark.org.jasig.cas.client.validation.Saml11TicketValidationFilter.params=casServerUrlPrefix=https://9.91.11.120:8443/cas/,serverName=http://9.91.11.171:18080,hostnameVerifier=org.jasig.cas.client.ssl.AnyHostnameVerifier"
{code}

 The ServletContextHandler of webui lacks a SessionManager
 -

 Key: SPARK-3168
 URL: https://issues.apache.org/jira/browse/SPARK-3168
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
 Environment: CAS
Reporter: meiyoula

 When I use CAS to implement single sign-on for the web UI, an exception occurs:
 {code}
 WARN  [qtp1076146544-24] / 
 org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:561)
 java.lang.IllegalStateException: No SessionManager
 at org.eclipse.jetty.server.Request.getSession(Request.java:1269)
 at org.eclipse.jetty.server.Request.getSession(Request.java:1248)
 at 
 org.jasig.cas.client.validation.AbstractTicketValidationFilter.doFilter(AbstractTicketValidationFilter.java:178)
 at 
 org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1467)
 at 
 org.jasig.cas.client.authentication.AuthenticationFilter.doFilter(AuthenticationFilter.java:116)
 at 
 org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1467)
 at 
 org.jasig.cas.client.session.SingleSignOutFilter.doFilter(SingleSignOutFilter.java:76)
 at 
 org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1467)
 at 
 org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:499)
 at 
 org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1086)
 at 
 org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:428)
 at 
 org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1020)
 at 
 org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
 at 
 org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
 at 
 org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
 at org.eclipse.jetty.server.Server.handle(Server.java:370)
 at 
 org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:494)
 at 
 org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:971)
 at 
 org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1033)
 at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:644)
 at 
 org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
 at 
 org.eclipse.jetty.server.AsyncHttpConnection.handle(AsyncHttpConnection.java:82)
 at 
 org.eclipse.jetty.io.nio.SelectChannelEndPoint.handle(SelectChannelEndPoint.java:667)
 at 
 org.eclipse.jetty.io.nio.SelectChannelEndPoint$1.run(SelectChannelEndPoint.java:52)
 at 
 org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
 at 
 org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
 at java.lang.Thread.run(Thread.java:744)
 {code}






[jira] [Commented] (SPARK-3333) Large number of partitions causes OOM

2014-08-31 Thread Nicholas Chammas (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14116978#comment-14116978
 ] 

Nicholas Chammas commented on SPARK-3333:
-

For the record, I got the OOM in a relatively short amount of time (less than 
30 min) on a 1.1.0-rc2 EC2 cluster with 20 {{m1.xlarge}} slaves. Perhaps one of 
y'all can replicate the OOM with that kind of environment.
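Separately from the memory question, the expected answer of the repro can be checked without Spark at all. This plain-Python sketch mirrors keyBy(len) followed by reduceByKey(string concatenation) over the three names (within a key, values combine in list order, matching the reported result):

```python
names = ["Nick", "John", "Bob"]

# keyBy(lambda x: len(x)): pair each name with its length
pairs = [(len(x), x) for x in names]

# reduceByKey(lambda x, y: x + y): concatenate values per key
reduced = {}
for k, v in pairs:
    reduced[k] = reduced.get(k, "") + v

print(sorted(reduced.items()))  # -> [(3, 'Bob'), (4, 'NickJohn')]
```

The (4, 'NickJohn') pair is exactly what take(1) returned on 1.0.2, so the regression is in execution cost, not in the answer.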

[jira] [Comment Edited] (SPARK-3333) Large number of partitions causes OOM

2014-08-31 Thread Nicholas Chammas (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14116978#comment-14116978
 ] 

Nicholas Chammas edited comment on SPARK-3333 at 9/1/14 2:09 AM:
-

For the record, I got the OOM in [my original 
report|http://apache-spark-user-list.1001560.n3.nabble.com/PySpark-large-of-partitions-causes-OOM-td13155.html]
 (which I duplicated here in this JIRA) in a relatively short amount of time 
(less than 30 min) on a 1.1.0-rc2 EC2 cluster with 20 {{m1.xlarge}} slaves. 
Perhaps one of y'all can replicate the OOM with that kind of environment.


was (Author: nchammas):
For the record, I got the OOM in a relatively short amount of time (less than 
30 min) on a 1.1.0-rc2 EC2 cluster with 20 {{m1.xlarge}} slaves. Perhaps one of 
y'all can replicate the OOM with that kind of environment.


[jira] [Commented] (SPARK-3333) Large number of partitions causes OOM

2014-08-31 Thread Josh Rosen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14116985#comment-14116985
 ] 

Josh Rosen commented on SPARK-3333:
---

I'll resume work on this later tonight, but just wanted to note that things run 
fast as recently as commit 5ad5e34 and slow down as long ago as 6587ef7.


[jira] [Updated] (SPARK-3334) Spark causes mesos-master memory leak

2014-08-31 Thread Iven Hsu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Iven Hsu updated SPARK-3334:

Description: 
The {{akkaFrameSize}} is set to {{Long.MaxValue}} in MesosBackend to workaround 
SPARK-1112, this causes all serialized task result is sent using Mesos 
TaskStatus.

mesos-master stores TaskStatus in memory, and when running Spark, its memory 
grows very fast, and will be OOM killed.

See MESOS-1746 for more.

I've tryed to set {{akkaFrameSize}} to 0, mesos-master won't be killed, 
however, the driver will block after success unless I use {{sc.stop()}} to quit 
it manually. Not sure if it's related to SPARK-1112.

  was:
The {{akkaFrameSize}} is set to {{Long.MaxValue}} in MesosBackend to workaround 
SPARK-1112, this causes all serialized task result is sent using Mesos 
TaskStatus.

mesos-master stores TaskStatus in memory, and when running Spark, it's memory 
grows very fast, and will be OOM killed.

See MESOS-1746 for more.

I've tryed to set {{akkaFrameSize}} to 0, mesos-master won't be killed, 
however, the driver will block after success unless I use {{sc.stop()}} to quit 
it manually. Not sure if it's related to SPARK-1112.


 Spark causes mesos-master memory leak
 -

 Key: SPARK-3334
 URL: https://issues.apache.org/jira/browse/SPARK-3334
 Project: Spark
  Issue Type: Bug
  Components: Mesos
Affects Versions: 1.0.2
 Environment: Mesos 0.16.0/0.19.0
 CentOS 6.4
Reporter: Iven Hsu

 The {{akkaFrameSize}} is set to {{Long.MaxValue}} in MesosBackend to 
 workaround SPARK-1112, this causes all serialized task result is sent using 
 Mesos TaskStatus.
 mesos-master stores TaskStatus in memory, and when running Spark, its memory 
 grows very fast, and will be OOM killed.
 See MESOS-1746 for more.
 I've tryed to set {{akkaFrameSize}} to 0, mesos-master won't be killed, 
 however, the driver will block after success unless I use {{sc.stop()}} to 
 quit it manually. Not sure if it's related to SPARK-1112.






[jira] [Created] (SPARK-3334) Spark causes mesos-master memory leak

2014-08-31 Thread Iven Hsu (JIRA)
Iven Hsu created SPARK-3334:
---

 Summary: Spark causes mesos-master memory leak
 Key: SPARK-3334
 URL: https://issues.apache.org/jira/browse/SPARK-3334
 Project: Spark
  Issue Type: Bug
  Components: Mesos
Affects Versions: 1.0.2
 Environment: Mesos 0.16.0/0.19.0
CentOS 6.4
Reporter: Iven Hsu


The {{akkaFrameSize}} is set to {{Long.MaxValue}} in MesosBackend to work 
around SPARK-1112, which causes all serialized task results to be sent using 
Mesos TaskStatus.

mesos-master stores TaskStatus in memory, and when running Spark, its memory 
grows very fast, and it will be OOM killed.

See MESOS-1746 for more.

I've tried to set {{akkaFrameSize}} to 0; mesos-master won't be killed, but 
the driver will block after success unless I use {{sc.stop()}} to quit 
it manually. Not sure if it's related to SPARK-1112.






[jira] [Updated] (SPARK-3334) Spark causes mesos-master memory leak

2014-08-31 Thread Iven Hsu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Iven Hsu updated SPARK-3334:

Description: 
The {{akkaFrameSize}} is set to {{Long.MaxValue}} in MesosBackend to workaround 
SPARK-1112, this causes all serialized task result is sent using Mesos 
TaskStatus.

mesos-master stores TaskStatus in memory, and when running Spark, its memory 
grows very fast, and will be OOM killed.

See MESOS-1746 for more.

I've tried to set {{akkaFrameSize}} to 0, mesos-master won't be killed, 
however, the driver will block after success unless I use {{sc.stop()}} to quit 
it manually. Not sure if it's related to SPARK-1112.

  was:
The {{akkaFrameSize}} is set to {{Long.MaxValue}} in MesosBackend to workaround 
SPARK-1112, this causes all serialized task result is sent using Mesos 
TaskStatus.

mesos-master stores TaskStatus in memory, and when running Spark, its memory 
grows very fast, and will be OOM killed.

See MESOS-1746 for more.

I've tryed to set {{akkaFrameSize}} to 0, mesos-master won't be killed, 
however, the driver will block after success unless I use {{sc.stop()}} to quit 
it manually. Not sure if it's related to SPARK-1112.








[jira] [Assigned] (SPARK-3324) YARN module has nonstandard structure which cause compile error In IntelliJ

2014-08-31 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell reassigned SPARK-3324:
--

Assignee: Patrick Wendell

 YARN module has nonstandard structure which cause compile error In IntelliJ
 ---

 Key: SPARK-3324
 URL: https://issues.apache.org/jira/browse/SPARK-3324
 Project: Spark
  Issue Type: Bug
  Components: YARN
Affects Versions: 1.1.0
 Environment: Mac OS: 10.9.4
 IntelliJ IDEA: 13.1.4
 Scala Plugins: 0.41.2
 Maven: 3.0.5
Reporter: Yi Tian
Assignee: Patrick Wendell
Priority: Minor
  Labels: intellij, maven, yarn

 The YARN module has nonstandard path structure like:
 {code}
 ${SPARK_HOME}
   |--yarn
  |--alpha (contains yarn api support for 0.23 and 2.0.x)
  |--stable (contains yarn api support for 2.2 and later)
  | |--pom.xml (spark-yarn)
  |--common (Common codes not depending on specific version of Hadoop)
  |--pom.xml (yarn-parent)
 {code}
 When we use Maven to compile the YARN module, Maven imports the 'alpha' or 
 'stable' module according to the profile setting.
 A submodule like 'stable' uses the build properties defined in 
 yarn/pom.xml to add the common code to its source path.
 As a result, IntelliJ cannot directly recognize the sources in the common 
 directory as a source path.
 I think we should change the YARN module into a unified Maven jar project 
 and select the different YARN API versions via Maven profile settings.
 That would resolve the compile error in IntelliJ and make the YARN module 
 simpler and clearer.






[jira] [Commented] (SPARK-3333) Large number of partitions causes OOM

2014-08-31 Thread Josh Rosen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14117016#comment-14117016
 ] 

Josh Rosen commented on SPARK-3333:
---

It looks like https://github.com/apache/spark/pull/1138 may be the culprit, 
since this job runs quickly immediately prior to that commit.

 Large number of partitions causes OOM
 -

 Key: SPARK-
 URL: https://issues.apache.org/jira/browse/SPARK-
 Project: Spark
  Issue Type: Bug
  Components: PySpark
Affects Versions: 1.1.0
 Environment: EC2; 1 master; 1 slave; {{m3.xlarge}} instances
Reporter: Nicholas Chammas

 Here’s a repro for PySpark:
 {code}
 a = sc.parallelize([Nick, John, Bob])
 a = a.repartition(24000)
 a.keyBy(lambda x: len(x)).reduceByKey(lambda x,y: x + y).take(1)
 {code}
 This code runs fine on 1.0.2. It returns the following result in just over a 
 minute:
 {code}
 [(4, 'NickJohn')]
 {code}
 However, when I try this with 1.1.0-rc3 on an identically-sized cluster, it 
 runs for a very, very long time (at least  45 min) and then fails with 
 {{java.lang.OutOfMemoryError: Java heap space}}.
 Here is a stack trace taken from a run on 1.1.0-rc2:
 {code}
  a = sc.parallelize([Nick, John, Bob])
  a = a.repartition(24000)
  a.keyBy(lambda x: len(x)).reduceByKey(lambda x,y: x + y).take(1)
 14/08/29 21:53:40 WARN BlockManagerMasterActor: Removing BlockManager 
 BlockManagerId(0, ip-10-138-29-167.ec2.internal, 46252, 0) with no recent 
 heart beats: 175143ms exceeds 45000ms
 14/08/29 21:53:50 WARN BlockManagerMasterActor: Removing BlockManager 
 BlockManagerId(10, ip-10-138-18-106.ec2.internal, 33711, 0) with no recent 
 heart beats: 175359ms exceeds 45000ms
 14/08/29 21:54:02 WARN BlockManagerMasterActor: Removing BlockManager 
 BlockManagerId(19, ip-10-139-36-207.ec2.internal, 52208, 0) with no recent 
 heart beats: 173061ms exceeds 45000ms
 14/08/29 21:54:13 WARN BlockManagerMasterActor: Removing BlockManager 
 BlockManagerId(5, ip-10-73-142-70.ec2.internal, 56162, 0) with no recent 
 heart beats: 176816ms exceeds 45000ms
 14/08/29 21:54:22 WARN BlockManagerMasterActor: Removing BlockManager 
 BlockManagerId(7, ip-10-236-145-200.ec2.internal, 40959, 0) with no recent 
 heart beats: 182241ms exceeds 45000ms
 14/08/29 21:54:40 WARN BlockManagerMasterActor: Removing BlockManager 
 BlockManagerId(4, ip-10-139-1-195.ec2.internal, 49221, 0) with no recent 
 heart beats: 178406ms exceeds 45000ms
 14/08/29 21:54:41 ERROR Utils: Uncaught exception in thread Result resolver 
 thread-3
 java.lang.OutOfMemoryError: Java heap space
 at com.esotericsoftware.kryo.io.Input.readBytes(Input.java:296)
 at 
 com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ByteArraySerializer.read(DefaultArraySerializers.java:35)
 at 
 com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ByteArraySerializer.read(DefaultArraySerializers.java:18)
 at com.esotericsoftware.kryo.Kryo.readObjectOrNull(Kryo.java:699)
 at 
 com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.read(FieldSerializer.java:611)
 at 
 com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:221)
 at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:729)
 at 
 org.apache.spark.serializer.KryoSerializerInstance.deserialize(KryoSerializer.scala:162)
 at org.apache.spark.scheduler.DirectTaskResult.value(TaskResult.scala:79)
 at 
 org.apache.spark.scheduler.TaskSetManager.handleSuccessfulTask(TaskSetManager.scala:514)
 at 
 org.apache.spark.scheduler.TaskSchedulerImpl.handleSuccessfulTask(TaskSchedulerImpl.scala:355)
 at 
 org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply$mcV$sp(TaskResultGetter.scala:68)
 at 
 org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply(TaskResultGetter.scala:47)
 at 
 org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply(TaskResultGetter.scala:47)
 at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1311)
 at 
 org.apache.spark.scheduler.TaskResultGetter$$anon$2.run(TaskResultGetter.scala:46)
 at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 at java.lang.Thread.run(Thread.java:745)
 Exception in thread "Result resolver thread-3" 14/08/29 21:56:26 ERROR 
 SendingConnection: Exception while reading SendingConnection to 
 ConnectionManagerId(ip-10-73-142-223.ec2.internal,54014)
 java.nio.channels.ClosedChannelException
 at sun.nio.ch.SocketChannelImpl.ensureReadOpen(SocketChannelImpl.java:252)
 at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:295)
 at org.apache.spark.network.SendingConnection.read(Connection.scala:390)

[jira] [Commented] (SPARK-3333) Large number of partitions causes OOM

2014-08-31 Thread Davies Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14117019#comment-14117019
 ] 

Davies Liu commented on SPARK-3333:
---

[~joshrosen] This should not be the culprit; it just makes the bad behavior show 
up in PySpark. Before that change, the default number of partitions for 
reduceByKey() could be much smaller, such as 4.

The root cause is likely inside the Scala core, so you should reproduce it with 
the Scala API.
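A hedged back-of-envelope sketch of why the partition count matters here (pure Python, no Spark needed; `shuffle_blocks` is an illustrative helper, not a Spark API): a shuffle tracks roughly one block status per (map partition, reduce partition) pair, so the bookkeeping grows quadratically with the partition count that reduceByKey() inherits from repartition(24000).

```python
# Illustrative helper (not a Spark API): approximate number of shuffle
# block statuses tracked for a shuffle with M map and R reduce partitions.
def shuffle_blocks(map_parts, reduce_parts):
    return map_parts * reduce_parts

# A small default such as 4 partitions vs. inheriting 24000 from repartition():
small = shuffle_blocks(4, 4)            # 16 metadata entries
huge = shuffle_blocks(24000, 24000)     # 576,000,000 metadata entries
```

Under this rough assumption, going from a handful of partitions to 24000 multiplies the shuffle bookkeeping by many orders of magnitude, which would be consistent with the driver-side OOM in the stack trace above.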

 Large number of partitions causes OOM
 -

 Key: SPARK-3333
 URL: https://issues.apache.org/jira/browse/SPARK-3333
 Project: Spark
  Issue Type: Bug
  Components: PySpark
Affects Versions: 1.1.0
 Environment: EC2; 1 master; 1 slave; {{m3.xlarge}} instances
Reporter: Nicholas Chammas


[jira] [Commented] (SPARK-3333) Large number of partitions causes OOM

2014-08-31 Thread Josh Rosen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14117021#comment-14117021
 ] 

Josh Rosen commented on SPARK-3333:
---

Good point. I guess "culprit" was the wrong word, but that commit helps to 
narrow down the problem.

 Large number of partitions causes OOM
 -

 Key: SPARK-3333
 URL: https://issues.apache.org/jira/browse/SPARK-3333
 Project: Spark
  Issue Type: Bug
  Components: PySpark
Affects Versions: 1.1.0
 Environment: EC2; 1 master; 1 slave; {{m3.xlarge}} instances
Reporter: Nicholas Chammas


[jira] [Resolved] (SPARK-2536) Update the MLlib page of Spark website

2014-08-31 Thread Xiangrui Meng (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-2536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiangrui Meng resolved SPARK-2536.
--
Resolution: Done

 Update the MLlib page of Spark website
 --

 Key: SPARK-2536
 URL: https://issues.apache.org/jira/browse/SPARK-2536
 Project: Spark
  Issue Type: Sub-task
  Components: Documentation, MLlib
Reporter: Xiangrui Meng
Assignee: Xiangrui Meng

 It still shows v0.9.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-3205) input format for text records saved with in-record delimiter and newline characters escaped

2014-08-31 Thread Xiangrui Meng (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiangrui Meng resolved SPARK-3205.
--
Resolution: Later

Moved the implementation to https://github.com/mengxr/redshift-input-format. If 
people feel this input format is common, we can move it to Spark Core later.

 input format for text records saved with in-record delimiter and newline 
 characters escaped
 ---

 Key: SPARK-3205
 URL: https://issues.apache.org/jira/browse/SPARK-3205
 Project: Spark
  Issue Type: New Feature
  Components: Spark Core, SQL
Reporter: Xiangrui Meng
Assignee: Xiangrui Meng

 Text records may contain in-record delimiter or newline characters. In such 
 cases, we can either encode them or escape them. The latter is simpler and 
 used by Redshift's UNLOAD with the ESCAPE option. The problem is that a 
 record will span multiple lines. We need an input format for it.
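A minimal sketch of what such an input format has to handle, under the assumption that a backslash escapes the delimiter, the newline, and itself (as with Redshift's UNLOAD with the ESCAPE option); plain Python for illustration, not the proposed Spark API:

```python
# Split text into records on unescaped newlines; an escaped newline stays
# inside the current record, so one logical record can span several lines.
def split_escaped_records(text, delim='\n', esc='\\'):
    records, current, i = [], [], 0
    while i < len(text):
        c = text[i]
        if c == esc and i + 1 < len(text):
            current.append(text[i + 1])   # keep the escaped char literally
            i += 2
        elif c == delim:
            records.append(''.join(current))
            current = []
            i += 1
        else:
            current.append(c)
            i += 1
    if current:
        records.append(''.join(current))
    return records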
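placeholder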



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-3090) Avoid not stopping SparkContext with YARN Client mode

2014-08-31 Thread Kousuke Saruta (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14117036#comment-14117036
 ] 

Kousuke Saruta commented on SPARK-3090:
---

It sound good idea that SparkContext register shutdown-hook itself.

  Avoid not stopping SparkContext with YARN Client mode
 --

 Key: SPARK-3090
 URL: https://issues.apache.org/jira/browse/SPARK-3090
 Project: Spark
  Issue Type: Bug
  Components: Spark Core, YARN
Affects Versions: 1.1.0
Reporter: Kousuke Saruta

 When we use YARN Cluster mode, ApplicationMaser register a shutdown hook, 
 stopping SparkContext.
 Thanks to this, SparkContext can stop even if Application forgets to stop 
 SparkContext itself.
 But, unfortunately, YARN Client mode doesn't have such mechanism.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org