[GitHub] spark pull request: [SPARK-5205][Streaming]:Inconsistent behaviour...

2015-03-08 Thread JoshRosen
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/4135#issuecomment-4104 Jenkins, retest this please. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not

[GitHub] spark pull request: [SPARK-5205][Streaming]:Inconsistent behaviour...

2015-03-08 Thread JoshRosen
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/4135#issuecomment-4769 Can you try using the existing `addTaskCompletionListener` with `context.isInterrupted()`, roll back the other changes / listener interfaces, and add a streaming unit

[GitHub] spark pull request: [SPARK-5134] Bump default hadoop.version to 2....

2015-03-08 Thread pwendell
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/3917#issuecomment-5142 Hey, I commented on the JIRA, but some recent changes in the way we publish artifacts actually make this a more tenable change.

[GitHub] spark pull request: [GraphX] Improve LiveJournalPageRank example

2015-03-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4917#issuecomment-77769395 Test PASSed. Refer to this link for build results (access rights to CI server needed):

[GitHub] spark pull request: [SPARK-5205][Streaming]:Inconsistent behaviour...

2015-03-08 Thread JoshRosen
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/4135#issuecomment-4567 Actually, this just occurred to me: why not use `addTaskCompletionListener` and check `TaskContext.isInterrupted()` inside of your listener in order to decide whether
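
The suggestion above (register a completion listener and check the interruption flag inside it) can be illustrated with a toy model. This is a minimal sketch, not Spark's API: `ToyTaskContext`, `run_task`, and the two event strings are hypothetical names invented for illustration, and what the real listener should decide is elided in the truncated comment.

```python
class ToyTaskContext:
    """Hypothetical stand-in for a task context; NOT Spark's real TaskContext."""

    def __init__(self):
        self._interrupted = False
        self._listeners = []

    def add_task_completion_listener(self, listener):
        # Listeners are called once, when the task finishes for any reason.
        self._listeners.append(listener)

    def is_interrupted(self):
        return self._interrupted

    def mark_interrupted(self):
        self._interrupted = True

    def mark_task_completed(self):
        for listener in self._listeners:
            listener(self)


def run_task(context, interrupt=False):
    """Run a toy task and return the events recorded by its completion listener."""
    events = []

    def on_completion(ctx):
        # The listener itself inspects the interruption flag, so no
        # separate "task interrupted" listener interface is needed.
        if ctx.is_interrupted():
            events.append("cleanup after interruption")  # illustrative label
        else:
            events.append("normal completion")  # illustrative label

    context.add_task_completion_listener(on_completion)
    try:
        if interrupt:
            context.mark_interrupted()
    finally:
        # Completion listeners fire whether or not the task was interrupted.
        context.mark_task_completed()
    return events
```

Calling `run_task(ToyTaskContext(), interrupt=True)` exercises the interrupted branch; the default call exercises the normal one.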

[GitHub] spark pull request: [GraphX] Improve LiveJournalPageRank example

2015-03-08 Thread jackylk
Github user jackylk commented on a diff in the pull request: https://github.com/apache/spark/pull/4917#discussion_r26007391 --- Diff: examples/src/main/scala/org/apache/spark/examples/graphx/LiveJournalPageRank.scala --- @@ -30,14 +25,14 @@ object LiveJournalPageRank { def

[GitHub] spark pull request: [GraphX] Improve LiveJournalPageRank example

2015-03-08 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4917#issuecomment-77769390 [Test build #28373 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28373/consoleFull) for PR 4917 at commit

[GitHub] spark pull request: [SPARK-5205][Streaming]:Inconsistent behaviour...

2015-03-08 Thread JoshRosen
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/4135#issuecomment-4362 I think we discussed this on the first PR, but it would be good to add an explicit note here summarizing why we're not using `TaskCompletionListener`, since its

[GitHub] spark pull request: [SPARK-5205][Streaming]:Inconsistent behaviour...

2015-03-08 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4135#issuecomment-4344 [Test build #28374 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28374/consoleFull) for PR 4135 at commit

[GitHub] spark pull request: [GraphX] Improve LiveJournalPageRank example

2015-03-08 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4917#issuecomment-77765206 [Test build #28373 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28373/consoleFull) for PR 4917 at commit

[GitHub] spark pull request: [SPARK-6215][SQL] Shorten apply and update fun...

2015-03-08 Thread viirya
GitHub user viirya opened a pull request: https://github.com/apache/spark/pull/4940 [SPARK-6215][SQL] Shorten apply and update funcs in GenerateProjection Some code in `GenerateProjection` looks redundant and can be shortened. You can merge this pull request into a Git repository

[GitHub] spark pull request: [SPARK-6215][SQL] Shorten apply and update fun...

2015-03-08 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4940#issuecomment-77741269 [Test build #28371 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28371/consoleFull) for PR 4940 at commit

[GitHub] spark pull request: [EC2] [SPARK-6188] Instance types can be misla...

2015-03-08 Thread shivaram
Github user shivaram commented on the pull request: https://github.com/apache/spark/pull/4916#issuecomment-77781120 @srowen I'm fine with the line either way and it shouldn't really matter. As I described above the problem of master_instance_type being different from slaves is a

[GitHub] spark pull request: [SPARK-6194] [PySpark] fix memory leak in coll...

2015-03-08 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/4923#discussion_r26011565 --- Diff: core/src/main/scala/org/apache/spark/api/python/PythonRDD.scala --- @@ -575,15 +583,32 @@ private[spark] object PythonRDD extends Logging {

[GitHub] spark pull request: [SPARK-6209] Clean up connections in ExecutorC...

2015-03-08 Thread JoshRosen
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/4935#issuecomment-77786580 I'm going to close this PR and re-open against master, since it looks like the conflicts shouldn't be too bad.

[GitHub] spark pull request: [SPARK-6051][Streaming] Add ZooKeeper offest p...

2015-03-08 Thread jerryshao
Github user jerryshao commented on a diff in the pull request: https://github.com/apache/spark/pull/4805#discussion_r26011962 --- Diff: external/kafka/src/main/scala/org/apache/spark/streaming/kafka/SparkKafkaUtils.scala --- @@ -0,0 +1,56 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: [SPARK-6194] [PySpark] fix memory leak in coll...

2015-03-08 Thread JoshRosen
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/4923#issuecomment-77785375 Now that this has been updated to collect results via a socket, it looks like we may finally be able to close https://issues.apache.org/jira/browse/SPARK-677, one of

[GitHub] spark pull request: [SPARK-5682] Reuse hadoop encrypted shuffle al...

2015-03-08 Thread kellyzly
Github user kellyzly commented on the pull request: https://github.com/apache/spark/pull/4491#issuecomment-77786363 @srowen, @tgravescs, @vanzin: Encrypted shuffle can make the shuffle process safer. I think it is necessary in Spark. The previous design reuses Hadoop encrypted

[GitHub] spark pull request: [SPARK-5682] Reuse hadoop encrypted shuffle al...

2015-03-08 Thread kellyzly
Github user kellyzly commented on the pull request: https://github.com/apache/spark/pull/4491#issuecomment-77786367 @srowen, @tgravescs, @vanzin: Encrypted shuffle can make the shuffle process safer. I think it is necessary in Spark. The previous design reuses Hadoop encrypted

[GitHub] spark pull request: [SPARK-5682] Reuse hadoop encrypted shuffle al...

2015-03-08 Thread kellyzly
Github user kellyzly commented on the pull request: https://github.com/apache/spark/pull/4491#issuecomment-77786385 @srowen, @tgravescs, @vanzin: Encrypted shuffle can make the shuffle process safer. I think it is necessary in Spark. The previous design reuses Hadoop encrypted

[GitHub] spark pull request: [SPARK-5682] Reuse hadoop encrypted shuffle al...

2015-03-08 Thread kellyzly
Github user kellyzly commented on the pull request: https://github.com/apache/spark/pull/4491#issuecomment-77786348 @srowen, @tgravescs, @vanzin: Encrypted shuffle can make the shuffle process safer. I think it is necessary in Spark. The previous design reuses Hadoop encrypted

[GitHub] spark pull request: [SPARK-6051][Streaming] Add ZooKeeper offest p...

2015-03-08 Thread jerryshao
Github user jerryshao commented on a diff in the pull request: https://github.com/apache/spark/pull/4805#discussion_r26011997 --- Diff: external/kafka/src/main/scala/org/apache/spark/streaming/kafka/DirectKafkaInputDStream.scala --- @@ -158,4 +166,37 @@ class

[GitHub] spark pull request: Branch 1.3

2015-03-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4942#issuecomment-77790044 Can one of the admins verify this patch?

[GitHub] spark pull request: [SPARK-6215][SQL] Shorten apply and update fun...

2015-03-08 Thread chenghao-intel
Github user chenghao-intel commented on a diff in the pull request: https://github.com/apache/spark/pull/4940#discussion_r26011394 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/GenerateProjection.scala --- @@ -84,9 +84,15 @@ object

[GitHub] spark pull request: [SPARK-5682] Reuse hadoop encrypted shuffle al...

2015-03-08 Thread kellyzly
Github user kellyzly commented on the pull request: https://github.com/apache/spark/pull/4491#issuecomment-77786280 @srowen, @tgravescs, @vanzin: Encrypted shuffle can make the shuffle process safer. I think it is necessary in Spark. The previous design reuses Hadoop encrypted

[GitHub] spark pull request: [SPARK-6194] [PySpark] fix memory leak in coll...

2015-03-08 Thread JoshRosen
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/4923#issuecomment-77786229 This looks really good to me overall. It would be great if you could update the PR description to reflect the most recent changes (collecting via a socket instead of

[GitHub] spark pull request: [SPARK-6051][Streaming] Add ZooKeeper offest p...

2015-03-08 Thread jerryshao
Github user jerryshao commented on the pull request: https://github.com/apache/spark/pull/4805#issuecomment-77787450 Hi @koeninger, thanks a lot for your review. I will fix all the comments you raised. The reason why I put updating ZK in `StreamingListener` rather

[GitHub] spark pull request: [SPARK-6051][Streaming] Add ZooKeeper offest p...

2015-03-08 Thread jerryshao
Github user jerryshao commented on a diff in the pull request: https://github.com/apache/spark/pull/4805#discussion_r26012257 --- Diff: external/kafka/src/main/scala/org/apache/spark/streaming/kafka/DirectKafkaInputDStream.scala --- @@ -158,4 +166,37 @@ class

[GitHub] spark pull request: [SPARK-6186] [EC2] Make Tachyon version config...

2015-03-08 Thread uronce-cc
Github user uronce-cc commented on a diff in the pull request: https://github.com/apache/spark/pull/4901#discussion_r26012420 --- Diff: ec2/spark_ec2.py --- @@ -872,9 +890,16 @@ def deploy_files(conn, root_dir, opts, master_nodes, slave_nodes, modules): if "." in

[GitHub] spark pull request: [SPARK-6177][MLlib] LDA should check partition...

2015-03-08 Thread hhbyyh
Github user hhbyyh commented on the pull request: https://github.com/apache/spark/pull/4899#issuecomment-77789383 Thanks a lot for providing the feedback. Move it to comments as suggested.

[GitHub] spark pull request: Branch 1.3

2015-03-08 Thread yejiming
GitHub user yejiming opened a pull request: https://github.com/apache/spark/pull/4942 Branch 1.3 You can merge this pull request into a Git repository by running: $ git pull https://github.com/apache/spark branch-1.3 Alternatively you can review and apply these changes as

[GitHub] spark pull request: Update RandomForest.scala

2015-03-08 Thread yejiming
GitHub user yejiming opened a pull request: https://github.com/apache/spark/pull/4943 Update RandomForest.scala You can merge this pull request into a Git repository by running: $ git pull https://github.com/yejiming/spark master Alternatively you can review and apply these

[GitHub] spark pull request: Update RandomForest.scala

2015-03-08 Thread yejiming
Github user yejiming closed the pull request at: https://github.com/apache/spark/pull/4943

[GitHub] spark pull request: [SPARK-6157][CORE]Unroll unsuccessful memory_a...

2015-03-08 Thread suyanNone
Github user suyanNone commented on the pull request: https://github.com/apache/spark/pull/4887#issuecomment-77791515 @srowen it is also my fault, for being too lazy to make the description clearer... I will update the description so that it makes more sense.

[GitHub] spark pull request: Branch 1.3

2015-03-08 Thread yejiming
Github user yejiming closed the pull request at: https://github.com/apache/spark/pull/4942

[GitHub] spark pull request: [SPARK-6219] [Build] Check that Python code co...

2015-03-08 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4941#issuecomment-77792236 [Test build #28375 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28375/consoleFull) for PR 4941 at commit

[GitHub] spark pull request: [SPARK-2087] [SQL] Multiple thriftserver sessi...

2015-03-08 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4382#issuecomment-77792262 [Test build #28377 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28377/consoleFull) for PR 4382 at commit

[GitHub] spark pull request: [SPARK-6219] [Build] Check that Python code co...

2015-03-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4941#issuecomment-77792239 Test PASSed. Refer to this link for build results (access rights to CI server needed):

[GitHub] spark pull request: [SPARK-6198][SQL] Support select current_data...

2015-03-08 Thread chenghao-intel
Github user chenghao-intel commented on a diff in the pull request: https://github.com/apache/spark/pull/4926#discussion_r26013109 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/hiveUdfs.scala --- @@ -179,7 +179,12 @@ private[hive] case class

[GitHub] spark pull request: [SPARK-6198][SQL] Support select current_data...

2015-03-08 Thread chenghao-intel
Github user chenghao-intel commented on a diff in the pull request: https://github.com/apache/spark/pull/4926#discussion_r26013144 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/UDFSuite.scala --- @@ -32,5 +32,6 @@ class UDFSuite extends QueryTest {

[GitHub] spark pull request: [SPARK-6051][Streaming] Add ZooKeeper offest p...

2015-03-08 Thread jerryshao
Github user jerryshao commented on a diff in the pull request: https://github.com/apache/spark/pull/4805#discussion_r26013229 --- Diff: external/kafka/src/main/scala/org/apache/spark/streaming/kafka/DirectKafkaInputDStream.scala --- @@ -82,8 +83,12 @@ class

[GitHub] spark pull request: [SPARK-6183][Deploy] Skip bad workers when re-...

2015-03-08 Thread zhpengg
Github user zhpengg commented on a diff in the pull request: https://github.com/apache/spark/pull/4909#discussion_r26013467 --- Diff: core/src/main/scala/org/apache/spark/deploy/master/Master.scala --- @@ -467,7 +467,9 @@ private[spark] class Master( * two executors on the

[GitHub] spark pull request: [SPARK-6215][SQL] Shorten apply and update fun...

2015-03-08 Thread viirya
Github user viirya commented on the pull request: https://github.com/apache/spark/pull/4940#issuecomment-77793875 @chenghao-intel These are the random accessors for the row (`SpecificRow`) objects produced by the projection `GenerateProjection`. So I think they are not resolved

[GitHub] spark pull request: [SPARK-6177][MLlib] LDA should check partition...

2015-03-08 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4899#issuecomment-77794351 [Test build #28376 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28376/consoleFull) for PR 4899 at commit

[GitHub] spark pull request: [SPARK-6177][MLlib] LDA should check partition...

2015-03-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4899#issuecomment-77794358 Test PASSed. Refer to this link for build results (access rights to CI server needed):

[GitHub] spark pull request: [SPARK-6215][SQL] Shorten apply and update fun...

2015-03-08 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/4940#discussion_r26013756 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/GenerateProjection.scala --- @@ -84,9 +84,15 @@ object

[GitHub] spark pull request: [SPARK-5205][Streaming]:Inconsistent behaviour...

2015-03-08 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4135#issuecomment-8576 [Test build #28374 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28374/consoleFull) for PR 4135 at commit

[GitHub] spark pull request: [SPARK-5205][Streaming]:Inconsistent behaviour...

2015-03-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4135#issuecomment-8577 Test PASSed. Refer to this link for build results (access rights to CI server needed):

[GitHub] spark pull request: [SPARK-6194] [PySpark] fix memory leak in coll...

2015-03-08 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/4923#discussion_r26011521 --- Diff: core/src/main/scala/org/apache/spark/api/python/PythonRDD.scala --- @@ -341,7 +342,7 @@ private[spark] object PythonRDD extends Logging {

[GitHub] spark pull request: [SPARK-6177][MLlib] LDA should check partition...

2015-03-08 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4899#issuecomment-77789229 [Test build #28376 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28376/consoleFull) for PR 4899 at commit

[GitHub] spark pull request: [EC2] [SPARK-6188] Instance types can be misla...

2015-03-08 Thread shivaram
Github user shivaram commented on a diff in the pull request: https://github.com/apache/spark/pull/4916#discussion_r26010428 --- Diff: ec2/spark_ec2.py --- @@ -1259,6 +1259,15 @@ def real_main(): cluster_instances=(master_nodes + slave_nodes),

[GitHub] spark pull request: [SPARK-6194] [PySpark] fix memory leak in coll...

2015-03-08 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/4923#discussion_r26011616 --- Diff: core/src/main/scala/org/apache/spark/api/python/PythonRDD.scala --- @@ -575,15 +583,32 @@ private[spark] object PythonRDD extends Logging {

[GitHub] spark pull request: [SPARK-6194] [PySpark] fix memory leak in coll...

2015-03-08 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/4923#discussion_r26011596 --- Diff: core/src/main/scala/org/apache/spark/api/python/PythonRDD.scala --- @@ -575,15 +583,32 @@ private[spark] object PythonRDD extends Logging {

[GitHub] spark pull request: [SPARK-6219] [Build] Check that Python code co...

2015-03-08 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4941#issuecomment-77786854 [Test build #28375 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28375/consoleFull) for PR 4941 at commit

[GitHub] spark pull request: [SPARK-5946][Streaming] Add Python API for dir...

2015-03-08 Thread jerryshao
Github user jerryshao commented on the pull request: https://github.com/apache/spark/pull/4723#issuecomment-77786791 Thanks @tdas for your review, maybe we should figure out a way to test the Kafka Python API at first.

[GitHub] spark pull request: [SPARK-6219] [Build] Check that Python code co...

2015-03-08 Thread nchammas
GitHub user nchammas opened a pull request: https://github.com/apache/spark/pull/4941 [SPARK-6219] [Build] Check that Python code compiles This PR expands the Python lint checks so that they check for obvious compilation errors in our Python code. This PR also bumps up the
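
As a rough illustration of what "checking that Python code compiles" can mean, here is a minimal sketch using only the standard library. It is not the lint script from the PR: the function name `find_compile_errors` and the directory-walking approach are assumptions made for this example.

```python
from pathlib import Path


def find_compile_errors(root):
    """Return (path, line number, message) for each .py file under root that fails to compile."""
    errors = []
    for path in sorted(Path(root).rglob("*.py")):
        source = path.read_text(encoding="utf-8")
        try:
            # The built-in compile() parses the source without executing it
            # and raises SyntaxError on obvious compilation errors.
            compile(source, str(path), "exec")
        except SyntaxError as exc:
            errors.append((str(path), exc.lineno, exc.msg))
    return errors
```

A lint step could fail the build whenever the returned list is non-empty. This check only catches syntax-level problems; it says nothing about names that fail to resolve at run time.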

[GitHub] spark pull request: [SPARK-6209] Clean up connections in ExecutorC...

2015-03-08 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4944#issuecomment-77798728 [Test build #28378 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28378/consoleFull) for PR 4944 at commit

[GitHub] spark pull request: [SPARK-6051][Streaming] Add ZooKeeper offest p...

2015-03-08 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4805#issuecomment-77800424 [Test build #28379 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28379/consoleFull) for PR 4805 at commit

[GitHub] spark pull request: [SPARK-6051][Streaming] Add ZooKeeper offest p...

2015-03-08 Thread jerryshao
Github user jerryshao commented on the pull request: https://github.com/apache/spark/pull/4805#issuecomment-77801344 Hi @koeninger , would you please review this again? Thanks a lot and appreciate your time. Here I still keep using the HashMap for Time - offset relation

[GitHub] spark pull request: [SPARK-6185][SQL] Deltele repeated TOKEN. TOK...

2015-03-08 Thread yhuai
Github user yhuai commented on the pull request: https://github.com/apache/spark/pull/4907#issuecomment-77801338 ok to test

[GitHub] spark pull request: [SPARK-2087] [SQL] Multiple thriftserver sessi...

2015-03-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4382#issuecomment-77798081 Test PASSed. Refer to this link for build results (access rights to CI server needed):

[GitHub] spark pull request: [SPARK-2087] [SQL] Multiple thriftserver sessi...

2015-03-08 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4382#issuecomment-77798078 [Test build #28377 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28377/consoleFull) for PR 4382 at commit

[GitHub] spark pull request: [SPARK-5651][SQL] Add input64 in blacklist and...

2015-03-08 Thread yhuai
Github user yhuai commented on the pull request: https://github.com/apache/spark/pull/4427#issuecomment-77801470 LGTM

[GitHub] spark pull request: [SPARK-6209] Clean up connections in ExecutorC...

2015-03-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4944#issuecomment-77803785 Test PASSed. Refer to this link for build results (access rights to CI server needed):

[GitHub] spark pull request: [SPARK-6209] Clean up connections in ExecutorC...

2015-03-08 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4944#issuecomment-77803782 [Test build #28378 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28378/consoleFull) for PR 4944 at commit

[GitHub] spark pull request: [SPARK-6209] Clean up connections in ExecutorC...

2015-03-08 Thread JoshRosen
Github user JoshRosen closed the pull request at: https://github.com/apache/spark/pull/4935

[GitHub] spark pull request: [SPARK-6209] Clean up connections in ExecutorC...

2015-03-08 Thread JoshRosen
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/4935#issuecomment-77798705 I've opened an updated PR against the master branch at #4944 and added a regression test (which was slightly non-trivial to write).

[GitHub] spark pull request: [SPARK-6209] Clean up connections in ExecutorC...

2015-03-08 Thread JoshRosen
GitHub user JoshRosen opened a pull request: https://github.com/apache/spark/pull/4944 [SPARK-6209] Clean up connections in ExecutorClassLoader after failing to load classes (master branch PR) ExecutorClassLoader does not ensure proper cleanup of network connections that it opens.
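
The general pattern behind this kind of fix (always release a connection, even when the load fails) can be sketched as follows. This is an illustrative toy, not the `ExecutorClassLoader` code: `FakeConnection` and `load_class_bytes` are hypothetical names, and the failure mode is simulated with an `OSError`.

```python
class FakeConnection:
    """Hypothetical connection used only to demonstrate cleanup behaviour."""

    def __init__(self, payload=None):
        self.payload = payload
        self.closed = False

    def read(self):
        if self.payload is None:
            # Simulates a failed class fetch, e.g. the class does not exist.
            raise OSError("class not found")
        return self.payload

    def close(self):
        self.closed = True


def load_class_bytes(open_connection):
    """Fetch bytes through a connection and close it on success or failure."""
    conn = open_connection()
    try:
        return conn.read()
    finally:
        # Runs on both the success path and the exception path,
        # so a failed load does not leak the connection.
        conn.close()
```

The `finally` block is the whole point: without it, the exception raised by a failed read would skip the `close()` call.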

[GitHub] spark pull request: [SPARK-6051][Streaming] Add ZooKeeper offest p...

2015-03-08 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4805#issuecomment-77800707 [Test build #28380 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28380/consoleFull) for PR 4805 at commit

[GitHub] spark pull request: [SPARK-1503][MLLIB] Initial AcceleratedGradien...

2015-03-08 Thread rezazadeh
Github user rezazadeh commented on the pull request: https://github.com/apache/spark/pull/4934#issuecomment-77801646 Thank you for this PR @staple ! @mengxr I suggested to @staple to first implement without backtracking to keep the PR as simple as possible. According to his

[GitHub] spark pull request: [SPARK-6185][SQL] Deltele repeated TOKEN. TOK...

2015-03-08 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4907#issuecomment-77801605 [Test build #28381 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28381/consoleFull) for PR 4907 at commit

[GitHub] spark pull request: [branch-1.0][SPARK-4355] ColumnStatisticsAggre...

2015-03-08 Thread srowen
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/3850#issuecomment-77749877 In order to close this out, is it worth just merging this into 1.0 as @JoshRosen suggests? Same for some of these other back-ports to 0.9 or 1.0.

[GitHub] spark pull request: [GraphX] Improve LiveJournalPageRank example

2015-03-08 Thread srowen
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/4917#discussion_r26005247 --- Diff: examples/src/main/scala/org/apache/spark/examples/graphx/LiveJournalPageRank.scala --- @@ -30,14 +25,14 @@ object LiveJournalPageRank { def

[GitHub] spark pull request: [SPARK-6193] [EC2] Push group filter up to EC2

2015-03-08 Thread srowen
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/4922#issuecomment-77750283 Sounds like a good improvement, changes look OK to my mildly informed eyes, and you have both reviewed and tested the change. LGTM.

[GitHub] spark pull request: [EC2] [SPARK-6188] Instance types can be misla...

2015-03-08 Thread srowen
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/4916#issuecomment-77750513 I'll wait another day in case the consensus is that this line should indeed change, but it sounds like it is likely fine as is.

[GitHub] spark pull request: [SPARK-6193] [EC2] Push group filter up to EC2

2015-03-08 Thread asfgit
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/4922

[GitHub] spark pull request: [SPARK-6215][SQL] Shorten apply and update fun...

2015-03-08 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4940#issuecomment-77743415 [Test build #28371 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28371/consoleFull) for PR 4940 at commit

[GitHub] spark pull request: [SPARK-6215][SQL] Shorten apply and update fun...

2015-03-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4940#issuecomment-77743416 Test PASSed. Refer to this link for build results (access rights to CI server needed):

[GitHub] spark pull request: [GraphX] Improve LiveJournalPageRank example

2015-03-08 Thread srowen
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/4917#discussion_r26005234 --- Diff: bin/pyspark --- @@ -60,6 +60,9 @@ fi # # For backwards-compatibility, we retain the old IPYTHON and IPYTHON_OPTS variables.

[GitHub] spark pull request: SPARK-6205 [CORE] UISeleniumSuite fails for Ha...

2015-03-08 Thread asfgit
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/4933

[GitHub] spark pull request: [SPARK-6186] [EC2] Make Tachyon version config...

2015-03-08 Thread shivaram
Github user shivaram commented on a diff in the pull request: https://github.com/apache/spark/pull/4901#discussion_r26010373 --- Diff: ec2/spark_ec2.py --- @@ -872,9 +890,16 @@ def deploy_files(conn, root_dir, opts, master_nodes, slave_nodes, modules): if "." in

[GitHub] spark pull request: [SPARK-6215][SQL] Shorten apply and update fun...

2015-03-08 Thread chenghao-intel
Github user chenghao-intel commented on the pull request: https://github.com/apache/spark/pull/4940#issuecomment-77785497 @viirya I am not sure we really need this change; the check `if (i < 0 || i >= this.length)` does not seem necessary to me, because the `ordinal` should have

[GitHub] spark pull request: [SPARK-5682] Reuse hadoop encrypted shuffle al...

2015-03-08 Thread kellyzly
Github user kellyzly commented on the pull request: https://github.com/apache/spark/pull/4491#issuecomment-77786424 @srowen, @tgravescs, @vanzin: Encrypted shuffle can make the shuffle process safer. I think it is necessary in Spark. The previous design reuses Hadoop encrypted

[GitHub] spark pull request: [SPARK-5682] Reuse hadoop encrypted shuffle al...

2015-03-08 Thread kellyzly
Github user kellyzly commented on the pull request: https://github.com/apache/spark/pull/4491#issuecomment-77786413 @srowen, @tgravescs, @vanzin: Encrypted shuffle can make the shuffle process safer. I think it is necessary in Spark. The previous design reuses Hadoop encrypted

[GitHub] spark pull request: [SPARK-5682] Reuse hadoop encrypted shuffle al...

2015-03-08 Thread kellyzly
Github user kellyzly commented on the pull request: https://github.com/apache/spark/pull/4491#issuecomment-77786426 @srowen, @tgravescs, @vanzin: Encrypted shuffle can make the shuffle process safer. I think it is necessary in Spark. The previous design reuses Hadoop encrypted

[GitHub] spark pull request: [SPARK-6219] [Build] Check that Python code co...

2015-03-08 Thread nchammas
Github user nchammas commented on the pull request: https://github.com/apache/spark/pull/4941#issuecomment-77787089 cc @JoshRosen @davies

[GitHub] spark pull request: [SPARK-6202] [SQL] enable variable substitutio...

2015-03-08 Thread chenghao-intel
Github user chenghao-intel commented on the pull request: https://github.com/apache/spark/pull/4930#issuecomment-77787040 LGTM

[GitHub] spark pull request: [SPARK-6051][Streaming] Add ZooKeeper offest p...

2015-03-08 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/4805#discussion_r26012160 --- Diff: external/kafka/src/main/scala/org/apache/spark/streaming/kafka/DirectKafkaInputDStream.scala --- @@ -158,4 +166,37 @@ class

[GitHub] spark pull request: [SPARK-6177][MLlib] LDA should check partition...

2015-03-08 Thread hhbyyh
GitHub user hhbyyh reopened a pull request: https://github.com/apache/spark/pull/4899 [SPARK-6177][MLlib] LDA should check partitions size of the input JIRA: https://issues.apache.org/jira/browse/SPARK-6177 Add coalesce to LDA example to avoid the possible massive partitions

[GitHub] spark pull request: [SPARK-6145][SQL] fix ORDER BY on nested field...

2015-03-08 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4904#issuecomment-77756133 [Test build #28372 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28372/consoleFull) for PR 4904 at commit

[GitHub] spark pull request: [SPARK-6145][SQL] fix ORDER BY on nested field...

2015-03-08 Thread cloud-fan
Github user cloud-fan commented on the pull request: https://github.com/apache/spark/pull/4904#issuecomment-77756049 Hi @marmbrus , it feels hard for me to resolve the base attribute but not the GetFields that are on top. When we get into `LogicalPlan#resolve`, the `Attribute`s are

[GitHub] spark pull request: [SPARK-6145][SQL] fix ORDER BY on nested field...

2015-03-08 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4904#issuecomment-77760987 [Test build #28372 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28372/consoleFull) for PR 4904 at commit

[GitHub] spark pull request: [SPARK-6145][SQL] fix ORDER BY on nested field...

2015-03-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4904#issuecomment-77760988 Test PASSed. Refer to this link for build results (access rights to CI server needed):