[GitHub] spark pull request: [EC2] [SPARK-6600] Open ports in ec2/spark_ec2...

2015-03-30 Thread florianverhein
Github user florianverhein commented on the pull request:

https://github.com/apache/spark/pull/5257#issuecomment-87570414
  
Some things to think about:
- Do we want an option for this (e.g. as there is for Ganglia)? I haven't added 
one, as I think it would be confusing at the moment: a user would assume that 
the option enables the HDFS NFS gateway on the cluster. However, as far as 
I'm aware, spark-ec2 doesn't do this yet (#6601), so I think it would be better 
to add the option as part of that work.
- Further, since the ports are opened to the authorized address, I don't 
see a problem with doing this by default for now.

I have tested this with a spark-ec2 cluster running the gateway (i.e. with 
these settings, I can mount HDFS on my local machine, which is really 
handy!).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-6604][PySpark]Specify ip of python serv...

2015-03-30 Thread WangTaoTheTonic
Github user WangTaoTheTonic commented on the pull request:

https://github.com/apache/spark/pull/5256#issuecomment-87570358
  
ok to test





[GitHub] spark pull request: [SPARK-6521][Core]executors in the same node r...

2015-03-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/5178#issuecomment-87572136
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29390/
Test FAILed.





[GitHub] spark pull request: [SPARK-6521][Core]executors in the same node r...

2015-03-30 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5178#issuecomment-87577323
  
  [Test build #29391 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29391/consoleFull) for PR 5178 at commit [`6c5c1d4`](https://github.com/apache/spark/commit/6c5c1d4d143e4806edd6cf747b84c56f992f14a9).





[GitHub] spark pull request: [SPARK-5203][SQL] fix union with different dec...

2015-03-30 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4004#issuecomment-87570883
  
  [Test build #29388 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29388/consoleFull) for PR 4004 at commit [`e6614e8`](https://github.com/apache/spark/commit/e6614e828473633847f50927b090e33480699486).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.
 * This patch does not change any dependencies.





[GitHub] spark pull request: [SPARK-6521][Core]executors in the same node r...

2015-03-30 Thread scwf
Github user scwf commented on the pull request:

https://github.com/apache/spark/pull/5178#issuecomment-87571847
  
Jenkins, test this please.





[GitHub] spark pull request: [SPARK-6521][Core]executors in the same node r...

2015-03-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/5178#issuecomment-87566631
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29385/
Test FAILed.





[GitHub] spark pull request: [spark-sql] a better exception message than s...

2015-03-30 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/5235#issuecomment-87572423
  
Thanks. I'm going to merge this into master and branch-1.3.





[GitHub] spark pull request: [spark-sql] a better exception message than s...

2015-03-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/5235#issuecomment-87571212
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29386/
Test PASSed.





[GitHub] spark pull request: Master

2015-03-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/5258#issuecomment-87579377
  
Can one of the admins verify this patch?





[GitHub] spark pull request: Master

2015-03-30 Thread sryza
Github user sryza commented on the pull request:

https://github.com/apache/spark/pull/5258#issuecomment-87581081
  
Did you mean to submit this as a pull request?





[GitHub] spark pull request: Specify ip of python server socket

2015-03-30 Thread Sephiroth-Lin
GitHub user Sephiroth-Lin opened a pull request:

https://github.com/apache/spark/pull/5256

Specify ip of python server socket

The driver currently starts a server socket bound to the wildcard IP address. 
Binding it to 127.0.0.1 is more reasonable, as the socket is only used by the 
local Python process.
/cc @davies
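
As a rough illustration of the difference (a Python sketch using the standard `socket` module, not the driver's actual code; the helper name `open_local_server` is made up for this example), binding to the loopback address 127.0.0.1 keeps the listening socket unreachable from other hosts, whereas a wildcard bind listens on every interface:

```python
import socket


def open_local_server(host="127.0.0.1"):
    """Open a listening TCP socket on an ephemeral port bound to `host`."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Port 0 asks the OS to pick any free port.
    server.bind((host, 0))
    server.listen(1)
    return server


# Loopback bind: only processes on this machine can connect.
local = open_local_server("127.0.0.1")
local_addr = local.getsockname()[0]
local.close()

# Wildcard bind: the socket accepts connections on every network interface.
wildcard = open_local_server("0.0.0.0")
wildcard_addr = wildcard.getsockname()[0]
wildcard.close()

print(local_addr, wildcard_addr)
```

Since the only intended client is a Python process on the same machine, the loopback bind loses nothing and narrows what is exposed.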

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/Sephiroth-Lin/spark SPARK-6604

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/5256.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #5256


commit c88bee9819eef5a8091357d6a239e9ab61da0050
Author: unknown l00251...@hghy1l002515991.china.huawei.com
Date:   2015-03-30T06:21:07Z

Specify ip of python server scoket







[GitHub] spark pull request: [EC2] [SPARK-6600] Open ports in ec2/spark_ec2...

2015-03-30 Thread florianverhein
GitHub user florianverhein opened a pull request:

https://github.com/apache/spark/pull/5257

[EC2] [SPARK-6600] Open ports in ec2/spark_ec2.py to allow HDFS NFS gateway

Authorizes incoming access to the master on the ports required to use the 
Hadoop HDFS NFS gateway from outside the cluster.
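
A minimal sketch of what such rules might look like, assuming the gateway's default ports (111 for the portmapper, 2049 for NFS, and 4242 for mountd, each over TCP and UDP). The helper `nfs_gateway_rules` and the example address are hypothetical, and the actual patch may differ; the tuples only illustrate the kind of arguments a security-group authorize call takes:

```python
# Assumed default ports for the HDFS NFS gateway: portmapper, NFS, mountd.
NFS_GATEWAY_PORTS = (111, 2049, 4242)


def nfs_gateway_rules(authorized_address, ports=NFS_GATEWAY_PORTS):
    """Build (protocol, from_port, to_port, cidr) ingress rules.

    Hypothetical helper: one rule per port and protocol, restricted to
    the authorized address rather than opened to the world.
    """
    return [
        (protocol, port, port, authorized_address)
        for port in ports
        for protocol in ("tcp", "udp")
    ]


# Example address from the documentation-only range 203.0.113.0/24.
rules = nfs_gateway_rules("203.0.113.7/32")
for rule in rules:
    print(rule)
```

Keeping the rules as plain data makes it easy to review exactly which ports would be opened before handing them to whatever API applies them.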

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/florianverhein/spark master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/5257.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #5257


commit 72a586a68491608a32cbd5e83d0268cba8b1c18a
Author: Florian Verhein florian.verh...@gmail.com
Date:   2015-03-30T04:23:40Z

[EC2] [SPARK-6600] initial impl







[GitHub] spark pull request: [SPARK-6604][PySpark]Specify ip of python serv...

2015-03-30 Thread WangTaoTheTonic
Github user WangTaoTheTonic commented on the pull request:

https://github.com/apache/spark/pull/5256#issuecomment-87571479
  
Jenkins, test this please.





[GitHub] spark pull request: Master

2015-03-30 Thread nbawzl2004
GitHub user nbawzl2004 opened a pull request:

https://github.com/apache/spark/pull/5258

Master



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/sparkmatrix/spark-matrix master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/5258.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #5258


commit 25168a787d04ac8d01dc11d24834d66567f9c602
Author: Xiaoran Xu xiaora...@ucla.edu
Date:   2015-02-24T23:04:03Z

Schedule

commit cad7dc008f6997f7aa2e9f5e1e343431057743ba
Author: Xiaoran Xu xiaora...@ucla.edu
Date:   2015-02-24T23:06:00Z

Schedule

commit d1cc6113db4508256e53fd341b8961329fc1df3e
Author: netpaladinx xiaora...@ucla.edu
Date:   2015-03-04T14:28:10Z

Merge remote-tracking branch 'upstream/master'

commit e2594264324083f7fa5a6fa123de4377686b46ba
Author: netpaladinx xiaora...@ucla.edu
Date:   2015-03-04T19:07:51Z

Coding analysis for BlockMatrix.scala

commit 37d1fbf5cf80051e7160eb195a655873b94d7eac
Author: netpaladinx xiaora...@ucla.edu
Date:   2015-03-04T19:14:36Z

Update README.md

commit b89fe0beb99671539933180ab72f41714173e8db
Author: John Davis j...@foobox.com
Date:   2015-03-05T15:44:28Z

Fixed indentation

commit 0b3a9a63c10b474bb8d3ccefe7b03289b65fcc8c
Author: John Davis j...@foobox.com
Date:   2015-03-05T17:30:36Z

Update comments_BlockMatrix.txt

commit 5946c2a366e4ccf68b38b7733757f0a209f61f23
Author: John Davis j...@foobox.com
Date:   2015-03-10T15:41:27Z

Update comments_BlockMatrix.txt

commit 871289df3ce9fd1cf7239341c44e02b143ab0105
Author: netpaladinx xiaora...@ucla.edu
Date:   2015-03-11T03:33:13Z

new comments

commit 8bd1594c6e1a6ae74914580dcec7ffe43340b265
Author: netpaladinx xiaora...@ucla.edu
Date:   2015-03-11T05:12:40Z

new comment file written in markdown

commit bd418a342a5032288003d192d961ac3532e6cefd
Author: netpaladinx xiaora...@ucla.edu
Date:   2015-03-11T05:14:34Z

README.md

commit 4074e83ca4f574316f6c161f84d0ba8063434ea8
Author: netpaladinx xiaora...@ucla.edu
Date:   2015-03-11T05:16:36Z

new comments

commit 7f8de1cf8d71aadf52eb055d983ee8c4cc431e46
Author: netpaladinx xiaora...@ucla.edu
Date:   2015-03-11T05:18:59Z

new comments

commit 4e292ff06d79a86e14db1f79b6beadc3282f8739
Author: netpaladinx xiaora...@ucla.edu
Date:   2015-03-11T14:40:08Z

Add comments in BlockMatrix.md

commit a90b01abdb81107c266a2b0a896beffef848bff7
Author: netpaladinx xiaora...@ucla.edu
Date:   2015-03-11T18:47:33Z

new Blockmatrix.md

commit e58fedc5c54858696a1ee378c55e14d8f02f4850
Author: netpaladinx xiaora...@ucla.edu
Date:   2015-03-12T03:49:19Z

new Blockmatrix.md

commit fe33902ad79ae0d0408d297251e2c84b6e875c64
Author: netpaladinx xiaora...@ucla.edu
Date:   2015-03-12T03:52:33Z

new Blockmatrix.md

commit 0b8d67a7b4e46f5efd876b25df70641c6e932912
Author: netpaladinx xiaora...@ucla.edu
Date:   2015-03-12T03:55:01Z

new Blockmatrix.md

commit e56d450e4f0ed3bcdd1e19fb93c9cea80e0010b3
Author: netpaladinx xiaora...@ucla.edu
Date:   2015-03-12T06:39:06Z

new Blockmatrix.md

commit 4126bc19db6d4d254d4ce4d7b0cf9e7819520d45
Author: netpaladinx xiaora...@ucla.edu
Date:   2015-03-12T06:40:55Z

new Blockmatrix.md

commit c76608b767da56f129fa783df3bbe2e0a7157d32
Author: netpaladinx xiaora...@ucla.edu
Date:   2015-03-12T06:42:31Z

new Blockmatrix.md

commit 24154fd4ac7a3a0a58352da8905eb39ef873a96f
Author: netpaladinx xiaora...@ucla.edu
Date:   2015-03-12T06:44:55Z

new Blockmatrix.md

commit f3f1804a7607e37ff7af5edabec9c5dd3603dbe2
Author: netpaladinx xiaora...@ucla.edu
Date:   2015-03-12T06:48:20Z

new Blockmatrix.md

commit f1a3a1470c2c0174767329d49b5f81b7a6186b26
Author: netpaladinx xiaora...@ucla.edu
Date:   2015-03-12T07:54:29Z

new Blockmatrix.md

commit a5a25e9fdb74386aed77da1f90473acbad752c8c
Author: BruinBear ljy1...@gmail.com
Date:   2015-03-24T04:48:45Z

started reading on rowmatrix

commit 060c882bebca1b0b557df7101810d891562bacb3
Author: netpaladinx xiaora...@ucla.edu
Date:   2015-03-26T11:49:44Z

new file added

commit b9841d5a191851ef5a55843b7d96d014304a0e50
Author: Zhengliang Wu nbawzl2...@gmail.com
Date:   2015-03-30T07:26:44Z

add_IndexedRowMatrix.md







[GitHub] spark pull request: [SPARK-6119][SQL] DataFrame support for missin...

2015-03-30 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5248#issuecomment-87578913
  
  [Test build #29392 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29392/consoleFull) for PR 5248 at commit [`914a374`](https://github.com/apache/spark/commit/914a3743801c7e1637fb43ef841d2d76fc3e4ce7).





[GitHub] spark pull request: [EC2] [SPARK-6600] Open ports in ec2/spark_ec2...

2015-03-30 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5257#issuecomment-87568532
  
  [Test build #29389 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29389/consoleFull) for PR 5257 at commit [`72a586a`](https://github.com/apache/spark/commit/72a586a68491608a32cbd5e83d0268cba8b1c18a).





[GitHub] spark pull request: [SPARK-6604][PySpark]Specify ip of python serv...

2015-03-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/5256#issuecomment-87568005
  
Can one of the admins verify this patch?





[GitHub] spark pull request: [SPARK-5203][SQL] fix union with different dec...

2015-03-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/4004#issuecomment-87570896
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29388/
Test PASSed.





[GitHub] spark pull request: [spark-sql] a better exception message than s...

2015-03-30 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/5235





[GitHub] spark pull request: Update start-slave.sh

2015-03-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/5260#issuecomment-87586704
  
Can one of the admins verify this patch?





[GitHub] spark pull request: [SPARK-6341][mllib] Upgrade breeze from 0.11.1...

2015-03-30 Thread yu-iskw
Github user yu-iskw closed the pull request at:

https://github.com/apache/spark/pull/5222





[GitHub] spark pull request: [SPARK-6341][mllib] Upgrade breeze from 0.11.1...

2015-03-30 Thread yu-iskw
Github user yu-iskw commented on the pull request:

https://github.com/apache/spark/pull/5222#issuecomment-87589402
  
@mengxr and @srowen, alright. There was some delay because GitHub was 
under attack. I'm closing this PR. Thank you for your help.





[GitHub] spark pull request: [SPARK-6521][Core]executors in the same node r...

2015-03-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/5178#issuecomment-87611637
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29391/
Test PASSed.





[GitHub] spark pull request: [CORE] [SPARK-6593] Provide option for HadoopR...

2015-03-30 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/5250#discussion_r27378646
  
--- Diff: core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala ---
@@ -246,6 +249,15 @@ class HadoopRDD[K, V](
           } catch {
             case eof: EOFException =>
               finished = true
    +        case e: Exception =>
--- End diff --

Yes, but it calls into question when you would turn it on. You can't 
actually handle _just_ the situation you describe reliably even with 
`IOException`. I think this is band-aiding over an input problem that just 
isn't properly handled two more steps down the pipeline.





[GitHub] spark pull request: Accumulator deserialized twice because the Nar...

2015-03-30 Thread suyanNone
GitHub user suyanNone opened a pull request:

https://github.com/apache/spark/pull/5259

Accumulator deserialized twice because the NarrowCoGroupSplitDep contains 
rdd object.

With code like the example below, the accumulator is deserialized twice.
First:
```
task = ser.deserialize[Task[Any]](taskBytes, 
  Thread.currentThread.getContextClassLoader)
```
Second:
```
val (rdd, dep) = ser.deserialize[(RDD[_], ShuffleDependency[_, _, _])](
  ByteBuffer.wrap(taskBinary.value), 
  Thread.currentThread.getContextClassLoader)
```
The first deserialization is not expected. It happens because a ResultTask 
or ShuffleMapTask holds a partition object. In the class
```
CoGroupedRDD[K](@transient var rdds: Seq[RDD[_ <: Product2[K, _]]], part: 
Partitioner)
```
the CoGroupPartition may contain a CoGroupSplitDep:
```
NarrowCoGroupSplitDep(
    rdd: RDD[_],
    splitIndex: Int,
    var split: Partition
  ) extends CoGroupSplitDep {
```
The NarrowCoGroupSplitDep carries the rdd object with it, which causes the 
first deserialization.

example:
```
// Assumes an existing SparkContext `sc`.
import org.apache.spark.HashPartitioner

val acc1 = sc.accumulator(0, "test1")
val acc2 = sc.accumulator(0, "test2")
val rdd1 = sc.parallelize((1 to 10).toSeq, 3)
val rdd2 = sc.parallelize((1 to 10).toSeq, 3)

// Each createCombiner call bumps the accumulator for its side.
val combine1 = rdd1.map { case a => (a, 1) }.combineByKey(
  a => {
    acc1 += 1
    a
  },
  (a: Int, b: Int) => {
    a + b
  },
  (a: Int, b: Int) => {
    a + b
  }, new HashPartitioner(3), mapSideCombine = false)

val combine2 = rdd2.map { case a => (a, 1) }.combineByKey(
  a => {
    acc2 += 1
    a
  },
  (a: Int, b: Int) => {
    a + b
  },
  (a: Int, b: Int) => {
    a + b
  }, new HashPartitioner(3), mapSideCombine = false)

combine1.cogroup(combine2, new HashPartitioner(3)).count()
```
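
The effect can be mimicked outside Spark. The following Python sketch is an analogy using `pickle`, not Spark's serializer, and the classes `Counter`, `Rdd`, and `Partition` are invented for illustration. It shows how an object reachable from two separately serialized payloads is deserialized once per payload, and how dropping the reference from one payload (similar in spirit to a transient field, and not necessarily what this patch does) avoids the extra copy:

```python
import pickle

deserialized = []  # records every Counter that gets unpickled


class Counter:
    """Stand-in for an accumulator; logs each deserialization."""

    def __init__(self, name):
        self.name = name

    def __setstate__(self, state):
        self.__dict__.update(state)
        deserialized.append(self.name)


class Rdd:
    """Stand-in for an RDD whose closure references the counter."""

    def __init__(self, counter):
        self.counter = counter


class Partition:
    """Stand-in for a partition that also drags the RDD along."""

    def __init__(self, rdd, keep_rdd=True):
        self.rdd = rdd
        self.keep_rdd = keep_rdd

    def __getstate__(self):
        state = dict(self.__dict__)
        if not self.keep_rdd:
            state["rdd"] = None  # rough analogue of a transient field
        return state


def run(keep_rdd):
    """Deserialize a 'task' payload and a 'task binary' payload."""
    deserialized.clear()
    rdd = Rdd(Counter("acc1"))
    task_bytes = pickle.dumps(Partition(rdd, keep_rdd))  # task with partition
    task_binary = pickle.dumps(rdd)                      # the RDD itself
    pickle.loads(task_bytes)
    pickle.loads(task_binary)
    return len(deserialized)


twice = run(keep_rdd=True)
once = run(keep_rdd=False)
print(twice, once)
```

Here `twice` counts the deserializations when the partition still references the RDD, and `once` counts them when that reference is dropped before serialization.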

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/suyanNone/spark fix-acc

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/5259.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #5259


commit 2fde0669f62053d86adbbb37196fb161fb5ac1c8
Author: hushan[胡珊] hus...@xiaomi.com
Date:   2015-03-30T08:05:02Z

Fix twice deserialized accumulators with CoGroup







[GitHub] spark pull request: [SPARK-6553] [pyspark] Support functools.parti...

2015-03-30 Thread ksonj
Github user ksonj commented on the pull request:

https://github.com/apache/spark/pull/5206#issuecomment-87588299
  
I've added two tests for UDFs with partial functions and callable objects. 
Thanks for the hint, I'll open future PRs against `master` then.
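
For context, a small sketch of why these two cases need care: `functools.partial` objects and instances of classes defining `__call__` are callable but carry no `__name__` attribute, so code that reads `f.__name__` fails on them. The helper `callable_name` below is hypothetical and only illustrates one possible fallback; it is not necessarily what the patch does.

```python
import functools


def callable_name(f):
    """Return a display name for any callable (hypothetical helper)."""
    # Plain functions have __name__; partials and callable instances do not,
    # so fall back to the name of their class.
    return getattr(f, "__name__", type(f).__name__)


def add(x, y):
    return x + y


class PlusOne:
    """A callable object: instances can be invoked like a function."""

    def __call__(self, x):
        return x + 1


add_ten = functools.partial(add, 10)

names = [callable_name(f) for f in (add, add_ten, PlusOne())]
print(names)
```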





[GitHub] spark pull request: [SPARK-6606][CORE]Accumulator deserialized twi...

2015-03-30 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5259#issuecomment-87615549
  
  [Test build #29393 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29393/consoleFull) for PR 5259 at commit [`2fde066`](https://github.com/apache/spark/commit/2fde0669f62053d86adbbb37196fb161fb5ac1c8).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `class CoGroupedRDD[K](var rdds: Seq[RDD[_ <: Product2[K, _]]], part: Partitioner)`

 * This patch does not change any dependencies.





[GitHub] spark pull request: Update start-slave.sh

2015-03-30 Thread srowen
Github user srowen commented on the pull request:

https://github.com/apache/spark/pull/5260#issuecomment-87623062
  
No, this is wrong. You can see that the variable is used a few lines later. 
Mind closing this PR? In the future, a more descriptive title than "Update 
start-slave.sh" is needed.





[GitHub] spark pull request: Master

2015-03-30 Thread srowen
Github user srowen commented on the pull request:

https://github.com/apache/spark/pull/5258#issuecomment-87623213
  
Mind closing this PR?





[GitHub] spark pull request: [EC2] [SPARK-6600] Open ports in ec2/spark_ec2...

2015-03-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/5257#issuecomment-87582560
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29389/
Test PASSed.





[GitHub] spark pull request: [EC2] [SPARK-6600] Open ports in ec2/spark_ec2...

2015-03-30 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5257#issuecomment-87582550
  
  [Test build #29389 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29389/consoleFull)
 for   PR 5257 at commit 
[`72a586a`](https://github.com/apache/spark/commit/72a586a68491608a32cbd5e83d0268cba8b1c18a).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.
 * This patch does not change any dependencies.





[GitHub] spark pull request: Update load-spark-env.sh

2015-03-30 Thread raschild
GitHub user raschild opened a pull request:

https://github.com/apache/spark/pull/5261

Update load-spark-env.sh

Set the current directory path in $FWDIR and use the same for $ASSEMBLY_DIR1
and $ASSEMBLY_DIR2; otherwise $SPARK_HOME is not visible from spark-env.sh,
because no SPARK_HOME variable is assigned there.
I am using the Spark 1.3.0 source code package and came across this
when trying to start the master with sbin/start-master.sh

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/raschild/spark patch-1

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/5261.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #5261


commit b9babcdc7f178b93a44efccdff38dcd3bab9adbb
Author: raschild rasch...@users.noreply.github.com
Date:   2015-03-30T08:44:41Z

Update load-spark-env.sh

Set the current directory path in $FWDIR and use the same for $ASSEMBLY_DIR1
and $ASSEMBLY_DIR2; otherwise $SPARK_HOME is not visible from spark-env.sh,
because no SPARK_HOME variable is assigned there.
I am using the Spark 1.3.0 source code package and came across this
when trying to start the master with sbin/start-master.sh







[GitHub] spark pull request: Update load-spark-env.sh

2015-03-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/5261#issuecomment-87594029
  
Can one of the admins verify this patch?





[GitHub] spark pull request: [SPARK-6606][CORE]Accumulator deserialized twi...

2015-03-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/5259#issuecomment-87615560
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29393/
Test PASSed.





[GitHub] spark pull request: [SPARK-5563][mllib] online lda initial checkin

2015-03-30 Thread hhbyyh
Github user hhbyyh commented on the pull request:

https://github.com/apache/spark/pull/4419#issuecomment-87619172
  
Thanks for the informative feedback. I sincerely appreciate it when you tell 
me what's recommended and what should be changed. 

# 1. First thing is API. 

One great thing about Online LDA is that it can avoid loading the entire 
corpus, since it only needs to process one mini-batch at a time. So I 
feel it's necessary to have an API that supports this usage.
In the current version, a user can write code like
```
   // corpus does not need to be ready before this
val onlineLDA = new OnlineLDAOptimizer(k, D, vocabSize)
for (i <- 1 to batchNumber) {
  val batch =  // ... convert dynamically or read libsvm directly
  onlineLDA.submitMiniBatch(batch)
}
```
I think this will be especially necessary and helpful for larger data sets, 
since doc2vec at large scale is resource intensive. Accepting a stream of 
mini-batch `documents: RDD[(Long, Vector)]` rather than one integrated corpus 
is a key reason why OnlineLDA can handle larger datasets and be stream friendly.
This is why I left the optimizer public. I'd like to know your opinions.
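
To make the mini-batch usage concrete, here is a minimal, Spark-free sketch of the pattern. Everything in it (`MiniBatchOptimizer`, `TermCountOptimizer`, `batchStream`) is a hypothetical stand-in invented for illustration, not the API of `OnlineLDAOptimizer` in this PR; the toy "model" only counts terms.

```scala
// Hypothetical sketch: an optimizer that consumes one mini-batch at a time,
// so the whole corpus never has to be materialized at once.
trait MiniBatchOptimizer[Doc] {
  def submitMiniBatch(batch: Seq[Doc]): Unit
}

// Toy stand-in for a real model: it only accumulates per-term counts.
final class TermCountOptimizer(vocabSize: Int) extends MiniBatchOptimizer[Array[Int]] {
  private val counts = new Array[Long](vocabSize)
  private var batchesSeen = 0

  def submitMiniBatch(batch: Seq[Array[Int]]): Unit = {
    // Each document is an array of term ids in [0, vocabSize).
    for (doc <- batch; termId <- doc) counts(termId) += 1
    batchesSeen += 1
  }

  def termCounts: Vector[Long] = counts.toVector
  def batches: Int = batchesSeen
}

// Batches are produced lazily; each one can be discarded after submission.
def batchStream(n: Int): Iterator[Seq[Array[Int]]] =
  Iterator.tabulate(n)(i => Seq(Array(i % 3, 0)))

val optimizer = new TermCountOptimizer(vocabSize = 3)
batchStream(4).foreach(optimizer.submitMiniBatch)
```

The point of the shape is that the caller owns the loop, so batches can come from a file reader, a conversion step, or a stream without the optimizer knowing.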

# 2. Builder Pattern and Parameter parity

Sure, it's doable. Originally I named `OnlineLDAOptimizer` just 
`OnlineLDA`, but then I thought we had talked about an optimizer framework, so I 
changed it. If we can lock down the API, it will be pretty clear how to proceed 
with these details.

# 3. About Scaling and correctness testing, can you please share a 
recommended dataset?

Thanks a lot.





[GitHub] spark pull request: Update start-slave.sh

2015-03-30 Thread josegom
GitHub user josegom opened a pull request:

https://github.com/apache/spark/pull/5260

Update start-slave.sh

add a comment in the line 22

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/josegom/spark patch-1

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/5260.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #5260


commit db9da7459e7a8dd78c8b5e4b02c3c8d9f98299e3
Author: Jose Manuel Gomez jmgo...@stratio.com
Date:   2015-03-30T08:15:08Z

Update start-slave.sh

add a comment in the line 22







[GitHub] spark pull request: [SPARK-6606][CORE]Accumulator deserialized twi...

2015-03-30 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5259#issuecomment-87585645
  
  [Test build #29393 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29393/consoleFull)
 for   PR 5259 at commit 
[`2fde066`](https://github.com/apache/spark/commit/2fde0669f62053d86adbbb37196fb161fb5ac1c8).





[GitHub] spark pull request: [SPARK-6199] [SQL] Support CTE in HiveContext ...

2015-03-30 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4929#issuecomment-87587319
  
  [Test build #29394 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29394/consoleFull)
 for   PR 4929 at commit 
[`220b67d`](https://github.com/apache/spark/commit/220b67d511cd25d908d8408fa9c59d78d8ad0f9e).





[GitHub] spark pull request: [SPARK-6119][SQL] DataFrame support for missin...

2015-03-30 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5248#issuecomment-87607213
  
  [Test build #29392 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29392/consoleFull)
 for   PR 5248 at commit 
[`914a374`](https://github.com/apache/spark/commit/914a3743801c7e1637fb43ef841d2d76fc3e4ce7).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `case class AtLeastNNonNulls(n: Int, children: Seq[Expression]) extends 
Predicate `

 * This patch does not change any dependencies.





[GitHub] spark pull request: [SPARK-6119][SQL] DataFrame support for missin...

2015-03-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/5248#issuecomment-87607245
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29392/
Test PASSed.





[GitHub] spark pull request: [SPARK-6199] [SQL] Support CTE in HiveContext ...

2015-03-30 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4929#issuecomment-87615844
  
  [Test build #29394 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29394/consoleFull)
 for   PR 4929 at commit 
[`220b67d`](https://github.com/apache/spark/commit/220b67d511cd25d908d8408fa9c59d78d8ad0f9e).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `case class With(child: LogicalPlan, cteRelations: Map[String, 
Subquery]) extends UnaryNode `

 * This patch does not change any dependencies.





[GitHub] spark pull request: [SPARK-6199] [SQL] Support CTE in HiveContext ...

2015-03-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/4929#issuecomment-87615858
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29394/
Test PASSed.





[GitHub] spark pull request: [CORE] [SPARK-6593] Provide option for HadoopR...

2015-03-30 Thread tigerquoll
Github user tigerquoll commented on a diff in the pull request:

https://github.com/apache/spark/pull/5250#discussion_r27377569
  
--- Diff: core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala ---
@@ -246,6 +249,15 @@ class HadoopRDD[K, V](
 } catch {
   case eof: EOFException =>
 finished = true
+  case e: Exception =>
--- End diff --

Having been on the receiving end of things, I know that the gzip module 
throws an IOException, but unfortunately I don't know which exceptions the 
Hadoop input modules throw, or whether they propagate exceptions up from 
other third-party libraries. Catching such a broad exception is mitigated by 
the fact that this particular option defaults to off and should only be 
enabled when you are trying to parse files that you know are corrupt. Given 
that, when you turn the option on, we should really try to finish processing 
files to the best of our ability, so I think catching `Exception` might be 
appropriate in this case.
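
The trade-off can be sketched in isolation. The following is a minimal illustration only, assuming the option makes the reader stop at the first failure and keep whatever it has read so far; `RecordSource`, `readAll`, `ignoreCorrupt` and `corruptAfterTwo` are hypothetical names, not identifiers from `HadoopRDD` or from this PR.

```scala
import java.io.{EOFException, IOException}

// Hypothetical record source: next() throws EOFException at a clean end of
// input and may throw other exceptions when the underlying data is corrupt.
trait RecordSource[A] {
  def next(): A
}

// Drains a source. With ignoreCorrupt = false (the default), any failure other
// than EOF propagates to the caller. With ignoreCorrupt = true, a failure ends
// the read and the records collected so far are returned.
def readAll[A](source: RecordSource[A], ignoreCorrupt: Boolean = false): Vector[A] = {
  val out = Vector.newBuilder[A]
  var finished = false
  while (!finished) {
    try {
      out += source.next()
    } catch {
      // EOFException is matched first because it is also an Exception.
      case _: EOFException => finished = true
      case e: Exception if ignoreCorrupt =>
        Console.err.println(s"Stopping early on corrupt input: $e")
        finished = true
    }
  }
  out.result()
}

// Toy source: yields 1 and 2, then fails the way a truncated stream might.
def corruptAfterTwo(): RecordSource[Int] = new RecordSource[Int] {
  private var i = 0
  def next(): Int = {
    i += 1
    if (i <= 2) i else throw new IOException("unexpected end of stream")
  }
}

val salvaged = readAll(corruptAfterTwo(), ignoreCorrupt = true)
```

Because the broad `case` is guarded by the flag, the default behaviour is unchanged: without `ignoreCorrupt`, the `IOException` still reaches the caller.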





[GitHub] spark pull request: [SPARK-6608] [SQL] Makes DataFrame.rdd a lazy ...

2015-03-30 Thread liancheng
Github user liancheng commented on the pull request:

https://github.com/apache/spark/pull/5265#issuecomment-87662263
  
@rxin Is there a good reason that makes `DataFrame.rdd` have to be a 
function?





[GitHub] spark pull request: [SPARK-6528][ML] Add IDF transformer

2015-03-30 Thread yinxusen
GitHub user yinxusen opened a pull request:

https://github.com/apache/spark/pull/5266

[SPARK-6528][ML] Add IDF transformer

See [SPARK-6528](https://issues.apache.org/jira/browse/SPARK-6528). Add IDF 
transformer in ML package.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/yinxusen/spark SPARK-6528

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/5266.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #5266


commit 4802c6769d3b4c89faec3a8f0264ecd03117ceed
Author: Xusen Yin yinxu...@gmail.com
Date:   2015-03-30T09:37:11Z

add IDF transformer and test suite

commit 2aa4be0e1d7ce052f8c901c6d9462c611c3a920a
Author: Xusen Yin yinxu...@gmail.com
Date:   2015-03-30T12:51:32Z

clean test suite







[GitHub] spark pull request: [CORE] [SPARK-6593] Provide option for HadoopR...

2015-03-30 Thread tigerquoll
Github user tigerquoll commented on the pull request:

https://github.com/apache/spark/pull/5250#issuecomment-87681300
  
If a user can write Scala code that appropriately deals with the problem, 
why can't they write Spark code to deal with it in parallel? Isn't this what 
Spark is about? Isn't this a problem that can be readily parallelised? Spark is 
being put forward as a data processing framework - bad data needs to be handled 
in some way better than just refusing to have anything to do with it.

I believe parallelising the solution you mentioned means adding to the public 
API, which takes time and consideration.  The option was considered as a 
scoped, quick-fix solution to at least give users some ability to continue - 
the idea would be to retire the option once a new API was in place to 
gracefully deal with the problem.

In regards to the option being presented to the users as a fine thing to 
do when I don't believe it is - how about providing the information to the 
users and letting them choose for themselves? A good point about an option 
being a public API though - what is the understanding about how stable options 
are? There are no real Experimental or DeveloperAPI tags available here.

Your proposed solution was the same solution I ended up settling on when 
first confronted with the issue - but only after a number of frustrated 
attempts at getting Spark to do what I wanted it to.  What you proposed, and 
what I did in the end, was to give up using Spark and bash out some 
standalone code using Hadoop libraries to do the job, i.e. I stopped using Spark 
and used another tool that made my job easier.  I felt that it didn't have to 
be this way.





[GitHub] spark pull request: [EC2] [SPARK-6600] Open ports in ec2/spark_ec2...

2015-03-30 Thread srowen
Github user srowen commented on the pull request:

https://github.com/apache/spark/pull/5257#issuecomment-87629549
  
I think this seems reasonable. I'll leave it open for comments for some 
time.





[GitHub] spark pull request: [SPARK-5750][SPARK-3441][SPARK-5836][CORE] Add...

2015-03-30 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/5074





[GitHub] spark pull request: [SPARK-6598] Python API for IDFModel

2015-03-30 Thread Lewuathe
GitHub user Lewuathe opened a pull request:

https://github.com/apache/spark/pull/5264

[SPARK-6598] Python API for IDFModel

This is the sub-task of SPARK-6254.
Wrapping IDFModel `idf` member function for pyspark.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/Lewuathe/spark SPARK-6598

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/5264.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #5264


commit 1dc522cab1bdfe55f8245c687ba6b866ca07853e
Author: lewuathe lewua...@me.com
Date:   2015-03-30T12:21:45Z

[SPARK-6598] Python API for IDFModel







[GitHub] spark pull request: [SPARK-6608] [SQL] Makes DataFrame.rdd a lazy ...

2015-03-30 Thread liancheng
GitHub user liancheng opened a pull request:

https://github.com/apache/spark/pull/5265

[SPARK-6608] [SQL] Makes DataFrame.rdd a lazy val

Before 1.3.0, `SchemaRDD.id` worked as a unique identifier of each 
`SchemaRDD`. In 1.3.0, unlike `SchemaRDD`, `DataFrame` is no longer an RDD, and 
`DataFrame.rdd` is actually a function which always returns a new RDD instance. 
Making `DataFrame.rdd` a lazy val should bring the unique identifier back.
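
The difference between a method and a `lazy val` here can be shown with a small Spark-free sketch. `IdGen`, `FakeRDD`, `FrameWithDef` and `FrameWithLazyVal` are hypothetical stand-ins made up for illustration; the only assumption carried over from Spark is that each new RDD instance receives a fresh id.

```scala
// Hands out a fresh id per instance, mimicking how each new RDD gets its own id.
object IdGen {
  private var last = 0
  def next(): Int = { last += 1; last }
}

// Stand-in for an RDD.
final class FakeRDD { val id: Int = IdGen.next() }

// `rdd` as a method: every call builds a new instance with a new id.
final class FrameWithDef { def rdd: FakeRDD = new FakeRDD }

// `rdd` as a lazy val: built on first access, then the same instance is reused.
final class FrameWithLazyVal { lazy val rdd: FakeRDD = new FakeRDD }

val viaDef = new FrameWithDef
val viaLazy = new FrameWithLazyVal

val defIdsDiffer = viaDef.rdd.id != viaDef.rdd.id    // true: two distinct instances
val lazyIdsMatch = viaLazy.rdd.id == viaLazy.rdd.id  // true: one cached instance
```

Anything that keys on the id, such as a cache lookup, only sees a stable value in the `lazy val` variant.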

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/liancheng/spark spark-6608

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/5265.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #5265


commit 7f37d2142a388e5717ae2c3e89152c8c735904cc
Author: Cheng Lian l...@databricks.com
Date:   2015-03-30T12:34:32Z

Makes DataFrame.rdd a lazy val







[GitHub] spark pull request: [SPARK-6607][SQL] Aggregation attribute name i...

2015-03-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/5263#issuecomment-87661966
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29395/
Test PASSed.





[GitHub] spark pull request: [SPARK-6558] Utils.getCurrentUserName returns ...

2015-03-30 Thread tgravescs
Github user tgravescs commented on the pull request:

https://github.com/apache/spark/pull/5229#issuecomment-87666829
  
we should pull this back into 1.3.1.





[GitHub] spark pull request: [SPARK-6528][ML] Add IDF transformer

2015-03-30 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5266#issuecomment-87669352
  
  [Test build #29400 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29400/consoleFull)
 for   PR 5266 at commit 
[`2aa4be0`](https://github.com/apache/spark/commit/2aa4be0e1d7ce052f8c901c6d9462c611c3a920a).





[GitHub] spark pull request: [SPARK-6608] [SQL] Makes DataFrame.rdd a lazy ...

2015-03-30 Thread petro-rudenko
Github user petro-rudenko commented on the pull request:

https://github.com/apache/spark/pull/5265#issuecomment-87670835
  
+1 for this, since for example [the caching logic from ml 
package](https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala#L64)
 doesn't work properly.





[GitHub] spark pull request: Update start-slave.sh

2015-03-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/5262#issuecomment-87679636
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29397/
Test PASSed.





[GitHub] spark pull request: Update start-slave.sh

2015-03-30 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5262#issuecomment-87679590
  
  [Test build #29397 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29397/consoleFull)
 for   PR 5262 at commit 
[`453af8b`](https://github.com/apache/spark/commit/453af8ba57fa65b32469beb969707aec4b713ee2).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.
 * This patch does not change any dependencies.





[GitHub] spark pull request: [SPARK-6607][SQL] Aggregation attribute name i...

2015-03-30 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5263#issuecomment-87661942
  
  [Test build #29395 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29395/consoleFull)
 for   PR 5263 at commit 
[`1de001d`](https://github.com/apache/spark/commit/1de001d375d06ec681a2ac4eb3a62f01310af21d).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.
 * This patch does not change any dependencies.





[GitHub] spark pull request: [SPARK-6608] [SQL] Makes DataFrame.rdd a lazy ...

2015-03-30 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5265#issuecomment-87663657
  
  [Test build #29399 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29399/consoleFull)
 for   PR 5265 at commit 
[`7f37d21`](https://github.com/apache/spark/commit/7f37d2142a388e5717ae2c3e89152c8c735904cc).





[GitHub] spark pull request: [SPARK-5894][ML] Add polynomial mapper

2015-03-30 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5245#issuecomment-87669636
  
  [Test build #29401 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29401/consoleFull)
 for   PR 5245 at commit 
[`b70e7e1`](https://github.com/apache/spark/commit/b70e7e1d0b96c74f4adbe4ebd76442756c072313).





[GitHub] spark pull request: [SPARK-6517][mllib] Implement the Algorithm of...

2015-03-30 Thread yu-iskw
GitHub user yu-iskw opened a pull request:

https://github.com/apache/spark/pull/5267

[SPARK-6517][mllib] Implement the Algorithm of Hierarchical Clustering

I implemented a hierarchical clustering algorithm again. This PR doesn't 
include examples, documentation, or spark.ml APIs. I am going to send other 
PRs for those later.
https://issues.apache.org/jira/browse/SPARK-6517

- This implementation is based on bisecting K-means clustering.
- It derives from @freeman-lab's implementation.
- The basic idea is unchanged from the previous version (#2906).
- However, it is 1000x faster than the previous version thanks to 
parallel processing.
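
For readers unfamiliar with the approach, here is a minimal local sketch of bisecting K-means in plain Scala. It is not the code in this PR: it runs on in-memory collections rather than RDDs, and the object name `BisectingKMeansSketch`, the deterministic seeding, and the "split the cluster with the largest sum of squared errors" rule are all assumptions made for illustration.

```scala
object BisectingKMeansSketch {
  type Point = Vector[Double]

  // Squared Euclidean distance between two points of equal dimension.
  def dist2(a: Point, b: Point): Double =
    a.zip(b).map { case (x, y) => (x - y) * (x - y) }.sum

  // Component-wise mean of a non-empty set of points.
  def mean(ps: Seq[Point]): Point =
    ps.head.indices.map(i => ps.map(_(i)).sum / ps.size).toVector

  // Within-cluster sum of squared distances to the centroid.
  def sse(ps: Seq[Point]): Double = {
    val m = mean(ps)
    ps.map(dist2(_, m)).sum
  }

  // Split one cluster in two with a few rounds of 2-means.
  // Seeding is deterministic here: the first point and the point farthest from it.
  def bisect(ps: Seq[Point], rounds: Int = 10): (Seq[Point], Seq[Point]) = {
    val c1 = ps.head
    val c2 = ps.maxBy(p => dist2(p, c1))
    var parts = ps.partition(p => dist2(p, c1) <= dist2(p, c2))
    var round = 0
    while (round < rounds && parts._1.nonEmpty && parts._2.nonEmpty) {
      val (m1, m2) = (mean(parts._1), mean(parts._2))
      parts = ps.partition(p => dist2(p, m1) <= dist2(p, m2))
      round += 1
    }
    parts
  }

  // Repeatedly bisect the cluster with the largest SSE until k clusters exist
  // or the chosen cluster cannot be split any further.
  def cluster(points: Seq[Point], k: Int): Seq[Seq[Point]] = {
    var clusters: Seq[Seq[Point]] = Seq(points)
    var canSplit = points.nonEmpty
    while (clusters.size < k && canSplit) {
      val target = clusters.maxBy(sse)
      val (a, b) = bisect(target)
      if (a.isEmpty || b.isEmpty) canSplit = false
      else clusters = clusters.filterNot(_ eq target) :+ a :+ b
    }
    clusters
  }
}
```

Calling `BisectingKMeansSketch.cluster(points, k)` returns the clusters as plain sequences. The order in which clusters were split is what gives the result its hierarchical structure, although this sketch does not record the tree.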

Thank you for your great cooperation, RJ Nowling(@rnowling), Jeremy 
Freeman(@freeman-lab), Xiangrui Meng(@mengxr) and Sean Owen(@srowen).

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/yu-iskw/spark new-hierarchical-clustering

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/5267.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #5267


commit af0f65bb4726315c076b827d37276616c5218010
Author: Yu ISHIKAWA yuu.ishik...@gmail.com
Date:   2015-03-30T11:29:12Z

[SPARK-6517][mllib] Implement the Algorithm of Hierarchical Clustering

Thank you for your great cooperation, RJ Nowling(@rnowling), Jeremy 
Freeman(@freeman-lab), Xiangrui Meng(@mengxr) and Sean Owen(@srowen).







[GitHub] spark pull request: [SPARK-6598][MLLIB] Python API for IDFModel

2015-03-30 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5264#issuecomment-87686584
  
  [Test build #29398 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29398/consoleFull)
 for   PR 5264 at commit 
[`1dc522c`](https://github.com/apache/spark/commit/1dc522cab1bdfe55f8245c687ba6b866ca07853e).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.
 * This patch does not change any dependencies.





[GitHub] spark pull request: [SPARK-6517][mllib] Implement the Algorithm of...

2015-03-30 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5267#issuecomment-87696607
  
  [Test build #29403 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29403/consoleFull)
 for   PR 5267 at commit 
[`3df7f11`](https://github.com/apache/spark/commit/3df7f1157c67135b5dde451b540fe30deb730c99).





[GitHub] spark pull request: [SPARK-5894][ML] Add polynomial mapper

2015-03-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/5245#issuecomment-87700160
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29401/
Test PASSed.





[GitHub] spark pull request: [SPARK-6517][mllib] Implement the Algorithm of...

2015-03-30 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5267#issuecomment-87686119
  
  [Test build #29402 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29402/consoleFull)
 for   PR 5267 at commit 
[`af0f65b`](https://github.com/apache/spark/commit/af0f65bb4726315c076b827d37276616c5218010).





[GitHub] spark pull request: [SPARK-6598][MLLIB] Python API for IDFModel

2015-03-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/5264#issuecomment-87686598
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29398/
Test PASSed.





[GitHub] spark pull request: [SPARK-4226] [SQL] Add Exists support for wher...

2015-03-30 Thread scwf
Github user scwf commented on a diff in the pull request:

https://github.com/apache/spark/pull/4812#discussion_r27392069
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveQl.scala ---
@@ -619,10 +619,26 @@ 
https://cwiki.apache.org/confluence/display/Hive/Enhanced+Aggregation%2C+Cube%2C
   case Some(f) => nodeToRelation(f)
   case None => NoRelation
 }
- 
+
+val not = "(?i)not".r
+val exists = "(?i)exists".r
+
 val withWhere = whereClause.map { whereNode =>
-  val Seq(whereExpr) = whereNode.getChildren.toSeq
-  Filter(nodeToExpr(whereExpr), relations)
+  val Seq(clause) = whereNode.getChildren.toSeq
+  clause match {
+    case Token(not(),
+           Token("TOK_SUBQUERY_EXPR",
+             Token("TOK_SUBQUERY_OP", Token(exists(), Nil) :: Nil) ::
+             subquery :: Nil) :: Nil) =>
+      Exists(relations, nodeToPlan(subquery), false)
+    case Token("TOK_SUBQUERY_EXPR",
+           Token("TOK_SUBQUERY_OP", Token(exists(), Nil) :: Nil) ::
+           subquery :: Nil) =>
+      Exists(relations, nodeToPlan(subquery), true)
+    // TODO add IN and NOT IN
+    case whereExpr =>
+      Filter(nodeToExpr(whereExpr), relations)
+  }
--- End diff --

It seems this does not support SQL with both ordinary predicates and EXISTS in the WHERE clause:
```
select * 
from src b 
where 
(not exists 
  (select a.key 
  from src a 
  where b.value = a.value  and a.key = b.key and a.value  'val_2'
  )
) and key  1
;
```
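
One possible direction, sketched on a toy expression tree rather than on the real Hive AST or Catalyst classes, is to flatten the WHERE clause into its AND-ed conjuncts first and only then separate the EXISTS subqueries from the ordinary predicates. Every name below (`WhereClauseSketch`, `Pred`, `ExistsSubquery`, `And`, `split`) is hypothetical.

```scala
object WhereClauseSketch {
  // Toy expression tree, standing in for the parsed WHERE clause.
  sealed trait Expr
  case class Pred(text: String) extends Expr
  case class ExistsSubquery(subquery: String, positive: Boolean) extends Expr
  case class And(left: Expr, right: Expr) extends Expr

  // Flatten nested ANDs into the list of their conjuncts.
  def conjuncts(e: Expr): Seq[Expr] = e match {
    case And(l, r) => conjuncts(l) ++ conjuncts(r)
    case other     => Seq(other)
  }

  // Separate EXISTS / NOT EXISTS conjuncts from the remaining predicates.
  def split(where: Expr): (Seq[ExistsSubquery], Seq[Expr]) = {
    val cs = conjuncts(where)
    val subqueries = cs.collect { case e: ExistsSubquery => e }
    val predicates = cs.filterNot(_.isInstanceOf[ExistsSubquery])
    (subqueries, predicates)
  }
}
```

With the two groups separated, the ordinary predicates could presumably become a `Filter` while each subquery becomes an `Exists` node, so a top-level AND would no longer fall through to the plain `Filter` case.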







[GitHub] spark pull request: [SPARK-5894][ML] Add polynomial mapper

2015-03-30 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5245#issuecomment-87700120
  
  [Test build #29401 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29401/consoleFull)
 for   PR 5245 at commit 
[`b70e7e1`](https://github.com/apache/spark/commit/b70e7e1d0b96c74f4adbe4ebd76442756c072313).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `class PolynomialMapper extends UnaryTransformer[Vector, Vector, 
PolynomialMapper] `

 * This patch does not change any dependencies.





[GitHub] spark pull request: [SPARK-6595][SQL] MetastoreRelation should be ...

2015-03-30 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/5251





[GitHub] spark pull request: [CORE] [SPARK-6593] Provide option for HadoopR...

2015-03-30 Thread srowen
Github user srowen commented on the pull request:

https://github.com/apache/spark/pull/5250#issuecomment-87688106
  
You can use Spark to do this too, sure. Functions can call the HDFS API to 
check and delete files in parallel. Roughly:

```
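import java.util.zip.{GZIPInputStream, ZipException}
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path

// List the files on the driver, then check each one in a parallel task.
sc.parallelize(fs.listStatus(...).map(_.getPath.toString)).foreach { pathStr =>
  val path = new Path(pathStr)
  // FileSystem is not serializable, so obtain one inside the task.
  val taskFs = path.getFileSystem(new Configuration())
  try {
    // The constructor reads the GZIP header and can already throw ZipException.
    val in = new GZIPInputStream(taskFs.open(path))
    try in.read() finally in.close()
  } catch {
    case e: ZipException => taskFs.delete(path, false)
  }
}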
```

I'm sure that's not 100% right but you see the idea. 
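
The validity check at the core of that idea can be tried locally without Spark or HDFS. The sketch below uses only `java.util.zip`; the object and method names (`GzipCheck`, `isValidGzip`, `gzip`) are made up for illustration, and unlike the one-byte read above it drains the whole stream so that truncation is detected as well.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, IOException}
import java.util.zip.{GZIPInputStream, GZIPOutputStream}

object GzipCheck {
  // Returns true if `bytes` decompresses fully as GZIP, false if the data
  // is not GZIP, is corrupt, or is truncated.
  def isValidGzip(bytes: Array[Byte]): Boolean =
    try {
      val in = new GZIPInputStream(new ByteArrayInputStream(bytes))
      try {
        val buf = new Array[Byte](4096)
        while (in.read(buf) != -1) {} // drain the stream; problems surface as exceptions
        true
      } finally in.close()
    } catch {
      // ZipException (bad header or data) and EOFException (truncation)
      // are both subclasses of IOException.
      case _: IOException => false
    }

  // Helper used only to produce sample input for the check.
  def gzip(text: String): Array[Byte] = {
    val out = new ByteArrayOutputStream()
    val gz = new GZIPOutputStream(out)
    gz.write(text.getBytes("UTF-8"))
    gz.close()
    out.toByteArray
  }
}
```

Reading to the end costs a full decompression per file, which is the trade-off against the cheaper header-only check.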

I am not proposing that this become a Spark API. It seems like an 
application-specific piece of logic that can be written using Spark. I don't 
claim Scala + Spark + Hadoop is easy, but it is directly doable with these 
tools.

I think the point stands that this change does not help solve the problem 
directly, as the above does. It ignores the problem, which is sometimes a fine 
strategy, but at the cost of significant side effects. The side effects are the 
non-starter, to me. But the upside is I think there is a direct solution too.

Well I've said enough so it's time to let others weigh in too.





[GitHub] spark pull request: Update start-slave.sh

2015-03-30 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/5262





[GitHub] spark pull request: [SPARK-6607][SQL] Aggregation attribute name i...

2015-03-30 Thread liancheng
Github user liancheng commented on the pull request:

https://github.com/apache/spark/pull/5263#issuecomment-87696048
  
This is a good point. Actually all these characters ` ,;{}()\n\t=` (note 
there is a space character at the beginning) can be problematic if they appear 
in field names, according to [`MessageTypeParser`] [1].

However, personally I think simply replacing these characters with 
legitimate ones like brackets might be confusing. On the other hand, similar 
problems can be worked around easily by assigning an alias. So how about this:

1. Check all field names for invalid characters in `convertFromAttributes`
2. Throw an error when any invalid character is found
3. In the error message, suggest that the user explicitly add an alias to the 
field
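
A rough sketch of those three steps in plain Scala follows. The helper name `checkFieldNames`, the exception type, and the message wording are assumptions, not what `convertFromAttributes` actually does.

```scala
object ParquetFieldNames {
  // The characters listed above as problematic in Parquet field names.
  val InvalidChars: Set[Char] = " ,;{}()\n\t=".toSet

  // Hypothetical helper: fail fast on the first field name that contains
  // an invalid character, and point the user at the alias workaround.
  def checkFieldNames(names: Seq[String]): Unit =
    names.foreach { name =>
      if (name.exists(InvalidChars.contains)) {
        throw new IllegalArgumentException(
          s"Field name '$name' contains a character that is not allowed in a " +
            "Parquet schema. Please assign an alias to this field.")
      }
    }
}
```

For example, `ParquetFieldNames.checkFieldNames(Seq("key", "MAX(a)"))` would throw because of the parentheses, while aliasing the aggregate to something like `max_a` passes the check.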

[1]: 
https://github.com/apache/incubator-parquet-mr/blob/b8f5d89e0f4347ce54cf680bd7dffc9bc02f876a/parquet-column/src/main/java/parquet/schema/MessageTypeParser.java#L46





[GitHub] spark pull request: [SPARK-6608] [SQL] Makes DataFrame.rdd a lazy ...

2015-03-30 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5265#issuecomment-87691639
  
  [Test build #29399 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29399/consoleFull)
 for   PR 5265 at commit 
[`7f37d21`](https://github.com/apache/spark/commit/7f37d2142a388e5717ae2c3e89152c8c735904cc).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.
 * This patch does not change any dependencies.





[GitHub] spark pull request: [SPARK-6608] [SQL] Makes DataFrame.rdd a lazy ...

2015-03-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/5265#issuecomment-87691658
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29399/
Test PASSed.





[GitHub] spark pull request: Update load-spark-env.sh

2015-03-30 Thread srowen
Github user srowen commented on the pull request:

https://github.com/apache/spark/pull/5261#issuecomment-87628770
  
(You could give this a more specific title than Update load-spark-env.sh)
This is borderline important enough for a JIRA, but I think we might 
consider this a minor add-on fix for SPARK-4924, maybe.

I'm not sure about this. For example `spark-class` sources this script with 
`. $SPARK_HOME/bin/load-spark-env.sh` and `pyspark` does similarly. So these 
have `SPARK_HOME` set.

However `run-example` uses `. $FWDIR/bin/load-spark-env.sh`, and scripts 
in `sbin` use `. $SPARK_PREFIX/bin/load-spark-env.sh`. Clearly they don't 
necessarily expect `SPARK_HOME` to be set.

CC @vanzin since this used to refer to `FWDIR` actually:

https://github.com/apache/spark/commit/517975d89d40a77c7186f488547eed11f79c1e97

The lines you reference don't exist in 1.3.0 though. Are you sure you're 
using 1.3.0?





[GitHub] spark pull request: Update start-slave.sh

2015-03-30 Thread josegom
GitHub user josegom opened a pull request:

https://github.com/apache/spark/pull/5262

Update start-slave.sh

Without this change, the error below occurs when I run sbin/start-all.sh:

localhost: /spark-1.3/sbin/start-slave.sh: line 32: unexpected EOF while 
looking for matching `'
localhost: /spark-1.3/sbin/start-slave.sh: line 33: syntax error: 
unexpected end of file

My operating system is Linux Mint 17.1 Rebecca.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/josegom/spark patch-2

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/5262.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #5262


commit 2c456bd66555646a60529571d313ad392c6bd1f2
Author: Jose Manuel Gomez jmgo...@stratio.com
Date:   2015-03-30T10:32:01Z

Update start-slave.sh

Without this change, the error below occurs when I run sbin/start-all.sh:

localhost: /spark-1.3/sbin/start-slave.sh: line 32: unexpected EOF while 
looking for matching `'
localhost: /spark-1.3/sbin/start-slave.sh: line 33: syntax error: 
unexpected end of file

My operating system is Linux Mint 17.1 Rebecca.







[GitHub] spark pull request: [SPARK-6607][SQL] Aggregation attribute name i...

2015-03-30 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5263#issuecomment-87637278
  
  [Test build #29395 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29395/consoleFull)
 for   PR 5263 at commit 
[`1de001d`](https://github.com/apache/spark/commit/1de001d375d06ec681a2ac4eb3a62f01310af21d).





[GitHub] spark pull request: Update start-slave.sh

2015-03-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/5262#issuecomment-87630374
  
Can one of the admins verify this patch?





[GitHub] spark pull request: Update start-slave.sh

2015-03-30 Thread josegom
Github user josegom closed the pull request at:

https://github.com/apache/spark/pull/5260





[GitHub] spark pull request: [SPARK-6597][Minor] Replace `input:checkbox` w...

2015-03-30 Thread srowen
Github user srowen commented on the pull request:

https://github.com/apache/spark/pull/5254#issuecomment-87632095
  
This sounds fine. Since the tests won't test it, have you had a chance to 
try the affected controls locally to verify they still work as expected? A 
manual test would be good to double-check.





[GitHub] spark pull request: Update start-slave.sh

2015-03-30 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/5262#discussion_r27381022
  
--- Diff: sbin/start-slave.sh ---
@@ -19,7 +19,7 @@
 
 # Starts a slave on the machine this script is executed on.
 
-usage=Usage: start-slave.sh worker# spark-master-URL where 
spark-master-URL is like spark://localhost:7077
+usage=Usage: start-slave.sh worker# spark-master-URL where 
spark-master-URL is like spark://localhost:7077
--- End diff --

Are you going to update this one? It's minor, but I think it's not even worth 
dealing with double quotes in a quoted string.





[GitHub] spark pull request: Update start-slave.sh

2015-03-30 Thread josegom
Github user josegom commented on a diff in the pull request:

https://github.com/apache/spark/pull/5262#discussion_r27383584
  
--- Diff: sbin/start-slave.sh ---
@@ -19,7 +19,7 @@
 
 # Starts a slave on the machine this script is executed on.
 
-usage=Usage: start-slave.sh worker# spark-master-URL where 
spark-master-URL is like spark://localhost:7077
+usage=Usage: start-slave.sh worker# spark-master-URL where 
spark-master-URL is like spark://localhost:7077
--- End diff --

I moved the quotation marks to the correct place.

thanks.







[GitHub] spark pull request: Update start-slave.sh

2015-03-30 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5262#issuecomment-87651382
  
  [Test build #29397 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29397/consoleFull)
 for   PR 5262 at commit 
[`453af8b`](https://github.com/apache/spark/commit/453af8ba57fa65b32469beb969707aec4b713ee2).





[GitHub] spark pull request: Update start-slave.sh

2015-03-30 Thread josegom
Github user josegom commented on a diff in the pull request:

https://github.com/apache/spark/pull/5262#discussion_r27380158
  
--- Diff: sbin/start-slave.sh ---
@@ -19,7 +19,7 @@
 
 # Starts a slave on the machine this script is executed on.
 
-usage=Usage: start-slave.sh worker# spark-master-URL where 
spark-master-URL is like spark://localhost:7077
+usage=Usage: start-slave.sh worker# spark-master-URL where 
spark-master-URL is like spark://localhost:7077
--- End diff --

OK, I have already closed the other PR.

Thanks.





[GitHub] spark pull request: [SPARK-6607][SQL] Aggregation attribute name i...

2015-03-30 Thread viirya
GitHub user viirya opened a pull request:

https://github.com/apache/spark/pull/5263

[SPARK-6607][SQL] Aggregation attribute name including special chars '(' 
and ')' should be replaced before generating Parquet schema

'(' and ')' are special characters used in the Parquet schema for type 
annotation. When we run an aggregation query, we obtain attribute names 
such as MAX(a).

If we directly store the generated DataFrame as a Parquet file, reading and 
parsing the stored schema string fails.

Several methods can be adopted to solve this. This PR uses the simplest one: 
it replaces attribute names before generating the Parquet schema based on these 
attributes.

Another possible method might be modifying all aggregation expression names 
from func(column) to func[column].
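
As an illustration of that second idea only, the renaming amounts to a character substitution. The helper below is hypothetical and is not necessarily the substitution this PR applies.

```scala
object AggregationNames {
  // Hypothetical helper: rewrite func(column) as func[column] so that the
  // name no longer contains the parentheses the Parquet schema parser reserves.
  def toBracketForm(name: String): String =
    name.replace('(', '[').replace(')', ']')
}
```

For instance, `AggregationNames.toBracketForm("MAX(a)")` yields `MAX[a]`, and names without parentheses pass through unchanged.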


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/viirya/spark-1 parquet_aggregation_name

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/5263.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #5263


commit 1de001d375d06ec681a2ac4eb3a62f01310af21d
Author: Liang-Chi Hsieh vii...@gmail.com
Date:   2015-03-30T11:05:26Z

Replace special characters '(' and ')' of Parquet schema.







[GitHub] spark pull request: [SPARK-4226] [SQL] Add Exists support for wher...

2015-03-30 Thread scwf
Github user scwf commented on the pull request:

https://github.com/apache/spark/pull/4812#issuecomment-87654495
  
Hi @chenghao-intel can you rebase this PR?





[GitHub] spark pull request: [SPARK-6598][MLLIB] Python API for IDFModel

2015-03-30 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5264#issuecomment-87660274
  
  [Test build #29398 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29398/consoleFull)
 for   PR 5264 at commit 
[`1dc522c`](https://github.com/apache/spark/commit/1dc522cab1bdfe55f8245c687ba6b866ca07853e).





[GitHub] spark pull request: Update start-slave.sh

2015-03-30 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/5262#discussion_r27379765
  
--- Diff: sbin/start-slave.sh ---
@@ -19,7 +19,7 @@
 
 # Starts a slave on the machine this script is executed on.
 
-usage=Usage: start-slave.sh worker# spark-master-URL where 
spark-master-URL is like spark://localhost:7077
+usage=Usage: start-slave.sh worker# spark-master-URL where 
spark-master-URL is like spark://localhost:7077
--- End diff --

Ah I see, good catch. Actually, I think the quote in `spark` should just be 
removed. You can close your other PR and push an update to this one 
and I'll merge.





[GitHub] spark pull request: [SPARK-6596] fix the instruction on building s...

2015-03-30 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/5253





[GitHub] spark pull request: [SQL] SPARK-6548 : Adding stddev to DataFrame ...

2015-03-30 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5228#issuecomment-87638533
  
  [Test build #29396 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29396/consoleFull)
 for   PR 5228 at commit 
[`41a5768`](https://github.com/apache/spark/commit/41a5768b9eb5154b2f1af38199b3c121770a5367).
 * This patch **fails RAT tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `case class StdDeviation(child: Expression)`
  * `case class StdDeviationFunction(expr: Expression, base: 
AggregateExpression)`

 * This patch does not change any dependencies.





[GitHub] spark pull request: [SQL] SPARK-6548 : Adding stddev to DataFrame ...

2015-03-30 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5228#issuecomment-87638517
  
  [Test build #29396 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29396/consoleFull)
 for   PR 5228 at commit 
[`41a5768`](https://github.com/apache/spark/commit/41a5768b9eb5154b2f1af38199b3c121770a5367).





[GitHub] spark pull request: [SQL] SPARK-6548 : Adding stddev to DataFrame ...

2015-03-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/5228#issuecomment-87638534
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29396/
Test FAILed.





[GitHub] spark pull request: Update start-slave.sh

2015-03-30 Thread srowen
Github user srowen commented on the pull request:

https://github.com/apache/spark/pull/5262#issuecomment-87650270
  
LGTM as a hotfix for SPARK-6552. 





[GitHub] spark pull request: [CORE] [SPARK-6593] Provide option for HadoopR...

2015-03-30 Thread tigerquoll
Github user tigerquoll commented on the pull request:

https://github.com/apache/spark/pull/5250#issuecomment-87650287
  
Hi Sean, 
Thanks for your input - your views have helped me refine my thinking on the 
matter. I believe that if you take a purist's point of view then yes, you can 
say the source of the problem (likely) is with the data producer and should be 
fixed at the data producer's end. 

The point is that this is a problem that is affecting many Spark 
users right now, and many users are not in control of the source system of the 
data they are analysing and are forced to 'make do' with what they have. You 
call this solution a band-aid, and many ETL solutions are band-aids, but 
providing this functionality is useful and serves a purpose for the end-user.

Are you concerned that swallowing an exception could leave the hadoop input 
libraries in an inconsistent state, causing more data corruption?  This will 
not happen because swallowing the exception triggers the immediate finish of 
the file reading task and no more data will be read by the task.
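
To illustrate the behaviour described above, here is a minimal sketch in plain Python rather than the actual HadoopRDD change. The function name `read_lines_permissive` is hypothetical, and gzip-compressed bytes held in memory stand in for a Hadoop input split. The first read error ends the iteration, so nothing further is read from that input:

```python
import gzip
import io
import zlib


def read_lines_permissive(raw_bytes):
    """Yield decoded lines from gzip data, stopping quietly at the first read error.

    Hypothetical helper for illustration only; it is not part of Spark.
    """
    stream = gzip.GzipFile(fileobj=io.BytesIO(raw_bytes))
    while True:
        try:
            line = stream.readline()
        except (OSError, EOFError, zlib.error):
            # Swallow the error and finish: no more data is read from this input.
            return
        if not line:
            # Normal end of stream.
            return
        yield line.decode("utf-8")
```

Note that a caller of such a reader receives a shorter result and no signal that input was dropped.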

Are you concerned that swallowing an exception indicates that something has 
potentially gone wrong earlier in the hadoop input read, and that previous data 
could have been corrupted?  The user already knows this is potentially the case 
because running the application without this option enabled has caused the 
application to terminate in the first place.

The fact that we are being more permissive of potentially corrupt data is a 
show stopper for this being default behaviour - but I'm not proposing this be 
default behaviour, I'm proposing this be a last-ditch option that an advanced 
user can knowingly enable when attempting to deal with corrupted data, with the 
understanding that their data could be made worse, but most likely corrupt data 
will be omitted. 

The alternative is to tell them that their data is not suitable for being 
loaded into spark and perhaps they should use another tool or tell the data 
system owner to fix their data feeds and get back to them with another data set 
some time in the future.  I know which option I would prefer if given the 
choice - don't let perfect be the enemy of good.






[GitHub] spark pull request: Update start-slave.sh

2015-03-30 Thread srowen
Github user srowen commented on the pull request:

https://github.com/apache/spark/pull/5262#issuecomment-87650305
  
ok to test





[GitHub] spark pull request: [CORE] [SPARK-6593] Provide option for HadoopR...

2015-03-30 Thread srowen
Github user srowen commented on the pull request:

https://github.com/apache/spark/pull/5250#issuecomment-87653690
  
I don't know if corrupted gzip files are such a common problem, but I'm not 
sure that would change the logic about where to fix things. It is a problem 
with the preceding ETL process, yes. Something else needs to explicitly check 
and/or fix the input first if this is a problem.

I suppose my point too is that this change does not just address the 
proposed problem with gzip files. It treats any error as recoverable.

It has nothing to do with inconsistent state. It's about presenting a 
successful result that is actually silently missing input, which might not even 
be deterministic. This seems way more problematic than reliably failing fast 
and, yes, making you fix your upstream process.

Hiding behind a flag only goes so far. It's documented (or else how many 
people does it help?). It becomes a code path that has to be supported for a 
long time. It is presented to users as a fine thing to do when I don't believe 
it is. It's not the good being the enemy of the perfect, but the dangerous 
being the enemy of the good.

This is nothing to do with telling people they can't use Spark, or have to 
fix an unfixable upstream process. This is about appropriately dealing with bad 
upstream data in the right place, and this is not how to do it.

Specifically: why not write a process that just opens a stream on each 
input file in turn and tries to read a handful of bytes? If it fails, delete 
the file or do what you like with it. This is maybe 10 lines of code in your 
driver.
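
A rough sketch of such a pre-check, assuming gzip files on a local filesystem (files on HDFS would need a Hadoop filesystem client instead). The names `is_readable_gzip`, `split_inputs` and `probe_bytes` are hypothetical:

```python
import gzip
import zlib


def is_readable_gzip(path, probe_bytes=64):
    """Return True if the first few decompressed bytes of `path` can be read."""
    try:
        with gzip.open(path, "rb") as stream:
            stream.read(probe_bytes)
        return True
    except (OSError, EOFError, zlib.error):
        # Bad header or a stream that fails almost immediately.
        return False


def split_inputs(paths):
    """Partition paths into (readable, unreadable); the files are not modified."""
    good, bad = [], []
    for path in paths:
        (good if is_readable_gzip(path) else bad).append(path)
    return good, bad
```

Only the readable list would then be passed on to the job. A short probe like this catches a bad header or early corruption only; a file that is truncated further in would still pass, so a stricter check would have to read each file to the end.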





[GitHub] spark pull request: [SPARK-6592] fix filter for scaladoc to genera...

2015-03-30 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5252#issuecomment-87706210
  
  [Test build #29404 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29404/consoleFull)
 for   PR 5252 at commit 
[`02098a4`](https://github.com/apache/spark/commit/02098a4667f251e7999c8f9cae3b3fa662513acb).





[GitHub] spark pull request: [SPARK-6595][SQL] MetastoreRelation should be ...

2015-03-30 Thread liancheng
Github user liancheng commented on the pull request:

https://github.com/apache/spark/pull/5251#issuecomment-87704476
  
Merged to master and 1.3, thanks!





[GitHub] spark pull request: [SPARK-6608] [SQL] Makes DataFrame.rdd a lazy ...

2015-03-30 Thread liancheng
Github user liancheng commented on the pull request:

https://github.com/apache/spark/pull/5265#issuecomment-87708936
  
@petro-rudenko Oops, that's a good catch!





[GitHub] spark pull request: [SPARK-6528][ML] Add IDF transformer

2015-03-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/5266#issuecomment-87708916
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29400/
Test PASSed.




