[GitHub] spark pull request: [SPARK-1170] Add histogram method to Python's ...

2014-08-06 Thread nrchandan
Github user nrchandan commented on the pull request:

https://github.com/apache/spark/pull/1783#issuecomment-51296786
  
@davies I'll go through your code today. I think you meant #1791 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-2849] bin/spark-submit should respect s...

2014-08-06 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1770#issuecomment-51296966
  
QA results for PR 1770:
- This patch FAILED unit tests.
- This patch merges cleanly
- This patch adds no public classes

For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17996/consoleFull





[GitHub] spark pull request: SPARK-2294: fix locality inversion bug in Task...

2014-08-06 Thread mateiz
Github user mateiz commented on the pull request:

https://github.com/apache/spark/pull/1313#issuecomment-51296954
  
Alright, I've merged this in. Thanks Nan!





[GitHub] spark pull request: [SPARK-2157] Enable tight firewall rules for S...

2014-08-06 Thread andrewor14
Github user andrewor14 commented on the pull request:

https://github.com/apache/spark/pull/1777#issuecomment-51297351
  
Tests passed according to 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17994/consoleFull.
Reverting the commit "Do not special case spark.ports.maxRetries during tests".

test this please





[GitHub] spark pull request: [SPARK-2861] Fix Doc comment of histogram meth...

2014-08-06 Thread nrchandan
Github user nrchandan commented on the pull request:

https://github.com/apache/spark/pull/1786#issuecomment-51297349
  
@rxin The test that failed is not related to the change. Could you 
resubmit it for testing?





[GitHub] spark pull request: [SPARK-2157] Enable tight firewall rules for S...

2014-08-06 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1777#issuecomment-51297508
  
QA tests have started for PR 1777. This patch merges cleanly.
View progress: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18002/consoleFull





[GitHub] spark pull request: SPARK-2294: fix locality inversion bug in Task...

2014-08-06 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/1313





[GitHub] spark pull request: [SQL] Tighten the visibility of various SQLCon...

2014-08-06 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/1794





[GitHub] spark pull request: [SQL] Fix logging warn - debug

2014-08-06 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/1800





[GitHub] spark pull request: [SPARK-2871] [PySpark] Add missing API

2014-08-06 Thread davies
Github user davies commented on the pull request:

https://github.com/apache/spark/pull/1791#issuecomment-51297756
  
histogram() has been implemented in pure Python. It handles integers 
better and also supports RDDs of strings and other comparable 
objects.

This was inspired by #1783 et, and much improved.





[GitHub] spark pull request: [SPARK-2849] bin/spark-submit should respect s...

2014-08-06 Thread andrewor14
Github user andrewor14 commented on the pull request:

https://github.com/apache/spark/pull/1770#issuecomment-51297776
  
test this please





[GitHub] spark pull request: [SPARK-2608] fix executor backend launch commo...

2014-08-06 Thread pwendell
Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/1513#issuecomment-51297961
  
Jenkins, retest this please.





[GitHub] spark pull request: [SPARK-2849] bin/spark-submit should respect s...

2014-08-06 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1770#issuecomment-51298074
  
QA tests have started for PR 1770. This patch merges cleanly.
View progress: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18004/consoleFull





[GitHub] spark pull request: [SPARK-2874][SQL] Fixed usage messages of all ...

2014-08-06 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1801#issuecomment-51298082
  
QA tests have started for PR 1801. This patch merges cleanly.
View progress: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18003/consoleFull





[GitHub] spark pull request: [SPARK-2515][mllib] Chi Squared test

2014-08-06 Thread mengxr
Github user mengxr commented on the pull request:

https://github.com/apache/spark/pull/1733#issuecomment-51298178
  
@dorx I checked R's implementation and finally figured out what is going on.

1. When only a vector `x` is given, it is treated as a vector containing 
frequency counts for categories and tested against multinomial distribution.
2. When a matrix `x` is given, it is treated as a contingency table and the 
test is for independence. 
3. When both `x` and `y` are given, both vectors are treated as factors 
(categorical values) and the test is for independence.

I want to suggest the following APIs:

~~~
// test observed frequencies against multinomial distribution with
// `p = (1/n, 1/n, ..., 1/n)`
def chiSqTest(counts: Vector)

// test observed frequencies against the given multinomial distribution
def chiSqTest(counts: Vector, p: Vector)

// test independence using the given contingency table 
def chiSqTest(counts: Matrix)

// test independence using the given observed pairs (assuming categorical
// values)
def chiSqTest[V1, V2](observations: RDD[(V1, V2)])
~~~
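
For illustration only, here is a minimal pure-Python sketch of the statistic behind the three cases above: goodness of fit against a uniform or given multinomial distribution, independence on a contingency table, and independence on observed pairs. The function names `chi_sq_goodness_of_fit`, `chi_sq_independence`, and `table_from_pairs` are hypothetical and are not part of MLlib or this PR. It assumes every expected count is non-zero and leaves out the p-value computation.

```python
from collections import Counter


def chi_sq_goodness_of_fit(observed, expected_probs=None):
    """Pearson's chi-squared statistic and degrees of freedom for counts.

    With no `expected_probs`, test against the uniform distribution
    p = (1/n, ..., 1/n).
    """
    n = len(observed)
    total = float(sum(observed))
    if expected_probs is None:
        expected_probs = [1.0 / n] * n
    stat = 0.0
    for o, p in zip(observed, expected_probs):
        e = total * p  # expected count for this category
        stat += (o - e) ** 2 / e
    return stat, n - 1


def chi_sq_independence(table):
    """Pearson's chi-squared statistic for a contingency table (list of rows)."""
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    total = float(sum(row_sums))
    stat = 0.0
    for i, row in enumerate(table):
        for j, o in enumerate(row):
            e = row_sums[i] * col_sums[j] / total  # expected under independence
            stat += (o - e) ** 2 / e
    return stat, (len(row_sums) - 1) * (len(col_sums) - 1)


def table_from_pairs(pairs):
    """Build a contingency table from observed (x, y) categorical pairs."""
    counts = Counter(pairs)
    xs = sorted({x for x, _ in counts})
    ys = sorted({y for _, y in counts})
    return [[counts[(x, y)] for y in ys] for x in xs]
```

The pairs case reduces to the matrix case once the table is built, which is one argument for keeping the `Matrix` overload as the core implementation.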





[GitHub] spark pull request: SPARK-1656: Fix potential resource leaks

2014-08-06 Thread zsxwing
Github user zsxwing commented on the pull request:

https://github.com/apache/spark/pull/577#issuecomment-51298306
  
Resolved the conflicts.





[GitHub] spark pull request: [SPARK-2515][mllib] Chi Squared test

2014-08-06 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/1733#discussion_r15857945
  
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/stat/test/TestResult.scala ---
@@ -0,0 +1,75 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.mllib.stat.test
+
+import org.apache.spark.annotation.Experimental
+
+/**
+ * :: Experimental ::
+ * Trait for hypothesis test results.
+ */
+@Experimental
+trait TestResult {
+
+  def pValue: Double
+
+  def degreesOfFreedom: Array[Long]
--- End diff --

`df` should be an array of doubles, or we can make it a generic type. In 
the t-test and F-test, `df` values are not always integers.
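
As a concrete case, the Welch variant of the two-sample t-test uses the Welch-Satterthwaite approximation, which generally yields a fractional `df`. A small Python sketch, where the helper name `welch_df` is hypothetical and not part of this PR:

```python
def welch_df(var1, n1, var2, n2):
    """Welch-Satterthwaite degrees of freedom for two samples.

    var1, var2 are the sample variances; n1, n2 are the sample sizes.
    """
    a = var1 / n1
    b = var2 / n2
    return (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))


# Unequal variances and sizes give a fractional value (about 22.99 here).
df = welch_df(4.0, 10, 9.0, 15)
```

With equal variances and equal sample sizes the same formula collapses to `n1 + n2 - 2`, so an integer type only covers the special case.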





[GitHub] spark pull request: [SPARK-2608] fix executor backend launch commo...

2014-08-06 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1513#issuecomment-51298388
  
QA tests have started for PR 1513. This patch merges cleanly.
View progress: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18005/consoleFull





[GitHub] spark pull request: SPARK-1656: Fix potential resource leaks

2014-08-06 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/577#issuecomment-51298416
  
QA tests have started for PR 577. This patch merges cleanly.
View progress: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18007/consoleFull





[GitHub] spark pull request: [SPARK-2874][SQL] Fixed usage messages of all ...

2014-08-06 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1801#issuecomment-51298396
  
QA tests have started for PR 1801. This patch merges cleanly.
View progress: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18006/consoleFull





[GitHub] spark pull request: [SPARK-1022][Streaming][HOTFIX] Fixed zookeepe...

2014-08-06 Thread pwendell
Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/1797#issuecomment-51298644
  
LGTM - TD go ahead and merge as this is making the maven build angry :)





[GitHub] spark pull request: [SPARK-2608] fix executor backend launch commo...

2014-08-06 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1513#issuecomment-51298628
  
QA results for PR 1513:
- This patch FAILED unit tests.
- This patch merges cleanly
- This patch adds no public classes

For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18005/consoleFull





[GitHub] spark pull request: [MLlib] Use this.type as return type in k-mean...

2014-08-06 Thread mengxr
Github user mengxr commented on the pull request:

https://github.com/apache/spark/pull/1796#issuecomment-51298668
  
LGTM. Merged into both master and branch-1.1. Thanks!!





[GitHub] spark pull request: [SPARK-2608] fix executor backend launch commo...

2014-08-06 Thread tnachen
Github user tnachen commented on the pull request:

https://github.com/apache/spark/pull/1513#issuecomment-51298744
  
It looks like it's not failing unit tests; it can't even compile.





[GitHub] spark pull request: [SPARK-2805] akka 2.3.4

2014-08-06 Thread avati
Github user avati commented on the pull request:

https://github.com/apache/spark/pull/1685#issuecomment-51298746
  
The latest updated patches depend on published packages built by 
https://github.com/avati/spark-shaded scripts





[GitHub] spark pull request: [SPARK-1170] Add histogram method to Python's ...

2014-08-06 Thread nrchandan
Github user nrchandan commented on the pull request:

https://github.com/apache/spark/pull/1783#issuecomment-51298908
  
@davies  #1791 looks good. Feel free to close this one as duplicate.





[GitHub] spark pull request: SPARK-2787: Make sort-based shuffle write file...

2014-08-06 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1799#issuecomment-51299104
  
QA results for PR 1799:
- This patch FAILED unit tests.
- This patch merges cleanly
- This patch adds no public classes

For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18000/consoleFull





[GitHub] spark pull request: [SPARK-1022][Streaming][HOTFIX] Fixed zookeepe...

2014-08-06 Thread pwendell
Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/1797#issuecomment-51299192
  
Okay @tdas actually I went ahead and merged this into master and 1.1





[GitHub] spark pull request: [SPARK-2875] [PySpark] [SQL] handle null in sc...

2014-08-06 Thread davies
GitHub user davies opened a pull request:

https://github.com/apache/spark/pull/1802

[SPARK-2875] [PySpark] [SQL] handle null in schemaRDD()

Handle null values in schemaRDD when converting them into Python.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/davies/spark json

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/1802.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1802


commit 88e6b1fbea96519bc9eb81ca1cb54ad4c019245f
Author: Davies Liu davies@gmail.com
Date:   2014-08-06T06:42:33Z

handle null in schemaRDD()







[GitHub] spark pull request: [SPARK-2874][SQL] Fixed usage messages of all ...

2014-08-06 Thread pwendell
Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/1801#issuecomment-51299486
  
Thanks cheng. This LGTM pending tests.





[GitHub] spark pull request: [SPARK-2871] [PySpark] Add missing API

2014-08-06 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1791#issuecomment-51299595
  
QA results for PR 1791:
- This patch PASSES unit tests.
- This patch merges cleanly
- This patch adds no public classes

For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18001/consoleFull





[GitHub] spark pull request: [SPARK-2875] [PySpark] [SQL] handle null in sc...

2014-08-06 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1802#issuecomment-51299624
  
QA tests have started for PR 1802. This patch merges cleanly.
View progress: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18008/consoleFull





[GitHub] spark pull request: [SPARK-2871] [PySpark] Add missing API

2014-08-06 Thread nrchandan
Github user nrchandan commented on a diff in the pull request:

https://github.com/apache/spark/pull/1791#discussion_r15858417
  
--- Diff: python/pyspark/rdd.py ---
@@ -854,6 +884,97 @@ def redFunc(left_counter, right_counter):
 
         return self.mapPartitions(lambda i: [StatCounter(i)]).reduce(redFunc)
 
+    def histogram(self, buckets, even=False):
+        """
+        Compute a histogram using the provided buckets. The buckets
+        are all open to the right except for the last which is closed.
+        e.g. [1,10,20,50] means the buckets are [1,10) [10,20) [20,50],
+        which means 1<=x<10, 10<=x<20, 20<=x<=50. And on the input of 1
+        and 50 we would have a histogram of 1,0,1.
+
+        If your histogram is evenly spaced (e.g. [0, 10, 20, 30]),
+        this can be switched from an O(log n) inseration to O(1) per
+        element(where n = # buckets), if you set `even` to True.
+
+        Buckets must be sorted and not contain any duplicates, must be
+        at least two elements.
+
+        If `buckets` is a number, it will generates buckets which is
+        evenly spaced between the minimum and maximum of the RDD. For
+        example, if the min value is 0 and the max is 100, given buckets
+        as 2, the resulting buckets will be [0,50) [50,100]. buckets must
+        be at least 1 If the RDD contains infinity, NaN throws an exception
+        If the elements in RDD do not vary (max == min) always returns
+        a single bucket.
+
+        It will return an tuple of buckets and histogram.
+
+        >>> rdd = sc.parallelize(range(51))
+        >>> rdd.histogram(2)
+        ([0, 25, 50], [25, 26])
+        >>> rdd.histogram([0, 5, 25, 50])
+        ([0, 5, 25, 50], [5, 20, 26])
+        >>> rdd.histogram([0, 15, 30, 45, 60], True)
+        ([0, 15, 30, 45, 60], [15, 15, 15, 6])
+        """
+
+        if isinstance(buckets, (int, long)):
+            if buckets < 1:
+                raise ValueError("buckets should not less than 1")
+
+            # faster than stats()
+            def minmax(it):
+                minv, maxv = float("inf"), float("-inf")
+                for v in it:
+                    minv = min(minv, v)
+                    maxv = max(maxv, v)
+                return [(minv, maxv)]
+
+            def _merge(a, b):
+                return (min(a[0], b[0]), max(a[1], b[1]))
+
+            minv, maxv = self.mapPartitions(minmax).reduce(_merge)
+
+            if minv == maxv or buckets == 1:
+                return [minv, maxv], [self.count()]
+
+            inc = (maxv - minv) / buckets
+            # keep them as integer if possible
+            if inc * buckets != maxv - minv:
--- End diff --

This was smart!
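
For readers following along, a rough local sketch of the bucket logic described in the docstring, written for Python 3 and without Spark. The names `make_buckets` and `local_histogram` are hypothetical and this is not the code in the PR. The boundaries stay integers when the range divides evenly, and lookup is O(log n) via `bisect` or O(1) when `even` is set. Handling of NaN and infinity is left out.

```python
from bisect import bisect_right


def make_buckets(minv, maxv, n):
    """Return n + 1 evenly spaced boundaries between minv and maxv."""
    if isinstance(minv, int) and isinstance(maxv, int) and (maxv - minv) % n == 0:
        inc = (maxv - minv) // n  # keep boundaries as integers if possible
    else:
        inc = (maxv - minv) / float(n)
    bounds = [minv + i * inc for i in range(n)]
    bounds.append(maxv)  # use the exact max to avoid rounding drift
    return bounds


def local_histogram(values, bounds, even=False):
    """Count values per bucket; buckets are [a, b) except the last, [a, b]."""
    counts = [0] * (len(bounds) - 1)
    lo, hi = bounds[0], bounds[-1]
    inc = (hi - lo) / float(len(counts)) if even else None
    for v in values:
        if v < lo or v > hi:
            continue  # out-of-range values are ignored
        if even:
            i = int((v - lo) / inc)  # O(1) lookup for evenly spaced buckets
        else:
            i = bisect_right(bounds, v) - 1  # O(log n) lookup
        counts[min(i, len(counts) - 1)] += 1  # the max lands in the last bucket
    return counts
```

The `bisect` path only needs the elements to be comparable, so it also works for strings. The `even` path needs numeric values.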





[GitHub] spark pull request: [SPARK-1170] Add histogram method to Python's ...

2014-08-06 Thread nrchandan
Github user nrchandan closed the pull request at:

https://github.com/apache/spark/pull/1783





[GitHub] spark pull request: [SPARK-2678][Core][SQL] A workaround for SPARK...

2014-08-06 Thread liancheng
Github user liancheng commented on the pull request:

https://github.com/apache/spark/pull/1801#issuecomment-51299909
  
Hey @chenghao-intel, answers for your questions:

1. Actually `bin/spark-sql` doesn't output lots of logs by default. Did you 
set something like `hive.root.logger=INFO,console` in your `hive-site.xml` or 
elsewhere (similar to what `shark-withinfo` does in Shark)?
2. Hmm... I'm not sure about this. Usually people just copy their 
`hive-site.xml` and `hive-log4j.properties` from their existing Hive 
installation. Maybe we can do it in another PR. One thing to note: 
[`hive-default.xml.template` in Hive 0.12 isn't valid 
XML](https://github.com/apache/hive/blob/release-0.12.0/conf/hive-default.xml.template#L2000),
 so it needs some minor tweaks before being added.
3. I need some more time to investigate this one. We can discuss it offline.





[GitHub] spark pull request: [SPARK-2608] fix executor backend launch commo...

2014-08-06 Thread scwf
Github user scwf commented on the pull request:

https://github.com/apache/spark/pull/1513#issuecomment-51300042
  
It's because [SPARK-2260] updated Command.scala; I will fix it.





[GitHub] spark pull request: SPARK-2787: Make sort-based shuffle write file...

2014-08-06 Thread mateiz
Github user mateiz commented on the pull request:

https://github.com/apache/spark/pull/1799#issuecomment-51300053
  
test this please





[GitHub] spark pull request: SPARK-2787: Make sort-based shuffle write file...

2014-08-06 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1799#issuecomment-51300244
  
QA tests have started for PR 1799. This patch merges cleanly.
View progress:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18009/consoleFull





[GitHub] spark pull request: [HOTFIX][Streaming] Handle port collisions in ...

2014-08-06 Thread andrewor14
GitHub user andrewor14 opened a pull request:

https://github.com/apache/spark/pull/1803

[HOTFIX][Streaming] Handle port collisions in flume polling test

This is failing my tests in #1777. @tdas

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/andrewor14/spark fix-flaky-streaming-test

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/1803.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1803


commit af3ddc9397cecd6eeb350778f8fcb5671891bbe6
Author: Andrew Or andrewo...@gmail.com
Date:   2014-08-06T06:59:11Z

Handle port collisions in flume polling test







[GitHub] spark pull request: [SPARK-2157] Enable tight firewall rules for S...

2014-08-06 Thread andrewor14
Github user andrewor14 commented on the pull request:

https://github.com/apache/spark/pull/1777#issuecomment-51300382
  
test this please





[GitHub] spark pull request: [HOTFIX][Streaming] Handle port collisions in ...

2014-08-06 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1803#issuecomment-51300578
  
QA tests have started for PR 1803. This patch merges cleanly.
View progress:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18010/consoleFull





[GitHub] spark pull request: [SPARK-2157] Enable tight firewall rules for S...

2014-08-06 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1777#issuecomment-51300564
  
QA tests have started for PR 1777. This patch merges cleanly.
View progress:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18011/consoleFull





[GitHub] spark pull request: [SPARK-2157] Enable tight firewall rules for S...

2014-08-06 Thread pwendell
Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/1777#issuecomment-51300754
  
Okay I'm gonna merge this. Just failing a streaming flume test...





[GitHub] spark pull request: [SPARK-2862] Use shorthand range notation to a...

2014-08-06 Thread nrchandan
Github user nrchandan commented on the pull request:

https://github.com/apache/spark/pull/1787#issuecomment-51300728
  
Added Scala bug ID. Fixed the coding convention. Ready to retest. Cc 
@davies @rxin 





[GitHub] spark pull request: [MLlib] DIMSUM: Dimension Independent Matrix S...

2014-08-06 Thread freeman-lab
Github user freeman-lab commented on the pull request:

https://github.com/apache/spark/pull/1778#issuecomment-51300825
  
@srowen agreed, the core vs external library question is important. The 
requirements 
[here](https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark) 
seem reasonable, but there's still a gray area. For example, we have lots of 
analyses that are known / accepted but should, I think, remain external because 
they are for specific data types (images and time series). Re: this particular 
algorithm, it's definitely something we're interested in using, and it sounds 
like others are too.





[GitHub] spark pull request: SPARK-2566. Update ShuffleWriteMetrics increme...

2014-08-06 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1481#issuecomment-51300902
  
QA tests have started for PR 1481. This patch merges cleanly.
View progress:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18012/consoleFull





[GitHub] spark pull request: [MLlib] Use this.type as return type in k-mean...

2014-08-06 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/1796





[GitHub] spark pull request: [SPARK-1022][Streaming][HOTFIX] Fixed zookeepe...

2014-08-06 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/1797





[GitHub] spark pull request: [SPARK-2157] Enable tight firewall rules for S...

2014-08-06 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/1777





[GitHub] spark pull request: [MLlib] DIMSUM: Dimension Independent Matrix S...

2014-08-06 Thread mengxr
Github user mengxr commented on the pull request:

https://github.com/apache/spark/pull/1778#issuecomment-51301688
  
@rezazadeh Do you mind creating a JIRA for this and then adding `[SPARK-]` 
to the title? We also want to learn more about the theory, especially the 
relation between storage/computation complexity and failure rate.

Btw, to me, finding similar rows (observations) is more natural than 
finding similar columns.





[GitHub] spark pull request: [SPARK-2862] Use shorthand range notation to a...

2014-08-06 Thread pwendell
Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/1787#issuecomment-51302231
  
Jenkins, test this please.





[GitHub] spark pull request: [SPARK-2849] bin/spark-submit should respect s...

2014-08-06 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1770#issuecomment-51302212
  
QA results for PR 1770:
- This patch PASSES unit tests.
- This patch merges cleanly
- This patch adds no public classes

For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18004/consoleFull





[GitHub] spark pull request: [SPARK-1022][Streaming] Add Kafka real unit te...

2014-08-06 Thread srowen
Github user srowen commented on the pull request:

https://github.com/apache/spark/pull/1751#issuecomment-51303742
  
@tdas the problem is that the dependency is already there. Spark core uses 
Zookeeper classes directly in `SparkCuratorUtil`. If you remove Curator, the 
code no longer compiles. Now, granted, if you removed Curator, you'd be 
removing this class too. So it's kind of OK to let the transitive dependency 
happen in practice in Spark Core. 

However, it is not pulling in the version declared in `zookeeper.version` 
(necessarily). In fact, that property is not used at all.

It exists I think for the benefit of vendors who are building the whole 
thing for a system that uses a particular zookeeper version, as evidenced by 
its presence in the MapR build. I think the intent is to control the version 
Curator depends on. So I think there is an intent for Core to depend directly 
on ZK for this reason?

I wouldn't agree that letting the transitive dependency happen to cover 
this is a good idea for the Kafka test though. There, it uses Zookeeper 
independently of Curator. It's a test dependency too so doesn't affect the 
non-test artifacts.

It would be more correct/robust to simply express the dependency on 
zookeeper in this case. I'll open a PR that shows what that looks like.





[GitHub] spark pull request: [SPARK-2678][Core][SQL] A workaround for SPARK...

2014-08-06 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1801#issuecomment-51303827
  
QA results for PR 1801:
- This patch PASSES unit tests.
- This patch merges cleanly
- This patch adds no public classes

For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18006/consoleFull





[GitHub] spark pull request: [SPARK-2678][Core][SQL] A workaround for SPARK...

2014-08-06 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1801#issuecomment-51303836
  
QA results for PR 1801:
- This patch PASSES unit tests.
- This patch merges cleanly
- This patch adds no public classes

For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18003/consoleFull





[GitHub] spark pull request: [SPARK-2157] Enable tight firewall rules for S...

2014-08-06 Thread andrewor14
Github user andrewor14 commented on the pull request:

https://github.com/apache/spark/pull/1777#issuecomment-51304438
  
test this please





[GitHub] spark pull request: SPARK-2566. Update ShuffleWriteMetrics increme...

2014-08-06 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1481#issuecomment-51304501
  
QA results for PR 1481:
- This patch FAILED unit tests.
- This patch merges cleanly
- This patch adds no public classes

For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18012/consoleFull





[GitHub] spark pull request: [HOTFIX][Streaming] Handle port collisions in ...

2014-08-06 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1803#issuecomment-51305013
  
QA results for PR 1803:
- This patch PASSES unit tests.
- This patch merges cleanly
- This patch adds no public classes

For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18010/consoleFull





[GitHub] spark pull request: [SPARK-2875] [PySpark] [SQL] handle null in sc...

2014-08-06 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1802#issuecomment-51305553
  
QA results for PR 1802:
- This patch PASSES unit tests.
- This patch merges cleanly
- This patch adds no public classes

For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18008/consoleFull





[GitHub] spark pull request: SPARK-1022 [BUILD] Depend on Zookeeper from Ka...

2014-08-06 Thread srowen
GitHub user srowen opened a pull request:

https://github.com/apache/spark/pull/1804

SPARK-1022 [BUILD] Depend on Zookeeper from Kafka tests

Per discussion at https://github.com/apache/spark/pull/1751 I suggest that 
the more correct thing to do is to depend on Zookeeper from Kafka tests. The 
first commit does this.

The second commit reflects the explicit dependency on ZK in core, in order 
to make `zookeeper.version` do something. Another reasonable change would be to 
simply remove `zookeeper.version` to reflect that otherwise it does nothing.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/srowen/spark SPARK-1022

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/1804.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1804


commit 109f80e23d2f9909afe57b8598f7cef9c0f6c310
Author: Sean Owen sro...@gmail.com
Date:   2014-08-06T08:21:14Z

Directly depend on zookeeper in external/kafka tests, because tests use ZK 
directly

commit 4cbd04d62eb5e93cab9a237aea7209b66b67b4be
Author: Sean Owen sro...@gmail.com
Date:   2014-08-06T08:22:01Z

Depend on ZK directly, with Curator, to pull in version specified by 
zookeeper.version







[GitHub] spark pull request: [SPARK-1022][Streaming] Add Kafka real unit te...

2014-08-06 Thread srowen
Github user srowen commented on the pull request:

https://github.com/apache/spark/pull/1751#issuecomment-51306477
  
Have a look at https://github.com/apache/spark/pull/1804





[GitHub] spark pull request: SPARK-2798 [BUILD] Jenkins build failing due t...

2014-08-06 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1726#issuecomment-51307848
  
QA tests have started for PR 1726. This patch merges cleanly.
View progress:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18015/consoleFull





[GitHub] spark pull request: [SPARK-2179][SQL] Public API for DataTypes and...

2014-08-06 Thread chutium
Github user chutium commented on a diff in the pull request:

https://github.com/apache/spark/pull/1346#discussion_r15862768
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala ---
@@ -89,6 +88,44 @@ class SQLContext(@transient val sparkContext: SparkContext)
 new SchemaRDD(this, SparkLogicalPlan(ExistingRdd.fromProductRdd(rdd))(self))
 
   /**
+   * :: DeveloperApi ::
+   * Creates a [[SchemaRDD]] from an [[RDD]] containing [[Row]]s by applying a schema to this RDD.
+   * It is important to make sure that the structure of every [[Row]] of the provided RDD matches
+   * the provided schema. Otherwise, there will be runtime exception.
+   * Example:
+   * {{{
+   *  import org.apache.spark.sql._
+   *  val sqlContext = new org.apache.spark.sql.SQLContext(sc)
+   *
+   *  val schema =
+   *    StructType(
+   *      StructField("name", StringType, false) ::
+   *      StructField("age", IntegerType, true) :: Nil)
+   *
--- End diff --

Good, I merged the change and used this API, `applySchema(rowRDD, 
appliedSchema)`, in #1612.





[GitHub] spark pull request: SPARK-2798 [BUILD] Jenkins build failing due t...

2014-08-06 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1726#issuecomment-51311768
  
QA results for PR 1726:
- This patch FAILED unit tests.
- This patch merges cleanly
- This patch adds no public classes

For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18015/consoleFull





[GitHub] spark pull request: SPARK-2879 [BUILD] Use HTTPS to access Maven C...

2014-08-06 Thread srowen
GitHub user srowen opened a pull request:

https://github.com/apache/spark/pull/1805

SPARK-2879 [BUILD] Use HTTPS to access Maven Central and other repos

Maven Central has just now enabled HTTPS access for everyone 
(http://central.sonatype.org/articles/2014/Aug/03/https-support-launching-now/). 
This is timely, as a reminder of how easily an attacker can slip malicious code 
into a build that's downloading artifacts over HTTP 
(http://blog.ontoillogical.com/blog/2014/07/28/how-to-take-over-any-java-developer/).

In the meantime, it looks like the Spring repo also now supports HTTPS, so 
it can be used this way too.

I propose to use HTTPS to access these repos.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/srowen/spark SPARK-2879

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/1805.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1805


commit 7043a8e4d1576424068bf307abb315809696c690
Author: Sean Owen sro...@gmail.com
Date:   2014-08-06T09:46:16Z

Use HTTPS for Maven Central libs and plugins; use id 'central' to override 
parent properly; use HTTPS for Spring repo







[GitHub] spark pull request: SPARK-2879 [BUILD] Use HTTPS to access Maven C...

2014-08-06 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1805#issuecomment-51313947
  
QA tests have started for PR 1805. This patch merges cleanly.
View progress:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18016/consoleFull





[GitHub] spark pull request: SPARK-2798 [BUILD] Jenkins build failing due t...

2014-08-06 Thread srowen
Github user srowen commented on the pull request:

https://github.com/apache/spark/pull/1726#issuecomment-51313961
  
Jenkins, retest this please.





[GitHub] spark pull request: SPARK-2798 [BUILD] Correct several small error...

2014-08-06 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1726#issuecomment-51314361
  
QA tests have started for PR 1726. This patch merges cleanly.
View progress:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18017/consoleFull





[GitHub] spark pull request: [SPARK-2817] [SQL] add show create table sup...

2014-08-06 Thread tianyi
Github user tianyi commented on the pull request:

https://github.com/apache/spark/pull/1760#issuecomment-51318897
  
How about adding another rule to the rewritePaths function in TestHive.scala?






[GitHub] spark pull request: SPARK-2798 [BUILD] Correct several small error...

2014-08-06 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1726#issuecomment-51319235
  
QA results for PR 1726:
- This patch PASSES unit tests.
- This patch merges cleanly
- This patch adds no public classes

For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18017/consoleFull





[GitHub] spark pull request: SPARK-1022 [BUILD] Depend on Zookeeper from Ka...

2014-08-06 Thread srowen
Github user srowen commented on the pull request:

https://github.com/apache/spark/pull/1804#issuecomment-51320964
  
Jenkins, test this please.





[GitHub] spark pull request: SPARK-1022 [BUILD] Depend on Zookeeper from Ka...

2014-08-06 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1804#issuecomment-51321089
  
QA tests have started for PR 1804. This patch merges cleanly.
View progress:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18018/consoleFull





[GitHub] spark pull request: SPARK-1022 [BUILD] Depend on Zookeeper from Ka...

2014-08-06 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1804#issuecomment-51321185
  
QA results for PR 1804:
- This patch FAILED unit tests.
- This patch merges cleanly
- This patch adds no public classes

For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18018/consoleFull





[GitHub] spark pull request: SPARK-1022 [BUILD] Depend on Zookeeper from Ka...

2014-08-06 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1804#issuecomment-51325551
  
QA tests have started for PR 1804. This patch merges cleanly.
View progress:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18019/consoleFull





[GitHub] spark pull request: SPARK-1022 [BUILD] Depend on Zookeeper from Ka...

2014-08-06 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1804#issuecomment-51330035
  
QA results for PR 1804:
- This patch PASSES unit tests.
- This patch merges cleanly
- This patch adds no public classes

For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18019/consoleFull





[GitHub] spark pull request: [SPARK-2872] Fix conflict between code and doc...

2014-08-06 Thread tgravescs
Github user tgravescs commented on the pull request:

https://github.com/apache/spark/pull/1684#issuecomment-51332852
  
Jenkins, test this please





[GitHub] spark pull request: [SPARK-2872] Fix conflict between code and doc...

2014-08-06 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1684#issuecomment-51333176
  
QA tests have started for PR 1684. This patch merges cleanly.
View progress:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18020/consoleFull





[GitHub] spark pull request: [SPARK-2715] ExternalAppendOnlyMap adds max li...

2014-08-06 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1618#issuecomment-51334454
  
QA tests have started for PR 1618. This patch merges cleanly.
View progress:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18022/consoleFull





[GitHub] spark pull request: [SPARK-2715] ExternalAppendOnlyMap adds max li...

2014-08-06 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1618#issuecomment-51337161
  
QA tests have started for PR 1618. This patch merges cleanly.
View progress:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18024/consoleFull





[GitHub] spark pull request: [SPARK-2715] ExternalAppendOnlyMap adds max li...

2014-08-06 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1618#issuecomment-51344240
  
QA results for PR 1618:
- This patch PASSES unit tests.
- This patch merges cleanly
- This patch adds no public classes

For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18024/consoleFull





[GitHub] spark pull request: [SPARK-2715] ExternalAppendOnlyMap adds max li...

2014-08-06 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1618#issuecomment-51344815
  
QA results for PR 1618:
- This patch PASSES unit tests.
- This patch merges cleanly
- This patch adds no public classes

For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18022/consoleFull





[GitHub] spark pull request: SPARK-2879 [BUILD] Use HTTPS to access Maven C...

2014-08-06 Thread srowen
Github user srowen commented on the pull request:

https://github.com/apache/spark/pull/1805#issuecomment-51349772
  
Jenkins, retest this please.





[GitHub] spark pull request: [SPARK-2515][mllib] Chi Squared test

2014-08-06 Thread mengxr
Github user mengxr commented on the pull request:

https://github.com/apache/spark/pull/1733#issuecomment-51347255
  
The previous proposal may be hard to implement in Python. Another solution 
would be to separate the goodness-of-fit test from the independence test, 
e.g., `chiSqGofTest` and `chiSqIndTest`.

~~~
def chiSqGofTest(counts: Vector)

def chiSqGofTest(counts: Vector, p: Vector)

def chiSqIndTest(counts: Matrix)

def chiSqIndTest[V1, V2](observations: RDD[(V1, V2)])
~~~

We can also add direct RDD support, which may be unnecessary:

~~~
def chiSqGofTest[V](observations: RDD[V], p: Map[V, Double])
~~~

Since we only support `pearson`, we can hide `method` in the public API for 
now.
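
As a rough illustration of what the goodness-of-fit half of this split would compute, here is a minimal sketch of the Pearson statistic on plain arrays. It is not the MLlib implementation: the object name `ChiSqGofSketch`, the method name `chiSqGofStatistic`, the `Array[Double]` inputs (standing in for `Vector`), and returning only the statistic rather than a full test result are all assumptions made for illustration.

```scala
// Hypothetical sketch, not MLlib code: Pearson goodness-of-fit statistic on plain arrays.
object ChiSqGofSketch {

  /** Sum over categories of (observed - expected)^2 / expected. */
  def chiSqGofStatistic(counts: Array[Double], p: Array[Double]): Double = {
    require(counts.length == p.length, "counts and p must have the same length")
    require(counts.forall(_ >= 0.0), "observed counts must be non-negative")
    require(p.forall(_ > 0.0), "expected proportions must be positive")
    val total = counts.sum
    require(total > 0.0, "at least one observation is required")
    // Normalize p so that callers may pass unnormalized weights.
    val pSum = p.sum
    counts.zip(p).map { case (observed, prob) =>
      val expected = total * prob / pSum
      val diff = observed - expected
      diff * diff / expected
    }.sum
  }

  /** Uniform expected distribution, mirroring the one-argument overload proposed above. */
  def chiSqGofStatistic(counts: Array[Double]): Double =
    chiSqGofStatistic(counts, Array.fill(counts.length)(1.0))
}
```

For counts (10, 20, 30) against a uniform expectation, every expected count is 20, so the statistic is (100 + 0 + 100) / 20 = 10, up to floating-point rounding.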





[GitHub] spark pull request: SPARK-2879 [BUILD] Use HTTPS to access Maven C...

2014-08-06 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1805#issuecomment-51350321
  
QA tests have started for PR 1805. This patch merges cleanly.
View progress:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18025/consoleFull





[GitHub] spark pull request: [SPARK-2627] [PySpark] have the build enforce ...

2014-08-06 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1744#issuecomment-51353319
  
QA tests have started for PR 1744. This patch merges cleanly.
View progress:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18026/consoleFull





[GitHub] spark pull request: [GraphX] initialmessage for pagerank should be...

2014-08-06 Thread luyi0619
Github user luyi0619 commented on the pull request:

https://github.com/apache/spark/pull/1128#issuecomment-51358241
  
sure, thanks for your advice





[GitHub] spark pull request: SPARK-2566. Update ShuffleWriteMetrics increme...

2014-08-06 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1481#issuecomment-51358315
  
QA tests have started for PR 1481. This patch merges cleanly.
View progress:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18027/consoleFull





[GitHub] spark pull request: SPARK-2879 [BUILD] Use HTTPS to access Maven C...

2014-08-06 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1805#issuecomment-51358446
  
QA results for PR 1805:
- This patch PASSES unit tests.
- This patch merges cleanly
- This patch adds no public classes

For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18025/consoleFull





[GitHub] spark pull request: [SPARK-2627] [PySpark] have the build enforce ...

2014-08-06 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1744#issuecomment-51361188
  
QA results for PR 1744:
- This patch PASSES unit tests.
- This patch merges cleanly
- This patch adds no public classes

For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18026/consoleFull





[GitHub] spark pull request: [SPARK-2678][Core][SQL] A workaround for SPARK...

2014-08-06 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1801#issuecomment-51362332
  
QA tests have started for PR 1801. This patch merges cleanly.
View progress:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18028/consoleFull





[GitHub] spark pull request: SPARK-2566. Update ShuffleWriteMetrics increme...

2014-08-06 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1481#issuecomment-51363900
  
QA results for PR 1481:
- This patch FAILED unit tests.
- This patch merges cleanly
- This patch adds no public classes

For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18027/consoleFull





[GitHub] spark pull request: [SPARK-2511][MLLIB] add HashingTF and IDF

2014-08-06 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/1671#discussion_r15887125
  
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/HashingTF.scala ---
@@ -0,0 +1,79 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.mllib.feature
+
+import java.lang.{Iterable => JavaIterable}
+
+import scala.collection.JavaConverters._
+import scala.collection.mutable
+
+import org.apache.spark.annotation.Experimental
+import org.apache.spark.api.java.JavaRDD
+import org.apache.spark.mllib.linalg.{Vector, Vectors}
+import org.apache.spark.rdd.RDD
+import org.apache.spark.util.Utils
+
+/**
+ * :: Experimental ::
+ * Maps a sequence of terms to their term frequencies using the hashing trick.
+ *
+ * @param numFeatures number of features (default: 100)
+ */
+@Experimental
+class HashingTF(val numFeatures: Int) extends Serializable {
+
+  def this() = this(100)
+
--- End diff --

Good point. I will update the default value.





[GitHub] spark pull request: Added support for accessing secured HDFS

2014-08-06 Thread dkanoafry
Github user dkanoafry commented on the pull request:

https://github.com/apache/spark/pull/265#issuecomment-51365398
  
Hi, whatever happened to this PR? I am interested in reading data from 
secure HDFS into Spark running on Mesos...





[GitHub] spark pull request: [SPARK-2627] [PySpark] have the build enforce ...

2014-08-06 Thread nchammas
Github user nchammas commented on the pull request:

https://github.com/apache/spark/pull/1744#issuecomment-51367511
  
By the way, if there is something majorly wrong with this PR (e.g. it's too 
big; I took the wrong approach; etc.) I am more than happy to scrap it and 
start over, or otherwise redo large parts of it as required.

One way or the other, I'd like to see this through to the finish.





[GitHub] spark pull request: [SPARK-2627] [PySpark] have the build enforce ...

2014-08-06 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/1744#issuecomment-51367875
  
This looks good to me. What do you think, @JoshRosen / @davies ?





[GitHub] spark pull request: [SPARK-2718] [yarn] Handle quotes and other ch...

2014-08-06 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/1724#discussion_r15889526
  
--- Diff: yarn/common/src/main/scala/org/apache/spark/deploy/yarn/YarnSparkHadoopUtil.scala ---
@@ -148,4 +148,29 @@ object YarnSparkHadoopUtil {
 }
   }
 
+  /**
+   * Escapes a string for inclusion in a command line executed by Yarn. Yarn executes commands
+   * using `bash -c "command arg1 arg2"` and that means plain quoting doesn't really work. The
+   * argument is enclosed in single quotes and some key characters are escaped.
+   *
+   * @param arg A single argument.
+   * @return Argument quoted for execution via Yarn's generated shell script.
+   */
+  def escapeForShell(arg: String): String = {
+    if (arg != null) {
+      val escaped = new StringBuilder("'")
+      for (i <- 0 to arg.length() - 1) {
+        arg.charAt(i) match {
+          case '$' => escaped.append("\\$")
+          case '"' => escaped.append("\\\"")
+          case '\'' => escaped.append("'\\\"'\\\"'")
--- End diff --

Yeah, this is the tricky one. Escaping single quotes inside a single-quoted 
string does not work.

So what you have to do is close the previous string (remember the whole 
thing is wrapped in single quotes), and start a new string, delimited by double 
quotes, with a single single quote in it. So basically, for a string containing 
a single single quote, you're concatenating three different strings:

- An empty string at the start: ''
- The single quote, wrapped in double quotes: "'"
- An empty string at the end: ''

And since all this is already inside double quotes in the bash script 
itself, you need to also escape the double quotes. Fun.
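
The three-string concatenation described above can be sketched in isolation. This is a simplified illustration, not the code in the diff: the object name `ShellQuoteSketch` and the method `quoteForShell` are hypothetical, and the sketch targets a plain shell command line, so it deliberately leaves out the extra backslash-escaping of double quotes (and of `$`) that Yarn's outer `bash -c "..."` wrapper requires.

```scala
// Hypothetical sketch: single-quote an argument for a plain POSIX shell.
// It does NOT add the extra escaping needed inside Yarn's outer double quotes.
object ShellQuoteSketch {

  def quoteForShell(arg: String): String = {
    val quoted = new StringBuilder("'")   // open the single-quoted string
    arg.foreach {
      // Close the current string, emit a double-quoted single quote, reopen: '"'"'
      case '\'' => quoted.append("'\"'\"'")
      case c    => quoted.append(c)
    }
    quoted.append("'").toString           // close the single-quoted string
  }
}
```

For the input `it's`, this yields `'it'"'"'s'`, which a shell reads as the three adjacent strings `'it'`, `"'"` and `'s'` and concatenates back into `it's`.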





[GitHub] spark pull request: [SPARK-2877] [SQL] MetastoreRelation should us...

2014-08-06 Thread yhuai
GitHub user yhuai opened a pull request:

https://github.com/apache/spark/pull/1806

[SPARK-2877] [SQL] MetastoreRelation should use SparkClassLoader when 
creating the tableDesc

JIRA: https://issues.apache.org/jira/browse/SPARK-2877

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/yhuai/spark SPARK-2877

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/1806.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1806


commit 4142bcb38d1e7b219091f0f23b230da9f46e5bb0
Author: Yin Huai h...@cse.ohio-state.edu
Date:   2014-08-06T17:43:55Z

Use Spark's classloader.







[GitHub] spark pull request: [SPARK-2877] [SQL] MetastoreRelation should us...

2014-08-06 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1806#issuecomment-51370857
  
QA tests have started for PR 1806. This patch merges cleanly.
View progress:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18029/consoleFull





[GitHub] spark pull request: [SPARK-2511][MLLIB] add HashingTF and IDF

2014-08-06 Thread MLnick
Github user MLnick commented on the pull request:

https://github.com/apache/spark/pull/1671#issuecomment-51371929
  
Fair enough.

The standard approach for the same token appearing multiple times in a
training example is additive, so it would act the same as the HashingTF
computation for text (and as a one-hot style encoder for categorical
features, and just normally for real features).

To deal with collisions, one typically uses signed features, by either (1)
applying two hashes, one of which determines the sign (cf. VW), or (2) using
just one signed hash function and taking the sign of the hash value as the
sign of the feature (cf. scikit-learn). The idea is that collisions tend to
cancel each other out.
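
Option (2) can be sketched roughly as follows. This is an illustration only, not code from the PR or from scikit-learn: the object name `SignedHashingSketch`, the method `signedHash`, the use of `MurmurHash3.stringHash` from the Scala standard library, and the sparse `Map[Int, Double]` result are all assumptions.

```scala
import scala.collection.mutable
import scala.util.hashing.MurmurHash3

// Hypothetical sketch of signed feature hashing with a single hash function.
object SignedHashingSketch {

  /** Maps a possibly negative hash into the range [0, mod). */
  private def nonNegativeMod(x: Int, mod: Int): Int = {
    val raw = x % mod
    if (raw < 0) raw + mod else raw
  }

  /** Returns a sparse index -> value map; repeated terms accumulate additively. */
  def signedHash(terms: Seq[String], numFeatures: Int): Map[Int, Double] = {
    require(numFeatures > 0, "numFeatures must be positive")
    val acc = mutable.Map.empty[Int, Double].withDefaultValue(0.0)
    terms.foreach { term =>
      val h = MurmurHash3.stringHash(term)
      val index = nonNegativeMod(h, numFeatures)
      // The sign of the hash value decides the sign of the feature.
      val sign = if (h >= 0) 1.0 else -1.0
      acc(index) += sign
    }
    acc.toMap
  }
}
```

A term that occurs twice contributes +2 or -2 to its bucket, while two different terms that collide with opposite signs cancel out, which is the effect described above.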


On Wed, Aug 6, 2014 at 7:01 PM, Xiangrui Meng notificati...@github.com
wrote:

 @MLnick https://github.com/MLnick FeatureHasher might be too general.
 One thing it is not clear from its name is how to resolve conflicts: same
 word appears more than once, or different words having the same hash value.
 We can add FeatureHasher (or called HashingVectorizer) later. I think it
 might be useful to keep HashingTF as a special case.

 —
 Reply to this email directly or view it on GitHub
 https://github.com/apache/spark/pull/1671#issuecomment-51364477.






[GitHub] spark pull request: SPARK-2566. Update ShuffleWriteMetrics increme...

2014-08-06 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1481#issuecomment-51372235
  
QA tests have started for PR 1481. This patch merges cleanly.
View progress:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18030/consoleFull




