[GitHub] spark pull request: [SPARK-1170] Add histogram method to Python's ...
Github user nrchandan commented on the pull request: https://github.com/apache/spark/pull/1783#issuecomment-51296786 @davies I'll go through your code today. I think you meant #1791 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-2849] bin/spark-submit should respect s...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1770#issuecomment-51296966
QA results for PR 1770:
- This patch FAILED unit tests.
- This patch merges cleanly
- This patch adds no public classes

For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17996/consoleFull
[GitHub] spark pull request: SPARK-2294: fix locality inversion bug in Task...
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/1313#issuecomment-51296954 Alright, I've merged this in. Thanks Nan!
[GitHub] spark pull request: [SPARK-2157] Enable tight firewall rules for S...
Github user andrewor14 commented on the pull request: https://github.com/apache/spark/pull/1777#issuecomment-51297351
Tests passed according to https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17994/consoleFull. Reverting the commit "Do not special case spark.ports.maxRetries during tests".
test this please
[GitHub] spark pull request: [SPARK-2861] Fix Doc comment of histogram meth...
Github user nrchandan commented on the pull request: https://github.com/apache/spark/pull/1786#issuecomment-51297349 @rxin The test that failed is not related to the change. Could you resubmit for testing?
[GitHub] spark pull request: [SPARK-2157] Enable tight firewall rules for S...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1777#issuecomment-51297508
QA tests have started for PR 1777. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18002/consoleFull
[GitHub] spark pull request: SPARK-2294: fix locality inversion bug in Task...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/1313
[GitHub] spark pull request: [SQL] Tighten the visibility of various SQLCon...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/1794
[GitHub] spark pull request: [SQL] Fix logging warn - debug
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/1800
[GitHub] spark pull request: [SPARK-2871] [PySpark] Add missing API
Github user davies commented on the pull request: https://github.com/apache/spark/pull/1791#issuecomment-51297756 histogram() has been implemented in pure Python; it handles integers better and also supports RDDs of strings and other comparable objects. It was inspired by #1783 et al. and is much improved.
[GitHub] spark pull request: [SPARK-2849] bin/spark-submit should respect s...
Github user andrewor14 commented on the pull request: https://github.com/apache/spark/pull/1770#issuecomment-51297776 test this please
[GitHub] spark pull request: [SPARK-2608] fix executor backend launch commo...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/1513#issuecomment-51297961 Jenkins, retest this please.
[GitHub] spark pull request: [SPARK-2849] bin/spark-submit should respect s...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1770#issuecomment-51298074
QA tests have started for PR 1770. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18004/consoleFull
[GitHub] spark pull request: [SPARK-2874][SQL] Fixed usage messages of all ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1801#issuecomment-51298082
QA tests have started for PR 1801. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18003/consoleFull
[GitHub] spark pull request: [SPARK-2515][mllib] Chi Squared test
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/1733#issuecomment-51298178
@dorx I checked R's implementation and finally figured out what is going on.

1. When only a vector `x` is given, it is treated as a vector containing frequency counts for categories and tested against multinomial distribution.
2. When a matrix `x` is given, it is treated as a contingency table and the test is for independence.
3. When both `x` and `y` are given, both vectors are treated as factors (categorical values) and the test is for independence.

I want to suggest the following APIs:

~~~
// test observed frequencies against multinomial distribution with
// `p = (1/n, 1/n, ..., 1/n)`
def chiSqTest(counts: Vector)

// test observed frequencies against the given multinomial distribution
def chiSqTest(counts: Vector, p: Vector)

// test independence using the given contingency table
def chiSqTest(counts: Matrix)

// test independence using the given observed pairs (assuming categorical values)
def chiSqTest[V1, V2](observations: RDD[(V1, V2)])
~~~
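For readers following the three cases above, here is a minimal sketch of the statistics involved, written in plain Python rather than against MLlib. The function names (`goodness_of_fit`, `independence`, `independence_from_pairs`) are hypothetical and are not the proposed `chiSqTest` API; the sketch only computes Pearson's statistic and the degrees of freedom, not a p-value, and assumes every expected count is non-zero.

```python
from collections import Counter


def goodness_of_fit(counts, p=None):
    """Pearson chi-squared statistic of observed counts against probabilities p.

    When p is omitted, the uniform distribution (1/n, ..., 1/n) is assumed.
    Returns (statistic, degrees_of_freedom).
    """
    n = len(counts)
    total = float(sum(counts))
    if p is None:
        p = [1.0 / n] * n
    expected = [total * pi for pi in p]
    stat = sum((o - e) ** 2 / e for o, e in zip(counts, expected))
    return stat, n - 1


def independence(table):
    """Pearson chi-squared statistic for a contingency table (list of rows)."""
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    total = float(sum(row_sums))
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_sums[i] * col_sums[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat, (len(row_sums) - 1) * (len(col_sums) - 1)


def independence_from_pairs(pairs):
    """Build a contingency table from (x, y) factor pairs, then test it."""
    pairs = list(pairs)
    xs = sorted({x for x, _ in pairs})
    ys = sorted({y for _, y in pairs})
    freq = Counter(pairs)
    table = [[freq[(x, y)] for y in ys] for x in xs]
    return independence(table)
```

The pair-based variant reduces to the contingency-table variant, which mirrors how case 3 above is described as the same independence test applied to factors.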
[GitHub] spark pull request: SPARK-1656: Fix potential resource leaks
Github user zsxwing commented on the pull request: https://github.com/apache/spark/pull/577#issuecomment-51298306 Resolved the conflicts.
[GitHub] spark pull request: [SPARK-2515][mllib] Chi Squared test
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1733#discussion_r15857945
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/stat/test/TestResult.scala ---
@@ -0,0 +1,75 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.mllib.stat.test
+
+import org.apache.spark.annotation.Experimental
+
+/**
+ * :: Experimental ::
+ * Trait for hypothesis test results.
+ */
+@Experimental
+trait TestResult {
+
+  def pValue: Double
+
+  def degreesOfFreedom: Array[Long]
--- End diff --
`df` should be an array of double or we can make it a generic type. In t-test and f-test, `df` are not integers.
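As an illustration of why an integer type can be too narrow for `df`, the sketch below computes the Welch-Satterthwaite approximation used by the unequal-variance two-sample t-test. Choosing Welch's test is an assumption made here for the example; the comment above does not say which t-test variant is meant, and `welch_degrees_of_freedom` is a hypothetical helper, not part of MLlib.

```python
def welch_degrees_of_freedom(var1, n1, var2, n2):
    """Welch-Satterthwaite approximation of the t-test degrees of freedom.

    var1, var2 are the sample variances; n1, n2 are the sample sizes (> 1).
    """
    a = float(var1) / n1
    b = float(var2) / n2
    return (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))


# Two samples with different variances and sizes give a fractional result.
df = welch_degrees_of_freedom(4.0, 10, 9.0, 15)
```

With these inputs the result falls strictly between 22 and 23, so it could not be stored in an `Array[Long]` without losing information.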
[GitHub] spark pull request: [SPARK-2608] fix executor backend launch commo...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1513#issuecomment-51298388
QA tests have started for PR 1513. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18005/consoleFull
[GitHub] spark pull request: SPARK-1656: Fix potential resource leaks
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/577#issuecomment-51298416
QA tests have started for PR 577. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18007/consoleFull
[GitHub] spark pull request: [SPARK-2874][SQL] Fixed usage messages of all ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1801#issuecomment-51298396
QA tests have started for PR 1801. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18006/consoleFull
[GitHub] spark pull request: [SPARK-1022][Streaming][HOTFIX] Fixed zookeepe...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/1797#issuecomment-51298644 LGTM - TD go ahead and merge as this is making the maven build angry :)
[GitHub] spark pull request: [SPARK-2608] fix executor backend launch commo...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1513#issuecomment-51298628
QA results for PR 1513:
- This patch FAILED unit tests.
- This patch merges cleanly
- This patch adds no public classes

For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18005/consoleFull
[GitHub] spark pull request: [MLlib] Use this.type as return type in k-mean...
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/1796#issuecomment-51298668 LGTM. Merged into both master and branch-1.1. Thanks!!
[GitHub] spark pull request: [SPARK-2608] fix executor backend launch commo...
Github user tnachen commented on the pull request: https://github.com/apache/spark/pull/1513#issuecomment-51298744 It looks like it's not failing unit tests; it can't even compile.
[GitHub] spark pull request: [SPARK-2805] akka 2.3.4
Github user avati commented on the pull request: https://github.com/apache/spark/pull/1685#issuecomment-51298746 The latest updated patches depend on published packages built by https://github.com/avati/spark-shaded scripts
[GitHub] spark pull request: [SPARK-1170] Add histogram method to Python's ...
Github user nrchandan commented on the pull request: https://github.com/apache/spark/pull/1783#issuecomment-51298908 @davies #1791 looks good. Feel free to close this one as duplicate.
[GitHub] spark pull request: SPARK-2787: Make sort-based shuffle write file...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1799#issuecomment-51299104
QA results for PR 1799:
- This patch FAILED unit tests.
- This patch merges cleanly
- This patch adds no public classes

For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18000/consoleFull
[GitHub] spark pull request: [SPARK-1022][Streaming][HOTFIX] Fixed zookeepe...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/1797#issuecomment-51299192 Okay @tdas actually I went ahead and merged this into master and 1.1
[GitHub] spark pull request: [SPARK-2875] [PySpark] [SQL] handle null in sc...
GitHub user davies opened a pull request: https://github.com/apache/spark/pull/1802

[SPARK-2875] [PySpark] [SQL] handle null in schemaRDD()

Handle null values in schemaRDD when converting them to Python.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/davies/spark json

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/1802.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

This closes #1802

commit 88e6b1fbea96519bc9eb81ca1cb54ad4c019245f
Author: Davies Liu <davies@gmail.com>
Date: 2014-08-06T06:42:33Z

handle null in schemaRDD()
[GitHub] spark pull request: [SPARK-2874][SQL] Fixed usage messages of all ...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/1801#issuecomment-51299486 Thanks cheng. This LGTM pending tests.
[GitHub] spark pull request: [SPARK-2871] [PySpark] Add missing API
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1791#issuecomment-51299595
QA results for PR 1791:
- This patch PASSES unit tests.
- This patch merges cleanly
- This patch adds no public classes

For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18001/consoleFull
[GitHub] spark pull request: [SPARK-2875] [PySpark] [SQL] handle null in sc...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1802#issuecomment-51299624
QA tests have started for PR 1802. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18008/consoleFull
[GitHub] spark pull request: [SPARK-2871] [PySpark] Add missing API
Github user nrchandan commented on a diff in the pull request: https://github.com/apache/spark/pull/1791#discussion_r15858417
--- Diff: python/pyspark/rdd.py ---
@@ -854,6 +884,97 @@ def redFunc(left_counter, right_counter):
         return self.mapPartitions(lambda i: [StatCounter(i)]).reduce(redFunc)
 
+    def histogram(self, buckets, even=False):
+        """
+        Compute a histogram using the provided buckets. The buckets
+        are all open to the right except for the last which is closed.
+        e.g. [1,10,20,50] means the buckets are [1,10) [10,20) [20,50],
+        which means 1<=x<10, 10<=x<20, 20<=x<=50. And on the input of 1
+        and 50 we would have a histogram of 1,0,1.
+
+        If your histogram is evenly spaced (e.g. [0, 10, 20, 30]),
+        this can be switched from an O(log n) inseration to O(1) per
+        element(where n = # buckets), if you set `even` to True.
+
+        Buckets must be sorted and not contain any duplicates, must be
+        at least two elements.
+
+        If `buckets` is a number, it will generates buckets which is
+        evenly spaced between the minimum and maximum of the RDD. For
+        example, if the min value is 0 and the max is 100, given buckets
+        as 2, the resulting buckets will be [0,50) [50,100]. buckets must
+        be at least 1 If the RDD contains infinity, NaN throws an exception
+        If the elements in RDD do not vary (max == min) always returns
+        a single bucket.
+
+        It will return an tuple of buckets and histogram.
+
+        >>> rdd = sc.parallelize(range(51))
+        >>> rdd.histogram(2)
+        ([0, 25, 50], [25, 26])
+        >>> rdd.histogram([0, 5, 25, 50])
+        ([0, 5, 25, 50], [5, 20, 26])
+        >>> rdd.histogram([0, 15, 30, 45, 60], True)
+        ([0, 15, 30, 45, 60], [15, 15, 15, 6])
+        """
+
+        if isinstance(buckets, (int, long)):
+            if buckets < 1:
+                raise ValueError("buckets should not less than 1")
+
+            # faster than stats()
+            def minmax(it):
+                minv, maxv = float("inf"), float("-inf")
+                for v in it:
+                    minv = min(minv, v)
+                    maxv = max(maxv, v)
+                return [(minv, maxv)]
+
+            def _merge(a, b):
+                return (min(a[0], b[0]), max(a[1], b[1]))
+
+            minv, maxv = self.mapPartitions(minmax).reduce(_merge)
+
+            if minv == maxv or buckets == 1:
+                return [minv, maxv], [self.count()]
+
+            inc = (maxv - minv) / buckets
+            # keep them as integer if possible
+            if inc * buckets != maxv - minv:
--- End diff --
This was smart!
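The bucket semantics in the docstring above (right-open buckets, closed last bucket, O(log n) versus O(1) lookup) can be sketched locally without Spark. This is a rough illustration under stated assumptions, not the code from the pull request: `local_histogram` is a hypothetical helper that works on a plain iterable, and using `bisect_right` for the binary search is a choice made here for the example.

```python
from bisect import bisect_right


def local_histogram(values, buckets, even=False):
    """Count values into buckets that are right-open except the last."""
    counts = [0] * (len(buckets) - 1)
    minv, maxv = buckets[0], buckets[-1]
    width = buckets[1] - buckets[0]  # only meaningful when even=True
    for v in values:
        if v < minv or v > maxv:
            continue  # outside every bucket
        if even:
            i = int((v - minv) // width)      # O(1) arithmetic lookup
        else:
            i = bisect_right(buckets, v) - 1  # O(log n) binary search
        # v == maxv lands one past the end; fold it into the closed last bucket
        counts[min(i, len(counts) - 1)] += 1
    return counts
```

Run on `range(51)`, this reproduces the counts shown in the docstring examples, for instance `[5, 20, 26]` for buckets `[0, 5, 25, 50]`.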
[GitHub] spark pull request: [SPARK-1170] Add histogram method to Python's ...
Github user nrchandan closed the pull request at: https://github.com/apache/spark/pull/1783
[GitHub] spark pull request: [SPARK-2678][Core][SQL] A workaround for SPARK...
Github user liancheng commented on the pull request: https://github.com/apache/spark/pull/1801#issuecomment-51299909
Hey @chenghao-intel, answers to your questions:

1. Actually `bin/spark-sql` doesn't output lots of logs by default. Did you set something like `hive.root.logger=INFO,console` in your `hive-site.xml` or elsewhere (similar to what `shark-withinfo` does in Shark)?
1. Hmm... I'm not very sure about this. Usually people just copy their `hive-site.xml` and `hive-log4j.properties` from their existing Hive installation. Maybe we can do it in another PR. One thing to note: [`hive-default.xml.template` in Hive 0.12 isn't valid XML](https://github.com/apache/hive/blob/release-0.12.0/conf/hive-default.xml.template#L2000) and needs a minor tweak before being added.
1. Need some more time to investigate this one. We can discuss this offline.
[GitHub] spark pull request: [SPARK-2608] fix executor backend launch commo...
Github user scwf commented on the pull request: https://github.com/apache/spark/pull/1513#issuecomment-51300042 It's because [SPARK-2260] updated Command.scala; I will fix it.
[GitHub] spark pull request: SPARK-2787: Make sort-based shuffle write file...
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/1799#issuecomment-51300053 test this please
[GitHub] spark pull request: SPARK-2787: Make sort-based shuffle write file...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1799#issuecomment-51300244 QA tests have started for PR 1799. This patch merges cleanly. View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18009/consoleFull
[GitHub] spark pull request: [HOTFIX][Streaming] Handle port collisions in ...
GitHub user andrewor14 opened a pull request: https://github.com/apache/spark/pull/1803 [HOTFIX][Streaming] Handle port collisions in flume polling test This is failing my tests in #1777. @tdas You can merge this pull request into a Git repository by running: $ git pull https://github.com/andrewor14/spark fix-flaky-streaming-test Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/1803.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1803 commit af3ddc9397cecd6eeb350778f8fcb5671891bbe6 Author: Andrew Or andrewo...@gmail.com Date: 2014-08-06T06:59:11Z Handle port collisions in flume polling test
[GitHub] spark pull request: [SPARK-2157] Enable tight firewall rules for S...
Github user andrewor14 commented on the pull request: https://github.com/apache/spark/pull/1777#issuecomment-51300382 test this please
[GitHub] spark pull request: [HOTFIX][Streaming] Handle port collisions in ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1803#issuecomment-51300578 QA tests have started for PR 1803. This patch merges cleanly. View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18010/consoleFull
[GitHub] spark pull request: [SPARK-2157] Enable tight firewall rules for S...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1777#issuecomment-51300564 QA tests have started for PR 1777. This patch merges cleanly. View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18011/consoleFull
[GitHub] spark pull request: [SPARK-2157] Enable tight firewall rules for S...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/1777#issuecomment-51300754 Okay I'm gonna merge this. Just failing a streaming flume test...
[GitHub] spark pull request: [SPARK-2862] Use shorthand range notation to a...
Github user nrchandan commented on the pull request: https://github.com/apache/spark/pull/1787#issuecomment-51300728 Added Scala bug ID. Fixed the coding convention. Ready to retest. Cc @davies @rxin
[GitHub] spark pull request: [MLlib] DIMSUM: Dimension Independent Matrix S...
Github user freeman-lab commented on the pull request: https://github.com/apache/spark/pull/1778#issuecomment-51300825 @srowen agreed the core vs external library question is important. The requirements [here](https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark) seem reasonable, but there's still gray area. For example, we have lots of analyses that are known / accepted but should, I think, remain external because they are for specific data types (images, time series). Re: this particular algorithm, it's definitely something we're interested in using, and it sounds like others are too.
[GitHub] spark pull request: SPARK-2566. Update ShuffleWriteMetrics increme...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1481#issuecomment-51300902 QA tests have started for PR 1481. This patch merges cleanly. View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18012/consoleFull
[GitHub] spark pull request: [MLlib] Use this.type as return type in k-mean...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/1796
[GitHub] spark pull request: [SPARK-1022][Streaming][HOTFIX] Fixed zookeepe...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/1797
[GitHub] spark pull request: [SPARK-2157] Enable tight firewall rules for S...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/1777
[GitHub] spark pull request: [MLlib] DIMSUM: Dimension Independent Matrix S...
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/1778#issuecomment-51301688 @rezazadeh Do you mind creating a JIRA for this and then adding `[SPARK-]` to the title? We also want to learn more about the theory, especially the relation between storage/computation complexity and failure rate. Btw, to me, finding similar rows (observations) is more natural than finding similar columns.
[GitHub] spark pull request: [SPARK-2862] Use shorthand range notation to a...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/1787#issuecomment-51302231 Jenkins, test this please.
[GitHub] spark pull request: [SPARK-2849] bin/spark-submit should respect s...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1770#issuecomment-51302212 QA results for PR 1770: - This patch PASSES unit tests. - This patch merges cleanly. - This patch adds no public classes. For more information see test output: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18004/consoleFull
[GitHub] spark pull request: [SPARK-1022][Streaming] Add Kafka real unit te...
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/1751#issuecomment-51303742 @tdas the problem is that the dependency is already there. Spark core uses Zookeeper classes directly in `SparkCuratorUtil`. If you remove Curator, the code no longer compiles. Now, granted, if you removed Curator, you'd be removing this class too. So it's kind of OK to let the transitive dependency happen in practice in Spark Core. However, it is not pulling in the version declared in `zookeeper.version` (necessarily). In fact, that property is not used at all. It exists I think for the benefit of vendors who are building the whole thing for a system that uses a particular zookeeper version, as evidenced by its presence in the MapR build. I think the intent is to control the version Curator depends on. So I think there is an intent for Core to depend directly on ZK for this reason? I wouldn't agree that letting the transitive dependency happen to cover this is a good idea for the Kafka test though. There, it uses Zookeeper independently of Curator. It's a test dependency too so doesn't affect the non-test artifacts. It would be more correct/robust to simply express the dependency on zookeeper in this case. I'll open a PR that shows what that looks like.
[GitHub] spark pull request: [SPARK-2678][Core][SQL] A workaround for SPARK...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1801#issuecomment-51303827 QA results for PR 1801: - This patch PASSES unit tests. - This patch merges cleanly. - This patch adds no public classes. For more information see test output: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18006/consoleFull
[GitHub] spark pull request: [SPARK-2678][Core][SQL] A workaround for SPARK...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1801#issuecomment-51303836 QA results for PR 1801: - This patch PASSES unit tests. - This patch merges cleanly. - This patch adds no public classes. For more information see test output: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18003/consoleFull
[GitHub] spark pull request: [SPARK-2157] Enable tight firewall rules for S...
Github user andrewor14 commented on the pull request: https://github.com/apache/spark/pull/1777#issuecomment-51304438 test this please
[GitHub] spark pull request: SPARK-2566. Update ShuffleWriteMetrics increme...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1481#issuecomment-51304501 QA results for PR 1481: - This patch FAILED unit tests. - This patch merges cleanly. - This patch adds no public classes. For more information see test output: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18012/consoleFull
[GitHub] spark pull request: [HOTFIX][Streaming] Handle port collisions in ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1803#issuecomment-51305013 QA results for PR 1803: - This patch PASSES unit tests. - This patch merges cleanly. - This patch adds no public classes. For more information see test output: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18010/consoleFull
[GitHub] spark pull request: [SPARK-2875] [PySpark] [SQL] handle null in sc...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1802#issuecomment-51305553 QA results for PR 1802: - This patch PASSES unit tests. - This patch merges cleanly. - This patch adds no public classes. For more information see test output: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18008/consoleFull
[GitHub] spark pull request: SPARK-1022 [BUILD] Depend on Zookeeper from Ka...
GitHub user srowen opened a pull request: https://github.com/apache/spark/pull/1804 SPARK-1022 [BUILD] Depend on Zookeeper from Kafka tests Per discussion at https://github.com/apache/spark/pull/1751 I suggest that the more correct thing to do is to depend on Zookeeper from Kafka tests. The first commit does this. The second commit reflects the explicit dependency on ZK in core, in order to make `zookeeper.version` do something. Another reasonable change would be to simply remove `zookeeper.version` to reflect that otherwise it does nothing. You can merge this pull request into a Git repository by running: $ git pull https://github.com/srowen/spark SPARK-1022 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/1804.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1804 commit 109f80e23d2f9909afe57b8598f7cef9c0f6c310 Author: Sean Owen sro...@gmail.com Date: 2014-08-06T08:21:14Z Directly depend on zookeeper in external/kafka tests, because tests use ZK directly commit 4cbd04d62eb5e93cab9a237aea7209b66b67b4be Author: Sean Owen sro...@gmail.com Date: 2014-08-06T08:22:01Z Depend on ZK directly, with Curator, to pull in version specified by zookeeper.version
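A minimal sketch of what the direct test dependency described in the first commit might look like in a Maven POM for the external/kafka module. The `org.apache.zookeeper:zookeeper` coordinates are the standard ZooKeeper artifact; the `${zookeeper.version}` reference and the `test` scope are assumptions drawn from the discussion, not copied from the actual patch.

```xml
<!-- Hypothetical fragment: declare ZooKeeper directly, since the Kafka tests use ZK classes themselves -->
<dependency>
  <groupId>org.apache.zookeeper</groupId>
  <artifactId>zookeeper</artifactId>
  <!-- Assumed: take the version from the zookeeper.version property defined in the parent POM -->
  <version>${zookeeper.version}</version>
  <!-- Assumed: test scope, so non-test artifacts are unaffected -->
  <scope>test</scope>
</dependency>
```

With an explicit declaration like this, the tests would no longer rely on whichever ZooKeeper version Curator happens to bring in transitively.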
[GitHub] spark pull request: [SPARK-1022][Streaming] Add Kafka real unit te...
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/1751#issuecomment-51306477 Have a look at https://github.com/apache/spark/pull/1804
[GitHub] spark pull request: SPARK-2798 [BUILD] Jenkins build failing due t...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1726#issuecomment-51307848 QA tests have started for PR 1726. This patch merges cleanly. View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18015/consoleFull
[GitHub] spark pull request: [SPARK-2179][SQL] Public API for DataTypes and...
Github user chutium commented on a diff in the pull request: https://github.com/apache/spark/pull/1346#discussion_r15862768 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala --- @@ -89,6 +88,44 @@ class SQLContext(@transient val sparkContext: SparkContext) new SchemaRDD(this, SparkLogicalPlan(ExistingRdd.fromProductRdd(rdd))(self)) /** + * :: DeveloperApi :: + * Creates a [[SchemaRDD]] from an [[RDD]] containing [[Row]]s by applying a schema to this RDD. + * It is important to make sure that the structure of every [[Row]] of the provided RDD matches + * the provided schema. Otherwise, there will be runtime exception. + * Example: + * {{{ + * import org.apache.spark.sql._ + * val sqlContext = new org.apache.spark.sql.SQLContext(sc) + * + * val schema = + * StructType( + * StructField("name", StringType, false) :: + * StructField("age", IntegerType, true) :: Nil) + * --- End diff -- Good, I merged the change and used this API ```applySchema(rowRDD, appliedSchema)``` in #1612
[GitHub] spark pull request: SPARK-2798 [BUILD] Jenkins build failing due t...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1726#issuecomment-51311768 QA results for PR 1726: - This patch FAILED unit tests. - This patch merges cleanly. - This patch adds no public classes. For more information see test output: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18015/consoleFull
[GitHub] spark pull request: SPARK-2879 [BUILD] Use HTTPS to access Maven C...
GitHub user srowen opened a pull request: https://github.com/apache/spark/pull/1805 SPARK-2879 [BUILD] Use HTTPS to access Maven Central and other repos Maven Central has just enabled HTTPS access for everyone (http://central.sonatype.org/articles/2014/Aug/03/https-support-launching-now/). This is timely, as a reminder of how easily an attacker can slip malicious code into a build that's downloading artifacts over HTTP (http://blog.ontoillogical.com/blog/2014/07/28/how-to-take-over-any-java-developer/). In the meantime, it looks like the Spring repo also now supports HTTPS, so it can be used this way too. I propose to use HTTPS to access these repos. You can merge this pull request into a Git repository by running: $ git pull https://github.com/srowen/spark SPARK-2879 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/1805.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1805 commit 7043a8e4d1576424068bf307abb315809696c690 Author: Sean Owen sro...@gmail.com Date: 2014-08-06T09:46:16Z Use HTTPS for Maven Central libs and plugins; use id 'central' to override parent properly; use HTTPS for Spring repo
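As a rough sketch, overriding Maven's default repository with an HTTPS URL could look like the fragment below. The `central` id comes from the commit message above; the URL `https://repo1.maven.org/maven2` is Maven Central's standard address and is an assumption about what the patch uses, as are the release and snapshot settings.

```xml
<!-- Hypothetical fragment: reuse the id 'central' so this entry overrides the inherited default -->
<repositories>
  <repository>
    <id>central</id>
    <name>Maven Repository</name>
    <!-- Assumed URL: Maven Central over HTTPS -->
    <url>https://repo1.maven.org/maven2</url>
    <releases>
      <enabled>true</enabled>
    </releases>
    <snapshots>
      <enabled>false</enabled>
    </snapshots>
  </repository>
</repositories>
<!-- Plugins are resolved separately, so the same override is repeated for them -->
<pluginRepositories>
  <pluginRepository>
    <id>central</id>
    <url>https://repo1.maven.org/maven2</url>
    <releases>
      <enabled>true</enabled>
    </releases>
    <snapshots>
      <enabled>false</enabled>
    </snapshots>
  </pluginRepository>
</pluginRepositories>
```

Keeping the id as `central` matters here: a repository declared under a different id would be added alongside the default HTTP entry rather than replacing it.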
[GitHub] spark pull request: SPARK-2879 [BUILD] Use HTTPS to access Maven C...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1805#issuecomment-51313947 QA tests have started for PR 1805. This patch merges cleanly. View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18016/consoleFull
[GitHub] spark pull request: SPARK-2798 [BUILD] Jenkins build failing due t...
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/1726#issuecomment-51313961 Jenkins, retest this please.
[GitHub] spark pull request: SPARK-2798 [BUILD] Correct several small error...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1726#issuecomment-51314361 QA tests have started for PR 1726. This patch merges cleanly. View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18017/consoleFull
[GitHub] spark pull request: [SPARK-2817] [SQL] add show create table sup...
Github user tianyi commented on the pull request: https://github.com/apache/spark/pull/1760#issuecomment-51318897 How about adding another rule in the rewritePaths function in TestHive.scala?
[GitHub] spark pull request: SPARK-2798 [BUILD] Correct several small error...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1726#issuecomment-51319235 QA results for PR 1726: - This patch PASSES unit tests. - This patch merges cleanly. - This patch adds no public classes. For more information see test output: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18017/consoleFull
[GitHub] spark pull request: SPARK-1022 [BUILD] Depend on Zookeeper from Ka...
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/1804#issuecomment-51320964 Jenkins, test this please.
[GitHub] spark pull request: SPARK-1022 [BUILD] Depend on Zookeeper from Ka...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1804#issuecomment-51321089 QA tests have started for PR 1804. This patch merges cleanly. View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18018/consoleFull
[GitHub] spark pull request: SPARK-1022 [BUILD] Depend on Zookeeper from Ka...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1804#issuecomment-51321185
QA results for PR 1804:
- This patch FAILED unit tests.
- This patch merges cleanly
- This patch adds no public classes

For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18018/consoleFull
[GitHub] spark pull request: SPARK-1022 [BUILD] Depend on Zookeeper from Ka...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1804#issuecomment-51325551
QA tests have started for PR 1804. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18019/consoleFull
[GitHub] spark pull request: SPARK-1022 [BUILD] Depend on Zookeeper from Ka...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1804#issuecomment-51330035
QA results for PR 1804:
- This patch PASSES unit tests.
- This patch merges cleanly
- This patch adds no public classes

For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18019/consoleFull
[GitHub] spark pull request: [SPARK-2872] Fix conflict between code and doc...
Github user tgravescs commented on the pull request: https://github.com/apache/spark/pull/1684#issuecomment-51332852 Jenkins, test this please
[GitHub] spark pull request: [SPARK-2872] Fix conflict between code and doc...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1684#issuecomment-51333176
QA tests have started for PR 1684. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18020/consoleFull
[GitHub] spark pull request: [SPARK-2715] ExternalAppendOnlyMap adds max li...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1618#issuecomment-51334454
QA tests have started for PR 1618. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18022/consoleFull
[GitHub] spark pull request: [SPARK-2715] ExternalAppendOnlyMap adds max li...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1618#issuecomment-51337161
QA tests have started for PR 1618. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18024/consoleFull
[GitHub] spark pull request: [SPARK-2715] ExternalAppendOnlyMap adds max li...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1618#issuecomment-51344240
QA results for PR 1618:
- This patch PASSES unit tests.
- This patch merges cleanly
- This patch adds no public classes

For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18024/consoleFull
[GitHub] spark pull request: [SPARK-2715] ExternalAppendOnlyMap adds max li...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1618#issuecomment-51344815
QA results for PR 1618:
- This patch PASSES unit tests.
- This patch merges cleanly
- This patch adds no public classes

For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18022/consoleFull
[GitHub] spark pull request: SPARK-2879 [BUILD] Use HTTPS to access Maven C...
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/1805#issuecomment-51349772 Jenkins, retest this please.
[GitHub] spark pull request: [SPARK-2515][mllib] Chi Squared test
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/1733#issuecomment-51347255

The previous proposal may be hard to implement in Python. Another solution would be to separate the goodness-of-fit test from the independence test, e.g., `chiSqGofTest` and `chiSqIndTest`.

~~~
def chiSqGofTest(counts: Vector)
def chiSqGofTest(counts: Vector, p: Vector)
def chiSqIndTest(counts: Matrix)
def chiSqIndTest[V1, V2](observations: RDD[(V1, V2)])
~~~

We can also add direct RDD support, which may be unnecessary:

~~~
def chiSqGofTest[V](observations: RDD[V], p: Map[V, Double])
~~~

Since we only support `pearson`, we can hide `method` in the public API for now.
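For readers following the API discussion, here is a minimal sketch of what the two proposed goodness-of-fit overloads might compute, assuming Pearson's statistic and plain arrays instead of MLlib's `Vector`. The object and method names (`ChiSqSketch`, `pearsonGof`, `pearsonGofUniform`) are hypothetical and not part of the PR; the no-`p` overload is assumed to mean a uniform expected distribution.

```scala
// Hypothetical sketch, not the PR's implementation.
object ChiSqSketch {
  // Pearson's statistic: sum over categories of (observed - expected)^2 / expected,
  // where expected = total count * expected probability of the category.
  // Assumes every expected probability is strictly positive.
  def pearsonGof(observed: Array[Double], expectedProbs: Array[Double]): Double = {
    require(observed.length == expectedProbs.length, "length mismatch")
    val total = observed.sum
    observed.zip(expectedProbs).map { case (o, p) =>
      val e = total * p
      (o - e) * (o - e) / e
    }.sum
  }

  // Assumed meaning of the overload without `p`: test against a uniform distribution.
  def pearsonGofUniform(observed: Array[Double]): Double =
    pearsonGof(observed, Array.fill(observed.length)(1.0 / observed.length))
}
```

With counts `Array(10.0, 20.0, 30.0)` the uniform variant expects 20 per category, so the statistic is (100 + 0 + 100) / 20 = 10. Turning the statistic into a p-value needs the chi-squared distribution, which is left out here.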
[GitHub] spark pull request: SPARK-2879 [BUILD] Use HTTPS to access Maven C...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1805#issuecomment-51350321
QA tests have started for PR 1805. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18025/consoleFull
[GitHub] spark pull request: [SPARK-2627] [PySpark] have the build enforce ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1744#issuecomment-51353319
QA tests have started for PR 1744. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18026/consoleFull
[GitHub] spark pull request: [GraphX] initialmessage for pagerank should be...
Github user luyi0619 commented on the pull request: https://github.com/apache/spark/pull/1128#issuecomment-51358241 sure, thanks for your advice
[GitHub] spark pull request: SPARK-2566. Update ShuffleWriteMetrics increme...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1481#issuecomment-51358315
QA tests have started for PR 1481. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18027/consoleFull
[GitHub] spark pull request: SPARK-2879 [BUILD] Use HTTPS to access Maven C...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1805#issuecomment-51358446
QA results for PR 1805:
- This patch PASSES unit tests.
- This patch merges cleanly
- This patch adds no public classes

For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18025/consoleFull
[GitHub] spark pull request: [SPARK-2627] [PySpark] have the build enforce ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1744#issuecomment-51361188
QA results for PR 1744:
- This patch PASSES unit tests.
- This patch merges cleanly
- This patch adds no public classes

For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18026/consoleFull
[GitHub] spark pull request: [SPARK-2678][Core][SQL] A workaround for SPARK...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1801#issuecomment-51362332
QA tests have started for PR 1801. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18028/consoleFull
[GitHub] spark pull request: SPARK-2566. Update ShuffleWriteMetrics increme...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1481#issuecomment-51363900
QA results for PR 1481:
- This patch FAILED unit tests.
- This patch merges cleanly
- This patch adds no public classes

For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18027/consoleFull
[GitHub] spark pull request: [SPARK-2511][MLLIB] add HashingTF and IDF
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1671#discussion_r15887125

--- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/HashingTF.scala ---
@@ -0,0 +1,79 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.mllib.feature
+
+import java.lang.{Iterable => JavaIterable}
+
+import scala.collection.JavaConverters._
+import scala.collection.mutable
+
+import org.apache.spark.annotation.Experimental
+import org.apache.spark.api.java.JavaRDD
+import org.apache.spark.mllib.linalg.{Vector, Vectors}
+import org.apache.spark.rdd.RDD
+import org.apache.spark.util.Utils
+
+/**
+ * :: Experimental ::
+ * Maps a sequence of terms to their term frequencies using the hashing trick.
+ *
+ * @param numFeatures number of features (default: 100)
+ */
+@Experimental
+class HashingTF(val numFeatures: Int) extends Serializable {
+
+  def this() = this(100)
+
--- End diff --

Good point. I will update the default value.
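The diff above stops before the transform itself, so here is a rough sketch of the hashing trick it refers to: each term is mapped to an index by hashing modulo `numFeatures`, and occurrences are counted per index. This is an illustration only, assuming `String.hashCode` as the hash and a plain `Map` as the sparse result; `HashingTFSketch` and its methods are hypothetical names, not the PR's code.

```scala
// Hypothetical sketch of the hashing trick; not the code under review.
object HashingTFSketch {
  // Keep the index in [0, mod) even when the hash code is negative.
  def nonNegativeMod(x: Int, mod: Int): Int = {
    val r = x % mod
    if (r < 0) r + mod else r
  }

  // Sparse term frequencies: feature index -> number of terms hashed to it.
  def termFrequencies(terms: Seq[String], numFeatures: Int): Map[Int, Double] =
    terms
      .groupBy(t => nonNegativeMod(t.hashCode, numFeatures))
      .map { case (index, ts) => index -> ts.size.toDouble }
}
```

For example, `termFrequencies(Seq("a", "b", "a"), 10)` gives `Map(7 -> 2.0, 8 -> 1.0)`, since `"a".hashCode` is 97 and `"b".hashCode` is 98. A small `numFeatures` makes collisions between distinct terms more likely, which is why the default value matters.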
[GitHub] spark pull request: Added support for accessing secured HDFS
Github user dkanoafry commented on the pull request: https://github.com/apache/spark/pull/265#issuecomment-51365398 hi, whatever happened to this PR? I am interested in reading data from secure HDFS into spark running on Mesos...
[GitHub] spark pull request: [SPARK-2627] [PySpark] have the build enforce ...
Github user nchammas commented on the pull request: https://github.com/apache/spark/pull/1744#issuecomment-51367511 By the way, if there is something majorly wrong with this PR (e.g. it's too big; I took the wrong approach; etc.) I am more than happy to scrap it and start over, or otherwise redo large parts of it as required. One way or the other, I'd like to see this through to the finish.
[GitHub] spark pull request: [SPARK-2627] [PySpark] have the build enforce ...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/1744#issuecomment-51367875 This looks good to me. What do you think, @JoshRosen / @davies ?
[GitHub] spark pull request: [SPARK-2718] [yarn] Handle quotes and other ch...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/1724#discussion_r15889526

--- Diff: yarn/common/src/main/scala/org/apache/spark/deploy/yarn/YarnSparkHadoopUtil.scala ---
@@ -148,4 +148,29 @@ object YarnSparkHadoopUtil {
     }
   }
 
+  /**
+   * Escapes a string for inclusion in a command line executed by Yarn. Yarn executes commands
+   * using `bash -c "command arg1 arg2"` and that means plain quoting doesn't really work. The
+   * argument is enclosed in single quotes and some key characters are escaped.
+   *
+   * @param arg A single argument.
+   * @return Argument quoted for execution via Yarn's generated shell script.
+   */
+  def escapeForShell(arg: String): String = {
+    if (arg != null) {
+      val escaped = new StringBuilder("'")
+      for (i <- 0 to arg.length() - 1) {
+        arg.charAt(i) match {
+          case '$' => escaped.append("\\$")
+          case '"' => escaped.append("\\\"")
+          case '\'' => escaped.append("'\\\"'\\\"'")
--- End diff --

Yeah, this is the tricky one. Escaping single quotes inside a single-quoted string does not work. So what you have to do is close the previous string (remember the whole thing is wrapped in single quotes), and start a new string, delimited by double quotes, with a single single quote in it.

So basically, for a string containing a single single quote, you're concatenating three different strings:

- An empty string at the start: ''
- The single quote, wrapped in double quotes: "'"
- An empty string at the end: ''

And since all this is already inside double quotes in the bash script itself, you need to also escape the double quotes. Fun.
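The close-quote, double-quoted single quote, reopen-quote technique described above can be sketched in isolation. This is a simplified illustration that assumes the result is pasted directly into a bash command line; it deliberately leaves out the extra escaping of `$` and `"` that the PR adds for the outer `bash -c "..."` layer. `ShellQuoteSketch.quoteForBash` is a hypothetical name.

```scala
// Hypothetical, simplified sketch; omits the outer double-quote layer handled in the PR.
object ShellQuoteSketch {
  // Wrap the argument in single quotes. Each embedded single quote becomes '"'"':
  // close the single-quoted string, emit a double-quoted single quote, reopen.
  def quoteForBash(arg: String): String =
    "'" + arg.replace("'", "'\"'\"'") + "'"
}
```

`quoteForBash("it's")` returns `'it'"'"'s'`, which bash reads back as the single word `it's`. An argument consisting of just one single quote becomes `''"'"''`, matching the three concatenated strings listed in the comment.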
[GitHub] spark pull request: [SPARK-2877] [SQL] MetastoreRelation should us...
GitHub user yhuai opened a pull request: https://github.com/apache/spark/pull/1806

[SPARK-2877] [SQL] MetastoreRelation should use SparkClassLoader when creating the tableDesc

JIRA: https://issues.apache.org/jira/browse/SPARK-2877

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/yhuai/spark SPARK-2877

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/1806.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #1806

commit 4142bcb38d1e7b219091f0f23b230da9f46e5bb0
Author: Yin Huai h...@cse.ohio-state.edu
Date: 2014-08-06T17:43:55Z

    Use Spark's classloader.
[GitHub] spark pull request: [SPARK-2877] [SQL] MetastoreRelation should us...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1806#issuecomment-51370857
QA tests have started for PR 1806. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18029/consoleFull
[GitHub] spark pull request: [SPARK-2511][MLLIB] add HashingTF and IDF
Github user MLnick commented on the pull request: https://github.com/apache/spark/pull/1671#issuecomment-51371929

Fair enough. The standard approach for the same token appearing multiple times in a training example is additive, so it would act the same as the HashingTF computation for text (and as a one-hot style encoder for categorical features, and just normally for real features).

To deal with collisions, one typically takes signed features by either (1) applying two hashes, one of which determines the sign (cf. VW), or (2) using just one signed hash function and taking the sign of the hash value as the sign of the feature (cf. scikit-learn). The idea is that collisions cancel each other out.

On Wed, Aug 6, 2014 at 7:01 PM, Xiangrui Meng notificati...@github.com wrote:

@MLnick https://github.com/MLnick FeatureHasher might be too general. One thing that is not clear from its name is how to resolve conflicts: the same word appearing more than once, or different words having the same hash value. We can add FeatureHasher (or call it HashingVectorizer) later. I think it might be useful to keep HashingTF as a special case.

Reply to this email directly or view it on GitHub: https://github.com/apache/spark/pull/1671#issuecomment-51364477
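Option (2) above can be sketched as follows, assuming `String.hashCode` stands in for the single signed hash function; a real implementation would likely use a stronger hash. `SignedHashingSketch.hashFeatures` is a hypothetical name and not an existing MLlib API.

```scala
// Hypothetical sketch of signed feature hashing with a single hash function.
object SignedHashingSketch {
  def hashFeatures(tokens: Seq[String], numFeatures: Int): Array[Double] = {
    val vec = new Array[Double](numFeatures)
    tokens.foreach { token =>
      val h = token.hashCode
      // The index comes from the hash modulo the vector size, kept non-negative.
      val index = ((h % numFeatures) + numFeatures) % numFeatures
      // The sign of the hash value decides whether the token adds or subtracts.
      val sign = if (h >= 0) 1.0 else -1.0
      // Repeated tokens are additive, as described in the comment above.
      vec(index) += sign
    }
    vec
  }
}
```

Two tokens that collide on the same index but have opposite signs partially cancel instead of inflating the count, which is the intuition behind "collisions cancel each other out". With `Seq("a", "a", "b")` and 10 features, index 7 ends up at 2.0 and index 8 at 1.0, because both hash codes are positive.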
[GitHub] spark pull request: SPARK-2566. Update ShuffleWriteMetrics increme...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1481#issuecomment-51372235
QA tests have started for PR 1481. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18030/consoleFull