[GitHub] spark pull request: [SPARK-12799] Simplify various string output f...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10757#issuecomment-184564464

Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/51348/ Test FAILed.

---
If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA.

---
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-12632][PYSPARK][DOC] PySpark fpm and al...
Github user MLnick commented on the pull request: https://github.com/apache/spark/pull/11186#issuecomment-184564628

@BryanCutler made a quick pass. While we're doing the format change, we may as well make a few little doc clean ups as per my comments.
[GitHub] spark pull request: [SPARK-12799] Simplify various string output f...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10757#issuecomment-184564462

Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-12799] Simplify various string output f...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10757#issuecomment-184564001

**[Test build #51349 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51349/consoleFull)** for PR 10757 at commit [`78f156c`](https://github.com/apache/spark/commit/78f156ce0ff51fbd994e15b14bda982e9fcd0868).
[GitHub] spark pull request: [SPARK-12632][PYSPARK][DOC] PySpark fpm and al...
Github user MLnick commented on a diff in the pull request: https://github.com/apache/spark/pull/11186#discussion_r52977115

--- Diff: python/pyspark/mllib/recommendation.py ---
@@ -234,11 +238,35 @@ def _prepare(cls, ratings):
     def train(cls, ratings, rank, iterations=5, lambda_=0.01, blocks=-1, nonnegative=False, seed=None):
         """
-        Train a matrix factorization model given an RDD of ratings given by users to some products,
-        in the form of (userID, productID, rating) pairs. We approximate the ratings matrix as the
-        product of two lower-rank matrices of a given rank (number of features). To solve for these
-        features, we run a given number of iterations of ALS. This is done using a level of
-        parallelism given by `blocks`.
+        Train a matrix factorization model given an RDD of ratings given by
+        users to some products, in the form of (userID, productID, rating)
+        pairs. We approximate the ratings matrix as the product of two
+        lower-rank matrices of a given rank (number of features). To solve
+        for these features, we run a given number of iterations of ALS. This
+        is done using a level of parallelism given by `blocks`.
+
+        :param ratings:
+          RDD of `Rating` or (userID, productID, rating) tuple.
+        :param rank:
+          Rank of the feature matrices computed (number of features).
+        :param iterations:
+          Number of iterations run for each batch of data.
--- End diff --

This is a little unclear - what is meant by "for each batch of data"? Perhaps this should simply be `Number of ALS iterations to run`?
[GitHub] spark pull request: [SPARK-13334] [ML] ML KMeansModel / BisectingK...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11214#issuecomment-184563668

Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-13334] [ML] ML KMeansModel / BisectingK...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11214#issuecomment-184563669

Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/51346/ Test PASSed.
[GitHub] spark pull request: [SPARK-13334] [ML] ML KMeansModel / BisectingK...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11214#issuecomment-184563558

**[Test build #51346 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51346/consoleFull)** for PR 11214 at commit [`6fb0b4d`](https://github.com/apache/spark/commit/6fb0b4dc8d7d608f9e394fc1cac896cf645dc423).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-12632][PYSPARK][DOC] PySpark fpm and al...
Github user MLnick commented on a diff in the pull request: https://github.com/apache/spark/pull/11186#discussion_r52976995

--- Diff: python/pyspark/mllib/recommendation.py ---
@@ -234,11 +238,35 @@ def _prepare(cls, ratings):
     def train(cls, ratings, rank, iterations=5, lambda_=0.01, blocks=-1, nonnegative=False, seed=None):
         """
-        Train a matrix factorization model given an RDD of ratings given by users to some products,
-        in the form of (userID, productID, rating) pairs. We approximate the ratings matrix as the
-        product of two lower-rank matrices of a given rank (number of features). To solve for these
-        features, we run a given number of iterations of ALS. This is done using a level of
-        parallelism given by `blocks`.
+        Train a matrix factorization model given an RDD of ratings given by
+        users to some products, in the form of (userID, productID, rating)
+        pairs. We approximate the ratings matrix as the product of two
+        lower-rank matrices of a given rank (number of features). To solve
+        for these features, we run a given number of iterations of ALS. This
--- End diff --

I wonder if the last sentence `This is done using ...` is really necessary (it's better explained in the param doc string below)?
[GitHub] spark pull request: [SPARK-12632][PYSPARK][DOC] PySpark fpm and al...
Github user MLnick commented on a diff in the pull request: https://github.com/apache/spark/pull/11186#discussion_r52976896

--- Diff: python/pyspark/mllib/recommendation.py ---
@@ -249,11 +277,39 @@ def train(cls, ratings, rank, iterations=5, lambda_=0.01, blocks=-1, nonnegative
     def trainImplicit(cls, ratings, rank, iterations=5, lambda_=0.01, blocks=-1, alpha=0.01, nonnegative=False, seed=None):
         """
-        Train a matrix factorization model given an RDD of 'implicit preferences' given by users
-        to some products, in the form of (userID, productID, preference) pairs. We approximate the
-        ratings matrix as the product of two lower-rank matrices of a given rank (number of
-        features). To solve for these features, we run a given number of iterations of ALS.
-        This is done using a level of parallelism given by `blocks`.
+        Train a matrix factorization model given an RDD of 'implicit
+        preferences' given by users to some products, in the form of
+        (userID, productID, preference) pairs. We approximate the ratings
--- End diff --

Same comment as above applies.
[GitHub] spark pull request: [SPARK-12583][Mesos] Mesos shuffle service: Do...
Github user bbossy commented on the pull request: https://github.com/apache/spark/pull/11207#issuecomment-184563434

@JoshRosen changed to a more descriptive title and added a more detailed problem description.
[GitHub] spark pull request: [SPARK-12632][PYSPARK][DOC] PySpark fpm and al...
Github user MLnick commented on a diff in the pull request: https://github.com/apache/spark/pull/11186#discussion_r52976861

--- Diff: python/pyspark/mllib/recommendation.py ---
@@ -234,11 +238,35 @@ def _prepare(cls, ratings):
     def train(cls, ratings, rank, iterations=5, lambda_=0.01, blocks=-1, nonnegative=False, seed=None):
         """
-        Train a matrix factorization model given an RDD of ratings given by users to some products,
-        in the form of (userID, productID, rating) pairs. We approximate the ratings matrix as the
-        product of two lower-rank matrices of a given rank (number of features). To solve for these
-        features, we run a given number of iterations of ALS. This is done using a level of
-        parallelism given by `blocks`.
+        Train a matrix factorization model given an RDD of ratings given by
+        users to some products, in the form of (userID, productID, rating)
+        pairs. We approximate the ratings matrix as the product of two
--- End diff --

We refer to `pairs` here but `tuple` below. Perhaps this should be consistent ("tuple", since it's not a pair in fact).
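The pairs-vs-tuple point above matters because each rating record is a triple, not a pair. A minimal pure-Python illustration (using a plain namedtuple as a stand-in for pyspark.mllib's `Rating`, so the snippet runs without Spark):

```python
from collections import namedtuple

# Stand-in mirroring pyspark.mllib.recommendation.Rating: a namedtuple of
# (user, product, rating) -- three fields, so "tuple" is the accurate word.
Rating = namedtuple("Rating", ["user", "product", "rating"])

ratings = [
    Rating(1, 10, 4.0),
    Rating(1, 20, 3.5),
    (2, 10, 5.0),  # a plain (userID, productID, rating) tuple in the same shape
]

# Every record unpacks into exactly three fields, never two.
for r in ratings:
    user, product, rating = r
    assert isinstance(rating, float)
```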
[GitHub] spark pull request: [SPARK-12632][PYSPARK][DOC] PySpark fpm and al...
Github user MLnick commented on a diff in the pull request: https://github.com/apache/spark/pull/11186#discussion_r52976763

--- Diff: python/pyspark/mllib/recommendation.py ---
@@ -165,28 +165,32 @@ def productFeatures(self):
     @since("1.4.0")
     def recommendUsers(self, product, num):
         """
-        Recommends the top "num" number of users for a given product and returns a list
-        of Rating objects sorted by the predicted rating in descending order.
+        Recommends the top "num" number of users for a given product and
+        returns a list of Rating objects sorted by the predicted rating in
+        descending order.
         """
         return list(self.call("recommendUsers", product, num))

     @since("1.4.0")
     def recommendProducts(self, user, num):
         """
-        Recommends the top "num" number of products for a given user and returns a list
-        of Rating objects sorted by the predicted rating in descending order.
+        Recommends the top "num" number of products for a given user and
+        returns a list of Rating objects sorted by the predicted rating in
+        descending order.
         """
         return list(self.call("recommendProducts", user, num))

     def recommendProductsForUsers(self, num):
         """
-        Recommends top "num" products for all users. The number returned may be less than this.
+        Recommends top "num" products for all users. The number returned may be
+        less than this.
         """
         return self.call("wrappedRecommendProductsForUsers", num)

     def recommendUsersForProducts(self, num):
         """
-        Recommends top "num" users for all products. The number returned may be less than this.
+        Recommends top "num" users for all products. The number returned may be
--- End diff --

Same comment applies as above.
[GitHub] spark pull request: [SPARK-12632][PYSPARK][DOC] PySpark fpm and al...
Github user MLnick commented on a diff in the pull request: https://github.com/apache/spark/pull/11186#discussion_r52976749

--- Diff: python/pyspark/mllib/recommendation.py ---
@@ -165,28 +165,32 @@ def productFeatures(self):
     @since("1.4.0")
     def recommendUsers(self, product, num):
         """
-        Recommends the top "num" number of users for a given product and returns a list
-        of Rating objects sorted by the predicted rating in descending order.
+        Recommends the top "num" number of users for a given product and
+        returns a list of Rating objects sorted by the predicted rating in
+        descending order.
         """
         return list(self.call("recommendUsers", product, num))

     @since("1.4.0")
     def recommendProducts(self, user, num):
         """
-        Recommends the top "num" number of products for a given user and returns a list
-        of Rating objects sorted by the predicted rating in descending order.
+        Recommends the top "num" number of products for a given user and
+        returns a list of Rating objects sorted by the predicted rating in
+        descending order.
         """
         return list(self.call("recommendProducts", user, num))

     def recommendProductsForUsers(self, num):
         """
-        Recommends top "num" products for all users. The number returned may be less than this.
+        Recommends top "num" products for all users. The number returned may be
--- End diff --

While we're at this, can we say something like `... the number of recommendations returned per user may be less than this`?
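The "may be less than this" caveat under discussion is ordinary top-k truncation: when a user has fewer candidate products than `num`, the whole sorted list comes back. A small pure-Python sketch of that behavior (hypothetical predicted scores, no Spark involved):

```python
def top_recommendations(predicted, num):
    """Return up to `num` (productID, rating) pairs, highest predicted rating first."""
    return sorted(predicted, key=lambda pr: pr[1], reverse=True)[:num]

predicted = [(10, 3.5), (20, 4.8), (30, 4.1)]  # (productID, predicted rating)
top = top_recommendations(predicted, num=5)
# Only 3 candidates exist, so fewer than num=5 recommendations come back,
# sorted by predicted rating in descending order.
assert top == [(20, 4.8), (30, 4.1), (10, 3.5)]
assert len(top) < 5
```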
[GitHub] spark pull request: [SPARK-12632][PYSPARK][DOC] PySpark fpm and al...
Github user MLnick commented on a diff in the pull request: https://github.com/apache/spark/pull/11186#discussion_r52976638

--- Diff: python/pyspark/mllib/fpm.py ---
@@ -128,17 +131,27 @@ class PrefixSpan(object):
     @since("1.6.0")
     def train(cls, data, minSupport=0.1, maxPatternLength=10, maxLocalProjDBSize=3200):
         """
-        Finds the complete set of frequent sequential patterns in the input sequences of itemsets.
-
-        :param data: The input data set, each element contains a sequnce of itemsets.
-        :param minSupport: the minimal support level of the sequential pattern, any pattern appears
-            more than (minSupport * size-of-the-dataset) times will be output (default: `0.1`)
-        :param maxPatternLength: the maximal length of the sequential pattern, any pattern appears
-            less than maxPatternLength will be output. (default: `10`)
-        :param maxLocalProjDBSize: The maximum number of items (including delimiters used in
-            the internal storage format) allowed in a projected database before local
-            processing. If a projected database exceeds this size, another
-            iteration of distributed prefix growth is run. (default: `3200`)
+        Finds the complete set of frequent sequential patterns in the
+        input sequences of itemsets.
+
+        :param data:
+          The input data set, each element contains a sequence of
+          itemsets.
+        :param minSupport:
+          The minimal support level of the sequential pattern, any
+          pattern appears more than (minSupport * size-of-the-dataset)
+          times will be output.
+          (default: 0.1)
+        :param maxPatternLength:
+          The maximal length of the sequential pattern, any pattern
+          appears less than maxPatternLength will be output.
--- End diff --

Same here.
[GitHub] spark pull request: [SPARK-12632][PYSPARK][DOC] PySpark fpm and al...
Github user MLnick commented on a diff in the pull request: https://github.com/apache/spark/pull/11186#discussion_r52976590

--- Diff: python/pyspark/mllib/fpm.py ---
@@ -128,17 +131,27 @@ class PrefixSpan(object):
     @since("1.6.0")
     def train(cls, data, minSupport=0.1, maxPatternLength=10, maxLocalProjDBSize=3200):
         """
-        Finds the complete set of frequent sequential patterns in the input sequences of itemsets.
-
-        :param data: The input data set, each element contains a sequnce of itemsets.
-        :param minSupport: the minimal support level of the sequential pattern, any pattern appears
-            more than (minSupport * size-of-the-dataset) times will be output (default: `0.1`)
-        :param maxPatternLength: the maximal length of the sequential pattern, any pattern appears
-            less than maxPatternLength will be output. (default: `10`)
-        :param maxLocalProjDBSize: The maximum number of items (including delimiters used in
-            the internal storage format) allowed in a projected database before local
-            processing. If a projected database exceeds this size, another
-            iteration of distributed prefix growth is run. (default: `3200`)
+        Finds the complete set of frequent sequential patterns in the
+        input sequences of itemsets.
+
+        :param data:
+          The input data set, each element contains a sequence of
+          itemsets.
+        :param minSupport:
+          The minimal support level of the sequential pattern, any
+          pattern appears more than (minSupport * size-of-the-dataset)
--- End diff --

Can we change this from `appears` -> `appearing` (or `... pattern that appears ...`)?
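The minSupport wording being reviewed amounts to a simple count threshold: a pattern qualifies when its occurrence count exceeds minSupport * size-of-the-dataset. A hedged pure-Python sketch of just that filter (illustrative counts, not PrefixSpan itself):

```python
def frequent_patterns(pattern_counts, num_sequences, min_support=0.1):
    """Keep patterns appearing in more than min_support * num_sequences sequences."""
    threshold = min_support * num_sequences
    return {p: c for p, c in pattern_counts.items() if c > threshold}

# Hypothetical pattern counts over a dataset of 50 sequences.
counts = {("a",): 30, ("a", "b"): 6, ("c",): 4}
kept = frequent_patterns(counts, num_sequences=50, min_support=0.1)
# threshold = 0.1 * 50 = 5.0, so ("c",) with count 4 is filtered out.
assert set(kept) == {("a",), ("a", "b")}
```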
[GitHub] spark pull request: [SPARK-12379][ML][MLLIB] Copy GBT implementati...
Github user MLnick commented on the pull request: https://github.com/apache/spark/pull/10607#issuecomment-184557820

@sethah I did find the perf-test results very difficult to read. Would it be ok to summarize into a readable table to make it easier to compare the *before* and *after* numbers (for posterity)?
[GitHub] spark pull request: [SPARK-13221] [SQL] Fixing GroupingSets when A...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/11100
[GitHub] spark pull request: [SPARK-12153][SPARK-7617][MLlib]add support of...
Github user MLnick commented on a diff in the pull request: https://github.com/apache/spark/pull/10152#discussion_r52975344

--- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/Word2Vec.scala ---
@@ -289,24 +301,19 @@ class Word2Vec extends Serializable with Logging {
     val expTable = sc.broadcast(createExpTable())
     val bcVocab = sc.broadcast(vocab)
     val bcVocabHash = sc.broadcast(vocabHash)
-
-    val sentences: RDD[Array[Int]] = words.mapPartitions { iter =>
-      new Iterator[Array[Int]] {
-        def hasNext: Boolean = iter.hasNext
-
-        def next(): Array[Int] = {
-          val sentence = ArrayBuilder.make[Int]
-          var sentenceLength = 0
-          while (iter.hasNext && sentenceLength < MAX_SENTENCE_LENGTH) {
-            val word = bcVocabHash.value.get(iter.next())
-            word match {
-              case Some(w) =>
-                sentence += w
-                sentenceLength += 1
-              case None =>
-            }
-          }
-          sentence.result()
+    // each partition is a collection of sentences,
+    // will be translated into arrays of Index integer
+    val sentences: RDD[Array[Int]] = dataset.mapPartitions { sentenceIter =>
+      // Each sentence will map to 0 or more Array[Int]
+      sentenceIter.flatMap { sentence =>
+        // Sentence of words, some of which map to a word index
+        val wordIndexes = sentence.flatMap(bcVocabHash.value.get)
+        if (wordIndexes.nonEmpty) {
--- End diff --

@ygcao you have kept the if statement here, which I believe both @mengxr and @srowen have shown is not necessary.
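The point about the redundant `if` is a general flatMap property: a sentence whose words are all out of vocabulary maps to zero index arrays and so already contributes nothing, without an explicit emptiness guard. A small pure-Python analogue (hypothetical vocab; list comprehensions standing in for Scala's flatMap and chunking for the MAX_SENTENCE_LENGTH split):

```python
vocab = {"spark": 0, "als": 1, "rdd": 2}

def sentence_to_index_chunks(sentence, max_len=2):
    # Keep only in-vocabulary words, as bcVocabHash.value.get does in the Scala.
    idx = [vocab[w] for w in sentence if w in vocab]
    # Split into chunks of at most max_len words. An empty idx yields zero
    # chunks, which is why the extra nonEmpty guard is redundant.
    return [idx[i:i + max_len] for i in range(0, len(idx), max_len)]

sentences = [["spark", "als", "rdd"], ["unknown", "words"]]
chunks = [c for s in sentences for c in sentence_to_index_chunks(s)]
# The all-out-of-vocab sentence vanishes on its own; no guard needed.
assert chunks == [[0, 1], [2]]
```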
[GitHub] spark pull request: [SPARK-13325][SQL] Create a 64-bit hashcode ex...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11209#issuecomment-184556811

**[Test build #51347 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51347/consoleFull)** for PR 11209 at commit [`54c818b`](https://github.com/apache/spark/commit/54c818b4cd66f8108a90d7cf350f8c31b2cd8caa).
[GitHub] spark pull request: [SPARK-13221] [SQL] Fixing GroupingSets when A...
Github user davies commented on the pull request: https://github.com/apache/spark/pull/11100#issuecomment-184556107

LGTM, we keep the `VirtualColumn` to show a better error message. Merging this into master, thanks!
[GitHub] spark pull request: [SPARK-13334] [ML] ML KMeansModel / BisectingK...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11214#issuecomment-184555139

**[Test build #51346 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51346/consoleFull)** for PR 11214 at commit [`6fb0b4d`](https://github.com/apache/spark/commit/6fb0b4dc8d7d608f9e394fc1cac896cf645dc423).
[GitHub] spark pull request: [SPARK-13325][SQL] Create a 64-bit hashcode ex...
Github user hvanhovell commented on the pull request: https://github.com/apache/spark/pull/11209#issuecomment-184554119

@jodersky / @cloud-fan The tests also failed on my machine. It turns out I messed up the initialization order during some cleaning up. This is fixed and the tests should pass now.
[GitHub] spark pull request: [SPARK-12799] Simplify various string output f...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10757#issuecomment-184552644

Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-12799] Simplify various string output f...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10757#issuecomment-184552648

Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/51343/ Test PASSed.
[GitHub] spark pull request: [SPARK-12799] Simplify various string output f...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10757#issuecomment-184552364

**[Test build #51343 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51343/consoleFull)** for PR 10757 at commit [`f0eb991`](https://github.com/apache/spark/commit/f0eb9917f276a2f6f7690b9b48739d0bd2624433).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-13329] [SQL] considering output for sta...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11210#issuecomment-184552013

**[Test build #51345 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51345/consoleFull)** for PR 11210 at commit [`f431fd8`](https://github.com/apache/spark/commit/f431fd87b0a6deb02d0e19f3310cc58eed04fa3a).
[GitHub] spark pull request: [SPARK-13334] [ML] ML KMeansModel / BisectingK...
GitHub user yanboliang opened a pull request: https://github.com/apache/spark/pull/11214

[SPARK-13334] [ML] ML KMeansModel / BisectingKMeansModel / QuantileDiscretizer should be set parent

ML KMeansModel / BisectingKMeansModel / QuantileDiscretizer should be set parent.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/yanboliang/spark spark-13334

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/11214.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #11214

commit 6fb0b4dc8d7d608f9e394fc1cac896cf645dc423
Author: Yanbo Liang
Date: 2016-02-16T06:49:08Z

    ML KMeansModel / BisectingKMeansModel / QuantileDiscretizer should be set parent.
[GitHub] spark pull request: [SPARK-13310] [SQL] Resolve Missing Sorting Co...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11198#issuecomment-184549831 **[Test build #51344 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51344/consoleFull)** for PR 11198 at commit [`07de4bc`](https://github.com/apache/spark/commit/07de4bcaafdad13fa5528ad280781247aa40f63e).
[GitHub] spark pull request: [SPARK-13308] ManagedBuffers passed to OneToOn...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11193#issuecomment-184547728 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/51340/ Test PASSed.
[GitHub] spark pull request: [SPARK-13308] ManagedBuffers passed to OneToOn...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11193#issuecomment-184547724 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-13308] ManagedBuffers passed to OneToOn...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11193#issuecomment-184547086 **[Test build #51340 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51340/consoleFull)** for PR 11193 at commit [`2c00f29`](https://github.com/apache/spark/commit/2c00f29272051b8092b6a8a976392e32eeb5488b).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-13302][PYSPARK][TESTS] Move the temp fi...
Github user holdenk commented on the pull request: https://github.com/apache/spark/pull/11197#issuecomment-184539074 great :)
[GitHub] spark pull request: [SPARK-13012] [Documentation] Replace example ...
Github user devaraj-kavali commented on the pull request: https://github.com/apache/spark/pull/11053#issuecomment-184538336 Thanks for the review @yinxusen. I have configured the code formatter in my IDE and am using it to format the code. I will fix these comments and update.
[GitHub] spark pull request: [SPARK-13136][SQL] Create a dedicated Broadcas...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/11083#discussion_r52972187
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/Broadcast.scala ---
@@ -0,0 +1,85 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql.execution
+
+import scala.concurrent._
+import scala.concurrent.duration._
+
+import org.apache.spark.broadcast
+import org.apache.spark.rdd.RDD
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.expressions.Attribute
+import org.apache.spark.sql.catalyst.plans.physical.BroadcastMode
+import org.apache.spark.util.ThreadUtils
+
+/**
+ * A broadcast collects, transforms and finally broadcasts the result of a transformed SparkPlan.
+ */
+case class Broadcast(
--- End diff --
Do we need to merge this class with Exchange?
[GitHub] spark pull request: [SPARK-13136][SQL] Create a dedicated Broadcas...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/11083#discussion_r52971787
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/Broadcast.scala ---
@@ -0,0 +1,85 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql.execution
+
+import scala.concurrent._
+import scala.concurrent.duration._
+
+import org.apache.spark.broadcast
+import org.apache.spark.rdd.RDD
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.expressions.Attribute
+import org.apache.spark.sql.catalyst.plans.physical.BroadcastMode
+import org.apache.spark.util.ThreadUtils
+
+/**
+ * A broadcast collects, transforms and finally broadcasts the result of a transformed SparkPlan.
+ */
+case class Broadcast(
+    mode: BroadcastMode,
+    child: SparkPlan) extends UnaryNode {
+
+  override def output: Seq[Attribute] = child.output
+
+  val timeout: Duration = {
+    val timeoutValue = sqlContext.conf.broadcastTimeout
+    if (timeoutValue < 0) {
+      Duration.Inf
+    } else {
+      timeoutValue.seconds
+    }
+  }
+
+  @transient
+  private lazy val relationFuture: Future[broadcast.Broadcast[Any]] = {
+    // broadcastFuture is used in "doExecute". Therefore we can get the execution id correctly here.
+    val executionId = sparkContext.getLocalProperty(SQLExecution.EXECUTION_ID_KEY)
+    Future {
+      // This will run in another thread. Set the execution id so that we can connect these jobs
+      // with the correct execution.
+      SQLExecution.withExecutionId(sparkContext, executionId) {
+        // Note that we use .execute().collect() because we don't want to convert data to Scala
+        // types
+        val input: Array[InternalRow] = child.execute().map { row =>
+          row.copy()
+        }.collect()
+
+        // Construct and broadcast the relation.
+        sparkContext.broadcast(mode(input))
+      }
+    }(Broadcast.executionContext)
+  }
+
+  override protected def doPrepare(): Unit = {
+    // Materialize the future.
+    relationFuture
+  }
+
+  override protected def doExecute(): RDD[InternalRow] = {
+    child.execute() // TODO throw an Exception here?
--- End diff --
Throw an UnsupportedOperationException?
[GitHub] spark pull request: [SPARK-13302][PYSPARK][TESTS] Move the temp fi...
Github user yanboliang commented on the pull request: https://github.com/apache/spark/pull/11197#issuecomment-184534787 LGTM
[GitHub] spark pull request: [SPARK-13136][SQL] Create a dedicated Broadcas...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/11083#discussion_r52971719
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/Exchange.scala ---
@@ -395,18 +395,31 @@ private[sql] case class EnsureRequirements(sqlContext: SQLContext) extends Rule[
     assert(requiredChildOrderings.length == children.length)
     // Ensure that the operator's children satisfy their output distribution requirements:
-    children = children.zip(requiredChildDistributions).map { case (child, distribution) =>
-      if (child.outputPartitioning.satisfies(distribution)) {
+    children = children.zip(requiredChildDistributions).map {
+      case (child, distribution) if child.outputPartitioning.satisfies(distribution) =>
         child
-      } else {
+      case (child, BroadcastDistribution(m1)) =>
+        child match {
+          // The child is broadcasting the same variable: keep the child.
+          case Broadcast(m2, _) if m1 == m2 => child
--- End diff --
I also have the same question. If we have a `BroadcastPartitioning`, it seems we can avoid these changes?
[GitHub] spark pull request: [SPARK-13302][PYSPARK][TESTS] Move the temp fi...
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/11197#discussion_r52971579
--- Diff: python/pyspark/ml/clustering.py ---
@@ -310,7 +303,17 @@ def _create_model(self, java_model):
     sqlContext = SQLContext(sc)
     globs['sc'] = sc
     globs['sqlContext'] = sqlContext
-    (failure_count, test_count) = doctest.testmod(globs=globs, optionflags=doctest.ELLIPSIS)
-    sc.stop()
+    import tempfile
+    temp_path = tempfile.mkdtemp()
+    globs['temp_path'] = temp_path
+    try:
+        (failure_count, test_count) = doctest.testmod(globs=globs, optionflags=doctest.ELLIPSIS)
+        sc.stop()
+    finally:
--- End diff --
Sorry for the misunderstanding; I think you are right.
[GitHub] spark pull request: [SPARK-13302][PYSPARK][TESTS] Move the temp fi...
Github user holdenk commented on a diff in the pull request: https://github.com/apache/spark/pull/11197#discussion_r52971131
--- Diff: python/pyspark/ml/clustering.py ---
@@ -310,7 +303,17 @@ def _create_model(self, java_model):
     sqlContext = SQLContext(sc)
     globs['sc'] = sc
     globs['sqlContext'] = sqlContext
-    (failure_count, test_count) = doctest.testmod(globs=globs, optionflags=doctest.ELLIPSIS)
-    sc.stop()
+    import tempfile
+    temp_path = tempfile.mkdtemp()
+    globs['temp_path'] = temp_path
+    try:
+        (failure_count, test_count) = doctest.testmod(globs=globs, optionflags=doctest.ELLIPSIS)
+        sc.stop()
+    finally:
--- End diff --
So finally is still useful even if we don't explicitly catch/handle any exceptions - are you saying the sc.stop and doctest will never throw any exceptions?
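The pattern under discussion above, `try`/`finally` without an `except` clause, guarantees cleanup regardless of whether the doctest run or `sc.stop()` raises. A minimal self-contained sketch of that cleanup idiom (names like `run_doctests_with_temp_dir` are mine, not from the PR):

```python
import doctest
import shutil
import tempfile


def run_doctests_with_temp_dir(module=None):
    """Run a module's doctests with a temp dir that is always cleaned up."""
    temp_path = tempfile.mkdtemp()
    try:
        # Even with no except clause, the finally block runs whether
        # testmod() returns normally or raises an exception.
        failure_count, test_count = doctest.testmod(
            module, globs={"temp_path": temp_path}, optionflags=doctest.ELLIPSIS
        )
        return failure_count, test_count
    finally:
        # Always remove the temp directory, even on failure.
        shutil.rmtree(temp_path, ignore_errors=True)
```

This mirrors holdenk's point: `finally` is not about catching exceptions, only about making the cleanup unconditional.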
[GitHub] spark pull request: Correct SparseVector.parse documentation
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11213#issuecomment-184530007 Can one of the admins verify this patch?
[GitHub] spark pull request: Correct SparseVector.parse documentation
GitHub user mgyucht opened a pull request: https://github.com/apache/spark/pull/11213 Correct SparseVector.parse documentation There's a small typo in the SparseVector.parse docstring (which says that it returns a DenseVector rather than a SparseVector), which seems to be incorrect. You can merge this pull request into a Git repository by running: $ git pull https://github.com/mgyucht/spark fix-sparsevector-docs Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/11213.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #11213 commit 1e73745d5f97161a4084f2a838f1a1144b221aad Author: Miles Yucht Date: 2016-02-16T05:39:29Z Correct SparseVector.parse documentation
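For context on the docstring being fixed: `SparseVector.parse` consumes the string form that `SparseVector.__str__` produces, e.g. `"(4,[1,3],[3.0,4.0])"`, and returns a SparseVector (not a DenseVector). A small stand-in parser, purely illustrative of that string format and not PySpark's actual implementation:

```python
import ast


def parse_sparse_vector(s):
    """Parse a '(size,[indices],[values])' string into (size, indices, values).

    Hypothetical stand-in for illustration; the real
    pyspark.mllib.linalg.SparseVector.parse returns a SparseVector object.
    """
    # The string is a valid Python literal: a tuple of an int and two lists.
    size, indices, values = ast.literal_eval(s.strip())
    return int(size), list(indices), [float(v) for v in values]
```

A round trip on the canonical example: `parse_sparse_vector("(4,[1,3],[3.0,4.0])")` yields the size, index list, and value list of a sparse vector of length 4 with non-zeros at positions 1 and 3.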
[GitHub] spark pull request: [SPARK-12153][SPARK-7617][MLlib]add support of...
Github user ygcao commented on the pull request: https://github.com/apache/spark/pull/10152#issuecomment-184528879 Addressed the 'final' comment, and checked lint and the test cases. Shall we do the merge then? Thanks!
[GitHub] spark pull request: [SPARK-13330][PYSPARK] PYTHONHASHSEED is not p...
Github user zjffdu commented on the pull request: https://github.com/apache/spark/pull/11211#issuecomment-184526617 PYTHONHASHSEED is set in the spark-submit script regardless of the Python version, but it is only set on executors when the Python version is 3.3 or greater. PYTHONHASHSEED was introduced in Python 3.2.3 (https://docs.python.org/3.3/using/cmdline.html). I am not sure of the purpose of disabling hash randomization; I just feel that we can set PYTHONHASHSEED to 0 in all cases, since there seems to be no case where we want hash randomization enabled. It is also fine to set it for Python 2, because the variable only takes effect from 3.2.3 onward.
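The issue behind this discussion: from Python 3.3 on, `str` hashing is salted per interpreter process, so two executors computing `hash(key)` for a partitioner can disagree unless `PYTHONHASHSEED` pins the salt. A minimal sketch demonstrating that a fixed seed makes the hash reproducible across fresh interpreter processes (the helper name is mine):

```python
import os
import subprocess
import sys


def hash_in_subprocess(seed):
    """Compute hash('spark') in a fresh interpreter with PYTHONHASHSEED=seed."""
    result = subprocess.run(
        [sys.executable, "-c", "print(hash('spark'))"],
        # Inherit the environment but pin the hash seed for the child.
        env={**os.environ, "PYTHONHASHSEED": str(seed)},
        capture_output=True, text=True, check=True,
    )
    return int(result.stdout)
```

With the same seed, two separate interpreter runs agree; with `PYTHONHASHSEED=random` (the default behavior), they generally would not, which is exactly why Spark has to propagate the setting to executors.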
[GitHub] spark pull request: [SPARK-12799] Simplify various string output f...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10757#issuecomment-184525577 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-12799] Simplify various string output f...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10757#issuecomment-184525578 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/51337/ Test FAILed.
[GitHub] spark pull request: [SPARK-13295] [ ML, MLlib ] AFTSurvivalRegress...
Github user NarineK commented on the pull request: https://github.com/apache/spark/pull/11179#issuecomment-184524951 Thank you for the review comments, @yanboliang I've added your suggestions. Let me know if you have more comments.
[GitHub] spark pull request: [SPARK-12799] Simplify various string output f...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10757#issuecomment-184525108 **[Test build #51337 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51337/consoleFull)** for PR 10757 at commit [`21c94d2`](https://github.com/apache/spark/commit/21c94d2224609ce3171e62c7cb58ee64cca683e7).
* This patch **fails SparkR unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-13013][Docs] Replace example code in ml...
Github user yinxusen commented on the pull request: https://github.com/apache/spark/pull/6#issuecomment-184524570 @keypointt Please reformat the other Java code files with 2-space indentation, not only the lines that I pointed out. As for re-using the example code: even though the examples are not identical, they are very similar in how they show the usage of those classes. Take `PowerIterationClusteringExample` as an example. Rather than rewriting the previous example code with the code in the markdown file, I prefer to change it as follows:

```scala
def run(params: Params) {
  val conf = new SparkConf()
    .setMaster("local")
    .setAppName(s"PowerIterationClustering with $params")
  val sc = new SparkContext(conf)

  Logger.getRootLogger.setLevel(Level.WARN)

  // $example on$
  val circlesRdd = generateCirclesRdd(sc, params.k, params.numPoints)
  val model = new PowerIterationClustering()
    .setK(params.k)
    .setMaxIterations(params.maxIterations)
    .setInitializationMode("degree")
    .run(circlesRdd)

  val clusters = model.assignments.collect().groupBy(_.cluster).mapValues(_.map(_.id))
  val assignments = clusters.toList.sortBy { case (k, v) => v.length }
  val assignmentsStr = assignments
    .map { case (k, v) =>
      s"$k -> ${v.sorted.mkString("[", ",", "]")}"
    }.mkString(", ")
  val sizesStr = assignments.map {
    _._2.length
  }.sorted.mkString("(", ",", ")")
  println(s"Cluster assignments: $assignmentsStr\ncluster sizes: $sizesStr")
  // $example off$

  sc.stop()
}
```
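The `// $example on$` / `// $example off$` markers in the comment above delimit the region that the docs build pulls out of the example file and embeds into the markdown page. A minimal sketch of that extraction step, assuming a single non-nested marker pair (the function name is mine, not Spark's actual `include_example` plugin):

```python
def extract_example(source, on="$example on$", off="$example off$"):
    """Collect the lines between example-on and example-off marker comments."""
    kept, inside = [], False
    for line in source.splitlines():
        if off in line:
            inside = False          # marker line itself is excluded
        elif inside:
            kept.append(line)       # body lines between the markers
        elif on in line:
            inside = True           # start keeping from the next line
    return "\n".join(kept)
```

Everything outside the markers, such as `SparkConf` setup and `sc.stop()`, stays in the runnable example file but never appears in the rendered documentation, which is why the reviewer asks to keep the boilerplate outside the marked region.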
[GitHub] spark pull request: [SPARK-12799] Simplify various string output f...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10757#issuecomment-184524502 **[Test build #51343 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51343/consoleFull)** for PR 10757 at commit [`f0eb991`](https://github.com/apache/spark/commit/f0eb9917f276a2f6f7690b9b48739d0bd2624433).
[GitHub] spark pull request: [SPARK-12375] [ML] add handleinvalid for vecto...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10466#issuecomment-184524069 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/51342/ Test PASSed.
[GitHub] spark pull request: [SPARK-12375] [ML] add handleinvalid for vecto...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10466#issuecomment-184524066 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-12375] [ML] add handleinvalid for vecto...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10466#issuecomment-184523963 **[Test build #51342 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51342/consoleFull)** for PR 10466 at commit [`6a0efed`](https://github.com/apache/spark/commit/6a0efede2b99a315895b1d3cccb9262ea845476c).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-12799] Simplify various string output f...
Github user liancheng commented on the pull request: https://github.com/apache/spark/pull/10757#issuecomment-184522878 retest this please
[GitHub] spark pull request: [SPARK-13018][Docs] Replace example code in ml...
Github user yinxusen commented on a diff in the pull request: https://github.com/apache/spark/pull/11126#discussion_r52969279
--- Diff: docs/mllib-pmml-model-export.md ---
@@ -45,41 +45,12 @@ The table below outlines the `spark.mllib` models that can be exported to PMML a
 To export a supported `model` (see table above) to PMML, simply call `model.toPMML`.
+As well as exporting the PMML model to a String (`model.toPMML` as in the example above), you can export the PMML model to other formats.
--- End diff --
Let's wrap it next time.
[GitHub] spark pull request: [WIP][SPARK-13332][SQL] Decimal datatype suppo...
Github user adrian-wang commented on a diff in the pull request: https://github.com/apache/spark/pull/11212#discussion_r52968764
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/mathExpressions.scala ---
@@ -523,11 +523,45 @@ case class Atan2(left: Expression, right: Expression)
 case class Pow(left: Expression, right: Expression)
   extends BinaryMathExpression(math.pow, "POWER") {
-  override def genCode(ctx: CodegenContext, ev: ExprCode): String = {
-    defineCodeGen(ctx, ev, (c1, c2) => s"java.lang.Math.pow($c1, $c2)")
-  }
-}
+  override def inputTypes: Seq[AbstractDataType] = Seq(NumericType, NumericType)
+
+  override def dataType: DataType = (left.dataType, right.dataType) match {
+    case (dt: DecimalType, ByteType | ShortType | IntegerType) => dt
+    case _ => DoubleType
+  }
+
+  protected override def nullSafeEval(input1: Any, input2: Any): Any =
+    (left.dataType, right.dataType) match {
+      case (dt: DecimalType, ByteType) =>
+        input1.asInstanceOf[Decimal].pow(input2.asInstanceOf[Byte])
+      case (dt: DecimalType, ShortType) =>
+        input1.asInstanceOf[Decimal].pow(input2.asInstanceOf[Short])
+      case (dt: DecimalType, IntegerType) =>
+        input1.asInstanceOf[Decimal].pow(input2.asInstanceOf[Int])
+      case (dt: DecimalType, FloatType) =>
+        math.pow(input1.asInstanceOf[Decimal].toDouble, input2.asInstanceOf[Float])
+      case (dt: DecimalType, DoubleType) =>
+        math.pow(input1.asInstanceOf[Decimal].toDouble, input2.asInstanceOf[Double])
+      case (dt1: DecimalType, dt2: DecimalType) =>
+        math.pow(input1.asInstanceOf[Decimal].toDouble, input2.asInstanceOf[Decimal].toDouble)
--- End diff --
Shall we cast the result of `math.pow` back to `DecimalType` for these three cases?
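The distinction the patch above draws, exact decimal power for integral exponents versus a double-precision fallback for fractional ones, can be illustrated outside Spark with Python's `decimal` module. This is a sketch of the semantics only, not the Spark implementation (the function name is mine):

```python
from decimal import Decimal


def decimal_pow(base, exponent):
    """Exponentiate a Decimal: exact for int exponents, float fallback otherwise."""
    if isinstance(exponent, int):
        # Decimal ** int stays exact within the context precision,
        # analogous to keeping DecimalType for byte/short/int exponents.
        return base ** exponent
    # Fractional or decimal exponents degrade to double-precision math,
    # analogous to the math.pow branches returning DoubleType.
    return float(base) ** float(exponent)
```

This also motivates the reviewer's question: once `math.pow` has been used, the result is a double, so casting it back to `DecimalType` cannot restore the lost precision, it only changes the declared type.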
[GitHub] spark pull request: [Spark-12260][wip][Streaming]Graceful Shutdown...
Github user zzcclp commented on the pull request: https://github.com/apache/spark/pull/10252#issuecomment-184515856 @chenghao-intel @mwws , sorry for my late reply. Currently, we just record the Kafka offsets and accumulators to a third-party storage system after each batch, and then restore them from the window's earliest start time. For stateful data, we have no good recovery method at the moment, so some statistical data will be lost. One of our business systems must ensure data integrity after a software upgrade or even an application logic update, so we urgently hope that Spark can support this feature natively.
[GitHub] spark pull request: [SPARK-13295] [ ML, MLlib ] AFTSurvivalRegress...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11179#issuecomment-184515193 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/51339/ Test PASSed.
[GitHub] spark pull request: [SPARK-13295] [ ML, MLlib ] AFTSurvivalRegress...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11179#issuecomment-184515192 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-13295] [ ML, MLlib ] AFTSurvivalRegress...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11179#issuecomment-184515101 **[Test build #51339 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51339/consoleFull)** for PR 11179 at commit [`e4707e7`](https://github.com/apache/spark/commit/e4707e775f34c0018f74451d048fb28a9c08ef48). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-13321][SQL] Support nested UNION in par...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/11204#discussion_r52968390

--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/CatalystQlSuite.scala ---
@@ -201,4 +201,68 @@ class CatalystQlSuite extends PlanTest {
     parser.parsePlan("select sum(product + 1) over (partition by (product + (1)) order by 2) " +
       "from windowData")
   }
+
+  test("nesting UNION") {
+    val parsed = parser.parsePlan(
+      """
+        |SELECT `u_1`.`id` FROM (((SELECT `t0`.`id` FROM `default`.`t0`)
+        |UNION ALL (SELECT `t0`.`id` FROM `default`.`t0`)) UNION ALL
+        |(SELECT `t0`.`id` FROM `default`.`t0`)) AS u_1
+      """.stripMargin)
+
+    val expected = Project(
+      UnresolvedAlias(UnresolvedAttribute("u_1.id"), None) :: Nil,
+      Subquery("u_1",
+        Union(
+          Union(
+            Project(
+              UnresolvedAlias(UnresolvedAttribute("t0.id"), None) :: Nil,
+              UnresolvedRelation(TableIdentifier("t0", Some("default")), None)),
+            Project(
+              UnresolvedAlias(UnresolvedAttribute("t0.id"), None) :: Nil,
+              UnresolvedRelation(TableIdentifier("t0", Some("default")), None))),
+          Project(
+            UnresolvedAlias(UnresolvedAttribute("t0.id"), None) :: Nil,
+            UnresolvedRelation(TableIdentifier("t0", Some("default")), None)))))
+
+    comparePlans(parsed, expected)
+
+    val parsedSame = parser.parsePlan(
+      """
+        |SELECT `u_1`.`id` FROM ((SELECT `t0`.`id` FROM `default`.`t0`)
+        |UNION ALL (SELECT `t0`.`id` FROM `default`.`t0`) UNION ALL
+        |(SELECT `t0`.`id` FROM `default`.`t0`)) AS u_1
+      """.stripMargin)
+
+    comparePlans(parsedSame, expected)
+
+    val parsed2 = parser.parsePlan(
--- End diff --

Recursively nested UNION.
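The query shape this test exercises (a UNION ALL chain wrapped in a FROM subquery with an alias) can be sketched against SQLite instead of Spark's Catalyst parser. This is a hedged analogy: SQLite's grammar does not accept parentheses around the individual UNION arms, so they are omitted here; only the nesting inside the aliased subquery is the same idea.

```python
import sqlite3

# Nested-UNION shape analogous to the test above, run on SQLite.
# (SQLite rejects parenthesized UNION arms, so the arms are bare here.)
conn = sqlite3.connect(":memory:")
rows = conn.execute(
    "SELECT u_1.id FROM ("
    "SELECT 1 AS id UNION ALL SELECT 2 UNION ALL SELECT 3"
    ") AS u_1 ORDER BY u_1.id"
).fetchall()
print(rows)  # [(1,), (2,), (3,)]
```

The grammar change under review is about making Spark's parser accept the parenthesized variant of exactly this structure.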
[GitHub] spark pull request: [SPARK-13321][SQL] Support nested UNION in par...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/11204#discussion_r52968367

--- Diff: sql/catalyst/src/main/antlr3/org/apache/spark/sql/catalyst/parser/SparkSqlParser.g ---
@@ -2320,6 +2320,19 @@ regularBody[boolean topLevel]
    )
    |
    selectStatement[topLevel]
+   |
+   (LPAREN selectStatement[true]) => nestedSetOpSelectStatement[topLevel]
    ;
+
+nestedSetOpSelectStatement[boolean topLevel]
+   :
+   (
+   LPAREN s=selectStatement[topLevel] RPAREN -> {$s.tree}
+   )
+   (set=setOpSelectStatement[$nestedSetOpSelectStatement.tree, topLevel])
--- End diff --

I think it might be the simplest approach to support recursively nested UNION queries.
[GitHub] spark pull request: [WIP][SPARK-13332][SQL] Decimal datatype suppo...
Github user adrian-wang commented on a diff in the pull request: https://github.com/apache/spark/pull/11212#discussion_r52968376

--- Diff: sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/codegen/UnsafeRowWriter.java ---
@@ -170,6 +170,7 @@ public void write(int ordinal, double value) {
   }

   public void write(int ordinal, Decimal input, int precision, int scale) {
+    input = input.clone();
--- End diff --

Better to add a comment that explains why we need to clone before writing.
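The clone in the diff above guards against an aliasing hazard: Spark reuses mutable `Decimal` objects, so a writer that stores a bare reference would see already-written values change when the caller reuses the object. A hypothetical Python miniature of the same hazard (none of these names are Spark's):

```python
import copy

# Illustrative only: a writer that stores references to a mutable value
# the caller later reuses would see "written" rows change retroactively.
class RowWriter:
    def __init__(self):
        self.cells = []

    def write(self, value):
        # clone defensively, mirroring `input = input.clone()` above
        self.cells.append(copy.deepcopy(value))

shared = [1, 2]          # stands in for a reused mutable Decimal
writer = RowWriter()
writer.write(shared)
shared.append(3)         # caller mutates the reused object afterwards
print(writer.cells[0])   # [1, 2] -- unaffected because of the clone
```

Without the `deepcopy`, `writer.cells[0]` would alias `shared` and read `[1, 2, 3]` after the mutation, which is exactly the bug the clone prevents.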
[GitHub] spark pull request: [SPARK-12799] Simplify various string output f...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10757#issuecomment-184513440 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/51341/ Test FAILed.
[GitHub] spark pull request: [SPARK-13308] ManagedBuffers passed to OneToOn...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11193#issuecomment-184512866 **[Test build #51340 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51340/consoleFull)** for PR 11193 at commit [`2c00f29`](https://github.com/apache/spark/commit/2c00f29272051b8092b6a8a976392e32eeb5488b).
[GitHub] spark pull request: [WIP][SPARK-13332][SQL] Decimal datatype suppo...
Github user adrian-wang commented on a diff in the pull request: https://github.com/apache/spark/pull/11212#discussion_r52968287

--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/MathFunctionsSuite.scala ---
@@ -351,6 +350,20 @@ class MathFunctionsSuite extends SparkFunSuite with ExpressionEvalHelper {
   }

   test("pow") {
+    testBinary(Pow, (d: Decimal, n: Byte) => d.pow(n),
+      (-5 to 5).map(v => (Decimal(v * 1.0), v.toByte)))
--- End diff --

Maybe `v.toDouble` is better.
[GitHub] spark pull request: [SPARK-13321][SQL] Support nested UNION in par...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/11204#discussion_r52968331

--- Diff: sql/catalyst/src/main/antlr3/org/apache/spark/sql/catalyst/parser/SparkSqlParser.g ---
@@ -2320,6 +2320,19 @@ regularBody[boolean topLevel]
    )
    |
    selectStatement[topLevel]
+   |
+   (LPAREN selectStatement[true]) => nestedSetOpSelectStatement[topLevel]
    ;
+
+nestedSetOpSelectStatement[boolean topLevel]
+   :
+   (
+   LPAREN s=selectStatement[topLevel] RPAREN -> {$s.tree}
+   )
+   (set=setOpSelectStatement[$nestedSetOpSelectStatement.tree, topLevel])
--- End diff --

I made a small change to support recursively nested UNION, and also updated the test. But it is basically the same approach.
[GitHub] spark pull request: SPARK-12729 PhantomReferences to replace Final...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/11140#issuecomment-184512589 No comment on the contents of this PR (since I haven't looked at them), but I did want to note that I think that the pull request description is a little thin here. Could you add a concise summary of the changes here, their impact on the code, and motivation for why we're doing this? This helps reviewers / readers know what to focus on and also helps future readers by allowing them to understand the gist of this change without having to read the entire JIRA / discussion.
[GitHub] spark pull request: [SPARK-12799] Simplify various string output f...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10757#issuecomment-184513438 Merged build finished. Test FAILed.
[GitHub] spark pull request: [WIP][SQL] Decimal datatype support for pow
Github user adrian-wang commented on a diff in the pull request: https://github.com/apache/spark/pull/11212#discussion_r52968214

--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/MathFunctionsSuite.scala ---
@@ -103,8 +103,7 @@ class MathFunctionsSuite extends SparkFunSuite with ExpressionEvalHelper {
     }
   } else {
     domain.foreach { case (v1, v2) =>
-      checkEvaluation(c(Literal(v1), Literal(v2)), f(v1 + 0.0, v2 + 0.0), EmptyRow)
-      checkEvaluation(c(Literal(v2), Literal(v1)), f(v2 + 0.0, v1 + 0.0), EmptyRow)
+      checkEvaluation(c(Literal(v1), Literal(v2)), f(v1, v2), EmptyRow)
--- End diff --

Keep the test of `f(v2, v1)`.
[GitHub] spark pull request: [SPARK-12375] [ML] add handleinvalid for vecto...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10466#issuecomment-184511393 **[Test build #51342 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51342/consoleFull)** for PR 10466 at commit [`6a0efed`](https://github.com/apache/spark/commit/6a0efede2b99a315895b1d3cccb9262ea845476c).
[GitHub] spark pull request: [SPARK-12583][Mesos] Fix mesos shuffle service
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/11207#issuecomment-184510940 No comment on the contents of this PR (since I haven't looked at it yet), but would you mind changing the PR title to something more descriptive? As it stands now, "Fix Mesos shuffle service" is a lot less descriptive than, say, "Delete shuffle files after Mesos shuffle service exits" or something similar. Could you also edit the description to include a concise one-sentence description of the user-facing bug / symptom that this fixes? Right now it describes a lot of mechanism, but I feel the description is a bit thin on context for newcomers who are trying to understand what this patch is doing.
[GitHub] spark pull request: [SPARK-13018][Docs] Replace example code in ml...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/11126
[GitHub] spark pull request: [SPARK-13330][PYSPARK] PYTHONHASHSEED is not p...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/11211#issuecomment-184509476 How do we handle this in Python 2? If we're running Python 2.x, do we currently propagate `PYTHONHASHSEED` to the worker? Also, how are we going to ensure that this change isn't accidentally rolled back? This seems subtle, so adding an explanatory paragraph comment into the source code near this line would make sense.
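The subtlety behind this fix is Python 3's per-interpreter hash randomization: a driver and its workers only agree on string hashes (and therefore on hash partitioning) if `PYTHONHASHSEED` is propagated to every interpreter. A hedged sketch of the behavior; `hash_in_subprocess` is an illustrative helper, not PySpark code:

```python
import os
import subprocess
import sys

# Spawn a fresh interpreter with a fixed PYTHONHASHSEED and report the
# hash it computes for a string. Separate interpreters with the same
# seed agree; without a seed they generally do not.
def hash_in_subprocess(s, seed):
    env = {**os.environ, "PYTHONHASHSEED": seed}
    out = subprocess.run(
        [sys.executable, "-c", f"print(hash({s!r}))"],
        env=env, capture_output=True, text=True, check=True)
    return int(out.stdout)

print(hash_in_subprocess("spark", "0") == hash_in_subprocess("spark", "0"))  # True
```

With the seed left unset, two runs would usually disagree, which is how keys end up in different partitions on different workers.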
[GitHub] spark pull request: [SPARK-13097][ML] Binarizer allowing Double AN...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/10976
[GitHub] spark pull request: [SPARK-13018][Docs] Replace example code in ml...
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/11126#discussion_r52967782

--- Diff: docs/mllib-pmml-model-export.md ---
@@ -45,41 +45,12 @@ The table below outlines the `spark.mllib` models that can be exported to PMML a

 To export a supported `model` (see table above) to PMML, simply call `model.toPMML`.
+As well as exporting the PMML model to a String (`model.toPMML` as in the example above), you can export the PMML model to other formats.
--- End diff --

Minor: please wrap lines at 100 chars.
[GitHub] spark pull request: [SPARK-13018][Docs] Replace example code in ml...
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/11126#issuecomment-184509386 Merged into master. Thanks!
[GitHub] spark pull request: [SPARK-13308] ManagedBuffers passed to OneToOn...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/11193#issuecomment-184509228 Thanks for the careful review, @zsxwing. I agree with your feedback and also think that it makes a lot more sense to have `convertToNetty()` increment the reference count. I've gone ahead and updated the patch to do this and have rolled back a confusing `retain()` call in the test code (which you pointed out earlier). Take a look at the `refCnt()` assertions that I added in the test suites to see whether they match up with what you had in mind. Note that as of today `convertToNetty` is only called in one place in `MessageEncoder`, and the result is passed to `MessageWithHeader` alongside the message that the buffer came from, so it should be easy to verify that `MessageWithHeader.deallocate()` will free all of the references.
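The retain/release contract being discussed can be reduced to a hedged miniature; the class and method names below are illustrative, not Spark's or Netty's API. The rule: whoever shares a buffer retains it first, and the last consumer's release triggers deallocation.

```python
# Toy reference-counted buffer illustrating the retain/release contract.
class RefCountedBuffer:
    def __init__(self):
        self.ref_cnt = 1
        self.deallocated = False

    def retain(self):
        self.ref_cnt += 1
        return self

    def release(self):
        self.ref_cnt -= 1
        if self.ref_cnt == 0:
            self.deallocated = True

buf = RefCountedBuffer()
netty_view = buf.retain()  # analogous to convertToNetty() bumping the count
buf.release()              # the original owner is done with the buffer
netty_view.release()       # analogous to MessageWithHeader.deallocate()
print(buf.deallocated)     # True: the last release freed the buffer
```

Having the conversion step do the retain (rather than the caller) is what makes the count easy to audit: every share site pairs with exactly one release site.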
[GitHub] spark pull request: [SPARK-13097][ML] Binarizer allowing Double AN...
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/10976#issuecomment-184509207 Merged into master. Thanks!
[GitHub] spark pull request: [SPARK-13295] [ ML, MLlib ] AFTSurvivalRegress...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11179#issuecomment-184508061 **[Test build #51339 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51339/consoleFull)** for PR 11179 at commit [`e4707e7`](https://github.com/apache/spark/commit/e4707e775f34c0018f74451d048fb28a9c08ef48).
[GitHub] spark pull request: [SPARK-13013][Docs] Replace example code in ml...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6#issuecomment-184507173 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/51338/ Test FAILed.
[GitHub] spark pull request: [SPARK-13013][Docs] Replace example code in ml...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6#issuecomment-184507169 Build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-13013][Docs] Replace example code in ml...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/6#issuecomment-184507162 **[Test build #51338 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51338/consoleFull)** for PR 6 at commit [`8195cdf`](https://github.com/apache/spark/commit/8195cdf6052ad226b8102c2d40d2341d409596e1). * This patch **fails Python style tests**. * This patch **does not merge cleanly**. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-13013][Docs] Replace example code in ml...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/6#issuecomment-184506760 **[Test build #51338 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51338/consoleFull)** for PR 6 at commit [`8195cdf`](https://github.com/apache/spark/commit/8195cdf6052ad226b8102c2d40d2341d409596e1).
[GitHub] spark pull request: [SPARK-13310] [SQL] Resolve Missing Sorting Co...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11198#issuecomment-184501312 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/51332/ Test PASSed.
[GitHub] spark pull request: [SPARK-13310] [SQL] Resolve Missing Sorting Co...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11198#issuecomment-184501309 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-13310] [SQL] Resolve Missing Sorting Co...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11198#issuecomment-184501104 **[Test build #51332 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51332/consoleFull)** for PR 11198 at commit [`49a2d6e`](https://github.com/apache/spark/commit/49a2d6e8c153609901ed79035cd1abe236f1d39c). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-13013][Docs] Replace example code in ml...
Github user keypointt commented on a diff in the pull request: https://github.com/apache/spark/pull/6#discussion_r52966146

--- Diff: examples/src/main/scala/org/apache/spark/examples/mllib/PowerIterationClusteringExample.scala ---
@@ -18,141 +18,42 @@
 // scalastyle:off println
 package org.apache.spark.examples.mllib

-import org.apache.log4j.{Level, Logger}
-import scopt.OptionParser
-
 import org.apache.spark.{SparkConf, SparkContext}
-import org.apache.spark.mllib.clustering.PowerIterationClustering
-import org.apache.spark.rdd.RDD
+// $example on$
+import org.apache.spark.mllib.clustering.{PowerIterationClustering, PowerIterationClusteringModel}
+// $example off$

-/**
--- End diff --

@yinxusen could you please explain more about how to reuse? The previous examples are quite different from what is shown inside the {highlight} block.
[GitHub] spark pull request: [SPARK-13329] [SQL] considering output for sta...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11210#issuecomment-184497912 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/51336/ Test FAILed.
[GitHub] spark pull request: [SPARK-13329] [SQL] considering output for sta...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11210#issuecomment-184497911 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-12966][SQL] ArrayType(DecimalType) supp...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10928#issuecomment-184497847 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/51330/ Test PASSed.
[GitHub] spark pull request: [SPARK-13329] [SQL] considering output for sta...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11210#issuecomment-184497831 **[Test build #51336 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51336/consoleFull)** for PR 11210 at commit [`2738737`](https://github.com/apache/spark/commit/273873753fb97721864ee9e85d9dc9f16edab8ce).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-12966][SQL] ArrayType(DecimalType) supp...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10928#issuecomment-184497845 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-12966][SQL] ArrayType(DecimalType) supp...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10928#issuecomment-184497738 **[Test build #51330 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51330/consoleFull)** for PR 10928 at commit [`68952b6`](https://github.com/apache/spark/commit/68952b65c65aebfe6bc5a41a80518b8fc2288c8b).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-12799] Simplify various string output f...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10757#issuecomment-184495970 **[Test build #51337 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51337/consoleFull)** for PR 10757 at commit [`21c94d2`](https://github.com/apache/spark/commit/21c94d2224609ce3171e62c7cb58ee64cca683e7).
[GitHub] spark pull request: [SPARK-13237] [SQL] generated broadcast outer ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11130#issuecomment-184495758 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/51333/ Test PASSed.
[GitHub] spark pull request: [SPARK-13237] [SQL] generated broadcast outer ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11130#issuecomment-184495757 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-13237] [SQL] generated broadcast outer ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11130#issuecomment-184495643 **[Test build #51333 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51333/consoleFull)** for PR 11130 at commit [`5744941`](https://github.com/apache/spark/commit/5744941063ba05b07e4a7265277162c331a9c48c).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [WIP][SQL] Decimal datatype support for pow
Github user yucai commented on the pull request: https://github.com/apache/spark/pull/11212#issuecomment-184495551 @adrian-wang could you help review? Much thanks!
[GitHub] spark pull request: [SPARK-13329] [SQL] considering output for sta...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11210#issuecomment-184492883 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/51334/ Test FAILed.