[GitHub] spark issue #15843: [SPARK-18274][ML][PYSPARK] Memory leak in PySpark String...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15843 **[Test build #68513 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68513/consoleFull)** for PR 15843 at commit [`3d858a2`](https://github.com/apache/spark/commit/3d858a2326809b7e1c679b712d84a8a21767d13c). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #15704: [SPARK-17732][SQL] ALTER TABLE DROP PARTITION should sup...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15704 Merged build finished. Test FAILed.
[GitHub] spark issue #15704: [SPARK-17732][SQL] ALTER TABLE DROP PARTITION should sup...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15704

**[Test build #68512 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68512/consoleFull)** for PR 15704 at commit [`a815df9`](https://github.com/apache/spark/commit/a815df9b9eb840d410565f13f89e899204cab341).

* This patch **fails to build**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #15704: [SPARK-17732][SQL] ALTER TABLE DROP PARTITION should sup...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15704 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/68512/ Test FAILed.
[GitHub] spark pull request #15704: [SPARK-17732][SQL] ALTER TABLE DROP PARTITION sho...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/15704#discussion_r87549789

--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala ---

@@ -225,6 +226,102 @@ class HiveDDLSuite
     }
   }
 
+  test("SPARK-17732: Drop partitions by filter") {
+    withTable("sales") {
+      sql("CREATE TABLE sales(id INT) PARTITIONED BY (country STRING, quarter STRING)")
+
+      for (country <- Seq("US", "CA", "KR")) {
+        for (quarter <- 1 to 4) {
+          sql(s"ALTER TABLE sales ADD PARTITION (country='$country', quarter='$quarter')")
+        }
+      }
+
+      sql("ALTER TABLE sales DROP PARTITION (country < 'KR', quarter > '2')")
+      checkAnswer(sql("SHOW PARTITIONS sales"),
+        Row("country=CA/quarter=1") ::
+        Row("country=CA/quarter=2") ::
+        Row("country=KR/quarter=1") ::
+        Row("country=KR/quarter=2") ::
+        Row("country=KR/quarter=3") ::
+        Row("country=KR/quarter=4") ::
+        Row("country=US/quarter=1") ::
+        Row("country=US/quarter=2") ::
+        Row("country=US/quarter=3") ::
+        Row("country=US/quarter=4") :: Nil)
+
+      sql("ALTER TABLE sales DROP PARTITION (country < 'KR'), PARTITION (quarter <= '1')")
+      checkAnswer(sql("SHOW PARTITIONS sales"),
+        Row("country=KR/quarter=2") ::
+        Row("country=KR/quarter=3") ::
+        Row("country=KR/quarter=4") ::
+        Row("country=US/quarter=2") ::
+        Row("country=US/quarter=3") ::
+        Row("country=US/quarter=4") :: Nil)
+
+      sql("ALTER TABLE sales DROP PARTITION (country='KR', quarter='4')")
+      sql("ALTER TABLE sales DROP PARTITION (country='US', quarter='3')")
+      checkAnswer(sql("SHOW PARTITIONS sales"),
+        Row("country=KR/quarter=2") ::
+        Row("country=KR/quarter=3") ::
+        Row("country=US/quarter=2") ::
+        Row("country=US/quarter=4") :: Nil)
+
+      sql("ALTER TABLE sales DROP PARTITION (quarter <= 2), PARTITION (quarter >= '4')")
+      checkAnswer(sql("SHOW PARTITIONS sales"),
+        Row("country=KR/quarter=3") :: Nil)
+
+      val m = intercept[AnalysisException] {
+        sql("ALTER TABLE sales DROP PARTITION (quarter <= 4), PARTITION (quarter <= '2')")
+      }.getMessage
+      // `PARTITION (quarter <= '2')` should raise an exception because `PARTITION (quarter <= 4)`
+      // already removes all partitions.

--- End diff --

As we have discussed before, this behavior may not be the same as ALTER TABLE DROP PARTITION with an equality-only spec. Should we make them consistent?
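The filter semantics debated above can be sketched in plain Python (this is an illustration of the proposed behavior, not Spark's implementation; all names here are hypothetical). Each `PARTITION (...)` clause is a conjunction of per-column comparisons, values are compared as strings for the string-typed partition columns, and a partition is dropped when it matches any clause:

```python
# Illustrative sketch (not Spark's implementation) of DROP PARTITION with
# comparison filters: a clause is a list of (column, op, literal) predicates,
# and a partition is dropped when it satisfies every predicate of any clause.
import operator

OPS = {"=": operator.eq, "<": operator.lt, ">": operator.gt,
       "<=": operator.le, ">=": operator.ge}

def matches(partition, clause):
    # All predicates in one PARTITION (...) clause must hold (conjunction).
    return all(OPS[op](partition[col], lit) for col, op, lit in clause)

def drop_partitions(partitions, clauses):
    # Keep partitions that match none of the DROP clauses.
    return [p for p in partitions
            if not any(matches(p, c) for c in clauses)]

partitions = [{"country": c, "quarter": str(q)}
              for c in ("US", "CA", "KR") for q in range(1, 5)]

# ALTER TABLE sales DROP PARTITION (country < 'KR', quarter > '2')
remaining = drop_partitions(partitions, [[("country", "<", "KR"),
                                          ("quarter", ">", "2")]])
print(len(remaining))  # 10 -- only CA quarters 3 and 4 are dropped
```

Note how this differs from the equality-only spec: a filter that matches nothing simply drops nothing, which is the consistency question raised above.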
[GitHub] spark issue #15843: [SPARK-18274][ML][PYSPARK] Memory leak in PySpark String...
Github user techaddict commented on the issue: https://github.com/apache/spark/pull/15843 @jkbradley looks good, merged
[GitHub] spark issue #15704: [SPARK-17732][SQL] ALTER TABLE DROP PARTITION should sup...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15704 **[Test build #68512 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68512/consoleFull)** for PR 15704 at commit [`a815df9`](https://github.com/apache/spark/commit/a815df9b9eb840d410565f13f89e899204cab341).
[GitHub] spark issue #15843: [SPARK-18274][ML][PYSPARK] Memory leak in PySpark String...
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/15843 You're right! It's another bug: copy should be implemented in JavaParams, not JavaModel. I'm sending this PR to fix that: https://github.com/techaddict/spark/pull/1 Can you please check it out and merge it into your PR if it looks OK to you? All pyspark.ml tests ran successfully with it.
[GitHub] spark pull request #15704: [SPARK-17732][SQL] ALTER TABLE DROP PARTITION sho...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/15704#discussion_r87549043

--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala ---

@@ -226,6 +227,63 @@ class HiveDDLSuite
     }
   }
 
+  test("SPARK-17732: Drop partitions by filter") {
+    withTable("sales") {
+      sql("CREATE TABLE sales(id INT) PARTITIONED BY (country STRING, quarter STRING)")
+
+      for (country <- Seq("US", "CA", "KR")) {
+        for (quarter <- 1 to 4) {
+          sql(s"ALTER TABLE sales ADD PARTITION (country='$country', quarter='$quarter')")
+        }
+      }
+
+      sql("ALTER TABLE sales DROP PARTITION (country < 'KR')")
+      checkAnswer(sql("SHOW PARTITIONS sales"),
+        Row("country=KR/quarter=1") ::
+        Row("country=KR/quarter=2") ::
+        Row("country=KR/quarter=3") ::
+        Row("country=KR/quarter=4") ::
+        Row("country=US/quarter=1") ::
+        Row("country=US/quarter=2") ::
+        Row("country=US/quarter=3") ::
+        Row("country=US/quarter=4") :: Nil)
+
+      sql("ALTER TABLE sales DROP PARTITION (quarter <= '2')")
+      checkAnswer(sql("SHOW PARTITIONS sales"),
+        Row("country=KR/quarter=3") ::
+        Row("country=KR/quarter=4") ::
+        Row("country=US/quarter=3") ::
+        Row("country=US/quarter=4") :: Nil)
+
+      sql("ALTER TABLE sales DROP PARTITION (country='KR', quarter='4')")
+      sql("ALTER TABLE sales DROP PARTITION (country='US', quarter='3')")

--- End diff --

I added the case by updating the existing test cases.
[GitHub] spark issue #15850: [SPARK-18411] [SQL] Add Argument Types and Test Cases fo...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15850 **[Test build #68511 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68511/consoleFull)** for PR 15850 at commit [`c02b10d`](https://github.com/apache/spark/commit/c02b10d21d3d1ebaddd93c58112e67dc7ef0).
[GitHub] spark pull request #15850: [SPARK-18411] [SQL] Add Argument Types and Test C...
GitHub user gatorsmile opened a pull request: https://github.com/apache/spark/pull/15850

[SPARK-18411] [SQL] Add Argument Types and Test Cases for String Functions [WIP]

### What changes were proposed in this pull request?

Add argument types and test cases into the extended descriptions of string functions.

### How was this patch tested?

Added test cases to verify the added argument types.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/gatorsmile/spark addArgument4StringExpressions

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/15850.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #15850

commit c02b10d21d3d1ebaddd93c58112e67dc7ef0
Author: gatorsmile
Date: 2016-11-11T07:29:16Z

    fix
[GitHub] spark pull request #15593: [SPARK-18060][ML] Avoid unnecessary computation f...
Github user MLnick commented on a diff in the pull request: https://github.com/apache/spark/pull/15593#discussion_r87547519

--- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala ---

@@ -1486,57 +1504,65 @@ private class LogisticAggregator(
     var marginOfLabel = 0.0
     var maxMargin = Double.NegativeInfinity
 
-    val margins = Array.tabulate(numClasses) { i =>
-      var margin = 0.0
-      features.foreachActive { (index, value) =>
-        if (localFeaturesStd(index) != 0.0 && value != 0.0) {
-          margin += localCoefficients(i * numFeaturesPlusIntercept + index) *
-            value / localFeaturesStd(index)
-        }
+    val margins = new Array[Double](numClasses)
+    features.foreachActive { (index, value) =>
+      val stdValue = value / localFeaturesStd(index)
+      var j = 0
+      while (j < numClasses) {
+        margins(j) += localCoefficients(index * numClasses + j) * stdValue
+        j += 1
       }
-
+    }
+    var i = 0
+    while (i < numClasses) {
       if (fitIntercept) {
-        margin += localCoefficients(i * numFeaturesPlusIntercept + numFeatures)
+        margins(i) += localCoefficients(numClasses * numFeatures + i)
       }
-      if (i == label.toInt) marginOfLabel = margin
-      if (margin > maxMargin) {
-        maxMargin = margin
+      if (i == label.toInt) marginOfLabel = margins(i)
+      if (margins(i) > maxMargin) {
+        maxMargin = margins(i)
       }
-      margin
+      i += 1
     }
 
    /**
     * When maxMargin > 0, the original formula could cause overflow.
     * We address this by subtracting maxMargin from all the margins, so it's guaranteed
     * that all of the new margins will be smaller than zero to prevent arithmetic overflow.
     */
+    val multipliers = new Array[Double](numClasses)
     val sum = {
       var temp = 0.0
-      if (maxMargin > 0) {
-        for (i <- 0 until numClasses) {
-          margins(i) -= maxMargin
-          temp += math.exp(margins(i))
-        }
-      } else {
-        for (i <- 0 until numClasses) {
-          temp += math.exp(margins(i))
-        }
+      var i = 0
+      while (i < numClasses) {
+        if (maxMargin > 0) margins(i) -= maxMargin
+        val exp = math.exp(margins(i))
+        temp += exp
+        multipliers(i) = exp
+        i += 1
       }
       temp
     }
 
-    for (i <- 0 until numClasses) {
-      val multiplier = math.exp(margins(i)) / sum - {
-        if (label == i) 1.0 else 0.0
-      }
-      features.foreachActive { (index, value) =>
-        if (localFeaturesStd(index) != 0.0 && value != 0.0) {
-          localGradientArray(i * numFeaturesPlusIntercept + index) +=
-            weight * multiplier * value / localFeaturesStd(index)
+    margins.indices.foreach { i =>
+      multipliers(i) = multipliers(i) / sum - (if (label == i) 1.0 else 0.0)
+    }
+    features.foreachActive { (index, value) =>
+      if (localFeaturesStd(index) != 0.0 && value != 0.0) {
+        val stdValue = value / localFeaturesStd(index)
+        var j = 0
+        while (j < numClasses) {
+          localGradientArray(index * numClasses + j) +=
+            weight * multipliers(j) * stdValue
+          j += 1
         }
       }
-      if (fitIntercept) {
-        localGradientArray(i * numFeaturesPlusIntercept + numFeatures) += weight * multiplier
+    }
+    if (fitIntercept) {
+      var i = 0
+      while (i < numClasses) {
+        localGradientArray(numFeatures * numClasses + i) += weight * multipliers(i)
+        i += 1
      }
    }

--- End diff --

I'm not sure I fully get where you intend to use `foreachActive` over the gradient matrix? Maybe it's the location of this comment that is confusing me... but here in `multinomialUpdateInPlace`, we are iterating over features using `foreachActive`, then for each feature iterating over `numClasses`. If we iterate over the gradient using `foreachActive`, how will that work? Won't it be super inefficient? Perhaps I am missing something about what you intend; could you clarify with an example?
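The loop restructuring under discussion can be illustrated in plain Python (hypothetical names; Spark's actual code lives in `LogisticAggregator`). The point of the patch is to make a single pass over the active entries of a sparse feature vector, updating all class margins per feature, instead of re-scanning the features once per class:

```python
# Plain-Python sketch of the one-pass margin computation discussed above.
# `active` holds the (index, value) pairs of a sparse feature vector, and
# `coefficients` is laid out feature-major: coefficients[index * num_classes + j].

def margins_one_pass(active, coefficients, num_classes, features_std):
    margins = [0.0] * num_classes
    for index, value in active:            # one pass over nonzero features
        std_value = value / features_std[index]
        for j in range(num_classes):       # update every class margin at once
            margins[j] += coefficients[index * num_classes + j] * std_value
    return margins

active = [(0, 2.0), (2, 1.0)]              # sparse vector: indices 0 and 2 nonzero
features_std = [1.0, 1.0, 2.0]
coefficients = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]   # 3 features x 2 classes
print(margins_one_pass(active, coefficients, 2, features_std))
```

This is also why MLnick's question makes sense: `foreachActive` is natural over the sparse *features*, while the gradient array is dense, so iterating it "actively" would not help.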
[GitHub] spark issue #15849: [SPARK-18410][STREAMING] Add structured kafka example
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15849

**[Test build #68510 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68510/consoleFull)** for PR 15849 at commit [`82a4487`](https://github.com/apache/spark/commit/82a4487e5f7d4fe7f7a375cbdf86554882bcdf59).

* This patch **fails RAT tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #15849: [SPARK-18410][STREAMING] Add structured kafka example
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15849 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/68510/ Test FAILed.
[GitHub] spark issue #15849: [SPARK-18410][STREAMING] Add structured kafka example
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15849 Merged build finished. Test FAILed.
[GitHub] spark issue #15849: [SPARK-18410][STREAMING] Add structured kafka example
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15849 **[Test build #68510 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68510/consoleFull)** for PR 15849 at commit [`82a4487`](https://github.com/apache/spark/commit/82a4487e5f7d4fe7f7a375cbdf86554882bcdf59).
[GitHub] spark pull request #15849: [SPARK-18410][STREAMING] Add structured kafka exa...
GitHub user uncleGen opened a pull request: https://github.com/apache/spark/pull/15849

[SPARK-18410][STREAMING] Add structured kafka example

## What changes were proposed in this pull request?

This PR provides structured kafka wordcount examples.

## How was this patch tested?

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/uncleGen/spark SPARK-18410

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/15849.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #15849

commit 962370cd84aace15b17f8ac58d285a2840def3c4
Author: genmao.ygm
Date: 2016-11-11T06:42:54Z

    SPARK-18410: Add structured kafka example

commit 4f83a1f8559eb038b801b738fdfd90ce003acd92
Author: genmao.ygm
Date: 2016-11-11T07:16:35Z

    update

commit 82a4487e5f7d4fe7f7a375cbdf86554882bcdf59
Author: genmao.ygm
Date: 2016-11-11T07:19:23Z

    update
[GitHub] spark pull request #13909: [SPARK-16213][SQL] Reduce runtime overhead of a p...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/13909#discussion_r87546426

--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ExpressionEvalHelper.scala ---

@@ -43,11 +43,38 @@ trait ExpressionEvalHelper extends GeneratorDrivenPropertyChecks {
   protected def checkEvaluation(
       expression: => Expression, expected: Any, inputRow: InternalRow = EmptyRow): Unit = {
-    val catalystValue = CatalystTypeConverters.convertToCatalyst(expected)
+    // The non-codegen version expects GenericArrayData for arrays, except for BinaryType
+    val catalystValue = expected match {
+      case arr: Array[Byte] if expression.dataType == BinaryType => arr
+      case arr: Array[_] => new GenericArrayData(arr.map(CatalystTypeConverters.convertToCatalyst))

--- End diff --

I think that you are right. This workaround was for previous versions.
[GitHub] spark pull request #15704: [SPARK-17732][SQL] ALTER TABLE DROP PARTITION sho...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/15704#discussion_r87546041

--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala ---

@@ -226,6 +227,63 @@ class HiveDDLSuite
     }
   }
 
+  test("SPARK-17732: Drop partitions by filter") {
+    withTable("sales") {
+      sql("CREATE TABLE sales(id INT) PARTITIONED BY (country STRING, quarter STRING)")
+
+      for (country <- Seq("US", "CA", "KR")) {
+        for (quarter <- 1 to 4) {
+          sql(s"ALTER TABLE sales ADD PARTITION (country='$country', quarter='$quarter')")
+        }
+      }
+
+      sql("ALTER TABLE sales DROP PARTITION (country < 'KR')")
+      checkAnswer(sql("SHOW PARTITIONS sales"),
+        Row("country=KR/quarter=1") ::
+        Row("country=KR/quarter=2") ::
+        Row("country=KR/quarter=3") ::
+        Row("country=KR/quarter=4") ::
+        Row("country=US/quarter=1") ::
+        Row("country=US/quarter=2") ::
+        Row("country=US/quarter=3") ::
+        Row("country=US/quarter=4") :: Nil)
+
+      sql("ALTER TABLE sales DROP PARTITION (quarter <= '2')")
+      checkAnswer(sql("SHOW PARTITIONS sales"),
+        Row("country=KR/quarter=3") ::
+        Row("country=KR/quarter=4") ::
+        Row("country=US/quarter=3") ::
+        Row("country=US/quarter=4") :: Nil)
+
+      sql("ALTER TABLE sales DROP PARTITION (country='KR', quarter='4')")
+      sql("ALTER TABLE sales DROP PARTITION (country='US', quarter='3')")

--- End diff --

To add that, we would need another test case, because the remaining partitions are not enough to test it.
[GitHub] spark issue #15843: [SPARK-18274][ML][PYSPARK] Memory leak in PySpark String...
Github user techaddict commented on the issue: https://github.com/apache/spark/pull/15843 @jkbradley yes, I did it for `JavaWrapper` first, but running the tests with it gives https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68478/consoleFull
[GitHub] spark issue #15843: [SPARK-18274][ML][PYSPARK] Memory leak in PySpark String...
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/15843 Thanks a lot for finding & reporting this! The fix should probably go in JavaWrapper, not JavaModel, right? I tested this manually (in JavaWrapper), and it seems to fix the problematic case with StringIndexer.
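The leak pattern being fixed here can be sketched in plain Python (mock gateway; hypothetical names, not PySpark's actual classes). A Python wrapper holds a reference to a JVM-side object; if nothing releases that reference when the wrapper is garbage-collected, the JVM object lives forever. The sketch adds cleanup on collection:

```python
# Illustrative sketch of the wrapper-cleanup pattern under discussion.
# MockGateway stands in for the Py4J gateway; in PySpark the real fix
# lives in the ML wrapper classes, not in these hypothetical names.

class MockGateway:
    """Tracks which JVM-side objects are currently referenced from Python."""
    def __init__(self):
        self.attached = set()

    def attach(self, obj_id):
        self.attached.add(obj_id)

    def detach(self, obj_id):
        self.attached.discard(obj_id)

class JavaObjectWrapper:
    def __init__(self, gateway, obj_id):
        self._gateway = gateway
        self._obj_id = obj_id
        gateway.attach(obj_id)

    def __del__(self):
        # Release the JVM-side reference when the Python wrapper is collected.
        self._gateway.detach(self._obj_id)

gateway = MockGateway()
wrapper = JavaObjectWrapper(gateway, "StringIndexerModel@1")
assert "StringIndexerModel@1" in gateway.attached
del wrapper                     # wrapper collected, JVM object released
print(gateway.attached)         # set()
```

The subtlety debated in this thread is *where* the cleanup belongs in the class hierarchy, so that `copy()` and model wrappers all inherit it without double-freeing.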
[GitHub] spark issue #15840: [SPARK-18398][SQL] Fix nullabilities of MapObjects and o...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15840 Merged build finished. Test PASSed.
[GitHub] spark issue #15840: [SPARK-18398][SQL] Fix nullabilities of MapObjects and o...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15840 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/68509/ Test PASSed.
[GitHub] spark issue #15840: [SPARK-18398][SQL] Fix nullabilities of MapObjects and o...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15840

**[Test build #68509 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68509/consoleFull)** for PR 15840 at commit [`deffccd`](https://github.com/apache/spark/commit/deffccdf2ef314acf3de94c4fbd33655e65e24e2).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #15704: [SPARK-17732][SQL] ALTER TABLE DROP PARTITION should sup...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/15704 Thank you, @viirya! I feel the same way.
[GitHub] spark pull request #15847: [SPARK-18387] [SQL] Add serialization to checkEva...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/15847#discussion_r87541254

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala ---

@@ -1431,43 +1431,49 @@ case class FormatNumber(x: Expression, d: Expression)
   // Associated with the pattern, for the last d value, and we will update the
   // pattern (DecimalFormat) once the new coming d value differ with the last one.
+  // This is an Option to distinguish between 0 (numberFormat is valid) and uninitialized after
+  // serialization (numberFormat has not been updated for dValue = 0).
   @transient
-  private var lastDValue: Int = -100
+  private var lastDValue: Option[Int] = None

--- End diff --

Actually you can do this via a lazy val too, which is just

```
@transient private lazy val lastDValue: Int = -100
```

then I believe it gets initialized to -100 after deserialization automatically.
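The transient-field concern above has a direct analogue in Python pickling, sketched below as an illustration (names are made up, not Spark's). A cached sentinel value is excluded from the serialized state and reset to its default on deserialization, mirroring how a `@transient lazy val` recomputes its value on first use after deserialization:

```python
# Hedged Python analogy to the @transient discussion: drop a cached field
# from the pickled state and re-initialize it after deserialization.
import pickle

class FormatState:
    def __init__(self):
        self.last_d_value = -100   # sentinel: "format not yet configured"

    def __getstate__(self):
        # Exclude the transient cache from serialization.
        state = self.__dict__.copy()
        state.pop("last_d_value", None)
        return state

    def __setstate__(self, state):
        self.__dict__.update(state)
        self.last_d_value = -100   # reset on deserialization, like lazy re-init

s = FormatState()
s.last_d_value = 0                 # 0 is a valid, configured value
restored = pickle.loads(pickle.dumps(s))
print(restored.last_d_value)       # -100: cache is rebuilt, not carried over
```

This is exactly why the PR's `Option[Int]` (or the suggested lazy val) works: after deserialization the field cannot be confused with a legitimately configured value of 0.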
[GitHub] spark pull request #13909: [SPARK-16213][SQL] Reduce runtime overhead of a p...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/13909#discussion_r87538921

--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ExpressionEvalHelper.scala ---

@@ -43,11 +43,38 @@ trait ExpressionEvalHelper extends GeneratorDrivenPropertyChecks {
   protected def checkEvaluation(
       expression: => Expression, expected: Any, inputRow: InternalRow = EmptyRow): Unit = {
-    val catalystValue = CatalystTypeConverters.convertToCatalyst(expected)
+    // The non-codegen version expects GenericArrayData for arrays, except for BinaryType
+    val catalystValue = expected match {
+      case arr: Array[Byte] if expression.dataType == BinaryType => arr
+      case arr: Array[_] => new GenericArrayData(arr.map(CatalystTypeConverters.convertToCatalyst))

--- End diff --

I think they are the same here, aren't they?
[GitHub] spark issue #15704: [SPARK-17732][SQL] ALTER TABLE DROP PARTITION should sup...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/15704 The changes to parsing look good to me.
[GitHub] spark issue #15790: [SPARK-18264][SPARKR] build vignettes with package, upda...
Github user shivaram commented on the issue: https://github.com/apache/spark/pull/15790 So one proposal I was thinking of is to just check in a built version of the vignette into the source tree. That way the release packaging wouldn't need to change. The only thing to keep in mind is that whenever we update the vignette we will need to rebuild it. Thoughts?
[GitHub] spark issue #13706: [SPARK-15988] [SQL] Implement DDL commands: Create/Drop ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13706 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/68506/ Test PASSed.
[GitHub] spark issue #13706: [SPARK-15988] [SQL] Implement DDL commands: Create/Drop ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13706 Merged build finished. Test PASSed.
[GitHub] spark issue #13706: [SPARK-15988] [SQL] Implement DDL commands: Create/Drop ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13706 **[Test build #68506 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68506/consoleFull)** for PR 13706 at commit [`e895a9c`](https://github.com/apache/spark/commit/e895a9c7b89d2a53f6747f1e7fa08f8e97b80ed4).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #15840: [SPARK-18398][SQL] Fix nullabilities of MapObjects and o...
Github user ueshin commented on the issue: https://github.com/apache/spark/pull/15840 @hvanhovell @kiszk I tried to use `CodegenContext.nullSafeExec()` in `MapObjects` as an example. If you can bear with this for now, I'll apply it to the other places that generate nullability-checking code.
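For readers unfamiliar with the helper: `CodegenContext.nullSafeExec` wraps a generated code fragment in an `if (!isNull)` guard when the input is nullable, and emits the fragment unchanged otherwise. A rough Java rendering of the idea (the real method is Scala inside `CodegenContext`; names here are illustrative):

```java
public class NullSafeExecSketch {
    // Wrap `execute` in a null guard only when the input can actually be null,
    // so non-nullable inputs pay no extra branch in the generated code.
    static String nullSafeExec(boolean nullable, String isNullVar, String execute) {
        if (nullable) {
            return "if (!" + isNullVar + ") {\n" + execute + "\n}";
        }
        return execute;
    }

    public static void main(String[] args) {
        // Nullable input: the fragment is guarded.
        System.out.println(nullSafeExec(true, "inputIsNull", "result = input + 1;"));
        // Non-nullable input: the fragment is emitted as-is.
        System.out.println(nullSafeExec(false, "inputIsNull", "result = input + 1;"));
    }
}
```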
[GitHub] spark issue #15840: [SPARK-18398][SQL] Fix nullabilities of MapObjects and o...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15840 **[Test build #68509 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68509/consoleFull)** for PR 15840 at commit [`deffccd`](https://github.com/apache/spark/commit/deffccdf2ef314acf3de94c4fbd33655e65e24e2).
[GitHub] spark issue #13706: [SPARK-15988] [SQL] Implement DDL commands: Create/Drop ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13706 Merged build finished. Test PASSed.
[GitHub] spark issue #13706: [SPARK-15988] [SQL] Implement DDL commands: Create/Drop ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13706 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/68505/ Test PASSed.
[GitHub] spark issue #13706: [SPARK-15988] [SQL] Implement DDL commands: Create/Drop ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13706 **[Test build #68505 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68505/consoleFull)** for PR 13706 at commit [`fb8b57a`](https://github.com/apache/spark/commit/fb8b57a4d46f6856dc2c883c6e995c248dda6a3b).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #15172: [SPARK-13331] AES support for over-the-wire encryption
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15172 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/68507/ Test PASSed.
[GitHub] spark issue #15172: [SPARK-13331] AES support for over-the-wire encryption
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15172 Merged build finished. Test PASSed.
[GitHub] spark issue #15172: [SPARK-13331] AES support for over-the-wire encryption
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15172 **[Test build #68507 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68507/consoleFull)** for PR 15172 at commit [`6863efe`](https://github.com/apache/spark/commit/6863efe77118f91c0f849d34d4698dad608213b1).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #15704: [SPARK-17732][SQL] ALTER TABLE DROP PARTITION should sup...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/15704 Thank you, @hvanhovell. This PR has become much more concise thanks to your advice.
[GitHub] spark issue #15702: [SPARK-18124] Observed delay based Event Time Watermarks
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15702 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/68504/ Test PASSed.
[GitHub] spark issue #15702: [SPARK-18124] Observed delay based Event Time Watermarks
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15702 Merged build finished. Test PASSed.
[GitHub] spark issue #15702: [SPARK-18124] Observed delay based Event Time Watermarks
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15702 **[Test build #68504 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68504/consoleFull)** for PR 15702 at commit [`de601bb`](https://github.com/apache/spark/commit/de601bb4fdb9e5a45bddefa31de38bbb7fc2570f).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #15659: [SPARK-1267][SPARK-18129] Allow PySpark to be pip instal...
Github user holdenk commented on the issue: https://github.com/apache/spark/pull/15659 Also, if there is any testing or anything else I can do or coordinate that would help y'all feel comfortable with this, please let me know :)
[GitHub] spark issue #15704: [SPARK-17732][SQL] ALTER TABLE DROP PARTITION should sup...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15704 Merged build finished. Test PASSed.
[GitHub] spark issue #15704: [SPARK-17732][SQL] ALTER TABLE DROP PARTITION should sup...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15704 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/68503/ Test PASSed.
[GitHub] spark issue #15704: [SPARK-17732][SQL] ALTER TABLE DROP PARTITION should sup...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15704 **[Test build #68503 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68503/consoleFull)** for PR 15704 at commit [`c9e7c06`](https://github.com/apache/spark/commit/c9e7c069b9a5c429a2dd73d3f542ddd045e3b876).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #15742: [SPARK-16808][Core] History Server main page does not ho...
Github user mariobriggs commented on the issue: https://github.com/apache/spark/pull/15742 ok. will look into that
[GitHub] spark issue #15820: [SPARK-18373][SS][Kafka]Make failOnDataLoss=false work w...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15820 Merged build finished. Test PASSed.
[GitHub] spark issue #15820: [SPARK-18373][SS][Kafka]Make failOnDataLoss=false work w...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15820 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/68508/ Test PASSed.
[GitHub] spark issue #15820: [SPARK-18373][SS][Kafka]Make failOnDataLoss=false work w...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15820 **[Test build #68508 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68508/consoleFull)** for PR 15820 at commit [`3aa9d7e`](https://github.com/apache/spark/commit/3aa9d7e6ebbf4b0362f9ce58d97012dd5be96bce).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #15847: [SPARK-18387] [SQL] Add serialization to checkEvaluation...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15847 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/68502/ Test PASSed.
[GitHub] spark issue #15847: [SPARK-18387] [SQL] Add serialization to checkEvaluation...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15847 Merged build finished. Test PASSed.
[GitHub] spark issue #15847: [SPARK-18387] [SQL] Add serialization to checkEvaluation...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15847 **[Test build #68502 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68502/consoleFull)** for PR 15847 at commit [`8e829ae`](https://github.com/apache/spark/commit/8e829ae87b197de2ff4b8777202a47d5f1204c56).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #15848: [SPARK-9487v2] Use the same num. worker threads in Scala...
Github user skanjila commented on the issue: https://github.com/apache/spark/pull/15848 Also, I would really love to avoid having to rebase this pull request yet again because more commits are needed :)
[GitHub] spark issue #15848: [SPARK-9487v2] Use the same num. worker threads in Scala...
Github user skanjila commented on the issue: https://github.com/apache/spark/pull/15848 Yes to your first comment on PageViewStream. As for TestSQLContext, it was unfortunately missed; I'll add it with the next pull request, which will also contain the code fixes for the Python unit tests. Is it ok to merge this one without the things you mention?
[GitHub] spark pull request #15800: [SPARK-18334] MinHash should use binary hash dist...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/15800#discussion_r87528327

--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/MinHash.scala ---
@@ -76,7 +72,19 @@ class MinHashModel private[ml] (
   @Since("2.1.0")
   override protected[ml] def hashDistance(x: Vector, y: Vector): Double = {
     // Since it's generated by hashing, it will be a pair of dense vectors.
-    x.toDense.values.zip(y.toDense.values).map(pair => math.abs(pair._1 - pair._2)).min
+    if (x.toDense.values.zip(y.toDense.values).exists(pair => pair._1 == pair._2)) {
--- End diff --

I agree more with the comment from @jkbradley at https://github.com/apache/spark/pull/15800#issuecomment-259298082, if I understand some of the terms here correctly. Does the indicator mean a matching hash value between two vectors for one hash function, i.e., h_i? If this understanding is correct, I think averaging the indicators should be the right way to compute MinHash's hash distance.
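To make the two candidate semantics concrete: the code in the diff treats the distance as 0 as soon as any hash slot matches, while the averaging proposal estimates the distance as the fraction of non-matching slots (an estimate of 1 minus Jaccard similarity). A small, hypothetical Java sketch comparing the two (helper names are illustrative, not the ML API):

```java
public class MinHashDistanceSketch {
    // Distance 0 iff at least one hash function produced a collision
    // (the behavior of the diff above).
    static double anyMatchDistance(double[] x, double[] y) {
        for (int i = 0; i < x.length; i++) {
            if (x[i] == y[i]) return 0.0;
        }
        return 1.0;
    }

    // Average of per-function match indicators: estimates 1 - Jaccard
    // similarity (the behavior the averaging proposal argues for).
    static double averagedDistance(double[] x, double[] y) {
        int matches = 0;
        for (int i = 0; i < x.length; i++) {
            if (x[i] == y[i]) matches++;
        }
        return 1.0 - (double) matches / x.length;
    }

    public static void main(String[] args) {
        double[] a = {1, 5, 9, 13};
        double[] b = {1, 5, 2, 7};  // 2 of 4 slots match
        System.out.println(anyMatchDistance(a, b));  // 0.0
        System.out.println(averagedDistance(a, b));  // 0.5
    }
}
```

The averaged version discriminates between "barely similar" and "nearly identical" pairs, which the any-match version collapses to the same distance.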
[GitHub] spark pull request #15820: [SPARK-18373][SS][Kafka]Make failOnDataLoss=false...
Github user zsxwing commented on a diff in the pull request: https://github.com/apache/spark/pull/15820#discussion_r87527804

--- Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/CachedKafkaConsumer.scala ---
@@ -83,6 +86,129 @@ private[kafka010] case class CachedKafkaConsumer private(
     record
   }

+  /**
+   * Get the record at the `offset`. If it doesn't exist, try to get the earliest record in
+   * `[offset, untilOffset)`.
+   */
+  def getAndIgnoreLostData(
+      offset: Long,
+      untilOffset: Long,
+      pollTimeoutMs: Long): ConsumerRecord[Array[Byte], Array[Byte]] = {
+    // scalastyle:off
+    // When `failOnDataLoss` is `false`, we need to handle the following cases (note: untilOffset and latestOffset are exclusive):
+    // 1. Some data are aged out, and `offset < beginningOffset <= untilOffset - 1 <= latestOffset - 1`
+    //    Seek to the beginningOffset and fetch the data.
+    // 2. Some data are aged out, and `offset <= untilOffset - 1 < beginningOffset`.
+    //    There is nothing to fetch, return null.
+    // 3. The topic is deleted.
+    //    There is nothing to fetch, return null.
+    // 4. The topic is deleted and recreated, and `beginningOffset <= offset <= untilOffset - 1 <= latestOffset - 1`.
+    //    We cannot detect this case. We can still fetch data like nothing happens.
+    // 5. The topic is deleted and recreated, and `beginningOffset <= offset < latestOffset - 1 < untilOffset - 1`.
+    //    Same as 4.
+    // 6. The topic is deleted and recreated, and `beginningOffset <= latestOffset - 1 < offset <= untilOffset - 1`.
+    //    There is nothing to fetch, return null.
+    // 7. The topic is deleted and recreated, and `offset < beginningOffset <= untilOffset - 1`.
+    //    Same as 1.
+    // 8. The topic is deleted and recreated, and `offset <= untilOffset - 1 < beginningOffset`.
+    //    There is nothing to fetch, return null.
+    // scalastyle:on
+    require(offset < untilOffset, s"offset: $offset, untilOffset: $untilOffset")
+    logDebug(s"Get $groupId $topicPartition nextOffset $nextOffsetInFetchedData requested $offset")
+    try {
+      if (offset != nextOffsetInFetchedData) {
+        logInfo(s"Initial fetch for $topicPartition $offset")
+        seek(offset)
+        poll(pollTimeoutMs)
+      } else if (!fetchedData.hasNext()) {
+        // The last pre-fetched data has been drained.
+        poll(pollTimeoutMs)
+      }
+      getRecordFromFetchedData(offset, untilOffset)
+    } catch {
+      case e: OffsetOutOfRangeException =>
+        logWarning(s"Cannot fetch offset $offset, try to recover from the beginning offset", e)
+        advanceToBeginningOffsetAndFetch(offset, untilOffset, pollTimeoutMs)
+    }
+  }
+
+  /**
+   * Try to advance to the beginning offset and fetch again. `beginningOffset` should be in
+   * `[offset, untilOffset]`. If not, it will try to fetch `offset` again if it's in
+   * `[beginningOffset, latestOffset)`. Otherwise, it will return null and reset the pre-fetched
+   * data.
+   */
+  private def advanceToBeginningOffsetAndFetch(
+      offset: Long,
+      untilOffset: Long,
+      pollTimeoutMs: Long): ConsumerRecord[Array[Byte], Array[Byte]] = {
+    val beginningOffset = getBeginningOffset()
+    if (beginningOffset <= offset) {
+      val latestOffset = getLatestOffset()
+      if (latestOffset <= offset) {
+        // beginningOffset <= latestOffset - 1 < offset <= untilOffset - 1
+        logWarning(s"Offset ${offset} is later than the latest offset $latestOffset. " +
+          s"Skipped [$offset, $untilOffset)")
+        reset()
+        null
+      } else {
+        // beginningOffset <= offset <= min(latestOffset - 1, untilOffset - 1)
+        getAndIgnoreLostData(offset, untilOffset, pollTimeoutMs)
+      }
+    } else {
+      if (beginningOffset >= untilOffset) {
+        // offset <= untilOffset - 1 < beginningOffset
+        logWarning(s"Buffer miss for $groupId $topicPartition [$offset, $untilOffset)")
+        reset()
+        null
+      } else {
+        // offset < beginningOffset <= untilOffset - 1
+        logWarning(s"Buffer miss for $groupId $topicPartition [$offset, $beginningOffset)")
+        getAndIgnoreLostData(beginningOffset, untilOffset, pollTimeoutMs)
+      }
+    }
+  }
+
+  /**
+   * Get the earliest record in [offset, untilOffset) from the fetched data. If there is no such
+   * record, returns null. Must be called after `poll`.
+   */
+  private def getRecordFromFetchedData(
+      offset: Long,
+      untilOffset: Long): Co
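The eight cases above collapse into a small decision over four offsets. As a self-contained sketch, detached from the Kafka consumer (plain Java; `-1` stands in for the "return null" outcomes, and the function name is illustrative):

```java
public class OffsetRecoverySketch {
    /**
     * Decide where to resume after an offset-out-of-range condition.
     * Returns the offset to retry from, or -1 if there is nothing left to fetch.
     * Mirrors the branch structure of advanceToBeginningOffsetAndFetch above.
     */
    static long recoverFrom(long offset, long untilOffset,
                            long beginningOffset, long latestOffset) {
        if (beginningOffset <= offset) {
            if (latestOffset <= offset) {
                // beginningOffset <= latestOffset - 1 < offset: nothing to fetch
                return -1;
            }
            // offset is still in range; retry it
            return offset;
        }
        if (beginningOffset >= untilOffset) {
            // the whole requested range [offset, untilOffset) was aged out
            return -1;
        }
        // skip the aged-out prefix and resume at the beginning offset
        return beginningOffset;
    }

    public static void main(String[] args) {
        System.out.println(recoverFrom(10, 20, 5, 8));   // -1: offset past latest
        System.out.println(recoverFrom(10, 20, 5, 50));  // 10: offset still valid
        System.out.println(recoverFrom(10, 20, 25, 60)); // -1: whole range aged out
        System.out.println(recoverFrom(10, 20, 15, 60)); // 15: resume at beginning
    }
}
```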
[GitHub] spark issue #15820: [SPARK-18373][SS][Kafka]Make failOnDataLoss=false work w...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15820 **[Test build #68508 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68508/consoleFull)** for PR 15820 at commit [`3aa9d7e`](https://github.com/apache/spark/commit/3aa9d7e6ebbf4b0362f9ce58d97012dd5be96bce).
[GitHub] spark issue #15843: [SPARK-18274][ML][PYSPARK] Memory leak in PySpark String...
Github user techaddict commented on the issue: https://github.com/apache/spark/pull/15843 cc: @jkbradley @davies @holdenk
[GitHub] spark issue #15817: [SPARK-18366][PYSPARK] Add handleInvalid to Pyspark for ...
Github user techaddict commented on the issue: https://github.com/apache/spark/pull/15817 cc: @sethah @marmbrus
[GitHub] spark issue #15848: [SPARK-9487v2] Use the same num. worker threads in Scala...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/15848 Just to be clear, from the discussion in the JIRA, it seems the `PageViewStream` example is intentionally left out here for now, because changing it from `local[2]` to `local[4]` fails for an unknown reason? And it seems
```
./sql/core/src/test/scala/org/apache/spark/sql/test/TestSQLContext.scala: this(new SparkContext("local[2]", "test-sql-context",
```
was missed?
[GitHub] spark pull request #13909: [SPARK-16213][SQL] Reduce runtime overhead of a p...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/13909#discussion_r87525792 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ExpressionEvalHelper.scala ---
```diff
@@ -43,11 +43,38 @@ trait ExpressionEvalHelper extends GeneratorDrivenPropertyChecks {
   protected def checkEvaluation(
       expression: => Expression, expected: Any, inputRow: InternalRow = EmptyRow): Unit = {
-    val catalystValue = CatalystTypeConverters.convertToCatalyst(expected)
+    // The no-codegen path expects GenericArrayData for arrays, except BinaryType
+    val catalystValue = expected match {
+      case arr: Array[Byte] if expression.dataType == BinaryType => arr
```
--- End diff -- I don't see `convertToCatalyst` special-casing `BinaryType` before. Why do we need to do that now?
[GitHub] spark pull request #13909: [SPARK-16213][SQL] Reduce runtime overhead of a p...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/13909#discussion_r87525740 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ExpressionEvalHelper.scala ---
```diff
@@ -43,11 +43,38 @@ trait ExpressionEvalHelper extends GeneratorDrivenPropertyChecks {
   protected def checkEvaluation(
       expression: => Expression, expected: Any, inputRow: InternalRow = EmptyRow): Unit = {
-    val catalystValue = CatalystTypeConverters.convertToCatalyst(expected)
+    // The no-codegen path expects GenericArrayData for arrays, except BinaryType
+    val catalystValue = expected match {
+      case arr: Array[Byte] if expression.dataType == BinaryType => arr
+      case arr: Array[_] => new GenericArrayData(arr.map(CatalystTypeConverters.convertToCatalyst))
```
--- End diff -- Doesn't this line do the same thing `CatalystTypeConverters.convertToCatalyst` does? I don't see `convertToCatalyst` special-casing `BinaryType` before. Why do we need to do that now?
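[Editorial note: for readers following the thread, the distinction under discussion is that Catalyst keeps `BinaryType` values as raw `Array[Byte]` internally, while other array values are wrapped in `GenericArrayData`. The following is a toy sketch of that dispatch only — the types are simplified stand-ins, not Spark's actual `CatalystTypeConverters` logic.]

```scala
// Toy model of the conversion convention being debated above.
// ToyBinaryType / ToyGenericArrayData are illustrative stand-ins.
sealed trait ToyDataType
case object ToyBinaryType extends ToyDataType
case object ToyIntArrayType extends ToyDataType

final case class ToyGenericArrayData(values: Seq[Any])

def toCatalyst(expected: Any, dataType: ToyDataType): Any =
  (expected, dataType) match {
    // Binary columns stay as raw byte arrays -- no wrapping.
    case (arr: Array[Byte], ToyBinaryType) => arr
    // Every other array gets wrapped in the array-data container.
    case (arr: Array[_], _)                => ToyGenericArrayData(arr.toSeq)
    // Primititves and everything else pass through unchanged.
    case (other, _)                        => other
  }
```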
[GitHub] spark issue #15172: [SPARK-13331] AES support for over-the-wire encryption
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15172 **[Test build #68507 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68507/consoleFull)** for PR 15172 at commit [`6863efe`](https://github.com/apache/spark/commit/6863efe77118f91c0f849d34d4698dad608213b1).
[GitHub] spark issue #15172: [SPARK-13331] AES support for over-the-wire encryption
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/15172 retest this please
[GitHub] spark issue #15172: [SPARK-13331] AES support for over-the-wire encryption
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15172 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/68499/ Test FAILed.
[GitHub] spark issue #15172: [SPARK-13331] AES support for over-the-wire encryption
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15172 Merged build finished. Test FAILed.
[GitHub] spark issue #15172: [SPARK-13331] AES support for over-the-wire encryption
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15172 **[Test build #68499 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68499/consoleFull)** for PR 15172 at commit [`6863efe`](https://github.com/apache/spark/commit/6863efe77118f91c0f849d34d4698dad608213b1). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #15742: [SPARK-16808][Core] History Server main page does not ho...
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/15742 It doesn't merge cleanly into branch-2.0. You can file a separate PR if you want it fixed there.
[GitHub] spark issue #15742: [SPARK-16808][Core] History Server main page does not ho...
Github user mariobriggs commented on the issue: https://github.com/apache/spark/pull/15742 @vanzin since this is a regression bug in 2.0, is there any particular reason it was merged only to 2.1? I believe this should be in 2.0.2/3 as well.
[GitHub] spark issue #13706: [SPARK-15988] [SQL] Implement DDL commands: Create/Drop ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13706 **[Test build #68506 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68506/consoleFull)** for PR 13706 at commit [`e895a9c`](https://github.com/apache/spark/commit/e895a9c7b89d2a53f6747f1e7fa08f8e97b80ed4).
[GitHub] spark pull request #15652: [SPARK-16987] [None] Add spark-default.conf prope...
Github user hayashidac commented on a diff in the pull request: https://github.com/apache/spark/pull/15652#discussion_r87522936 --- Diff: mllib/src/main/scala/org/apache/spark/ml/tree/treeParams.scala ---
```diff
@@ -85,8 +85,10 @@ private[ml] trait DecisionTreeParams extends PredictorParams
    * (default = 256 MB)
    * @group expertParam
    */
-  final val maxMemoryInMB: IntParam = new IntParam(this, "maxMemoryInMB",
-    "Maximum memory in MB allocated to histogram aggregation.",
+  final val maxMemoryInMB: IntParam = new IntParam(this, "maxMemoryInMB", "Maximum memory in MB" +
```
--- End diff -- I rebased to remove the unrelated changes. Please confirm.
[GitHub] spark issue #15790: [SPARK-18264][SPARKR] build vignettes with package, upda...
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/15790 I think https://github.com/apache/spark/blob/master/dev/make-distribution.sh should change too, but I'm not 100% sure how the R package is built there.
[GitHub] spark issue #13706: [SPARK-15988] [SQL] Implement DDL commands: Create/Drop ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13706 **[Test build #68505 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68505/consoleFull)** for PR 13706 at commit [`fb8b57a`](https://github.com/apache/spark/commit/fb8b57a4d46f6856dc2c883c6e995c248dda6a3b).
[GitHub] spark issue #15846: [CORE][Minor]:remove unused import in SparkContext.scala
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15846 Merged build finished. Test PASSed.
[GitHub] spark issue #15846: [CORE][Minor]:remove unused import in SparkContext.scala
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15846 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/68494/ Test PASSed.
[GitHub] spark issue #15846: [CORE][Minor]:remove unused import in SparkContext.scala
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15846 **[Test build #68494 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68494/consoleFull)** for PR 15846 at commit [`f14fba5`](https://github.com/apache/spark/commit/f14fba5bd676f48ab4936a8181f48253b7cbfc40). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #13706: [SPARK-15988] [SQL] Implement DDL commands: Create/Drop ...
Github user lianhuiwang commented on the issue: https://github.com/apache/spark/pull/13706 @hvanhovell I have updated this PR. Can you take a look? Thanks.
[GitHub] spark issue #11122: [SPARK-13027][STREAMING] Added batch time as a parameter...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/11122 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/68501/ Test PASSed.
[GitHub] spark issue #11122: [SPARK-13027][STREAMING] Added batch time as a parameter...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/11122 Merged build finished. Test PASSed.
[GitHub] spark issue #11122: [SPARK-13027][STREAMING] Added batch time as a parameter...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/11122 **[Test build #68501 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68501/consoleFull)** for PR 11122 at commit [`fe68b6c`](https://github.com/apache/spark/commit/fe68b6c03300a37799ccaad2ee554bde005c8f6f). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #15842: [SPARK-18401][SPARKR][ML] SparkR random forest sh...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/15842
[GitHub] spark issue #14612: [SPARK-16803] [SQL] SaveAsTable does not work when sourc...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14612 Merged build finished. Test PASSed.
[GitHub] spark issue #14612: [SPARK-16803] [SQL] SaveAsTable does not work when sourc...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14612 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/68495/ Test PASSed.
[GitHub] spark issue #15842: [SPARK-18401][SPARKR][ML] SparkR random forest should su...
Github user yanboliang commented on the issue: https://github.com/apache/spark/pull/15842 Merged into master and branch-2.1. Thanks for reviewing.
[GitHub] spark issue #14612: [SPARK-16803] [SQL] SaveAsTable does not work when sourc...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14612 **[Test build #68495 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68495/consoleFull)** for PR 14612 at commit [`3035cfe`](https://github.com/apache/spark/commit/3035cfe6c05eb74c36412c833c89864bc2126a63). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #15814: [SPARK-18185] Fix all forms of INSERT / OVERWRITE...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/15814
[GitHub] spark issue #15702: [SPARK-18124] Observed delay based Event Time Watermarks
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15702 **[Test build #68504 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68504/consoleFull)** for PR 15702 at commit [`de601bb`](https://github.com/apache/spark/commit/de601bb4fdb9e5a45bddefa31de38bbb7fc2570f).
[GitHub] spark issue #15814: [SPARK-18185] Fix all forms of INSERT / OVERWRITE TABLE ...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/15814 Merging in master/branch-2.1. Thanks.
[GitHub] spark pull request #15847: [SPARK-18387] [SQL] Add serialization to checkEva...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/15847#discussion_r87519874 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/mathExpressions.scala ---
```diff
@@ -36,7 +36,7 @@ import org.apache.spark.unsafe.types.UTF8String
  * @param name The short name of the function
  */
 abstract class LeafMathExpression(c: Double, name: String)
-  extends LeafExpression with CodegenFallback {
+  extends LeafExpression with CodegenFallback with Serializable {
```
--- End diff -- Does this actually matter? All expressions should be case classes, which means they are already serializable.
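[Editorial note: rxin's point rests on the fact that Scala case classes extend `scala.Serializable` automatically, so the explicit mixin is redundant for them. A minimal self-contained sketch, with a hypothetical `AddExpr` standing in for a real expression node:]

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}

// Hypothetical toy expression node: note there is no `with Serializable` anywhere.
case class AddExpr(left: Int, right: Int)

// Round-trip any reference value through Java serialization.
def roundTrip[T <: AnyRef](value: T): T = {
  val buf = new ByteArrayOutputStream()
  val oos = new ObjectOutputStream(buf)
  oos.writeObject(value)
  oos.close()
  new ObjectInputStream(new ByteArrayInputStream(buf.toByteArray))
    .readObject().asInstanceOf[T]
}

// Case classes are serializable by default, so this succeeds:
// roundTrip(AddExpr(1, 2)) == AddExpr(1, 2)
```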
[GitHub] spark pull request #15847: [SPARK-18387] [SQL] Add serialization to checkEva...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/15847#discussion_r87519832 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala ---
```diff
@@ -1431,43 +1431,49 @@ case class FormatNumber(x: Expression, d: Expression)
   // Associated with the pattern, for the last d value, and we will update the
   // pattern (DecimalFormat) once the new coming d value differs from the last one.
+  // This is an Option to distinguish between 0 (numberFormat is valid) and uninitialized after
+  // serialization (numberFormat has not been updated for dValue = 0).
   @transient
-  private var lastDValue: Int = -100
+  private var lastDValue: Option[Int] = None
```
--- End diff -- Any perf penalty to doing it this way?
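[Editorial note: the motivation for the `Option` is that a `@transient` field is not restored on deserialization — a transient `Int` comes back as the JVM default `0`, not as its initializer `-100` — so a legitimate cached `dValue` of 0 becomes indistinguishable from "never computed". A minimal demonstration of the pitfall, with a hypothetical `Cached` class standing in for `FormatNumber`:]

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}

// Hypothetical stand-in for FormatNumber's cached state; not the Spark class.
class Cached extends Serializable {
  @transient var lastD: Int = -100 // sentinel meaning "never computed"
}

def roundTrip[T <: AnyRef](value: T): T = {
  val buf = new ByteArrayOutputStream()
  val oos = new ObjectOutputStream(buf)
  oos.writeObject(value)
  oos.close()
  new ObjectInputStream(new ByteArrayInputStream(buf.toByteArray))
    .readObject().asInstanceOf[T]
}

val c = new Cached
c.lastD = 0 // a real, valid cached value of 0

// Serialization skips the transient field, and field initializers do not
// re-run on deserialization, so lastD comes back as the JVM default 0 --
// indistinguishable from the valid cached 0. Wrapping in Option (which
// deserializes to null, a testable state) restores the distinction.
// roundTrip(c).lastD == 0
```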
[GitHub] spark issue #15702: [SPARK-18124] Observed delay based Event Time Watermarks
Github user marmbrus commented on the issue: https://github.com/apache/spark/pull/15702 jenkins, test this please
[GitHub] spark issue #15702: [SPARK-18124] Observed delay based Event Time Watermarks
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15702 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/68496/ Test FAILed.
[GitHub] spark issue #15702: [SPARK-18124] Observed delay based Event Time Watermarks
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15702 Merged build finished. Test FAILed.
[GitHub] spark issue #15702: [SPARK-18124] Observed delay based Event Time Watermarks
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15702 **[Test build #68496 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68496/consoleFull)** for PR 15702 at commit [`de601bb`](https://github.com/apache/spark/commit/de601bb4fdb9e5a45bddefa31de38bbb7fc2570f). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #15704: [SPARK-17732][SQL] ALTER TABLE DROP PARTITION should sup...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15704 **[Test build #68503 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68503/consoleFull)** for PR 15704 at commit [`c9e7c06`](https://github.com/apache/spark/commit/c9e7c069b9a5c429a2dd73d3f542ddd045e3b876).
[GitHub] spark pull request #15593: [SPARK-18060][ML] Avoid unnecessary computation f...
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/15593#discussion_r87518789

--- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala ---
@@ -1486,57 +1504,65 @@ private class LogisticAggregator(
       var marginOfLabel = 0.0
       var maxMargin = Double.NegativeInfinity
 
-      val margins = Array.tabulate(numClasses) { i =>
-        var margin = 0.0
-        features.foreachActive { (index, value) =>
-          if (localFeaturesStd(index) != 0.0 && value != 0.0) {
-            margin += localCoefficients(i * numFeaturesPlusIntercept + index) *
-              value / localFeaturesStd(index)
-          }
+      val margins = new Array[Double](numClasses)
+      features.foreachActive { (index, value) =>
+        val stdValue = value / localFeaturesStd(index)
+        var j = 0
+        while (j < numClasses) {
+          margins(j) += localCoefficients(index * numClasses + j) * stdValue
+          j += 1
         }
-
+      }
+      var i = 0
+      while (i < numClasses) {
         if (fitIntercept) {
-          margin += localCoefficients(i * numFeaturesPlusIntercept + numFeatures)
+          margins(i) += localCoefficients(numClasses * numFeatures + i)
         }
-        if (i == label.toInt) marginOfLabel = margin
-        if (margin > maxMargin) {
-          maxMargin = margin
+        if (i == label.toInt) marginOfLabel = margins(i)
+        if (margins(i) > maxMargin) {
+          maxMargin = margins(i)
         }
-        margin
+        i += 1
       }
 
       /**
        * When maxMargin > 0, the original formula could cause overflow.
        * We address this by subtracting maxMargin from all the margins, so it's guaranteed
       * that all of the new margins will be smaller than zero to prevent arithmetic overflow.
        */
+      val multipliers = new Array[Double](numClasses)
       val sum = {
         var temp = 0.0
-        if (maxMargin > 0) {
-          for (i <- 0 until numClasses) {
-            margins(i) -= maxMargin
-            temp += math.exp(margins(i))
-          }
-        } else {
-          for (i <- 0 until numClasses) {
-            temp += math.exp(margins(i))
-          }
+        var i = 0
+        while (i < numClasses) {
+          if (maxMargin > 0) margins(i) -= maxMargin
+          val exp = math.exp(margins(i))
+          temp += exp
+          multipliers(i) = exp
+          i += 1
         }
         temp
       }
 
-      for (i <- 0 until numClasses) {
-        val multiplier = math.exp(margins(i)) / sum - {
-          if (label == i) 1.0 else 0.0
-        }
-        features.foreachActive { (index, value) =>
-          if (localFeaturesStd(index) != 0.0 && value != 0.0) {
-            localGradientArray(i * numFeaturesPlusIntercept + index) +=
-              weight * multiplier * value / localFeaturesStd(index)
+      margins.indices.foreach { i =>
+        multipliers(i) = multipliers(i) / sum - (if (label == i) 1.0 else 0.0)
+      }
+      features.foreachActive { (index, value) =>
+        if (localFeaturesStd(index) != 0.0 && value != 0.0) {
+          val stdValue = value / localFeaturesStd(index)
+          var j = 0
+          while (j < numClasses) {
+            localGradientArray(index * numClasses + j) +=
+              weight * multipliers(j) * stdValue
+            j += 1
           }
         }
-        if (fitIntercept) {
-          localGradientArray(i * numFeaturesPlusIntercept + numFeatures) += weight * multiplier
+      }
+      if (fitIntercept) {
+        var i = 0
+        while (i < numClasses) {
+          localGradientArray(numFeatures * numClasses + i) += weight * multipliers(i)
+          i += 1
         }
       }
--- End diff --

You can make `def gradient: Vector` return a `Matrix` instead; for MLOR, the implementation can be a column-major matrix, so when we use `foreachActive` we don't need to worry about the underlying implementation.
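The key idea in the diff above is the column-major coefficient layout: the coefficient for class `j` and feature `index` lives at `index * numClasses + j`, so a single pass over the active features updates all class margins while each feature value is scaled by its standard deviation only once. A minimal standalone sketch (hypothetical names; dense arrays stand in for Spark's `Vector.foreachActive`; not the actual Spark implementation):

```scala
// Sketch of the column-major margin computation from the patch, assuming
// coefficients are laid out as numFeatures * numClasses entries followed
// by numClasses intercepts. Hypothetical, self-contained stand-in.
def colMajorMargins(
    features: Array[Double],
    featuresStd: Array[Double],
    coefficients: Array[Double], // numFeatures * numClasses, then intercepts
    numClasses: Int,
    fitIntercept: Boolean): Array[Double] = {
  val numFeatures = features.length
  val margins = new Array[Double](numClasses)
  var index = 0
  while (index < numFeatures) {
    val value = features(index)
    if (featuresStd(index) != 0.0 && value != 0.0) {
      val stdValue = value / featuresStd(index) // scale once per feature
      var j = 0
      while (j < numClasses) {
        // entry for (feature = index, class = j) in column-major layout
        margins(j) += coefficients(index * numClasses + j) * stdValue
        j += 1
      }
    }
    index += 1
  }
  if (fitIntercept) {
    // intercepts are stored after the numFeatures * numClasses block
    var i = 0
    while (i < numClasses) {
      margins(i) += coefficients(numClasses * numFeatures + i)
      i += 1
    }
  }
  margins
}
```

The same layout is what makes the reviewer's `Matrix` suggestion natural: a column-major gradient matrix lets `foreachActive` callers stay agnostic of the flattened indexing.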
[GitHub] spark issue #15848: [SPARK-9487v2] Use the same num. worker threads in Scala...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15848 Can one of the admins verify this patch?