[GitHub] spark issue #21009: [SPARK-23905][SQL] Add UDF weekday
Github user jerryshao commented on the issue: https://github.com/apache/spark/pull/21009 Jenkins, test this please.
[GitHub] spark issue #21009: [SPARK-23905][SQL] Add UDF weekday
Github user jerryshao commented on the issue: https://github.com/apache/spark/pull/21009 Jenkins, add to whitelist.
[GitHub] spark issue #20952: [SPARK-6951][core] Speed up parsing of event logs during...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20952 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89076/ Test PASSed.
[GitHub] spark issue #20952: [SPARK-6951][core] Speed up parsing of event logs during...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20952 Merged build finished. Test PASSed.
[GitHub] spark issue #20952: [SPARK-6951][core] Speed up parsing of event logs during...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20952 **[Test build #89076 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89076/testReport)** for PR 20952 at commit [`3a5563b`](https://github.com/apache/spark/commit/3a5563bec7f5f4063b1d32c0af653b13b54186c2). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #21007: [SPARK-23942][PYTHON][SQL] Makes collect in PySpark as a...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21007 cc @cloud-fan and @viirya (from checking the history).
[GitHub] spark issue #20989: [SPARK-23529][K8s] Support mounting hostPath volumes for...
Github user madanadit commented on the issue: https://github.com/apache/spark/pull/20989 @foxish The failure seems to be because of `SparkRemoteFileTest` missing from branch-2.3 (only present in master branch). Which branch do you recommend targeting this PR towards?
[GitHub] spark issue #21001: [SPARK-19724][SQL][FOLLOW-UP]Check location of managed t...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21001 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2116/ Test PASSed.
[GitHub] spark issue #21001: [SPARK-19724][SQL][FOLLOW-UP]Check location of managed t...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21001 Merged build finished. Test PASSed.
[GitHub] spark issue #21001: [SPARK-19724][SQL][FOLLOW-UP]Check location of managed t...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21001 **[Test build #89080 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89080/testReport)** for PR 21001 at commit [`3f3391b`](https://github.com/apache/spark/commit/3f3391b28a7d1c8fed1df195d6b1a0e27f0a60c4).
[GitHub] spark issue #21015: [SPARK-23944][ML] Add the set method for the two LSHMode...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21015 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89079/ Test PASSed.
[GitHub] spark issue #21015: [SPARK-23944][ML] Add the set method for the two LSHMode...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21015 **[Test build #89079 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89079/testReport)** for PR 21015 at commit [`9f16ea6`](https://github.com/apache/spark/commit/9f16ea6f15572a8d189fe537844487abdea797b4). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #21015: [SPARK-23944][ML] Add the set method for the two LSHMode...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21015 Merged build finished. Test PASSed.
[GitHub] spark issue #21005: [SPARK-23898][SQL] Simplify add & subtract code generati...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21005 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89073/ Test PASSed.
[GitHub] spark issue #21005: [SPARK-23898][SQL] Simplify add & subtract code generati...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21005 Merged build finished. Test PASSed.
[GitHub] spark issue #21005: [SPARK-23898][SQL] Simplify add & subtract code generati...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21005 **[Test build #89073 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89073/testReport)** for PR 21005 at commit [`433`](https://github.com/apache/spark/commit/43314b1d443fac5ca27ecef80677dbe70ab7). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #20925: [SPARK-22941][core] Do not exit JVM when submit fails wi...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20925 Merged build finished. Test FAILed.
[GitHub] spark issue #20925: [SPARK-22941][core] Do not exit JVM when submit fails wi...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20925 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89074/ Test FAILed.
[GitHub] spark issue #20925: [SPARK-22941][core] Do not exit JVM when submit fails wi...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20925 **[Test build #89074 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89074/testReport)** for PR 20925 at commit [`262bad8`](https://github.com/apache/spark/commit/262bad88a6d4d6c2513d6da3b2b52e86cd3f5b70). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #21015: [SPARK-23944][ML] Add the set method for the two LSHMode...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21015 **[Test build #89079 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89079/testReport)** for PR 21015 at commit [`9f16ea6`](https://github.com/apache/spark/commit/9f16ea6f15572a8d189fe537844487abdea797b4).
[GitHub] spark issue #21015: [SPARK-23944][ML] Add the set method for the two LSHMode...
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/21015 add to whitelist
[GitHub] spark issue #20992: [SPARK-23779][SQL] TaskMemoryManager and UnsafeSorter re...
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/20992 @kiszk I did not look at the PR in detail - I was curious about the commit message saying 'address review comment', but did not see any comments. Were they removed by the submitter? Or is GitHub acting up?
[GitHub] spark issue #20998: [SPARK-23888][CORE] speculative task should not run on a...
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/20998 Adding isRunning means a single 'bad' node (bad from the task's point of view - not necessarily bad hardware, just a node where the task fails) can cause tasks to fail repeatedly, making the app exit. Particularly with blacklisting, I am not very sure how the interactions will play out; @squito might have more comments. In general, this is not a benign change imo and can have non-trivial side effects. In the specific use case of only two machines, it is an unfortunate side effect.
[GitHub] spark issue #21016: [SPARK-23947][SQL] Add hashUTF8String convenience method...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21016 Merged build finished. Test PASSed.
[GitHub] spark issue #21016: [SPARK-23947][SQL] Add hashUTF8String convenience method...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21016 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2115/ Test PASSed.
[GitHub] spark issue #20043: [SPARK-22856][SQL] Add wrappers for codegen output and n...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/20043 Thanks! @cloud-fan @hvanhovell @kiszk @HyukjinKwon @mgaido91 @maropu @rednaxelafx @gatorsmile
[GitHub] spark issue #21016: [SPARK-23947][SQL] Add hashUTF8String convenience method...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21016 **[Test build #89078 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89078/testReport)** for PR 21016 at commit [`ead91a4`](https://github.com/apache/spark/commit/ead91a453fb6a3f92255d14d37c1d4a6beac22fd).
[GitHub] spark pull request #21016: [SPARK-23947][SQL] Add hashUTF8String convenience...
GitHub user rednaxelafx opened a pull request: https://github.com/apache/spark/pull/21016 [SPARK-23947][SQL] Add hashUTF8String convenience method to hasher classes

## What changes were proposed in this pull request?

Add `hashUTF8String()` to the hasher classes to allow Spark SQL codegen to generate cleaner code for hashing `UTF8String`s. No change in behavior otherwise. Although with the introduction of SPARK-10399, the code size for hashing `UTF8String` is already smaller, it's still good to extract a separate function in the hasher classes so that the generated code can stay clean.

## How was this patch tested?

Existing tests.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/rednaxelafx/apache-spark hashutf8

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/21016.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #21016

commit ead91a453fb6a3f92255d14d37c1d4a6beac22fd
Author: Kris Mok
Date: 2018-04-09T22:30:03Z
SPARK-23947 - Add hashUTF8String convenience method to hasher classes
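A minimal sketch of the kind of convenience method described above, delegating to Spark's existing unsafe-bytes Murmur3 hasher; the wrapper object and method placement here are illustrative, not the PR's actual code:

```scala
import org.apache.spark.unsafe.hash.Murmur3_x86_32
import org.apache.spark.unsafe.types.UTF8String

object HashUTF8StringSketch {
  // Hash a UTF8String's underlying bytes in one call, instead of spelling out
  // base object, offset, and length at every codegen call site.
  def hashUTF8String(s: UTF8String, seed: Int): Int =
    Murmur3_x86_32.hashUnsafeBytes(s.getBaseObject, s.getBaseOffset, s.numBytes(), seed)

  def main(args: Array[String]): Unit =
    println(hashUTF8String(UTF8String.fromString("spark"), 42))
}
```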
[GitHub] spark issue #21015: [SPARK-23944][ML] Add the set method for the two LSHMode...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21015 Can one of the admins verify this patch?
[GitHub] spark pull request #21015: [SPARK-23944][ML] Add the set method for the two ...
GitHub user ludatabricks opened a pull request: https://github.com/apache/spark/pull/21015 [SPARK-23944][ML] Add the set method for the two LSHModel

## What changes were proposed in this pull request?

Add two set methods for LSHModel in LSH.scala, BucketedRandomProjectionLSH.scala, and MinHashLSH.scala.

## How was this patch tested?

New tests for the param setup were added to:
- BucketedRandomProjectionLSHSuite.scala
- MinHashLSHSuite.scala

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ludatabricks/spark-1 SPARK-23944

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/21015.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #21015

commit 9f16ea6f15572a8d189fe537844487abdea797b4
Author: Lu WANG
Date: 2018-04-09T21:56:48Z
Add the set method for two LSHModels
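A short usage sketch of what the added setters enable, following the standard ml Param setter convention; `dataset` is assumed to be a DataFrame with a vector column named "features":

```scala
import org.apache.spark.ml.feature.MinHashLSH

// Estimator-side setters already exist; the PR adds the same pattern to the
// fitted models, i.e. methods like: def setInputCol(value: String): this.type
val lsh = new MinHashLSH()
  .setNumHashTables(3)
  .setInputCol("features")
  .setOutputCol("hashes")
val model = lsh.fit(dataset)      // 'dataset' assumed as described above
model.setOutputCol("minhashes")   // model-side setter this PR introduces
```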
[GitHub] spark issue #19881: [SPARK-22683][CORE] Add a fullExecutorAllocationDivisor ...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/19881 SGTM on divisor. Do we need "full" there in the config?
[GitHub] spark issue #21014: [SPARK-23941][Mesos] Mesos task failed on specific spark...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21014 Merged build finished. Test PASSed.
[GitHub] spark issue #21014: [SPARK-23941][Mesos] Mesos task failed on specific spark...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21014 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89077/ Test PASSed.
[GitHub] spark issue #21014: [SPARK-23941][Mesos] Mesos task failed on specific spark...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21014 **[Test build #89077 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89077/testReport)** for PR 21014 at commit [`fb078eb`](https://github.com/apache/spark/commit/fb078eb5b58dc27e4a59efb14ce740f625681bf0). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #20904: [SPARK-23751][ML][PySpark] Kolmogorov-Smirnoff te...
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/20904#discussion_r180228711 --- Diff: python/pyspark/ml/stat.py --- @@ -127,13 +113,86 @@ class Correlation(object): def corr(dataset, column, method="pearson"): """ Compute the correlation matrix with specified method using dataset. + +:param dataset: + A Dataset or a DataFrame. +:param column: + The name of the column of vectors for which the correlation coefficient needs + to be computed. This must be a column of the dataset, and it must contain + Vector objects. +:param method: + String specifying the method to use for computing correlation. + Supported: `pearson` (default), `spearman`. +:return: + A DataFrame that contains the correlation matrix of the column of vectors. This + DataFrame contains a single row and a single column of name + '$METHODNAME($COLUMN)'. """ sc = SparkContext._active_spark_context javaCorrObj = _jvm().org.apache.spark.ml.stat.Correlation args = [_py2java(sc, arg) for arg in (dataset, column, method)] return _java2py(sc, javaCorrObj.corr(*args)) +class KolmogorovSmirnovTest(object): +""" +.. note:: Experimental + +Conduct the two-sided Kolmogorov Smirnov (KS) test for data sampled from a continuous +distribution. + +By comparing the largest difference between the empirical cumulative +distribution of the sample data and the theoretical distribution we can provide a test for the +the null hypothesis that the sample data comes from that theoretical distribution. + +>>> from pyspark.ml.stat import KolmogorovSmirnovTest --- End diff -- Thanks for moving the method-specific documentation. These doctests are method-specific too, though, so can you please move them as well?
[GitHub] spark pull request #20904: [SPARK-23751][ML][PySpark] Kolmogorov-Smirnoff te...
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/20904#discussion_r180245120 --- Diff: python/pyspark/ml/stat.py --- @@ -127,13 +113,86 @@ class Correlation(object): def corr(dataset, column, method="pearson"): """ Compute the correlation matrix with specified method using dataset. + +:param dataset: + A Dataset or a DataFrame. +:param column: + The name of the column of vectors for which the correlation coefficient needs + to be computed. This must be a column of the dataset, and it must contain + Vector objects. +:param method: + String specifying the method to use for computing correlation. + Supported: `pearson` (default), `spearman`. +:return: + A DataFrame that contains the correlation matrix of the column of vectors. This + DataFrame contains a single row and a single column of name + '$METHODNAME($COLUMN)'. """ sc = SparkContext._active_spark_context javaCorrObj = _jvm().org.apache.spark.ml.stat.Correlation args = [_py2java(sc, arg) for arg in (dataset, column, method)] return _java2py(sc, javaCorrObj.corr(*args)) +class KolmogorovSmirnovTest(object): +""" +.. note:: Experimental + +Conduct the two-sided Kolmogorov Smirnov (KS) test for data sampled from a continuous +distribution. + +By comparing the largest difference between the empirical cumulative +distribution of the sample data and the theoretical distribution we can provide a test for the +the null hypothesis that the sample data comes from that theoretical distribution. + +>>> from pyspark.ml.stat import KolmogorovSmirnovTest +>>> dataset = [[-1.0], [0.0], [1.0]] +>>> dataset = spark.createDataFrame(dataset, ['sample']) +>>> ksResult = KolmogorovSmirnovTest.test(dataset, 'sample', 'norm', 0.0, 1.0).first() +>>> round(ksResult.pValue, 3) +1.0 +>>> round(ksResult.statistic, 3) +0.175 +>>> dataset = [[2.0], [3.0], [4.0]] +>>> dataset = spark.createDataFrame(dataset, ['sample']) +>>> ksResult = KolmogorovSmirnovTest.test(dataset, 'sample', 'norm', 3.0, 1.0).first() +>>> round(ksResult.pValue, 3) +1.0 +>>> round(ksResult.statistic, 3) +0.175 + +.. versionadded:: 2.4.0 + +""" +@staticmethod +@since("2.4.0") +def test(dataset, sampleCol, distName, *params): +""" +Perform a Kolmogorov-Smirnov test using dataset. --- End diff -- Can you please make this match the text in the Scala doc?
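For reference, a hedged sketch of the Scala-side usage the review refers to, with the signature inferred from the PySpark wrapper quoted above; `spark` is an assumed active SparkSession:

```scala
import org.apache.spark.ml.stat.KolmogorovSmirnovTest

// Two-sided KS test of the "sample" column against a standard normal distribution.
val dataset = spark.createDataFrame(Seq(Tuple1(-1.0), Tuple1(0.0), Tuple1(1.0))).toDF("sample")
val result = KolmogorovSmirnovTest.test(dataset, "sample", "norm", 0.0, 1.0).first()
println(result.getAs[Double]("pValue"))     // ~1.0, as in the doctest above
println(result.getAs[Double]("statistic"))  // ~0.175, as in the doctest above
```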
[GitHub] spark issue #21001: [SPARK-19724][SQL][FOLLOW-UP]Check location of managed t...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21001 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89071/ Test FAILed.
[GitHub] spark issue #21001: [SPARK-19724][SQL][FOLLOW-UP]Check location of managed t...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21001 Merged build finished. Test FAILed.
[GitHub] spark issue #21001: [SPARK-19724][SQL][FOLLOW-UP]Check location of managed t...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21001 **[Test build #89071 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89071/testReport)** for PR 21001 at commit [`577a399`](https://github.com/apache/spark/commit/577a3996905fdb092f3f4b1a7011d4a450eaed7c). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #21014: [SPARK-23941][Mesos] Mesos task failed on specific spark...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21014 **[Test build #89077 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89077/testReport)** for PR 21014 at commit [`fb078eb`](https://github.com/apache/spark/commit/fb078eb5b58dc27e4a59efb14ce740f625681bf0).
[GitHub] spark issue #21014: [SPARK-23941][Mesos] Mesos task failed on specific spark...
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/21014 ok to test
[GitHub] spark pull request #20327: [SPARK-12963][CORE] NM host for driver end points
Github user gerashegalov closed the pull request at: https://github.com/apache/spark/pull/20327
[GitHub] spark issue #21014: [SPARK-23941][Mesos] Mesos task failed on specific spark...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21014 Can one of the admins verify this patch?
[GitHub] spark pull request #21014: [SPARK-23941][Mesos] Mesos task failed on specifi...
GitHub user tiboun opened a pull request: https://github.com/apache/spark/pull/21014 [SPARK-23941][Mesos] Mesos task failed on specific spark app name

## What changes were proposed in this pull request?

Shell-escape the name passed to spark-submit and change how conf attributes are shell-escaped.

## How was this patch tested?

This patch has been tested manually with Hive-on-Spark on Mesos, and with the use case described in the issue: the SparkPi application with a custom name containing illegal shell characters. With this PR, Hive-on-Spark on Mesos works like a charm with Hive 3.0.0-SNAPSHOT.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/tiboun/spark fix/SPARK-23941

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/21014.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #21014

commit fb078eb5b58dc27e4a59efb14ce740f625681bf0
Author: Bounkong Khamphousone
Date: 2018-04-09T13:11:55Z
fix call to spark-submit
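To illustrate the failure mode the PR addresses: an app name containing shell metacharacters breaks the spark-submit command line built by the Mesos scheduler unless it is escaped. A minimal single-quote escaping sketch (not the PR's actual code):

```scala
// POSIX single-quote escaping: wrap in ' and turn each embedded ' into '\''.
def shellEscape(value: String): String =
  "'" + value.replace("'", "'\\''") + "'"

// A Hive-on-Spark style app name with characters the shell would otherwise interpret.
println(shellEscape("Hive on Spark (sessionId = abc)"))
// prints: 'Hive on Spark (sessionId = abc)'
```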
[GitHub] spark issue #21008: [SPARK-23902][SQL] Add roundOff flag to months_between
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21008 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89070/ Test PASSed.
[GitHub] spark issue #21008: [SPARK-23902][SQL] Add roundOff flag to months_between
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21008 Merged build finished. Test PASSed.
[GitHub] spark issue #21008: [SPARK-23902][SQL] Add roundOff flag to months_between
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21008 **[Test build #89070 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89070/testReport)** for PR 21008 at commit [`4544dd4`](https://github.com/apache/spark/commit/4544dd49968a0b8cd3e9c855575951447bfd2e24). * This patch passes all tests. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `case class MonthsBetween(`
[GitHub] spark issue #20952: [SPARK-6951][core] Speed up parsing of event logs during...
Github user squito commented on the issue: https://github.com/apache/spark/pull/20952 lgtm assuming tests pass
[GitHub] spark pull request #20046: [SPARK-22362][SQL] Add unit test for Window Aggre...
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/20046#discussion_r171190560 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameWindowFunctionsSuite.scala --- @@ -33,6 +35,11 @@ import org.apache.spark.unsafe.types.CalendarInterval class DataFrameWindowFunctionsSuite extends QueryTest with SharedSQLContext { import testImplicits._ + private def sortWrappedArrayInRow(d: DataFrame) = d.map { --- End diff -- You can also just use the `array_sort` function. That is probably a lot cheaper.
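A sketch of that alternative, sorting inside the query plan instead of in a typed `map`; `df` is assumed to have columns `key: string` and `unsorted: array<string>`:

```scala
import org.apache.spark.sql.functions.{array_sort, col}

// array_sort orders the array elements within the plan, avoiding
// deserialization into Scala collections entirely.
val sorted = df.select(col("key"), array_sort(col("unsorted")).as("sorted"))
```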
[GitHub] spark pull request #20046: [SPARK-22362][SQL] Add unit test for Window Aggre...
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/20046#discussion_r171190314 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameWindowFunctionsSuite.scala --- @@ -33,6 +35,11 @@ import org.apache.spark.unsafe.types.CalendarInterval class DataFrameWindowFunctionsSuite extends QueryTest with SharedSQLContext { import testImplicits._ + private def sortWrappedArrayInRow(d: DataFrame) = d.map { + case Row(key: String, unsorted: mutable.WrappedArray[String]) => --- End diff -- Let's not pattern match against `mutable.WrappedArray` and use `Seq` instead. `mutable.WrappedArray` is pretty much an implementation detail, and pattern matching against it is brittle.
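The reviewer's point in isolation, as a self-contained sketch: a `WrappedArray` is a `Seq`, so matching on `Seq` keeps the test independent of the collection implementation:

```scala
import scala.collection.mutable

object SeqMatchSketch extends App {
  // Rows hand arrays back as mutable.WrappedArray, but that is an implementation
  // detail; the stable contract is Seq, so pattern match on that instead.
  val fromRow: Any = mutable.WrappedArray.make[String](Array("b", "a"))
  fromRow match {
    case s: Seq[String @unchecked] => println(s.sorted.mkString(", "))  // a, b
  }
}
```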
[GitHub] spark issue #21001: [SPARK-19724][SQL][FOLLOW-UP]Check location of managed t...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21001 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89072/ Test FAILed.
[GitHub] spark issue #21001: [SPARK-19724][SQL][FOLLOW-UP]Check location of managed t...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21001 Merged build finished. Test FAILed.
[GitHub] spark issue #21001: [SPARK-19724][SQL][FOLLOW-UP]Check location of managed t...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21001 **[Test build #89072 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89072/testReport)** for PR 21001 at commit [`1e27cc8`](https://github.com/apache/spark/commit/1e27cc87fb99822f97b425573b3b6798bfbc5418). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #20952: [SPARK-6951][core] Speed up parsing of event logs during...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20952 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2114/ Test PASSed.
[GitHub] spark issue #20952: [SPARK-6951][core] Speed up parsing of event logs during...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20952 Merged build finished. Test PASSed.
[GitHub] spark issue #20327: [SPARK-12963][CORE] NM host for driver end points
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/20327 I think you forgot to actually hit the "close" button...
[GitHub] spark issue #20952: [SPARK-6951][core] Speed up parsing of event logs during...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20952 **[Test build #89076 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89076/testReport)** for PR 20952 at commit [`3a5563b`](https://github.com/apache/spark/commit/3a5563bec7f5f4063b1d32c0af653b13b54186c2).
[GitHub] spark pull request #20792: Branch 2.1
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/20792
[GitHub] spark pull request #20957: Branch 2.3
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/20957
[GitHub] spark issue #21013: [WIP][SPARK-23874][SQL][PYTHON] Upgrade Arrow and pyarro...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21013 Merged build finished. Test FAILed.
[GitHub] spark issue #21013: [WIP][SPARK-23874][SQL][PYTHON] Upgrade Arrow and pyarro...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21013 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89075/ Test FAILed.
[GitHub] spark issue #21013: [WIP][SPARK-23874][SQL][PYTHON] Upgrade Arrow and pyarro...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21013 **[Test build #89075 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89075/testReport)** for PR 21013 at commit [`c1791bf`](https://github.com/apache/spark/commit/c1791bf9f0cdd8074a19130e05be52d60f1e618c). * This patch **fails build dependency tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #21013: [WIP][SPARK-23874][SQL][PYTHON] Upgrade Arrow and pyarro...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21013 Merged build finished. Test PASSed.
[GitHub] spark issue #21013: [WIP][SPARK-23874][SQL][PYTHON] Upgrade Arrow and pyarro...
Github user BryanCutler commented on the issue: https://github.com/apache/spark/pull/21013 The code changes are pretty much done, but I hit a regression in Python regarding conversion of `DecimalType` with `None` values. I filed https://issues.apache.org/jira/browse/ARROW-2432 to fix it. Putting this up as a WIP for now, but we might want to think about holding off on the upgrade.
[GitHub] spark issue #21013: [WIP][SPARK-23874][SQL][PYTHON] Upgrade Arrow and pyarro...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21013 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2113/ Test PASSed.
[GitHub] spark issue #21013: [WIP][SPARK-23874][SQL][PYTHON] Upgrade Arrow and pyarro...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21013 **[Test build #89075 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89075/testReport)** for PR 21013 at commit [`c1791bf`](https://github.com/apache/spark/commit/c1791bf9f0cdd8074a19130e05be52d60f1e618c).
[GitHub] spark pull request #21013: [WIP][SPARK-23874][SQL][PYTHON] Upgrade Arrow and...
GitHub user BryanCutler opened a pull request: https://github.com/apache/spark/pull/21013 [WIP][SPARK-23874][SQL][PYTHON] Upgrade Arrow and pyarrow to 0.9.0

## What changes were proposed in this pull request?

Upgrade Arrow to 0.9.0. This includes the Java jar and will require the Jenkins test environment to upgrade pyarrow on the Python 2 environments.

## How was this patch tested?

Existing tests.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/BryanCutler/spark arrow-upgrade-090

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/21013.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #21013

commit 255647589c8980465bbeb64788988468748814a8
Author: Bryan Cutler
Date: 2018-04-09T19:26:06Z
made required code changes for upgrade

commit c1791bf9f0cdd8074a19130e05be52d60f1e618c
Author: Bryan Cutler
Date: 2018-04-09T19:26:54Z
remove unused import
[GitHub] spark issue #20925: [SPARK-22941][core] Do not exit JVM when submit fails wi...
Github user squito commented on the issue: https://github.com/apache/spark/pull/20925 lgtm assuming tests pass
[GitHub] spark pull request #20997: [SPARK-19185] [DSTREAMS] Avoid concurrent use of ...
Github user squito commented on a diff in the pull request: https://github.com/apache/spark/pull/20997#discussion_r180220465 --- Diff: external/kafka-0-10/src/main/scala/org/apache/spark/streaming/kafka010/KafkaDataConsumer.scala --- @@ -0,0 +1,381 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.streaming.kafka010 + +import java.{util => ju} + +import scala.collection.JavaConverters._ + +import org.apache.kafka.clients.consumer.{ConsumerConfig, ConsumerRecord, KafkaConsumer} +import org.apache.kafka.common.{KafkaException, TopicPartition} + +import org.apache.spark.TaskContext +import org.apache.spark.internal.Logging + +private[kafka010] sealed trait KafkaDataConsumer[K, V] { + /** + * Get the record for the given offset if available. + * + * @param offset the offset to fetch. + * @param pollTimeoutMs timeout in milliseconds to poll data from Kafka. + */ + def get(offset: Long, pollTimeoutMs: Long): ConsumerRecord[K, V] = { +internalConsumer.get(offset, pollTimeoutMs) + } + + /** + * Start a batch on a compacted topic + * + * @param offset the offset to fetch. + * @param pollTimeoutMs timeout in milliseconds to poll data from Kafka. + */ + def compactedStart(offset: Long, pollTimeoutMs: Long): Unit = { +internalConsumer.compactedStart(offset, pollTimeoutMs) + } + + /** + * Get the next record in the batch from a compacted topic. + * Assumes compactedStart has been called first, and ignores gaps. + * + * @param pollTimeoutMs timeout in milliseconds to poll data from Kafka. + */ + def compactedNext(pollTimeoutMs: Long): ConsumerRecord[K, V] = { +internalConsumer.compactedNext(pollTimeoutMs) + } + + /** + * Rewind to previous record in the batch from a compacted topic. + * + * @throws NoSuchElementException if no previous element + */ + def compactedPrevious(): ConsumerRecord[K, V] = { +internalConsumer.compactedPrevious() + } + + /** + * Release this consumer from being further used. Depending on its implementation, + * this consumer will be either finalized, or reset for reuse later. + */ + def release(): Unit + + /** Reference to the internal implementation that this wrapper delegates to */ + protected def internalConsumer: InternalKafkaConsumer[K, V] +} + + +/** + * A wrapper around Kafka's KafkaConsumer. + * This is not for direct use outside this file. 
+ */ +private[kafka010] +class InternalKafkaConsumer[K, V]( + val groupId: String, + val topicPartition: TopicPartition, + val kafkaParams: ju.Map[String, Object]) extends Logging { + + require(groupId == kafkaParams.get(ConsumerConfig.GROUP_ID_CONFIG), +"groupId used for cache key must match the groupId in kafkaParams") + + @volatile private var consumer = createConsumer + + /** indicates whether this consumer is in use or not */ + @volatile var inUse = true + + /** indicate whether this consumer is going to be stopped in the next release */ + @volatile var markedForClose = false + + // TODO if the buffer was kept around as a random-access structure, + // could possibly optimize re-calculating of an RDD in the same batch + @volatile private var buffer = ju.Collections.emptyListIterator[ConsumerRecord[K, V]]() + @volatile private var nextOffset = InternalKafkaConsumer.UNKNOWN_OFFSET + + override def toString: String = { +"InternalKafkaConsumer(" + + s"hash=${Integer.toHexString(hashCode)}, " + + s"groupId=$groupId, " + + s"topicPartition=$topicPartition)" + } + + /** Create a KafkaConsumer to fetch records for `topicPartition` */ + private def createConsumer: KafkaConsumer[K, V] = { +val c = new KafkaConsumer[K, V](kafkaParams) +val tps = new ju.Ar
[GitHub] spark pull request #20997: [SPARK-19185] [DSTREAMS] Avoid concurrent use of ...
Github user squito commented on a diff in the pull request: https://github.com/apache/spark/pull/20997#discussion_r180212631 --- Diff: external/kafka-0-10/src/main/scala/org/apache/spark/streaming/kafka010/KafkaDataConsumer.scala --- @@ -0,0 +1,381 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.streaming.kafka010 + +import java.{util => ju} + +import scala.collection.JavaConverters._ + +import org.apache.kafka.clients.consumer.{ConsumerConfig, ConsumerRecord, KafkaConsumer} +import org.apache.kafka.common.{KafkaException, TopicPartition} + +import org.apache.spark.TaskContext +import org.apache.spark.internal.Logging + +private[kafka010] sealed trait KafkaDataConsumer[K, V] { + /** + * Get the record for the given offset if available. + * + * @param offset the offset to fetch. + * @param pollTimeoutMs timeout in milliseconds to poll data from Kafka. + */ + def get(offset: Long, pollTimeoutMs: Long): ConsumerRecord[K, V] = { +internalConsumer.get(offset, pollTimeoutMs) + } + + /** + * Start a batch on a compacted topic + * + * @param offset the offset to fetch. + * @param pollTimeoutMs timeout in milliseconds to poll data from Kafka. + */ + def compactedStart(offset: Long, pollTimeoutMs: Long): Unit = { +internalConsumer.compactedStart(offset, pollTimeoutMs) + } + + /** + * Get the next record in the batch from a compacted topic. + * Assumes compactedStart has been called first, and ignores gaps. + * + * @param pollTimeoutMs timeout in milliseconds to poll data from Kafka. + */ + def compactedNext(pollTimeoutMs: Long): ConsumerRecord[K, V] = { +internalConsumer.compactedNext(pollTimeoutMs) + } + + /** + * Rewind to previous record in the batch from a compacted topic. + * + * @throws NoSuchElementException if no previous element + */ + def compactedPrevious(): ConsumerRecord[K, V] = { +internalConsumer.compactedPrevious() + } + + /** + * Release this consumer from being further used. Depending on its implementation, + * this consumer will be either finalized, or reset for reuse later. + */ + def release(): Unit + + /** Reference to the internal implementation that this wrapper delegates to */ + protected def internalConsumer: InternalKafkaConsumer[K, V] +} + + +/** + * A wrapper around Kafka's KafkaConsumer. + * This is not for direct use outside this file. 
+ */ +private[kafka010] +class InternalKafkaConsumer[K, V]( + val groupId: String, + val topicPartition: TopicPartition, + val kafkaParams: ju.Map[String, Object]) extends Logging { + + require(groupId == kafkaParams.get(ConsumerConfig.GROUP_ID_CONFIG), +"groupId used for cache key must match the groupId in kafkaParams") + + @volatile private var consumer = createConsumer + + /** indicates whether this consumer is in use or not */ + @volatile var inUse = true + + /** indicate whether this consumer is going to be stopped in the next release */ + @volatile var markedForClose = false + + // TODO if the buffer was kept around as a random-access structure, + // could possibly optimize re-calculating of an RDD in the same batch + @volatile private var buffer = ju.Collections.emptyListIterator[ConsumerRecord[K, V]]() + @volatile private var nextOffset = InternalKafkaConsumer.UNKNOWN_OFFSET + + override def toString: String = { +"InternalKafkaConsumer(" + + s"hash=${Integer.toHexString(hashCode)}, " + + s"groupId=$groupId, " + + s"topicPartition=$topicPartition)" + } + + /** Create a KafkaConsumer to fetch records for `topicPartition` */ + private def createConsumer: KafkaConsumer[K, V] = { +val c = new KafkaConsumer[K, V](kafkaParams) +val tps = new ju.Ar
[GitHub] spark pull request #20997: [SPARK-19185] [DSTREAMS] Avoid concurrent use of ...
Github user squito commented on a diff in the pull request: https://github.com/apache/spark/pull/20997#discussion_r180222555 --- Diff: external/kafka-0-10/src/test/scala/org/apache/spark/streaming/kafka010/KafkaDataConsumerSuite.scala --- @@ -0,0 +1,111 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.streaming.kafka010 + +import java.util.concurrent.{Executors, TimeUnit} + +import scala.collection.JavaConverters._ +import scala.util.Random + +import org.apache.kafka.clients.consumer.ConsumerConfig._ +import org.apache.kafka.common.TopicPartition +import org.apache.kafka.common.serialization.ByteArrayDeserializer +import org.scalatest.BeforeAndAfterAll + +import org.apache.spark._ + +class KafkaDataConsumerSuite extends SparkFunSuite with BeforeAndAfterAll { + + private var testUtils: KafkaTestUtils = _ + + override def beforeAll { +super.beforeAll() +testUtils = new KafkaTestUtils +testUtils.setup() + } + + override def afterAll { +if (testUtils != null) { + testUtils.teardown() + testUtils = null +} +super.afterAll() + } + + test("concurrent use of KafkaDataConsumer") { --- End diff -- this is good, but it would be nice to have a test which checks that cached consumers are re-used when possible. Eg this could pass just by never caching anything.
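A hedged sketch of the kind of assertion being suggested, reusing the suite's acquire/release API quoted in these review comments; `underlying` is a hypothetical test hook, since the trait keeps its delegate `protected`, so a real test would need package-private visibility:

```scala
// Acquire with caching, release, then acquire the same key again: the second
// wrapper should delegate to the same InternalKafkaConsumer, not a new one.
val c1 = KafkaDataConsumer.acquire[Array[Byte], Array[Byte]](
  groupId, topicPartition, kafkaParams.asJava, /* taskContext = */ null, /* useCache = */ true)
c1.release()
val c2 = KafkaDataConsumer.acquire[Array[Byte], Array[Byte]](
  groupId, topicPartition, kafkaParams.asJava, /* taskContext = */ null, /* useCache = */ true)
assert(c1.underlying eq c2.underlying)  // 'underlying' is a hypothetical accessor
```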
[GitHub] spark pull request #20997: [SPARK-19185] [DSTREAMS] Avoid concurrent use of ...
Github user squito commented on a diff in the pull request: https://github.com/apache/spark/pull/20997#discussion_r180221087 --- Diff: external/kafka-0-10/src/test/scala/org/apache/spark/streaming/kafka010/KafkaDataConsumerSuite.scala --- @@ -0,0 +1,111 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.streaming.kafka010 + +import java.util.concurrent.{Executors, TimeUnit} + +import scala.collection.JavaConverters._ +import scala.util.Random + +import org.apache.kafka.clients.consumer.ConsumerConfig._ +import org.apache.kafka.common.TopicPartition +import org.apache.kafka.common.serialization.ByteArrayDeserializer +import org.scalatest.BeforeAndAfterAll + +import org.apache.spark._ + +class KafkaDataConsumerSuite extends SparkFunSuite with BeforeAndAfterAll { + + private var testUtils: KafkaTestUtils = _ + + override def beforeAll { +super.beforeAll() +testUtils = new KafkaTestUtils +testUtils.setup() + } + + override def afterAll { +if (testUtils != null) { + testUtils.teardown() + testUtils = null +} +super.afterAll() + } + + test("concurrent use of KafkaDataConsumer") { +KafkaDataConsumer.init(16, 64, 0.75f) + +val topic = "topic" + Random.nextInt() +val data = (1 to 1000).map(_.toString) +val topicPartition = new TopicPartition(topic, 0) +testUtils.createTopic(topic) +testUtils.sendMessages(topic, data.toArray) + +val groupId = "groupId" +val kafkaParams = Map[String, Object]( + GROUP_ID_CONFIG -> groupId, + BOOTSTRAP_SERVERS_CONFIG -> testUtils.brokerAddress, + KEY_DESERIALIZER_CLASS_CONFIG -> classOf[ByteArrayDeserializer].getName, + VALUE_DESERIALIZER_CLASS_CONFIG -> classOf[ByteArrayDeserializer].getName, + AUTO_OFFSET_RESET_CONFIG -> "earliest", + ENABLE_AUTO_COMMIT_CONFIG -> "false" +) + +val numThreads = 100 +val numConsumerUsages = 500 + +@volatile var error: Throwable = null + +def consume(i: Int): Unit = { + val useCache = Random.nextBoolean + val taskContext = if (Random.nextBoolean) { +new TaskContextImpl(0, 0, 0, 0, attemptNumber = Random.nextInt(2), null, null, null) + } else { +null + } + val consumer = KafkaDataConsumer.acquire[Array[Byte], Array[Byte]]( +groupId, topicPartition, kafkaParams.asJava, taskContext, useCache) + try { +val rcvd = 0 until data.length map { offset => --- End diff -- style -- just by convention, ranges are an exception to the usual rule, they are wrapped with parens `(x until y).map`
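The cited convention, as a tiny standalone example:

```scala
val data = Seq("a", "b", "c")
// Parenthesize the range before chaining a method call, rather than relying
// on the unparenthesized infix form.
val indexed = (0 until data.length).map(i => s"$i:${data(i)}")
println(indexed.mkString(", "))  // 0:a, 1:b, 2:c
```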
[GitHub] spark issue #20828: [SPARK-23687][SS] Add a memory source for continuous pro...
Github user jose-torres commented on the issue: https://github.com/apache/spark/pull/20828 updated
[GitHub] spark issue #20925: [SPARK-22941][core] Do not exit JVM when submit fails wi...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20925 Merged build finished. Test PASSed.
[GitHub] spark issue #20925: [SPARK-22941][core] Do not exit JVM when submit fails wi...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20925 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2112/ Test PASSed.
[GitHub] spark issue #20925: [SPARK-22941][core] Do not exit JVM when submit fails wi...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20925 Merged build finished. Test PASSed.
[GitHub] spark issue #20925: [SPARK-22941][core] Do not exit JVM when submit fails wi...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20925 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2111/ Test PASSed.
[GitHub] spark issue #20535: [SPARK-23341][SQL] define some standard options for data...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20535 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89069/ Test FAILed.
[GitHub] spark issue #21005: [SPARK-23898][SQL] Simplify add & subtract code generati...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21005 Merged build finished. Test PASSed.
[GitHub] spark issue #21005: [SPARK-23898][SQL] Simplify add & subtract code generati...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21005 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2110/ Test PASSed.
[GitHub] spark issue #20535: [SPARK-23341][SQL] define some standard options for data...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20535 Merged build finished. Test FAILed.
[GitHub] spark issue #20535: [SPARK-23341][SQL] define some standard options for data...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20535 **[Test build #89069 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89069/testReport)** for PR 20535 at commit [`c811d72`](https://github.com/apache/spark/commit/c811d72f88552a30a985bdbb2c0005eddc56b5ff).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #21005: [SPARK-23898][SQL] Simplify add & subtract code generati...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21005 **[Test build #89073 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89073/testReport)** for PR 21005 at commit [`433`](https://github.com/apache/spark/commit/43314b1d443fac5ca27ecef80677dbe70ab7).
[GitHub] spark issue #20925: [SPARK-22941][core] Do not exit JVM when submit fails wi...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20925 **[Test build #89074 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89074/testReport)** for PR 20925 at commit [`262bad8`](https://github.com/apache/spark/commit/262bad88a6d4d6c2513d6da3b2b52e86cd3f5b70).
[GitHub] spark issue #20984: [SPARK-23875][SQL] Add IndexedSeq wrapper for ArrayData
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20984 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89068/ Test PASSed.
[GitHub] spark issue #20984: [SPARK-23875][SQL] Add IndexedSeq wrapper for ArrayData
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20984 Merged build finished. Test PASSed.
[GitHub] spark issue #20984: [SPARK-23875][SQL] Add IndexedSeq wrapper for ArrayData
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20984 **[Test build #89068 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89068/testReport)** for PR 20984 at commit [`962d048`](https://github.com/apache/spark/commit/962d048f8ad02003d83d67cd5105167ab11ac277).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #21005: [SPARK-23898][SQL] Simplify add & subtract code generati...
Github user hvanhovell commented on the issue: https://github.com/apache/spark/pull/21005 retest this please
[GitHub] spark issue #21011: [SPARK-23916][SQL] Add array_join function
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21011 Merged build finished. Test FAILed.
[GitHub] spark issue #21011: [SPARK-23916][SQL] Add array_join function
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21011 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89067/ Test FAILed.
[GitHub] spark issue #21011: [SPARK-23916][SQL] Add array_join function
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21011 **[Test build #89067 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89067/testReport)** for PR 21011 at commit [`1408cfc`](https://github.com/apache/spark/commit/1408cfcd8f0ad2a571d29b57d71128584ea4b4f0).
* This patch **fails PySpark unit tests**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `case class ArrayJoin(`
[GitHub] spark pull request #20971: [SPARK-23809][SQL][backport] Active SparkSession ...
Github user ericl closed the pull request at: https://github.com/apache/spark/pull/20971
[GitHub] spark pull request #20940: [SPARK-23429][CORE] Add executor memory metrics t...
Github user edwinalu commented on a diff in the pull request: https://github.com/apache/spark/pull/20940#discussion_r180204845

--- Diff: core/src/main/scala/org/apache/spark/executor/Executor.scala ---
@@ -772,6 +772,12 @@ private[spark] class Executor(
     val accumUpdates = new ArrayBuffer[(Long, Seq[AccumulatorV2[_, _]])]()
     val curGCTime = computeTotalGcTime()

+    // get executor level memory metrics
+    val executorUpdates = new ExecutorMetrics(System.currentTimeMillis(),
+      ManagementFactory.getMemoryMXBean.getHeapMemoryUsage().getUsed(),
--- End diff --

Thanks, I will play around with it a bit. If it seems more complicated or expensive, I'll file a separate subtask.
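As context for the metric sampled in the diff above, here is a minimal sketch of reading the same JVM numbers through the standard `java.lang.management` API (assumption: plain `MemoryMXBean` calls only; Spark's internal `ExecutorMetrics` wrapper is not reproduced here):

```scala
import java.lang.management.ManagementFactory

// Standalone sketch: snapshot heap and non-heap usage the way the
// executor-level metrics in the diff do, using only the JDK API.
object MemoryProbe extends App {
  val memoryBean = ManagementFactory.getMemoryMXBean
  val heapUsed = memoryBean.getHeapMemoryUsage.getUsed       // bytes of heap currently in use
  val nonHeapUsed = memoryBean.getNonHeapMemoryUsage.getUsed // bytes of non-heap (metaspace etc.)
  println(s"timestamp=${System.currentTimeMillis()} heapUsed=$heapUsed nonHeapUsed=$nonHeapUsed")
}
```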
[GitHub] spark pull request #20983: [SPARK-23747][Structured Streaming] Add EpochCoor...
Github user efimpoberezkin commented on a diff in the pull request: https://github.com/apache/spark/pull/20983#discussion_r180203421

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/streaming/continuous/EpochCoordinatorSuite.scala ---
@@ -0,0 +1,202 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.streaming.continuous
+
+import org.mockito.InOrder
+import org.mockito.Matchers.{any, eq => eqTo}
+import org.mockito.Mockito._
+import org.scalatest.BeforeAndAfterEach
+import org.scalatest.mockito.MockitoSugar
+
+import org.apache.spark._
+import org.apache.spark.rpc.RpcEndpointRef
+import org.apache.spark.sql.execution.streaming.continuous._
+import org.apache.spark.sql.sources.v2.reader.streaming.{ContinuousReader, PartitionOffset}
+import org.apache.spark.sql.sources.v2.writer.WriterCommitMessage
+import org.apache.spark.sql.sources.v2.writer.streaming.StreamWriter
+import org.apache.spark.sql.test.SharedSparkSession
+
+class EpochCoordinatorSuite
+  extends SparkFunSuite
+    with SharedSparkSession
+    with MockitoSugar
+    with BeforeAndAfterEach {
+
+  private var epochCoordinator: RpcEndpointRef = _
+
+  private var writer: StreamWriter = _
+  private var query: ContinuousExecution = _
+  private var orderVerifier: InOrder = _
+
+  private val startEpoch = 1L
+
+  override def beforeEach(): Unit = {
+    val reader = mock[ContinuousReader]
+    writer = mock[StreamWriter]
+    query = mock[ContinuousExecution]
+    orderVerifier = inOrder(writer, query)
+
+    epochCoordinator
+      = EpochCoordinatorRef.create(writer, reader, query, "test", startEpoch, spark, SparkEnv.get)
+  }
+
+  override def afterEach(): Unit = {
+    SparkEnv.get.rpcEnv.stop(epochCoordinator)
+  }
+
+  test("single epoch") {
+    setWriterPartitions(3)
+    setReaderPartitions(2)
+
+    commitPartitionEpoch(0, startEpoch)
+    commitPartitionEpoch(1, startEpoch)
+    commitPartitionEpoch(2, startEpoch)
+    reportPartitionOffset(0, startEpoch)
+    reportPartitionOffset(1, startEpoch)
+
+    // Here and in subsequent tests this is called to make a synchronous call to EpochCoordinator
+    // so that mocks would have been acted upon by the time verification happens
+    makeSynchronousCall()
+
+    verifyCommit(startEpoch)
+  }
+
+  test("consequent epochs, messages for epoch (k + 1) arrive after messages for epoch k") {
+    setWriterPartitions(2)
+    setReaderPartitions(2)
+
+    val epochs = startEpoch to (startEpoch + 1)
--- End diff --

I agree that it would be more readable. The one downside is that testing the start epoch first becomes less obvious, since the value would be hardcoded in the setup. Still, it's clear enough, and the start epoch is unlikely to change in these tests, so hardcoding it is fine and readability would improve.
Will change this
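For illustration, the hardcoded variant under discussion might look roughly like this inside the quoted suite (a sketch only: it reuses the suite's helpers `commitPartitionEpoch`, `reportPartitionOffset`, and `makeSynchronousCall` from the diff above, while `verifyCommitsInOrder` is a hypothetical assertion helper standing in for the suite's actual verification):

```scala
// Sketch, not the PR's code: epochs written as literals 1 and 2 instead of
// startEpoch arithmetic, assuming beforeEach always creates the coordinator
// at epoch 1.
test("consequent epochs, messages for epoch (k + 1) arrive after messages for epoch k") {
  setWriterPartitions(2)
  setReaderPartitions(2)

  for (epoch <- Seq(1, 2)) {
    commitPartitionEpoch(0, epoch)
    commitPartitionEpoch(1, epoch)
    reportPartitionOffset(0, epoch)
    reportPartitionOffset(1, epoch)
  }

  makeSynchronousCall()

  verifyCommitsInOrder(Seq(1, 2))  // hypothetical helper; stands in for the real checks
}
```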
[GitHub] spark pull request #20786: [SPARK-14681][ML] Provide label/impurity stats fo...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/20786
[GitHub] spark pull request #17466: [SPARK-14681][ML] Added getter for impurityStats
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/17466
[GitHub] spark issue #20786: [SPARK-14681][ML] Provide label/impurity stats for spark...
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/20786 LGTM. Merging with master. Thanks @WeichenXu123!
[GitHub] spark issue #21012: [SPARK-23890][SQL] Support CHANGE COLUMN to add nested f...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21012 Can one of the admins verify this patch?