[GitHub] spark issue #21009: [SPARK-23905][SQL] Add UDF weekday
Github user jerryshao commented on the issue: https://github.com/apache/spark/pull/21009 Jenkins, test this please.
[GitHub] spark issue #21009: [SPARK-23905][SQL] Add UDF weekday
Github user jerryshao commented on the issue: https://github.com/apache/spark/pull/21009 Jenkins, add to whitelist.
[GitHub] spark issue #20952: [SPARK-6951][core] Speed up parsing of event logs during...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20952 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89076/ Test PASSed.
[GitHub] spark issue #20952: [SPARK-6951][core] Speed up parsing of event logs during...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20952 Merged build finished. Test PASSed.
[GitHub] spark issue #20952: [SPARK-6951][core] Speed up parsing of event logs during...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20952 **[Test build #89076 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89076/testReport)** for PR 20952 at commit [`3a5563b`](https://github.com/apache/spark/commit/3a5563bec7f5f4063b1d32c0af653b13b54186c2). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #21007: [SPARK-23942][PYTHON][SQL] Makes collect in PySpark as a...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21007 cc @cloud-fan and @viirya (from checking the history).
[GitHub] spark issue #20989: [SPARK-23529][K8s] Support mounting hostPath volumes for...
Github user madanadit commented on the issue: https://github.com/apache/spark/pull/20989 @foxish The failure seems to be because of `SparkRemoteFileTest` missing from branch-2.3 (only present in master branch). Which branch do you recommend targeting this PR towards?
[GitHub] spark issue #21001: [SPARK-19724][SQL][FOLLOW-UP]Check location of managed t...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21001 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2116/ Test PASSed.
[GitHub] spark issue #21001: [SPARK-19724][SQL][FOLLOW-UP]Check location of managed t...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21001 Merged build finished. Test PASSed.
[GitHub] spark issue #21001: [SPARK-19724][SQL][FOLLOW-UP]Check location of managed t...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21001 **[Test build #89080 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89080/testReport)** for PR 21001 at commit [`3f3391b`](https://github.com/apache/spark/commit/3f3391b28a7d1c8fed1df195d6b1a0e27f0a60c4).
[GitHub] spark issue #21015: [SPARK-23944][ML] Add the set method for the two LSHMode...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21015 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89079/ Test PASSed.
[GitHub] spark issue #21015: [SPARK-23944][ML] Add the set method for the two LSHMode...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21015 **[Test build #89079 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89079/testReport)** for PR 21015 at commit [`9f16ea6`](https://github.com/apache/spark/commit/9f16ea6f15572a8d189fe537844487abdea797b4). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #21015: [SPARK-23944][ML] Add the set method for the two LSHMode...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21015 Merged build finished. Test PASSed.
[GitHub] spark issue #21005: [SPARK-23898][SQL] Simplify add & subtract code generati...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21005 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89073/ Test PASSed.
[GitHub] spark issue #21005: [SPARK-23898][SQL] Simplify add & subtract code generati...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21005 Merged build finished. Test PASSed.
[GitHub] spark issue #21005: [SPARK-23898][SQL] Simplify add & subtract code generati...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21005 **[Test build #89073 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89073/testReport)** for PR 21005 at commit [`433`](https://github.com/apache/spark/commit/43314b1d443fac5ca27ecef80677dbe70ab7). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #20925: [SPARK-22941][core] Do not exit JVM when submit fails wi...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20925 Merged build finished. Test FAILed.
[GitHub] spark issue #20925: [SPARK-22941][core] Do not exit JVM when submit fails wi...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20925 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89074/ Test FAILed.
[GitHub] spark issue #20925: [SPARK-22941][core] Do not exit JVM when submit fails wi...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20925 **[Test build #89074 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89074/testReport)** for PR 20925 at commit [`262bad8`](https://github.com/apache/spark/commit/262bad88a6d4d6c2513d6da3b2b52e86cd3f5b70). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #21015: [SPARK-23944][ML] Add the set method for the two LSHMode...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21015 **[Test build #89079 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89079/testReport)** for PR 21015 at commit [`9f16ea6`](https://github.com/apache/spark/commit/9f16ea6f15572a8d189fe537844487abdea797b4).
[GitHub] spark issue #21015: [SPARK-23944][ML] Add the set method for the two LSHMode...
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/21015 add to whitelist
[GitHub] spark issue #20992: [SPARK-23779][SQL] TaskMemoryManager and UnsafeSorter re...
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/20992 @kiszk I did not look at the PR in detail - I was curious about the commit message saying 'address review comment', but did not see any comments. Were they removed by the submitter? Or is GitHub acting up?
[GitHub] spark issue #20998: [SPARK-23888][CORE] speculative task should not run on a...
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/20998 Adding isRunning means a single 'bad' node (bad from the task's point of view - not necessarily bad hardware, just a node where the task fails) can cause tasks to fail repeatedly, making the app exit. Particularly with blacklisting, I am not very sure how the interactions will play out; @squito might have more comments. In general, this is not a benign change imo and can have non-trivial side effects. In the specific use case of only two machines, it is an unfortunate side effect.
[GitHub] spark issue #21016: [SPARK-23947][SQL] Add hashUTF8String convenience method...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21016 Merged build finished. Test PASSed.
[GitHub] spark issue #21016: [SPARK-23947][SQL] Add hashUTF8String convenience method...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21016 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2115/ Test PASSed.
[GitHub] spark issue #20043: [SPARK-22856][SQL] Add wrappers for codegen output and n...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/20043 Thanks! @cloud-fan @hvanhovell @kiszk @HyukjinKwon @mgaido91 @maropu @rednaxelafx @gatorsmile
[GitHub] spark issue #21016: [SPARK-23947][SQL] Add hashUTF8String convenience method...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21016 **[Test build #89078 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89078/testReport)** for PR 21016 at commit [`ead91a4`](https://github.com/apache/spark/commit/ead91a453fb6a3f92255d14d37c1d4a6beac22fd).
[GitHub] spark pull request #21016: [SPARK-23947][SQL] Add hashUTF8String convenience...
GitHub user rednaxelafx opened a pull request: https://github.com/apache/spark/pull/21016 [SPARK-23947][SQL] Add hashUTF8String convenience method to hasher classes

## What changes were proposed in this pull request?

Add `hashUTF8String()` to the hasher classes to allow Spark SQL codegen to generate cleaner code for hashing `UTF8String`s. No change in behavior otherwise. Although with the introduction of SPARK-10399, the code size for hashing `UTF8String` is already smaller, it's still good to extract a separate function in the hasher classes so that the generated code can stay clean.

## How was this patch tested?

Existing tests.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/rednaxelafx/apache-spark hashutf8

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/21016.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #21016

commit ead91a453fb6a3f92255d14d37c1d4a6beac22fd
Author: Kris Mok
Date: 2018-04-09T22:30:03Z
SPARK-23947 - Add hashUTF8String convenience method to hasher classes
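A minimal sketch of the kind of convenience method described above, delegating to Spark's existing unsafe-bytes Murmur3 hasher; the wrapper object and method placement here are illustrative, not the PR's actual code:

```scala
import org.apache.spark.unsafe.hash.Murmur3_x86_32
import org.apache.spark.unsafe.types.UTF8String

object HashUTF8StringSketch {
  // Hash a UTF8String's underlying bytes in one call, instead of spelling out
  // base object, offset, and length at every codegen call site.
  def hashUTF8String(s: UTF8String, seed: Int): Int =
    Murmur3_x86_32.hashUnsafeBytes(s.getBaseObject, s.getBaseOffset, s.numBytes(), seed)

  def main(args: Array[String]): Unit =
    println(hashUTF8String(UTF8String.fromString("spark"), 42))
}
```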
[GitHub] spark issue #21015: [SPARK-23944][ML] Add the set method for the two LSHMode...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21015 Can one of the admins verify this patch?
[GitHub] spark pull request #21015: [SPARK-23944][ML] Add the set method for the two ...
GitHub user ludatabricks opened a pull request: https://github.com/apache/spark/pull/21015 [SPARK-23944][ML] Add the set method for the two LSHModel

## What changes were proposed in this pull request?

Add two set methods for LSHModel in LSH.scala, BucketedRandomProjectionLSH.scala, and MinHashLSH.scala.

## How was this patch tested?

New tests for the param setup were added to:
- BucketedRandomProjectionLSHSuite.scala
- MinHashLSHSuite.scala

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ludatabricks/spark-1 SPARK-23944

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/21015.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #21015

commit 9f16ea6f15572a8d189fe537844487abdea797b4
Author: Lu WANG
Date: 2018-04-09T21:56:48Z
Add the set method for two LSHModels
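A short usage sketch of what the added setters enable, following the standard ml Param setter convention; `dataset` is assumed to be a DataFrame with a vector column named "features":

```scala
import org.apache.spark.ml.feature.MinHashLSH

// Estimator-side setters already exist; the PR adds the same pattern to the
// fitted models, i.e. methods like: def setInputCol(value: String): this.type
val lsh = new MinHashLSH()
  .setNumHashTables(3)
  .setInputCol("features")
  .setOutputCol("hashes")
val model = lsh.fit(dataset)      // 'dataset' assumed as described above
model.setOutputCol("minhashes")   // model-side setter this PR introduces
```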
[GitHub] spark issue #19881: [SPARK-22683][CORE] Add a fullExecutorAllocationDivisor ...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/19881 SGTM on divisor. Do we need "full" there in the config?
[GitHub] spark issue #21014: [SPARK-23941][Mesos] Mesos task failed on specific spark...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21014 Merged build finished. Test PASSed.
[GitHub] spark issue #21014: [SPARK-23941][Mesos] Mesos task failed on specific spark...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21014 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89077/ Test PASSed.
[GitHub] spark issue #21014: [SPARK-23941][Mesos] Mesos task failed on specific spark...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21014 **[Test build #89077 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89077/testReport)** for PR 21014 at commit [`fb078eb`](https://github.com/apache/spark/commit/fb078eb5b58dc27e4a59efb14ce740f625681bf0). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #20904: [SPARK-23751][ML][PySpark] Kolmogorov-Smirnoff te...
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/20904#discussion_r180228711 --- Diff: python/pyspark/ml/stat.py --- @@ -127,13 +113,86 @@ class Correlation(object): def corr(dataset, column, method="pearson"): """ Compute the correlation matrix with specified method using dataset. + +:param dataset: + A Dataset or a DataFrame. +:param column: + The name of the column of vectors for which the correlation coefficient needs + to be computed. This must be a column of the dataset, and it must contain + Vector objects. +:param method: + String specifying the method to use for computing correlation. + Supported: `pearson` (default), `spearman`. +:return: + A DataFrame that contains the correlation matrix of the column of vectors. This + DataFrame contains a single row and a single column of name + '$METHODNAME($COLUMN)'. """ sc = SparkContext._active_spark_context javaCorrObj = _jvm().org.apache.spark.ml.stat.Correlation args = [_py2java(sc, arg) for arg in (dataset, column, method)] return _java2py(sc, javaCorrObj.corr(*args)) +class KolmogorovSmirnovTest(object): +""" +.. note:: Experimental + +Conduct the two-sided Kolmogorov Smirnov (KS) test for data sampled from a continuous +distribution. + +By comparing the largest difference between the empirical cumulative +distribution of the sample data and the theoretical distribution we can provide a test for the +the null hypothesis that the sample data comes from that theoretical distribution. + +>>> from pyspark.ml.stat import KolmogorovSmirnovTest --- End diff -- Thanks for moving the method-specific documentation. These doctests are method-specific too, though, so can you please move them as well?
[GitHub] spark pull request #20904: [SPARK-23751][ML][PySpark] Kolmogorov-Smirnoff te...
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/20904#discussion_r180245120 --- Diff: python/pyspark/ml/stat.py --- @@ -127,13 +113,86 @@ class Correlation(object): def corr(dataset, column, method="pearson"): """ Compute the correlation matrix with specified method using dataset. + +:param dataset: + A Dataset or a DataFrame. +:param column: + The name of the column of vectors for which the correlation coefficient needs + to be computed. This must be a column of the dataset, and it must contain + Vector objects. +:param method: + String specifying the method to use for computing correlation. + Supported: `pearson` (default), `spearman`. +:return: + A DataFrame that contains the correlation matrix of the column of vectors. This + DataFrame contains a single row and a single column of name + '$METHODNAME($COLUMN)'. """ sc = SparkContext._active_spark_context javaCorrObj = _jvm().org.apache.spark.ml.stat.Correlation args = [_py2java(sc, arg) for arg in (dataset, column, method)] return _java2py(sc, javaCorrObj.corr(*args)) +class KolmogorovSmirnovTest(object): +""" +.. note:: Experimental + +Conduct the two-sided Kolmogorov Smirnov (KS) test for data sampled from a continuous +distribution. + +By comparing the largest difference between the empirical cumulative +distribution of the sample data and the theoretical distribution we can provide a test for the +the null hypothesis that the sample data comes from that theoretical distribution. + +>>> from pyspark.ml.stat import KolmogorovSmirnovTest +>>> dataset = [[-1.0], [0.0], [1.0]] +>>> dataset = spark.createDataFrame(dataset, ['sample']) +>>> ksResult = KolmogorovSmirnovTest.test(dataset, 'sample', 'norm', 0.0, 1.0).first() +>>> round(ksResult.pValue, 3) +1.0 +>>> round(ksResult.statistic, 3) +0.175 +>>> dataset = [[2.0], [3.0], [4.0]] +>>> dataset = spark.createDataFrame(dataset, ['sample']) +>>> ksResult = KolmogorovSmirnovTest.test(dataset, 'sample', 'norm', 3.0, 1.0).first() +>>> round(ksResult.pValue, 3) +1.0 +>>> round(ksResult.statistic, 3) +0.175 + +.. versionadded:: 2.4.0 + +""" +@staticmethod +@since("2.4.0") +def test(dataset, sampleCol, distName, *params): +""" +Perform a Kolmogorov-Smirnov test using dataset. --- End diff -- Can you please make this match the text in the Scala doc?
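For reference, a hedged sketch of the Scala-side usage the review refers to, with the signature inferred from the PySpark wrapper quoted above; `spark` is an assumed active SparkSession:

```scala
import org.apache.spark.ml.stat.KolmogorovSmirnovTest

// Two-sided KS test of the "sample" column against a standard normal distribution.
val dataset = spark.createDataFrame(Seq(Tuple1(-1.0), Tuple1(0.0), Tuple1(1.0))).toDF("sample")
val result = KolmogorovSmirnovTest.test(dataset, "sample", "norm", 0.0, 1.0).first()
println(result.getAs[Double]("pValue"))     // ~1.0, as in the doctest above
println(result.getAs[Double]("statistic"))  // ~0.175, as in the doctest above
```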
[GitHub] spark issue #21001: [SPARK-19724][SQL][FOLLOW-UP]Check location of managed t...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21001 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89071/ Test FAILed.
[GitHub] spark issue #21001: [SPARK-19724][SQL][FOLLOW-UP]Check location of managed t...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21001 Merged build finished. Test FAILed.
[GitHub] spark issue #21001: [SPARK-19724][SQL][FOLLOW-UP]Check location of managed t...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21001 **[Test build #89071 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89071/testReport)** for PR 21001 at commit [`577a399`](https://github.com/apache/spark/commit/577a3996905fdb092f3f4b1a7011d4a450eaed7c). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #21014: [SPARK-23941][Mesos] Mesos task failed on specific spark...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21014 **[Test build #89077 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89077/testReport)** for PR 21014 at commit [`fb078eb`](https://github.com/apache/spark/commit/fb078eb5b58dc27e4a59efb14ce740f625681bf0).
[GitHub] spark issue #21014: [SPARK-23941][Mesos] Mesos task failed on specific spark...
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/21014 ok to test
[GitHub] spark pull request #20327: [SPARK-12963][CORE] NM host for driver end points
Github user gerashegalov closed the pull request at: https://github.com/apache/spark/pull/20327
[GitHub] spark issue #21014: [SPARK-23941][Mesos] Mesos task failed on specific spark...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21014 Can one of the admins verify this patch?
[GitHub] spark pull request #21014: [SPARK-23941][Mesos] Mesos task failed on specifi...
GitHub user tiboun opened a pull request: https://github.com/apache/spark/pull/21014 [SPARK-23941][Mesos] Mesos task failed on specific spark app name

## What changes were proposed in this pull request?

Shell-escape the name passed to spark-submit and change how conf attributes are shell-escaped.

## How was this patch tested?

This patch has been tested manually with Hive-on-Spark on Mesos, and with the use case described in the issue: the SparkPi application with a custom name containing illegal shell characters. With this PR, Hive-on-Spark on Mesos works like a charm with Hive 3.0.0-SNAPSHOT.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/tiboun/spark fix/SPARK-23941

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/21014.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #21014

commit fb078eb5b58dc27e4a59efb14ce740f625681bf0
Author: Bounkong Khamphousone
Date: 2018-04-09T13:11:55Z
fix call to spark-submit
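To illustrate the failure mode the PR addresses: an app name containing shell metacharacters breaks the spark-submit command line built by the Mesos scheduler unless it is escaped. A minimal single-quote escaping sketch (not the PR's actual code):

```scala
// POSIX single-quote escaping: wrap in ' and turn each embedded ' into '\''.
def shellEscape(value: String): String =
  "'" + value.replace("'", "'\\''") + "'"

// A Hive-on-Spark style app name with characters the shell would otherwise interpret.
println(shellEscape("Hive on Spark (sessionId = abc)"))
// prints: 'Hive on Spark (sessionId = abc)'
```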
[GitHub] spark issue #21008: [SPARK-23902][SQL] Add roundOff flag to months_between
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21008 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89070/ Test PASSed.
[GitHub] spark issue #21008: [SPARK-23902][SQL] Add roundOff flag to months_between
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21008 Merged build finished. Test PASSed.
[GitHub] spark issue #21008: [SPARK-23902][SQL] Add roundOff flag to months_between
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21008 **[Test build #89070 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89070/testReport)** for PR 21008 at commit [`4544dd4`](https://github.com/apache/spark/commit/4544dd49968a0b8cd3e9c855575951447bfd2e24). * This patch passes all tests. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `case class MonthsBetween(`
[GitHub] spark issue #20952: [SPARK-6951][core] Speed up parsing of event logs during...
Github user squito commented on the issue: https://github.com/apache/spark/pull/20952 lgtm assuming tests pass
[GitHub] spark pull request #20046: [SPARK-22362][SQL] Add unit test for Window Aggre...
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/20046#discussion_r171190560 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameWindowFunctionsSuite.scala --- @@ -33,6 +35,11 @@ import org.apache.spark.unsafe.types.CalendarInterval class DataFrameWindowFunctionsSuite extends QueryTest with SharedSQLContext { import testImplicits._ + private def sortWrappedArrayInRow(d: DataFrame) = d.map { --- End diff -- You can also just use the `array_sort` function. That is probably a lot cheaper.
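A sketch of that alternative, sorting inside the query plan instead of in a typed `map`; `df` is assumed to have columns `key: string` and `unsorted: array<string>`:

```scala
import org.apache.spark.sql.functions.{array_sort, col}

// array_sort orders the array elements within the plan, avoiding
// deserialization into Scala collections entirely.
val sorted = df.select(col("key"), array_sort(col("unsorted")).as("sorted"))
```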
[GitHub] spark pull request #20046: [SPARK-22362][SQL] Add unit test for Window Aggre...
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/20046#discussion_r171190314 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameWindowFunctionsSuite.scala --- @@ -33,6 +35,11 @@ import org.apache.spark.unsafe.types.CalendarInterval class DataFrameWindowFunctionsSuite extends QueryTest with SharedSQLContext { import testImplicits._ + private def sortWrappedArrayInRow(d: DataFrame) = d.map { + case Row(key: String, unsorted: mutable.WrappedArray[String]) => --- End diff -- Let's not pattern match against `mutable.WrappedArray` and use `Seq` instead. `mutable.WrappedArray` is pretty much an implementation detail, and pattern matching against it is brittle.
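The reviewer's point in isolation, as a self-contained sketch: a `WrappedArray` is a `Seq`, so matching on `Seq` keeps the test independent of the collection implementation:

```scala
import scala.collection.mutable

object SeqMatchSketch extends App {
  // Rows hand arrays back as mutable.WrappedArray, but that is an implementation
  // detail; the stable contract is Seq, so pattern match on that instead.
  val fromRow: Any = mutable.WrappedArray.make[String](Array("b", "a"))
  fromRow match {
    case s: Seq[String @unchecked] => println(s.sorted.mkString(", "))  // a, b
  }
}
```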
[GitHub] spark issue #21001: [SPARK-19724][SQL][FOLLOW-UP]Check location of managed t...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21001 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89072/ Test FAILed.
[GitHub] spark issue #21001: [SPARK-19724][SQL][FOLLOW-UP]Check location of managed t...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21001 Merged build finished. Test FAILed.
[GitHub] spark issue #21001: [SPARK-19724][SQL][FOLLOW-UP]Check location of managed t...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21001 **[Test build #89072 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89072/testReport)** for PR 21001 at commit [`1e27cc8`](https://github.com/apache/spark/commit/1e27cc87fb99822f97b425573b3b6798bfbc5418). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #20952: [SPARK-6951][core] Speed up parsing of event logs during...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20952 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2114/ Test PASSed.
[GitHub] spark issue #20952: [SPARK-6951][core] Speed up parsing of event logs during...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20952 Merged build finished. Test PASSed.
[GitHub] spark issue #20327: [SPARK-12963][CORE] NM host for driver end points
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/20327 I think you forgot to actually hit the "close" button...
[GitHub] spark issue #20952: [SPARK-6951][core] Speed up parsing of event logs during...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20952 **[Test build #89076 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89076/testReport)** for PR 20952 at commit [`3a5563b`](https://github.com/apache/spark/commit/3a5563bec7f5f4063b1d32c0af653b13b54186c2).
[GitHub] spark pull request #20792: Branch 2.1
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/20792
[GitHub] spark pull request #20957: Branch 2.3
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/20957
[GitHub] spark issue #21013: [WIP][SPARK-23874][SQL][PYTHON] Upgrade Arrow and pyarro...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21013 Merged build finished. Test FAILed.
[GitHub] spark issue #21013: [WIP][SPARK-23874][SQL][PYTHON] Upgrade Arrow and pyarro...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21013 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89075/ Test FAILed.
[GitHub] spark issue #21013: [WIP][SPARK-23874][SQL][PYTHON] Upgrade Arrow and pyarro...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21013 **[Test build #89075 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89075/testReport)** for PR 21013 at commit [`c1791bf`](https://github.com/apache/spark/commit/c1791bf9f0cdd8074a19130e05be52d60f1e618c). * This patch **fails build dependency tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #21013: [WIP][SPARK-23874][SQL][PYTHON] Upgrade Arrow and pyarro...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21013 Merged build finished. Test PASSed.
[GitHub] spark issue #21013: [WIP][SPARK-23874][SQL][PYTHON] Upgrade Arrow and pyarro...
Github user BryanCutler commented on the issue: https://github.com/apache/spark/pull/21013 The code changes are pretty much done, but I hit a regression in Python regarding conversion of `DecimalType` with `None` values. I filed https://issues.apache.org/jira/browse/ARROW-2432 to fix it. Putting this up as a WIP for now, but we might want to think about holding off on the upgrade.
[GitHub] spark issue #21013: [WIP][SPARK-23874][SQL][PYTHON] Upgrade Arrow and pyarro...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21013 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2113/ Test PASSed.
[GitHub] spark issue #21013: [WIP][SPARK-23874][SQL][PYTHON] Upgrade Arrow and pyarro...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21013 **[Test build #89075 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89075/testReport)** for PR 21013 at commit [`c1791bf`](https://github.com/apache/spark/commit/c1791bf9f0cdd8074a19130e05be52d60f1e618c).
[GitHub] spark pull request #21013: [WIP][SPARK-23874][SQL][PYTHON] Upgrade Arrow and...
GitHub user BryanCutler opened a pull request: https://github.com/apache/spark/pull/21013 [WIP][SPARK-23874][SQL][PYTHON] Upgrade Arrow and pyarrow to 0.9.0

## What changes were proposed in this pull request?

Upgrade Arrow to 0.9.0. This includes the Java jar and will require the Jenkins test environment to upgrade pyarrow on the Python 2 environments.

## How was this patch tested?

Existing tests.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/BryanCutler/spark arrow-upgrade-090

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/21013.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #21013

commit 255647589c8980465bbeb64788988468748814a8
Author: Bryan Cutler
Date: 2018-04-09T19:26:06Z
made required code changes for upgrade

commit c1791bf9f0cdd8074a19130e05be52d60f1e618c
Author: Bryan Cutler
Date: 2018-04-09T19:26:54Z
remove unused import
[GitHub] spark issue #20925: [SPARK-22941][core] Do not exit JVM when submit fails wi...
Github user squito commented on the issue: https://github.com/apache/spark/pull/20925 lgtm assuming tests pass
[GitHub] spark pull request #20997: [SPARK-19185] [DSTREAMS] Avoid concurrent use of ...
Github user squito commented on a diff in the pull request: https://github.com/apache/spark/pull/20997#discussion_r180220465 --- Diff: external/kafka-0-10/src/main/scala/org/apache/spark/streaming/kafka010/KafkaDataConsumer.scala --- @@ -0,0 +1,381 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.streaming.kafka010 + +import java.{util => ju} + +import scala.collection.JavaConverters._ + +import org.apache.kafka.clients.consumer.{ConsumerConfig, ConsumerRecord, KafkaConsumer} +import org.apache.kafka.common.{KafkaException, TopicPartition} + +import org.apache.spark.TaskContext +import org.apache.spark.internal.Logging + +private[kafka010] sealed trait KafkaDataConsumer[K, V] { + /** + * Get the record for the given offset if available. + * + * @param offset the offset to fetch. + * @param pollTimeoutMs timeout in milliseconds to poll data from Kafka. + */ + def get(offset: Long, pollTimeoutMs: Long): ConsumerRecord[K, V] = { +internalConsumer.get(offset, pollTimeoutMs) + } + + /** + * Start a batch on a compacted topic + * + * @param offset the offset to fetch. + * @param pollTimeoutMs timeout in milliseconds to poll data from Kafka. + */ + def compactedStart(offset: Long, pollTimeoutMs: Long): Unit = { +internalConsumer.compactedStart(offset, pollTimeoutMs) + } + + /** + * Get the next record in the batch from a compacted topic. + * Assumes compactedStart has been called first, and ignores gaps. + * + * @param pollTimeoutMs timeout in milliseconds to poll data from Kafka. + */ + def compactedNext(pollTimeoutMs: Long): ConsumerRecord[K, V] = { +internalConsumer.compactedNext(pollTimeoutMs) + } + + /** + * Rewind to previous record in the batch from a compacted topic. + * + * @throws NoSuchElementException if no previous element + */ + def compactedPrevious(): ConsumerRecord[K, V] = { +internalConsumer.compactedPrevious() + } + + /** + * Release this consumer from being further used. Depending on its implementation, + * this consumer will be either finalized, or reset for reuse later. + */ + def release(): Unit + + /** Reference to the internal implementation that this wrapper delegates to */ + protected def internalConsumer: InternalKafkaConsumer[K, V] +} + + +/** + * A wrapper around Kafka's KafkaConsumer. + * This is not for direct use outside this file. 
+ */ +private[kafka010] +class InternalKafkaConsumer[K, V]( + val groupId: String, + val topicPartition: TopicPartition, + val kafkaParams: ju.Map[String, Object]) extends Logging { + + require(groupId == kafkaParams.get(ConsumerConfig.GROUP_ID_CONFIG), +"groupId used for cache key must match the groupId in kafkaParams") + + @volatile private var consumer = createConsumer + + /** indicates whether this consumer is in use or not */ + @volatile var inUse = true + + /** indicate whether this consumer is going to be stopped in the next release */ + @volatile var markedForClose = false + + // TODO if the buffer was kept around as a random-access structure, + // could possibly optimize re-calculating of an RDD in the same batch + @volatile private var buffer = ju.Collections.emptyListIterator[ConsumerRecord[K, V]]() + @volatile private var nextOffset = InternalKafkaConsumer.UNKNOWN_OFFSET + + override def toString: String = { +"InternalKafkaConsumer(" + + s"hash=${Integer.toHexString(hashCode)}, " + + s"groupId=$groupId, " + + s"topicPartition=$topicPartition)" + } + + /** Create a KafkaConsumer to fetch records for `topicPartition` */ + private def createConsumer: KafkaConsumer[K, V] = { +val c = new KafkaConsumer[K, V](kafkaParams) +val tps = new ju.Ar
[GitHub] spark pull request #20997: [SPARK-19185] [DSTREAMS] Avoid concurrent use of ...
Github user squito commented on a diff in the pull request: https://github.com/apache/spark/pull/20997#discussion_r180212631 --- Diff: external/kafka-0-10/src/main/scala/org/apache/spark/streaming/kafka010/KafkaDataConsumer.scala --- @@ -0,0 +1,381 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.streaming.kafka010 + +import java.{util => ju} + +import scala.collection.JavaConverters._ + +import org.apache.kafka.clients.consumer.{ConsumerConfig, ConsumerRecord, KafkaConsumer} +import org.apache.kafka.common.{KafkaException, TopicPartition} + +import org.apache.spark.TaskContext +import org.apache.spark.internal.Logging + +private[kafka010] sealed trait KafkaDataConsumer[K, V] { + /** + * Get the record for the given offset if available. + * + * @param offset the offset to fetch. + * @param pollTimeoutMs timeout in milliseconds to poll data from Kafka. + */ + def get(offset: Long, pollTimeoutMs: Long): ConsumerRecord[K, V] = { +internalConsumer.get(offset, pollTimeoutMs) + } + + /** + * Start a batch on a compacted topic + * + * @param offset the offset to fetch. + * @param pollTimeoutMs timeout in milliseconds to poll data from Kafka. + */ + def compactedStart(offset: Long, pollTimeoutMs: Long): Unit = { +internalConsumer.compactedStart(offset, pollTimeoutMs) + } + + /** + * Get the next record in the batch from a compacted topic. + * Assumes compactedStart has been called first, and ignores gaps. + * + * @param pollTimeoutMs timeout in milliseconds to poll data from Kafka. + */ + def compactedNext(pollTimeoutMs: Long): ConsumerRecord[K, V] = { +internalConsumer.compactedNext(pollTimeoutMs) + } + + /** + * Rewind to previous record in the batch from a compacted topic. + * + * @throws NoSuchElementException if no previous element + */ + def compactedPrevious(): ConsumerRecord[K, V] = { +internalConsumer.compactedPrevious() + } + + /** + * Release this consumer from being further used. Depending on its implementation, + * this consumer will be either finalized, or reset for reuse later. + */ + def release(): Unit + + /** Reference to the internal implementation that this wrapper delegates to */ + protected def internalConsumer: InternalKafkaConsumer[K, V] +} + + +/** + * A wrapper around Kafka's KafkaConsumer. + * This is not for direct use outside this file. 
+ */ +private[kafka010] +class InternalKafkaConsumer[K, V]( + val groupId: String, + val topicPartition: TopicPartition, + val kafkaParams: ju.Map[String, Object]) extends Logging { + + require(groupId == kafkaParams.get(ConsumerConfig.GROUP_ID_CONFIG), +"groupId used for cache key must match the groupId in kafkaParams") + + @volatile private var consumer = createConsumer + + /** indicates whether this consumer is in use or not */ + @volatile var inUse = true + + /** indicate whether this consumer is going to be stopped in the next release */ + @volatile var markedForClose = false + + // TODO if the buffer was kept around as a random-access structure, + // could possibly optimize re-calculating of an RDD in the same batch + @volatile private var buffer = ju.Collections.emptyListIterator[ConsumerRecord[K, V]]() + @volatile private var nextOffset = InternalKafkaConsumer.UNKNOWN_OFFSET + + override def toString: String = { +"InternalKafkaConsumer(" + + s"hash=${Integer.toHexString(hashCode)}, " + + s"groupId=$groupId, " + + s"topicPartition=$topicPartition)" + } + + /** Create a KafkaConsumer to fetch records for `topicPartition` */ + private def createConsumer: KafkaConsumer[K, V] = { +val c = new KafkaConsumer[K, V](kafkaParams) +val tps = new ju.Ar
[GitHub] spark pull request #20997: [SPARK-19185] [DSTREAMS] Avoid concurrent use of ...
Github user squito commented on a diff in the pull request: https://github.com/apache/spark/pull/20997#discussion_r180222555 --- Diff: external/kafka-0-10/src/test/scala/org/apache/spark/streaming/kafka010/KafkaDataConsumerSuite.scala --- @@ -0,0 +1,111 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.streaming.kafka010 + +import java.util.concurrent.{Executors, TimeUnit} + +import scala.collection.JavaConverters._ +import scala.util.Random + +import org.apache.kafka.clients.consumer.ConsumerConfig._ +import org.apache.kafka.common.TopicPartition +import org.apache.kafka.common.serialization.ByteArrayDeserializer +import org.scalatest.BeforeAndAfterAll + +import org.apache.spark._ + +class KafkaDataConsumerSuite extends SparkFunSuite with BeforeAndAfterAll { + + private var testUtils: KafkaTestUtils = _ + + override def beforeAll { +super.beforeAll() +testUtils = new KafkaTestUtils +testUtils.setup() + } + + override def afterAll { +if (testUtils != null) { + testUtils.teardown() + testUtils = null +} +super.afterAll() + } + + test("concurrent use of KafkaDataConsumer") { --- End diff -- this is good, but it would be nice to have a test which checks that cached consumers are re-used when possible. Eg this could pass just by never caching anything.
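A hedged sketch of the kind of assertion being suggested, reusing the suite's acquire/release API quoted in these review comments; `underlying` is a hypothetical test hook, since the trait keeps its delegate `protected`, so a real test would need package-private visibility:

```scala
// Acquire with caching, release, then acquire the same key again: the second
// wrapper should delegate to the same InternalKafkaConsumer, not a new one.
val c1 = KafkaDataConsumer.acquire[Array[Byte], Array[Byte]](
  groupId, topicPartition, kafkaParams.asJava, /* taskContext = */ null, /* useCache = */ true)
c1.release()
val c2 = KafkaDataConsumer.acquire[Array[Byte], Array[Byte]](
  groupId, topicPartition, kafkaParams.asJava, /* taskContext = */ null, /* useCache = */ true)
assert(c1.underlying eq c2.underlying)  // 'underlying' is a hypothetical accessor
```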
[GitHub] spark pull request #20997: [SPARK-19185] [DSTREAMS] Avoid concurrent use of ...
Github user squito commented on a diff in the pull request: https://github.com/apache/spark/pull/20997#discussion_r180221087 --- Diff: external/kafka-0-10/src/test/scala/org/apache/spark/streaming/kafka010/KafkaDataConsumerSuite.scala --- @@ -0,0 +1,111 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.streaming.kafka010 + +import java.util.concurrent.{Executors, TimeUnit} + +import scala.collection.JavaConverters._ +import scala.util.Random + +import org.apache.kafka.clients.consumer.ConsumerConfig._ +import org.apache.kafka.common.TopicPartition +import org.apache.kafka.common.serialization.ByteArrayDeserializer +import org.scalatest.BeforeAndAfterAll + +import org.apache.spark._ + +class KafkaDataConsumerSuite extends SparkFunSuite with BeforeAndAfterAll { + + private var testUtils: KafkaTestUtils = _ + + override def beforeAll { +super.beforeAll() +testUtils = new KafkaTestUtils +testUtils.setup() + } + + override def afterAll { +if (testUtils != null) { + testUtils.teardown() + testUtils = null +} +super.afterAll() + } + + test("concurrent use of KafkaDataConsumer") { +KafkaDataConsumer.init(16, 64, 0.75f) + +val topic = "topic" + Random.nextInt() +val data = (1 to 1000).map(_.toString) +val topicPartition = new TopicPartition(topic, 0) +testUtils.createTopic(topic) +testUtils.sendMessages(topic, data.toArray) + +val groupId = "groupId" +val kafkaParams = Map[String, Object]( + GROUP_ID_CONFIG -> groupId, + BOOTSTRAP_SERVERS_CONFIG -> testUtils.brokerAddress, + KEY_DESERIALIZER_CLASS_CONFIG -> classOf[ByteArrayDeserializer].getName, + VALUE_DESERIALIZER_CLASS_CONFIG -> classOf[ByteArrayDeserializer].getName, + AUTO_OFFSET_RESET_CONFIG -> "earliest", + ENABLE_AUTO_COMMIT_CONFIG -> "false" +) + +val numThreads = 100 +val numConsumerUsages = 500 + +@volatile var error: Throwable = null + +def consume(i: Int): Unit = { + val useCache = Random.nextBoolean + val taskContext = if (Random.nextBoolean) { +new TaskContextImpl(0, 0, 0, 0, attemptNumber = Random.nextInt(2), null, null, null) + } else { +null + } + val consumer = KafkaDataConsumer.acquire[Array[Byte], Array[Byte]]( +groupId, topicPartition, kafkaParams.asJava, taskContext, useCache) + try { +val rcvd = 0 until data.length map { offset => --- End diff -- style -- just by convention, ranges are an exception to the usual rule, they are wrapped with parens `(x until y).map`
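The cited convention, as a tiny standalone example:

```scala
val data = Seq("a", "b", "c")
// Parenthesize the range before chaining a method call, rather than relying
// on the unparenthesized infix form.
val indexed = (0 until data.length).map(i => s"$i:${data(i)}")
println(indexed.mkString(", "))  // 0:a, 1:b, 2:c
```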
[GitHub] spark issue #20828: [SPARK-23687][SS] Add a memory source for continuous pro...
Github user jose-torres commented on the issue: https://github.com/apache/spark/pull/20828 updated
[GitHub] spark issue #20925: [SPARK-22941][core] Do not exit JVM when submit fails wi...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20925 Merged build finished. Test PASSed.
[GitHub] spark issue #20925: [SPARK-22941][core] Do not exit JVM when submit fails wi...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20925 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2112/ Test PASSed.
[GitHub] spark issue #20925: [SPARK-22941][core] Do not exit JVM when submit fails wi...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20925 Merged build finished. Test PASSed.
[GitHub] spark issue #20925: [SPARK-22941][core] Do not exit JVM when submit fails wi...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20925 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2111/ Test PASSed.
[GitHub] spark issue #20535: [SPARK-23341][SQL] define some standard options for data...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20535 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89069/ Test FAILed.
[GitHub] spark issue #21005: [SPARK-23898][SQL] Simplify add & subtract code generati...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21005 Merged build finished. Test PASSed.
[GitHub] spark issue #21005: [SPARK-23898][SQL] Simplify add & subtract code generati...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21005 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2110/ Test PASSed.
[GitHub] spark issue #20535: [SPARK-23341][SQL] define some standard options for data...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20535 Merged build finished. Test FAILed.
[GitHub] spark issue #20535: [SPARK-23341][SQL] define some standard options for data...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20535 **[Test build #89069 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89069/testReport)** for PR 20535 at commit [`c811d72`](https://github.com/apache/spark/commit/c811d72f88552a30a985bdbb2c0005eddc56b5ff).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #21005: [SPARK-23898][SQL] Simplify add & subtract code generati...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21005 **[Test build #89073 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89073/testReport)** for PR 21005 at commit [`433`](https://github.com/apache/spark/commit/43314b1d443fac5ca27ecef80677dbe70ab7).
[GitHub] spark issue #20925: [SPARK-22941][core] Do not exit JVM when submit fails wi...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20925 **[Test build #89074 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89074/testReport)** for PR 20925 at commit [`262bad8`](https://github.com/apache/spark/commit/262bad88a6d4d6c2513d6da3b2b52e86cd3f5b70).
[GitHub] spark issue #20984: [SPARK-23875][SQL] Add IndexedSeq wrapper for ArrayData
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20984 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89068/ Test PASSed.
[GitHub] spark issue #20984: [SPARK-23875][SQL] Add IndexedSeq wrapper for ArrayData
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20984 Merged build finished. Test PASSed.
[GitHub] spark issue #20984: [SPARK-23875][SQL] Add IndexedSeq wrapper for ArrayData
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20984 **[Test build #89068 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89068/testReport)** for PR 20984 at commit [`962d048`](https://github.com/apache/spark/commit/962d048f8ad02003d83d67cd5105167ab11ac277).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #21005: [SPARK-23898][SQL] Simplify add & subtract code generati...
Github user hvanhovell commented on the issue: https://github.com/apache/spark/pull/21005 retest this please
[GitHub] spark issue #21011: [SPARK-23916][SQL] Add array_join function
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21011 Merged build finished. Test FAILed.
[GitHub] spark issue #21011: [SPARK-23916][SQL] Add array_join function
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21011 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89067/ Test FAILed.
[GitHub] spark issue #21011: [SPARK-23916][SQL] Add array_join function
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21011 **[Test build #89067 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89067/testReport)** for PR 21011 at commit [`1408cfc`](https://github.com/apache/spark/commit/1408cfcd8f0ad2a571d29b57d71128584ea4b4f0).
* This patch **fails PySpark unit tests**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `case class ArrayJoin(`
[GitHub] spark pull request #20971: [SPARK-23809][SQL][backport] Active SparkSession ...
Github user ericl closed the pull request at: https://github.com/apache/spark/pull/20971
[GitHub] spark pull request #20940: [SPARK-23429][CORE] Add executor memory metrics t...
Github user edwinalu commented on a diff in the pull request: https://github.com/apache/spark/pull/20940#discussion_r180204845

--- Diff: core/src/main/scala/org/apache/spark/executor/Executor.scala ---
@@ -772,6 +772,12 @@ private[spark] class Executor(
     val accumUpdates = new ArrayBuffer[(Long, Seq[AccumulatorV2[_, _]])]()
     val curGCTime = computeTotalGcTime()

+    // get executor level memory metrics
+    val executorUpdates = new ExecutorMetrics(System.currentTimeMillis(),
+      ManagementFactory.getMemoryMXBean.getHeapMemoryUsage().getUsed(),
--- End diff --

Thanks, I will play around with it a bit. If it seems more complicated or expensive, I'll file a separate subtask.
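As context for the metric sampled in the diff above, here is a minimal sketch of reading the same JVM numbers through the standard `java.lang.management` API (assumption: plain `MemoryMXBean` calls only; Spark's internal `ExecutorMetrics` wrapper is not reproduced here):

```scala
import java.lang.management.ManagementFactory

// Standalone sketch: snapshot heap and non-heap usage the way the
// executor-level metrics in the diff do, using only the JDK API.
object MemoryProbe extends App {
  val memoryBean = ManagementFactory.getMemoryMXBean
  val heapUsed = memoryBean.getHeapMemoryUsage.getUsed       // bytes of heap currently in use
  val nonHeapUsed = memoryBean.getNonHeapMemoryUsage.getUsed // bytes of non-heap (metaspace etc.)
  println(s"timestamp=${System.currentTimeMillis()} heapUsed=$heapUsed nonHeapUsed=$nonHeapUsed")
}
```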
[GitHub] spark pull request #20983: [SPARK-23747][Structured Streaming] Add EpochCoor...
Github user efimpoberezkin commented on a diff in the pull request: https://github.com/apache/spark/pull/20983#discussion_r180203421

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/streaming/continuous/EpochCoordinatorSuite.scala ---
@@ -0,0 +1,202 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.streaming.continuous
+
+import org.mockito.InOrder
+import org.mockito.Matchers.{any, eq => eqTo}
+import org.mockito.Mockito._
+import org.scalatest.BeforeAndAfterEach
+import org.scalatest.mockito.MockitoSugar
+
+import org.apache.spark._
+import org.apache.spark.rpc.RpcEndpointRef
+import org.apache.spark.sql.execution.streaming.continuous._
+import org.apache.spark.sql.sources.v2.reader.streaming.{ContinuousReader, PartitionOffset}
+import org.apache.spark.sql.sources.v2.writer.WriterCommitMessage
+import org.apache.spark.sql.sources.v2.writer.streaming.StreamWriter
+import org.apache.spark.sql.test.SharedSparkSession
+
+class EpochCoordinatorSuite
+  extends SparkFunSuite
+    with SharedSparkSession
+    with MockitoSugar
+    with BeforeAndAfterEach {
+
+  private var epochCoordinator: RpcEndpointRef = _
+
+  private var writer: StreamWriter = _
+  private var query: ContinuousExecution = _
+  private var orderVerifier: InOrder = _
+
+  private val startEpoch = 1L
+
+  override def beforeEach(): Unit = {
+    val reader = mock[ContinuousReader]
+    writer = mock[StreamWriter]
+    query = mock[ContinuousExecution]
+    orderVerifier = inOrder(writer, query)
+
+    epochCoordinator
+      = EpochCoordinatorRef.create(writer, reader, query, "test", startEpoch, spark, SparkEnv.get)
+  }
+
+  override def afterEach(): Unit = {
+    SparkEnv.get.rpcEnv.stop(epochCoordinator)
+  }
+
+  test("single epoch") {
+    setWriterPartitions(3)
+    setReaderPartitions(2)
+
+    commitPartitionEpoch(0, startEpoch)
+    commitPartitionEpoch(1, startEpoch)
+    commitPartitionEpoch(2, startEpoch)
+    reportPartitionOffset(0, startEpoch)
+    reportPartitionOffset(1, startEpoch)
+
+    // Here and in subsequent tests this is called to make a synchronous call to EpochCoordinator
+    // so that mocks would have been acted upon by the time verification happens
+    makeSynchronousCall()
+
+    verifyCommit(startEpoch)
+  }
+
+  test("consequent epochs, messages for epoch (k + 1) arrive after messages for epoch k") {
+    setWriterPartitions(2)
+    setReaderPartitions(2)
+
+    val epochs = startEpoch to (startEpoch + 1)
--- End diff --

I agree that it would be more readable. The one downside is that testing the start epoch first becomes less obvious, since the value would be hardcoded in the setup. Still, it's clear enough, and the start epoch is unlikely to change in these tests, so hardcoding it is fine and readability would improve.
Will change this
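For illustration, the hardcoded variant under discussion might look roughly like this inside the quoted suite (a sketch only: it reuses the suite's helpers `commitPartitionEpoch`, `reportPartitionOffset`, and `makeSynchronousCall` from the diff above, while `verifyCommitsInOrder` is a hypothetical assertion helper standing in for the suite's actual verification):

```scala
// Sketch, not the PR's code: epochs written as literals 1 and 2 instead of
// startEpoch arithmetic, assuming beforeEach always creates the coordinator
// at epoch 1.
test("consequent epochs, messages for epoch (k + 1) arrive after messages for epoch k") {
  setWriterPartitions(2)
  setReaderPartitions(2)

  for (epoch <- Seq(1, 2)) {
    commitPartitionEpoch(0, epoch)
    commitPartitionEpoch(1, epoch)
    reportPartitionOffset(0, epoch)
    reportPartitionOffset(1, epoch)
  }

  makeSynchronousCall()

  verifyCommitsInOrder(Seq(1, 2))  // hypothetical helper; stands in for the real checks
}
```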
[GitHub] spark pull request #20786: [SPARK-14681][ML] Provide label/impurity stats fo...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/20786
[GitHub] spark pull request #17466: [SPARK-14681][ML] Added getter for impurityStats
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/17466
[GitHub] spark issue #20786: [SPARK-14681][ML] Provide label/impurity stats for spark...
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/20786 LGTM. Merging with master. Thanks @WeichenXu123!
[GitHub] spark issue #21012: [SPARK-23890][SQL] Support CHANGE COLUMN to add nested f...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21012 Can one of the admins verify this patch?