[GitHub] spark pull request #22631: [SPARK-25605][TESTS] Run cast string to timestamp...

2018-10-07 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/22631#discussion_r223253381
  
--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CastSuite.scala ---
@@ -110,7 +112,7 @@ class CastSuite extends SparkFunSuite with ExpressionEvalHelper {
   }
 
   test("cast string to timestamp") {
-    for (tz <- ALL_TIMEZONES) {
+    for (tz <- Random.shuffle(ALL_TIMEZONES).take(50)) {
--- End diff --

 > tests need to be deterministic, or else what's the value? failures can't 
be reproduced

That's not true. AFAIK we have a lot of tests that generate data randomly, 
and if one fails, the test name includes the seed (or the generated data), so 
people can easily reproduce it.

I think it's a good strategy to test a subset of possible cases, to make a 
tradeoff between how soon we can discover a bug, and how fast we can iterate.
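
A minimal sketch of the seeded-sampling idea described above, written in plain Python rather than against the Spark test helpers. The names `ALL_CASES`, `sample_cases` and `run_sampled_test` are hypothetical and only illustrate how a reported seed makes a random subset reproducible.

```python
import random

# Hypothetical stand-in for the full list of cases (e.g. time zone IDs).
ALL_CASES = [f"zone-{i}" for i in range(600)]


def sample_cases(cases, k, seed):
    """Return k cases chosen by a generator seeded with `seed`."""
    rng = random.Random(seed)
    shuffled = list(cases)
    rng.shuffle(shuffled)
    return shuffled[:k]


def run_sampled_test(check, cases, k, seed):
    """Run `check` on a seeded sample and report the seed on failure."""
    for case in sample_cases(cases, k, seed):
        try:
            check(case)
        except AssertionError as err:
            # Including the seed lets anyone regenerate the same subset.
            raise AssertionError(
                f"failed for {case!r} (seed={seed}): {err}"
            ) from err
```

Re-running with the seed taken from the failure message selects the same subset, so the failure can be reproduced even though each run covers only part of the input space.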


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22655: [SPARK-25666][PYTHON] Internally document type conversio...

2018-10-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22655
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97098/
Test PASSed.


---




[GitHub] spark issue #22655: [SPARK-25666][PYTHON] Internally document type conversio...

2018-10-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22655
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #22655: [SPARK-25666][PYTHON] Internally document type conversio...

2018-10-07 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22655
  
**[Test build #97098 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97098/testReport)**
 for PR 22655 at commit 
[`6ee69a3`](https://github.com/apache/spark/commit/6ee69a37827737760a662a5a9d03f7b5b37fc39e).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #22635: [SPARK-25591][PySpark][SQL] Avoid overwriting deserializ...

2018-10-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22635
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #22635: [SPARK-25591][PySpark][SQL] Avoid overwriting deserializ...

2018-10-07 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22635
  
**[Test build #97100 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97100/testReport)**
 for PR 22635 at commit 
[`08c7223`](https://github.com/apache/spark/commit/08c7223c57d6c2b9536ba311ea4f81b20f37d973).


---




[GitHub] spark issue #22635: [SPARK-25591][PySpark][SQL] Avoid overwriting deserializ...

2018-10-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22635
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3786/
Test PASSed.


---




[GitHub] spark issue #22655: [SPARK-25666][PYTHON] Internally document type conversio...

2018-10-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22655
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #22655: [SPARK-25666][PYTHON] Internally document type conversio...

2018-10-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22655
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97097/
Test PASSed.


---




[GitHub] spark issue #22655: [SPARK-25666][PYTHON] Internally document type conversio...

2018-10-07 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22655
  
**[Test build #97097 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97097/testReport)**
 for PR 22655 at commit 
[`3aa0103`](https://github.com/apache/spark/commit/3aa010320b50fecded7f103292c1b93daf9f3754).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #22630: [SPARK-25497][SQL] Limit operation within whole stage co...

2018-10-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22630
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97099/
Test FAILed.


---




[GitHub] spark issue #22630: [SPARK-25497][SQL] Limit operation within whole stage co...

2018-10-07 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22630
  
**[Test build #97099 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97099/testReport)**
 for PR 22630 at commit 
[`d815c0b`](https://github.com/apache/spark/commit/d815c0bffc7f23325c256d436c243af94cb2a228).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `trait BlockingOperatorWithCodegen extends CodegenSupport `


---




[GitHub] spark issue #22630: [SPARK-25497][SQL] Limit operation within whole stage co...

2018-10-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22630
  
Merged build finished. Test FAILed.


---




[GitHub] spark pull request #22635: [SPARK-25591][PySpark][SQL] Avoid overwriting des...

2018-10-07 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/22635#discussion_r223251196
  
--- Diff: python/pyspark/accumulators.py ---
@@ -109,10 +109,14 @@
 
 def _deserialize_accumulator(aid, zero_value, accum_param):
     from pyspark.accumulators import _accumulatorRegistry
-    accum = Accumulator(aid, zero_value, accum_param)
-    accum._deserialized = True
-    _accumulatorRegistry[aid] = accum
-    return accum
+    # If this certain accumulator was deserialized, don't overwrite it.
+    if aid in _accumulatorRegistry:
--- End diff --

Yes.


---




[GitHub] spark issue #22635: [SPARK-25591][PySpark][SQL] Avoid overwriting deserializ...

2018-10-07 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/22635
  
Thanks @HyukjinKwon 


---




[GitHub] spark pull request #22635: [SPARK-25591][PySpark][SQL] Avoid overwriting des...

2018-10-07 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/22635#discussion_r223251175
  
--- Diff: python/pyspark/sql/tests.py ---
@@ -3603,6 +3603,31 @@ def test_repr_behaviors(self):
 self.assertEquals(None, df._repr_html_())
 self.assertEquals(expected, df.__repr__())
 
+    # SPARK-25591
+    def test_same_accumulator_in_udfs(self):
+        from pyspark.sql.functions import udf
+
+        data_schema = StructType([StructField("a", DoubleType(), True),
+                                  StructField("b", DoubleType(), True)])
+        data = self.spark.createDataFrame([[1.0, 2.0]], schema=data_schema)
+
+        test_accum = self.sc.accumulator(0.0)
+
+        def first_udf(x):
+            test_accum.add(1.0)
+            return x
+
+        def second_udf(x):
+            test_accum.add(100.0)
+            return x
+
+        func_udf = udf(first_udf, DoubleType())
+        func_udf2 = udf(second_udf, DoubleType())
+        data = data.withColumn("out1", func_udf(data["a"]))
+        data = data.withColumn("out2", func_udf2(data["b"]))
+        data.collect()
+        self.assertEqual(test_accum.value, 101)
--- End diff --

Ok. 


---




[GitHub] spark issue #22630: [SPARK-25497][SQL] Limit operation within whole stage co...

2018-10-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22630
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #22630: [SPARK-25497][SQL] Limit operation within whole stage co...

2018-10-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22630
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3785/
Test PASSed.


---




[GitHub] spark issue #22630: [SPARK-25497][SQL] Limit operation within whole stage co...

2018-10-07 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22630
  
**[Test build #97099 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97099/testReport)**
 for PR 22630 at commit 
[`d815c0b`](https://github.com/apache/spark/commit/d815c0bffc7f23325c256d436c243af94cb2a228).


---




[GitHub] spark issue #22655: [SPARK-25666][PYTHON] Internally document type conversio...

2018-10-07 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22655
  
**[Test build #97098 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97098/testReport)**
 for PR 22655 at commit 
[`6ee69a3`](https://github.com/apache/spark/commit/6ee69a37827737760a662a5a9d03f7b5b37fc39e).


---




[GitHub] spark issue #22655: [SPARK-25666][PYTHON] Internally document type conversio...

2018-10-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22655
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #22655: [SPARK-25666][PYTHON] Internally document type conversio...

2018-10-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22655
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3784/
Test PASSed.


---




[GitHub] spark issue #22623: [SPARK-25636][CORE] spark-submit cuts off the failure re...

2018-10-07 Thread srowen
Github user srowen commented on the issue:

https://github.com/apache/spark/pull/22623
  
That's checked below. Why this change?


---




[GitHub] spark issue #22655: [SPARK-25666][PYTHON] Internally document type conversio...

2018-10-07 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22655
  
**[Test build #97097 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97097/testReport)**
 for PR 22655 at commit 
[`3aa0103`](https://github.com/apache/spark/commit/3aa010320b50fecded7f103292c1b93daf9f3754).


---




[GitHub] spark issue #22655: [SPARK-25666][PYTHON] Internally document type conversio...

2018-10-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22655
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3783/
Test PASSed.


---




[GitHub] spark issue #22655: [SPARK-25666][PYTHON] Internally document type conversio...

2018-10-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22655
  
Merged build finished. Test PASSed.


---




[GitHub] spark pull request #22623: [SPARK-25636][CORE] spark-submit cuts off the fai...

2018-10-07 Thread devaraj-kavali
Github user devaraj-kavali commented on a diff in the pull request:

https://github.com/apache/spark/pull/22623#discussion_r223248407
  
--- Diff: core/src/test/scala/org/apache/spark/deploy/SparkSubmitSuite.scala ---
@@ -74,20 +74,26 @@ trait TestPrematureExit {
     @volatile var exitedCleanly = false
     mainObject.exitFn = (_) => exitedCleanly = true
 
+    var message: String = null
     val thread = new Thread {
       override def run() = try {
         mainObject.main(input)
       } catch {
         // If exceptions occur after the "exit" has happened, fine to ignore them.
         // These represent code paths not reachable during normal execution.
-        case e: Exception => if (!exitedCleanly) throw e
+        case e: Exception =>
+          message = e.getMessage
+          if (!(exitedCleanly || message.contains(searchString))) {
--- End diff --

With this PR's change, SparkException is no longer caught but thrown directly, 
so nothing is written to System.err and exit is not called (exitedCleanly stays 
false). In this case we need to check whether the thrown exception's message 
contains the expected searchString.
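
The check described above can be sketched in plain Python as follows. This is only an illustration of the control flow; `run_and_capture`, `exit_fn` and the example message are hypothetical and do not correspond to the Scala test helper.

```python
import threading


def run_and_capture(main, search_string):
    """Run `main` in a thread and classify any exception it raises.

    Returns (exited_cleanly, message, unexpected), where `unexpected` is
    the exception only if there was no clean exit and the message does
    not contain `search_string`.
    """
    exited_cleanly = False
    message = None
    unexpected = None

    def exit_fn(_code):
        nonlocal exited_cleanly
        exited_cleanly = True

    def target():
        nonlocal message, unexpected
        try:
            main(exit_fn)
        except Exception as exc:
            message = str(exc)
            if not (exited_cleanly or search_string in message):
                unexpected = exc

    thread = threading.Thread(target=target)
    thread.start()
    thread.join()
    return exited_cleanly, message, unexpected
```

An exception whose message contains the expected string is treated as the anticipated failure, while any other exception is surfaced to the caller.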


---




[GitHub] spark issue #22668: [SPARK-25675] [Spark Job History] Job UI page does not s...

2018-10-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22668
  
Can one of the admins verify this patch?


---




[GitHub] spark pull request #22668: [SPARK-25675] [Spark Job History] Job UI page doe...

2018-10-07 Thread shivusondur
GitHub user shivusondur opened a pull request:

https://github.com/apache/spark/pull/22668

[SPARK-25675] [Spark Job History] Job UI page does not show pagination with 
one page

## What changes were proposed in this pull request?
Currently, the pageNavigation() method in PagedTable.scala does not show the 
pagination when there is only one page.
With this change, the pagination is shown even when there is only one page.
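
As a rough illustration of the behaviour described above (not the PagedTable.scala code), the sketch below lists the page numbers to render and deliberately has no special case for a single page. The helper name `page_navigation` and its parameters are hypothetical.

```python
import math


def page_navigation(total_items, page_size, current_page):
    """Return the page numbers to render in the navigation bar.

    Hypothetical helper, not the PagedTable.scala implementation.
    """
    total_pages = max(1, math.ceil(total_items / page_size))
    if not 1 <= current_page <= total_pages:
        raise ValueError(f"page {current_page} out of range 1..{total_pages}")
    # No special case for total_pages == 1: the single page is still listed.
    return list(range(1, total_pages + 1))
```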

## How was this patch tested?
This was tested with the Spark web UI and the History page in a local Spark setup.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/shivusondur/spark pagination

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/22668.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #22668


commit 08328c9fd41787b0bd6c81d077f6d917d57690a2
Author: shivusondur 
Date:   2018-10-08T04:32:28Z

[SPARK-25675] [Spark Job History] Job UI page does not show pagination with 
one page
Removed the check for single page and made to show pagination for even 
single page




---




[GitHub] spark issue #22655: [WIP][SPARK-25666][PYTHON] Internally document type conv...

2018-10-07 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/22655
  
Let me make this table for Pandas UDF too and then open another JIRA (or a 
mailing list thread) to discuss this further. I need to investigate more before 
proposing the desired behaviour targeting 3.0.


---




[GitHub] spark issue #22659: [SPARK-25623][SPARK-25624][SPARK-25625][TEST] Reduce tes...

2018-10-07 Thread shahidki31
Github user shahidki31 commented on the issue:

https://github.com/apache/spark/pull/22659
  
In Jenkins CI, the testing time of LogisticRegressionSuite is 5 min 10 sec 
without the PR and 4 min 21 sec with it.


---




[GitHub] spark pull request #22375: [SPARK-25388][Test][SQL] Detect incorrect nullabl...

2018-10-07 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/22375#discussion_r223245508
  
--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CodeGenerationSuite.scala ---
@@ -113,7 +113,7 @@ class CodeGenerationSuite extends SparkFunSuite with ExpressionEvalHelper {
     assert(actual.length == 1)
     val expected = UTF8String.fromString("abc")
 
-    if (!checkResult(actual.head, expected, expressions.head.dataType)) {
+    if (!checkResult(actual.head, expected, expressions.head.dataType, expressions.head.nullable)) {
--- End diff --

Maybe we should provide an overload of `checkResult` that takes an 
`Expression`.
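
One possible shape for such a convenience entry point, sketched in Python since the idea is language-independent. Everything here is hypothetical: `Expression`, `check_result` and `check_result_for` are toy stand-ins and not the ExpressionEvalHelper API.

```python
from dataclasses import dataclass


@dataclass
class Expression:
    """Hypothetical stand-in carrying only the two fields the check needs."""
    data_type: str
    nullable: bool


def check_result(actual, expected, data_type, nullable):
    """Toy check: a null result is only acceptable for a nullable expression.

    `data_type` is accepted for parity with the four-argument call in the
    diff but is not used by this toy comparison.
    """
    if actual is None and not nullable:
        return False
    return actual == expected


def check_result_for(actual, expected, expression):
    """Wrapper that pulls data_type and nullable from the expression."""
    return check_result(
        actual, expected, expression.data_type, expression.nullable
    )
```

Callers would then pass the expression once instead of repeating `expressions.head.dataType` and `expressions.head.nullable` at every call site.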


---




[GitHub] spark issue #22659: [SPARK-25623][SPARK-25624][SPARK-25625][TEST] Reduce tes...

2018-10-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22659
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #22659: [SPARK-25623][SPARK-25624][SPARK-25625][TEST] Reduce tes...

2018-10-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22659
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97094/
Test PASSed.


---




[GitHub] spark issue #22655: [WIP][SPARK-25666][PYTHON] Internally document type conv...

2018-10-07 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/22655
  
It's useful to have this table, thanks!

Shall we discuss the expected behavior here or in another JIRA ticket?


---




[GitHub] spark issue #22659: [SPARK-25623][SPARK-25624][SPARK-25625][TEST] Reduce tes...

2018-10-07 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22659
  
**[Test build #97094 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97094/testReport)**
 for PR 22659 at commit 
[`c28fd05`](https://github.com/apache/spark/commit/c28fd05f259a681a74ab34d2be1818c205bf29a9).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #22643: [SPARK-25630][TEST] Reduce test time of HadoopFsRelation...

2018-10-07 Thread gengliangwang
Github user gengliangwang commented on the issue:

https://github.com/apache/spark/pull/22643
  
@dongjoon-hyun please take another look, thanks!


---




[GitHub] spark issue #22659: [SPARK-25623][SPARK-25624][SPARK-25625][TEST] Reduce tes...

2018-10-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22659
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #22659: [SPARK-25623][SPARK-25624][SPARK-25625][TEST] Reduce tes...

2018-10-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22659
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97093/
Test PASSed.


---




[GitHub] spark issue #22659: [SPARK-25623][SPARK-25624][SPARK-25625][TEST] Reduce tes...

2018-10-07 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22659
  
**[Test build #97093 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97093/testReport)**
 for PR 22659 at commit 
[`3d9673e`](https://github.com/apache/spark/commit/3d9673e4014872b3b0583b86e134bcbdd27f6e39).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #22594: [SPARK-25674][SQL] If the records are incremented by mor...

2018-10-07 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22594
  
**[Test build #97096 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97096/testReport)**
 for PR 22594 at commit 
[`8134249`](https://github.com/apache/spark/commit/8134249d7c6214475acc87a8b0f5a7c99bd21d45).


---




[GitHub] spark issue #22594: [SPARK-25674][SQL] If the records are incremented by mor...

2018-10-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22594
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3782/
Test PASSed.


---




[GitHub] spark issue #22594: [SPARK-25674][SQL] If the records are incremented by mor...

2018-10-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22594
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #22659: [SPARK-25623][SPARK-25624][SPARK-25625][TEST] Reduce tes...

2018-10-07 Thread shahidki31
Github user shahidki31 commented on the issue:

https://github.com/apache/spark/pull/22659
  
Before the changes:
Running time of logistic regression suite: **4min 35 sec**
After the changes:
Running time of logistic regression suite: **3min 22 sec**

cc @srowen @HyukjinKwon . Kindly review



---




[GitHub] spark issue #22665: add [openjdk11] to Travis build matrix

2018-10-07 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/22665
  
@sullis, this file will be obsolete per 
https://github.com/apache/spark/pull/22667. This shall be closed.


---




[GitHub] spark pull request #22660: [SPARK-25624][TEST] Reduce test time of LogisticR...

2018-10-07 Thread shahidki31
Github user shahidki31 closed the pull request at:

https://github.com/apache/spark/pull/22660


---




[GitHub] spark issue #22659: [SPARK-25623][TEST] Reduce test time of LogisticRegressi...

2018-10-07 Thread shahidki31
Github user shahidki31 commented on the issue:

https://github.com/apache/spark/pull/22659
  
The test "binary logistic regression with intercept with ElasticNet 
regularization" takes around 30 sec to run, but we can reduce the time to 15 
sec by reducing the number of iterations.


![image](https://user-images.githubusercontent.com/23054875/46590813-0a54b080-cad4-11e8-8d27-9b049fc4537c.png)
model1 converges after 100 iterations.

![image](https://user-images.githubusercontent.com/23054875/46590826-19d3f980-cad4-11e8-9c81-4c42ac5559b8.png)
model2 converges after 20 iterations.
So, if we set maxIter of model1 and model2 to 120 and 30 respectively, we 
can reduce the time to ~15 sec.

The test "multinomial logistic regression without intercept with 
elasticnet regularization" also takes around 30 sec to run. This can likewise 
be reduced to 15 sec by reducing the number of iterations.

![image](https://user-images.githubusercontent.com/23054875/46590808-032da280-cad4-11e8-8b8f-9e70632d.png)
model1 converges after 50 iterations.

![image](https://user-images.githubusercontent.com/23054875/46590819-10e32800-cad4-11e8-9ded-b29e68dfd0ff.png)
model2 converges after 30 iterations.
So, if we set maxIter of model1 and model2 to 75 and 50 respectively, we 
can reduce the computation time to less than 15 sec.
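
The reasoning above (find where the training cost flattens, then set maxIter a little higher) can be sketched as follows. This is only an illustration: `convergence_iteration` and `suggested_max_iter` are hypothetical helpers, the tolerance is arbitrary, and the 20% headroom is an assumption; the margins in the comment above were chosen by hand from the plots.

```python
def convergence_iteration(costs, tol=1e-6):
    """Return the 1-based iteration at which the cost stops changing.

    `costs[i]` is the training cost after iteration i + 1. The result is
    the first iteration whose cost differs from the next one by at most
    `tol`, or len(costs) if the cost never flattens.
    """
    for i in range(1, len(costs)):
        if abs(costs[i - 1] - costs[i]) <= tol:
            return i
    return len(costs)


def suggested_max_iter(costs, headroom_percent=20, tol=1e-6):
    """Convergence iteration plus some headroom (at least one iteration)."""
    converged_at = convergence_iteration(costs, tol)
    return converged_at + max(1, converged_at * headroom_percent // 100)
```

For a cost history that flattens at iteration 100, this suggests a cap of 120 under the assumed 20% headroom.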




---




[GitHub] spark issue #22594: [SPARK-25674][SQL] If the records are incremented by mor...

2018-10-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22594
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3781/
Test PASSed.


---




[GitHub] spark issue #22594: [SPARK-25674][SQL] If the records are incremented by mor...

2018-10-07 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22594
  
**[Test build #97095 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97095/testReport)**
 for PR 22594 at commit 
[`c332716`](https://github.com/apache/spark/commit/c332716f87b52b2bc3f1cd64e2cde945ac44d142).


---




[GitHub] spark issue #22594: [SPARK-25674][SQL] If the records are incremented by mor...

2018-10-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22594
  
Build finished. Test PASSed.


---




[GitHub] spark issue #22659: [SPARK-25623][TEST] Reduce test time of LogisticRegressi...

2018-10-07 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22659
  
**[Test build #97094 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97094/testReport)**
 for PR 22659 at commit 
[`c28fd05`](https://github.com/apache/spark/commit/c28fd05f259a681a74ab34d2be1818c205bf29a9).


---




[GitHub] spark issue #22659: [SPARK-25623][TEST] Reduce test time of LogisticRegressi...

2018-10-07 Thread shahidki31
Github user shahidki31 commented on the issue:

https://github.com/apache/spark/pull/22659
  
The test "multinomial logistic regression with intercept with 
elasticnet regularization" in "LogisticRegressionSuite" takes around 1 
minute to train 2 logistic regression models.
However, after analyzing the training cost over iterations, we can reduce the 
computation time by 50%.
Training cost vs iteration for model 1:


![image](https://user-images.githubusercontent.com/23054875/46590546-c496e880-cad1-11e8-8539-5bc9853c33ca.png)


So, model1 converges after iteration 200.

Training cost vs iteration for model 2:

![image](https://user-images.githubusercontent.com/23054875/46590551-ca8cc980-cad1-11e8-8e83-24ad220e1618.png)

After around 50 iterations, model2 converges.
So, if we set the maximum number of iterations for model1 and model2 to 220 
and 90 respectively, we can reduce the computation time by half.

Computation time in local setup:
Before change: ~54 sec
After change: ~35 sec
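
One way to pick such iteration caps is to look for the point where the training cost stops improving. The following is only a minimal sketch in plain Scala, not code from this PR: `ConvergenceCheck`, `firstConvergedIteration`, and the tolerance value are hypothetical names and numbers, and the cost history is assumed to have been recorded separately.

```scala
object ConvergenceCheck {
  // costs(i) is assumed to be the training cost after iteration i
  // (costs(0) being the initial cost). Returns the first iteration at
  // which the relative change in cost drops to `tol` or below, if any.
  def firstConvergedIteration(costs: Seq[Double], tol: Double): Option[Int] =
    costs.sliding(2).zipWithIndex.collectFirst {
      case (Seq(prev, curr), i)
          if math.abs(prev - curr) <= tol * math.max(math.abs(prev), 1e-12) =>
        i + 1
    }
}
```

A cap slightly above the returned iteration leaves some margin, which is the same idea as choosing 220 for a model that converges around iteration 200.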


---




[GitHub] spark issue #22659: [SPARK-25623][TEST] Reduce test time of LogisticRegressi...

2018-10-07 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22659
  
**[Test build #97093 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97093/testReport)**
 for PR 22659 at commit 
[`3d9673e`](https://github.com/apache/spark/commit/3d9673e4014872b3b0583b86e134bcbdd27f6e39).


---




[GitHub] spark issue #22667: [SPARK-25673][BUILD] Remove Travis CI which enables Java...

2018-10-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22667
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #22667: [SPARK-25673][BUILD] Remove Travis CI which enables Java...

2018-10-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22667
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3780/
Test PASSed.


---




[GitHub] spark issue #22660: [SPARK-25624][TEST] Reduce test time of LogisticRegressi...

2018-10-07 Thread shahidki31
Github user shahidki31 commented on the issue:

https://github.com/apache/spark/pull/22660
  
Thanks for the suggestion. I will close this and amend in the other PR. 


---




[GitHub] spark issue #22667: [SPARK-25673][BUILD] Remove Travis CI which enables Java...

2018-10-07 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22667
  
**[Test build #97092 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97092/testReport)**
 for PR 22667 at commit 
[`7a535db`](https://github.com/apache/spark/commit/7a535db83ba4b47c489166fb2b149fbb32b0aba4).


---




[GitHub] spark pull request #22667: [SPARK-25673][BUILD] Remove Travis CI which enabl...

2018-10-07 Thread HyukjinKwon
GitHub user HyukjinKwon opened a pull request:

https://github.com/apache/spark/pull/22667

[SPARK-25673][BUILD] Remove Travis CI which enables Java lint check

## What changes were proposed in this pull request?

https://github.com/apache/spark/pull/12980 added a Travis CI file mainly for 
the linter, because we had disabled the Java lint check in Jenkins.

The check has been enabled as of https://github.com/apache/spark/pull/21399 
and SBT now runs it. It looks like we can now remove the file added before.

## How was this patch tested?

N/A

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/HyukjinKwon/spark SPARK-25673

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/22667.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #22667


commit 7a535db83ba4b47c489166fb2b149fbb32b0aba4
Author: hyukjinkwon 
Date:   2018-10-08T02:00:51Z

Remove Travis CI which enables Java lint check




---




[GitHub] spark issue #22667: [SPARK-25673][BUILD] Remove Travis CI which enables Java...

2018-10-07 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/22667
  
cc @srowen and @dongjoon-hyun 


---




[GitHub] spark issue #22666: [SPARK-25672][SQL] schema_of_csv() - schema inference fr...

2018-10-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22666
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97091/
Test PASSed.


---




[GitHub] spark issue #22666: [SPARK-25672][SQL] schema_of_csv() - schema inference fr...

2018-10-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22666
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #22666: [SPARK-25672][SQL] schema_of_csv() - schema inference fr...

2018-10-07 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22666
  
**[Test build #97091 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97091/testReport)**
 for PR 22666 at commit 
[`5fb17fb`](https://github.com/apache/spark/commit/5fb17fbefd52198bcf735abc132b0ab9174cbe0f).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #22661: [SPARK-25664][SQL][TEST] Refactor JoinBenchmark to use m...

2018-10-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22661
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #22661: [SPARK-25664][SQL][TEST] Refactor JoinBenchmark to use m...

2018-10-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22661
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97090/
Test PASSed.


---




[GitHub] spark issue #22661: [SPARK-25664][SQL][TEST] Refactor JoinBenchmark to use m...

2018-10-07 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22661
  
**[Test build #97090 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97090/testReport)**
 for PR 22661 at commit 
[`4859a9f`](https://github.com/apache/spark/commit/4859a9f5e78edf81c211c304a57e2603e60b2cc7).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #22663: [SPARK-25490][SQL][TEST] Refactor KryoBenchmark to use m...

2018-10-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22663
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #22663: [SPARK-25490][SQL][TEST] Refactor KryoBenchmark to use m...

2018-10-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22663
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97089/
Test PASSed.


---




[GitHub] spark issue #22663: [SPARK-25490][SQL][TEST] Refactor KryoBenchmark to use m...

2018-10-07 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22663
  
**[Test build #97089 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97089/testReport)**
 for PR 22663 at commit 
[`e2ca55e`](https://github.com/apache/spark/commit/e2ca55e81e4e395bb16711db63eb23f07ab9ec9f).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #22660: [SPARK-25624][TEST] Reduce test time of LogisticRegressi...

2018-10-07 Thread srowen
Github user srowen commented on the issue:

https://github.com/apache/spark/pull/22660
  
This kind of thing looks OK, but please make one PR. There's no point in 
opening lots of JIRAs and PRs for the same change in N places.


---




[GitHub] spark issue #22660: [SPARK-25624][TEST] Reduce test time of LogisticRegressi...

2018-10-07 Thread shahidki31
Github user shahidki31 commented on the issue:

https://github.com/apache/spark/pull/22660
  
cc @srowen Kindly review.


---




[GitHub] spark pull request #22631: [SPARK-25605][TESTS] Run cast string to timestamp...

2018-10-07 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/22631#discussion_r223228194
  
--- Diff: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CastSuite.scala
 ---
@@ -110,7 +112,7 @@ class CastSuite extends SparkFunSuite with 
ExpressionEvalHelper {
   }
 
   test("cast string to timestamp") {
-for (tz <- ALL_TIMEZONES) {
+for (tz <- Random.shuffle(ALL_TIMEZONES).take(50)) {
--- End diff --

Surely not by design? Tests need to be deterministic, or else what's the 
value? Failures can't be reproduced. (I know that in practice many things are 
hard to make deterministic.)

Of course, if you're worried that we might not be testing an important 
case, we have to test it. We can't just not test it sometimes to make some 
tests run a little faster.

Testing just 3 timezones might be fine too; I don't know. Testing 50 
randomly seems suboptimal in all cases. 

I'll open a PR to try simply testing in parallel instead.


---




[GitHub] spark pull request #22631: [SPARK-25605][TESTS] Run cast string to timestamp...

2018-10-07 Thread mgaido91
Github user mgaido91 commented on a diff in the pull request:

https://github.com/apache/spark/pull/22631#discussion_r223226770
  
--- Diff: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CastSuite.scala
 ---
@@ -110,7 +112,7 @@ class CastSuite extends SparkFunSuite with 
ExpressionEvalHelper {
   }
 
   test("cast string to timestamp") {
-for (tz <- ALL_TIMEZONES) {
+for (tz <- Random.shuffle(ALL_TIMEZONES).take(50)) {
--- End diff --

Yes, there are many tests where data is randomly generated. And they are 
not seeded of course.

As I said, I think the goal here is to test that the function works well 
with different timezones. Picking a fixed subset of timezones would be fine 
too, but I prefer sampling them randomly from the full list: if a single 
timezone causes issues (very unlikely IMHO), we would eventually discover it 
(though not necessarily in a single run).

Moreover, it would be great to be consistent across the whole codebase about 
what we test. In `DateExpressionsSuite` we test only 3 timezones and here we 
test all 650: it is weird, isn't it? We should probably define what the 
right thing to do is when timezones are involved and always test the same 
way. Otherwise, testing 650 timezones for one specific function and 3 for 
most of the others makes no sense IMHO.
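
The reproducibility concern raised in this thread could be addressed by seeding the sampling and reporting the seed. This is only a sketch of that idea, not code from the PR: `SeededSample` and `sample` are hypothetical names, and the real test would pass `ALL_TIMEZONES` rather than an arbitrary sequence.

```scala
import scala.util.Random

object SeededSample {
  // Pick `n` items using an explicit seed, so that a failing run can be
  // replayed by passing the same seed again.
  def sample[A](all: Seq[A], n: Int, seed: Long): Seq[A] =
    new Random(seed).shuffle(all).take(n)
}
```

A test could draw the seed once (for example from `System.currentTimeMillis()`), include it in the test name or failure message, and rerun with that seed to reproduce a failure.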


---




[GitHub] spark issue #22145: [SPARK-25152][K8S] Enable SparkR Integration Tests for K...

2018-10-07 Thread shaneknapp
Github user shaneknapp commented on the issue:

https://github.com/apache/spark/pull/22145
  
yes, hopefully soon.  i won't be able to start on this for at least another
week due to our lab having a big event this coming week.

On Sat, Oct 6, 2018 at 5:55 PM Felix Cheung wrote:

> @shaneknapp  could we do this soon?


---




[GitHub] spark issue #22661: [SPARK-25664][SQL][TEST] Refactor JoinBenchmark to use m...

2018-10-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22661
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3779/
Test PASSed.


---




[GitHub] spark issue #22666: [SPARK-25672][SQL] schema_of_csv() - schema inference fr...

2018-10-07 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22666
  
**[Test build #97091 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97091/testReport)**
 for PR 22666 at commit 
[`5fb17fb`](https://github.com/apache/spark/commit/5fb17fbefd52198bcf735abc132b0ab9174cbe0f).


---




[GitHub] spark issue #22661: [SPARK-25664][SQL][TEST] Refactor JoinBenchmark to use m...

2018-10-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22661
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #22666: [SPARK-25672][SQL] schema_of_csv() - schema inference fr...

2018-10-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22666
  
Can one of the admins verify this patch?


---




[GitHub] spark pull request #22666: [SPARK-25672][SQL] schema_of_csv() - schema infer...

2018-10-07 Thread MaxGekk
GitHub user MaxGekk opened a pull request:

https://github.com/apache/spark/pull/22666

[SPARK-25672][SQL] schema_of_csv() - schema inference from an example

## What changes were proposed in this pull request?

In this PR, I propose adding a new function, *schema_of_csv()*, which infers 
the schema of a CSV string literal. The result of the function is a string 
containing the schema in DDL format. For example:

```sql
select schema_of_csv('1|abc', map('delimiter', '|'))
``` 
```
struct<_c0:int,_c1:string>
```
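
To illustrate the idea of inferring a schema from a single example row, here is a toy sketch in plain Scala. It is not Spark's implementation and does not use Spark's API: `ToySchemaOfCsv`, `schemaOf`, and the three-type inference rule (int, double, string) are assumptions made purely for illustration.

```scala
import java.util.regex.Pattern
import scala.util.Try

object ToySchemaOfCsv {
  // Hypothetical, simplified rule: try int, then double, else string.
  private def inferType(field: String): String =
    if (Try(field.trim.toInt).isSuccess) "int"
    else if (Try(field.trim.toDouble).isSuccess) "double"
    else "string"

  // Splits one CSV row on a literal delimiter and names columns _c0, _c1, ...
  def schemaOf(line: String, delimiter: String = ","): String = {
    val fields = line.split(Pattern.quote(delimiter), -1)
    fields.zipWithIndex
      .map { case (f, i) => s"_c$i:${inferType(f)}" }
      .mkString("struct<", ",", ">")
  }
}
```

Under these assumptions, `ToySchemaOfCsv.schemaOf("1|abc", "|")` yields the same string as the SQL example above.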

## How was this patch tested?

Added new tests to `CsvFunctionsSuite`, `CsvExpressionsSuite` and SQL tests 
to `csv-functions.sql`


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/MaxGekk/spark-1 schema_of_csv-function

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/22666.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #22666


commit 4c00900e8bfbe56d13576d6dc21fb2f2dbbb105d
Author: Maxim Gekk 
Date:   2018-10-07T14:17:05Z

Dependency of uniVocity 2.7.3 is added for sql/catalyst

commit 25f330a617e41c1207efd880be766136ce9b0bca
Author: Maxim Gekk 
Date:   2018-10-07T14:37:50Z

Moving CSVOptions to sql/catalyst

commit 0d7e7990799a307794f10fe52030eca850762927
Author: Maxim Gekk 
Date:   2018-10-07T17:42:02Z

Moving CSVInferSchema to sql/catalyst

commit 7abbfcae8444e88391e1d456a9a249fa5fccf6f0
Author: Maxim Gekk 
Date:   2018-09-16T19:12:58Z

Added an expression test

commit 6ca4fa3e2bf6b29b82f1ece33c5a75beaf934d87
Author: Maxim Gekk 
Date:   2018-09-21T15:03:39Z

Support options

commit e76536bfc62911c4e2039d4fc63d771b1c3b5fe1
Author: Maxim Gekk 
Date:   2018-09-21T16:05:55Z

Register schema_of_csv and adding SQL tests

commit ef03d3a38e3a7a31a04cda901821238b01ec8f37
Author: Maxim Gekk 
Date:   2018-09-21T17:27:33Z

Adding schema_of_csv and tests

commit 8ed225f3d2c5fbe3df75f8518d539fcdd5f01a2e
Author: Maxim Gekk 
Date:   2018-09-21T17:54:43Z

Support schema_of_csv in PySpark

commit 5fb17fbefd52198bcf735abc132b0ab9174cbe0f
Author: Maxim Gekk 
Date:   2018-10-07T18:49:00Z

2.5 -> 3.0




---




[GitHub] spark issue #22661: [SPARK-25664][SQL][TEST] Refactor JoinBenchmark to use m...

2018-10-07 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22661
  
**[Test build #97090 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97090/testReport)**
 for PR 22661 at commit 
[`4859a9f`](https://github.com/apache/spark/commit/4859a9f5e78edf81c211c304a57e2603e60b2cc7).


---




[GitHub] spark pull request #22631: [SPARK-25605][TESTS] Run cast string to timestamp...

2018-10-07 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/22631#discussion_r223224573
  
--- Diff: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CastSuite.scala
 ---
@@ -110,7 +112,7 @@ class CastSuite extends SparkFunSuite with 
ExpressionEvalHelper {
   }
 
   test("cast string to timestamp") {
-for (tz <- ALL_TIMEZONES) {
+for (tz <- Random.shuffle(ALL_TIMEZONES).take(50)) {
--- End diff --

Tests should be deterministic, ideally; any sources of randomness should be 
seeded. Do you see one that isn't? 

I think this is like deciding we'll run just 90% of all test suites every 
time randomly, to save time. I think it's just well against good practice.

There are other solutions:
1) pick a subset of timezones that we're confident do exercise the code and 
just explicitly test those
2) parallelize these tests within the test suite

The latter should be trivial in this case: 
`ALL_TIMEZONES.par.foreach { tz =>` instead. It's the same amount of work but 
potentially 8x or 16x faster by wall-clock time, depending on how many cores 
are available. What about that?
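
As a rough illustration of running the per-timezone checks concurrently, the sketch below uses `Future`s from the Scala standard library instead of `.par`. It is not the test from this PR: `ParallelTimeZoneCheck`, `checkTimeZone`, and `runAll` are hypothetical names, and the check itself is a trivial stand-in for the real cast assertions.

```scala
import java.util.TimeZone
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object ParallelTimeZoneCheck {
  // Stand-in for the real per-timezone assertions (hypothetical):
  // the offset at the epoch should be within 18 hours of UTC.
  def checkTimeZone(tz: TimeZone): Boolean =
    math.abs(tz.getOffset(0L)) <= 18 * 60 * 60 * 1000

  // Runs every check on the global thread pool and returns the IDs of
  // the timezones whose check failed.
  def runAll(zones: Seq[TimeZone]): Seq[String] = {
    val futures = zones.map(tz => Future(tz.getID -> checkTimeZone(tz)))
    val results = Await.result(Future.sequence(futures), 5.minutes)
    results.collect { case (id, false) => id }
  }

  def allTimeZones: Seq[TimeZone] =
    TimeZone.getAvailableIDs.toSeq.map(id => TimeZone.getTimeZone(id))

  def main(args: Array[String]): Unit =
    println(s"failing timezones: ${runAll(allTimeZones)}")
}
```

Every timezone is still exercised on every run, so the result stays deterministic; only the wall-clock time changes with the number of cores.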


---




[GitHub] spark issue #22641: [SPARK-25611][SPARK-25612][SQL][TESTS] Improve test run ...

2018-10-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22641
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #22641: [SPARK-25611][SPARK-25612][SQL][TESTS] Improve test run ...

2018-10-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22641
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97088/
Test PASSed.


---




[GitHub] spark issue #22641: [SPARK-25611][SPARK-25612][SQL][TESTS] Improve test run ...

2018-10-07 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22641
  
**[Test build #97088 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97088/testReport)**
 for PR 22641 at commit 
[`3dbc5dc`](https://github.com/apache/spark/commit/3dbc5dceb9c2e3f727f7044de6463a809a8a9518).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #22663: [SPARK-25490][SQL][TEST] Refactor KryoBenchmark to use m...

2018-10-07 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22663
  
**[Test build #97089 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97089/testReport)**
 for PR 22663 at commit 
[`e2ca55e`](https://github.com/apache/spark/commit/e2ca55e81e4e395bb16711db63eb23f07ab9ec9f).


---




[GitHub] spark issue #22663: [SPARK-25490][SQL][TEST] Refactor KryoBenchmark to use m...

2018-10-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22663
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22663: [SPARK-25490][SQL][TEST] Refactor KryoBenchmark to use m...

2018-10-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22663
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3778/
Test PASSed.


---




[GitHub] spark issue #22663: [SPARK-25490][SQL][TEST] Refactor KryoBenchmark to use m...

2018-10-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22663
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97085/
Test PASSed.


---




[GitHub] spark issue #22663: [SPARK-25490][SQL][TEST] Refactor KryoBenchmark to use m...

2018-10-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22663
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #22663: [SPARK-25490][SQL][TEST] Refactor KryoBenchmark to use m...

2018-10-07 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22663
  
**[Test build #97085 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97085/testReport)**
 for PR 22663 at commit 
[`529bba7`](https://github.com/apache/spark/commit/529bba722b72204c465c71f773a222fdaf8aee39).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #22665: add [openjdk11] to Travis build matrix

2018-10-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22665
  
Can one of the admins verify this patch?


---




[GitHub] spark pull request #22664: [SPARK-25662][TEST] Refactor DataSourceReadBenchm...

2018-10-07 Thread peter-toth
Github user peter-toth commented on a diff in the pull request:

https://github.com/apache/spark/pull/22664#discussion_r22366
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/DataSourceReadBenchmark.scala
 ---
@@ -34,10 +34,15 @@ import org.apache.spark.sql.vectorized.ColumnVector
 
 /**
  * Benchmark to measure data source read performance.
- * To run this:
- *  spark-submit --class  
+ * To run this benchmark:
+ * {{{
+ *   1. without sbt: bin/spark-submit --class  
--- End diff --

```
bin/spark-submit --class 
org.apache.spark.sql.execution.benchmark.DataSourceReadBenchmark --jars 
core/target/spark-core_2.11-3.0.0-SNAPSHOT-tests.jar,sql/catalyst/target/spark-catalyst_2.11-3.0.0-SNAPSHOT-tests.jar
 sql/core/target/spark-sql_2.11-3.0.0-SNAPSHOT-tests.jar
```
does work for me, but I checked in `FilterPushdownBenchmark` and it seems 
we don't mention other required jars.
Shall I modify the command?


---




[GitHub] spark pull request #22665: add [openjdk11] to Travis build matrix

2018-10-07 Thread sullis
GitHub user sullis opened a pull request:

https://github.com/apache/spark/pull/22665

add [openjdk11] to Travis build matrix

## What changes were proposed in this pull request?

add [openjdk11] to Travis build matrix

## How was this patch tested?

Travis build.

Please review http://spark.apache.org/contributing.html before opening a 
pull request.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/sullis/spark travis-openjdk11

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/22665.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #22665


commit 104309d810bb354b3a05e37fcc3221bf3d808bf2
Author: Sean Sullivan 
Date:   2018-10-07T17:55:47Z

add [openjdk11] to Travis build matrix




---




[GitHub] spark pull request #22623: [SPARK-25636][CORE] spark-submit cuts off the fai...

2018-10-07 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/22623#discussion_r223221911
  
--- Diff: 
core/src/test/scala/org/apache/spark/deploy/SparkSubmitSuite.scala ---
@@ -74,20 +74,26 @@ trait TestPrematureExit {
 @volatile var exitedCleanly = false
 mainObject.exitFn = (_) => exitedCleanly = true
 
+var message: String = null
 val thread = new Thread {
   override def run() = try {
 mainObject.main(input)
   } catch {
 // If exceptions occur after the "exit" has happened, fine to ignore them.
 // These represent code paths not reachable during normal execution.
-case e: Exception => if (!exitedCleanly) throw e
+case e: Exception =>
+  message = e.getMessage
+  if (!(exitedCleanly || message.contains(searchString))) {
--- End diff --

If it didn't exit cleanly, can it be possible that the exception is the 
correct expected one, and that its message contains the search string? I'm 
probably missing the reason why this has to be checked here.


---




[GitHub] spark issue #22659: [SPARK-25623][TEST] Reduce test time of LogisticRegressi...

2018-10-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22659
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #22659: [SPARK-25623][TEST] Reduce test time of LogisticRegressi...

2018-10-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22659
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97087/
Test PASSed.


---



