[GitHub] spark pull request #22461: [SPARK-25453][SQL][TEST] OracleIntegrationSuite I...

2018-09-21 Thread maropu
Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/22461#discussion_r219540966
  
--- Diff: docs/sql-programming-guide.md ---
@@ -1489,7 +1489,7 @@ See the [Apache Avro Data Source 
Guide](avro-data-source-guide.html).
 
  * The JDBC driver class must be visible to the primordial class loader on 
the client session and on all executors. This is because Java's DriverManager 
class does a security check that results in it ignoring all drivers not visible 
to the primordial class loader when one goes to open a connection. One 
convenient way to do this is to modify compute_classpath.sh on all worker nodes 
to include your driver JARs.
  * Some databases, such as H2, convert all names to upper case. You'll 
need to use upper case to refer to those names in Spark SQL.
-
+ * Users can specify vendor-specific JDBC connection properties in the 
data source options for special treatment. For example, 
`spark.read.format("jdbc").option("url", 
oracleJdbcUrl).option("oracle.jdbc.mapDateToTimestamp", "false")`. 
`oracle.jdbc.mapDateToTimestamp` defaults to true; users often need to disable 
this flag to avoid an Oracle DATE column being resolved as a timestamp.
--- End diff --

Ur.. I see. This section is about troubleshooting, so I think it would be 
better to illustrate a concrete trouble case first, and then describe the 
solution for it.
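
Something like this, maybe (a rough sketch; `oracleJdbcUrl`, the table name 
`dates_table`, and the Oracle DATE column `d` are placeholders):

```
// Trouble case: oracle.jdbc.mapDateToTimestamp defaults to true, so the
// Oracle DATE column d is resolved as Catalyst TimestampType.
val df = spark.read.format("jdbc")
  .option("url", oracleJdbcUrl)
  .option("dbtable", "dates_table")
  .load()
df.schema("d").dataType  // TimestampType

// Solution: pass the vendor-specific property to disable the mapping,
// so the column is resolved as DateType instead.
val fixed = spark.read.format("jdbc")
  .option("url", oracleJdbcUrl)
  .option("dbtable", "dates_table")
  .option("oracle.jdbc.mapDateToTimestamp", "false")
  .load()
fixed.schema("d").dataType  // DateType
```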


---




[GitHub] spark pull request #22519: [SPARK-25505][SQL] The output order of grouping c...

2018-09-21 Thread maryannxue
GitHub user maryannxue opened a pull request:

https://github.com/apache/spark/pull/22519

[SPARK-25505][SQL] The output order of grouping columns in Pivot is 
different from the input order

## What changes were proposed in this pull request?

The grouping columns from a Pivot query are inferred as "input columns - 
pivot columns - pivot aggregate columns", where input columns are the output of 
the child relation of Pivot. The grouping columns will be the leading columns 
in the pivot output and they should preserve the same order as specified by the 
input. For example,
```
SELECT * FROM (
  SELECT course, earnings, "a" as a, "z" as z, "b" as b, "y" as y, "c" as 
c, "x" as x, "d" as d, "w" as w
  FROM courseSales
)
PIVOT (
  sum(earnings)
  FOR course IN ('dotNET', 'Java')
)
```
The output columns should be "a, z, b, y, c, x, d, w, ..." but currently they 
come out as "a, b, c, d, w, x, y, z, ...".

The fix is to use the child plan's `output` instead of `outputSet` so that 
the order can be preserved.
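
For intuition, a minimal sketch of the root cause (plain Scala, simplified: 
the real code deals in `Attribute`s, not strings):

```
// A Seq preserves the input order; a Set does not.
val cols = Seq("a", "z", "b", "y", "c", "x", "d", "w")  // like child.output
println(cols.mkString(", "))        // a, z, b, y, c, x, d, w -- order preserved
println(cols.toSet.mkString(", "))  // some implementation-defined order,
                                    // not the input order
```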

## How was this patch tested?

Added UT.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/maryannxue/spark spark-25505

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/22519.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #22519


commit bd416bd74ee77329b2527fffecd21f7f90090334
Author: maryannxue 
Date:   2018-09-21T14:33:16Z

[SPARK-25505][SQL] The output order of grouping columns in Pivot is 
different from the input order




---




[GitHub] spark issue #22490: [SPARK-25481][TEST] Refactor ColumnarBatchBenchmark to u...

2018-09-21 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22490
  
**[Test build #96435 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96435/testReport)**
 for PR 22490 at commit 
[`02ecf3f`](https://github.com/apache/spark/commit/02ecf3f6ba737332464021a4bbf7320b4d71bd70).


---




[GitHub] spark pull request #22461: [SPARK-25453][SQL][TEST] OracleIntegrationSuite I...

2018-09-21 Thread seancxmao
Github user seancxmao commented on a diff in the pull request:

https://github.com/apache/spark/pull/22461#discussion_r219539062
  
--- Diff: docs/sql-programming-guide.md ---
@@ -1287,8 +1287,18 @@ bin/spark-shell --driver-class-path 
postgresql-9.4.1207.jar --jars postgresql-9.
 Tables from the remote database can be loaded as a DataFrame or Spark SQL 
temporary view using
 the Data Sources API. Users can specify the JDBC connection properties in 
the data source options.
`user` and `password` are normally provided as 
connection properties for
-logging into the data sources. In addition to the connection properties, 
Spark also supports
-the following case-insensitive options:
+logging into the data sources. Vendor-specific connection properties can 
also be passed to the
+underlying JDBC driver in the same way. For example:
+
+{% highlight scala %}
+// oracle.jdbc.mapDateToTimestamp defaults to true. If this flag is not 
disabled, a column of Oracle
+// DATE type will be resolved as Catalyst TimestampType, which is probably 
not the desired behavior.
+spark.read.format("jdbc")
+  .option("url", oracleJdbcUrl)
+  .option("oracle.jdbc.mapDateToTimestamp", "false")
+{% endhighlight %}
+
--- End diff --

I have moved this description to the `Troubleshooting` section. I also tried to 
brush up the description. Writing good documentation is sometimes more 
difficult than writing code. I really need your help :)


---




[GitHub] spark issue #22461: [SPARK-25453][SQL][TEST] OracleIntegrationSuite IllegalA...

2018-09-21 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22461
  
**[Test build #96434 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96434/testReport)**
 for PR 22461 at commit 
[`f6274a5`](https://github.com/apache/spark/commit/f6274a50177e18be7b36d87913c44103f2fa02d2).


---




[GitHub] spark pull request #22461: [SPARK-25453][SQL][TEST] OracleIntegrationSuite I...

2018-09-21 Thread seancxmao
Github user seancxmao commented on a diff in the pull request:

https://github.com/apache/spark/pull/22461#discussion_r219537942
  
--- Diff: 
external/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/OracleIntegrationSuite.scala
 ---
@@ -462,6 +468,12 @@ class OracleIntegrationSuite extends 
DockerJDBCIntegrationSuite with SharedSQLCo
   .option("lowerBound", "2018-07-04 03:30:00.0")
   .option("upperBound", "2018-07-27 14:11:05.0")
   .option("numPartitions", 2)
+  // oracle.jdbc.mapDateToTimestamp defaults to true. If this flag is 
not disabled, column d
+  // (Oracle DATE) will be resolved as Catalyst Timestamp, which will 
fail test result
+  // comparison. E.g. Expected 2018-07-06 while Actual 2018-07-06 
00:00:00.0.
--- End diff --

OK, I will remove the duplicate description.


---




[GitHub] spark issue #22138: [SPARK-25151][SS] Apply Apache Commons Pool to KafkaData...

2018-09-21 Thread HeartSaVioR
Github user HeartSaVioR commented on the issue:

https://github.com/apache/spark/pull/22138
  
@gaborgsomogyi Totally makes sense. Let me address it while the patch is being 
reviewed by committers. I may get recommendations to rename the config, or even 
more, so addressing the documentation will be the last part.


---




[GitHub] spark issue #22263: [SPARK-25269][SQL] SQL interface support specify Storage...

2018-09-21 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22263
  
**[Test build #96433 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96433/testReport)**
 for PR 22263 at commit 
[`c3b6dfc`](https://github.com/apache/spark/commit/c3b6dfcee106c4d0e38975b1a31c3f3e97d2abc1).


---




[GitHub] spark issue #22263: [SPARK-25269][SQL] SQL interface support specify Storage...

2018-09-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22263
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3350/
Test PASSed.


---




[GitHub] spark issue #22263: [SPARK-25269][SQL] SQL interface support specify Storage...

2018-09-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22263
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #22288: [SPARK-22148][SPARK-15815][Scheduler] Acquire new execut...

2018-09-21 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22288
  
**[Test build #96432 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96432/testReport)**
 for PR 22288 at commit 
[`ffbc9c3`](https://github.com/apache/spark/commit/ffbc9c32d14a0c82036defb90eb18167f93bad4d).


---




[GitHub] spark issue #22288: [SPARK-22148][SPARK-15815][Scheduler] Acquire new execut...

2018-09-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22288
  
Build finished. Test PASSed.


---




[GitHub] spark issue #22288: [SPARK-22148][SPARK-15815][Scheduler] Acquire new execut...

2018-09-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22288
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3349/
Test PASSed.


---




[GitHub] spark issue #22495: [SPARK-25486][TEST] Refactor SortBenchmark to use main m...

2018-09-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22495
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #22495: [SPARK-25486][TEST] Refactor SortBenchmark to use main m...

2018-09-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22495
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96419/
Test PASSed.


---




[GitHub] spark issue #22495: [SPARK-25486][TEST] Refactor SortBenchmark to use main m...

2018-09-21 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22495
  
**[Test build #96419 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96419/testReport)**
 for PR 22495 at commit 
[`bbdc202`](https://github.com/apache/spark/commit/bbdc2029c992803e75d04be9e5b2507d6df1779f).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #22198: [SPARK-25121][SQL] Supports multi-part table names for b...

2018-09-21 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22198
  
**[Test build #96431 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96431/testReport)**
 for PR 22198 at commit 
[`9a6da27`](https://github.com/apache/spark/commit/9a6da2750554cd03ce468d6bfe0aeba8b75883b8).


---




[GitHub] spark issue #22198: [SPARK-25121][SQL] Supports multi-part table names for b...

2018-09-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22198
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3348/
Test PASSed.


---




[GitHub] spark issue #22198: [SPARK-25121][SQL] Supports multi-part table names for b...

2018-09-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22198
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20433: [SPARK-23264][SQL] Make INTERVAL keyword optional in INT...

2018-09-21 Thread maropu
Github user maropu commented on the issue:

https://github.com/apache/spark/pull/20433
  
ping


---




[GitHub] spark issue #22204: [SPARK-25196][SQL] Extends Analyze commands for cached t...

2018-09-21 Thread maropu
Github user maropu commented on the issue:

https://github.com/apache/spark/pull/22204
  
ping


---




[GitHub] spark issue #22198: [SPARK-25121][SQL] Supports multi-part table names for b...

2018-09-21 Thread maropu
Github user maropu commented on the issue:

https://github.com/apache/spark/pull/22198
  
retest this please


---




[GitHub] spark issue #22227: [SPARK-25202] [SQL] Implements split with limit sql func...

2018-09-21 Thread maropu
Github user maropu commented on the issue:

https://github.com/apache/spark/pull/22227
  
LGTM except for the R/python parts (I'm not familiar with these parts and 
I'll leave them to @felixcheung and @HyukjinKwon).


---




[GitHub] spark issue #22316: [SPARK-25048][SQL] Pivoting by multiple columns in Scala...

2018-09-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22316
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96420/
Test PASSed.


---




[GitHub] spark issue #22316: [SPARK-25048][SQL] Pivoting by multiple columns in Scala...

2018-09-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22316
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #22316: [SPARK-25048][SQL] Pivoting by multiple columns in Scala...

2018-09-21 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22316
  
**[Test build #96420 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96420/testReport)**
 for PR 22316 at commit 
[`382640b`](https://github.com/apache/spark/commit/382640be9bb9739929daea0bceb3093836d7f78d).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark pull request #22461: [SPARK-25453][SQL][TEST] OracleIntegrationSuite I...

2018-09-21 Thread maropu
Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/22461#discussion_r219522094
  
--- Diff: docs/sql-programming-guide.md ---
@@ -1287,8 +1287,18 @@ bin/spark-shell --driver-class-path 
postgresql-9.4.1207.jar --jars postgresql-9.
 Tables from the remote database can be loaded as a DataFrame or Spark SQL 
temporary view using
 the Data Sources API. Users can specify the JDBC connection properties in 
the data source options.
`user` and `password` are normally provided as 
connection properties for
-logging into the data sources. In addition to the connection properties, 
Spark also supports
-the following case-insensitive options:
+logging into the data sources. Vendor-specific connection properties can 
also be passed to the
+underlying JDBC driver in the same way. For example:
+
+{% highlight scala %}
+// oracle.jdbc.mapDateToTimestamp defaults to true. If this flag is not 
disabled, a column of Oracle
+// DATE type will be resolved as Catalyst TimestampType, which is probably 
not the desired behavior.
+spark.read.format("jdbc")
+  .option("url", oracleJdbcUrl)
+  .option("oracle.jdbc.mapDateToTimestamp", "false")
+{% endhighlight %}
+
--- End diff --

Also, it would be better to brush up the description more so that users can 
understand it easily...


---




[GitHub] spark pull request #22331: [SPARK-25331][SS] Make FileStreamSink ignore part...

2018-09-21 Thread misutoth
Github user misutoth commented on a diff in the pull request:

https://github.com/apache/spark/pull/22331#discussion_r219521421
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StagingFileCommitProtocol.scala
 ---
@@ -0,0 +1,141 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.streaming
+
+import org.apache.hadoop.fs.{FileAlreadyExistsException, FileContext, Path}
+import org.apache.hadoop.mapreduce.{JobContext, TaskAttemptContext}
+
+import org.apache.spark.internal.Logging
+import org.apache.spark.internal.io.FileCommitProtocol
+import org.apache.spark.internal.io.FileCommitProtocol.TaskCommitMessage
+
+class StagingFileCommitProtocol(jobId: String, path: String)
+  extends FileCommitProtocol with Serializable with Logging
+  with ManifestCommitProtocol {
+  private var stagingDir: Option[Path] = None
--- End diff --

Yes, I will do that.


---




[GitHub] spark pull request #22461: [SPARK-25453][SQL][TEST] OracleIntegrationSuite I...

2018-09-21 Thread maropu
Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/22461#discussion_r219520789
  
--- Diff: docs/sql-programming-guide.md ---
@@ -1287,8 +1287,18 @@ bin/spark-shell --driver-class-path 
postgresql-9.4.1207.jar --jars postgresql-9.
 Tables from the remote database can be loaded as a DataFrame or Spark SQL 
temporary view using
 the Data Sources API. Users can specify the JDBC connection properties in 
the data source options.
`user` and `password` are normally provided as 
connection properties for
-logging into the data sources. In addition to the connection properties, 
Spark also supports
-the following case-insensitive options:
+logging into the data sources. Vendor-specific connection properties can 
also be passed to the
+underlying JDBC driver in the same way. For example:
+
+{% highlight scala %}
+// oracle.jdbc.mapDateToTimestamp defaults to true. If this flag is not 
disabled, a column of Oracle
+// DATE type will be resolved as Catalyst TimestampType, which is probably 
not the desired behavior.
+spark.read.format("jdbc")
+  .option("url", oracleJdbcUrl)
+  .option("oracle.jdbc.mapDateToTimestamp", "false")
+{% endhighlight %}
+
--- End diff --

Probably, I think it's better to put this description in the 
`Troubleshooting` section: 
https://spark.apache.org/docs/2.3.1/sql-programming-guide.html#troubleshooting


---




[GitHub] spark pull request #22461: [SPARK-25453][SQL][TEST] OracleIntegrationSuite I...

2018-09-21 Thread maropu
Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/22461#discussion_r219518705
  
--- Diff: 
external/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/OracleIntegrationSuite.scala
 ---
@@ -462,6 +468,12 @@ class OracleIntegrationSuite extends 
DockerJDBCIntegrationSuite with SharedSQLCo
   .option("lowerBound", "2018-07-04 03:30:00.0")
   .option("upperBound", "2018-07-27 14:11:05.0")
   .option("numPartitions", 2)
+  // oracle.jdbc.mapDateToTimestamp defaults to true. If this flag is 
not disabled, column d
+  // (Oracle DATE) will be resolved as Catalyst Timestamp, which will 
fail test result
+  // comparison. E.g. Expected 2018-07-06 while Actual 2018-07-06 
00:00:00.0.
--- End diff --

I think we don't need this duplicate description.


---




[GitHub] spark pull request #22461: [SPARK-25453][SQL][TEST] OracleIntegrationSuite I...

2018-09-21 Thread maropu
Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/22461#discussion_r219518607
  
--- Diff: 
external/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/OracleIntegrationSuite.scala
 ---
@@ -462,6 +468,12 @@ class OracleIntegrationSuite extends 
DockerJDBCIntegrationSuite with SharedSQLCo
   .option("lowerBound", "2018-07-04 03:30:00.0")
   .option("upperBound", "2018-07-27 14:11:05.0")
   .option("numPartitions", 2)
+  // oracle.jdbc.mapDateToTimestamp defaults to true. If this flag is 
not disabled, column d
+  // (Oracle DATE) will be resolved as Catalyst Timestamp, which will 
fail test result
+  // comparison. E.g. Expected 2018-07-06 while Actual 2018-07-06 
00:00:00.0.
+  .option("oracle.jdbc.mapDateToTimestamp", "false")
+  .option("sessionInitStatement",
+"ALTER SESSION SET NLS_TIMESTAMP_FORMAT = '-MM-DD 
HH24:MI:SS.FF'")
--- End diff --

I think we don't need this duplicate description.


---




[GitHub] spark issue #22498: [SPARK-18364] : Expose metrics for YarnShuffleService

2018-09-21 Thread pgandhi999
Github user pgandhi999 commented on the issue:

https://github.com/apache/spark/pull/22498
  
@tgravescs Fine with me, thank you.


---




[GitHub] spark issue #19868: [SPARK-22676] Avoid iterating all partition paths when s...

2018-09-21 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/19868
  
@jinxing64 Let's not talk too much about the problem of 
`spark.sql.hive.verifyPartitionPath`. We should just introduce 
`spark.sql.files.ignoreMissingFiles` and say this can replace 
`spark.sql.hive.verifyPartitionPath`.
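
For example (a one-liner, assuming an active `SparkSession` named `spark`):

```
// Skip files that were listed but are missing at read time, instead of failing.
spark.conf.set("spark.sql.files.ignoreMissingFiles", "true")
```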


---




[GitHub] spark pull request #22513: [SPARK-25499][TEST]Refactor BenchmarkBase and Ben...

2018-09-21 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/22513


---




[GitHub] spark issue #22511: [SPARK-25422][CORE] Don't memory map blocks streamed to ...

2018-09-21 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22511
  
**[Test build #4346 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4346/testReport)**
 for PR 22511 at commit 
[`aee82ab`](https://github.com/apache/spark/commit/aee82abe4cd9fbefa14fb280644276fe491bcf9a).


---




[GitHub] spark issue #22513: [SPARK-25499][TEST]Refactor BenchmarkBase and Benchmark

2018-09-21 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/22513
  
thanks, merging to master!


---




[GitHub] spark issue #22511: [SPARK-25422][CORE] Don't memory map blocks streamed to ...

2018-09-21 Thread squito
Github user squito commented on the issue:

https://github.com/apache/spark/pull/22511
  
> this seems like a big change, will we hit perf regression?

Not vs. 2.3.  It only affects things when stream-to-disk is enabled, and 
when it is enabled, for reading remote cached blocks, this is actually going 
back to the old behavior.  See 
https://github.com/apache/spark/pull/19476/files#diff-2b643ea78c1add0381754b1f47eec132R692
 -- `FileSegmentManagedBuffer.nioByteBuffer()` reads the entire file into a 
regular byte buffer.  I changed it to memory map the file as an attempted 
optimization.

This change will make things slower, but that's vs. other changes that are 
only in 2.4.  There are TODOs in the code for ways to improve this further but 
that should not go into 2.4.

> is this a long-standing bug?

the change here is not fixing a long-standing bug, it's just updating new 
changes for 2.4.

however, I'm really wondering why TorrentBroadcast calls dispose on the 
blocks.  For regular buffers, it's a no-op, so it hasn't mattered, but I can't 
come up with a reason that you *do* want to dispose those blocks.

Secondly, it seems there is an implicit assumption that you never add 
memory-mapped byte buffers to the MemoryStore.  Maybe that makes sense ... it's 
kind of messing with the memory / disk management Spark has.  But the 
MemoryStore never checks that you don't add a mapped buffer; you'll just get 
weird behavior like this later on.  Seems there should be a check at the very 
least, to avoid this kind of issue.

As neither of those things is new to 2.4, I don't think we should deal with 
them here.

The major motivation for memory mapping the file was not for broadcast 
blocks, it was for reading large cached blocks.  But it actually makes more 
sense to change the interfaces in BlockManager to allow us to just get the 
managedBuffer, instead of a ChunkedByteBuffer (that's this TODO: 
https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/storage/BlockManager.scala#L728)
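
For reference, a standalone sketch of the two strategies (plain NIO; not the 
actual `FileSegmentManagedBuffer` code):

```
import java.io.RandomAccessFile
import java.nio.ByteBuffer
import java.nio.channels.FileChannel

// Old behavior: copy the whole file into a regular heap buffer.
// dispose() on such a buffer is a no-op.
def readIntoHeapBuffer(path: String): ByteBuffer = {
  val file = new RandomAccessFile(path, "r")
  try {
    val buf = ByteBuffer.allocate(file.length().toInt)
    while (buf.hasRemaining && file.getChannel.read(buf) >= 0) {}
    buf.flip()
    buf
  } finally file.close()
}

// The 2.4 optimization being reverted: map the file instead of copying it.
// Disposing a MappedByteBuffer unmaps it, which is what breaks later reads.
def memoryMap(path: String): ByteBuffer = {
  val file = new RandomAccessFile(path, "r")
  try {
    file.getChannel.map(FileChannel.MapMode.READ_ONLY, 0, file.length())
  } finally file.close()
}
```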


---




[GitHub] spark issue #22173: [SPARK-24355] Spark external shuffle server improvement ...

2018-09-21 Thread redsanket
Github user redsanket commented on the issue:

https://github.com/apache/spark/pull/22173
  
closes #21402


---




[GitHub] spark pull request #22173: [SPARK-24355] Spark external shuffle server impro...

2018-09-21 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/22173


---




[GitHub] spark issue #22173: [SPARK-24355] Spark external shuffle server improvement ...

2018-09-21 Thread tgravescs
Github user tgravescs commented on the issue:

https://github.com/apache/spark/pull/22173
  
merged into master (2.5.0)


---




[GitHub] spark issue #19868: [SPARK-22676] Avoid iterating all partition paths when s...

2018-09-21 Thread jinxing64
Github user jinxing64 commented on the issue:

https://github.com/apache/spark/pull/19868
  
@cloud-fan 
Thanks for the ping~ I updated the description. Let me know if I should refine 
it.


---




[GitHub] spark pull request #22485: [SPARK-18364][YARN] Expose metrics for YarnShuffl...

2018-09-21 Thread tgravescs
Github user tgravescs commented on a diff in the pull request:

https://github.com/apache/spark/pull/22485#discussion_r219508537
  
--- Diff: 
resource-managers/yarn/src/test/scala/org/apache/spark/network/yarn/YarnShuffleServiceMetricsSuite.scala
 ---
@@ -0,0 +1,75 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.network.yarn
+
+import scala.collection.JavaConverters._
+
+import org.apache.hadoop.metrics2.MetricsRecordBuilder
+import org.mockito.Matchers._
+import org.mockito.Mockito.{mock, times, verify, when}
+import org.scalatest.Matchers
+
+import org.apache.spark.SparkFunSuite
+import org.apache.spark.network.server.OneForOneStreamManager
+import org.apache.spark.network.shuffle.{ExternalShuffleBlockHandler, 
ExternalShuffleBlockResolver}
+
+class YarnShuffleServiceMetricsSuite extends SparkFunSuite with Matchers {
+
+  val streamManager = mock(classOf[OneForOneStreamManager])
+  val blockResolver = mock(classOf[ExternalShuffleBlockResolver])
+  when(blockResolver.getRegisteredExecutorsSize).thenReturn(42)
+
+  val metrics = new ExternalShuffleBlockHandler(streamManager, 
blockResolver).getAllMetrics
+
+  test("metrics named as expected") {
+val allMetrics = Set(
+  "openBlockRequestLatencyMillis", 
"registerExecutorRequestLatencyMillis",
+  "blockTransferRateBytes", "registeredExecutorsSize")
+
+metrics.getMetrics.keySet().asScala should be (allMetrics)
+  }
+
+  // these three metrics have the same effect on the collector
+  for (testname <- Seq("openBlockRequestLatencyMillis",
+  "registerExecutorRequestLatencyMillis",
+  "blockTransferRateBytes")) {
+test(s"$testname - collector receives correct types") {
+  val builder = mock(classOf[MetricsRecordBuilder])
+  when(builder.addCounter(any(), anyLong())).thenReturn(builder)
+  when(builder.addGauge(any(), anyDouble())).thenReturn(builder)
+
+  YarnShuffleServiceMetrics.collectMetric(builder, testname,
+metrics.getMetrics.get(testname))
+
+  verify(builder).addCounter(anyObject(), anyLong())
+  verify(builder, times(4)).addGauge(anyObject(), anyDouble())
+}
+  }
+
+  // this metric writes only one gauge to the collector
+  test("registeredExecutorsSize - collector receives correct types") {
+val builder = mock(classOf[MetricsRecordBuilder])
+when(builder.addCounter(any(), anyLong())).thenReturn(builder)
+when(builder.addGauge(any(), anyDouble())).thenReturn(builder)
+
+YarnShuffleServiceMetrics.collectMetric(builder, 
"registeredExecutorsSize",
--- End diff --

you are calling a private function here




---




[GitHub] spark pull request #22485: [SPARK-18364][YARN] Expose metrics for YarnShuffl...

2018-09-21 Thread tgravescs
Github user tgravescs commented on a diff in the pull request:

https://github.com/apache/spark/pull/22485#discussion_r219508486
  
--- Diff: 
resource-managers/yarn/src/test/scala/org/apache/spark/network/yarn/YarnShuffleServiceMetricsSuite.scala
 ---
@@ -0,0 +1,75 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.network.yarn
+
+import scala.collection.JavaConverters._
+
+import org.apache.hadoop.metrics2.MetricsRecordBuilder
+import org.mockito.Matchers._
+import org.mockito.Mockito.{mock, times, verify, when}
+import org.scalatest.Matchers
+
+import org.apache.spark.SparkFunSuite
+import org.apache.spark.network.server.OneForOneStreamManager
+import org.apache.spark.network.shuffle.{ExternalShuffleBlockHandler, 
ExternalShuffleBlockResolver}
+
+class YarnShuffleServiceMetricsSuite extends SparkFunSuite with Matchers {
+
+  val streamManager = mock(classOf[OneForOneStreamManager])
+  val blockResolver = mock(classOf[ExternalShuffleBlockResolver])
+  when(blockResolver.getRegisteredExecutorsSize).thenReturn(42)
+
+  val metrics = new ExternalShuffleBlockHandler(streamManager, 
blockResolver).getAllMetrics
+
+  test("metrics named as expected") {
+val allMetrics = Set(
+  "openBlockRequestLatencyMillis", 
"registerExecutorRequestLatencyMillis",
+  "blockTransferRateBytes", "registeredExecutorsSize")
+
+metrics.getMetrics.keySet().asScala should be (allMetrics)
+  }
+
+  // these three metrics have the same effect on the collector
+  for (testname <- Seq("openBlockRequestLatencyMillis",
+  "registerExecutorRequestLatencyMillis",
+  "blockTransferRateBytes")) {
+test(s"$testname - collector receives correct types") {
+  val builder = mock(classOf[MetricsRecordBuilder])
+  when(builder.addCounter(any(), anyLong())).thenReturn(builder)
+  when(builder.addGauge(any(), anyDouble())).thenReturn(builder)
+
+  YarnShuffleServiceMetrics.collectMetric(builder, testname,
--- End diff --

you are calling a private function here 


---




[GitHub] spark issue #22498: [SPARK-18364] : Expose metrics for YarnShuffleService

2018-09-21 Thread tgravescs
Github user tgravescs commented on the issue:

https://github.com/apache/spark/pull/22498
  
@pgandhi999 thanks for the PR. Since you have additional changes, and there 
are additional changes that were made by the original author here: 
palantir#236, perhaps for now we should review the lowest-common-denominator 
one and get that in, and we can put additional metrics on top of it in 
separate PRs.


---




[GitHub] spark pull request #22485: [SPARK-18364][YARN] Expose metrics for YarnShuffl...

2018-09-21 Thread tgravescs
Github user tgravescs commented on a diff in the pull request:

https://github.com/apache/spark/pull/22485#discussion_r219506086
  
--- Diff: 
resource-managers/yarn/src/test/scala/org/apache/spark/network/yarn/YarnShuffleServiceMetricsSuite.scala
 ---
@@ -0,0 +1,75 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.network.yarn
+
+import scala.collection.JavaConverters._
+
+import org.apache.hadoop.metrics2.MetricsRecordBuilder
+import org.mockito.Matchers._
+import org.mockito.Mockito.{mock, times, verify, when}
+import org.scalatest.Matchers
+
+import org.apache.spark.SparkFunSuite
+import org.apache.spark.network.server.OneForOneStreamManager
+import org.apache.spark.network.shuffle.{ExternalShuffleBlockHandler, 
ExternalShuffleBlockResolver}
+
+class YarnShuffleServiceMetricsSuite extends SparkFunSuite with Matchers {
+
+  val streamManager = mock(classOf[OneForOneStreamManager])
+  val blockResolver = mock(classOf[ExternalShuffleBlockResolver])
+  when(blockResolver.getRegisteredExecutorsSize).thenReturn(42)
+
+  val metrics = new ExternalShuffleBlockHandler(streamManager, 
blockResolver).getAllMetrics
+
+  test("metrics named as expected") {
+val allMetrics = Set(
+  "openBlockRequestLatencyMillis", 
"registerExecutorRequestLatencyMillis",
+  "blockTransferRateBytes", "registeredExecutorsSize")
+
+metrics.getMetrics.keySet().asScala should be (allMetrics)
+  }
+
+  // these three metrics have the same effect on the collector
+  for (testname <- Seq("openBlockRequestLatencyMillis",
+  "registerExecutorRequestLatencyMillis",
+  "blockTransferRateBytes")) {
+test(s"$testname - collector receives correct types") {
+  val builder = mock(classOf[MetricsRecordBuilder])
+  when(builder.addCounter(any(), anyLong())).thenReturn(builder)
+  when(builder.addGauge(any(), anyDouble())).thenReturn(builder)
+
+  YarnShuffleServiceMetrics.collectMetric(builder, testname,
+metrics.getMetrics.get(testname))
+
+  verify(builder).addCounter(anyObject(), anyLong())
+  verify(builder, times(4)).addGauge(anyObject(), anyDouble())
+}
+  }
+
+  // this metric writes only one gauge to the collector
+  test("registeredExecutorsSize - collector receives correct types") {
+val builder = mock(classOf[MetricsRecordBuilder])
+when(builder.addCounter(any(), anyLong())).thenReturn(builder)
--- End diff --

Don't seem to use these?


---




[GitHub] spark pull request #22494: [SPARK-22036][SQL][followup] add a new config for...

2018-09-21 Thread mgaido91
Github user mgaido91 commented on a diff in the pull request:

https://github.com/apache/spark/pull/22494#discussion_r219505527
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -1345,6 +1345,15 @@ object SQLConf {
   .booleanConf
   .createWithDefault(true)
 
+  val LITERAL_PICK_MINIMUM_PRECISION =
+buildConf("spark.sql.literal.pickMinimumPrecision")
--- End diff --

nit: I am wondering whether we can consider this a `legacy` config. We 
can also remove it once the PR for SPARK-25454 is merged and/or we remove 
support for negative-scale decimals. What do you think?
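
For concreteness, a hypothetical sketch of the `legacy` naming (this exact 
name is my illustration, not what the PR uses; it assumes the surrounding 
`SQLConf.buildConf` helpers shown in the diff above):

```
val LITERAL_PICK_MINIMUM_PRECISION =
  buildConf("spark.sql.legacy.literal.pickMinimumPrecision")
    .doc("Kept for backward compatibility; candidate for removal once " +
      "negative-scale decimals are no longer supported.")
    .booleanConf
    .createWithDefault(true)
```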


---




[GitHub] spark pull request #22485: [SPARK-18364][YARN] Expose metrics for YarnShuffl...

2018-09-21 Thread tgravescs
Github user tgravescs commented on a diff in the pull request:

https://github.com/apache/spark/pull/22485#discussion_r219506036
  
--- Diff: 
resource-managers/yarn/src/test/scala/org/apache/spark/network/yarn/YarnShuffleServiceMetricsSuite.scala
 ---
@@ -0,0 +1,75 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.network.yarn
+
+import scala.collection.JavaConverters._
+
+import org.apache.hadoop.metrics2.MetricsRecordBuilder
+import org.mockito.Matchers._
+import org.mockito.Mockito.{mock, times, verify, when}
+import org.scalatest.Matchers
+
+import org.apache.spark.SparkFunSuite
+import org.apache.spark.network.server.OneForOneStreamManager
+import org.apache.spark.network.shuffle.{ExternalShuffleBlockHandler, 
ExternalShuffleBlockResolver}
+
+class YarnShuffleServiceMetricsSuite extends SparkFunSuite with Matchers {
+
+  val streamManager = mock(classOf[OneForOneStreamManager])
+  val blockResolver = mock(classOf[ExternalShuffleBlockResolver])
+  when(blockResolver.getRegisteredExecutorsSize).thenReturn(42)
+
+  val metrics = new ExternalShuffleBlockHandler(streamManager, 
blockResolver).getAllMetrics
+
+  test("metrics named as expected") {
+val allMetrics = Set(
+  "openBlockRequestLatencyMillis", 
"registerExecutorRequestLatencyMillis",
+  "blockTransferRateBytes", "registeredExecutorsSize")
+
+metrics.getMetrics.keySet().asScala should be (allMetrics)
+  }
+
+  // these three metrics have the same effect on the collector
+  for (testname <- Seq("openBlockRequestLatencyMillis",
+  "registerExecutorRequestLatencyMillis",
+  "blockTransferRateBytes")) {
+test(s"$testname - collector receives correct types") {
+  val builder = mock(classOf[MetricsRecordBuilder])
+  when(builder.addCounter(any(), anyLong())).thenReturn(builder)
--- End diff --

I don't see where these `when`s are used.


---




[GitHub] spark pull request #22485: [SPARK-18364][YARN] Expose metrics for YarnShuffl...

2018-09-21 Thread tgravescs
Github user tgravescs commented on a diff in the pull request:

https://github.com/apache/spark/pull/22485#discussion_r219311876
  
--- Diff: 
common/network-yarn/src/main/java/org/apache/spark/network/yarn/YarnShuffleService.java
 ---
@@ -168,6 +170,15 @@ protected void serviceInit(Configuration conf) throws 
Exception {
   TransportConf transportConf = new TransportConf("shuffle", new 
HadoopConfigProvider(conf));
   blockHandler = new ExternalShuffleBlockHandler(transportConf, 
registeredExecutorFile);
 
+  // register metrics on the block handler into the Node Manager's 
metrics system.
+  YarnShuffleServiceMetrics serviceMetrics =
+  new YarnShuffleServiceMetrics(blockHandler.getAllMetrics());
--- End diff --

indentation should be 2 spaces, fix here and below


---




[GitHub] spark issue #22513: [SPARK-25499][TEST]Refactor BenchmarkBase and Benchmark

2018-09-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22513
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96415/
Test PASSed.


---




[GitHub] spark issue #22513: [SPARK-25499][TEST]Refactor BenchmarkBase and Benchmark

2018-09-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22513
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #22513: [SPARK-25499][TEST]Refactor BenchmarkBase and Benchmark

2018-09-21 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22513
  
**[Test build #96415 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96415/testReport)**
 for PR 22513 at commit 
[`1a5e1e9`](https://github.com/apache/spark/commit/1a5e1e927072e4438ea1ce7dc579a6d5b0986835).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #18544: [SPARK-21318][SQL]Improve exception message thrown by `l...

2018-09-21 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/18544
  
LGTM


---




[GitHub] spark pull request #18544: [SPARK-21318][SQL]Improve exception message throw...

2018-09-21 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/18544#discussion_r219505379
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveSessionCatalog.scala ---
@@ -131,14 +131,14 @@ private[sql] class HiveSessionCatalog(
 Try(super.lookupFunction(funcName, children)) match {
   case Success(expr) => expr
   case Failure(error) =>
-if (functionRegistry.functionExists(funcName)) {
-  // If the function actually exists in functionRegistry, it means 
that there is an
-  // error when we create the Expression using the given children.
+if (super.functionExists(name)) {
+  // If the function actually exists in functionRegistry or 
externalCatalog,
+  // it means that there is an error when we create the Expression 
using the given children.
   // We need to throw the original exception.
   throw error
 } else {
-  // This function is not in functionRegistry, let's try to load 
it as a Hive's
-  // built-in function.
+  // This function is not in functionRegistry or externalCatalog,
--- End diff --

`This function does not exist (neither in functionRegistry nor in 
externalCatalog) ...`


---




[GitHub] spark issue #22512: [SPARK-25498][SQL][WIP] Fix SQLQueryTestSuite failures w...

2018-09-21 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22512
  
**[Test build #96430 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96430/testReport)**
 for PR 22512 at commit 
[`bff88ee`](https://github.com/apache/spark/commit/bff88ee81f57900cca38df8455c4a2eb78b4b758).


---




[GitHub] spark pull request #18544: [SPARK-21318][SQL]Improve exception message throw...

2018-09-21 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/18544#discussion_r219505121
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveSessionCatalog.scala ---
@@ -131,14 +131,14 @@ private[sql] class HiveSessionCatalog(
 Try(super.lookupFunction(funcName, children)) match {
   case Success(expr) => expr
   case Failure(error) =>
-if (functionRegistry.functionExists(funcName)) {
-  // If the function actually exists in functionRegistry, it means 
that there is an
-  // error when we create the Expression using the given children.
+if (super.functionExists(name)) {
+  // If the function actually exists in functionRegistry or 
externalCatalog,
--- End diff --

`If the function exists (either in functionRegistry or externalCatalog), 
...`


---




[GitHub] spark issue #22512: [SPARK-25498][SQL][WIP] Fix SQLQueryTestSuite failures w...

2018-09-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22512
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #22512: [SPARK-25498][SQL][WIP] Fix SQLQueryTestSuite failures w...

2018-09-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22512
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3347/
Test PASSed.


---




[GitHub] spark issue #22466: [SPARK-25464][SQL]On dropping the Database it will drop ...

2018-09-21 Thread srowen
Github user srowen commented on the issue:

https://github.com/apache/spark/pull/22466
  
See the JIRA; I don't think this should be merged.


---




[GitHub] spark issue #22494: [SPARK-22036][SQL][followup] add a new config for pickin...

2018-09-21 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22494
  
**[Test build #96429 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96429/testReport)**
 for PR 22494 at commit 
[`b4fdd13`](https://github.com/apache/spark/commit/b4fdd1307059c7df7c386a96aad6bc17b593d9c5).


---




[GitHub] spark issue #22494: [SPARK-22036][SQL][followup] add a new config for pickin...

2018-09-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22494
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #22494: [SPARK-22036][SQL][followup] add a new config for pickin...

2018-09-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22494
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3346/
Test PASSed.


---




[GitHub] spark issue #22490: [SPARK-25481][TEST] Refactor ColumnarBatchBenchmark to u...

2018-09-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22490
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #22490: [SPARK-25481][TEST] Refactor ColumnarBatchBenchmark to u...

2018-09-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22490
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96418/
Test PASSed.


---




[GitHub] spark issue #22490: [SPARK-25481][TEST] Refactor ColumnarBatchBenchmark to u...

2018-09-21 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22490
  
**[Test build #96418 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96418/testReport)**
 for PR 22490 at commit 
[`cbfac03`](https://github.com/apache/spark/commit/cbfac03f4e34c32b797f9e26ec9146136a0c14be).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #10130: SPARK-12105 - SPARK-SQL add convenient show functions

2018-09-21 Thread davis-x
Github user davis-x commented on the issue:

https://github.com/apache/spark/pull/10130
  
+1


---




[GitHub] spark issue #22518: [SPARK-25482][SQL] ReuseSubquery can be useless when the...

2018-09-21 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22518
  
**[Test build #96428 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96428/testReport)**
 for PR 22518 at commit 
[`7c75067`](https://github.com/apache/spark/commit/7c75067767ed6935960d09c7915da86fea3553fa).


---




[GitHub] spark issue #22518: [SPARK-25482][SQL] ReuseSubquery can be useless when the...

2018-09-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22518
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3345/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22518: [SPARK-25482][SQL] ReuseSubquery can be useless when the...

2018-09-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22518
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22450: [SPARK-25454][SQL] Avoid precision loss in division with...

2018-09-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22450
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22450: [SPARK-25454][SQL] Avoid precision loss in division with...

2018-09-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22450
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96417/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22450: [SPARK-25454][SQL] Avoid precision loss in division with...

2018-09-21 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22450
  
**[Test build #96417 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96417/testReport)**
 for PR 22450 at commit 
[`4e240d9`](https://github.com/apache/spark/commit/4e240d9abea9ea67312f31e3af129416b8c3381a).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22518: [SPARK-25482][SQL] ReuseSubquery can be useless when the...

2018-09-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22518
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22518: [SPARK-25482][SQL] ReuseSubquery can be useless when the...

2018-09-21 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22518
  
**[Test build #96427 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96427/testReport)**
 for PR 22518 at commit 
[`36fa664`](https://github.com/apache/spark/commit/36fa664c6d251901270984115ff2ebfd1b665fca).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22518: [SPARK-25482][SQL] ReuseSubquery can be useless when the...

2018-09-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22518
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3344/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22503: [SPARK-25493] [SQL] Fix multiline crlf

2018-09-21 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request:

https://github.com/apache/spark/pull/22503#discussion_r219495737
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVOptions.scala
 ---
@@ -212,6 +212,7 @@ class CSVOptions(
 settings.setEmptyValue(emptyValueInRead)
 settings.setMaxCharsPerColumn(maxCharsPerColumn)
 
settings.setUnescapedQuoteHandling(UnescapedQuoteHandling.STOP_AT_DELIMITER)
+settings.setLineSeparatorDetectionEnabled(true)
--- End diff --

The auto-detection mechanism is enabled for both multi-line and per-line 
modes. I guess it adds some overhead for detecting new lines, which is not 
needed in per-line mode. I would benchmark it in both modes (see 
`CSVBenchmarks`), and if the overhead in per-line mode is significant, I would 
not enable the option when `multiLine` is set to `false`.
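
A minimal sketch of the conditional form suggested here - an assumption 
about the eventual fix, not what the diff currently does (`multiLine` is the 
existing `CSVOptions` flag):

```scala
// Enable univocity's line-separator auto-detection only in multiLine mode,
// where records can span lines; skip it in per-line mode to avoid the
// detection overhead.
if (multiLine) {
  settings.setLineSeparatorDetectionEnabled(true)
}
```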


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22517: Branch 2.3 how can i fix error use Pyspark

2018-09-21 Thread wangyum
Github user wangyum commented on the issue:

https://github.com/apache/spark/pull/22517
  
Do you mind closing this PR? Questions and requests for help should be sent to 
`u...@spark.apache.org`:
```
u...@spark.apache.org is for usage questions, help, and announcements. 
(subscribe) (unsubscribe) (archives)
d...@spark.apache.org is for people who want to contribute code to Spark. 
(subscribe) (unsubscribe) (archives)
```
http://spark.apache.org/community.html


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22518: [SPARK-25482][SQL] ReuseSubquery can be useless w...

2018-09-21 Thread mgaido91
GitHub user mgaido91 opened a pull request:

https://github.com/apache/spark/pull/22518

[SPARK-25482][SQL] ReuseSubquery can be useless when the subqueries have 
the same id

## What changes were proposed in this pull request?

When an `ExecSubqueryExpression` is copied, two instances can end up with the 
same `exprId`. This can happen, for instance, when a filter containing a scalar 
subquery is pushed to a DataSource. In this scenario `ReuseSubquery` becomes 
useless: the replacement of the `SubqueryExec` is ignored, since the new plan 
is equal to the previous one.

The PR avoids this problem by creating a new `ExprId` for the subquery 
expression when it is changed in `ReuseSubquery`, so that the change is not 
ignored.
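
For illustration, a minimal sketch of a query shape that can hit this (table 
and column names are hypothetical):

```scala
// The predicate is pushed into the parquet scan and also kept in the
// Filter node, so the scalar-subquery expression is copied with the same
// exprId and ReuseSubquery's replacement is silently dropped as a no-op.
val df = spark.sql(
  "SELECT * FROM parquet_table WHERE a = (SELECT max(a) FROM other_table)")
df.explain() // without the fix, the subquery is not reused across the copies
```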

## How was this patch tested?

Added UT.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/mgaido91/spark SPARK-25482

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/22518.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #22518


commit 36fa664c6d251901270984115ff2ebfd1b665fca
Author: Marco Gaido 
Date:   2018-09-21T12:50:11Z

[SPARK-25482][SQL] ReuseSubquery can be ineffective when the subqueries have 
the same id




---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22517: Branch 2.3 how can i fix error use Pyspark

2018-09-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22517
  
Can one of the admins verify this patch?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22494: [SPARK-22036][SQL][followup] add a new config for...

2018-09-21 Thread mgaido91
Github user mgaido91 commented on a diff in the pull request:

https://github.com/apache/spark/pull/22494#discussion_r219493420
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala 
---
@@ -2858,6 +2858,13 @@ class SQLQuerySuite extends QueryTest with 
SharedSQLContext {
 val result = ds.flatMap(_.bar).distinct
 result.rdd.isEmpty
   }
+
+  test("SPARK-25454: decimal division with negative scale") {
+// TODO: completely fix this issue even LITERAL_PRECISE_PRECISION is 
true.
--- End diff --

nit: `even when`


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22494: [SPARK-22036][SQL][followup] add a new config for...

2018-09-21 Thread mgaido91
Github user mgaido91 commented on a diff in the pull request:

https://github.com/apache/spark/pull/22494#discussion_r219492191
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -1345,6 +1345,16 @@ object SQLConf {
   .booleanConf
   .createWithDefault(true)
 
+  val LITERAL_PRECISE_PRECISION =
+buildConf("spark.sql.literal.precisePrecision")
+  .internal()
+  .doc("When integral literals are used with decimals in binary 
operators, Spark will " +
+"pick a precise precision for the literals to calculate the 
precision and scale " +
+"of the result decimal, when this config is true. By picking a 
precise precision, we " +
+"can avoid wasting precision, to reduce the possibility of 
overflow.")
--- End diff --

`to reduce the possibility of overflow`: actually this is not always true; it 
depends on the value of `DECIMAL_OPERATIONS_ALLOW_PREC_LOSS`. If 
`DECIMAL_OPERATIONS_ALLOW_PREC_LOSS` is true, the risk is a precision 
loss, but we don't overflow. If it is false, then this statement is right.
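
As a quick illustration of the two regimes (the 
`spark.sql.decimalOperations.allowPrecisionLoss` key is the SQL name of 
`DECIMAL_OPERATIONS_ALLOW_PREC_LOSS`; behaviour summarized per this 
comment):

```scala
// Precision loss allowed: the result scale may be truncated so the type
// fits within precision 38 - digits can be lost, but values never overflow.
spark.conf.set("spark.sql.decimalOperations.allowPrecisionLoss", true)

// Precision loss disallowed: the exact result type is kept, and values
// that cannot be represented overflow to NULL instead.
spark.conf.set("spark.sql.decimalOperations.allowPrecisionLoss", false)
```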


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22494: [SPARK-22036][SQL][followup] add a new config for...

2018-09-21 Thread mgaido91
Github user mgaido91 commented on a diff in the pull request:

https://github.com/apache/spark/pull/22494#discussion_r219492885
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -1345,6 +1345,16 @@ object SQLConf {
   .booleanConf
   .createWithDefault(true)
 
+  val LITERAL_PRECISE_PRECISION =
--- End diff --

Hmm... `PRECISE_PRECISION` sounds weird; can we look for a better name? I 
don't have a suggestion right now, but I'll think about it.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22494: [SPARK-22036][SQL][followup] add a new config for...

2018-09-21 Thread mgaido91
Github user mgaido91 commented on a diff in the pull request:

https://github.com/apache/spark/pull/22494#discussion_r219493091
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -1345,6 +1345,16 @@ object SQLConf {
   .booleanConf
   .createWithDefault(true)
 
+  val LITERAL_PRECISE_PRECISION =
+buildConf("spark.sql.literal.precisePrecision")
+  .internal()
+  .doc("When integral literals are used with decimals in binary 
operators, Spark will " +
+"pick a precise precision for the literals to calculate the 
precision and scale " +
--- End diff --

`a precise precision` -> `the minimal precision required to represent the 
given value`?
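
For concreteness, a comment-only sketch of what the suggested wording means 
for an integral literal (types follow Spark's decimal promotion rules):

```scala
// With the flag on, an integral literal is typed by the digits it needs:
//   Literal(123)  -> DecimalType(3, 0)    // minimal precision for "123"
// With it off, the literal gets the widest type for its JVM type:
//   Literal(123)  -> DecimalType(10, 0)   // any Int fits
//   Literal(123L) -> DecimalType(20, 0)   // any Long fits
```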


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22517: Branch 2.3 how can i fix error use Pyspark

2018-09-21 Thread lovezeropython
GitHub user lovezeropython opened a pull request:

https://github.com/apache/spark/pull/22517

Branch 2.3 how can i fix error use Pyspark


How can I fix this error when using PySpark?


![image](https://user-images.githubusercontent.com/35518020/45883415-9cc23800-bde3-11e8-90b1-2bed95608c54.png)

Using Spark 2.3 and Python 3.6.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/apache/spark branch-2.3

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/22517.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #22517


commit a9d0784e6733666a0608e8236322f1dc380e96b7
Author: smallory 
Date:   2018-03-15T02:58:54Z

[SPARK-23642][DOCS] AccumulatorV2 subclass isZero scaladoc fix

Added/corrected scaladoc for isZero on the DoubleAccumulator, 
CollectionAccumulator, and LongAccumulator subclasses of AccumulatorV2, 
particularly noting where there are requirements in addition to having a value 
of zero in order to return true.
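
A minimal sketch of the subtlety these scaladoc fixes call out, using the 
standard `SparkContext` accumulator API:

```scala
val acc = sc.doubleAccumulator("d")
acc.add(1.0)
acc.add(-1.0)
// acc.value is 0.0, but isZero is false: DoubleAccumulator.isZero also
// requires that no elements have been added (its internal count must be 0).
println(acc.isZero) // false
```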

## What changes were proposed in this pull request?

Three scaladoc comments are updated in AccumulatorV2.scala
No changes outside of comment blocks were made.

## How was this patch tested?

Running "sbt unidoc", fixing style errors found, and reviewing the 
resulting local scaladoc in firefox.

Author: smallory 

Closes #20790 from smallory/patch-1.

(cherry picked from commit 4f5bad615b47d743b8932aea1071652293981604)
Signed-off-by: hyukjinkwon 

commit 72c13ed844d6be6510ce2c5e3526c234d1d5e10f
Author: hyukjinkwon 
Date:   2018-03-15T17:55:33Z

[SPARK-23695][PYTHON] Fix the error message for Kinesis streaming tests

## What changes were proposed in this pull request?

This PR proposes to fix the error message for Kinesis in PySpark when its 
jar is missing but the tests are explicitly enabled.

```bash
ENABLE_KINESIS_TESTS=1 SPARK_TESTING=1 bin/pyspark pyspark.streaming.tests
```

Before:

```
Skipped test_flume_stream (enable by setting environment variable 
ENABLE_FLUME_TESTS=1Skipped test_kafka_stream (enable by setting environment 
variable ENABLE_KAFKA_0_8_TESTS=1Traceback (most recent call last):
  File 
"/usr/local/Cellar/python/2.7.14_3/Frameworks/Python.framework/Versions/2.7/lib/python2.7/runpy.py",
 line 174, in _run_module_as_main
"__main__", fname, loader, pkg_name)
  File 
"/usr/local/Cellar/python/2.7.14_3/Frameworks/Python.framework/Versions/2.7/lib/python2.7/runpy.py",
 line 72, in _run_code
exec code in run_globals
  File "/.../spark/python/pyspark/streaming/tests.py", line 1572, in 

% kinesis_asl_assembly_dir) +
NameError: name 'kinesis_asl_assembly_dir' is not defined
```

After:

```
Skipped test_flume_stream (enable by setting environment variable 
ENABLE_FLUME_TESTS=1Skipped test_kafka_stream (enable by setting environment 
variable ENABLE_KAFKA_0_8_TESTS=1Traceback (most recent call last):
  File 
"/usr/local/Cellar/python/2.7.14_3/Frameworks/Python.framework/Versions/2.7/lib/python2.7/runpy.py",
 line 174, in _run_module_as_main
"__main__", fname, loader, pkg_name)
  File 
"/usr/local/Cellar/python/2.7.14_3/Frameworks/Python.framework/Versions/2.7/lib/python2.7/runpy.py",
 line 72, in _run_code
exec code in run_globals
  File "/.../spark/python/pyspark/streaming/tests.py", line 1576, in 

"You need to build Spark with 'build/sbt -Pkinesis-asl "
Exception: Failed to find Spark Streaming Kinesis assembly jar in 
/.../spark/external/kinesis-asl-assembly. You need to build Spark with 
'build/sbt -Pkinesis-asl assembly/package 
streaming-kinesis-asl-assembly/assembly'or 'build/mvn -Pkinesis-asl package' 
before running this test.
```

## How was this patch tested?

Manually tested.

Author: hyukjinkwon 

Closes #20834 from HyukjinKwon/minor-variable.

(cherry picked from commit 56e8f48a43eb51e8582db2461a585b13a771a00a)
Signed-off-by: Takuya UESHIN 

commit 2e1e274ed9d7a30656555e71c68e7de34a336a8a
Author: Sahil Takiar 
Date:   2018-03-16T00:04:39Z

[SPARK-23658][LAUNCHER] InProcessAppHandle uses the wrong class in getLogger

## What changes were proposed in this pull request?

Changed `Logger` in `InProcessAppHandle` to use `InProcessAppHandle` 
instead of `ChildProcAppHandle`

Author: Sahil Takiar 

Closes #20815 from sahilTakiar/master.

(cherry picked from commit 7618896e855579f111dd92cd76794a5672a087e5)
Signed-off-by: Marcelo Vanzin 

commit 52a52d5d26fc1650e788eec62ce478c76f627470
Author: Marcelo Vanzin 
Date:   2018-03-16T00:12:01Z

[SPARK-23671][CORE] Fix condition to enable the SHS thread pool.

Author: Marcelo Vanzin 

[GitHub] spark issue #22516: [SPARK-25468]Highlight current page index in the history...

2018-09-21 Thread wangyum
Github user wangyum commented on the issue:

https://github.com/apache/spark/pull/22516
  
Yes, I hit this issue too.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22263: [SPARK-25269][SQL] SQL interface support specify Storage...

2018-09-21 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22263
  
**[Test build #96426 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96426/testReport)**
 for PR 22263 at commit 
[`865e7d1`](https://github.com/apache/spark/commit/865e7d10be99c5f0ccfc89b2fc208de91e810ded).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22263: [SPARK-25269][SQL] SQL interface support specify ...

2018-09-21 Thread wangyum
Github user wangyum commented on a diff in the pull request:

https://github.com/apache/spark/pull/22263#discussion_r219490742
  
--- Diff: 
sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 ---
@@ -162,7 +162,8 @@ statement
 tableIdentifier partitionSpec? describeColName?
#describeTable
 | REFRESH TABLE tableIdentifier
#refreshTable
 | REFRESH (STRING | .*?)   
#refreshResource
-| CACHE LAZY? TABLE tableIdentifier (AS? query)?   
#cacheTable
+| CACHE LAZY? storageLevel=identifier? TABLE
+tableIdentifier (AS? query)?   
#cacheTable
--- End diff --

Thanks @maropu, I changed it to an option.
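
For reference, hypothetical usage under the grammar in this diff (the syntax 
was still being reworked at this point, per the comment above):

```scala
// Storage level as a bare identifier between CACHE [LAZY] and TABLE:
spark.sql("CACHE MEMORY_ONLY TABLE t1")
spark.sql("CACHE LAZY DISK_ONLY TABLE t2 AS SELECT * FROM src")
```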


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22263: [SPARK-25269][SQL] SQL interface support specify Storage...

2018-09-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22263
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3343/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22263: [SPARK-25269][SQL] SQL interface support specify Storage...

2018-09-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22263
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22461: [SPARK-25453][SQL][TEST] OracleIntegrationSuite IllegalA...

2018-09-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22461
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22461: [SPARK-25453][SQL][TEST] OracleIntegrationSuite IllegalA...

2018-09-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22461
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96416/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22461: [SPARK-25453][SQL][TEST] OracleIntegrationSuite IllegalA...

2018-09-21 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22461
  
**[Test build #96416 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96416/testReport)**
 for PR 22461 at commit 
[`9a948e5`](https://github.com/apache/spark/commit/9a948e5e59ccb6a39e9fa844273188ae331d5abd).
 * This patch **fails PySpark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22494: [SPARK-22036][SQL][followup] add a new config for pickin...

2018-09-21 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22494
  
**[Test build #96425 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96425/testReport)**
 for PR 22494 at commit 
[`ad79c56`](https://github.com/apache/spark/commit/ad79c56ca038fac6797814410d665110ef43e826).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22494: [SPARK-22036][SQL][followup] add a new config for pickin...

2018-09-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22494
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3342/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22494: [SPARK-22036][SQL][followup] add a new config for pickin...

2018-09-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22494
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22494: [SPARK-22036][SQL][followup] add a new config for pickin...

2018-09-21 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/22494
  
Sorry, I screwed up my local branch and drew a wrong conclusion. Turning 
off the precise precision for integral literals does fix the regression, so it's 
better to add a config for it.
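
Under the diff in this PR, opting out would then be a one-line sketch 
(internal config; the name was still under discussion):

```scala
spark.conf.set("spark.sql.literal.precisePrecision", false)
```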


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22514: [SPARK-25271][SQL] Hive ctas commands should use data so...

2018-09-21 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/22514
  
cc @cloud-fan @HyukjinKwon 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22494: [SPARK-22036][SQL][followup] add a new config for pickin...

2018-09-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22494
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3341/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22494: [SPARK-22036][SQL][followup] add a new config for pickin...

2018-09-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22494
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org


