date:20181206

[GitHub] spark issue #22683: [SPARK-25696] The storage memory displayed on spark Appl...

2018-12-06 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22683
  
**[Test build #99807 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99807/testReport)**
 for PR 22683 at commit 
[`87a9d5a`](https://github.com/apache/spark/commit/87a9d5ad1ebfbb9b247e95ead3e1a4c34ee08020).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #23252: [SPARK-26239] File-based secret key loading for SASL.

2018-12-06 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23252
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5843/
Test PASSed.


---


[GitHub] spark issue #23252: [SPARK-26239] File-based secret key loading for SASL.

2018-12-06 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23252
  
**[Test build #99808 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99808/testReport)**
 for PR 23252 at commit 
[`957cb15`](https://github.com/apache/spark/commit/957cb15a2d48b4cf2b5c7f1a8c124df3a53bf4d9).


---


[GitHub] spark issue #23104: [SPARK-26138][SQL] Cross join requires push LocalLimit i...

2018-12-06 Thread liu-zhaokun

Github user liu-zhaokun commented on the issue:

https://github.com/apache/spark/pull/23104
  
@guoxiaolongzte  good job


---


[GitHub] spark issue #23245: [SPARK-26060][SQL][FOLLOW-UP] Rename the config name.

2018-12-06 Thread dongjoon-hyun

Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/23245
  
Retest this please.


---


[GitHub] spark issue #22683: [SPARK-25696] The storage memory displayed on spark Appl...

2018-12-06 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22683
  
**[Test build #99810 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99810/testReport)**
 for PR 22683 at commit 
[`6a3c58b`](https://github.com/apache/spark/commit/6a3c58b119ed298e1cab8d9a9b341a667a86c8f0).


---


[GitHub] spark issue #22683: [SPARK-25696] The storage memory displayed on spark Appl...

2018-12-06 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22683
  
**[Test build #99811 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99811/testReport)**
 for PR 22683 at commit 
[`22e0589`](https://github.com/apache/spark/commit/22e0589b66b30110f0b579f4829339ee680fc93f).


---


[GitHub] spark pull request #23238: [SPARK-25132][SQL][FOLLOWUP] Add migration doc fo...

2018-12-06 Thread dongjoon-hyun

Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/23238#discussion_r239708569
  
--- Diff: docs/sql-migration-guide-upgrade.md ---
@@ -141,6 +141,8 @@ displayTitle: Spark SQL Upgrading Guide
 
   - In Spark version 2.3 and earlier, HAVING without GROUP BY is treated 
as WHERE. This means, `SELECT 1 FROM range(10) HAVING true` is executed as 
`SELECT 1 FROM range(10) WHERE true`  and returns 10 rows. This violates SQL 
standard, and has been fixed in Spark 2.4. Since Spark 2.4, HAVING without 
GROUP BY is treated as a global aggregate, which means `SELECT 1 FROM range(10) 
HAVING true` will return only one row. To restore the previous behavior, set 
`spark.sql.legacy.parser.havingWithoutGroupByAsWhere` to `true`.
 
+  - In version 2.3 and earlier, when reading from a Parquet data source 
table, Spark always returns null for any column whose column names in Hive 
metastore schema and Parquet schema are in different letter cases, no matter 
whether `spark.sql.caseSensitive` is set to true or false. Since 2.4, when 
`spark.sql.caseSensitive` is set to false, Spark does case insensitive column 
name resolution between Hive metastore schema and Parquet schema, so even 
column names are in different letter cases, Spark returns corresponding column 
values. An exception is thrown if there is ambiguity, i.e. more than one 
Parquet column is matched. This change also applies to Parquet Hive tables when 
`spark.sql.hive.convertMetastoreParquet` is set to true.
--- End diff --

Hi, @seancxmao. Maybe the following?
```
- `spark.sql.caseSensitive` is set to true or false
+ `spark.sql.caseSensitive` is set to `true` or `false`
```
```
- `spark.sql.caseSensitive` is set to false
+ `spark.sql.caseSensitive` is set to `false`
```
```
- `spark.sql.hive.convertMetastoreParquet` is set to true
+ `spark.sql.hive.convertMetastoreParquet` is set to `true`
```
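
The resolution rule described in the quoted migration note can be sketched in plain Java. This is only an illustration under stated assumptions: `ColumnResolver` and `resolve` are hypothetical names, not Spark's implementation, and an empty `Optional` stands in for the column that Spark would read as null.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Optional;

// Hypothetical helper, for illustration only; not part of Spark.
class ColumnResolver {
    /**
     * Finds the Parquet column that corresponds to a metastore column name.
     * With caseSensitive = false, names are compared ignoring letter case and
     * an exception is thrown when more than one Parquet column matches.
     */
    static Optional<String> resolve(String metastoreName,
                                    List<String> parquetNames,
                                    boolean caseSensitive) {
        if (caseSensitive) {
            // Exact match only.
            return parquetNames.contains(metastoreName)
                ? Optional.of(metastoreName)
                : Optional.empty();
        }
        String wanted = metastoreName.toLowerCase(Locale.ROOT);
        List<String> matches = new ArrayList<>();
        for (String name : parquetNames) {
            if (name.toLowerCase(Locale.ROOT).equals(wanted)) {
                matches.add(name);
            }
        }
        if (matches.size() > 1) {
            // Ambiguity: several Parquet columns differ only in letter case.
            throw new IllegalStateException(
                "Ambiguous match for " + metastoreName + ": " + matches);
        }
        return matches.isEmpty() ? Optional.empty() : Optional.of(matches.get(0));
    }

    public static void main(String[] args) {
        List<String> parquet = Arrays.asList("ID", "name");
        System.out.println(resolve("id", parquet, false)); // Optional[ID]
        System.out.println(resolve("id", parquet, true));  // Optional.empty
    }
}
```

Calling `resolve("id", Arrays.asList("ID", "id"), false)` would throw, which mirrors the "more than one Parquet column is matched" case in the note.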


---


[GitHub] spark issue #23238: [SPARK-25132][SQL][FOLLOWUP] Add migration doc for case-...

2018-12-06 Thread dongjoon-hyun

Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/23238
  
Thank you for adding this to the migration doc.
cc @gatorsmile .


---


[GitHub] spark issue #23251: [SPARK-26300][SS] The `checkForStreaming` mothod may be ...

2018-12-06 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23251
  
**[Test build #99802 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99802/testReport)**
 for PR 23251 at commit 
[`b1e71ee`](https://github.com/apache/spark/commit/b1e71ee7a723d63f1cf3c0754f2372eb185439d3).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---


[GitHub] spark issue #23251: [SPARK-26300][SS] The `checkForStreaming` mothod may be ...

2018-12-06 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23251
  
Merged build finished. Test PASSed.


---


[GitHub] spark issue #23251: [SPARK-26300][SS] The `checkForStreaming` mothod may be ...

2018-12-06 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23251
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99802/
Test PASSed.


---


[GitHub] spark issue #23108: [Spark-25993][SQL][TEST]Add test cases for CREATE EXTERN...

2018-12-06 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23108
  
**[Test build #99804 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99804/testReport)**
 for PR 23108 at commit 
[`d851169`](https://github.com/apache/spark/commit/d851169803861e24c3c251dcf936b4bf11a9c964).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---


[GitHub] spark issue #23239: [SPARK-26021][SQL][followup] only deal with NaN and -0.0...

2018-12-06 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23239
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99801/
Test PASSed.


---


[GitHub] spark issue #23249: [SPARK-26297][SQL] improve the doc of Distribution/Parti...

2018-12-06 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23249
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5845/
Test PASSed.


---


[GitHub] spark issue #23245: [SPARK-26060][SQL][FOLLOW-UP] Rename the config name.

2018-12-06 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23245
  
Merged build finished. Test FAILed.


---


[GitHub] spark issue #23249: [SPARK-26297][SQL] improve the doc of Distribution/Parti...

2018-12-06 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23249
  
Merged build finished. Test PASSed.


---


[GitHub] spark issue #23245: [SPARK-26060][SQL][FOLLOW-UP] Rename the config name.

2018-12-06 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23245
  
**[Test build #99809 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99809/testReport)**
 for PR 23245 at commit 
[`2e9b09c`](https://github.com/apache/spark/commit/2e9b09cc24c5ae877ff3b0fb9a769d24c05462ac).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `class ArrowCollectSerializer(Serializer):`


---


[GitHub] spark issue #23245: [SPARK-26060][SQL][FOLLOW-UP] Rename the config name.

2018-12-06 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23245
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99809/
Test FAILed.


---


[GitHub] spark issue #23239: [SPARK-26021][SQL][followup] only deal with NaN and -0.0...

2018-12-06 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23239
  
**[Test build #99801 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99801/testReport)**
 for PR 23239 at commit 
[`84e3989`](https://github.com/apache/spark/commit/84e3989329da1e7bb8f26dc2ded7558ce6fd9b23).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---


[GitHub] spark issue #23239: [SPARK-26021][SQL][followup] only deal with NaN and -0.0...

2018-12-06 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23239
  
Merged build finished. Test PASSed.


---


[GitHub] spark issue #22707: [SPARK-25717][SQL] Insert overwrite a recreated external...

2018-12-06 Thread fjh100456

Github user fjh100456 commented on the issue:

https://github.com/apache/spark/pull/22707
  
Are there any more suggestions? @wangyum @viirya


---


[GitHub] spark pull request #23249: [SPARK-26297][SQL] improve the doc of Distributio...

2018-12-06 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/23249#discussion_r239690226
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/physical/partitioning.scala
 ---
@@ -22,13 +22,12 @@ import org.apache.spark.sql.types.{DataType, 
IntegerType}
 
 /**
  * Specifies how tuples that share common expressions will be distributed 
when a query is executed
- * in parallel on many machines.  Distribution can be used to refer to two 
distinct physical
- * properties:
- *  - Inter-node partitioning of data: In this case the distribution 
describes how tuples are
- *partitioned across physical machines in a cluster.  Knowing this 
property allows some
- *operators (e.g., Aggregate) to perform partition local operations 
instead of global ones.
- *  - Intra-partition ordering of data: In this case the distribution 
describes guarantees made
- *about how tuples are distributed within a single partition.
+ * in parallel on many machines.
+ *
+ * Distribution here refers to inter-node partitioning of data:
+ *   The distribution describes how tuples are partitioned across physical 
machines in a cluster.
+ *   Knowing this property allows some operators (e.g., Aggregate) to 
perform partition local
+ *   operations instead of global ones.
  */
--- End diff --

for ordering, I think people can look at `OrderedDistribution`?
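
The point made in the revised doc comment, that knowing how tuples are partitioned lets an operator such as Aggregate work partition-locally, can be illustrated with a small self-contained sketch. It assumes hash partitioning on the grouping key; `PartitionLocalAggregate` and its methods are hypothetical names, not Spark classes.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration; not Spark's Distribution or Partitioning classes.
class PartitionLocalAggregate {
    /** Assigns each key to a partition by hash, so equal keys are co-located. */
    static List<List<String>> hashPartition(List<String> keys, int numPartitions) {
        List<List<String>> partitions = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) {
            partitions.add(new ArrayList<>());
        }
        for (String key : keys) {
            partitions.get(Math.floorMod(key.hashCode(), numPartitions)).add(key);
        }
        return partitions;
    }

    /** Counts occurrences per key using only the rows of one partition. */
    static Map<String, Integer> countLocally(List<String> partition) {
        Map<String, Integer> counts = new HashMap<>();
        for (String key : partition) {
            counts.merge(key, 1, Integer::sum);
        }
        return counts;
    }

    /** Unions the per-partition counts; no cross-partition merge is needed. */
    static Map<String, Integer> countAll(List<String> keys, int numPartitions) {
        Map<String, Integer> result = new HashMap<>();
        for (List<String> partition : hashPartition(keys, numPartitions)) {
            // putAll never overwrites here: a key lives in exactly one partition.
            result.putAll(countLocally(partition));
        }
        return result;
    }
}
```

Because equal keys always land in the same partition, the per-partition counts are already the global counts, which is the kind of guarantee a distribution requirement expresses.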


---


[GitHub] spark issue #23245: [SPARK-26060][SQL][FOLLOW-UP] Rename the config name.

2018-12-06 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23245
  
Merged build finished. Test PASSed.


---


[GitHub] spark issue #23245: [SPARK-26060][SQL][FOLLOW-UP] Rename the config name.

2018-12-06 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23245
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99797/
Test PASSed.


---


[GitHub] spark pull request #23249: [SPARK-26297][SQL] improve the doc of Distributio...

2018-12-06 Thread maryannxue

Github user maryannxue commented on a diff in the pull request:

https://github.com/apache/spark/pull/23249#discussion_r239693849
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/physical/partitioning.scala
 ---
@@ -243,10 +248,19 @@ case class HashPartitioning(expressions: 
Seq[Expression], numPartitions: Int)
  * Represents a partitioning where rows are split across partitions based 
on some total ordering of
  * the expressions specified in `ordering`.  When data is partitioned in 
this manner the following
--- End diff --

nit: add "," after "this manner".


---


[GitHub] spark issue #23250: [SPARK-26298][BUILD] Upgrade Janino to 3.0.11

2018-12-06 Thread dongjoon-hyun

Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/23250
  
Thank you, @HyukjinKwon . Merged to master.


---


[GitHub] spark issue #23108: [Spark-25993][SQL][TEST]Add test cases for CREATE EXTERN...

2018-12-06 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23108
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5841/
Test PASSed.


---


[GitHub] spark issue #23108: [Spark-25993][SQL][TEST]Add test cases for CREATE EXTERN...

2018-12-06 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23108
  
Merged build finished. Test PASSed.


---


[GitHub] spark issue #23108: [Spark-25993][SQL][TEST]Add test cases for CREATE EXTERN...

2018-12-06 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23108
  
**[Test build #99804 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99804/testReport)**
 for PR 23108 at commit 
[`d851169`](https://github.com/apache/spark/commit/d851169803861e24c3c251dcf936b4bf11a9c964).


---


[GitHub] spark issue #23221: [SPARK-24243][CORE] Expose exceptions from InProcessAppH...

2018-12-06 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23221
  
**[Test build #99798 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99798/testReport)**
 for PR 23221 at commit 
[`c9ab9bc`](https://github.com/apache/spark/commit/c9ab9bcc378168ff3430d8885899ccd74afe7b32).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---


[GitHub] spark pull request #23207: [SPARK-26193][SQL] Implement shuffle write metric...

2018-12-06 Thread xuanyuanking

Github user xuanyuanking commented on a diff in the pull request:

https://github.com/apache/spark/pull/23207#discussion_r239698500
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/metric/SQLMetrics.scala 
---
@@ -78,6 +80,7 @@ object SQLMetrics {
   private val SUM_METRIC = "sum"
   private val SIZE_METRIC = "size"
   private val TIMING_METRIC = "timing"
+  private val NS_TIMING_METRIC = "nanosecond"
--- End diff --

How about naming it `NORMALIZE_TIMING_METRIC`? Maybe it can be reused 
later for other timing metrics that need unit normalization. If you think it's a 
strange name, I'll change it back.
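
As a rough sketch of what "normalizing the unit" of a timing metric could mean, the following converts a value recorded in an arbitrary unit to milliseconds before display. The class and method names are hypothetical and the choice of milliseconds as the display unit is an assumption; this is not the code in `SQLMetrics.scala`.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical illustration; names and display unit are assumptions.
class TimingMetricNormalizer {
    /** Converts a recorded value to whole milliseconds (fractions are truncated). */
    static long toMillis(long value, TimeUnit recordedUnit) {
        return recordedUnit.toMillis(value);
    }

    /** Renders the normalized value the same way regardless of the recorded unit. */
    static String display(long value, TimeUnit recordedUnit) {
        return toMillis(value, recordedUnit) + " ms";
    }

    public static void main(String[] args) {
        // A metric recorded in nanoseconds and one recorded in milliseconds
        // end up in the same unit.
        System.out.println(display(2_500_000L, TimeUnit.NANOSECONDS)); // 2 ms
        System.out.println(display(2L, TimeUnit.MILLISECONDS));        // 2 ms
    }
}
```

Keeping the recorded unit as a parameter is what would make such a metric type reusable for other timing metrics, as the comment suggests.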


---


[GitHub] spark pull request #23072: [SPARK-19827][R]spark.ml R API for PIC

2018-12-06 Thread HyukjinKwon

Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/23072#discussion_r239701364
  
--- Diff: R/pkg/tests/fulltests/test_mllib_clustering.R ---
@@ -319,4 +319,18 @@ test_that("spark.posterior and spark.perplexity", {
   expect_equal(length(local.posterior), sum(unlist(local.posterior)))
 })
 
+test_that("spark.assignClusters", {
+  df <- createDataFrame(list(list(0L, 1L, 1.0), list(0L, 2L, 1.0),
+ list(1L, 2L, 1.0), list(3L, 4L, 1.0),
+ list(4L, 0L, 0.1)), schema = c("src", "dst", 
"weight"))
+  clusters <- spark.assignClusters(df, initMode = "degree", weightCol = 
"weight")
+  expected_result <- createDataFrame(list(list(4L, 1L),
+  list(0L, 0L),
+  list(1L, 0L),
+  list(3L, 1L),
+  list(2L, 0L)),
+  schema = c("id", "cluster"))
--- End diff --

ditto for style


---


[GitHub] spark pull request #23225: [SPARK-26287][CORE]Don't need to create an empty ...

2018-12-06 Thread wangjiaochun

Github user wangjiaochun commented on a diff in the pull request:

https://github.com/apache/spark/pull/23225#discussion_r239704796
  
--- Diff: 
core/src/test/java/org/apache/spark/shuffle/sort/UnsafeShuffleWriterSuite.java 
---
@@ -562,4 +562,18 @@ public void testPeakMemoryUsed() throws Exception {
 }
   }
 
+  @Test
+  public void writeEmptyIteratorNotCreateEmptySpillFile() throws Exception {
+    final UnsafeShuffleWriter writer = createWriter(true);
+    writer.write(Iterators.emptyIterator());
+    final Option mapStatus = writer.stop(true);
+    assertTrue(mapStatus.isDefined());
+    assertTrue(mergedOutputFile.exists());
+    assertEquals(0, spillFilesCreated.size());
--- End diff --

I mean that before adding the code `if (!sortedRecords.hasNext()) { return }`, it 
would fail. Now adding `assertEquals(0, spillFilesCreated.size())` to 
`writeEmptyIterator` seems good.
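
The behavior under discussion, returning early so that an empty input never produces a spill file, can be sketched independently of Spark. `SpillGuardSketch` is a hypothetical stand-in that writes text records to a temp file; only the guard at the top of `writeSortedFile` corresponds to the change being reviewed.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for a sorter that spills records to disk.
class SpillGuardSketch {
    final List<File> spillFilesCreated = new ArrayList<>();

    void writeSortedFile(Iterator<String> sortedRecords) {
        // Guard: with no records there is nothing to spill, so no file is created.
        if (!sortedRecords.hasNext()) {
            return;
        }
        try {
            File spill = File.createTempFile("spill", ".tmp");
            spill.deleteOnExit();
            StringBuilder contents = new StringBuilder();
            while (sortedRecords.hasNext()) {
                contents.append(sortedRecords.next()).append('\n');
            }
            Files.write(spill.toPath(), contents.toString().getBytes(StandardCharsets.UTF_8));
            spillFilesCreated.add(spill);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Without the guard, the empty-iterator call would still create a zero-length temp file, and an assertion like `assertEquals(0, spillFilesCreated.size())` would fail.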


---


[GitHub] spark issue #23218: [SPARK-26266][BUILD] Update to Scala 2.12.8

2018-12-06 Thread felixcheung

Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/23218
  
Do we need to release-note JVM compatibility?


---


[GitHub] spark pull request #23252: [SPARK-26239] File-based secret key loading for S...

2018-12-06 Thread dongjoon-hyun

Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/23252#discussion_r239706529
  
--- Diff: 
resource-managers/kubernetes/core/src/test/scala/org/apache/spark/deploy/k8s/features/BasicExecutorFeatureStepSuite.scala
 ---
@@ -16,10 +16,13 @@
  */
 package org.apache.spark.deploy.k8s.features
 
-import scala.collection.JavaConverters._
+import java.io.File
+import java.nio.charset.StandardCharsets
+import java.nio.file.Files
 
 import io.fabric8.kubernetes.api.model._
 import org.scalatest.BeforeAndAfter
+import scala.collection.JavaConverters._
--- End diff --

? Hi, @mccheah . We import `java.*` and `scala.*` before any others.


---


[GitHub] spark issue #20146: [SPARK-11215][ML] Add multiple columns support to String...

2018-12-06 Thread viirya

Github user viirya commented on the issue:

https://github.com/apache/spark/pull/20146
  
ping @dbtsai 


---


[GitHub] spark issue #22683: [SPARK-25696] The storage memory displayed on spark Appl...

2018-12-06 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22683
  
**[Test build #99811 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99811/testReport)**
 for PR 22683 at commit 
[`22e0589`](https://github.com/apache/spark/commit/22e0589b66b30110f0b579f4829339ee680fc93f).
 * This patch **fails build dependency tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---


[GitHub] spark issue #22683: [SPARK-25696] The storage memory displayed on spark Appl...

2018-12-06 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22683
  
Merged build finished. Test FAILed.


---


[GitHub] spark issue #23108: [Spark-25993][SQL][TEST]Add test cases for CREATE EXTERN...

2018-12-06 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23108
  
Merged build finished. Test PASSed.


---


[GitHub] spark issue #23108: [Spark-25993][SQL][TEST]Add test cases for CREATE EXTERN...

2018-12-06 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23108
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99804/
Test PASSed.


---


[GitHub] spark issue #23249: [SPARK-26297][SQL] improve the doc of Distribution/Parti...

2018-12-06 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23249
  
**[Test build #99812 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99812/testReport)**
 for PR 23249 at commit 
[`04be19e`](https://github.com/apache/spark/commit/04be19e62caa8fd0365b4998e22cdcad846be6b8).


---


[GitHub] spark issue #22683: [SPARK-25696] The storage memory displayed on spark Appl...

2018-12-06 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22683
  
**[Test build #99813 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99813/testReport)**
 for PR 22683 at commit 
[`bf150fb`](https://github.com/apache/spark/commit/bf150fb4bbc68627d19521a31a0d3a294d079862).


---


[GitHub] spark issue #23249: [SPARK-26297][SQL] improve the doc of Distribution/Parti...

2018-12-06 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23249
  
Merged build finished. Test PASSed.


---


[GitHub] spark issue #23249: [SPARK-26297][SQL] improve the doc of Distribution/Parti...

2018-12-06 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23249
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5846/
Test PASSed.


---


[GitHub] spark issue #22683: [SPARK-25696] The storage memory displayed on spark Appl...

2018-12-06 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22683
  
**[Test build #99815 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99815/testReport)**
 for PR 22683 at commit 
[`9dff9ee`](https://github.com/apache/spark/commit/9dff9eea09cbf3d5298bd6d261e1595cafaaae69).


---


[GitHub] spark issue #23239: [SPARK-26021][SQL][followup] only deal with NaN and -0.0...

2018-12-06 Thread kiszk

Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/23239
  
The change looks fine.
Do we already have tests for cases 2 and 4? We know the test for case 3 is 
[here](https://github.com/apache/spark/pull/23043).


---


[GitHub] spark pull request #22575: [SPARK-24630][SS] Support SQLStreaming in Spark

2018-12-06 Thread sujith71955

Github user sujith71955 commented on a diff in the pull request:

https://github.com/apache/spark/pull/22575#discussion_r239500890
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -631,6 +631,33 @@ object SQLConf {
 .intConf
 .createWithDefault(200)
 
+  val SQLSTREAM_WATERMARK_ENABLE = 
buildConf("spark.sqlstreaming.watermark.enable")
+.doc("Whether use watermark in sqlstreaming.")
+.booleanConf
+.createWithDefault(false)
+
+  val SQLSTREAM_OUTPUTMODE = buildConf("spark.sqlstreaming.outputMode")
+.doc("The output mode used in sqlstreaming")
+.stringConf
+.createWithDefault("append")
+
+  val SQLSTREAM_TRIGGER = buildConf("spark.sqlstreaming.trigger")
--- End diff --

So stream-stream join is not supported here, right? To elaborate: can I 
create two stream source tables, join them, and write the result to a sink?
If I want to create two streams for two different topics, I may need 
to provide different configurations for the watermark, window, or trigger interval.


---


[GitHub] spark issue #23241: [SPARK-26283][CORE] Enable reading from open frames of z...

2018-12-06 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23241
  
**[Test build #99774 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99774/testReport)**
 for PR 23241 at commit 
[`6dfa27a`](https://github.com/apache/spark/commit/6dfa27ad49fdaa52c8fb83a18238e9f724b9d550).


---


[GitHub] spark pull request #23239: [SPARK-26021][SQL][followup] only deal with NaN a...

2018-12-06 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/23239#discussion_r239507673
  
--- Diff: 
sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/codegen/UnsafeWriter.java
 ---
@@ -198,11 +198,45 @@ protected final void writeLong(long offset, long 
value) {
 Platform.putLong(getBuffer(), offset, value);
   }
 
+  // We need to take care of NaN and -0.0 in several places:
+  //   1. When compare values, different NaNs should be treated as same, 
`-0.0` and `0.0` should be
+  //  treated as same.
+  //   2. In range partitioner, different NaNs should belong to the same 
partition, -0.0 and 0.0
--- End diff --

It turns out this is not a problem. The doc of `RangePartitioning` is 
misleading. I'm updating the doc at https://github.com/apache/spark/pull/23249
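
For context on why NaN and `-0.0` need care when doubles are written as raw bits, here is a minimal sketch of the kind of normalization the quoted code comment motivates. `DoubleNormalizer` is a hypothetical name and the code is not taken from `UnsafeWriter.java`; it only relies on standard `java.lang.Double` behavior.

```java
// Hypothetical illustration; not the UnsafeWriter implementation.
class DoubleNormalizer {
    /** Maps every NaN to the canonical NaN and -0.0 to 0.0; other values pass through. */
    static double normalize(double value) {
        if (Double.isNaN(value)) {
            return Double.NaN;
        }
        if (value == 0.0d) {
            // -0.0 == 0.0 is true for primitives, so this branch covers both zeros.
            return 0.0d;
        }
        return value;
    }

    /** Bit pattern that would be stored after normalization. */
    static long normalizedBits(double value) {
        return Double.doubleToRawLongBits(normalize(value));
    }

    public static void main(String[] args) {
        // Raw bits of the two zeros differ in the sign bit before normalization.
        System.out.println(Double.doubleToRawLongBits(-0.0d) == Double.doubleToRawLongBits(0.0d)); // false
        System.out.println(normalizedBits(-0.0d) == normalizedBits(0.0d)); // true
    }
}
```

Once values are normalized at write time, a byte-wise comparison or hash of the stored data treats `0.0` and `-0.0`, and all NaNs, as equal.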


---


[GitHub] spark pull request #23241: [SPARK-26283][CORE] Enable reading from open fram...

2018-12-06 Thread srowen

Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/23241#discussion_r239509724
  
--- Diff: core/src/main/scala/org/apache/spark/io/CompressionCodec.scala ---
@@ -197,4 +201,8 @@ class ZStdCompressionCodec(conf: SparkConf) extends 
CompressionCodec {
 // avoid overhead excessive of JNI call while trying to uncompress 
small amount of data.
 new BufferedInputStream(new ZstdInputStream(s), bufferSize)
   }
+
+  override def zstdEventLogCompressedInputStream(s: InputStream): 
InputStream = {
+new BufferedInputStream(new ZstdInputStream(s).setContinuous(true), 
bufferSize)
--- End diff --

That's what I'm wondering about. Is it actually desirable to not fail on a 
partial frame? I'm not sure. We *shouldn't* encounter it elsewhere.

This changes a developer API, but may not even be a breaking change as 
there is a default implementation. We can take breaking changes in Spark 3 
though.

I think I agree with your approach here in the end.
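
The point about a default implementation can be sketched in Java as follows. This is only an illustration under assumed names (`StreamCodec`, `LegacyCodec`, `ContinuousAwareCodec`, and `CodecDemo` are hypothetical and are not Spark's `CompressionCodec` API): a method added to an interface with a default body does not force existing implementations to change.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Hypothetical stand-in for a codec developer API (not Spark's CompressionCodec).
interface StreamCodec {
    InputStream compressedInputStream(InputStream in);

    // Added in a later release. Because it has a default body, implementations
    // written before it existed still compile and simply inherit this behaviour.
    default InputStream compressedContinuousInputStream(InputStream in) {
        return compressedInputStream(in);
    }
}

// Written against the old interface; it never mentions the new method.
final class LegacyCodec implements StreamCodec {
    @Override
    public InputStream compressedInputStream(InputStream in) {
        return new BufferedInputStream(in);
    }
}

// Opts in to special handling by overriding the new method.
final class ContinuousAwareCodec implements StreamCodec {
    boolean continuousRequested = false;

    @Override
    public InputStream compressedInputStream(InputStream in) {
        return new BufferedInputStream(in);
    }

    @Override
    public InputStream compressedContinuousInputStream(InputStream in) {
        continuousRequested = true; // a real codec would configure its decoder here
        return new BufferedInputStream(in);
    }
}

final class CodecDemo {
    // Reads one byte, converting the checked IOException for brevity.
    static int firstByte(InputStream in) {
        try {
            return in.read();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        StreamCodec legacy = new LegacyCodec();
        InputStream in =
            legacy.compressedContinuousInputStream(new ByteArrayInputStream(new byte[] {42}));
        System.out.println(firstByte(in));
    }
}
```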


---


[GitHub] spark issue #23215: [SPARK-26263][SQL] Validate partition values with user p...

2018-12-06 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23215
  
Merged build finished. Test PASSed.


---


[GitHub] spark issue #23241: [SPARK-26283][CORE] Enable reading from open frames of z...

2018-12-06 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23241
  
Merged build finished. Test PASSed.


---


[GitHub] spark issue #23241: [SPARK-26283][CORE] Enable reading from open frames of z...

2018-12-06 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23241
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5813/
Test PASSed.


---


[GitHub] spark issue #23245: [SPARK-26060][SQL][FOLLOW-UP] Rename the config name.

2018-12-06 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23245
  
**[Test build #99768 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99768/testReport)**
 for PR 23245 at commit 
[`021134c`](https://github.com/apache/spark/commit/021134cd2b6a0a82ef8ef36a5ce122bff397ab32).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---


[GitHub] spark issue #23245: [SPARK-26060][SQL][FOLLOW-UP] Rename the config name.

2018-12-06 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23245
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99768/
Test PASSed.


---


[GitHub] spark issue #23245: [SPARK-26060][SQL][FOLLOW-UP] Rename the config name.

2018-12-06 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23245
  
Merged build finished. Test PASSed.


---


[GitHub] spark issue #23241: [SPARK-26283][CORE] Enable reading from open frames of z...

2018-12-06 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23241
  
**[Test build #99778 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99778/testReport)**
 for PR 23241 at commit 
[`ff64adc`](https://github.com/apache/spark/commit/ff64adcdff37dee1e4ac14045c2cdb277d4acf4d).


---


[GitHub] spark pull request #23241: [SPARK-26283][CORE] Enable reading from open fram...

2018-12-06 Thread srowen

Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/23241#discussion_r239525888
  
--- Diff: 
core/src/main/scala/org/apache/spark/scheduler/ReplayListenerBus.scala ---
@@ -118,10 +118,12 @@ private[spark] class ReplayListenerBus extends 
SparkListenerBus with Logging {
   case e: HaltReplayException =>
 // Just stop replay.
   case _: EOFException if maybeTruncated =>
-  case _: IOException if maybeTruncated =>
-logWarning(s"Failed to read Spark event log: $sourceName")
   case ioe: IOException =>
-throw ioe
+if (maybeTruncated) {
--- End diff --

I think this was already the behavior? If it doesn't match the 'if', it 
would just throw anyway.
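
A minimal Java sketch of the handling under discussion, assuming hypothetical names (`ReplaySketch`, `EventSource`) and returning strings instead of logging. Java has no pattern guards, so the guarded cases from the Scala diff are written as an `if` inside each `catch`; an exception that fails the `maybeTruncated` check is rethrown either way, which is the equivalence the comment points out.

```java
import java.io.EOFException;
import java.io.IOException;

final class ReplaySketch {
    // Hypothetical source of events; readAll() may fail while reading the log.
    interface EventSource {
        void readAll() throws IOException;
    }

    // Returns what happened; rethrows I/O errors when the log is not
    // expected to be truncated.
    static String replay(EventSource source, boolean maybeTruncated) throws IOException {
        try {
            source.readAll();
            return "completed";
        } catch (EOFException e) {
            if (maybeTruncated) {
                return "ignored EOF"; // a truncated log may end mid-record
            }
            throw e;
        } catch (IOException e) {
            if (maybeTruncated) {
                return "warned"; // the real code would log a warning here
            }
            throw e;
        }
    }

    // Convenience wrapper that reports a rethrown exception as a string.
    static String describe(EventSource source, boolean maybeTruncated) {
        try {
            return replay(source, maybeTruncated);
        } catch (IOException e) {
            return "rethrown";
        }
    }
}
```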


---


[GitHub] spark pull request #23249: [SPARK-26297][SQL] improve the doc of Distributio...

2018-12-06 Thread cloud-fan

GitHub user cloud-fan opened a pull request:

https://github.com/apache/spark/pull/23249

[SPARK-26297][SQL] improve the doc of Distribution/Partitioning

## What changes were proposed in this pull request?

Some documents of `Distribution/Partitioning` are stale and misleading, 
this PR fixes them:
1. `ClusteredDistribution` doesn't have intra-partition requirement
2. `OrderedDistribution` does not require tuples that share the same value 
being colocated in the same partition.
3. `RangePartitioning` can provide a weaker guarantee for a prefix of its 
`ordering` expressions.

## How was this patch tested?

comment-only PR.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/cloud-fan/spark doc

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/23249.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #23249


commit 24ea28abd5a385351703335df33b26838d203fe3
Author: Wenchen Fan 
Date:   2018-12-06T15:47:23Z

improve the doc of Distribution/Partitioning




---


[GitHub] spark issue #23249: [SPARK-26297][SQL] improve the doc of Distribution/Parti...

2018-12-06 Thread cloud-fan

Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/23249
  
cc @maryannxue @hvanhovell @gatorsmile @viirya 


---


[GitHub] spark issue #23241: [SPARK-26283][CORE] Enable reading from open frames of z...

2018-12-06 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23241
  
Merged build finished. Test PASSed.


---


[GitHub] spark issue #23241: [SPARK-26283][CORE] Enable reading from open frames of z...

2018-12-06 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23241
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5811/
Test PASSed.


---


[GitHub] spark pull request #23249: [SPARK-26297][SQL] improve the doc of Distributio...

2018-12-06 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/23249#discussion_r239508437
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/physical/partitioning.scala
 ---
@@ -118,10 +116,13 @@ case class HashClusteredDistribution(
 
 /**
  * Represents data where tuples have been ordered according to the 
`ordering`
- * [[Expression Expressions]].  This is a strictly stronger guarantee than
- * [[ClusteredDistribution]] as an ordering will ensure that tuples that 
share the
- * same value for the ordering expressions are contiguous and will never 
be split across
- * partitions.
+ * [[Expression Expressions]].
+ *
+ * Tuples that share the same values for the ordering expressions must be 
contiguous within a
+ * partition. They can also across partitions, but these partitions must 
be contiguous. For example,
+ * if value `v` is the biggest values in partition 3, it can also be in 
partition 4 as the smallest
+ * value. If all the values in partition 4 are `v`, it can also be in 
partition 5 as the smallest
+ * value.
  */
 case class OrderedDistribution(ordering: Seq[SortOrder]) extends 
Distribution {
--- End diff --

This is only used by sort, and sort doesn't require rows of same value to 
be colocated in the same partition.

Actually we already use this knowledge to optimize 
`RangePartitioning.satisfy`
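
The difference between "ordered" and "colocated" can be sketched in Java. The class and method names below are hypothetical, and partitions are modelled as plain `int` arrays rather than Spark rows. The example data follows the doc comment in the diff: a value may be the largest in one partition and the smallest in the next, which keeps a global sort valid without colocating equal values.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class OrderedPartitionsSketch {
    // True if every partition is sorted and no value in a partition exceeds
    // any value in a later partition. Equal values may span adjacent partitions.
    static boolean isGloballyOrdered(List<int[]> partitions) {
        boolean seenAny = false;
        int previousMax = 0;
        for (int[] partition : partitions) {
            for (int i = 1; i < partition.length; i++) {
                if (partition[i - 1] > partition[i]) {
                    return false;
                }
            }
            if (partition.length == 0) {
                continue;
            }
            if (seenAny && previousMax > partition[0]) {
                return false;
            }
            previousMax = partition[partition.length - 1];
            seenAny = true;
        }
        return true;
    }

    // True if no value appears in more than one partition (the stronger,
    // clustering-style property that a sort does not need).
    static boolean isColocated(List<int[]> partitions) {
        Map<Integer, Integer> owner = new HashMap<>();
        for (int p = 0; p < partitions.size(); p++) {
            for (int value : partitions.get(p)) {
                Integer previous = owner.putIfAbsent(value, p);
                if (previous != null && previous != p) {
                    return false;
                }
            }
        }
        return true;
    }
}
```

For partitions `{1, 2, 5}`, `{5, 5}`, `{5, 9}` the first check holds while the second does not, which is the case a sort can tolerate.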


---


[GitHub] spark issue #23249: [SPARK-26297][SQL] improve the doc of Distribution/Parti...

2018-12-06 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23249
  
Merged build finished. Test PASSed.


---


[GitHub] spark issue #23249: [SPARK-26297][SQL] improve the doc of Distribution/Parti...

2018-12-06 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23249
  
**[Test build #99775 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99775/testReport)**
 for PR 23249 at commit 
[`24ea28a`](https://github.com/apache/spark/commit/24ea28abd5a385351703335df33b26838d203fe3).


---


[GitHub] spark pull request #23249: [SPARK-26297][SQL] improve the doc of Distributio...

2018-12-06 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/23249#discussion_r239508488
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/physical/partitioning.scala
 ---
@@ -118,10 +116,13 @@ case class HashClusteredDistribution(
 
 /**
  * Represents data where tuples have been ordered according to the 
`ordering`
- * [[Expression Expressions]].  This is a strictly stronger guarantee than
- * [[ClusteredDistribution]] as an ordering will ensure that tuples that 
share the
- * same value for the ordering expressions are contiguous and will never 
be split across
- * partitions.
+ * [[Expression Expressions]].
+ *
+ * Tuples that share the same values for the ordering expressions must be 
contiguous within a
+ * partition. They can also across partitions, but these partitions must 
be contiguous. For example,
+ * if value `v` is the biggest values in partition 3, it can also be in 
partition 4 as the smallest
+ * value. If all the values in partition 4 are `v`, it can also be in 
partition 5 as the smallest
+ * value.
  */
 case class OrderedDistribution(ordering: Seq[SortOrder]) extends 
Distribution {
--- End diff --

This is only used by sort, and sort doesn't require rows of same value to 
be colocated in the same partition.

Actually we already use this knowledge to optimize 
`RangePartitioning.satisfy`


---


[GitHub] spark issue #23249: [SPARK-26297][SQL] improve the doc of Distribution/Parti...

2018-12-06 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23249
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5810/
Test PASSed.


---


[GitHub] spark issue #23215: [SPARK-26263][SQL] Validate partition values with user p...

2018-12-06 Thread gengliangwang

Github user gengliangwang commented on the issue:

https://github.com/apache/spark/pull/23215
  
retest this please.


---


[GitHub] spark issue #23215: [SPARK-26263][SQL] Validate partition values with user p...

2018-12-06 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23215
  
**[Test build #99766 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99766/testReport)**
 for PR 23215 at commit 
[`25f7039`](https://github.com/apache/spark/commit/25f7039c6b836d40370b615d3d0259c9640dde4c).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---


[GitHub] spark issue #23215: [SPARK-26263][SQL] Validate partition values with user p...

2018-12-06 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23215
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5812/
Test PASSed.


---


[GitHub] spark issue #23215: [SPARK-26263][SQL] Validate partition values with user p...

2018-12-06 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23215
  
Merged build finished. Test PASSed.


---


[GitHub] spark pull request #23241: [SPARK-26283][CORE] Enable reading from open fram...

2018-12-06 Thread shahidki31

Github user shahidki31 commented on a diff in the pull request:

https://github.com/apache/spark/pull/23241#discussion_r239516496
  
--- Diff: core/src/main/scala/org/apache/spark/io/CompressionCodec.scala ---
@@ -197,4 +201,8 @@ class ZStdCompressionCodec(conf: SparkConf) extends 
CompressionCodec {
 // avoid overhead excessive of JNI call while trying to uncompress 
small amount of data.
 new BufferedInputStream(new ZstdInputStream(s), bufferSize)
   }
+
+  override def zstdEventLogCompressedInputStream(s: InputStream): 
InputStream = {
+new BufferedInputStream(new ZstdInputStream(s).setContinuous(true), 
bufferSize)
--- End diff --

Thanks @srowen . 

> Is it actually desirable to not fail on a partial frame? I'm not sure. We 
shouldn't encounter it elsewhere.
Yes, ideally it shouldn't fail. Even for `EventLoggingListener`, once the 
application has finished the frame will be closed (which is why this applies 
only to running applications). After analyzing the zstd code again, the impact 
seems smaller: the choice is either to throw an exception or to read the open 
frame, and the latter seems better.
I can update the code.


---


[GitHub] spark issue #23241: [SPARK-26283][CORE] Enable reading from open frames of z...

2018-12-06 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23241
  
**[Test build #99777 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99777/testReport)**
 for PR 23241 at commit 
[`7d6ad51`](https://github.com/apache/spark/commit/7d6ad5187542023943a5790096ff8d8927a06366).


---


[GitHub] spark issue #23241: [SPARK-26283][CORE] Enable reading from open frames of z...

2018-12-06 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23241
  
Merged build finished. Test PASSed.


---


[GitHub] spark issue #23241: [SPARK-26283][CORE] Enable reading from open frames of z...

2018-12-06 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23241
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5815/
Test PASSed.


---


[GitHub] spark issue #23202: [SPARK-26248][SQL] Infer date type from CSV

2018-12-06 Thread srowen

Github user srowen commented on the issue:

https://github.com/apache/spark/pull/23202
  
I'd defer to @HyukjinKwon ; looks OK in broad strokes but he would know 
much more about the CSV parsing.


---


[GitHub] spark issue #23201: [SPARK-26246][SQL] Infer date and timestamp types from J...

2018-12-06 Thread MaxGekk

Github user MaxGekk commented on the issue:

https://github.com/apache/spark/pull/23201
  
@cloud-fan May I ask you to look at this PR, please.


---


[GitHub] spark issue #23215: [SPARK-26263][SQL] Validate partition values with user p...

2018-12-06 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23215
  
Merged build finished. Test FAILed.


---


[GitHub] spark issue #23215: [SPARK-26263][SQL] Validate partition values with user p...

2018-12-06 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23215
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99770/
Test FAILed.


---


[GitHub] spark issue #23215: [SPARK-26263][SQL] Validate partition values with user p...

2018-12-06 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23215
  
**[Test build #99776 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99776/testReport)**
 for PR 23215 at commit 
[`6f4e652`](https://github.com/apache/spark/commit/6f4e652add4157bfcdad4d7a924c74363f2b5cf2).


---


[GitHub] spark issue #23215: [SPARK-26263][SQL] Validate partition values with user p...

2018-12-06 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23215
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99766/
Test PASSed.


---


[GitHub] spark issue #23241: [SPARK-26283][CORE] Enable reading from open frames of z...

2018-12-06 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23241
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5814/
Test PASSed.


---


[GitHub] spark issue #23241: [SPARK-26283][CORE] Enable reading from open frames of z...

2018-12-06 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23241
  
Merged build finished. Test PASSed.


---


[GitHub] spark pull request #23241: [SPARK-26283][CORE] Enable reading from open fram...

2018-12-06 Thread shahidki31

Github user shahidki31 commented on a diff in the pull request:

https://github.com/apache/spark/pull/23241#discussion_r239521593
  
--- Diff: core/src/main/scala/org/apache/spark/io/CompressionCodec.scala ---
@@ -197,4 +201,8 @@ class ZStdCompressionCodec(conf: SparkConf) extends 
CompressionCodec {
 // avoid overhead excessive of JNI call while trying to uncompress 
small amount of data.
 new BufferedInputStream(new ZstdInputStream(s), bufferSize)
   }
+
+  override def zstdEventLogCompressedInputStream(s: InputStream): 
InputStream = {
+new BufferedInputStream(new ZstdInputStream(s).setContinuous(true), 
bufferSize)
--- End diff --

I have updated the code.


---


[GitHub] spark issue #23202: [SPARK-26248][SQL] Infer date type from CSV

2018-12-06 Thread MaxGekk

Github user MaxGekk commented on the issue:

https://github.com/apache/spark/pull/23202
  
@HyukjinKwon @srowen Is there anything which worries you in the PR?


---


[GitHub] spark issue #23249: [SPARK-26297][SQL] improve the doc of Distribution/Parti...

2018-12-06 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23249
  
Merged build finished. Test PASSed.


---


[GitHub] spark issue #23249: [SPARK-26297][SQL] improve the doc of Distribution/Parti...

2018-12-06 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23249
  
**[Test build #99779 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99779/testReport)**
 for PR 23249 at commit 
[`3df1e44`](https://github.com/apache/spark/commit/3df1e446a8f9c9d04912856e617617c1ef7c8373).


---


[GitHub] spark issue #23249: [SPARK-26297][SQL] improve the doc of Distribution/Parti...

2018-12-06 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23249
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5816/
Test PASSed.


---


[GitHub] spark issue #23228: [MINOR][DOC]The condition description of serialized shuf...

2018-12-06 Thread 10110346

Github user 10110346 commented on the issue:

https://github.com/apache/spark/pull/23228
  
cc @JoshRosen  @cloud-fan 


---


[GitHub] spark pull request #23201: [SPARK-26246][SQL] Infer date and timestamp types...

2018-12-06 Thread MaxGekk

Github user MaxGekk commented on a diff in the pull request:

https://github.com/apache/spark/pull/23201#discussion_r239547742
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JsonInferSchema.scala
 ---
@@ -121,7 +122,26 @@ private[sql] class JsonInferSchema(options: 
JSONOptions) extends Serializable {
 DecimalType(bigDecimal.precision, bigDecimal.scale)
 }
 decimalTry.getOrElse(StringType)
-  case VALUE_STRING => StringType
+  case VALUE_STRING =>
+val stringValue = parser.getText
--- End diff --

`DateType` is not inferred at all, but there is other type inference code 
that could be shared between JSON and CSV (maybe somewhere else).
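
As a rough illustration of what a shared helper for string-based type inference might look like, here is a Java sketch. Everything in it is an assumption for illustration: the class name, the enum, and the choice of ISO formats are hypothetical, and this is not the inference logic of either PR.

```java
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;

final class StringTypeInference {
    enum InferredType { DATE, TIMESTAMP, STRING }

    // Tries the narrowest type first and falls back to STRING.
    // The ISO formats are an assumption; a real reader would take
    // user-configured patterns instead.
    static InferredType infer(String value) {
        try {
            LocalDate.parse(value, DateTimeFormatter.ISO_LOCAL_DATE);
            return InferredType.DATE;
        } catch (DateTimeParseException e) {
            // not a plain date, try the next candidate
        }
        try {
            LocalDateTime.parse(value, DateTimeFormatter.ISO_LOCAL_DATE_TIME);
            return InferredType.TIMESTAMP;
        } catch (DateTimeParseException e) {
            // not a timestamp either
        }
        return InferredType.STRING;
    }

    public static void main(String[] args) {
        System.out.println(infer("2018-12-06"));
        System.out.println(infer("2018-12-06T15:47:23"));
        System.out.println(infer("good job"));
    }
}
```

A helper of this shape takes only a string, so a JSON or a CSV schema inferrer could call it without sharing any parser state.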


---


[GitHub] spark issue #23207: [SPARK-26193][SQL] Implement shuffle write metrics in SQ...

2018-12-06 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23207
  
**[Test build #99782 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99782/testReport)**
 for PR 23207 at commit 
[`d5ee249`](https://github.com/apache/spark/commit/d5ee2493478d11ba688172d4b27a15b18beaf559).


---


[GitHub] spark pull request #23207: [SPARK-26193][SQL] Implement shuffle write metric...

2018-12-06 Thread xuanyuanking

Github user xuanyuanking commented on a diff in the pull request:

https://github.com/apache/spark/pull/23207#discussion_r239548704
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/metric/SQLShuffleMetricsReporter.scala
 ---
@@ -95,3 +96,59 @@ private[spark] object SQLShuffleMetricsReporter {
 FETCH_WAIT_TIME -> SQLMetrics.createTimingMetric(sc, "fetch wait 
time"),
 RECORDS_READ -> SQLMetrics.createMetric(sc, "records read"))
 }
+
+/**
+ * A shuffle write metrics reporter for SQL exchange operators. Different 
with
+ * [[SQLShuffleReadMetricsReporter]], we need a function of (reporter => 
reporter) set in
+ * shuffle dependency, so the local SQLMetric should transient and create 
on executor.
+ * @param metrics Shuffle write metrics in current SparkPlan.
+ * @param metricsReporter Other reporter need to be updated in this 
SQLShuffleWriteMetricsReporter.
+ */
+private[spark] case class SQLShuffleWriteMetricsReporter(
+metrics: Map[String, SQLMetric])(metricsReporter: 
ShuffleWriteMetricsReporter)
--- End diff --

Reimplemented in a780b70.


---


[GitHub] spark issue #23207: [SPARK-26193][SQL] Implement shuffle write metrics in SQ...

2018-12-06 Thread xuanyuanking

Github user xuanyuanking commented on the issue:

https://github.com/apache/spark/pull/23207
  
```
Can we put the above in a closure and pass it into shuffle dependency? Then 
in SQL we just put the above in SQL using custom metrics.
```
Yes, commit a780b70 achieves this by adding the `ShuffleWriteProcessor` 
abstraction.
The read metrics rename was reverted in 7d104eb; I will do it, along with the 
display change, in another PR.
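
The idea of passing a (reporter => reporter) function into the shuffle dependency can be sketched in Java. All names here (`WriteMetricsReporter`, `TaskLevelReporter`, `SqlWriteReporter`, `ShuffleDependencySketch`) are hypothetical stand-ins, not the classes from the PR. The wrapper updates its own SQL-side counter and then forwards to the reporter it wraps.

```java
import java.util.function.Function;

// Hypothetical, reduced reporter interface with a single metric.
interface WriteMetricsReporter {
    void incRecordsWritten(long n);
}

// Stand-in for the task-level reporter that already exists on the executor.
final class TaskLevelReporter implements WriteMetricsReporter {
    long recordsWritten = 0;

    @Override
    public void incRecordsWritten(long n) {
        recordsWritten += n;
    }
}

// SQL-side reporter: updates its own metric, then forwards to the delegate.
final class SqlWriteReporter implements WriteMetricsReporter {
    long sqlRecordsWritten = 0;
    private final WriteMetricsReporter delegate;

    SqlWriteReporter(WriteMetricsReporter delegate) {
        this.delegate = delegate;
    }

    @Override
    public void incRecordsWritten(long n) {
        sqlRecordsWritten += n;
        delegate.incRecordsWritten(n);
    }
}

// The dependency carries only the wrapping function; the wrapped reporter
// is created where the write happens.
final class ShuffleDependencySketch {
    private final Function<WriteMetricsReporter, WriteMetricsReporter> wrapReporter;

    ShuffleDependencySketch(Function<WriteMetricsReporter, WriteMetricsReporter> wrapReporter) {
        this.wrapReporter = wrapReporter;
    }

    // Simulates a shuffle write that reports through the wrapped reporter.
    WriteMetricsReporter write(WriteMetricsReporter taskReporter, long records) {
        WriteMetricsReporter reporter = wrapReporter.apply(taskReporter);
        reporter.incRecordsWritten(records);
        return reporter;
    }
}
```

Usage would look like `new ShuffleDependencySketch(SqlWriteReporter::new).write(new TaskLevelReporter(), 3)`, after which both counters have advanced by 3.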


---


[GitHub] spark issue #23207: [SPARK-26193][SQL] Implement shuffle write metrics in SQ...

2018-12-06 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23207
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5819/
Test PASSed.


---


[GitHub] spark issue #23207: [SPARK-26193][SQL] Implement shuffle write metrics in SQ...

2018-12-06 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23207
  
Merged build finished. Test PASSed.


---


[GitHub] spark issue #23159: [SPARK-26191][SQL] Control truncation of Spark plans via...

2018-12-06 Thread dongjoon-hyun

Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/23159
  
cc @cloud-fan and @gatorsmile .


---


[GitHub] spark pull request #22275: [SPARK-25274][PYTHON][SQL] In toPandas with Arrow...

2018-12-06 Thread asfgit

Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/22275


---


[GitHub] spark pull request #23208: [SPARK-25530][SQL] data source v2 API refactor (b...

2018-12-06 Thread rdblue

Github user rdblue commented on a diff in the pull request:

https://github.com/apache/spark/pull/23208#discussion_r239559037
  
--- Diff: 
sql/core/src/main/java/org/apache/spark/sql/sources/v2/TableProvider.java ---
@@ -25,7 +25,10 @@
  * The base interface for v2 data sources which don't have a real catalog. 
Implementations must
  * have a public, 0-arg constructor.
  * 
- * The major responsibility of this interface is to return a {@link Table} 
for read/write.
+ * The major responsibility of this interface is to return a {@link Table} 
for read/write. If you
+ * want to allow end-users to write data to non-existing tables via write 
APIs in `DataFrameWriter`
+ * with `SaveMode`, you must return a {@link Table} instance even if the 
table doesn't exist. The
+ * table schema can be empty in this case.
--- End diff --

What does it mean to write to a non-existing table? If you're writing 
somewhere, the table must exist.

This is for creating a table directly from configuration and an 
implementation class in the DataFrameWriter API. The target of the write still 
needs to exist.


---


[GitHub] spark issue #22275: [SPARK-25274][PYTHON][SQL] In toPandas with Arrow send u...

2018-12-06 Thread BryanCutler

Github user BryanCutler commented on the issue:

https://github.com/apache/spark/pull/22275
  
merged to master, thanks @holdenk @viirya and @felixcheung !


---

