[GitHub] spark issue #17673: [SPARK-20372] [ML] Word2Vec Continuous Bag of Words mode...
Github user holdenk commented on the issue: https://github.com/apache/spark/pull/17673 @shubhamchopra are you still working on this? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #13599: [SPARK-13587] [PYSPARK] Support virtualenv in pyspark
Github user holdenk commented on the issue: https://github.com/apache/spark/pull/13599 I'm interested in us fixing this, especially after spending several hours yesterday on workaround hacks. But I want us to do something that is not YARN-specific and does not involve a large slowdown on worker creation.
[GitHub] spark issue #21049: [SPARK-23957][SQL] Remove redundant sort operators from ...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/21049 @henryr Could you update the PR based on the review? We can safely drop them in scalar subqueries and nested subqueries.
[GitHub] spark issue #20908: [WIP][SPARK-23672][PYTHON] Document support for nested r...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20908 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92976/ Test FAILed.
[GitHub] spark issue #20908: [WIP][SPARK-23672][PYTHON] Document support for nested r...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20908 Merged build finished. Test FAILed.
[GitHub] spark issue #21748: [SPARK-23146][K8S] Support client mode.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21748 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92977/ Test PASSed.
[GitHub] spark issue #18139: [SPARK-20787][PYTHON] PySpark can't handle datetimes bef...
Github user holdenk commented on the issue: https://github.com/apache/spark/pull/18139 @rberenguel is this still on your radar? Also jenkins ok to test.
[GitHub] spark issue #21748: [SPARK-23146][K8S] Support client mode.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21748 Merged build finished. Test PASSed.
[GitHub] spark issue #20908: [WIP][SPARK-23672][PYTHON] Document support for nested r...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20908 **[Test build #92976 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92976/testReport)** for PR 20908 at commit [`f5aeafc`](https://github.com/apache/spark/commit/f5aeafc5ee474ea41cd00acbf8660957d15d5c64). * This patch **fails PySpark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #20629: [SPARK-23451][ML] Deprecate KMeans.computeCost
Github user holdenk commented on a diff in the pull request: https://github.com/apache/spark/pull/20629#discussion_r202423675 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/KMeansModel.scala --- @@ -37,7 +37,7 @@ import org.apache.spark.sql.{Row, SparkSession} */ @Since("0.8.0") class KMeansModel @Since("2.4.0") (@Since("1.0.0") val clusterCenters: Array[Vector], - @Since("2.4.0") val distanceMeasure: String) + @Since("2.4.0") val distanceMeasure: String, @Since("2.4.0") val trainingCost: Double) --- End diff -- Since we changed the constructor here, and since it is not private, we should provide a similar (and deprecated) constructor without training cost which calls this with the default value.
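The compatibility shim the review asks for can be sketched in plain Scala with a simplified stand-in class (not the real `KMeansModel`; the field types and the `0.0` default are assumptions for illustration): the old public constructor survives as a deprecated auxiliary constructor that forwards a default `trainingCost`.

```scala
// Hedged sketch: a stand-in for the pattern suggested in the review.
class Model(val clusterCenters: Array[Double],
            val distanceMeasure: String,
            val trainingCost: Double) {
  // Deprecated compatibility constructor: old two-argument call sites still
  // compile, and trainingCost falls back to an assumed default of 0.0.
  @deprecated("Use the constructor that also takes trainingCost", "2.4.0")
  def this(clusterCenters: Array[Double], distanceMeasure: String) =
    this(clusterCenters, distanceMeasure, 0.0)
}

// An old call site keeps working, only emitting a deprecation warning.
val legacy = new Model(Array(1.0, 2.0), "euclidean")
```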
[GitHub] spark issue #21748: [SPARK-23146][K8S] Support client mode.
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21748 **[Test build #92977 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92977/testReport)** for PR 21748 at commit [`88a9d7f`](https://github.com/apache/spark/commit/88a9d7fa94e17e55f8e28d8922cff759625b1e42). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #21583: [SPARK-23984][K8S][Test] Added Integration Tests for PyS...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21583 **[Test build #92980 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92980/testReport)** for PR 21583 at commit [`c0b5927`](https://github.com/apache/spark/commit/c0b5927ec80853403a129e15fded372a9170a0db).
[GitHub] spark issue #21748: [SPARK-23146][K8S] Support client mode.
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21748 **[Test build #92979 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92979/testReport)** for PR 21748 at commit [`88a9d7f`](https://github.com/apache/spark/commit/88a9d7fa94e17e55f8e28d8922cff759625b1e42).
[GitHub] spark pull request #21741: [SPARK-24718][SQL] Timestamp support pushdown to ...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/21741#discussion_r202423258 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala --- @@ -378,6 +378,15 @@ object SQLConf { .booleanConf .createWithDefault(true) + val PARQUET_FILTER_PUSHDOWN_TIMESTAMP_ENABLED = +buildConf("spark.sql.parquet.filterPushdown.timestamp") + .doc("If true, enables Parquet filter push-down optimization for Timestamp. " + +"This configuration only has an effect when 'spark.sql.parquet.filterPushdown' is " + +"enabled and Timestamp stored as TIMESTAMP_MICROS or TIMESTAMP_MILLIS type.") --- End diff -- You need to explain how to use `spark.sql.parquet.outputTimestampType` to control the Parquet timestamp type Spark uses to write parquet files.
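The interaction the reviewer wants documented can be sketched as a `spark-defaults.conf` fragment (the first two keys are from the diff; `spark.sql.parquet.outputTimestampType` is the config named in the comment, with `TIMESTAMP_MICROS` shown as one assumed example value):

```
# Timestamp pushdown only takes effect when general Parquet pushdown is on,
# the timestamp flag is on, and timestamps are written as micros/millis
# rather than INT96.
spark.sql.parquet.filterPushdown            true
spark.sql.parquet.filterPushdown.timestamp  true
spark.sql.parquet.outputTimestampType       TIMESTAMP_MICROS
```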
[GitHub] spark issue #21583: [SPARK-23984][K8S][Test] Added Integration Tests for PyS...
Github user shaneknapp commented on the issue: https://github.com/apache/spark/pull/21583 test this please
[GitHub] spark issue #21748: [SPARK-23146][K8S] Support client mode.
Github user shaneknapp commented on the issue: https://github.com/apache/spark/pull/21748 test this please
[GitHub] spark issue #21748: [SPARK-23146][K8S] Support client mode.
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21748 Kubernetes integration test status failure URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-make-spark-distribution-unified/931/
[GitHub] spark issue #21748: [SPARK-23146][K8S] Support client mode.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21748 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/931/ Test FAILed.
[GitHub] spark issue #21748: [SPARK-23146][K8S] Support client mode.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21748 Merged build finished. Test FAILed.
[GitHub] spark issue #21748: [SPARK-23146][K8S] Support client mode.
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21748 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-make-spark-distribution-unified/931/
[GitHub] spark issue #20611: [SPARK-23425][SQL]Support wildcard in HDFS path for load...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20611 **[Test build #92978 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92978/testReport)** for PR 20611 at commit [`9ceeb30`](https://github.com/apache/spark/commit/9ceeb30ae0f0b04ac46980c499c9c286ba68e20a).
[GitHub] spark issue #21748: [SPARK-23146][K8S] Support client mode.
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21748 **[Test build #92977 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92977/testReport)** for PR 21748 at commit [`88a9d7f`](https://github.com/apache/spark/commit/88a9d7fa94e17e55f8e28d8922cff759625b1e42).
[GitHub] spark pull request #21603: [SPARK-17091][SQL] Add rule to convert IN predica...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/21603#discussion_r202418834 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilters.scala --- @@ -222,6 +225,14 @@ private[parquet] class ParquetFilters(pushDownDate: Boolean, pushDownStartWith: // See SPARK-20364. def canMakeFilterOn(name: String): Boolean = nameToType.contains(name) && !name.contains(".") +// All DataTypes that support `makeEq` can provide better performance. +def shouldConvertInPredicate(name: String): Boolean = nameToType(name) match { --- End diff -- Also need to update the benchmark suite.
[GitHub] spark pull request #21603: [SPARK-17091][SQL] Add rule to convert IN predica...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/21603#discussion_r202418683 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilters.scala --- @@ -222,6 +225,14 @@ private[parquet] class ParquetFilters(pushDownDate: Boolean, pushDownStartWith: // See SPARK-20364. def canMakeFilterOn(name: String): Boolean = nameToType.contains(name) && !name.contains(".") +// All DataTypes that support `makeEq` can provide better performance. +def shouldConvertInPredicate(name: String): Boolean = nameToType(name) match { --- End diff -- It depends on which PR will be merged first. The corresponding PRs should update this.
[GitHub] spark pull request #21603: [SPARK-17091][SQL] Add rule to convert IN predica...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/21603#discussion_r202418582 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilters.scala --- @@ -222,6 +225,14 @@ private[parquet] class ParquetFilters(pushDownDate: Boolean, pushDownStartWith: // See SPARK-20364. def canMakeFilterOn(name: String): Boolean = nameToType.contains(name) && !name.contains(".") +// All DataTypes that support `makeEq` can provide better performance. +def shouldConvertInPredicate(name: String): Boolean = nameToType(name) match { --- End diff -- Let us keep it.
[GitHub] spark pull request #21603: [SPARK-17091][SQL] Add rule to convert IN predica...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/21603#discussion_r202418387 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala --- @@ -376,7 +374,8 @@ class ParquetFileFormat // Collects all converted Parquet filter predicates. Notice that not all predicates can be // converted (`ParquetFilters.createFilter` returns an `Option`). That's why a `flatMap` // is used here. - .flatMap(new ParquetFilters(pushDownDate, pushDownStringStartWith) + .flatMap(new ParquetFilters(pushDownDate, pushDownStringStartWith, --- End diff -- let us create `val parquetFilters = new ParquetFilters(pushDownDate, pushDownStringStartWith, pushDownInFilterThreshold)`
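The suggested refactor can be sketched with a stand-in class (names taken from the diff; the class body and toy filter logic are assumptions): hoist the constructor call into a `val` once, instead of constructing a new `ParquetFilters` inside the `flatMap`.

```scala
// Stand-in for the real ParquetFilters: the three pushdown knobs are
// threaded through the constructor once.
class ParquetFilters(pushDownDate: Boolean,
                     pushDownStartWith: Boolean,
                     pushDownInFilterThreshold: Int) {
  // Toy stand-in for createFilter, which returns an Option in the real code.
  def createFilter(name: String): Option[String] =
    if (name.nonEmpty) Some(s"eq($name)") else None
}

val pushDownDate = true
val pushDownStringStartWith = false
val pushDownInFilterThreshold = 10

// Hoisted once, as the review suggests:
val parquetFilters =
  new ParquetFilters(pushDownDate, pushDownStringStartWith, pushDownInFilterThreshold)

// flatMap drops the predicates that could not be converted (the None cases).
val converted = Seq("a", "", "b").flatMap(parquetFilters.createFilter)
```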
[GitHub] spark issue #21748: [SPARK-23146][K8S] Support client mode.
Github user shaneknapp commented on the issue: https://github.com/apache/spark/pull/21748 test this please
[GitHub] spark pull request #21757: [SPARK-24797] [SQL] respect spark.sql.hive.conver...
Github user CodingCat commented on a diff in the pull request: https://github.com/apache/spark/pull/21757#discussion_r202417166 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala --- @@ -254,13 +254,15 @@ class FindDataSourceTable(sparkSession: SparkSession) extends Rule[LogicalPlan] override def apply(plan: LogicalPlan): LogicalPlan = plan transform { case i @ InsertIntoTable(UnresolvedCatalogRelation(tableMeta), _, _, _, _) -if DDLUtils.isDatasourceTable(tableMeta) => +if DDLUtils.isDatasourceTable(tableMeta) && + DDLUtils.convertSchema(tableMeta, sparkSession) => --- End diff -- ok
[GitHub] spark pull request #21757: [SPARK-24797] [SQL] respect spark.sql.hive.conver...
Github user CodingCat closed the pull request at: https://github.com/apache/spark/pull/21757
[GitHub] spark issue #20908: [WIP][SPARK-23672][PYTHON] Document support for nested r...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20908 **[Test build #92976 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92976/testReport)** for PR 20908 at commit [`f5aeafc`](https://github.com/apache/spark/commit/f5aeafc5ee474ea41cd00acbf8660957d15d5c64).
[GitHub] spark issue #20908: [WIP][SPARK-23672][PYTHON] Document support for nested r...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20908 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/930/ Test PASSed.
[GitHub] spark issue #20908: [WIP][SPARK-23672][PYTHON] Document support for nested r...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20908 Merged build finished. Test PASSed.
[GitHub] spark pull request #21757: [SPARK-24797] [SQL] respect spark.sql.hive.conver...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/21757#discussion_r202416374 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala --- @@ -254,13 +254,15 @@ class FindDataSourceTable(sparkSession: SparkSession) extends Rule[LogicalPlan] override def apply(plan: LogicalPlan): LogicalPlan = plan transform { case i @ InsertIntoTable(UnresolvedCatalogRelation(tableMeta), _, _, _, _) -if DDLUtils.isDatasourceTable(tableMeta) => +if DDLUtils.isDatasourceTable(tableMeta) && + DDLUtils.convertSchema(tableMeta, sparkSession) => --- End diff -- If you are using `format("parquet")` to create a new table, it will be a data source table. We always use the native reader/writer to read/write such a table.
[GitHub] spark issue #21762: [SPARK-24800][SQL] Refactor Avro Serializer and Deserial...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21762 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92975/ Test PASSed.
[GitHub] spark issue #21762: [SPARK-24800][SQL] Refactor Avro Serializer and Deserial...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21762 Merged build finished. Test PASSed.
[GitHub] spark issue #21762: [SPARK-24800][SQL] Refactor Avro Serializer and Deserial...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21762 **[Test build #92975 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92975/testReport)** for PR 21762 at commit [`bb7a43c`](https://github.com/apache/spark/commit/bb7a43c8f3e34c90ebe8f0e22019c096776b6da3). * This patch passes all tests. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `class AvroDeserializer(rootAvroType: Schema, rootCatalystType: DataType) ` * ` sealed trait CatalystDataUpdater ` * ` final class RowUpdater(row: InternalRow) extends CatalystDataUpdater ` * ` final class ArrayDataUpdater(array: ArrayData) extends CatalystDataUpdater ` * `class AvroSerializer(rootCatalystType: DataType, rootAvroType: Schema, nullable: Boolean) ` * `class IncompatibleSchemaException(msg: String, ex: Throwable = null) extends Exception(msg, ex)` * `class SerializableSchema(@transient var value: Schema)`
[GitHub] spark issue #20908: [WIP][SPARK-23672][PYTHON] Document support for nested r...
Github user holdenk commented on the issue: https://github.com/apache/spark/pull/20908 Jenkins retest this please
[GitHub] spark issue #21763: Branch 2.1
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21763 Can one of the admins verify this patch?
[GitHub] spark pull request #21757: [SPARK-24797] [SQL] respect spark.sql.hive.conver...
Github user CodingCat commented on a diff in the pull request: https://github.com/apache/spark/pull/21757#discussion_r202414440 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala --- @@ -254,13 +254,15 @@ class FindDataSourceTable(sparkSession: SparkSession) extends Rule[LogicalPlan] override def apply(plan: LogicalPlan): LogicalPlan = plan transform { case i @ InsertIntoTable(UnresolvedCatalogRelation(tableMeta), _, _, _, _) -if DDLUtils.isDatasourceTable(tableMeta) => +if DDLUtils.isDatasourceTable(tableMeta) && + DDLUtils.convertSchema(tableMeta, sparkSession) => --- End diff -- Do you mean that any table built through df.write.format("..") should be taken as a data source table, regardless of whether we register it with HMS or not?
[GitHub] spark issue #21760: [SPARK-24776][SQL]Avro unit test: use SQLTestUtils and r...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/21760 This breaks the build. https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Compile/job/spark-master-compile-maven-hadoop-2.6/7842/ I need to revert it.
[GitHub] spark pull request #21763: Branch 2.1
GitHub user rajesh7738 opened a pull request: https://github.com/apache/spark/pull/21763 Branch 2.1

## What changes were proposed in this pull request?
(Please fill in changes proposed in this fix)

## How was this patch tested?
(Please explain how this patch was tested. E.g. unit tests, integration tests, manual tests)
(If this patch involves UI changes, please attach a screenshot; otherwise, remove this)

Please review http://spark.apache.org/contributing.html before opening a pull request.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/apache/spark branch-2.1

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/21763.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #21763

commit 664c9795c94d3536ff9fe54af06e0fb6c0012862
Author: Shixiong Zhu
Date: 2017-03-04T03:00:35Z

[SPARK-19816][SQL][TESTS] Fix an issue that DataFrameCallbackSuite doesn't recover the log level

## What changes were proposed in this pull request?
"DataFrameCallbackSuite.execute callback functions when a DataFrame action failed" sets the log level to "fatal" but doesn't recover it. Hence, tests running after it won't output any logs except fatal logs. This PR uses `testQuietly` instead to avoid changing the log level.

## How was this patch tested?
Jenkins

Author: Shixiong Zhu
Closes #17156 from zsxwing/SPARK-19816.
(cherry picked from commit fbc4058037cf5b0be9f14a7dd28105f7f8151bed)
Signed-off-by: Yin Huai

commit ca7a7e8a893a30d85e4315a4fa1ca1b1c56a703c
Author: uncleGen
Date: 2017-03-06T02:17:30Z

[SPARK-19822][TEST] CheckpointSuite.testCheckpointedOperation: should not filter checkpointFilesOfLatestTime with the PATH string.

## What changes were proposed in this pull request?
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73800/testReport/

```
sbt.ForkMain$ForkError: org.scalatest.exceptions.TestFailedDueToTimeoutException: The code passed to eventually never returned normally. Attempted 617 times over 10.003740484 seconds. Last failure message: 8 did not equal 2.
  at org.scalatest.concurrent.Eventually$class.tryTryAgain$1(Eventually.scala:420)
  at org.scalatest.concurrent.Eventually$class.eventually(Eventually.scala:438)
  at org.scalatest.concurrent.Eventually$.eventually(Eventually.scala:478)
  at org.scalatest.concurrent.Eventually$class.eventually(Eventually.scala:336)
  at org.scalatest.concurrent.Eventually$.eventually(Eventually.scala:478)
  at org.apache.spark.streaming.DStreamCheckpointTester$class.generateOutput(CheckpointSuite.scala:172)
  at org.apache.spark.streaming.CheckpointSuite.generateOutput(CheckpointSuite.scala:211)
```

The check condition is:

```
val checkpointFilesOfLatestTime = Checkpoint.getCheckpointFiles(checkpointDir).filter {
  _.toString.contains(clock.getTimeMillis.toString)
}
// Checkpoint files are written twice for every batch interval. So assert that both
// are written to make sure that both of them have been written.
assert(checkpointFilesOfLatestTime.size === 2)
```

The path string may contain the `clock.getTimeMillis.toString`, like `3500`:

```
file:/root/dev/spark/assembly/CheckpointSuite/spark-20035007-9891-4fb6-91c1-cc15b7ccaf15/checkpoint-500
file:/root/dev/spark/assembly/CheckpointSuite/spark-20035007-9891-4fb6-91c1-cc15b7ccaf15/checkpoint-1000
file:/root/dev/spark/assembly/CheckpointSuite/spark-20035007-9891-4fb6-91c1-cc15b7ccaf15/checkpoint-1500
file:/root/dev/spark/assembly/CheckpointSuite/spark-20035007-9891-4fb6-91c1-cc15b7ccaf15/checkpoint-2000
file:/root/dev/spark/assembly/CheckpointSuite/spark-20035007-9891-4fb6-91c1-cc15b7ccaf15/checkpoint-2500
file:/root/dev/spark/assembly/CheckpointSuite/spark-20035007-9891-4fb6-91c1-cc15b7ccaf15/checkpoint-3000
file:/root/dev/spark/assembly/CheckpointSuite/spark-20035007-9891-4fb6-91c1-cc15b7ccaf15/checkpoint-3500.bk
file:/root/dev/spark/assembly/CheckpointSuite/spark-20035007-9891-4fb6-91c1-cc15b7ccaf15/checkpoint-3500
▲▲▲▲
```

So we should only check the filename, but not the whole path.

## How was this patch tested?
Jenkins.

Author: uncleGen
Closes #17167 from uncleGen/flaky-CheckpointSuite.
(cherry picked from commit 207067ead6db6dc87b0d144a658e2564e3280a89)
Signed-off-by: Shixiong Zhu

commit fd6c6d5c363008a229759bf628edc0f6c5e00ade
Author: Tyson Condie
Date: 2017-03-07T00:39:05Z

[SPARK-19719][SS] Kafka writer
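The SPARK-19822 fix described above can be sketched in plain Scala (paths shortened, values assumed for illustration): match the checkpoint time against the file name only, never the full path, because the temp directory name can accidentally contain the timestamp.

```scala
// The directory name "spark-20035007-..." contains the substring "3500", so a
// whole-path `contains` check matches every file, not just the two files
// (checkpoint + its .bk backup) written for time 3500.
val files = Seq(
  "file:/tmp/CheckpointSuite/spark-20035007-9891/checkpoint-500",
  "file:/tmp/CheckpointSuite/spark-20035007-9891/checkpoint-1000",
  "file:/tmp/CheckpointSuite/spark-20035007-9891/checkpoint-3500.bk",
  "file:/tmp/CheckpointSuite/spark-20035007-9891/checkpoint-3500")
val time = 3500L

// Buggy filter: every path matches, because the directory name contains "3500".
val buggy = files.filter(_.contains(time.toString))

// Fixed filter: compare against the file name only.
val fixed = files.filter(f => f.split("/").last.contains(time.toString))
```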
[GitHub] spark issue #21762: [SPARK-24800][SQL] Refactor Avro Serializer and Deserial...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21762 Merged build finished. Test PASSed.
[GitHub] spark issue #21762: [SPARK-24800][SQL] Refactor Avro Serializer and Deserial...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21762 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/929/ Test PASSed.
[GitHub] spark issue #21762: [SPARK-24800][SQL] Refactor Avro Serializer and Deserial...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21762 **[Test build #92975 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92975/testReport)** for PR 21762 at commit [`bb7a43c`](https://github.com/apache/spark/commit/bb7a43c8f3e34c90ebe8f0e22019c096776b6da3).
[GitHub] spark pull request #21762: [SPARK-24800][SQL] Refactor Avro Serializer and D...
GitHub user gengliangwang opened a pull request: https://github.com/apache/spark/pull/21762 [SPARK-24800][SQL] Refactor Avro Serializer and Deserializer ## What changes were proposed in this pull request? Currently the Avro Deserializer converts input Avro format data to `Row`, and then converts the `Row` to `InternalRow`. Similarly, the Avro Serializer converts `InternalRow` to `Row`, and then outputs Avro format data. This PR allows direct conversion between `InternalRow` and Avro format data. Credits to @cloud-fan . ## How was this patch tested? Unit test You can merge this pull request into a Git repository by running: $ git pull https://github.com/gengliangwang/spark avro_io Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/21762.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #21762 commit bb7a43c8f3e34c90ebe8f0e22019c096776b6da3 Author: Gengliang Wang Date: 2018-07-13T08:18:12Z refactor avro Serializer and Deserializer
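The shape of the refactor can be sketched with hypothetical stand-in types (the real classes live in Spark and Avro; everything below is an illustration, not the PR's code): the old path pays for an intermediate `Row` on every record, while the new path converts directly.

```scala
// Hypothetical stand-ins for the three representations involved.
case class AvroRecord(fields: Seq[(String, Any)])
case class Row(values: Seq[Any])
case class InternalRow(values: Seq[Any])

// Old path: Avro -> Row -> InternalRow, one extra allocation per record.
def avroToRow(r: AvroRecord): Row = Row(r.fields.map(_._2))
def rowToInternal(r: Row): InternalRow = InternalRow(r.values)
val oldDeserialize: AvroRecord => InternalRow =
  (avroToRow _).andThen(rowToInternal _)

// New path, conceptually what the PR does: Avro -> InternalRow directly,
// skipping the intermediate Row.
val newDeserialize: AvroRecord => InternalRow =
  r => InternalRow(r.fields.map(_._2))

val rec = AvroRecord(Seq("id" -> 1, "name" -> "a"))
```

Both paths must agree on the result; the win is the removed per-record hop, not a behavior change.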
[GitHub] spark issue #21720: [SPARK-24163][SPARK-24164][SQL] Support column list as t...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/21720 ping @maryannxue Could you resolve the conflicts? I will review it again after that.
[GitHub] spark issue #20208: [SPARK-23007][SQL][TEST] Add read schema suite for file-...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/20208 Thank you so much, @gatorsmile . Sure. I'll make a PR to improve error handling for that.
[GitHub] spark issue #20208: [SPARK-23007][SQL][TEST] Add read schema suite for file-...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/20208 Also, thank you, @HyukjinKwon .
[GitHub] spark pull request #21760: [SPARK-24776][SQL]Avro unit test: use SQLTestUtil...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/21760
[GitHub] spark pull request #21745: [SPARK-24781][SQL] Using a reference from Dataset...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/21745
[GitHub] spark issue #21761: [SPARK-24771][BUILD]Upgrade Apache AVRO to 1.8.2
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21761 **[Test build #92974 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92974/testReport)** for PR 21761 at commit [`531be9a`](https://github.com/apache/spark/commit/531be9a84ff5f2c99d3c8b7b223d8dd2cbf596cf).
[GitHub] spark issue #21761: [SPARK-24771][BUILD]Upgrade Apache AVRO to 1.8.2
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21761 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/928/ Test PASSed.
[GitHub] spark issue #21761: [SPARK-24771][BUILD]Upgrade Apache AVRO to 1.8.2
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21761 Merged build finished. Test PASSed.
[GitHub] spark issue #21745: [SPARK-24781][SQL] Using a reference from Dataset in Fil...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/21745 Thanks! Merged to master/2.3
[GitHub] spark issue #21761: [SPARK-24771][BUILD]Upgrade Apache AVRO to 1.8.2
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21761 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92973/ Test FAILed.
[GitHub] spark issue #21761: [SPARK-24771][BUILD]Upgrade Apache AVRO to 1.8.2
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21761 Merged build finished. Test FAILed.
[GitHub] spark issue #21761: [SPARK-24771][BUILD]Upgrade Apache AVRO to 1.8.2
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21761 **[Test build #92973 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92973/testReport)** for PR 21761 at commit [`cd9d0e6`](https://github.com/apache/spark/commit/cd9d0e6b76241f4eaf609ed1b5721c96f4d149b0). * This patch **fails build dependency tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #21761: [SPARK-24771][BUILD]Upgrade Apache AVRO to 1.8.2
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21761 **[Test build #92973 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92973/testReport)** for PR 21761 at commit [`cd9d0e6`](https://github.com/apache/spark/commit/cd9d0e6b76241f4eaf609ed1b5721c96f4d149b0).
[GitHub] spark issue #21761: [SPARK-24771][BUILD]Upgrade Apache AVRO to 1.8.2
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21761 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/927/ Test PASSed.
[GitHub] spark issue #21761: [SPARK-24771][BUILD]Upgrade Apache AVRO to 1.8.2
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21761 Merged build finished. Test PASSed.
[GitHub] spark pull request #21761: [SPARK-24771][BUILD]Upgrade Apache AVRO to 1.8.2
GitHub user gengliangwang opened a pull request: https://github.com/apache/spark/pull/21761 [SPARK-24771][BUILD]Upgrade Apache AVRO to 1.8.2 ## What changes were proposed in this pull request? Upgrade Apache Avro from 1.7.7 to 1.8.2. The major new features: 1. More logical types. From the spec of 1.8.2 https://avro.apache.org/docs/1.8.2/spec.html#Logical+Types we can see that, compared to [1.7.7](https://avro.apache.org/docs/1.7.7/spec.html#Logical+Types), the new version supports: - Date - Time (millisecond precision) - Time (microsecond precision) - Timestamp (millisecond precision) - Timestamp (microsecond precision) - Duration 2. Single-object encoding: https://avro.apache.org/docs/1.8.2/spec.html#single_object_encoding This PR aims to update Apache Spark to support these new features. ## How was this patch tested? Unit test You can merge this pull request into a Git repository by running: $ git pull https://github.com/gengliangwang/spark upgrade_avro_1.8 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/21761.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #21761 commit cd9d0e6b76241f4eaf609ed1b5721c96f4d149b0 Author: Gengliang Wang Date: 2018-07-13T09:03:56Z upgrade Apache AVRO to 1.8.2
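For reference, a logical type in Avro 1.8.2 is declared by annotating an underlying primitive type in the schema. A minimal example per the 1.8.2 spec (record and field names are illustrative; `date` is backed by `int` days since the epoch, `timestamp-millis` by `long` milliseconds):

```json
{
  "type": "record",
  "name": "Event",
  "fields": [
    {"name": "id", "type": "string"},
    {"name": "event_date", "type": {"type": "int", "logicalType": "date"}},
    {"name": "created_at", "type": {"type": "long", "logicalType": "timestamp-millis"}}
  ]
}
```

Readers unaware of a logical type simply fall back to the underlying primitive type, which is what makes the annotation backward-compatible.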
[GitHub] spark issue #20611: [SPARK-23425][SQL]Support wildcard in HDFS path for load...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/20611 Looks worth going ahead to me.
[GitHub] spark issue #21748: [SPARK-23146][K8S] Support client mode.
Github user mrow4a commented on the issue: https://github.com/apache/spark/pull/21748 Can we remove this as a part of this PR? https://github.com/apache/spark/blob/master/resource-managers/kubernetes/docker/src/main/dockerfiles/spark/entrypoint.sh#L87 - it seems to set client mode by default.
[GitHub] spark pull request #20611: [SPARK-23425][SQL]Support wildcard in HDFS path f...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/20611#discussion_r202363783 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala --- [diff context garbled in this archive: for local paths, the wildcard/existence checks are replaced by `FileContext.getLocalFSFileContext().makeQualified(new Path(path))`; for non-local paths, the "fs.defaultFS" scheme/authority resolution is condensed] --- End diff -- Hm. I was trying to understand where this logic went. I see that's sort of in the call to `makeQualified`. I couldn't find the docs for that method overload though because it's actually "LimitedPrivate" in Hadoop. I think we shouldn't call this method? Can we instead just restore this logic?
[GitHub] spark pull request #20611: [SPARK-23425][SQL]Support wildcard in HDFS path f...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/20611#discussion_r202360482 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala --- [diff context and review comment truncated in this archive]
[GitHub] spark pull request #20611: [SPARK-23425][SQL]Support wildcard in HDFS path f...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/20611#discussion_r202360294 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala --- [diff context and review comment truncated in this archive]
[GitHub] spark pull request #20611: [SPARK-23425][SQL]Support wildcard in HDFS path f...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/20611#discussion_r202359876 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLQuerySuite.scala --- [diff context garbled in this archive: adds a test "Support wildcard character in folderlevel for LOAD DATA LOCAL INPATH" exercising a folder-level wildcard and an invalid-directory error case] --- End diff -- Still need a space here.
[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21102 Merged build finished. Test PASSed.
[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21102 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92970/ Test PASSed.
[GitHub] spark pull request #21690: [SPARK-24713]AppMaster of spark streaming kafka O...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/21690
[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21102 **[Test build #92970 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92970/testReport)** for PR 21102 at commit [`fce9eb0`](https://github.com/apache/spark/commit/fce9eb09bf0666711dbb5584c56b2534e495dffc). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #21690: [SPARK-24713]AppMaster of spark streaming kafka OOM if t...
Github user koeninger commented on the issue: https://github.com/apache/spark/pull/21690 LGTM, merging to master. Thanks!
[GitHub] spark issue #21652: [SPARK-24551][K8S] Add integration tests for secrets
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21652 Merged build finished. Test FAILed.
[GitHub] spark issue #21652: [SPARK-24551][K8S] Add integration tests for secrets
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21652 Kubernetes integration test status failure URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-make-spark-distribution-unified/926/
[GitHub] spark issue #21652: [SPARK-24551][K8S] Add integration tests for secrets
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21652 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/926/ Test FAILed.
[GitHub] spark issue #21652: [SPARK-24551][K8S] Add integration tests for secrets
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21652 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-make-spark-distribution-unified/926/
[GitHub] spark issue #21652: [SPARK-24551][K8S] Add integration tests for secrets
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21652 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92972/ Test PASSed.
[GitHub] spark issue #21652: [SPARK-24551][K8S] Add integration tests for secrets
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21652 Merged build finished. Test PASSed.
[GitHub] spark issue #21652: [SPARK-24551][K8S] Add integration tests for secrets
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21652 **[Test build #92972 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92972/testReport)** for PR 21652 at commit [`1bc3d07`](https://github.com/apache/spark/commit/1bc3d070f4a92d16c4a2a5bf2876a50d1a311ba3). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #21652: [SPARK-24551][K8S] Add integration tests for secrets
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21652 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/925/ Test FAILed.
[GitHub] spark issue #21652: [SPARK-24551][K8S] Add integration tests for secrets
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21652 Merged build finished. Test FAILed.
[GitHub] spark issue #21652: [SPARK-24551][K8S] Add integration tests for secrets
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21652 Kubernetes integration test status failure URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-make-spark-distribution-unified/925/
[GitHub] spark issue #21652: [SPARK-24551][K8S] Add integration tests for secrets
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21652 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-make-spark-distribution-unified/925/
[GitHub] spark issue #21652: [SPARK-24551][K8S] Add integration tests for secrets
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21652 **[Test build #92972 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92972/testReport)** for PR 21652 at commit [`1bc3d07`](https://github.com/apache/spark/commit/1bc3d070f4a92d16c4a2a5bf2876a50d1a311ba3).
[GitHub] spark issue #21652: [SPARK-24551][K8S] Add integration tests for secrets
Github user skonto commented on the issue: https://github.com/apache/spark/pull/21652 jenkins test this please
[GitHub] spark issue #21704: [SPARK-24734][SQL] Fix type coercions and nullabilities ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21704 Merged build finished. Test PASSed.
[GitHub] spark issue #21704: [SPARK-24734][SQL] Fix type coercions and nullabilities ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21704 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92967/ Test PASSed.
[GitHub] spark issue #21704: [SPARK-24734][SQL] Fix type coercions and nullabilities ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21704 **[Test build #92967 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92967/testReport)** for PR 21704 at commit [`5115961`](https://github.com/apache/spark/commit/5115961fb0503cabbdbdead7c29c1521ab4f76cb). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #21184: [WIP][SPARK-24051][SQL] Replace Aliases with the ...
Github user mgaido91 commented on a diff in the pull request: https://github.com/apache/spark/pull/21184#discussion_r202334300 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -284,6 +288,80 @@ class Analyzer( } } + /** + * Replaces [[Alias]] with the same exprId but different references with [[Alias]] having + * different exprIds. This is a rare situation which can cause incorrect results. + */ + object DeduplicateAliases extends Rule[LogicalPlan] { --- End diff -- Yes, that is also true. But in many places in the codebase we just compare attributes using `semanticEquals` or in some other cases, even `equals`. Well, if we admit that different attributes can have the same `exprId`, all these places should be checked in order to be sure that the same problem cannot happen there too. Moreover (this is more a nit), the `semanticEquals` or `sameRef` method itself would be wrong according to its semantic, as it may return `true` even when two attributes don't have the same reference. This is the reason why I opted for this solution, which seems to me cleaner as it solves the root cause of the problem. What do you think?
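The hazard discussed above can be sketched abstractly. This is not Spark's actual `Attribute` class, just an invented stand-in showing why an id-based `semanticEquals` conflates two genuinely different attributes once exprIds are duplicated:

```scala
// Illustrative sketch (hypothetical types, not Spark internals): an equality
// check keyed only on exprId treats two distinct attributes as "the same"
// whenever an id is accidentally reused.
object ExprIdSketch {
  final case class Attribute(name: String, exprId: Long) {
    // Mimics an id-based semantic comparison.
    def semanticEquals(other: Attribute): Boolean = exprId == other.exprId
  }

  def main(args: Array[String]): Unit = {
    val a = Attribute("a", exprId = 1L)
    val b = Attribute("b", exprId = 1L) // a different attribute reusing the id
    assert(a.semanticEquals(b)) // id-based comparison conflates them...
    assert(a != b)              // ...even though they are not the same attribute
  }
}
```

Deduplicating the ids at the source, as the comment proposes, restores the invariant that an id-based comparison can rely on: one exprId, one attribute.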
[GitHub] spark issue #20611: [SPARK-23425][SQL]Support wildcard in HDFS path for load...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20611 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92968/ Test PASSed.
[GitHub] spark issue #20611: [SPARK-23425][SQL]Support wildcard in HDFS path for load...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20611 Merged build finished. Test PASSed.
[GitHub] spark issue #20611: [SPARK-23425][SQL]Support wildcard in HDFS path for load...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20611 **[Test build #92968 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92968/testReport)** for PR 20611 at commit [`bee161f`](https://github.com/apache/spark/commit/bee161f07ae4f76a0f090f64ac84c39f752652ce). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #21652: [SPARK-24551][K8S] Add integration tests for secrets
Github user skonto commented on the issue: https://github.com/apache/spark/pull/21652 @liyinan926 I had the same issue locally, but I don't think it is because of the PR. "[ERROR] Failed to execute goal on project spark-kubernetes-integration-tests_2.11: Could not resolve dependencies for project spark-kubernetes-integration-tests:spark-kubernetes-integration-tests_2.11:jar:2.4.0-SNAPSHOT: Failed to collect dependencies at org.apache.spark:spark-core_2.11:jar:2.4.0-SNAPSHOT: Failed to read artifact descriptor for org.apache.spark:spark-core_2.11:jar:2.4.0-SNAPSHOT: Failure to find org.apache.spark:spark-parent_2.11:pom:2.4.0-20180712.095204-165 in https://repository.apache.org/snapshots was cached in the local repository, resolution will not be reattempted until the update interval of apache.snapshots has elapsed or updates are forced" I fixed that locally (I have my own script to run them) by doing ./build/mvn install... since the integration tests suite is run in standalone mode, it expects the parent artifact to be available. @srowen thoughts?
[GitHub] spark issue #21751: [SPARK-24208][SQL][FOLLOWUP] Move test cases to proper l...
Github user mgaido91 commented on the issue: https://github.com/apache/spark/pull/21751 Sure, I'll keep them in mind. Sorry for the mistakes, I'll be more careful. Thanks @gatorsmile.
[GitHub] spark issue #21760: [SPARK-24776][SQL]Avro unit test: use SQLTestUtils and r...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21760 Merged build finished. Test PASSed.
[GitHub] spark issue #21760: [SPARK-24776][SQL]Avro unit test: use SQLTestUtils and r...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21760 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92971/ Test PASSed.
[GitHub] spark issue #21760: [SPARK-24776][SQL]Avro unit test: use SQLTestUtils and r...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21760 **[Test build #92971 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92971/testReport)** for PR 21760 at commit [`26b88ca`](https://github.com/apache/spark/commit/26b88ca201a70283528f289cdd2e1e216fce6e7a).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
   * `class AvroSuite extends QueryTest with SharedSQLContext with SQLTestUtils`
[GitHub] spark pull request #21556: [SPARK-24549][SQL] Support Decimal type push down...
Github user wangyum commented on a diff in the pull request: https://github.com/apache/spark/pull/21556#discussion_r202327362

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilters.scala ---
@@ -37,41 +39,64 @@ import org.apache.spark.unsafe.types.UTF8String

 /**
  * Some utility function to convert Spark data source filters to Parquet filters.
  */
-private[parquet] class ParquetFilters(pushDownDate: Boolean, pushDownStartWith: Boolean) {
+private[parquet] class ParquetFilters(
+    pushDownDate: Boolean,
+    pushDownDecimal: Boolean,
+    pushDownStartWith: Boolean) {

   private case class ParquetSchemaType(
       originalType: OriginalType,
       primitiveTypeName: PrimitiveTypeName,
-      decimalMetadata: DecimalMetadata)
-
-  private val ParquetBooleanType = ParquetSchemaType(null, BOOLEAN, null)
-  private val ParquetByteType = ParquetSchemaType(INT_8, INT32, null)
-  private val ParquetShortType = ParquetSchemaType(INT_16, INT32, null)
-  private val ParquetIntegerType = ParquetSchemaType(null, INT32, null)
-  private val ParquetLongType = ParquetSchemaType(null, INT64, null)
-  private val ParquetFloatType = ParquetSchemaType(null, FLOAT, null)
-  private val ParquetDoubleType = ParquetSchemaType(null, DOUBLE, null)
-  private val ParquetStringType = ParquetSchemaType(UTF8, BINARY, null)
-  private val ParquetBinaryType = ParquetSchemaType(null, BINARY, null)
-  private val ParquetDateType = ParquetSchemaType(DATE, INT32, null)
+      length: Int,
+      decimalMeta: DecimalMetadata)
+
+  private val ParquetBooleanType = ParquetSchemaType(null, BOOLEAN, 0, null)
+  private val ParquetByteType = ParquetSchemaType(INT_8, INT32, 0, null)
+  private val ParquetShortType = ParquetSchemaType(INT_16, INT32, 0, null)
+  private val ParquetIntegerType = ParquetSchemaType(null, INT32, 0, null)
+  private val ParquetLongType = ParquetSchemaType(null, INT64, 0, null)
+  private val ParquetFloatType = ParquetSchemaType(null, FLOAT, 0, null)
+  private val ParquetDoubleType = ParquetSchemaType(null, DOUBLE, 0, null)
+  private val ParquetStringType = ParquetSchemaType(UTF8, BINARY, 0, null)
+  private val ParquetBinaryType = ParquetSchemaType(null, BINARY, 0, null)
+  private val ParquetDateType = ParquetSchemaType(DATE, INT32, 0, null)

   private def dateToDays(date: Date): SQLDate = {
     DateTimeUtils.fromJavaDate(date)
   }

+  private def decimalToInt32(decimal: JBigDecimal): Integer = decimal.unscaledValue().intValue()
+
+  private def decimalToInt64(decimal: JBigDecimal): JLong = decimal.unscaledValue().longValue()
+
+  private def decimalToByteArray(decimal: JBigDecimal, numBytes: Int): Binary = {
+    val decimalBuffer = new Array[Byte](numBytes)
+    val bytes = decimal.unscaledValue().toByteArray
+
+    val fixedLengthBytes = if (bytes.length == numBytes) {
+      bytes
+    } else {
+      val signByte = if (bytes.head < 0) -1: Byte else 0: Byte
+      java.util.Arrays.fill(decimalBuffer, 0, numBytes - bytes.length, signByte)
+      System.arraycopy(bytes, 0, decimalBuffer, numBytes - bytes.length, bytes.length)
+      decimalBuffer
+    }
+    Binary.fromReusedByteArray(fixedLengthBytes, 0, numBytes)
+  }
+
   private val makeEq: PartialFunction[ParquetSchemaType, (String, Any) => FilterPredicate] = {
--- End diff --

`ParquetBooleanType`, `ParquetLongType`, `ParquetFloatType` and `ParquetDoubleType` do not need `Option`. Here is an example:

```scala
scala> import org.apache.parquet.io.api.Binary
import org.apache.parquet.io.api.Binary

scala> Option(null).map(s => Binary.fromString(s.asInstanceOf[String])).orNull
res7: org.apache.parquet.io.api.Binary = null

scala> Binary.fromString(null.asInstanceOf[String])
java.lang.NullPointerException
  at org.apache.parquet.io.api.Binary$FromStringBinary.encodeUTF8(Binary.java:224)
  at org.apache.parquet.io.api.Binary$FromStringBinary.<init>(Binary.java:214)
  at org.apache.parquet.io.api.Binary.fromString(Binary.java:554)
  ... 52 elided

scala> null.asInstanceOf[java.lang.Long]
res9: Long = null

scala> null.asInstanceOf[java.lang.Boolean]
res10: Boolean = null

scala> Option(null).map(_.asInstanceOf[Number].intValue.asInstanceOf[Integer]).orNull
res11: Integer = null

scala> null.asInstanceOf[Number].intValue.asInstanceOf[Integer]
java.lang.NullPointerException
  ... 52 elided
```
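The `decimalToByteArray` helper in the diff above sign-extends a decimal's unscaled two's-complement bytes to the fixed width that Parquet's `FIXED_LEN_BYTE_ARRAY` decimal encoding expects: negative values are left-padded with `0xFF`, non-negative values with `0x00`. A minimal standalone sketch of just that padding step (class and method names here are hypothetical, and this omits the `Binary` wrapping Spark does):

```java
import java.math.BigDecimal;
import java.util.Arrays;

public class DecimalBytes {
    // Sign-extend the unscaled two's-complement bytes of a decimal to a
    // fixed-length byte array. Assumes the unscaled value fits in numBytes,
    // as the real code does for a given decimal precision.
    static byte[] toFixedLength(BigDecimal decimal, int numBytes) {
        byte[] bytes = decimal.unscaledValue().toByteArray();
        if (bytes.length == numBytes) {
            return bytes;
        }
        byte[] buf = new byte[numBytes];
        // The high byte of a negative two's-complement number has its sign
        // bit set, so pad with 0xFF (-1); otherwise pad with 0x00.
        byte signByte = (byte) (bytes[0] < 0 ? -1 : 0);
        Arrays.fill(buf, 0, numBytes - bytes.length, signByte);
        System.arraycopy(bytes, 0, buf, numBytes - bytes.length, bytes.length);
        return buf;
    }

    public static void main(String[] args) {
        // 1.23 has unscaled value 123; -1.23 has unscaled value -123.
        System.out.println(Arrays.toString(toFixedLength(new BigDecimal("1.23"), 4)));
        System.out.println(Arrays.toString(toFixedLength(new BigDecimal("-1.23"), 4)));
    }
}
```

The padding must happen on the left because `BigInteger.toByteArray` returns big-endian bytes; copying the value into the tail of the buffer keeps the magnitude intact while the sign bytes extend it.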