[GitHub] spark issue #18903: [SPARK-21590][SS]Window start time should support negati...
Github user KevinZwx commented on the issue: https://github.com/apache/spark/pull/18903 test this please
[GitHub] spark issue #19136: [DO NOT MERGE][SPARK-15689][SQL] data source v2
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19136 **[Test build #81441 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81441/testReport)** for PR 19136 at commit [`a824d44`](https://github.com/apache/spark/commit/a824d44f9a4aac0518c5cd30893c34b36a094798).
[GitHub] spark pull request #19122: [SPARK-21911][ML][PySpark] Parallel Model Evaluat...
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/19122#discussion_r137175343

--- Diff: python/pyspark/ml/tuning.py ---
@@ -255,18 +257,24 @@ def _fit(self, dataset):
         randCol = self.uid + "_rand"
         df = dataset.select("*", rand(seed).alias(randCol))
         metrics = [0.0] * numModels
+
+        pool = ThreadPool(processes=min(self.getParallelism(), numModels))
+
         for i in range(nFolds):
             validateLB = i * h
             validateUB = (i + 1) * h
             condition = (df[randCol] >= validateLB) & (df[randCol] < validateUB)
-            validation = df.filter(condition)
+            validation = df.filter(condition).cache()
             train = df.filter(~condition)
-            models = est.fit(train, epm)
-            for j in range(numModels):
-                model = models[j]
+
+            def singleTrain(index):
+                model = est.fit(train, epm[index])
                 # TODO: duplicate evaluator to take extra params from input
-                metric = eva.evaluate(model.transform(validation, epm[j]))
-                metrics[j] += metric/nFolds
+                metric = eva.evaluate(model.transform(validation, epm[index]))
+                metrics[index] += metric/nFolds
+
+            pool.map(singleTrain, range(numModels))
--- End diff --

Oh, I think this works well. We already have PRs doing similar things: #19110 and #16774.
[GitHub] spark issue #18865: [SPARK-21610][SQL] Corrupt records are not handled prope...
Github user jmchung commented on the issue: https://github.com/apache/spark/pull/18865 Could @gatorsmile and @HyukjinKwon share some guidance on how the exception message should be revised? The current message explains why the query is disallowed when users select only `_corrupt_record`, and suggests an alternative way to obtain the corrupt records.
[GitHub] spark issue #19050: [SPARK-21835][SQL] RewritePredicateSubquery should not p...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19050 Merged build finished. Test PASSed.
[GitHub] spark issue #19050: [SPARK-21835][SQL] RewritePredicateSubquery should not p...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19050 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81439/ Test PASSed.
[GitHub] spark issue #19050: [SPARK-21835][SQL] RewritePredicateSubquery should not p...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19050 **[Test build #81439 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81439/testReport)** for PR 19050 at commit [`c1325fb`](https://github.com/apache/spark/commit/c1325fb9b1f8501b1a31b61e9b39bf1213b021f7).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #19124: [SPARK-21912][SQL] ORC/Parquet table should not c...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/19124#discussion_r137174591

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala ---
@@ -848,4 +851,19 @@ object DDLUtils {
       }
     }
   }
+
+  private[sql] def checkFieldNames(table: CatalogTable): Unit = {
+    val serde = table.storage.serde
+    if (serde == HiveSerDe.sourceToSerDe("orc").get.serde) {
--- End diff --

Yep!
[GitHub] spark pull request #19124: [SPARK-21912][SQL] ORC/Parquet table should not c...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/19124#discussion_r137174463

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcFileFormat.scala ---
@@ -0,0 +1,42 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.datasources.orc
+
+import org.apache.orc.TypeDescription
+
+import org.apache.spark.sql.AnalysisException
+import org.apache.spark.sql.types.StructType
+
+private[sql] object OrcFileFormat {
+  private def checkFieldName(name: String): Unit = {
+    try {
+      TypeDescription.fromString(s"struct<$name:int>")
--- End diff --

Yep. I agree that it's a little ugly now.
[GitHub] spark pull request #19124: [SPARK-21912][SQL] ORC/Parquet table should not c...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/19124#discussion_r137174337

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala ---
@@ -206,6 +206,9 @@ case class AlterTableAddColumnsCommand(
       reorderedSchema.map(_.name), "in the table definition of " + table.identifier,
       conf.caseSensitiveAnalysis)
 
+    val newDataSchema = StructType(catalogTable.dataSchema ++ columns)
+    DDLUtils.checkFieldNames(catalogTable.copy(schema = newDataSchema))
--- End diff --

Is it okay to use the following?
```scala
val reorderedSchema = catalogTable.dataSchema ++ columns ++ catalogTable.partitionSchema
val newDataSchema = StructType(catalogTable.dataSchema ++ columns)

SchemaUtils.checkColumnNameDuplication(
  reorderedSchema.map(_.name), "in the table definition of " + table.identifier,
  conf.caseSensitiveAnalysis)
DDLUtils.checkFieldNames(catalogTable.copy(schema = newDataSchema))

catalog.alterTableSchema(
  table, catalogTable.schema.copy(fields = reorderedSchema.toArray))
```
[GitHub] spark pull request #19124: [SPARK-21912][SQL] ORC/Parquet table should not c...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/19124#discussion_r137174215

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala ---
@@ -206,6 +206,9 @@ case class AlterTableAddColumnsCommand(
       reorderedSchema.map(_.name), "in the table definition of " + table.identifier,
       conf.caseSensitiveAnalysis)
 
+    val newDataSchema = StructType(catalogTable.dataSchema ++ columns)
+    DDLUtils.checkFieldNames(catalogTable.copy(schema = newDataSchema))
--- End diff --

Ur, actually, excluding partition columns was intentional. Maybe I used a misleading PR title and description here. So far, I have checked `dataSchema` only. I think partition columns are okay because they are not part of the Parquet/ORC file schema.
[GitHub] spark pull request #19124: [SPARK-21912][SQL] ORC/Parquet table should not c...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/19124#discussion_r137173190

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala ---
@@ -130,10 +130,12 @@ case class DataSourceAnalysis(conf: SQLConf) extends Rule[LogicalPlan] with Cast
   override def apply(plan: LogicalPlan): LogicalPlan = plan transform {
     case CreateTable(tableDesc, mode, None) if DDLUtils.isDatasourceTable(tableDesc) =>
+      DDLUtils.checkFieldNames(tableDesc)
       CreateDataSourceTableCommand(tableDesc, ignoreIfExists = mode == SaveMode.Ignore)
 
     case CreateTable(tableDesc, mode, Some(query))
         if query.resolved && DDLUtils.isDatasourceTable(tableDesc) =>
+      DDLUtils.checkFieldNames(tableDesc.copy(schema = query.schema))
       CreateDataSourceTableAsSelectCommand(tableDesc, mode, query)
 
     case InsertIntoTable(l @ LogicalRelation(_: InsertableRelation, _, _, _),
--- End diff --

Oh, I'll remove it from the Hive serde table case. Checking the existing table during INSERT INTO seems to actually be a no-op.
[GitHub] spark issue #19124: [SPARK-21912][SQL] ORC/Parquet table should not create i...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/19124 Oh, thank you for the review, @viirya, @HyukjinKwon and @gatorsmile! I'll follow up on your comments!
[GitHub] spark pull request #19124: [SPARK-21912][SQL] ORC/Parquet table should not c...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/19124#discussion_r137171798

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala ---
@@ -130,10 +130,12 @@ case class DataSourceAnalysis(conf: SQLConf) extends Rule[LogicalPlan] with Cast
   override def apply(plan: LogicalPlan): LogicalPlan = plan transform {
     case CreateTable(tableDesc, mode, None) if DDLUtils.isDatasourceTable(tableDesc) =>
+      DDLUtils.checkFieldNames(tableDesc)
       CreateDataSourceTableCommand(tableDesc, ignoreIfExists = mode == SaveMode.Ignore)
 
     case CreateTable(tableDesc, mode, Some(query))
         if query.resolved && DDLUtils.isDatasourceTable(tableDesc) =>
+      DDLUtils.checkFieldNames(tableDesc.copy(schema = query.schema))
       CreateDataSourceTableAsSelectCommand(tableDesc, mode, query)
 
     case InsertIntoTable(l @ LogicalRelation(_: InsertableRelation, _, _, _),
--- End diff --

You did the check for Hive serde tables, but no check is done for data source tables?
[GitHub] spark pull request #19124: [SPARK-21912][SQL] ORC/Parquet table should not c...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/19124#discussion_r137171539

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala ---
@@ -206,6 +206,9 @@ case class AlterTableAddColumnsCommand(
       reorderedSchema.map(_.name), "in the table definition of " + table.identifier,
       conf.caseSensitiveAnalysis)
 
+    val newDataSchema = StructType(catalogTable.dataSchema ++ columns)
+    DDLUtils.checkFieldNames(catalogTable.copy(schema = newDataSchema))
--- End diff --

```scala
val reorderedSchema = catalogTable.dataSchema ++ columns ++ catalogTable.partitionSchema
val newSchema = catalogTable.schema.copy(fields = reorderedSchema.toArray)

SchemaUtils.checkColumnNameDuplication(
  reorderedSchema.map(_.name), "in the table definition of " + table.identifier,
  conf.caseSensitiveAnalysis)
DDLUtils.checkFieldNames(catalogTable.copy(schema = newSchema))

catalog.alterTableSchema(table, newSchema)
```
[GitHub] spark pull request #19124: [SPARK-21912][SQL] ORC/Parquet table should not c...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/19124#discussion_r137171079

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala ---
@@ -206,6 +206,9 @@ case class AlterTableAddColumnsCommand(
       reorderedSchema.map(_.name), "in the table definition of " + table.identifier,
       conf.caseSensitiveAnalysis)
 
+    val newDataSchema = StructType(catalogTable.dataSchema ++ columns)
+    DDLUtils.checkFieldNames(catalogTable.copy(schema = newDataSchema))
--- End diff --

This should be moved to `verifyAlterTableAddColumn`.
[GitHub] spark pull request #19124: [SPARK-21912][SQL] ORC/Parquet table should not c...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/19124#discussion_r137170969

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala ---
@@ -848,4 +851,19 @@ object DDLUtils {
       }
     }
   }
+
+  private[sql] def checkFieldNames(table: CatalogTable): Unit = {
+    val serde = table.storage.serde
+    if (serde == HiveSerDe.sourceToSerDe("orc").get.serde) {
+      OrcFileFormat.checkFieldNames(table.dataSchema)
+    } else if (serde == HiveSerDe.sourceToSerDe("parquet").get.serde) {
--- End diff --

We could have different Parquet serdes, for example `parquet.hive.serde.ParquetHiveSerDe` and `org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe`. How about ORC?
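To illustrate the concern, a minimal sketch that matches the serde against a set of known class names instead of a single string; the helper name and the exact set here are assumptions for illustration, not code from this PR:

```scala
// Hypothetical helper: treat any known Parquet serde class name as "Parquet",
// so tables registered under either class name are validated.
private def isParquetSerde(serde: Option[String]): Boolean = {
  val knownParquetSerdes = Set(
    "parquet.hive.serde.ParquetHiveSerDe",
    "org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe")
  serde.exists(knownParquetSerdes.contains)
}
```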
[GitHub] spark pull request #19124: [SPARK-21912][SQL] ORC/Parquet table should not c...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/19124#discussion_r137170635

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala ---
@@ -848,4 +851,19 @@ object DDLUtils {
       }
     }
   }
+
+  private[sql] def checkFieldNames(table: CatalogTable): Unit = {
+    val serde = table.storage.serde
+    if (serde == HiveSerDe.sourceToSerDe("orc").get.serde) {
--- End diff --

This way is not right. Let's use your previous way with a foreach loop:
```
table.provider.foreach {
  _.toLowerCase(Locale.ROOT) match {
    case "hive" =>
```
[GitHub] spark pull request #19124: [SPARK-21912][SQL] ORC/Parquet table should not c...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/19124#discussion_r137170172

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala ---
@@ -848,4 +851,19 @@ object DDLUtils {
       }
     }
   }
+
+  private[sql] def checkFieldNames(table: CatalogTable): Unit = {
+    val serde = table.storage.serde
+    if (serde == HiveSerDe.sourceToSerDe("orc").get.serde) {
+      OrcFileFormat.checkFieldNames(table.dataSchema)
+    } else if (serde == HiveSerDe.sourceToSerDe("parquet").get.serde) {
+      ParquetSchemaConverter.checkFieldNames(table.dataSchema)
+    } else {
+      table.provider.get.toLowerCase(Locale.ROOT) match {
--- End diff --

`table.provider` could be `None` in the previous versions of Spark. Thus, `.get` is risky.
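As a rough sketch of a `None`-safe variant, assuming `provider` is an `Option[String]` as the comment implies; the match arms are illustrative, not this PR's final code:

```scala
import java.util.Locale

table.provider.foreach { provider =>
  provider.toLowerCase(Locale.ROOT) match {
    // Delegate to the format-specific validators quoted in the diff above.
    case "orc" => OrcFileFormat.checkFieldNames(table.dataSchema)
    case "parquet" => ParquetSchemaConverter.checkFieldNames(table.dataSchema)
    case _ => // other providers impose no field-name restrictions
  }
}
```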
[GitHub] spark pull request #19124: [SPARK-21912][SQL] ORC/Parquet table should not c...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/19124#discussion_r137169805

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcFileFormat.scala ---
@@ -0,0 +1,42 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.datasources.orc
+
+import org.apache.orc.TypeDescription
+
+import org.apache.spark.sql.AnalysisException
+import org.apache.spark.sql.types.StructType
+
+private[sql] object OrcFileFormat {
+  private def checkFieldName(name: String): Unit = {
+    try {
+      TypeDescription.fromString(s"struct<$name:int>")
--- End diff --

Oh, right, that is Java...
[GitHub] spark pull request #19124: [SPARK-21912][SQL] ORC/Parquet table should not c...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/19124#discussion_r137169608

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcFileFormat.scala ---
@@ -0,0 +1,42 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.datasources.orc
+
+import org.apache.orc.TypeDescription
+
+import org.apache.spark.sql.AnalysisException
+import org.apache.spark.sql.types.StructType
+
+private[sql] object OrcFileFormat {
+  private def checkFieldName(name: String): Unit = {
+    try {
+      TypeDescription.fromString(s"struct<$name:int>")
--- End diff --

`parseName` doesn't look public, though... I don't like this line either, but I could not think of another alternative for now.
[GitHub] spark pull request #19124: [SPARK-21912][SQL] ORC/Parquet table should not c...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/19124#discussion_r137169152

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcFileFormat.scala ---
@@ -0,0 +1,42 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.datasources.orc
+
+import org.apache.orc.TypeDescription
+
+import org.apache.spark.sql.AnalysisException
+import org.apache.spark.sql.types.StructType
+
+private[sql] object OrcFileFormat {
+  private def checkFieldName(name: String): Unit = {
+    try {
+      TypeDescription.fromString(s"struct<$name:int>")
--- End diff --

This seems equivalent to calling `TypeDescription.parseName(name)`.
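For context, a sketch completing the validation pattern quoted above; it assumes ORC's `TypeDescription` parser signals an invalid name with `IllegalArgumentException`, and the error message text is illustrative:

```scala
private def checkFieldName(name: String): Unit = {
  try {
    // Embed the name in a dummy struct type: parsing fails iff the name is invalid.
    TypeDescription.fromString(s"struct<$name:int>")
  } catch {
    case _: IllegalArgumentException =>
      throw new AnalysisException(
        s"""Column name "$name" contains invalid character(s). Please use alias to rename it.""")
  }
}
```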
[GitHub] spark pull request #18935: [SPARK-9104][CORE] Expose Netty memory metrics in...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/18935
[GitHub] spark issue #18935: [SPARK-9104][CORE] Expose Netty memory metrics in Spark
Github user zsxwing commented on the issue: https://github.com/apache/spark/pull/18935 Merging to master.
[GitHub] spark issue #17254: [SPARK-19917][SQL]qualified partition path stored in cat...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17254 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81437/ Test FAILed.
[GitHub] spark issue #17254: [SPARK-19917][SQL]qualified partition path stored in cat...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17254 Build finished. Test FAILed.
[GitHub] spark issue #17254: [SPARK-19917][SQL]qualified partition path stored in cat...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17254 **[Test build #81437 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81437/testReport)** for PR 17254 at commit [`36a3463`](https://github.com/apache/spark/commit/36a34632dbb000799c35727c00d1542d4bb1ce00).
* This patch **fails PySpark unit tests**.
* This patch **does not merge cleanly**.
* This patch adds no public classes.
[GitHub] spark issue #18692: [SPARK-21417][SQL] Infer join conditions using propagate...
Github user tejasapatil commented on the issue: https://github.com/apache/spark/pull/18692 @cloud-fan: In the event that the set of join keys is a superset of the child node's partitioning keys, it's possible to avoid the shuffle: https://github.com/apache/spark/pull/19054 ... this can help with two cases (sketched below):
- when users unknowingly join over extra columns in addition to the bucket columns
- the one you mentioned (i.e., inferred conditions)
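A hypothetical illustration of the first case: both tables are bucketed by `a`, and the join keys `(a, b)` are a strict superset of the bucket columns. All table and column names here are made up, and whether the exchange is actually removed depends on the optimization in #19054:

```scala
// Both sides are bucketed into 8 buckets by column `a`.
spark.range(100).selectExpr("id AS a", "id AS b")
  .write.bucketBy(8, "a").saveAsTable("t1")
spark.range(100).selectExpr("id AS a", "id AS b")
  .write.bucketBy(8, "a").saveAsTable("t2")

// Join keys (a, b) are a superset of the bucket columns (a): today this
// plans an Exchange on (a, b) on both sides, which could in principle be avoided.
spark.table("t1").join(spark.table("t2"), Seq("a", "b")).explain()
```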
[GitHub] spark issue #19050: [SPARK-21835][SQL] RewritePredicateSubquery should not p...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/19050 Thanks @gatorsmile
[GitHub] spark issue #19050: [SPARK-21835][SQL] RewritePredicateSubquery should not p...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19050 **[Test build #81440 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81440/testReport)** for PR 19050 at commit [`8550828`](https://github.com/apache/spark/commit/85508287ca1b98f3a3c341efd3ac70f99b56bc73).
[GitHub] spark issue #19050: [SPARK-21835][SQL] RewritePredicateSubquery should not p...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/19050 LGTM pending Jenkins
[GitHub] spark issue #19124: [SPARK-21912][SQL] ORC/Parquet table should not create i...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19124 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81435/ Test PASSed.
[GitHub] spark issue #19124: [SPARK-21912][SQL] ORC/Parquet table should not create i...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19124 Merged build finished. Test PASSed.
[GitHub] spark issue #19124: [SPARK-21912][SQL] ORC/Parquet table should not create i...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19124 **[Test build #81435 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81435/testReport)** for PR 19124 at commit [`c6e9ab6`](https://github.com/apache/spark/commit/c6e9ab6291dda034fe39263202ea5bc2373cd86c).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #19050: [SPARK-21835][SQL] RewritePredicateSubquery shoul...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/19050#discussion_r137167260

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/SubquerySuite.scala ---
@@ -875,4 +876,70 @@ class SubquerySuite extends QueryTest with SharedSQLContext {
       assert(e.message.contains("cannot resolve '`a`' given input columns: [t.i, t.j]"))
     }
   }
+
+  test("SPARK-21835: Join in correlated subquery should be duplicateResolved: case 1") {
+    withTable("t1") {
+      withTempPath { path =>
+        Seq(1 -> "a").toDF("i", "j").write.parquet(path.getCanonicalPath)
+        sql(s"CREATE TABLE t1 USING parquet LOCATION '${path.toURI}'")
+
+        val sqlText =
+          """
+            |SELECT * FROM t1
+            |WHERE
+            |NOT EXISTS (SELECT * FROM t1)
+          """.stripMargin
+        val optimizedPlan = sql(sqlText).queryExecution.optimizedPlan
+        val join = optimizedPlan.collect {
+          case j: Join => j
+        }.head.asInstanceOf[Join]
+        assert(join.duplicateResolved)
+        assert(optimizedPlan.resolved)
+      }
+    }
+  }
+
+  test("SPARK-21835: Join in correlated subquery should be duplicateResolved: case 2") {
+    withTable("t1", "t2", "t3") {
+      withTempPath { path =>
+        val data = Seq((1, 1, 1), (2, 0, 2))
+
+        data.toDF("t1a", "t1b", "t1c").write.parquet(path.getCanonicalPath + "/t1")
+        data.toDF("t2a", "t2b", "t2c").write.parquet(path.getCanonicalPath + "/t2")
+        data.toDF("t3a", "t3b", "t3c").write.parquet(path.getCanonicalPath + "/t3")
+
+        sql(s"CREATE TABLE t1 USING parquet LOCATION '${path.toURI}/t1'")
+        sql(s"CREATE TABLE t2 USING parquet LOCATION '${path.toURI}/t2'")
+        sql(s"CREATE TABLE t3 USING parquet LOCATION '${path.toURI}/t3'")
+
+        val sqlText =
+          s"""
+             |SELECT *
+             |FROM (SELECT *
+             |      FROM t2
+             |      WHERE t2c IN (SELECT t1c
+             |                    FROM t1
+             |                    WHERE t1a = t2a)
+             |      UNION
+             |      SELECT *
+             |      FROM t3
+             |      WHERE t3a IN (SELECT t2a
+             |                    FROM t2
+             |                    UNION ALL
+             |                    SELECT t1a
+             |                    FROM t1
+             |                    WHERE t1b > 0)) t4
+             |WHERE t4.t2b IN (SELECT Min(t3b)
+             |                 FROM t3
+             |                 WHERE t4.t2a = t3a)
+           """.stripMargin
+        val optimizedPlan = sql(sqlText).queryExecution.optimizedPlan
+        val joinNodes = optimizedPlan.collect {
+          case j: Join => j
+        }.map(_.asInstanceOf[Join])
+        joinNodes.map(j => assert(j.duplicateResolved))
--- End diff --

```scala
val joinNodes = optimizedPlan.collect { case j: Join => j }
joinNodes.foreach(j => assert(j.duplicateResolved))
```
[GitHub] spark pull request #19050: [SPARK-21835][SQL] RewritePredicateSubquery shoul...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/19050#discussion_r137167216

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/SubquerySuite.scala ---
@@ -875,4 +876,70 @@ class SubquerySuite extends QueryTest with SharedSQLContext {
       assert(e.message.contains("cannot resolve '`a`' given input columns: [t.i, t.j]"))
     }
   }
+
+  test("SPARK-21835: Join in correlated subquery should be duplicateResolved: case 1") {
+    withTable("t1") {
+      withTempPath { path =>
+        Seq(1 -> "a").toDF("i", "j").write.parquet(path.getCanonicalPath)
+        sql(s"CREATE TABLE t1 USING parquet LOCATION '${path.toURI}'")
+
+        val sqlText =
+          """
+            |SELECT * FROM t1
+            |WHERE
+            |NOT EXISTS (SELECT * FROM t1)
+          """.stripMargin
+        val optimizedPlan = sql(sqlText).queryExecution.optimizedPlan
+        val join = optimizedPlan.collect {
+          case j: Join => j
+        }.head.asInstanceOf[Join]
--- End diff --

```scala
val join = optimizedPlan.collectFirst { case j: Join => j }.get
```
[GitHub] spark issue #19132: [SPARK-21922] Fix duration always updating when task fai...
Github user caneGuy commented on the issue: https://github.com/apache/spark/pull/19132 @ajbozarth I have updated the implementation so that it only accesses the FS in FSHistoryServerProvider.
[GitHub] spark issue #19131: [MINOR][SQL]remove unuse import class
Github user jerryshao commented on the issue: https://github.com/apache/spark/pull/19131 Jenkins, test this please.
[GitHub] spark pull request #18966: [SPARK-21751][SQL] CodeGeneraor.splitExpressions ...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/18966#discussion_r137164501

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala ---
@@ -769,16 +769,27 @@ class CodegenContext {
       foldFunctions: Seq[String] => String = _.mkString("", ";\n", ";")): String = {
     val blocks = new ArrayBuffer[String]()
     val blockBuilder = new StringBuilder()
+    val defaultMaxLines = 100
+    val maxLines = if (SparkEnv.get != null) {
+      SparkEnv.get.conf.getInt("spark.sql.codegen.expressions.maxCodegenLinesPerFunction",
--- End diff --

I see.
[GitHub] spark issue #18704: [SPARK-20783][SQL] Create ColumnVector to abstract exist...
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/18704 ping @cloud-fan
[GitHub] spark issue #19140: [SPARK-21890] Credentials not being passed to add the to...
Github user jerryshao commented on the issue: https://github.com/apache/spark/pull/19140 @redsanket, can you please test this with a secure Hadoop environment using spark-submit (not Oozie)? I don't want to introduce any regression here.
[GitHub] spark pull request #19122: [SPARK-21911][ML][PySpark] Parallel Model Evaluat...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/19122#discussion_r137162378

--- Diff: python/pyspark/ml/tuning.py ---
@@ -255,18 +257,24 @@ def _fit(self, dataset):
         randCol = self.uid + "_rand"
         df = dataset.select("*", rand(seed).alias(randCol))
         metrics = [0.0] * numModels
+
+        pool = ThreadPool(processes=min(self.getParallelism(), numModels))
+
         for i in range(nFolds):
             validateLB = i * h
             validateUB = (i + 1) * h
             condition = (df[randCol] >= validateLB) & (df[randCol] < validateUB)
-            validation = df.filter(condition)
+            validation = df.filter(condition).cache()
             train = df.filter(~condition)
-            models = est.fit(train, epm)
-            for j in range(numModels):
-                model = models[j]
+
+            def singleTrain(index):
+                model = est.fit(train, epm[index])
                 # TODO: duplicate evaluator to take extra params from input
-                metric = eva.evaluate(model.transform(validation, epm[j]))
-                metrics[j] += metric/nFolds
+                metric = eva.evaluate(model.transform(validation, epm[index]))
+                metrics[index] += metric/nFolds
+
+            pool.map(singleTrain, range(numModels))
--- End diff --

The actual fitting and evaluation methods run here might include CPU-bound code, so I am not sure that multithreading here will boost performance much.
[GitHub] spark pull request #19122: [SPARK-21911][ML][PySpark] Parallel Model Evaluat...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/19122#discussion_r137162089

--- Diff: python/pyspark/ml/tuning.py ---
@@ -255,18 +257,23 @@ def _fit(self, dataset):
         randCol = self.uid + "_rand"
         df = dataset.select("*", rand(seed).alias(randCol))
         metrics = [0.0] * numModels
+
+        pool = ThreadPool(processes=min(self.getParallelism(), numModels))
+
         for i in range(nFolds):
             validateLB = i * h
             validateUB = (i + 1) * h
             condition = (df[randCol] >= validateLB) & (df[randCol] < validateUB)
-            validation = df.filter(condition)
+            validation = df.filter(condition).cache()
--- End diff --

That's right, but it seems we don't check whether the input dataset is cached here. Should we cache it if it is not?
[GitHub] spark issue #19050: [SPARK-21835][SQL] RewritePredicateSubquery should not p...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19050 **[Test build #81439 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81439/testReport)** for PR 19050 at commit [`c1325fb`](https://github.com/apache/spark/commit/c1325fb9b1f8501b1a31b61e9b39bf1213b021f7).
[GitHub] spark issue #19110: [SPARK-21027][ML][PYTHON] Added tunable parallelism to o...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19110 Merged build finished. Test PASSed.
[GitHub] spark issue #19110: [SPARK-21027][ML][PYTHON] Added tunable parallelism to o...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19110 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81438/ Test PASSed.
[GitHub] spark issue #19110: [SPARK-21027][ML][PYTHON] Added tunable parallelism to o...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19110 **[Test build #81438 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81438/testReport)** for PR 19110 at commit [`edcf85c`](https://github.com/apache/spark/commit/edcf85c08f25044520d43b919e0475e0f047001b).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #19050: [SPARK-21835][SQL] RewritePredicateSubquery shoul...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/19050#discussion_r137159908

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/subquery.scala ---
@@ -49,6 +49,30 @@ object RewritePredicateSubquery extends Rule[LogicalPlan] with PredicateHelper {
     }
   }
 
+  def dedupJoin(plan: LogicalPlan): LogicalPlan = {
--- End diff --

ok.
[GitHub] spark pull request #19050: [SPARK-21835][SQL] RewritePredicateSubquery shoul...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/19050#discussion_r137159871

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/SubquerySuite.scala ---
@@ -875,4 +876,71 @@ class SubquerySuite extends QueryTest with SharedSQLContext {
       assert(e.message.contains("cannot resolve '`a`' given input columns: [t.i, t.j]"))
     }
   }
+
+  test("SPARK-21835: Join in correlated subquery should be duplicateResolved: case 1") {
+    withTable("t1") {
+      withTempPath { path =>
+        Seq(1 -> "a").toDF("i", "j").write.parquet(path.getCanonicalPath)
+        sql(s"CREATE TABLE t1 USING parquet LOCATION '${path.toURI}'")
+
+        val sqlText =
+          """
+            |SELECT * FROM t1
+            |WHERE
+            |NOT EXISTS (SELECT * FROM t1)
+          """.stripMargin
+        val ds = sql(sqlText)
--- End diff --

Yes, I missed this. I'll remove it.
[GitHub] spark pull request #19050: [SPARK-21835][SQL] RewritePredicateSubquery shoul...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/19050#discussion_r137159884

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/subquery.scala ---
@@ -98,6 +122,7 @@ object RewritePredicateSubquery extends Rule[LogicalPlan] with PredicateHelper {
         val (newCond, inputPlan) = rewriteExistentialExpr(Seq(predicate), p)
         Project(p.output, Filter(newCond.get, inputPlan))
     }
+    dedupJoin(rewritten)
--- End diff --

Fair point. I'll follow it.
[GitHub] spark pull request #19050: [SPARK-21835][SQL] RewritePredicateSubquery shoul...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/19050#discussion_r137159896

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/subquery.scala ---
@@ -49,6 +49,30 @@ object RewritePredicateSubquery extends Rule[LogicalPlan] with PredicateHelper {
     }
   }
 
+  def dedupJoin(plan: LogicalPlan): LogicalPlan = {
+    plan transform {
+      case j @ Join(left, right, joinType, joinCond) =>
--- End diff --

Sure.
[GitHub] spark issue #19056: [SPARK-21765] Check that optimization doesn't affect isS...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19056 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81434/ Test PASSed.
[GitHub] spark issue #19056: [SPARK-21765] Check that optimization doesn't affect isS...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19056 Merged build finished. Test PASSed.
[GitHub] spark issue #19056: [SPARK-21765] Check that optimization doesn't affect isS...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19056 **[Test build #81434 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81434/testReport)** for PR 19056 at commit [`a3ec0f2`](https://github.com/apache/spark/commit/a3ec0f2cf3ec92aa30327c856820722ae7f22e7c).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #18692: [SPARK-21417][SQL] Infer join conditions using propagate...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/18692

> After adding the inferred join conditions, it might lead to the child node's partitioning NOT satisfying the JOIN node's requirements which otherwise could have.

Isn't it an existing problem? The current constraint propagation framework infers as many predicates as possible, so we may already hit this problem. I think we should revisit the constraint propagation framework and think about how to avoid adding more shuffles, rather than stop improving the framework to infer more predicates.
[GitHub] spark issue #19129: [SPARK-13656][SQL] Delete spark.sql.parquet.cacheMetadat...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/19129 So it was removed before 2.0.0.
[GitHub] spark issue #19110: [SPARK-21027][ML][PYTHON] Added tunable parallelism to o...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19110 **[Test build #81438 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81438/testReport)** for PR 19110 at commit [`edcf85c`](https://github.com/apache/spark/commit/edcf85c08f25044520d43b919e0475e0f047001b).
[GitHub] spark pull request #18628: [SPARK-18061][ThriftServer] Add spnego auth suppo...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/18628
[GitHub] spark issue #19132: [SPARK-21922] Fix duration always updating when task fai...
Github user caneGuy commented on the issue: https://github.com/apache/spark/pull/19132 Thanks for your recommendation, @ajbozarth. Could you post a link to your PR? As for the problems you mentioned, I have thought about them:
1. FsHistoryServer will always use the FS to get the event log.
2. For the Spark UI, my implementation will not access the FS.
[GitHub] spark issue #19110: [SPARK-21027][ML][PYTHON] Added tunable parallelism to o...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19110 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81436/ Test FAILed.
[GitHub] spark issue #19110: [SPARK-21027][ML][PYTHON] Added tunable parallelism to o...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19110 **[Test build #81436 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81436/testReport)** for PR 19110 at commit [`7d0849e`](https://github.com/apache/spark/commit/7d0849eae7601eb3e24240cb8462985e95932f85).
* This patch **fails Python style tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #18628: [SPARK-18061][ThriftServer] Add spnego auth support for ...
Github user jerryshao commented on the issue: https://github.com/apache/spark/pull/18628 Thanks @jiangxb1987, let me merge it to master.
[GitHub] spark issue #19110: [SPARK-21027][ML][PYTHON] Added tunable parallelism to o...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19110 Merged build finished. Test FAILed.
[GitHub] spark issue #17254: [SPARK-19917][SQL]qualified partition path stored in cat...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17254 **[Test build #81437 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81437/testReport)** for PR 17254 at commit [`36a3463`](https://github.com/apache/spark/commit/36a34632dbb000799c35727c00d1542d4bb1ce00).
[GitHub] spark issue #19110: [SPARK-21027][ML][PYTHON] Added tunable parallelism to o...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19110 **[Test build #81436 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81436/testReport)** for PR 19110 at commit [`7d0849e`](https://github.com/apache/spark/commit/7d0849eae7601eb3e24240cb8462985e95932f85).
[GitHub] spark pull request #19124: [SPARK-21912][SQL] ORC/Parquet table should not c...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/19124#discussion_r137153437

--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLQuerySuite.scala ---
@@ -2000,4 +2000,38 @@ class SQLQuerySuite extends QueryTest with SQLTestUtils with TestHiveSingleton {
       assert(setOfPath.size() == pathSizeToDeleteOnExit)
     }
   }
+
+  test("SPARK-21912 ORC/Parquet table should not create invalid column names") {
+    Seq(" ", ",", ";", "{", "}", "(", ")", "\n", "\t", "=").foreach { name =>
+      withTable("t21912") {
+        Seq("ORC", "PARQUET").foreach { source =>
+          val m = intercept[AnalysisException] {
+            sql(s"CREATE TABLE t21912(`col$name` INT) USING $source")
+          }.getMessage
+          assert(m.contains(s"contains invalid character(s)"))
+
+          val m2 = intercept[AnalysisException] {
+            sql(s"CREATE TABLE t21912 USING $source AS SELECT 1 `col$name`")
+          }.getMessage
+          assert(m2.contains(s"contains invalid character(s)"))
+
+          withSQLConf(HiveUtils.CONVERT_METASTORE_PARQUET.key -> "false") {
+            val m3 = intercept[AnalysisException] {
+              sql(s"CREATE TABLE t21912(`col$name` INT) USING hive OPTIONS (fileFormat '$source')")
+            }.getMessage
+            assert(m3.contains(s"contains invalid character(s)"))
+          }
+        }
+
+        // TODO: After SPARK-21929, we need to check ORC, too.
+        Seq("PARQUET").foreach { source =>
--- End diff --

I added only the Parquet test case, due to SPARK-21929.
[GitHub] spark pull request #19124: [SPARK-21912][SQL] ORC/Parquet table should not c...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/19124#discussion_r137153372

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala ---
@@ -206,6 +206,9 @@ case class AlterTableAddColumnsCommand(
       reorderedSchema.map(_.name), "in the table definition of " + table.identifier,
       conf.caseSensitiveAnalysis)
 
+    val newDataSchema = StructType(catalogTable.dataSchema ++ columns)
+    DDLUtils.checkFieldNames(catalogTable.copy(schema = newDataSchema))
--- End diff --

For this command, it's not easy to get the `CatalogTable` at `DataSourceStrategy`.
[GitHub] spark issue #19124: [SPARK-21912][SQL] ORC/Parquet table should not create i...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19124 **[Test build #81435 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81435/testReport)** for PR 19124 at commit [`c6e9ab6`](https://github.com/apache/spark/commit/c6e9ab6291dda034fe39263202ea5bc2373cd86c).
[GitHub] spark pull request #19102: [SPARK-21859][CORE] Fix SparkFiles.get failed on ...
Github user lgrcyanny closed the pull request at: https://github.com/apache/spark/pull/19102
[GitHub] spark issue #19117: [SPARK-21904] [SQL] Rename tempTables to tempViews in Se...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19117 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81433/ Test FAILed.
[GitHub] spark issue #19086: [SPARK-21874][SQL] Support changing database when rename...
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/19086 @gatorsmile Any more comments on this? Regarding the behavior change, should we follow Spark's previous behavior or follow Hive's? I'm OK with both.
[GitHub] spark issue #19117: [SPARK-21904] [SQL] Rename tempTables to tempViews in Se...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19117 Merged build finished. Test FAILed.
[GitHub] spark pull request #19140: [SPARK-21890] Credentials not being passed to add...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/19140#discussion_r137152903

--- Diff: core/src/main/scala/org/apache/spark/deploy/security/HadoopFSDelegationTokenProvider.scala ---
@@ -103,15 +103,17 @@ private[deploy] class HadoopFSDelegationTokenProvider(fileSystems: Configuration
   private def getTokenRenewalInterval(
       hadoopConf: Configuration,
-      filesystems: Set[FileSystem]): Option[Long] = {
+      filesystems: Set[FileSystem],
+      creds: Credentials): Option[Long] = {
     // We cannot use the tokens generated with renewer yarn. Trying to renew
     // those will fail with an access control issue. So create new tokens with the logged in
     // user as renewer.
-    val creds = fetchDelegationTokens(
+    val fetchCreds = fetchDelegationTokens(
--- End diff --

That code was in `getTokenRenewalInterval`; that call is only needed when a principal and keytab are provided, so adding the code back should be OK. It shouldn't cause any issues if it's not there, though, aside from a wasted round trip to the NNs.
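A rough sketch of the gating described above, under stated assumptions: the configuration keys and the surrounding variable names are guesses for illustration, not the provider's actual code.

```scala
// Only compute the renewal interval when a principal and keytab are configured,
// since that is the only case where Spark itself re-logs in and renews tokens;
// otherwise skip the extra round trip to the NameNodes.
val renewalInterval: Option[Long] =
  if (sparkConf.contains("spark.yarn.principal") && sparkConf.contains("spark.yarn.keytab")) {
    getTokenRenewalInterval(hadoopConf, fileSystems, creds)
  } else {
    None
  }
```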
[GitHub] spark issue #19117: [SPARK-21904] [SQL] Rename tempTables to tempViews in Se...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19117 **[Test build #81433 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81433/testReport)** for PR 19117 at commit [`595e502`](https://github.com/apache/spark/commit/595e502e8bd6ac6570d0975188bd6039498ece2a).
* This patch **fails SparkR unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #19142: When the number of attempting to restart receiver greate...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19142 Can one of the admins verify this patch?
[GitHub] spark pull request #19142: When the number of attempting to restart receiver...
GitHub user liuxianjiao opened a pull request: https://github.com/apache/spark/pull/19142

When the number of attempts to restart the receiver is greater than 0, Spark does nothing in the 'else' branch, so I think we should log a trace message to let users know why.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/liuxianjiao/spark master0905

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/19142.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #19142

commit c4edc1b4304f5b540b576ea60e260f5caef303c2
Author: liuxianjiao
Date: 2017-09-06T01:03:47Z

    [SPARK-21930] When the number of attempts to restart the receiver is greater than 0, Spark does nothing in 'else'
[GitHub] spark issue #19135: [SPARK-21923][CORE]Avoid call reserveUnrollMemoryForThis...
Github user ConeyLiu commented on the issue: https://github.com/apache/spark/pull/19135 Hi @cloud-fan, the previous version was written the same way as `putIteratorAsValues`. I have now modified the code so that each allocation requests an additional `chunkSize` bytes of memory, because `ChunkedByteBufferOutputStream` grows by exactly `chunkSize` each time.
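A sketch of the reservation strategy being described, with heavy caveats: the loop shape and the `reserveUnrollMemoryForThisTask` signature are assumptions based on the names in this thread, not the PR's actual diff.

```scala
// Reserve unroll memory one chunk at a time, matching the growth unit of
// ChunkedByteBufferOutputStream, instead of re-reserving for every record.
while (values.hasNext && keepUnrolling) {
  serializationStream.writeObject(values.next())(classTag)
  if (bbos.size > unrollMemoryUsed) {
    // The stream just grew by one chunk, so reserve exactly one more chunk.
    keepUnrolling = reserveUnrollMemoryForThisTask(blockId, chunkSize, memoryMode)
    if (keepUnrolling) {
      unrollMemoryUsed += chunkSize
    }
  }
}
```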
[GitHub] spark issue #18865: [SPARK-21610][SQL] Corrupt records are not handled prope...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/18865 @jmchung, just to be clear, sure, let's go this way. I guess we have only one comment left to address now:

> Please update the error message and also add it to the migration guide.
[GitHub] spark issue #19124: [SPARK-21912][SQL] Creating ORC/Parquet datasource table...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19124 Merged build finished. Test PASSed.
[GitHub] spark issue #19124: [SPARK-21912][SQL] Creating ORC/Parquet datasource table...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/19124 I created SPARK-21929 for **"Support `ALTER TABLE table_name ADD COLUMNS(..)` for ORC data source"**. For Parquet ALTER TABLE, yes, I think I can include that here. But I'm not sure about the PR title; it wouldn't be accurate because the coverage is only partial. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19124: [SPARK-21912][SQL] Creating ORC/Parquet datasource table...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19124 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81432/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19124: [SPARK-21912][SQL] Creating ORC/Parquet datasource table...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19124 **[Test build #81432 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81432/testReport)** for PR 19124 at commit [`8ee87dd`](https://github.com/apache/spark/commit/8ee87dd0d799d0e4504ca11c1f1d31f1141a0844). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19056: [SPARK-21765] Check that optimization doesn't affect isS...
Github user tdas commented on the issue: https://github.com/apache/spark/pull/19056 LGTM. Will merge after tests pass. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19140: [SPARK-21890] Credentials not being passed to add the to...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19140 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19124: [SPARK-21912][SQL] Creating ORC/Parquet datasource table...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/19124 Could this PR cover this scenario? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19140: [SPARK-21890] Credentials not being passed to add the to...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19140 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81431/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18865: [SPARK-21610][SQL] Corrupt records are not handled prope...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/18865 @gatorsmile, thanks for elaborating on this. It looks like a fair point. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19140: [SPARK-21890] Credentials not being passed to add the to...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19140 **[Test build #81431 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81431/testReport)** for PR 19140 at commit [`d72c08f`](https://github.com/apache/spark/commit/d72c08f72d02b2288e09566f191bfe310d6cfbc7). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19124: [SPARK-21912][SQL] Creating ORC/Parquet datasource table...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/19124 For that, no. That scenario hasn't been considered yet, just like the other code path. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19124: [SPARK-21912][SQL] Creating ORC/Parquet datasource table...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/19124 Will altering a table to add columns with illegal column names issue an error message? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19124: [SPARK-21912][SQL] Creating ORC/Parquet datasource table...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/19124 Parquet works. I tested. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19124: [SPARK-21912][SQL] Creating ORC/Parquet datasource table...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/19124 How about Parquet? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19050: [SPARK-21835][SQL] RewritePredicateSubquery shoul...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/19050#discussion_r137148036 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/subquery.scala ---
@@ -98,6 +122,7 @@ object RewritePredicateSubquery extends Rule[LogicalPlan] with PredicateHelper {
         val (newCond, inputPlan) = rewriteExistentialExpr(Seq(predicate), p)
         Project(p.output, Filter(newCond.get, inputPlan))
     }
+    dedupJoin(rewritten)
--- End diff --
After rethinking it, we can be more conservative. Instead of doing a dedup at the end, we should do it when we convert it to the `Join`. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
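A sketch of that more conservative approach, with a hypothetical helper name: deduplicate only at the point where the subquery is converted into a `Join`, by re-aliasing any attributes the inner plan shares with the outer plan.

    private def dedupSubqueryOnSelfJoin(outer: LogicalPlan, sub: LogicalPlan): LogicalPlan = {
      // On a self-join the subquery's output can overlap the outer plan's
      // output; alias the conflicting attributes so they get fresh expression
      // IDs and the join condition stays unambiguous.
      val conflicts = outer.outputSet.intersect(sub.outputSet)
      if (conflicts.isEmpty) {
        sub
      } else {
        val newOutput = sub.output.map { a =>
          if (conflicts.contains(a)) Alias(a, a.name)() else a
        }
        Project(newOutput, sub)
      }
    }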
[GitHub] spark issue #19124: [SPARK-21912][SQL] Creating ORC/Parquet datasource table...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/19124 Ah, there are too many missing pieces in the ORC code path. `AlterTableAddColumnsCommand` seems not to allow ORC in [verifyAlterTableAddColumn](https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala#L237-L241). It seems to be blocked for a different reason, but it looks like we need to solve that first in order to add test cases. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
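The gate being referenced looks roughly like the following (structure paraphrased from the linked `verifyAlterTableAddColumn`, not copied verbatim): only a short whitelist of file formats is accepted, so ORC tables are rejected before any field-name check could run.

    // Paraphrased: ALTER TABLE ... ADD COLUMNS is only allowed for a few formats.
    DataSource.lookupDataSource(catalogTable.provider.get).newInstance() match {
      case _: CSVFileFormat | _: JsonFileFormat | _: ParquetFileFormat => // allowed
      case s =>
        throw new AnalysisException(
          s"ALTER ADD COLUMNS does not support datasource table with type $s.")
    }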
[GitHub] spark issue #19020: [SPARK-3181] [ML] Implement huber loss for LinearRegress...
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/19020 Looks good. cc @jkbradley Thanks! --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19141: [SPARK-21384] [YARN] Spark 2.2 + YARN without spark.yarn...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19141 Can one of the admins verify this patch? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19126: [SPARK-21915][ML][PySpark]Model 1 and Model 2 ParamMaps ...
Github user BryanCutler commented on the issue: https://github.com/apache/spark/pull/19126 Yeah, I checked and this is not a problem in master since #17849 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19141: [SPARK-21384] [YARN] Spark 2.2 + YARN without spa...
GitHub user devaraj-kavali opened a pull request: https://github.com/apache/spark/pull/19141 [SPARK-21384] [YARN] Spark 2.2 + YARN without spark.yarn.jars / spark.yarn.archive fails ## What changes were proposed in this pull request? When the libraries temp directory (i.e. the __spark_libs__*.zip dir) file system and the staging (destination) file system are the same, the __spark_libs__*.zip is not copied to the staging directory. But after making this decision, the libraries zip file is deleted immediately and becomes unavailable for the NodeManager's localization. This change removes the immediate deletion of the libraries zip file and lets it be deleted as part of the ShutdownHookManager deletion of paths. ## How was this patch tested? I have verified it manually in yarn/cluster and yarn/client modes with HDFS and local file systems. You can merge this pull request into a Git repository by running: $ git pull https://github.com/devaraj-kavali/spark SPARK-21384 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/19141.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #19141 commit 208bb685cc899b705aadb7c5aba51334f2d340f0 Author: Devaraj K Date: 2017-09-06T00:22:54Z [SPARK-21384] [YARN] Spark 2.2 + YARN without spark.yarn.jars / spark.yarn.archive fails --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
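A minimal sketch of the described fix, with an illustrative variable name: swap the eager delete for shutdown-hook cleanup so the NodeManager can still localize the archive.

    import org.apache.spark.util.ShutdownHookManager

    // Before (problematic): the archive was deleted right after the
    // copy-to-staging decision, racing with the NodeManager's localization.
    //   sparkArchive.delete()

    // After: register the archive for deletion at JVM shutdown instead.
    ShutdownHookManager.registerShutdownDeleteDir(sparkArchive)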
[GitHub] spark pull request #19124: [SPARK-21912][SQL] Creating ORC/Parquet datasourc...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/19124#discussion_r137146867 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala ---
@@ -130,10 +130,12 @@ case class DataSourceAnalysis(conf: SQLConf) extends Rule[LogicalPlan] with Cast
   override def apply(plan: LogicalPlan): LogicalPlan = plan transform {
     case CreateTable(tableDesc, mode, None) if DDLUtils.isDatasourceTable(tableDesc) =>
+      DDLUtils.checkFieldNames(tableDesc)
       CreateDataSourceTableCommand(tableDesc, ignoreIfExists = mode == SaveMode.Ignore)

     case CreateTable(tableDesc, mode, Some(query))
         if query.resolved && DDLUtils.isDatasourceTable(tableDesc) =>
+      DDLUtils.checkFieldNames(tableDesc.copy(schema = query.schema))
       CreateDataSourceTableAsSelectCommand(tableDesc, mode, query)

     case InsertIntoTable(l @ LogicalRelation(_: InsertableRelation, _, _, _),
--- End diff --
So far, it looks different from CTAS. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19124: [SPARK-21912][SQL] Creating ORC/Parquet datasourc...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/19124#discussion_r137146817 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala ---
@@ -130,10 +130,12 @@ case class DataSourceAnalysis(conf: SQLConf) extends Rule[LogicalPlan] with Cast
   override def apply(plan: LogicalPlan): LogicalPlan = plan transform {
     case CreateTable(tableDesc, mode, None) if DDLUtils.isDatasourceTable(tableDesc) =>
+      DDLUtils.checkFieldNames(tableDesc)
       CreateDataSourceTableCommand(tableDesc, ignoreIfExists = mode == SaveMode.Ignore)

     case CreateTable(tableDesc, mode, Some(query))
         if query.resolved && DDLUtils.isDatasourceTable(tableDesc) =>
+      DDLUtils.checkFieldNames(tableDesc.copy(schema = query.schema))
       CreateDataSourceTableAsSelectCommand(tableDesc, mode, query)

     case InsertIntoTable(l @ LogicalRelation(_: InsertableRelation, _, _, _),
--- End diff --
Sorry, but I'm not sure when `INSERT INTO TABLE` has this kind of issue. In case of `INSERT INTO`, the table already exists. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
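For reference, a sketch of the kind of validation `DDLUtils.checkFieldNames` performs in these branches; the rejected character set follows Parquet's field-name rules, and the body is illustrative rather than the exact implementation:

    def checkFieldNames(table: CatalogTable): Unit = {
      // Parquet cannot store columns whose names contain these characters,
      // so fail analysis early with a clear message.
      val invalidChars = " ,;{}()\n\t="
      table.schema.fieldNames.foreach { name =>
        if (name.exists(invalidChars.contains(_))) {
          throw new AnalysisException(
            s"""Column name "$name" contains invalid character(s); please use an alias to rename it.""")
        }
      }
    }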