[GitHub] spark issue #19141: [SPARK-21384] [YARN] Spark + YARN fails with LocalFileSy...
Github user jerryshao commented on the issue: https://github.com/apache/spark/pull/19141 I see, thanks for the explanation.
[GitHub] spark issue #18981: Fixed pandoc dependency issue in python/setup.py
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/18981 Merged to master and branch-2.2
[GitHub] spark issue #19086: [SPARK-21874][SQL] Support changing database when rename...
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/19086 Is it not OK to just follow Spark's current behavior? (It will be different from Hive.) I made this PR because we are migrating from Hive to Spark and lots of our users use this function.
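For context, a minimal sketch of the Hive behavior under discussion in SPARK-21874 (the database and table names here are illustrative, not taken from the PR):

```scala
// In Hive, ALTER TABLE ... RENAME TO ... may move a table into another
// database; this PR proposes supporting the same form in Spark SQL.
spark.sql("CREATE TABLE db1.src(id INT) USING parquet")
spark.sql("ALTER TABLE db1.src RENAME TO db2.dst")
```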
[GitHub] spark issue #19151: [SPARK-21835][SQL][FOLLOW-UP] RewritePredicateSubquery s...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19151 **[Test build #81489 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81489/testReport)** for PR 19151 at commit [`f05f281`](https://github.com/apache/spark/commit/f05f281eb5fda2b68e7e5f7a1a61a87a7a4bc467).
[GitHub] spark issue #18982: [SPARK-21685][PYTHON][ML] PySpark Params isSet state sho...
Github user BryanCutler commented on the issue: https://github.com/apache/spark/pull/18982 Hmmm, I can reproduce the error with Python 3; I'll look into it tomorrow.
[GitHub] spark issue #18982: [SPARK-21685][PYTHON][ML] PySpark Params isSet state sho...
Github user BryanCutler commented on the issue: https://github.com/apache/spark/pull/18982 No problem @holdenk, I updated the test to use `transform()`. See if it looks ok to you now (pending Jenkins). Thanks!
[GitHub] spark issue #18982: [SPARK-21685][PYTHON][ML] PySpark Params isSet state sho...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18982 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81487/ Test FAILed.
[GitHub] spark issue #18982: [SPARK-21685][PYTHON][ML] PySpark Params isSet state sho...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18982 Merged build finished. Test FAILed.
[GitHub] spark issue #18982: [SPARK-21685][PYTHON][ML] PySpark Params isSet state sho...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18982 **[Test build #81487 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81487/testReport)** for PR 18982 at commit [`482c025`](https://github.com/apache/spark/commit/482c02507e38909e934a9f2b7ea06612eaea5ce0).
* This patch **fails PySpark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #17096: [SPARK-15243][ML][SQL][PYTHON] Add missing support for u...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17096 **[Test build #81488 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81488/testReport)** for PR 17096 at commit [`830b4fe`](https://github.com/apache/spark/commit/830b4fe1f71befb97debd9286306b3f872eb1c09).
[GitHub] spark issue #17096: [SPARK-15243][ML][SQL][PYTHON] Add missing support for u...
Github user holdenk commented on the issue: https://github.com/apache/spark/pull/17096 Jenkins retest this please.
[GitHub] spark issue #19151: [SPARK-21835][SQL][FOLLOW-UP] RewritePredicateSubquery s...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19151 **[Test build #81486 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81486/testReport)** for PR 19151 at commit [`4fc4d05`](https://github.com/apache/spark/commit/4fc4d05fd8dfa5397f790051196893d2b6fb2ca5).
[GitHub] spark issue #18982: [SPARK-21685][PYTHON][ML] PySpark Params isSet state sho...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18982 **[Test build #81487 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81487/testReport)** for PR 18982 at commit [`482c025`](https://github.com/apache/spark/commit/482c02507e38909e934a9f2b7ea06612eaea5ce0).
[GitHub] spark issue #19151: [SPARK-21835][SQL][FOLLOW-UP] RewritePredicateSubquery s...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/19151 cc @gatorsmile for review. Thanks.
[GitHub] spark pull request #19151: [SPARK-21835][SQL][FOLLOW-UP] RewritePredicateSub...
GitHub user viirya opened a pull request: https://github.com/apache/spark/pull/19151

[SPARK-21835][SQL][FOLLOW-UP] RewritePredicateSubquery should not produce unresolved query plans

## What changes were proposed in this pull request?

This is a follow-up of #19050 to deal with the `ExistenceJoin` case.

## How was this patch tested?

Added test.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/viirya/spark-1 SPARK-21835-followup

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/19151.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #19151

commit 4fc4d05fd8dfa5397f790051196893d2b6fb2ca5
Author: Liang-Chi Hsieh
Date:   2017-09-07T00:04:07Z

    Deal with ExistenceJoin case.
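A rough sketch of the property this follow-up guards (the rule name and the `ExistenceJoin` trigger come from the PR description, but the check below is illustrative, not the test actually added; it assumes a SparkSession `spark` with tables `t1` and `t2`):

```scala
import org.apache.spark.sql.catalyst.optimizer.RewritePredicateSubquery

// An EXISTS predicate inside a disjunction is rewritten into an ExistenceJoin;
// after the rewrite, the plan must still be resolved.
val analyzed = spark.sql(
  "SELECT * FROM t1 WHERE EXISTS (SELECT 1 FROM t2 WHERE t1.a = t2.a) OR t1.b > 0"
).queryExecution.analyzed
assert(RewritePredicateSubquery(analyzed).resolved)
```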
[GitHub] spark issue #18982: [SPARK-21685][PYTHON][ML] PySpark Params isSet state sho...
Github user BryanCutler commented on the issue: https://github.com/apache/spark/pull/18982 Jenkins retest this please
[GitHub] spark issue #19150: [SPARK-21939][TEST] Use TimeLimits instead of Timeouts
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19150 **[Test build #81485 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81485/testReport)** for PR 19150 at commit [`ab339b3`](https://github.com/apache/spark/commit/ab339b31b311035ebb75e8f079000d306cab16b8).
[GitHub] spark issue #18982: [SPARK-21685][PYTHON][ML] PySpark Params isSet state sho...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18982 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81484/ Test FAILed.
[GitHub] spark issue #18982: [SPARK-21685][PYTHON][ML] PySpark Params isSet state sho...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18982 **[Test build #81484 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81484/testReport)** for PR 18982 at commit [`482c025`](https://github.com/apache/spark/commit/482c02507e38909e934a9f2b7ea06612eaea5ce0).
* This patch **fails PySpark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #18982: [SPARK-21685][PYTHON][ML] PySpark Params isSet state sho...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18982 Merged build finished. Test FAILed.
[GitHub] spark pull request #19150: [SPARK-21939][TEST] Use TimeLimits instead of Tim...
GitHub user dongjoon-hyun opened a pull request: https://github.com/apache/spark/pull/19150

[SPARK-21939][TEST] Use TimeLimits instead of Timeouts

## What changes were proposed in this pull request?

Since ScalaTest 3.0.0, `org.scalatest.concurrent.Timeouts` is deprecated. This PR replaces the deprecated one with `org.scalatest.concurrent.TimeLimits`.

```scala
-import org.scalatest.concurrent.Timeouts._
+import org.scalatest.concurrent.TimeLimits._
```

## How was this patch tested?

Pass the existing test suites.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/dongjoon-hyun/spark SPARK-21939

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/19150.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #19150

commit ab339b31b311035ebb75e8f079000d306cab16b8
Author: Dongjoon Hyun
Date:   2017-09-06T23:22:11Z

    [SPARK-21939][TEST] Use TimeLimits instead of Timeouts
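For reference, a minimal sketch of the replacement API, assuming ScalaTest 3.0 (the suite name is illustrative): `TimeLimits` provides the same `failAfter` as the deprecated `Timeouts`, but additionally requires an implicit `Signaler`.

```scala
import org.scalatest.FunSuite
import org.scalatest.concurrent.{Signaler, ThreadSignaler, TimeLimits}
import org.scalatest.time.SpanSugar._

class TimeLimitedSuite extends FunSuite with TimeLimits {
  // TimeLimits needs an implicit Signaler to interrupt code that runs too long.
  implicit val signaler: Signaler = ThreadSignaler

  test("completes within the limit") {
    failAfter(10.seconds) {
      // code under a time limit goes here
    }
  }
}
```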
[GitHub] spark issue #18982: [SPARK-21685][PYTHON][ML] PySpark Params isSet state sho...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18982 **[Test build #81484 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81484/testReport)** for PR 18982 at commit [`482c025`](https://github.com/apache/spark/commit/482c02507e38909e934a9f2b7ea06612eaea5ce0).
[GitHub] spark issue #19140: [SPARK-21890] Credentials not being passed to add the to...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19140 Merged build finished. Test PASSed.
[GitHub] spark issue #19140: [SPARK-21890] Credentials not being passed to add the to...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19140 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81477/ Test PASSed.
[GitHub] spark issue #19140: [SPARK-21890] Credentials not being passed to add the to...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19140 **[Test build #81477 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81477/testReport)** for PR 19140 at commit [`98f0ff2`](https://github.com/apache/spark/commit/98f0ff2a655c398e5b502ce2b340dfac88b385e9).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #19136: [DO NOT MERGE][SPARK-15689][SQL] data source v2
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/19136 Thanks for pinging me. I left comments on the older PR, since other discussion was already there. If you'd prefer comments here, just let me know.
[GitHub] spark pull request #19124: [SPARK-21912][SQL] ORC/Parquet table should not c...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/19124#discussion_r137416501

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala ---
@@ -201,13 +201,14 @@ case class AlterTableAddColumnsCommand(
     // make sure any partition columns are at the end of the fields
     val reorderedSchema = catalogTable.dataSchema ++ columns ++ catalogTable.partitionSchema
+    val newSchema = catalogTable.schema.copy(fields = reorderedSchema.toArray)
     SchemaUtils.checkColumnNameDuplication(
       reorderedSchema.map(_.name), "in the table definition of " + table.identifier,
       conf.caseSensitiveAnalysis)
+    DDLUtils.checkDataSchemaFieldNames(catalogTable.copy(schema = newSchema))
--- End diff --

I think this PR passes the above two test cases, too.
[GitHub] spark pull request #19124: [SPARK-21912][SQL] ORC/Parquet table should not c...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/19124#discussion_r137416371

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala ---
@@ -201,13 +201,14 @@ case class AlterTableAddColumnsCommand(
     // make sure any partition columns are at the end of the fields
     val reorderedSchema = catalogTable.dataSchema ++ columns ++ catalogTable.partitionSchema
+    val newSchema = catalogTable.schema.copy(fields = reorderedSchema.toArray)
     SchemaUtils.checkColumnNameDuplication(
       reorderedSchema.map(_.name), "in the table definition of " + table.identifier,
       conf.caseSensitiveAnalysis)
+    DDLUtils.checkDataSchemaFieldNames(catalogTable.copy(schema = newSchema))
--- End diff --

DDLSuite and HiveDDLSuite have them here.
- https://github.com/apache/spark/blob/master/sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala#L2045-L2070
- https://github.com/apache/spark/blob/master/sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala#L1736
[GitHub] spark issue #19149: [SPARK-21652][SQL][FOLLOW-UP] Fix rule conflict between ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19149 **[Test build #81483 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81483/testReport)** for PR 19149 at commit [`e5501e1`](https://github.com/apache/spark/commit/e5501e1f46317a82b915d952f4ee192e5eb8e61d).
[GitHub] spark pull request #19124: [SPARK-21912][SQL] ORC/Parquet table should not c...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/19124#discussion_r137415411

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala ---
@@ -201,13 +201,14 @@ case class AlterTableAddColumnsCommand(
     // make sure any partition columns are at the end of the fields
     val reorderedSchema = catalogTable.dataSchema ++ columns ++ catalogTable.partitionSchema
+    val newSchema = catalogTable.schema.copy(fields = reorderedSchema.toArray)
     SchemaUtils.checkColumnNameDuplication(
       reorderedSchema.map(_.name), "in the table definition of " + table.identifier,
       conf.caseSensitiveAnalysis)
+    DDLUtils.checkDataSchemaFieldNames(catalogTable.copy(schema = newSchema))
--- End diff --

Could you add test cases and ensure the partitioning columns with special characters work?
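A hedged sketch of the kind of test being asked for, assuming DDLSuite's helpers (`test`, `withTable`, `sql`); the table and column names are made up. Parquet rejects characters such as spaces in data-schema field names, but a partition column is not stored in the data files, so the new check should let it through:

```scala
test("SPARK-21912: ADD COLUMNS with a special-character partition column") {
  withTable("t") {
    sql("CREATE TABLE t(a INT, `p p` STRING) USING parquet PARTITIONED BY (`p p`)")
    // `p p` lives only in the partition schema, so checkDataSchemaFieldNames
    // should not reject it; adding a well-formed column must still succeed.
    sql("ALTER TABLE t ADD COLUMNS (b INT)")
    assert(spark.table("t").schema.fieldNames.contains("b"))
  }
}
```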
[GitHub] spark pull request #19148: [SPARK-21936][SQL][WIP] backward compatibility te...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/19148#discussion_r137413347

--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveExternalCatalogVersionsSuite.scala ---
@@ -0,0 +1,199 @@
+/* (Apache License 2.0 header omitted) */
+
+package org.apache.spark.sql.hive
+
+import java.io.File
+import java.sql.Timestamp
+import java.util.Date
+
+import scala.collection.mutable.ArrayBuffer
+
+import org.scalatest.concurrent.Timeouts
+import org.scalatest.exceptions.TestFailedDueToTimeoutException
+import org.scalatest.time.SpanSugar._
+
+import org.apache.spark.{SparkFunSuite, TestUtils}
+import org.apache.spark.sql.{QueryTest, Row, SparkSession}
+import org.apache.spark.sql.test.ProcessTestUtils.ProcessOutputCapturer
+import org.apache.spark.util.Utils
+
+/**
+ * Test HiveExternalCatalog backward compatibility.
+ *
+ * Note that, this test suite will automatically download spark binary packages of different
+ * versions to a local directory `/tmp/spark-test`. If there is already a spark folder with
+ * expected version under this local directory, e.g. `/tmp/spark-test/spark-2.0.3`, we will skip the
+ * downloading for this spark version.
+ */
+class HiveExternalCatalogVersionsSuite extends SparkFunSuite with Timeouts {
+  private val wareHousePath = Utils.createTempDir(namePrefix = "warehouse")
--- End diff --

This single warehouse seems to cause a failure. Maybe Spark 2.0.2 tries to read a metastore created by 2.2.0?

```
build/sbt -Phive "project hive" "test-only *.HiveExternalCatalogVersionsSuite"
...
[info] - backward compatibility *** FAILED *** (17 seconds, 712 milliseconds)
[info]   spark-submit returned with exit code 1.
...
[info]   2017-09-06 16:07:41.744 - stderr> Caused by: java.sql.SQLException: Database at /Users/dongjoon/PR-19148/target/tmp/warehouse-d2818ad2-f141-4fc7-bc68-e7f67c89f3f4/metastore_db has an incompatible format with the current version of the software. The database was created by or upgraded by version 10.12.
```
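A minimal sketch of one way to avoid the collision that log points at, assuming the failure comes from every tested version sharing one warehouse directory (and therefore one Derby `metastore_db`). This fragment would live inside the suite; the method name is made up:

```scala
import java.io.File

import org.apache.spark.util.Utils

// One warehouse per tested Spark version, so a 2.0.x client never opens a
// Derby metastore_db that a newer Derby already upgraded in place.
private def warehousePathFor(version: String): File =
  Utils.createTempDir(namePrefix = s"warehouse-$version")
```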
[GitHub] spark pull request #19148: [SPARK-21936][SQL][WIP] backward compatibility te...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/19148#discussion_r137412267

--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveExternalCatalogVersionsSuite.scala ---
@@ -0,0 +1,199 @@
+/* (Apache License 2.0 header omitted) */
+
+package org.apache.spark.sql.hive
+
+import java.io.File
+import java.sql.Timestamp
+import java.util.Date
+
+import scala.collection.mutable.ArrayBuffer
+
+import org.scalatest.concurrent.Timeouts
+import org.scalatest.exceptions.TestFailedDueToTimeoutException
+import org.scalatest.time.SpanSugar._
+
+import org.apache.spark.{SparkFunSuite, TestUtils}
+import org.apache.spark.sql.{QueryTest, Row, SparkSession}
+import org.apache.spark.sql.test.ProcessTestUtils.ProcessOutputCapturer
+import org.apache.spark.util.Utils
+
+/**
+ * Test HiveExternalCatalog backward compatibility.
+ *
+ * Note that, this test suite will automatically download spark binary packages of different
+ * versions to a local directory `/tmp/spark-test`. If there is already a spark folder with
+ * expected version under this local directory, e.g. `/tmp/spark-test/spark-2.0.3`, we will skip the
+ * downloading for this spark version.
+ */
+class HiveExternalCatalogVersionsSuite extends SparkFunSuite with Timeouts {
+  private val wareHousePath = Utils.createTempDir(namePrefix = "warehouse")
+  private val sparkTestingDir = "/tmp/spark-test"
+  private val unusedJar = TestUtils.createJarWithClasses(Seq.empty)
+
+  override def afterAll(): Unit = {
+    Utils.deleteRecursively(wareHousePath)
+    super.afterAll()
+  }
+
+  // NOTE: This is an expensive operation in terms of time (10 seconds+). Use sparingly.
+  // This is copied from org.apache.spark.deploy.SparkSubmitSuite
+  private def runSparkSubmit(args: Seq[String], sparkHomeOpt: Option[String] = None): Unit = {
+    val sparkHome = sparkHomeOpt.getOrElse(
+      sys.props.getOrElse("spark.test.home", fail("spark.test.home is not set!")))
+    val history = ArrayBuffer.empty[String]
+    val sparkSubmit = if (Utils.isWindows) {
+      // On Windows, `ProcessBuilder.directory` does not change the current working directory.
+      new File("..\\..\\bin\\spark-submit.cmd").getAbsolutePath
+    } else {
+      "./bin/spark-submit"
+    }
+    val commands = Seq(sparkSubmit) ++ args
+    val commandLine = commands.mkString("'", "' '", "'")
+
+    val builder = new ProcessBuilder(commands: _*).directory(new File(sparkHome))
+    val env = builder.environment()
+    env.put("SPARK_TESTING", "1")
+    env.put("SPARK_HOME", sparkHome)
+
+    def captureOutput(source: String)(line: String): Unit = {
+      // This test suite has some weird behaviors when executed on Jenkins:
+      //
+      // 1. Sometimes it gets extremely slow out of unknown reason on Jenkins. Here we add a
+      //    timestamp to provide more diagnosis information.
+      // 2. Log lines are not correctly redirected to unit-tests.log as expected, so here we print
+      //    them out for debugging purposes.
+      val logLine = s"${new Timestamp(new Date().getTime)} - $source> $line"
+      // scalastyle:off println
+      println(logLine)
+      // scalastyle:on println
+      history += logLine
+    }
+
+    val process = builder.start()
+    new ProcessOutputCapturer(process.getInputStream, captureOutput("stdout")).start()
+    new ProcessOutputCapturer(process.getErrorStream, captureOutput("stderr")).start()
+
+    try {
+      val exitCode = failAfter(300.seconds) { process.waitFor() }
+      if (exitCode != 0) {
+        // include logs in output. Note that logging is async and may not have completed
+        // at the time this exception is raised
+        Thread.sleep(1000)
+        val historyLog = history.mkString("\n")
+        fail {
[GitHub] spark pull request #18659: [SPARK-21190][PYSPARK][WIP] Simple Python Vectori...
Github user BryanCutler commented on a diff in the pull request: https://github.com/apache/spark/pull/18659#discussion_r137412019

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/python/ArrowEvalPythonExec.scala ---
@@ -0,0 +1,127 @@
+/* (Apache License 2.0 header omitted) */
+
+package org.apache.spark.sql.execution.python
+
+import java.io.File
+
+import scala.collection.mutable.ArrayBuffer
+
+import org.apache.spark.{SparkEnv, TaskContext}
+import org.apache.spark.api.python.{ChainedPythonFunctions, PythonEvalType, PythonRunner}
+import org.apache.spark.rdd.RDD
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.expressions._
+import org.apache.spark.sql.execution.SparkPlan
+import org.apache.spark.sql.execution.arrow.{ArrowConverters, ArrowPayload}
+import org.apache.spark.sql.types.{DataType, StructField, StructType}
+import org.apache.spark.util.Utils
+
+
+/**
+ * A physical plan that evaluates a [[PythonUDF]],
+ */
+case class ArrowEvalPythonExec(udfs: Seq[PythonUDF], output: Seq[Attribute], child: SparkPlan)
+  extends SparkPlan {
+
+  def children: Seq[SparkPlan] = child :: Nil
+
+  override def producedAttributes: AttributeSet = AttributeSet(output.drop(child.output.length))
+
+  private def collectFunctions(udf: PythonUDF): (ChainedPythonFunctions, Seq[Expression]) = {
+    udf.children match {
+      case Seq(u: PythonUDF) =>
+        val (chained, children) = collectFunctions(u)
+        (ChainedPythonFunctions(chained.funcs ++ Seq(udf.func)), children)
+      case children =>
+        // There should not be any other UDFs, or the children can't be evaluated directly.
+        assert(children.forall(_.find(_.isInstanceOf[PythonUDF]).isEmpty))
+        (ChainedPythonFunctions(Seq(udf.func)), udf.children)
+    }
+  }
+
+  protected override def doExecute(): RDD[InternalRow] = {
+    val inputRDD = child.execute().map(_.copy())
+    val bufferSize = inputRDD.conf.getInt("spark.buffer.size", 65536)
+    val reuseWorker = inputRDD.conf.getBoolean("spark.python.worker.reuse", defaultValue = true)
+
+    inputRDD.mapPartitions { iter =>
+
+      // The queue used to buffer input rows so we can drain it to
+      // combine input with output from Python.
+      val queue = HybridRowQueue(TaskContext.get().taskMemoryManager(),
+        new File(Utils.getLocalDir(SparkEnv.get.conf)), child.output.length)
+      TaskContext.get().addTaskCompletionListener({ ctx =>
+        queue.close()
+      })
+
+      val (pyFuncs, inputs) = udfs.map(collectFunctions).unzip
+
+      // flatten all the arguments
+      val allInputs = new ArrayBuffer[Expression]
+      val dataTypes = new ArrayBuffer[DataType]
+      val argOffsets = inputs.map { input =>
+        input.map { e =>
+          if (allInputs.exists(_.semanticEquals(e))) {
+            allInputs.indexWhere(_.semanticEquals(e))
+          } else {
+            allInputs += e
+            dataTypes += e.dataType
+            allInputs.length - 1
+          }
+        }.toArray
+      }.toArray
+      val projection = newMutableProjection(allInputs, child.output)
+      val schema = StructType(dataTypes.zipWithIndex.map { case (dt, i) =>
+        StructField(s"_$i", dt)
+      })
+
+      // Input iterator to Python: input rows are grouped so we send them in batches to Python.
+      // For each row, add it to the queue.
+      val projectedRowIter = iter.map { inputRow =>
+        queue.add(inputRow.asInstanceOf[UnsafeRow])
+        projection(inputRow)
+      }
+
+      val context = TaskContext.get()
+
+      val inputIterator = ArrowConverters.toPayloadIterator(
+        projectedRowIter, schema, conf.arrowMaxRecordsPerBatch, context).
+        map(_.asPythonSerializable)
+
+      val schemaOut = St
[GitHub] spark issue #18659: [SPARK-21190][PYSPARK][WIP] Simple Python Vectorized UDF...
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/18659 Cool!
[GitHub] spark pull request #19112: [SPARK-21901][SS] Define toString for StateOperat...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/19112
[GitHub] spark issue #19112: [SPARK-21901][SS] Define toString for StateOperatorProgr...
Github user zsxwing commented on the issue: https://github.com/apache/spark/pull/19112 LGTM. Thanks! Merging to master and 2.2.
[GitHub] spark pull request #19096: [SPARK-21869][SS] A cached Kafka producer should ...
Github user zsxwing commented on a diff in the pull request: https://github.com/apache/spark/pull/19096#discussion_r137409030

--- Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaWriteTask.scala ---
@@ -43,8 +43,10 @@ private[kafka010] class KafkaWriteTask(
    * Writes key value data out to topics.
    */
   def execute(iterator: Iterator[InternalRow]): Unit = {
-    producer = CachedKafkaProducer.getOrCreate(producerConfiguration)
+    val paramsSeq = CachedKafkaProducer.paramsToSeq(producerConfiguration)
     while (iterator.hasNext && failedWrite == null) {
+      // Prevent producer to get expired/evicted from guava cache. (SPARK-21869)
+      producer = CachedKafkaProducer.getOrCreate(paramsSeq)
--- End diff --

This is really hacky. I'm wondering if we can track whether a producer is in use or not.
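One way to make the "track whether a producer is in use" idea concrete is a reference-counting wrapper. This is a purely illustrative sketch; none of these names exist in the PR:

```scala
import java.util.concurrent.atomic.AtomicInteger

import org.apache.kafka.clients.producer.KafkaProducer

// Each write task acquires the producer before use and releases it when done;
// the cache's removal listener would then close the producer only once idle.
class RefCountedProducer[K, V](val producer: KafkaProducer[K, V]) {
  private val refs = new AtomicInteger(0)

  def acquire(): KafkaProducer[K, V] = { refs.incrementAndGet(); producer }
  def release(): Unit = refs.decrementAndGet()
  def inUse: Boolean = refs.get() > 0
}
```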
[GitHub] spark issue #18975: [SPARK-4131] Support "Writing data into the filesystem f...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18975 **[Test build #81482 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81482/testReport)** for PR 18975 at commit [`6c24b1b`](https://github.com/apache/spark/commit/6c24b1be90fdf0e65c80ae24f81c75d34f7e1542).
[GitHub] spark issue #12646: [SPARK-14878][SQL] Trim characters string function suppo...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/12646 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81473/ Test PASSed.
[GitHub] spark issue #12646: [SPARK-14878][SQL] Trim characters string function suppo...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/12646 Merged build finished. Test PASSed.
[GitHub] spark issue #12646: [SPARK-14878][SQL] Trim characters string function suppo...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/12646 **[Test build #81473 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81473/testReport)** for PR 12646 at commit [`859510e`](https://github.com/apache/spark/commit/859510e6d796d138b08b10507f25535a077beffb).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #19046: [SPARK-18769][yarn] Limit resource requests based on RM'...
Github user tgravescs commented on the issue: https://github.com/apache/spark/pull/19046 Yeah, if we are releasing and then reacquiring right away, over and over again, that would be bad, but I don't know when we would do that, so more details would be great if possible.
[GitHub] spark issue #18253: [SPARK-18838][CORE] Introduce multiple queues in LiveLis...
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/18253 @bOOm-X I pushed some code to my repo: https://github.com/vanzin/spark/tree/SPARK-18838 It is an attempt to do things the way I've been trying to explain: it keeps changes as local as possible to `LiveListenerBus`, creating a few types that implement the grouping and asynchronous behavior. You could do filtering by extending the new `AsyncListener`, for example, and adding it to the live listener bus. It's just a p.o.c., so I cut a few corners (like metrics), and I only ran `SparkListenerSuite`, but I'm just trying to show a different approach that leaves the `ListenerBus` hierarchy mostly the same as now.
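To make the shape of that approach concrete, here is a condensed, illustrative sketch of an `AsyncListener`-style wrapper (not the actual code in that branch): events are enqueued on the bus thread and delivered from a private dispatch thread, so one slow listener group no longer blocks the rest of the bus.

```scala
import java.util.concurrent.LinkedBlockingQueue

import org.apache.spark.scheduler.SparkListenerEvent

class AsyncListenerSketch(handle: SparkListenerEvent => Unit) {
  private val queue = new LinkedBlockingQueue[SparkListenerEvent](10000)

  private val dispatcher = new Thread("async-listener-dispatch") {
    setDaemon(true)
    override def run(): Unit = {
      while (true) handle(queue.take()) // blocks until an event is available
    }
  }
  dispatcher.start()

  // Called on the bus thread; returns false (dropping the event) when full.
  def post(event: SparkListenerEvent): Boolean = queue.offer(event)
}
```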
[GitHub] spark pull request #18975: [SPARK-4131] Support "Writing data into the files...
Github user janewangfb commented on a diff in the pull request: https://github.com/apache/spark/pull/18975#discussion_r137406552

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala ---
@@ -2346,6 +2347,45 @@ abstract class DDLSuite extends QueryTest with SQLTestUtils {
     }
   }
 
+  test("insert overwrite directory") {
--- End diff --

moved.
[GitHub] spark pull request #18975: [SPARK-4131] Support "Writing data into the files...
Github user janewangfb commented on a diff in the pull request: https://github.com/apache/spark/pull/18975#discussion_r137406502

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/SaveAsHiveFile.scala ---
@@ -0,0 +1,78 @@
+/* (Apache License 2.0 header omitted) */
+
+package org.apache.spark.sql.hive.execution
+
+import org.apache.hadoop.conf.Configuration
+
+import org.apache.spark.internal.io.FileCommitProtocol
+import org.apache.spark.sql.SparkSession
+import org.apache.spark.sql.catalyst.catalog.BucketSpec
+import org.apache.spark.sql.catalyst.expressions.Attribute
+import org.apache.spark.sql.execution.SparkPlan
+import org.apache.spark.sql.execution.command.DataWritingCommand
+import org.apache.spark.sql.execution.datasources.FileFormatWriter
+import org.apache.spark.sql.hive.HiveShim.{ShimFileSinkDesc => FileSinkDesc}
+
+// Base trait from which all hive insert statement physical execution extends.
+private[hive] trait SaveAsHiveFile extends DataWritingCommand {
+
+  protected def saveAsHiveFile(
+      sparkSession: SparkSession,
+      plan: SparkPlan,
+      hadoopConf: Configuration,
+      fileSinkConf: FileSinkDesc,
+      outputLocation: String,
+      partitionAttributes: Seq[Attribute] = Nil,
+      bucketSpec: Option[BucketSpec] = None,
+      options: Map[String, String] = Map.empty): Unit = {
--- End diff --

removed
[GitHub] spark issue #18975: [SPARK-4131] Support "Writing data into the filesystem f...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18975 **[Test build #81481 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81481/testReport)** for PR 18975 at commit [`28fcb39`](https://github.com/apache/spark/commit/28fcb39028d93ec6ecea9eecf289c0e88b6c9ae6).
[GitHub] spark issue #18975: [SPARK-4131] Support "Writing data into the filesystem f...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/18975 Looks pretty good! Will do the final pass after addressing the above comments. Thanks for your work!
[GitHub] spark pull request #18659: [SPARK-21190][PYSPARK][WIP] Simple Python Vectori...
Github user BryanCutler commented on a diff in the pull request: https://github.com/apache/spark/pull/18659#discussion_r137402913

--- Diff: python/pyspark/sql/functions.py ---
@@ -2112,7 +2113,7 @@ def wrapper(*args):
 
 @since(1.3)
-def udf(f=None, returnType=StringType()):
+def udf(f=None, returnType=StringType(), vectorized=False):
--- End diff --

@felixcheung does this fit your idea for a more generic decorator? Not exclusively labeled as `pandas_udf`, just enable vectorization with a flag, e.g. `@udf(DoubleType(), vectorized=True)`
[GitHub] spark pull request #18975: [SPARK-4131] Support "Writing data into the files...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/18975#discussion_r137403025

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala ---
@@ -2346,6 +2347,45 @@ abstract class DDLSuite extends QueryTest with SQLTestUtils {
     }
   }
 
+  test("insert overwrite directory") {
--- End diff --

DDLSuite.scala is becoming bigger and bigger. Move these two data source only test cases to `org.apache.spark.sql.sources.InsertSuite`?
[GitHub] spark issue #19046: [SPARK-18769][yarn] Limit resource requests based on RM'...
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/19046 I was told that the preemption issue was fixed in YARN (that's YARN-6210), so I don't think there's a need for this code currently (just use a fixed YARN if you want reliable preemption). Wilfred is on vacation so I can't ask him for details, but:

> It turns out that the AM is releasing and then acquiring the reservations again and again until it has enough to run all tasks that it needs.

That could be an issue (don't remember details of the Spark allocator code), but at the same time it sounds like a different issue than the one this was trying to fix.
[GitHub] spark pull request #18975: [SPARK-4131] Support "Writing data into the files...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/18975#discussion_r137401845

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/SaveAsHiveFile.scala ---
@@ -0,0 +1,78 @@
+/* (Apache License 2.0 header omitted) */
+
+package org.apache.spark.sql.hive.execution
+
+import org.apache.hadoop.conf.Configuration
+
+import org.apache.spark.internal.io.FileCommitProtocol
+import org.apache.spark.sql.SparkSession
+import org.apache.spark.sql.catalyst.catalog.BucketSpec
+import org.apache.spark.sql.catalyst.expressions.Attribute
+import org.apache.spark.sql.execution.SparkPlan
+import org.apache.spark.sql.execution.command.DataWritingCommand
+import org.apache.spark.sql.execution.datasources.FileFormatWriter
+import org.apache.spark.sql.hive.HiveShim.{ShimFileSinkDesc => FileSinkDesc}
+
+// Base trait from which all hive insert statement physical execution extends.
+private[hive] trait SaveAsHiveFile extends DataWritingCommand {
+
+  protected def saveAsHiveFile(
+      sparkSession: SparkSession,
+      plan: SparkPlan,
+      hadoopConf: Configuration,
+      fileSinkConf: FileSinkDesc,
+      outputLocation: String,
+      partitionAttributes: Seq[Attribute] = Nil,
+      bucketSpec: Option[BucketSpec] = None,
+      options: Map[String, String] = Map.empty): Unit = {
--- End diff --

remove `bucketSpec` and `options`? Add them back only when we need it?
[GitHub] spark issue #18659: [SPARK-21190][PYSPARK][WIP] Simple Python Vectorized UDF...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18659 **[Test build #81480 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81480/testReport)** for PR 18659 at commit [`4f6c950`](https://github.com/apache/spark/commit/4f6c95092066ee31a670ca827fbb892ac66df870).
[GitHub] spark pull request #18975: [SPARK-4131] Support "Writing data into the files...
Github user janewangfb commented on a diff in the pull request: https://github.com/apache/spark/pull/18975#discussion_r137401720

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/SaveAsHiveFile.scala ---
@@ -0,0 +1,78 @@
+/* (Apache License 2.0 header omitted) */
+
+package org.apache.spark.sql.hive.execution
+
+import org.apache.hadoop.conf.Configuration
+
+import org.apache.spark.internal.io.FileCommitProtocol
+import org.apache.spark.sql.SparkSession
+import org.apache.spark.sql.catalyst.catalog.BucketSpec
+import org.apache.spark.sql.catalyst.expressions.Attribute
+import org.apache.spark.sql.execution.SparkPlan
+import org.apache.spark.sql.execution.command.DataWritingCommand
+import org.apache.spark.sql.execution.datasources.FileFormatWriter
+import org.apache.spark.sql.hive.HiveShim.{ShimFileSinkDesc => FileSinkDesc}
+
+// Base trait from which all hive insert statement physical execution extends.
+private[hive] trait SaveAsHiveFile extends DataWritingCommand {
+
+  protected def saveAsHiveFile(
+      sparkSession: SparkSession,
+      plan: SparkPlan,
+      hadoopConf: Configuration,
+      fileSinkConf: FileSinkDesc,
+      outputLocation: String,
+      partitionAttributes: Seq[Attribute] = Nil,
+      bucketSpec: Option[BucketSpec] = None,
+      options: Map[String, String] = Map.empty): Unit = {
+
+    val sessionState = sparkSession.sessionState
--- End diff --

removed
[GitHub] spark pull request #18975: [SPARK-4131] Support "Writing data into the files...
Github user janewangfb commented on a diff in the pull request: https://github.com/apache/spark/pull/18975#discussion_r137401187

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/InsertIntoHiveTable.scala ---
@@ -234,12 +82,8 @@ case class InsertIntoHiveTable(
   override def run(sparkSession: SparkSession, children: Seq[SparkPlan]): Seq[Row] = {
     assert(children.length == 1)
-    val sessionState = sparkSession.sessionState
     val externalCatalog = sparkSession.sharedState.externalCatalog
-    val hiveVersion = externalCatalog.asInstanceOf[HiveExternalCatalog].client.version
-    val hadoopConf = sessionState.newHadoopConf()
-    val stagingDir = hadoopConf.get("hive.exec.stagingdir", ".hive-staging")
--- End diff --

ok. added it back.
[GitHub] spark pull request #18975: [SPARK-4131] Support "Writing data into the files...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/18975#discussion_r137401217

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/SaveAsHiveFile.scala ---
@@ -0,0 +1,78 @@
+/* (Apache License 2.0 header omitted) */
+
+package org.apache.spark.sql.hive.execution
+
+import org.apache.hadoop.conf.Configuration
+
+import org.apache.spark.internal.io.FileCommitProtocol
+import org.apache.spark.sql.SparkSession
+import org.apache.spark.sql.catalyst.catalog.BucketSpec
+import org.apache.spark.sql.catalyst.expressions.Attribute
+import org.apache.spark.sql.execution.SparkPlan
+import org.apache.spark.sql.execution.command.DataWritingCommand
+import org.apache.spark.sql.execution.datasources.FileFormatWriter
+import org.apache.spark.sql.hive.HiveShim.{ShimFileSinkDesc => FileSinkDesc}
+
+// Base trait from which all hive insert statement physical execution extends.
+private[hive] trait SaveAsHiveFile extends DataWritingCommand {
+
+  protected def saveAsHiveFile(
+      sparkSession: SparkSession,
+      plan: SparkPlan,
+      hadoopConf: Configuration,
+      fileSinkConf: FileSinkDesc,
+      outputLocation: String,
+      partitionAttributes: Seq[Attribute] = Nil,
+      bucketSpec: Option[BucketSpec] = None,
+      options: Map[String, String] = Map.empty): Unit = {
+
+    val sessionState = sparkSession.sessionState
--- End diff --

remove this?
[GitHub] spark issue #18659: [SPARK-21190][PYSPARK][WIP] Simple Python Vectorized UDF...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18659 **[Test build #81479 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81479/testReport)** for PR 18659 at commit [`fdea603`](https://github.com/apache/spark/commit/fdea603ae0ac6a8c27ec8161920f8c77549784e8).
[GitHub] spark issue #19149: [SPARK-21652][SQL][FOLLOW-UP] Fix rule confliction betwe...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19149 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81476/ Test FAILed.
[GitHub] spark issue #19149: [SPARK-21652][SQL][FOLLOW-UP] Fix rule confliction betwe...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19149 Merged build finished. Test FAILed.
[GitHub] spark issue #19149: [SPARK-21652][SQL][FOLLOW-UP] Fix rule confliction betwe...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19149 **[Test build #81476 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81476/testReport)** for PR 19149 at commit [`6fc7140`](https://github.com/apache/spark/commit/6fc714051bed53e01941c2d661ce955137534c95).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #18975: [SPARK-4131] Support "Writing data into the files...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/18975#discussion_r137399745

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/InsertIntoHiveTable.scala ---
@@ -234,12 +82,8 @@ case class InsertIntoHiveTable(
   override def run(sparkSession: SparkSession, children: Seq[SparkPlan]): Seq[Row] = {
     assert(children.length == 1)
-    val sessionState = sparkSession.sessionState
     val externalCatalog = sparkSession.sharedState.externalCatalog
-    val hiveVersion = externalCatalog.asInstanceOf[HiveExternalCatalog].client.version
-    val hadoopConf = sessionState.newHadoopConf()
-    val stagingDir = hadoopConf.get("hive.exec.stagingdir", ".hive-staging")
--- End diff --

How about keeping `stagingDir`?
[GitHub] spark issue #19046: [SPARK-18769][yarn] Limit resource requests based on RM'...
Github user tgravescs commented on the issue: https://github.com/apache/spark/pull/19046 It might help if you can give an exact scenario where you see the issue, and perhaps configs if those matter: do you have some set of minimum containers, a small idle timeout, etc.?
[GitHub] spark pull request #18975: [SPARK-4131] Support "Writing data into the files...
Github user janewangfb commented on a diff in the pull request: https://github.com/apache/spark/pull/18975#discussion_r137399408

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/HiveTmpPath.scala ---
@@ -0,0 +1,204 @@
+/* (Apache License 2.0 header omitted) */
+
+package org.apache.spark.sql.hive.execution
+
+import java.io.{File, IOException}
+import java.net.URI
+import java.text.SimpleDateFormat
+import java.util.{Date, Locale, Random}
+
+import scala.util.control.NonFatal
+
+import org.apache.hadoop.conf.Configuration
+import org.apache.hadoop.fs.{FileSystem, Path}
+import org.apache.hadoop.hive.common.FileUtils
+import org.apache.hadoop.hive.ql.exec.TaskRunner
+
+import org.apache.spark.sql.SparkSession
+import org.apache.spark.sql.execution.command.RunnableCommand
+import org.apache.spark.sql.hive.HiveExternalCatalog
+import org.apache.spark.sql.hive.client.HiveVersion
+
+// Base trait for getting a temporary location for writing data
+private[hive] trait HiveTmpPath extends RunnableCommand {
+
+  var createdTempDir: Option[Path] = None
+
+  private var stagingDir: String = ""
+
+  def getExternalTmpPath(
+      sparkSession: SparkSession,
+      hadoopConf: Configuration,
+      path: Path): Path = {
+    import org.apache.spark.sql.hive.client.hive._
+
+    // Before Hive 1.1, when inserting into a table, Hive will create the staging directory under
+    // a common scratch directory. After the writing is finished, Hive will simply empty the table
+    // directory and move the staging directory to it.
+    // After Hive 1.1, Hive will create the staging directory under the table directory, and when
+    // moving staging directory to table directory, Hive will still empty the table directory, but
+    // will exclude the staging directory there.
+    // We have to follow the Hive behavior here, to avoid troubles. For example, if we create
+    // staging directory under the table director for Hive prior to 1.1, the staging directory will
+    // be removed by Hive when Hive is trying to empty the table directory.
+    val hiveVersionsUsingOldExternalTempPath: Set[HiveVersion] = Set(v12, v13, v14, v1_0)
+    val hiveVersionsUsingNewExternalTempPath: Set[HiveVersion] = Set(v1_1, v1_2, v2_0, v2_1)
+
+    // Ensure all the supported versions are considered here.
+    assert(hiveVersionsUsingNewExternalTempPath ++ hiveVersionsUsingOldExternalTempPath ==
+      allSupportedHiveVersions)
+
+    val externalCatalog = sparkSession.sharedState.externalCatalog
+    val hiveVersion = externalCatalog.asInstanceOf[HiveExternalCatalog].client.version
+    stagingDir = hadoopConf.get("hive.exec.stagingdir", ".hive-staging")
+    val scratchDir = hadoopConf.get("hive.exec.scratchdir", "/tmp/hive")
+
+    if (hiveVersionsUsingOldExternalTempPath.contains(hiveVersion)) {
+      oldVersionExternalTempPath(path, hadoopConf, scratchDir)
+    } else if (hiveVersionsUsingNewExternalTempPath.contains(hiveVersion)) {
+      newVersionExternalTempPath(path, hadoopConf, stagingDir)
+    } else {
+      throw new IllegalStateException("Unsupported hive version: " + hiveVersion.fullVersion)
+    }
+  }
+
+  def deleteExternalTmpPath(hadoopConf: Configuration): Unit = {
+    // Attempt to delete the staging directory and the inclusive files. If failed, the files are
+    // expected to be dropped at the normal termination of VM since deleteOnExit is used.
+    try {
+      createdTempDir.foreach { path =>
+        val fs = path.getFileSystem(hadoopConf)
+        if (fs.delete(path, true)) {
+          // If we successfully delete the staging directory, remove it from FileSystem's cache.
+          fs.cancelDeleteOnExit(path)
+        }
+      }
+    } catch {
+      case NonFatal(e) =>
+        logWarning(s"Unable to delete staging directory: $stagi
[GitHub] spark issue #19140: [SPARK-21890] Credentials not being passed to add the to...
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/19140 LGTM pending tests. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #18975: [SPARK-4131] Support "Writing data into the files...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/18975#discussion_r137398964 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/HiveTmpPath.scala --- @@ -0,0 +1,204 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.hive.execution + +import java.io.{File, IOException} +import java.net.URI +import java.text.SimpleDateFormat +import java.util.{Date, Locale, Random} + +import scala.util.control.NonFatal + +import org.apache.hadoop.conf.Configuration +import org.apache.hadoop.fs.{FileSystem, Path} +import org.apache.hadoop.hive.common.FileUtils +import org.apache.hadoop.hive.ql.exec.TaskRunner + +import org.apache.spark.sql.SparkSession +import org.apache.spark.sql.execution.command.RunnableCommand +import org.apache.spark.sql.hive.HiveExternalCatalog +import org.apache.spark.sql.hive.client.HiveVersion + +// Base trait for getting a temporary location for writing data +private[hive] trait HiveTmpPath extends RunnableCommand { + + var createdTempDir: Option[Path] = None + + private var stagingDir: String = "" + + def getExternalTmpPath( + sparkSession: SparkSession, + hadoopConf: Configuration, + path: Path): Path = { +import org.apache.spark.sql.hive.client.hive._ + +// Before Hive 1.1, when inserting into a table, Hive will create the staging directory under +// a common scratch directory. After the writing is finished, Hive will simply empty the table +// directory and move the staging directory to it. +// After Hive 1.1, Hive will create the staging directory under the table directory, and when +// moving staging directory to table directory, Hive will still empty the table directory, but +// will exclude the staging directory there. +// We have to follow the Hive behavior here, to avoid troubles. For example, if we create +// staging directory under the table directory for Hive prior to 1.1, the staging directory will +// be removed by Hive when Hive is trying to empty the table directory. +val hiveVersionsUsingOldExternalTempPath: Set[HiveVersion] = Set(v12, v13, v14, v1_0) +val hiveVersionsUsingNewExternalTempPath: Set[HiveVersion] = Set(v1_1, v1_2, v2_0, v2_1) + +// Ensure all the supported versions are considered here.
+assert(hiveVersionsUsingNewExternalTempPath ++ hiveVersionsUsingOldExternalTempPath == + allSupportedHiveVersions) + +val externalCatalog = sparkSession.sharedState.externalCatalog +val hiveVersion = externalCatalog.asInstanceOf[HiveExternalCatalog].client.version +stagingDir = hadoopConf.get("hive.exec.stagingdir", ".hive-staging") +val scratchDir = hadoopConf.get("hive.exec.scratchdir", "/tmp/hive") + +if (hiveVersionsUsingOldExternalTempPath.contains(hiveVersion)) { + oldVersionExternalTempPath(path, hadoopConf, scratchDir) +} else if (hiveVersionsUsingNewExternalTempPath.contains(hiveVersion)) { + newVersionExternalTempPath(path, hadoopConf, stagingDir) +} else { + throw new IllegalStateException("Unsupported hive version: " + hiveVersion.fullVersion) +} + } + + def deleteExternalTmpPath(hadoopConf : Configuration) : Unit = { +// Attempt to delete the staging directory and the inclusive files. If failed, the files are +// expected to be dropped at the normal termination of VM since deleteOnExit is used. +try { + createdTempDir.foreach { path => +val fs = path.getFileSystem(hadoopConf) +if (fs.delete(path, true)) { + // If we successfully delete the staging directory, remove it from FileSystem's cache. + fs.cancelDeleteOnExit(path) +} + } +} catch { + case NonFatal(e) => +logWarning(s"Unable to delete staging directory: $stagingDir")
[GitHub] spark pull request #18975: [SPARK-4131] Support "Writing data into the files...
Github user janewangfb commented on a diff in the pull request: https://github.com/apache/spark/pull/18975#discussion_r137398917 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala --- @@ -360,6 +360,27 @@ case class InsertIntoTable( } /** + * Insert query result into a directory. --- End diff -- added --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #18975: [SPARK-4131] Support "Writing data into the files...
Github user janewangfb commented on a diff in the pull request: https://github.com/apache/spark/pull/18975#discussion_r137398869 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala --- @@ -360,6 +360,27 @@ case class InsertIntoTable( } /** + * Insert query result into a directory. + * + * @param isLocal Indicates whether the specified directory is local directory + * @param storage Info about output file, row and what serialization format + * @param provider Specifies what data source to use; only used for data source file. + * @param child The query to be executed + * @param overwrite If true, the existing directory will be overwritten + */ +case class InsertIntoDir( +isLocal: Boolean, +storage: CatalogStorageFormat, +provider: Option[String], +child: LogicalPlan, +overwrite: Boolean = true) + extends LogicalPlan { + + override def children: Seq[LogicalPlan] = child :: Nil + override def output: Seq[Attribute] = Seq.empty --- End diff -- updated --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #18975: [SPARK-4131] Support "Writing data into the files...
Github user janewangfb commented on a diff in the pull request: https://github.com/apache/spark/pull/18975#discussion_r137398776 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/HiveTmpPath.scala --- @@ -0,0 +1,204 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.hive.execution + +import java.io.{File, IOException} +import java.net.URI +import java.text.SimpleDateFormat +import java.util.{Date, Locale, Random} + +import scala.util.control.NonFatal + +import org.apache.hadoop.conf.Configuration +import org.apache.hadoop.fs.{FileSystem, Path} +import org.apache.hadoop.hive.common.FileUtils +import org.apache.hadoop.hive.ql.exec.TaskRunner + +import org.apache.spark.sql.SparkSession +import org.apache.spark.sql.execution.command.RunnableCommand +import org.apache.spark.sql.hive.HiveExternalCatalog +import org.apache.spark.sql.hive.client.HiveVersion + +// Base trait for getting a temporary location for writing data +private[hive] trait HiveTmpPath extends RunnableCommand { --- End diff -- removed RunnableCommand; the classes that extend HiveTmpPath already extend RunnableCommand --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
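A minimal, self-contained sketch of the inheritance shape described above, using toy stand-in names (only `HiveTmpPath` and `RunnableCommand` come from this thread; everything else is invented for illustration):

```scala
// Toy stand-ins for Spark's RunnableCommand and the HiveTmpPath trait.
trait RunnableCommandLike {
  def run(): Seq[String]
}

// After the change: a plain helper trait that only knows how to derive a
// temporary staging location, with no command behavior of its own.
trait HiveTmpPathLike {
  def externalTmpPath(tableDir: String): String = s"$tableDir/.hive-staging"
}

// The concrete command mixes in both traits, so the helper trait itself no
// longer needs to extend the command trait.
case class InsertIntoHiveDirLike(tableDir: String)
  extends RunnableCommandLike with HiveTmpPathLike {
  override def run(): Seq[String] = Seq(s"staging under ${externalTmpPath(tableDir)}")
}
```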
[GitHub] spark pull request #18975: [SPARK-4131] Support "Writing data into the files...
Github user janewangfb commented on a diff in the pull request: https://github.com/apache/spark/pull/18975#discussion_r137398299 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala --- @@ -178,11 +179,60 @@ class AstBuilder(conf: SQLConf) extends SqlBaseBaseVisitor[AnyRef] with Logging } /** - * Add an INSERT INTO [TABLE]/INSERT OVERWRITE TABLE operation to the logical plan. + * Parameters used for writing query to a table: + * (tableIdentifier, partitionKeys, overwrite, exists). + */ + type InsertTableParams = (TableIdentifier, Map[String, Option[String]], Boolean, Boolean) + + /** + * Parameters used for writing query to a directory: (isLocal, CatalogStorageFormat, provider). + */ + type InsertDirParams = (Boolean, CatalogStorageFormat, Option[String]) + + /** + * Add an + * INSERT INTO [TABLE] or + * INSERT OVERWRITE TABLE or + * INSERT OVERWRITE [LOCAL] DIRECTORY + * operation to logical plan */ private def withInsertInto( ctx: InsertIntoContext, query: LogicalPlan): LogicalPlan = withOrigin(ctx) { +ctx match { + case table : InsertIntoTableContext => --- End diff -- updated --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19149: [SPARK-21652][SQL][FOLLOW-UP] Fix rule confliction betwe...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19149 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19149: [SPARK-21652][SQL][FOLLOW-UP] Fix rule confliction betwe...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19149 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81475/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #18975: [SPARK-4131] Support "Writing data into the files...
Github user janewangfb commented on a diff in the pull request: https://github.com/apache/spark/pull/18975#discussion_r137397811 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala --- @@ -178,11 +179,60 @@ class AstBuilder(conf: SQLConf) extends SqlBaseBaseVisitor[AnyRef] with Logging } /** - * Add an INSERT INTO [TABLE]/INSERT OVERWRITE TABLE operation to the logical plan. + * Parameters used for writing query to a table: + * (tableIdentifier, partitionKeys, overwrite, exists). + */ + type InsertTableParams = (TableIdentifier, Map[String, Option[String]], Boolean, Boolean) + + /** + * Parameters used for writing query to a directory: (isLocal, CatalogStorageFormat, provider). + */ + type InsertDirParams = (Boolean, CatalogStorageFormat, Option[String]) + + /** + * Add an + * INSERT INTO [TABLE] or + * INSERT OVERWRITE TABLE or + * INSERT OVERWRITE [LOCAL] DIRECTORY --- End diff -- added --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19149: [SPARK-21652][SQL][FOLLOW-UP] Fix rule confliction betwe...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19149 **[Test build #81475 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81475/testReport)** for PR 19149 at commit [`0be3b67`](https://github.com/apache/spark/commit/0be3b670713e3de5951b0c3d7abb53f2cbe9d96a). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19046: [SPARK-18769][yarn] Limit resource requests based on RM'...
Github user tgravescs commented on the issue: https://github.com/apache/spark/pull/19046 Unfortunately that doesn't make the cause clear to me, or what the issue is on the YARN side. I'm not sure what he means by "AM is releasing and then acquiring the reservations again and again until it has enough to run all tasks that it needs". Also, do you know whether "Due to the sustained backlog trigger doubling the request size each time it floods the scheduler with requests." is talking about the exponential increase in the container requests that Spark does? I.e., if we did all the requests at once up front, would it be better? I would also like to know what issue this causes in YARN. His last statement is also not always true: the MR AM only takes headroom into account with slow start and when preempting reducers to run maps. It may very well be that you are using slow start, and if you have it configured very aggressively (meaning reducers start very early) the requests could be spread out quite a bit, but you are also possibly wasting resources for the tradeoff of maybe finishing sooner, depending on the job. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18659: [SPARK-21190][PYSPARK][WIP] Simple Python Vectorized UDF...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18659 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81478/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18659: [SPARK-21190][PYSPARK][WIP] Simple Python Vectorized UDF...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18659 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18659: [SPARK-21190][PYSPARK][WIP] Simple Python Vectorized UDF...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18659 **[Test build #81478 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81478/testReport)** for PR 18659 at commit [`3efa7f2`](https://github.com/apache/spark/commit/3efa7f2cd9ea222ff27f9d39420c1c2c256d4fc2). * This patch **fails to build**. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19135: [SPARK-21923][CORE]Avoid call reserveUnrollMemoryForThis...
Github user jiangxb1987 commented on the issue: https://github.com/apache/spark/pull/19135 It would be great to test the performance on executors with various coresPerExecutor settings, to ensure we don't introduce regressions with this code change. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18659: [SPARK-21404][PYSPARK][WIP] Simple Python Vectorized UDF...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18659 **[Test build #81478 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81478/testReport)** for PR 18659 at commit [`3efa7f2`](https://github.com/apache/spark/commit/3efa7f2cd9ea222ff27f9d39420c1c2c256d4fc2). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19046: [SPARK-18769][yarn] Limit resource requests based on RM'...
Github user Tagar commented on the issue: https://github.com/apache/spark/pull/19046 @tgravescs, here's a quote from Wilfred Spiegelenburg - hope it answers both of your questions. > The behaviour I discussed earlier around the Spark AM reservations is not optimal. It turns out that the AM is releasing and then acquiring the reservations again and again until it has enough to run all tasks that it needs. This seems to be triggered by getting a container assigned to the application by the scheduler. Due to the sustained backlog trigger doubling the request size each time it floods the scheduler with requests. This issue will be logged as an internal jira at first. The next steps will be to discuss that behaviour with the Spark team with the goal of making it behave better on the cluster. The MR AM does behave better in this respect as it takes into account the available resources for the application via what is called "headroom". The Spark AM does not do this. thanks. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
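A toy model of the behavior that quote describes (invented for illustration; this is not Spark's actual `YarnAllocator` code): with a sustained backlog the request target doubles each round, and capping it by the RM-reported headroom, as this PR proposes, stops the flood of requests the scheduler can only reserve and release again.

```scala
// Hypothetical helper: one round of the sustained-backlog doubling heuristic,
// with the proposed headroom cap applied last.
def nextRequestTarget(current: Int, stillNeeded: Int, headroom: Int): Int = {
  val doubled = math.max(1, current * 2)       // sustained-backlog doubling
  val wanted  = math.min(doubled, stillNeeded) // never ask beyond what the job needs
  math.min(wanted, headroom)                   // this PR's idea: cap by RM headroom
}

// Without the final cap, the target ramps 1, 2, 4, 8, ... regardless of what
// the RM can actually grant.
```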
[GitHub] spark issue #19046: [SPARK-18769][yarn] Limit resource requests based on RM'...
Github user tgravescs commented on the issue: https://github.com/apache/spark/pull/19046 @Tagar can you be more specific about the problems you are seeing? How does this affect preemption? Why don't you see the same issues on MapReduce/Tez? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19124: [SPARK-21912][SQL] ORC/Parquet table should not c...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/19124#discussion_r137390245 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala --- @@ -201,13 +201,14 @@ case class AlterTableAddColumnsCommand( // make sure any partition columns are at the end of the fields val reorderedSchema = catalogTable.dataSchema ++ columns ++ catalogTable.partitionSchema +val newSchema = catalogTable.schema.copy(fields = reorderedSchema.toArray) SchemaUtils.checkColumnNameDuplication( reorderedSchema.map(_.name), "in the table definition of " + table.identifier, conf.caseSensitiveAnalysis) +DDLUtils.checkDataSchemaFieldNames(catalogTable.copy(schema = newSchema)) --- End diff -- It's okay. Inside `checkDataSchemaFieldNames`, we only use `table.dataSchema`, like the following. ``` ParquetSchemaConverter.checkFieldNames(table.dataSchema) ``` For partition columns, we have been allowing special characters. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
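For readers skimming the thread, a rough approximation of what a data-schema-only field-name check does (this sketch is not the real `ParquetSchemaConverter.checkFieldNames` source; the invalid-character set mirrors Parquet's, and the real code raises `AnalysisException`):

```scala
import org.apache.spark.sql.types.StructType

// Only the data schema is passed in, so partition columns (which may contain
// special characters) are never validated here.
def checkDataSchemaFieldNames(dataSchema: StructType): Unit = {
  dataSchema.fieldNames.foreach { name =>
    if (name.exists(c => " ,;{}()\n\t=".contains(c))) {
      throw new IllegalArgumentException(
        s"""Attribute name "$name" contains invalid character(s).""")
    }
  }
}
```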
[GitHub] spark pull request #18975: [SPARK-4131] Support "Writing data into the files...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/18975#discussion_r137388410 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala --- @@ -360,6 +360,27 @@ case class InsertIntoTable( } /** + * Insert query result into a directory. --- End diff -- Please add the comment like > Note that this plan is unresolved and has to be replaced by the concrete implementations during analysis. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #18975: [SPARK-4131] Support "Writing data into the files...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/18975#discussion_r137388263 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala --- @@ -360,6 +360,27 @@ case class InsertIntoTable( } /** + * Insert query result into a directory. + * + * @param isLocal Indicates whether the specified directory is local directory + * @param storage Info about output file, row and what serialization format + * @param provider Specifies what data source to use; only used for data source file. + * @param child The query to be executed + * @param overwrite If true, the existing directory will be overwritten + */ +case class InsertIntoDir( +isLocal: Boolean, +storage: CatalogStorageFormat, +provider: Option[String], +child: LogicalPlan, +overwrite: Boolean = true) + extends LogicalPlan { + + override def children: Seq[LogicalPlan] = child :: Nil + override def output: Seq[Attribute] = Seq.empty --- End diff -- Set `override lazy val resolved: Boolean = false` --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
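Applying this suggestion (together with the doc-comment note requested in the sibling thread) to the quoted class would look roughly like the following; this mirrors the diff above rather than the final merged code:

```scala
import org.apache.spark.sql.catalyst.catalog.CatalogStorageFormat
import org.apache.spark.sql.catalyst.expressions.Attribute
import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan

/**
 * Insert query result into a directory.
 *
 * Note that this plan is unresolved and has to be replaced by the concrete
 * implementations during analysis.
 */
case class InsertIntoDir(
    isLocal: Boolean,
    storage: CatalogStorageFormat,
    provider: Option[String],
    child: LogicalPlan,
    overwrite: Boolean = true)
  extends LogicalPlan {

  override def children: Seq[LogicalPlan] = child :: Nil
  override def output: Seq[Attribute] = Seq.empty
  // The suggested change: a placeholder plan should never report itself resolved,
  // forcing the analyzer to replace it before execution.
  override lazy val resolved: Boolean = false
}
```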
[GitHub] spark pull request #18975: [SPARK-4131] Support "Writing data into the files...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/18975#discussion_r137387725 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/HiveTmpPath.scala --- @@ -0,0 +1,204 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.hive.execution + +import java.io.{File, IOException} +import java.net.URI +import java.text.SimpleDateFormat +import java.util.{Date, Locale, Random} + +import scala.util.control.NonFatal + +import org.apache.hadoop.conf.Configuration +import org.apache.hadoop.fs.{FileSystem, Path} +import org.apache.hadoop.hive.common.FileUtils +import org.apache.hadoop.hive.ql.exec.TaskRunner + +import org.apache.spark.sql.SparkSession +import org.apache.spark.sql.execution.command.RunnableCommand +import org.apache.spark.sql.hive.HiveExternalCatalog +import org.apache.spark.sql.hive.client.HiveVersion + +// Base trait for getting a temporary location for writing data +private[hive] trait HiveTmpPath extends RunnableCommand { --- End diff -- No need to extend `RunnableCommand` --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #18975: [SPARK-4131] Support "Writing data into the files...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/18975#discussion_r137387842 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/HiveTmpPath.scala --- @@ -0,0 +1,204 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.hive.execution + +import java.io.{File, IOException} +import java.net.URI +import java.text.SimpleDateFormat +import java.util.{Date, Locale, Random} + +import scala.util.control.NonFatal + +import org.apache.hadoop.conf.Configuration +import org.apache.hadoop.fs.{FileSystem, Path} +import org.apache.hadoop.hive.common.FileUtils +import org.apache.hadoop.hive.ql.exec.TaskRunner + +import org.apache.spark.sql.SparkSession +import org.apache.spark.sql.execution.command.RunnableCommand +import org.apache.spark.sql.hive.HiveExternalCatalog +import org.apache.spark.sql.hive.client.HiveVersion + +// Base trait for getting a temporary location for writing data +private[hive] trait HiveTmpPath extends RunnableCommand { --- End diff -- Instead, we should extend it in the classes that extend HiveTmpPath --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #18975: [SPARK-4131] Support "Writing data into the files...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/18975#discussion_r137386317 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala --- @@ -178,11 +179,60 @@ class AstBuilder(conf: SQLConf) extends SqlBaseBaseVisitor[AnyRef] with Logging } /** - * Add an INSERT INTO [TABLE]/INSERT OVERWRITE TABLE operation to the logical plan. + * Parameters used for writing query to a table: + * (tableIdentifier, partitionKeys, overwrite, exists). + */ + type InsertTableParams = (TableIdentifier, Map[String, Option[String]], Boolean, Boolean) + + /** + * Parameters used for writing query to a directory: (isLocal, CatalogStorageFormat, provider). + */ + type InsertDirParams = (Boolean, CatalogStorageFormat, Option[String]) + + /** + * Add an + * INSERT INTO [TABLE] or + * INSERT OVERWRITE TABLE or + * INSERT OVERWRITE [LOCAL] DIRECTORY + * operation to logical plan */ private def withInsertInto( ctx: InsertIntoContext, query: LogicalPlan): LogicalPlan = withOrigin(ctx) { +ctx match { + case table : InsertIntoTableContext => --- End diff -- Nit: `table :`-> `table:` The same issue in line 206 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #18975: [SPARK-4131] Support "Writing data into the files...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/18975#discussion_r137385849 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala --- @@ -178,11 +179,60 @@ class AstBuilder(conf: SQLConf) extends SqlBaseBaseVisitor[AnyRef] with Logging } /** - * Add an INSERT INTO [TABLE]/INSERT OVERWRITE TABLE operation to the logical plan. + * Parameters used for writing query to a table: + * (tableIdentifier, partitionKeys, overwrite, exists). + */ + type InsertTableParams = (TableIdentifier, Map[String, Option[String]], Boolean, Boolean) + + /** + * Parameters used for writing query to a directory: (isLocal, CatalogStorageFormat, provider). + */ + type InsertDirParams = (Boolean, CatalogStorageFormat, Option[String]) + + /** + * Add an + * INSERT INTO [TABLE] or + * INSERT OVERWRITE TABLE or + * INSERT OVERWRITE [LOCAL] DIRECTORY --- End diff -- Let us put the complete syntax here. ``` {{{ INSERT OVERWRITE TABLE tableIdentifier [partitionSpec [IF NOT EXISTS]]? INSERT INTO [TABLE] tableIdentifier [partitionSpec] INSERT OVERWRITE [LOCAL] DIRECTORY path=STRING [rowFormat] [createFileFormat] INSERT OVERWRITE [LOCAL] DIRECTORY [path=STRING] tableProvider [OPTIONS options=tablePropertyList] }}} ``` --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
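For reference, the four grammar forms above correspond to statements like these (table names and paths are made up; assumes a `SparkSession` named `spark`):

```scala
// INSERT OVERWRITE TABLE with a partition spec
spark.sql("INSERT OVERWRITE TABLE dst PARTITION (dt = '2017-09-06') SELECT * FROM src")

// INSERT INTO [TABLE]
spark.sql("INSERT INTO TABLE dst SELECT * FROM src")

// INSERT OVERWRITE [LOCAL] DIRECTORY with a Hive row format
spark.sql(
  """INSERT OVERWRITE LOCAL DIRECTORY '/tmp/hive_out'
    |ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
    |SELECT * FROM src""".stripMargin)

// INSERT OVERWRITE DIRECTORY with a data source provider
spark.sql("INSERT OVERWRITE DIRECTORY '/tmp/parquet_out' USING parquet SELECT * FROM src")
```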
[GitHub] spark issue #18975: [SPARK-4131] Support "Writing data into the filesystem f...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18975 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18975: [SPARK-4131] Support "Writing data into the filesystem f...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18975 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81468/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18975: [SPARK-4131] Support "Writing data into the filesystem f...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18975 **[Test build #81468 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81468/testReport)** for PR 18975 at commit [`b461e00`](https://github.com/apache/spark/commit/b461e00e425660e33fdbc24a75884a2a2e2da4b8). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19086: [SPARK-21874][SQL] Support changing database when rename...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/19086 After more discussion with others, this behavior change really bothers us. Neither change sounds good to us. Where does the requirement come from? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19124: [SPARK-21912][SQL] ORC/Parquet table should not c...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/19124#discussion_r137383267 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala --- @@ -201,13 +201,14 @@ case class AlterTableAddColumnsCommand( // make sure any partition columns are at the end of the fields val reorderedSchema = catalogTable.dataSchema ++ columns ++ catalogTable.partitionSchema +val newSchema = catalogTable.schema.copy(fields = reorderedSchema.toArray) SchemaUtils.checkColumnNameDuplication( reorderedSchema.map(_.name), "in the table definition of " + table.identifier, conf.caseSensitiveAnalysis) +DDLUtils.checkDataSchemaFieldNames(catalogTable.copy(schema = newSchema)) --- End diff -- `newSchema ` also contains `partition schema`. How about partition schema? Do we have the same limits on it? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19140: [SPARK-21890] Credentials not being passed to add the to...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19140 **[Test build #81477 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81477/testReport)** for PR 19140 at commit [`98f0ff2`](https://github.com/apache/spark/commit/98f0ff2a655c398e5b502ce2b340dfac88b385e9). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19083: [SPARK-21871][SQL] Check actual bytecode size whe...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/19083#discussion_r137378351 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CodeGenerationSuite.scala --- @@ -98,19 +99,23 @@ class CodeGenerationSuite extends SparkFunSuite with ExpressionEvalHelper { } test("SPARK-18091: split large if expressions into blocks due to JVM code size limit") { -var strExpr: Expression = Literal("abc") -for (_ <- 1 to 150) { - strExpr = Decode(Encode(strExpr, "utf-8"), "utf-8") -} +// Set the max value at `WHOLESTAGE_HUGE_METHOD_LIMIT` to compile gen'd code by janino +withSQLConf(SQLConf.WHOLESTAGE_HUGE_METHOD_LIMIT.key -> Int.MaxValue.toString) { --- End diff -- Why do we need to change the value of `WHOLESTAGE_HUGE_METHOD_LIMIT` when this test is not for whole-stage codegen? Could we choose a better name for `SQLConf.WHOLESTAGE_HUGE_METHOD_LIMIT`? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #17435: [SPARK-20098][PYSPARK] dataType's typeName fix
Github user holdenk commented on the issue: https://github.com/apache/spark/pull/17435 Gentle ping (going through old PySpark PRs) :) --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19149: [SPARK-21652][SQL][FOLLOW-UP] Fix rule confliction betwe...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19149 **[Test build #81476 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81476/testReport)** for PR 19149 at commit [`6fc7140`](https://github.com/apache/spark/commit/6fc714051bed53e01941c2d661ce955137534c95). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19149: [SPARK-21652][SQL][FOLLOW-UP] Fix rule confliction betwe...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19149 **[Test build #81475 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81475/testReport)** for PR 19149 at commit [`0be3b67`](https://github.com/apache/spark/commit/0be3b670713e3de5951b0c3d7abb53f2cbe9d96a). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19140: [SPARK-21890] Credentials not being passed to add the to...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19140 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19140: [SPARK-21890] Credentials not being passed to add the to...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19140 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81474/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19140: [SPARK-21890] Credentials not being passed to add the to...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19140 **[Test build #81474 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81474/testReport)** for PR 19140 at commit [`1184a73`](https://github.com/apache/spark/commit/1184a735124d6119f515257a75b0721efc3adff9). * This patch **fails Scala style tests**. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org