[GitHub] spark issue #18975: [SPARK-4131] Support "Writing data into the filesystem f...

2017-08-21 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18975
  
**[Test build #80959 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80959/testReport)** for PR 18975 at commit [`0882dd1`](https://github.com/apache/spark/commit/0882dd1f3c300f832d731b69a0d57ef461e55038).





[GitHub] spark issue #19015: [SPARK-21803] [TEST] Remove the HiveDDLCommandSuite

2017-08-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19015
  
Merged build finished. Test FAILed.





[GitHub] spark issue #19015: [SPARK-21803] [TEST] Remove the HiveDDLCommandSuite

2017-08-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19015
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80956/
Test FAILed.





[GitHub] spark issue #18704: [SPARK-20783][SQL] Create CachedBatchColumnVector to abs...

2017-08-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18704
  
Merged build finished. Test FAILed.





[GitHub] spark issue #19015: [SPARK-21803] [TEST] Remove the HiveDDLCommandSuite

2017-08-21 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19015
  
**[Test build #80956 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80956/testReport)** for PR 19015 at commit [`191bde1`](https://github.com/apache/spark/commit/191bde194bbb56c40f5d33e8fbaf5c3505d792cc).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #18704: [SPARK-20783][SQL] Create CachedBatchColumnVector to abs...

2017-08-21 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18704
  
**[Test build #80958 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80958/testReport)** for PR 18704 at commit [`a24a971`](https://github.com/apache/spark/commit/a24a971ed61f054766e3ed8212c2035f1d391d54).
 * This patch **fails to build**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #18704: [SPARK-20783][SQL] Create CachedBatchColumnVector to abs...

2017-08-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18704
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80958/
Test FAILed.





[GitHub] spark issue #18704: [SPARK-20783][SQL] Create CachedBatchColumnVector to abs...

2017-08-21 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18704
  
**[Test build #80958 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80958/testReport)** for PR 18704 at commit [`a24a971`](https://github.com/apache/spark/commit/a24a971ed61f054766e3ed8212c2035f1d391d54).





[GitHub] spark issue #18641: [SPARK-21413][SQL] Fix 64KB JVM bytecode limit problem i...

2017-08-21 Thread kiszk
Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/18641
  
ping @cloud-fan





[GitHub] spark pull request #18492: [SPARK-19326] Speculated task attempts do not get...

2017-08-21 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/18492#discussion_r134387763
  
--- Diff: core/src/test/scala/org/apache/spark/ExecutorAllocationManagerSuite.scala ---
@@ -188,6 +188,40 @@ class ExecutorAllocationManagerSuite
     assert(numExecutorsTarget(manager) === 10)
   }
 
+  test("add executors when speculative tasks added") {
+    sc = createSparkContext(0, 10, 0)
+    val manager = sc.executorAllocationManager.get
+
+    // Verify that we're capped at number of tasks including the speculative ones in the stage
+    sc.listenerBus.postToAll(SparkListenerSpeculativeTaskSubmitted(1))
+    assert(numExecutorsTarget(manager) === 0)
+    assert(numExecutorsToAdd(manager) === 1)
+    assert(addExecutors(manager) === 1)
+    sc.listenerBus.postToAll(SparkListenerSpeculativeTaskSubmitted(1))
+    sc.listenerBus.postToAll(SparkListenerSpeculativeTaskSubmitted(1))
+    sc.listenerBus.postToAll(SparkListenerStageSubmitted(createStageInfo(1, 2)))
+    assert(numExecutorsTarget(manager) === 1)
+    assert(numExecutorsToAdd(manager) === 2)
+    assert(addExecutors(manager) === 2)
+    assert(numExecutorsTarget(manager) === 3)
+    assert(numExecutorsToAdd(manager) === 4)
+    assert(addExecutors(manager) === 2)
+    assert(numExecutorsTarget(manager) === 5)
+    assert(numExecutorsToAdd(manager) === 1)
+
+    // Verify that running a task doesn't affect the target
--- End diff --

Can you explain this test a bit more? Why can the first 3 `SparkListenerSpeculativeTaskSubmitted` events trigger allocating more executors, while here we don't?





[GitHub] spark issue #18962: [SPARK-21714][CORE][YARN] Avoiding re-uploading remote r...

2017-08-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18962
  
Merged build finished. Test PASSed.





[GitHub] spark issue #18962: [SPARK-21714][CORE][YARN] Avoiding re-uploading remote r...

2017-08-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18962
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80951/
Test PASSed.





[GitHub] spark issue #18962: [SPARK-21714][CORE][YARN] Avoiding re-uploading remote r...

2017-08-21 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18962
  
**[Test build #80951 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80951/testReport)** for PR 18962 at commit [`16ce99f`](https://github.com/apache/spark/commit/16ce99fc1cea9260a96dae98f031bda9f8ed18f4).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #19015: [SPARK-21803] [TEST] Remove the HiveDDLCommandSuite

2017-08-21 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19015
  
**[Test build #80957 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80957/testReport)** for PR 19015 at commit [`391c4db`](https://github.com/apache/spark/commit/391c4db6cc338f3fcbf8e8d4fd43e3a6dcb365ba).





[GitHub] spark issue #18973: [SPARK-21765] Set isStreaming on leaf nodes for streamin...

2017-08-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18973
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80952/
Test PASSed.





[GitHub] spark issue #18973: [SPARK-21765] Set isStreaming on leaf nodes for streamin...

2017-08-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18973
  
Merged build finished. Test PASSed.





[GitHub] spark issue #18973: [SPARK-21765] Set isStreaming on leaf nodes for streamin...

2017-08-21 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18973
  
**[Test build #80952 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80952/testReport)** for PR 18973 at commit [`8857cf5`](https://github.com/apache/spark/commit/8857cf51f142865063c53e4a7089dd027db4d3c3).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request #19015: [SPARK-21803] [TEST] Remove the HiveDDLCommandSui...

2017-08-21 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/19015#discussion_r134385759
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLCommandSuite.scala
 ---
@@ -22,19 +22,26 @@ import java.util.Locale
 
 import scala.reflect.{classTag, ClassTag}
 
+import org.apache.spark.sql.{AnalysisException, SaveMode}
 import org.apache.spark.sql.catalyst.TableIdentifier
+import org.apache.spark.sql.catalyst.analysis.UnresolvedAttribute
 import org.apache.spark.sql.catalyst.catalog._
+import org.apache.spark.sql.catalyst.dsl.expressions._
+import org.apache.spark.sql.catalyst.dsl.plans
+import org.apache.spark.sql.catalyst.dsl.plans.DslLogicalPlan
+import org.apache.spark.sql.catalyst.expressions.JsonTuple
 import org.apache.spark.sql.catalyst.parser.ParseException
 import org.apache.spark.sql.catalyst.plans.PlanTest
-import org.apache.spark.sql.catalyst.plans.logical.Project
+import org.apache.spark.sql.catalyst.plans.logical.{Generate, LogicalPlan, 
Project, ScriptTransformation}
 import org.apache.spark.sql.execution.SparkSqlParser
 import org.apache.spark.sql.execution.datasources.CreateTable
 import org.apache.spark.sql.internal.{HiveSerDe, SQLConf}
+import org.apache.spark.sql.test.SharedSQLContext
 import org.apache.spark.sql.types.{IntegerType, StringType, StructField, 
StructType}
 
 
 // TODO: merge this with DDLSuite (SPARK-14441)
-class DDLCommandSuite extends PlanTest {
+class DDLCommandSuite extends PlanTest with SharedSQLContext {
--- End diff --

Sure.





[GitHub] spark issue #19015: [SPARK-21803] [TEST] Remove the HiveDDLCommandSuite

2017-08-21 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/19015
  
LGTM





[GitHub] spark pull request #19015: [SPARK-21803] [TEST] Remove the HiveDDLCommandSui...

2017-08-21 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/19015#discussion_r134385680
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLCommandSuite.scala
 ---
@@ -22,19 +22,26 @@ import java.util.Locale
 
 import scala.reflect.{classTag, ClassTag}
 
+import org.apache.spark.sql.{AnalysisException, SaveMode}
 import org.apache.spark.sql.catalyst.TableIdentifier
+import org.apache.spark.sql.catalyst.analysis.UnresolvedAttribute
 import org.apache.spark.sql.catalyst.catalog._
+import org.apache.spark.sql.catalyst.dsl.expressions._
+import org.apache.spark.sql.catalyst.dsl.plans
+import org.apache.spark.sql.catalyst.dsl.plans.DslLogicalPlan
+import org.apache.spark.sql.catalyst.expressions.JsonTuple
 import org.apache.spark.sql.catalyst.parser.ParseException
 import org.apache.spark.sql.catalyst.plans.PlanTest
-import org.apache.spark.sql.catalyst.plans.logical.Project
+import org.apache.spark.sql.catalyst.plans.logical.{Generate, LogicalPlan, 
Project, ScriptTransformation}
 import org.apache.spark.sql.execution.SparkSqlParser
 import org.apache.spark.sql.execution.datasources.CreateTable
 import org.apache.spark.sql.internal.{HiveSerDe, SQLConf}
+import org.apache.spark.sql.test.SharedSQLContext
 import org.apache.spark.sql.types.{IntegerType, StringType, StructField, 
StructType}
 
 
 // TODO: merge this with DDLSuite (SPARK-14441)
-class DDLCommandSuite extends PlanTest {
+class DDLCommandSuite extends PlanTest with SharedSQLContext {
--- End diff --

shall we rename it to `DDLParserSuite`?





[GitHub] spark issue #18957: [SPARK-21744][CORE] Add retry logic for new broadcast in...

2017-08-21 Thread jiangxb1987
Github user jiangxb1987 commented on the issue:

https://github.com/apache/spark/pull/18957
  
Currently we don't retry for Broadcast. It may be worthwhile to add general-purpose retry logic for Broadcast so that it stays consistent with Spark jobs, but we shouldn't retry only for one specific scenario.
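For illustration only, here is a minimal, generic sketch of what such general-purpose retry logic could look like. The helper below is hypothetical and not part of Spark's API; it just shows a retry wrapper that is agnostic to what it retries.

```
import scala.annotation.tailrec
import scala.util.{Failure, Success, Try}

// Hypothetical helper, not Spark code: retry an arbitrary action up to
// maxAttempts times before rethrowing, the way a general-purpose (rather than
// broadcast-specific) retry policy might.
object RetrySketch {
  @tailrec
  def retry[T](maxAttempts: Int)(action: => T): T = Try(action) match {
    case Success(value) => value
    case Failure(_) if maxAttempts > 1 =>
      // A real implementation would likely log the failure and back off here.
      retry(maxAttempts - 1)(action)
    case Failure(e) => throw e
  }

  def main(args: Array[String]): Unit = {
    var calls = 0
    val result = retry(3) {
      calls += 1
      if (calls < 3) throw new RuntimeException(s"transient failure #$calls")
      "ok"
    }
    println(s"succeeded after $calls attempts: $result") // succeeded after 3 attempts: ok
  }
}
```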





[GitHub] spark issue #18974: [SPARK-21750][SQL] Use Arrow 0.6.0

2017-08-21 Thread BryanCutler
Github user BryanCutler commented on the issue:

https://github.com/apache/spark/pull/18974
  
Thanks for this @kiszk. I was thinking we would need an upgrade for DecimalType support. I'm going to help out with that on the Arrow side, but it still might not be ready for another 1 or 2 releases. I'm not sure what the general Spark stance is on updating dependencies like Arrow, but I did test 0.6 myself and did not see anything that might cause issues. Maybe someone else can share the policy on upgrading?





[GitHub] spark issue #17849: [SPARK-10931][ML][PYSPARK] PySpark Models Copy Param Val...

2017-08-21 Thread WeichenXu123
Github user WeichenXu123 commented on the issue:

https://github.com/apache/spark/pull/17849
  
What do you think about this ? @jkbradley





[GitHub] spark issue #18968: [SPARK-21759][SQL] In.checkInputDataTypes should not wro...

2017-08-21 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/18968
  
After we correctly define the data type of `ListQuery`, can we remove the special handling of `ListQuery` in `In.checkInputDataTypes`?

We can add `ListQuery.childOutputs: Seq[Attribute]`, so that even if we extend the project list of `ListQuery.plan`, we still keep the correct data type:
```
def dataType = if (childOutputs.length > 1) childOutputs.toStructType else childOutputs.head.dataType
```
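To make the suggestion concrete, here is a minimal, self-contained sketch. The class `ListQuerySketch` and the demo are hypothetical stand-ins, not Spark's actual `ListQuery`; the point is that the expression keeps its own output attributes, so its data type is derived from them rather than from whatever the underlying plan currently projects.

```
import org.apache.spark.sql.catalyst.expressions.{Attribute, AttributeReference}
import org.apache.spark.sql.types.{DataType, IntegerType, StringType, StructField, StructType}

// Hypothetical, simplified stand-in for ListQuery, only to illustrate the idea
// from the comment above: derive dataType from the recorded output attributes.
case class ListQuerySketch(childOutputs: Seq[Attribute]) {
  def dataType: DataType =
    if (childOutputs.length > 1) {
      // Multiple output columns: expose them as a struct.
      StructType(childOutputs.map(a => StructField(a.name, a.dataType, a.nullable)))
    } else {
      // Single output column: use its type directly.
      childOutputs.head.dataType
    }
}

object ListQuerySketchDemo {
  def main(args: Array[String]): Unit = {
    val a = AttributeReference("a", IntegerType)()
    val b = AttributeReference("b", StringType)()
    println(ListQuerySketch(Seq(a)).dataType)    // IntegerType
    println(ListQuerySketch(Seq(a, b)).dataType) // StructType with fields a and b
  }
}
```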





[GitHub] spark issue #18975: [SPARK-4131] Support "Writing data into the filesystem f...

2017-08-21 Thread janewangfb
Github user janewangfb commented on the issue:

https://github.com/apache/spark/pull/18975
  
@gatorsmile Plan-parsing unit tests are already added in DDLCommandSuite.





[GitHub] spark pull request #18975: [SPARK-4131] Support "Writing data into the files...

2017-08-21 Thread janewangfb
Github user janewangfb commented on a diff in the pull request:

https://github.com/apache/spark/pull/18975#discussion_r134383117
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/UnsupportedOperationChecker.scala
 ---
@@ -142,10 +142,14 @@ object UnsupportedOperationChecker {
 "Distinct aggregations are not supported on streaming 
DataFrames/Datasets. Consider " +
   "using approx_count_distinct() instead.")
 
+
--- End diff --

reverted





[GitHub] spark issue #18945: Add option to convert nullable int columns to float colu...

2017-08-21 Thread BryanCutler
Github user BryanCutler commented on the issue:

https://github.com/apache/spark/pull/18945
  
Thanks for clarifying @HyukjinKwon, I see what you mean now. Since pandas will iterate over `self.collect()` anyway, I don't think your solution would impact performance at all, right? So your way might be better, though it is slightly more complicated.

Just to sum things up: @logannc, does this still meet your requirements? Instead of having the `strict = True` option, we do the following:
```
for each nullable int32 column:
    if there are null values:
        change column type to float32
    else:
        change column type to int32
```

I'm also guessing we will have the same problem with nullable ShortType - maybe others?





[GitHub] spark pull request #18975: [SPARK-4131] Support "Writing data into the files...

2017-08-21 Thread janewangfb
Github user janewangfb commented on a diff in the pull request:

https://github.com/apache/spark/pull/18975#discussion_r134382990
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/command/InsertIntoDataSourceDirCommand.scala
 ---
@@ -0,0 +1,65 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.command
+
+import org.apache.spark.sql._
+import org.apache.spark.sql.catalyst.catalog._
+import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
+import org.apache.spark.sql.execution.datasources._
+
+/**
+ * A command used to write the result of a query to a directory.
+ *
+ * The syntax of using this command in SQL is:
+ * {{{
+ *   INSERT OVERWRITE DIRECTORY (path=STRING)?
+ *   USING format OPTIONS ([option1_name "option1_value", option2_name 
"option2_value", ...])
+ *   SELECT ...
+ * }}}
+ */
+case class InsertIntoDataSourceDirCommand(
+storage: CatalogStorageFormat,
+provider: Option[String],
+query: LogicalPlan) extends RunnableCommand {
+
+  override def innerChildren: Seq[LogicalPlan] = Seq(query)
+
+  override def run(sparkSession: SparkSession): Seq[Row] = {
+assert(innerChildren.length == 1)
+assert(!storage.locationUri.isEmpty)
--- End diff --

updated.





[GitHub] spark pull request #18975: [SPARK-4131] Support "Writing data into the files...

2017-08-21 Thread janewangfb
Github user janewangfb commented on a diff in the pull request:

https://github.com/apache/spark/pull/18975#discussion_r134383033
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala ---
@@ -1509,4 +1509,84 @@ class SparkSqlAstBuilder(conf: SQLConf) extends 
AstBuilder(conf) {
   query: LogicalPlan): LogicalPlan = {
 RepartitionByExpression(expressions, query, conf.numShufflePartitions)
   }
+
+  /**
+   * Return the parameters for [[InsertIntoDir]] logical plan.
+   *
+   * Expected format:
+   * {{{
+   *   INSERT OVERWRITE DIRECTORY
+   *   [path]
+   *   [OPTIONS table_property_list]
+   *   select_statement;
+   * }}}
+   */
+  override def visitInsertOverwriteDir(
+  ctx: InsertOverwriteDirContext): InsertDirParams = withOrigin(ctx) {
+val options = 
Option(ctx.options).map(visitPropertyKeyValues).getOrElse(Map.empty)
+var storage = DataSource.buildStorageFormatFromOptions(options)
+
+val path = Option(ctx.path) match {
+  case Some(s) => string(s)
+  case None => ""
+}
+
+if (!path.isEmpty && storage.locationUri.isDefined) {
+  throw new ParseException(
+"Directory path and 'path' in OPTIONS are both used to indicate 
the directory path, " +
+  "you can only specify one of them.", ctx)
+}
+if (path.isEmpty && !storage.locationUri.isDefined) {
+  throw new ParseException(
+"You need to specify directory path or 'path' in OPTIONS, but not 
both", ctx)
+}
+
+if (!path.isEmpty) {
+  val customLocation = Some(CatalogUtils.stringToURI(path))
+  storage = storage.copy(locationUri = customLocation)
+}
+
+val provider = ctx.tableProvider.qualifiedName.getText
+
+(false, storage, Some(provider))
+  }
+
+  /**
+   * Return the parameters for [[InsertIntoDir]] logical plan.
+   *
+   * Expected format:
+   * {{{
+   *   INSERT OVERWRITE DIRECTORY
+   *   path
--- End diff --

added [LOCAL]
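As a usage illustration of the expected formats quoted above, here is a minimal sketch, assuming the PR's `INSERT OVERWRITE [LOCAL] DIRECTORY` syntax as merged. The session setup, the `people` view, and the output paths are hypothetical; the two statements mirror the Hive-style variant exercised in the quoted tests and the `USING format` variant described in the doc comments.

```
import org.apache.spark.sql.SparkSession

object InsertOverwriteDirSketch {
  def main(args: Array[String]): Unit = {
    // Hypothetical local session; Hive support is needed for the Hive-style variant.
    val spark = SparkSession.builder()
      .appName("insert-overwrite-directory-sketch")
      .master("local[*]")
      .enableHiveSupport()
      .getOrCreate()
    import spark.implicits._

    Seq((1, "a"), (2, "b")).toDF("key", "value").createOrReplaceTempView("people")

    // Hive-style: write the query result as ORC files into a local directory.
    spark.sql(
      """INSERT OVERWRITE LOCAL DIRECTORY '/tmp/insert_dir_hive'
        |STORED AS orc
        |SELECT * FROM people WHERE key < 10""".stripMargin)

    // Data-source style (per the doc comment in the quoted diff): write the
    // result using a data source format such as parquet.
    spark.sql(
      """INSERT OVERWRITE DIRECTORY '/tmp/insert_dir_ds'
        |USING parquet
        |SELECT * FROM people WHERE key < 10""".stripMargin)

    spark.stop()
  }
}
```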





[GitHub] spark pull request #18975: [SPARK-4131] Support "Writing data into the files...

2017-08-21 Thread janewangfb
Github user janewangfb commented on a diff in the pull request:

https://github.com/apache/spark/pull/18975#discussion_r134382855
  
--- Diff: 
sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLQuerySuite.scala 
---
@@ -2040,4 +2040,80 @@ class SQLQuerySuite extends QueryTest with 
SQLTestUtils with TestHiveSingleton {
   assert(setOfPath.size() == pathSizeToDeleteOnExit)
 }
   }
+
+  test("insert overwrite to dir from hive metastore table") {
+import org.apache.spark.util.Utils
+
+val path = Utils.createTempDir()
+path.delete()
+checkAnswer(
+  sql(s"INSERT OVERWRITE LOCAL DIRECTORY '${path.toString}' SELECT * 
FROM src where key < 10"),
+  Seq.empty[Row])
+
+checkAnswer(
+  sql(s"""INSERT OVERWRITE LOCAL DIRECTORY '${path.toString}'
+ |STORED AS orc
+ |SELECT * FROM src where key < 10""".stripMargin),
+  Seq.empty[Row])
+
+// use orc data source to check the data of path is right.
+sql(
+  s"""CREATE TEMPORARY TABLE orc_source
+ |USING org.apache.spark.sql.hive.orc
+ |OPTIONS (
+ |  PATH '${path.getCanonicalPath}'
+ |)
+   """.stripMargin)
+checkAnswer(
+  sql("select * from orc_source"),
+  sql("select * from src where key < 10").collect()
+)
+
+Utils.deleteRecursively(path)
+dropTempTable("orc_source")
+  }
+
+  test("insert overwrite to dir from temp table") {
+import org.apache.spark.util.Utils
+
+sparkContext
+  .parallelize(1 to 10)
+  .map(i => TestData(i, i.toString))
+  .toDF()
+  .registerTempTable("test_insert_table")
--- End diff --

updated





[GitHub] spark pull request #18975: [SPARK-4131] Support "Writing data into the files...

2017-08-21 Thread janewangfb
Github user janewangfb commented on a diff in the pull request:

https://github.com/apache/spark/pull/18975#discussion_r134382822
  
--- Diff: 
sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLQuerySuite.scala 
---
@@ -2040,4 +2040,80 @@ class SQLQuerySuite extends QueryTest with 
SQLTestUtils with TestHiveSingleton {
   assert(setOfPath.size() == pathSizeToDeleteOnExit)
 }
   }
+
+  test("insert overwrite to dir from hive metastore table") {
+import org.apache.spark.util.Utils
+
+val path = Utils.createTempDir()
+path.delete()
+checkAnswer(
+  sql(s"INSERT OVERWRITE LOCAL DIRECTORY '${path.toString}' SELECT * 
FROM src where key < 10"),
+  Seq.empty[Row])
+
+checkAnswer(
+  sql(s"""INSERT OVERWRITE LOCAL DIRECTORY '${path.toString}'
+ |STORED AS orc
+ |SELECT * FROM src where key < 10""".stripMargin),
+  Seq.empty[Row])
+
+// use orc data source to check the data of path is right.
+sql(
+  s"""CREATE TEMPORARY TABLE orc_source
+ |USING org.apache.spark.sql.hive.orc
+ |OPTIONS (
+ |  PATH '${path.getCanonicalPath}'
+ |)
+   """.stripMargin)
+checkAnswer(
+  sql("select * from orc_source"),
+  sql("select * from src where key < 10").collect()
+)
+
+Utils.deleteRecursively(path)
+dropTempTable("orc_source")
+  }
+
+  test("insert overwrite to dir from temp table") {
+import org.apache.spark.util.Utils
+
+sparkContext
+  .parallelize(1 to 10)
+  .map(i => TestData(i, i.toString))
+  .toDF()
+  .registerTempTable("test_insert_table")
+
+val path = Utils.createTempDir()
--- End diff --

updated.





[GitHub] spark pull request #18975: [SPARK-4131] Support "Writing data into the files...

2017-08-21 Thread janewangfb
Github user janewangfb commented on a diff in the pull request:

https://github.com/apache/spark/pull/18975#discussion_r134382831
  
--- Diff: 
sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLQuerySuite.scala 
---
@@ -2040,4 +2040,80 @@ class SQLQuerySuite extends QueryTest with 
SQLTestUtils with TestHiveSingleton {
   assert(setOfPath.size() == pathSizeToDeleteOnExit)
 }
   }
+
+  test("insert overwrite to dir from hive metastore table") {
+import org.apache.spark.util.Utils
+
+val path = Utils.createTempDir()
+path.delete()
+checkAnswer(
+  sql(s"INSERT OVERWRITE LOCAL DIRECTORY '${path.toString}' SELECT * 
FROM src where key < 10"),
+  Seq.empty[Row])
+
+checkAnswer(
+  sql(s"""INSERT OVERWRITE LOCAL DIRECTORY '${path.toString}'
+ |STORED AS orc
+ |SELECT * FROM src where key < 10""".stripMargin),
+  Seq.empty[Row])
+
+// use orc data source to check the data of path is right.
+sql(
+  s"""CREATE TEMPORARY TABLE orc_source
+ |USING org.apache.spark.sql.hive.orc
+ |OPTIONS (
+ |  PATH '${path.getCanonicalPath}'
+ |)
+   """.stripMargin)
+checkAnswer(
+  sql("select * from orc_source"),
+  sql("select * from src where key < 10").collect()
+)
+
+Utils.deleteRecursively(path)
+dropTempTable("orc_source")
+  }
+
+  test("insert overwrite to dir from temp table") {
+import org.apache.spark.util.Utils
+
+sparkContext
+  .parallelize(1 to 10)
+  .map(i => TestData(i, i.toString))
+  .toDF()
+  .registerTempTable("test_insert_table")
+
+val path = Utils.createTempDir()
+path.delete()
+checkAnswer(
+  sql(
+s"""
+   |INSERT OVERWRITE LOCAL DIRECTORY '${path.toString}'
+   |ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
+   |SELECT * FROM test_insert_table
+ """.stripMargin),
+  Seq.empty[Row])
+
+checkAnswer(
+  sql(s"""
+INSERT OVERWRITE LOCAL DIRECTORY '${path.toString}'
+ |STORED AS orc
+ |SELECT * FROM test_insert_table""".stripMargin),
+  Seq.empty[Row])
+
+// use orc data source to check the data of path is right.
+sql(
+  s"""CREATE TEMPORARY TABLE orc_source
+ |USING org.apache.spark.sql.hive.orc
+ |OPTIONS (
+ |  PATH '${path.getCanonicalPath}'
+ |)
+   """.stripMargin)
+checkAnswer(
+  sql("select * from orc_source"),
+  sql("select * from test_insert_table").collect()
+)
+Utils.deleteRecursively(path)
+dropTempTable("test_insert_table")
--- End diff --

updated





[GitHub] spark pull request #18975: [SPARK-4131] Support "Writing data into the files...

2017-08-21 Thread janewangfb
Github user janewangfb commented on a diff in the pull request:

https://github.com/apache/spark/pull/18975#discussion_r134382724
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/SaveAsHiveFile.scala
 ---
@@ -0,0 +1,77 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.hive.execution
+
+import org.apache.hadoop.conf.Configuration
+
+import org.apache.spark.internal.io.FileCommitProtocol
+import org.apache.spark.sql.SparkSession
+import org.apache.spark.sql.catalyst.catalog.BucketSpec
+import org.apache.spark.sql.catalyst.expressions.Attribute
+import org.apache.spark.sql.execution.SparkPlan
+import org.apache.spark.sql.execution.command.DataWritingCommand
+import org.apache.spark.sql.execution.datasources.FileFormatWriter
+import org.apache.spark.sql.hive.HiveShim.{ShimFileSinkDesc => 
FileSinkDesc}
+
+// Base trait from which all hive insert statement physical execution 
extends.
+private[hive] trait SaveAsHiveFile extends DataWritingCommand {
+
+  protected def saveAsHiveFile(sparkSession: SparkSession,
+   plan: SparkPlan,
+   hadoopConf: Configuration,
+   fileSinkConf: FileSinkDesc,
+   outputLocation: String,
+   partitionAttributes: Seq[Attribute] = Nil,
+   bucketSpec: Option[BucketSpec] = None,
+   options: Map[String, String] = Map.empty): 
Unit = {
--- End diff --

updated.





[GitHub] spark pull request #18975: [SPARK-4131] Support "Writing data into the files...

2017-08-21 Thread janewangfb
Github user janewangfb commented on a diff in the pull request:

https://github.com/apache/spark/pull/18975#discussion_r134382590
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/InsertIntoHiveDirCommand.scala
 ---
@@ -0,0 +1,109 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.hive.execution
+
+import java.util.Properties
+
+import scala.language.existentials
+
+import org.apache.hadoop.fs.{FileSystem, Path}
+import org.apache.hadoop.hive.common.FileUtils
+import org.apache.hadoop.hive.ql.plan.TableDesc
+import org.apache.hadoop.hive.serde.serdeConstants
+import org.apache.hadoop.hive.serde2.`lazy`.LazySimpleSerDe
+import org.apache.hadoop.mapred._
+
+import org.apache.spark.sql.{Row, SparkSession}
+import org.apache.spark.sql.catalyst.catalog.CatalogStorageFormat
+import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
+import org.apache.spark.sql.execution.SparkPlan
+import org.apache.spark.util.Utils
+
+
+case class InsertIntoHiveDirCommand(
+isLocal: Boolean,
+storage: CatalogStorageFormat,
--- End diff --

added





[GitHub] spark pull request #18975: [SPARK-4131] Support "Writing data into the files...

2017-08-21 Thread janewangfb
Github user janewangfb commented on a diff in the pull request:

https://github.com/apache/spark/pull/18975#discussion_r134382618
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/command/InsertIntoDataSourceDirCommand.scala
 ---
@@ -0,0 +1,65 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.command
+
+import org.apache.spark.sql._
+import org.apache.spark.sql.catalyst.catalog._
+import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
+import org.apache.spark.sql.execution.datasources._
+
+/**
+ * A command used to write the result of a query to a directory.
+ *
+ * The syntax of using this command in SQL is:
+ * {{{
+ *   INSERT OVERWRITE DIRECTORY (path=STRING)?
+ *   USING format OPTIONS ([option1_name "option1_value", option2_name 
"option2_value", ...])
+ *   SELECT ...
+ * }}}
+ */
+case class InsertIntoDataSourceDirCommand(
+storage: CatalogStorageFormat,
+provider: Option[String],
--- End diff --

added





[GitHub] spark issue #19015: [SPARK-21803] [TEST] Remove the HiveDDLCommandSuite

2017-08-21 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19015
  
**[Test build #80956 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80956/testReport)** for PR 19015 at commit [`191bde1`](https://github.com/apache/spark/commit/191bde194bbb56c40f5d33e8fbaf5c3505d792cc).





[GitHub] spark issue #19015: [SPARK-21803] [TEST] Remove the HiveDDLCommandSuite

2017-08-21 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/19015
  
cc @cloud-fan 





[GitHub] spark pull request #19015: [SPARK-21803] [TEST] Remove the HiveDDLCommandSui...

2017-08-21 Thread gatorsmile
GitHub user gatorsmile opened a pull request:

https://github.com/apache/spark/pull/19015

[SPARK-21803] [TEST] Remove the HiveDDLCommandSuite

## What changes were proposed in this pull request?
We do not have any Hive-specific parser, so it does not make sense to keep the parser-specific test suite `HiveDDLCommandSuite.scala` in the Hive package. This PR removes it.

## How was this patch tested?
N/A

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/gatorsmile/spark combineDDL

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/19015.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #19015


commit b4de53dd8e99a1b43f723048360f947c2648bc0c
Author: gatorsmile 
Date:   2017-08-22T04:14:41Z

remove HiveDDLCommandSuite.scala

commit 191bde194bbb56c40f5d33e8fbaf5c3505d792cc
Author: gatorsmile 
Date:   2017-08-22T04:22:09Z

style fix.







[GitHub] spark pull request #18975: [SPARK-4131] Support "Writing data into the files...

2017-08-21 Thread janewangfb
Github user janewangfb commented on a diff in the pull request:

https://github.com/apache/spark/pull/18975#discussion_r134381632
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala
 ---
@@ -359,6 +359,17 @@ case class InsertIntoTable(
   override lazy val resolved: Boolean = false
 }
 
+case class InsertIntoDir(
--- End diff --

ok. added.





[GitHub] spark pull request #18975: [SPARK-4131] Support "Writing data into the files...

2017-08-21 Thread janewangfb
Github user janewangfb commented on a diff in the pull request:

https://github.com/apache/spark/pull/18975#discussion_r134381489
  
--- Diff: 
sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 ---
@@ -740,6 +750,7 @@ nonReserved
 | AND | CASE | CAST | DISTINCT | DIV | ELSE | END | FUNCTION | 
INTERVAL | MACRO | OR | STRATIFY | THEN
 | UNBOUNDED | WHEN
 | DATABASE | SELECT | FROM | WHERE | HAVING | TO | TABLE | WITH | NOT 
| CURRENT_DATE | CURRENT_TIMESTAMP
+| DIRECTORY
--- End diff --

it is already in TableIdentifierParserSuite





[GitHub] spark issue #17849: [SPARK-10931][ML][PYSPARK] PySpark Models Copy Param Val...

2017-08-21 Thread BryanCutler
Github user BryanCutler commented on the issue:

https://github.com/apache/spark/pull/17849
  
@holdenk , do you think this is good to go now?





[GitHub] spark issue #18989: [SPARK-21781][SQL] Modify DataSourceScanExec to use conc...

2017-08-21 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18989
  
**[Test build #80955 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80955/testReport)** for PR 18989 at commit [`9effea9`](https://github.com/apache/spark/commit/9effea9379313b0aac1f392ca11ce0f678bb1e0c).





[GitHub] spark issue #18982: [SPARK-21685][PYTHON][ML] PySpark Params isSet state sho...

2017-08-21 Thread BryanCutler
Github user BryanCutler commented on the issue:

https://github.com/apache/spark/pull/18982
  
Thanks for reviewing, @holdenk! You brought up some good points; let me know if you'd prefer me to change them.





[GitHub] spark pull request #18982: [SPARK-21685][PYTHON][ML] PySpark Params isSet st...

2017-08-21 Thread BryanCutler
Github user BryanCutler commented on a diff in the pull request:

https://github.com/apache/spark/pull/18982#discussion_r134380704
  
--- Diff: python/pyspark/ml/wrapper.py ---
@@ -118,11 +118,13 @@ def _transfer_params_to_java(self):
 """
 Transforms the embedded params to the companion Java object.
 """
-paramMap = self.extractParamMap()
 for param in self.params:
-if param in paramMap:
-pair = self._make_java_param_pair(param, paramMap[param])
+if param in self._paramMap:
+pair = self._make_java_param_pair(param, 
self._paramMap[param])
 self._java_obj.set(pair)
+if param in self._defaultParamMap:
--- End diff --

We usually assume that Python defines the same default values as Java, in Spark ML at least. But given the circumstances of the JIRA - they defined their own Model - it's still possible for `hasDefault` or the default value to return something different from what Python would. So I'm just being overly cautious here, but it's pretty cheap to just transfer the default values anyway, right?





[GitHub] spark pull request #18982: [SPARK-21685][PYTHON][ML] PySpark Params isSet st...

2017-08-21 Thread BryanCutler
Github user BryanCutler commented on a diff in the pull request:

https://github.com/apache/spark/pull/18982#discussion_r134380293
  
--- Diff: python/pyspark/ml/tests.py ---
@@ -455,6 +455,14 @@ def test_logistic_regression_check_thresholds(self):
 LogisticRegression, threshold=0.42, thresholds=[0.5, 0.5]
 )
 
+def test_preserve_set_state(self):
+model = Binarizer()
+self.assertFalse(model.isSet("threshold"))
+model._transfer_params_to_java()
--- End diff --

yeah, it would be a little better to call the actual `transform`, but we 
would still need to call `_transfer_params_from_java` or check `isSet` with a 
direct call to Java via py4j.  I was going to do this, but the `ParamTest` 
class doesn't already create a `SparkSession` - I'm sure it's just a small 
amount of overhead but that's why I thought to just use 
`_transfer_params_to_java`.

Do you think it would be worth it to change `ParamTests` to inherit from 
`SparkSessionTestCase` so a session is created and I could make a `DataFrame` 
to transform?
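
For reference, a rough sketch of that alternative (assuming it lives in 
`python/pyspark/ml/tests.py`, where `SparkSessionTestCase` is defined, and that 
`ParamTests` switches its base class; this is a sketch, not the PR's test):

```python
from pyspark.ml.feature import Binarizer

class ParamTests(SparkSessionTestCase):  # hypothetical base-class change
    def test_preserve_set_state(self):
        df = self.spark.createDataFrame([(0.5,), (1.5,)], ["data"])
        binarizer = Binarizer(inputCol="data", outputCol="out")
        self.assertFalse(binarizer.isSet("threshold"))
        # Go through the real code path: transform() pushes params to Java...
        binarizer.transform(df).collect()
        # ...and we still need to pull them back (or query Java via py4j)
        # before isSet on the Python side can reflect any change.
        binarizer._transfer_params_from_java()
        self.assertFalse(binarizer.isSet("threshold"))
```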





[GitHub] spark issue #18931: [SPARK-21717][SQL] Decouple consume functions of physica...

2017-08-21 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/18931
  
@maropu Thanks for running the benchmark and getting the numbers!

So it looks like SPARK-21603 actually affects the performance improvement of 
this PR?

It shows this PR can significantly improve long-codegen queries like Q17 
and Q66.

And SPARK-21603 with its default setting degrades many queries, including Q66.

cc @cloud-fan @gatorsmile @kiszk 







[GitHub] spark issue #15435: [SPARK-17139][ML] Add model summary for MultinomialLogis...

2017-08-21 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15435
  
**[Test build #80954 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80954/testReport)**
 for PR 15435 at commit 
[`67c57e5`](https://github.com/apache/spark/commit/67c57e547b654ec2816fe4f33e067072a05c4d5e).





[GitHub] spark issue #15435: [SPARK-17139][ML] Add model summary for MultinomialLogis...

2017-08-21 Thread WeichenXu123
Github user WeichenXu123 commented on the issue:

https://github.com/apache/spark/pull/15435
  
Jenkins, test this please.





[GitHub] spark issue #19013: [SPARK-21728][core] Allow SparkSubmit to use Logging.

2017-08-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19013
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80949/
Test FAILed.





[GitHub] spark issue #19013: [SPARK-21728][core] Allow SparkSubmit to use Logging.

2017-08-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19013
  
Merged build finished. Test FAILed.





[GitHub] spark issue #19013: [SPARK-21728][core] Allow SparkSubmit to use Logging.

2017-08-21 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19013
  
**[Test build #80949 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80949/testReport)**
 for PR 19013 at commit 
[`3d6016e`](https://github.com/apache/spark/commit/3d6016e14eb3fab5cea1bd24452842e59f721cad).
 * This patch **fails SparkR unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #18986: [SPARK-21774][SQL] The rule PromoteStrings should cast a...

2017-08-21 Thread stanzhai
Github user stanzhai commented on the issue:

https://github.com/apache/spark/pull/18986
  
@gatorsmile @DonnyZone  When comparing a string to an int in Hive, it casts 
the string type to double.

```
hive> select * from tb;
0   0
0.1 0
true0
19157170390056971   0
hive> select * from tb where a = 0;
0   0
hive> select * from tb where a = 19157170390056973L;
WARNING: Comparing a bigint and a string may result in a loss of precision.
19157170390056973   0
hive> select 1 = 'true';
NULL
hive> select 19157170390056973L = '19157170390056971';
WARNING: Comparing a bigint and a string may result in a loss of precision.
true
```

So I think that casting a string to double when comparing it with a numeric 
type is more reasonable.

Actually, my usage scenario is Spark compatibility: I found the problem when 
I upgraded Spark to 2.2.0, and lots of SQL results were wrong.
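
As a side note, a quick way to see which cast Spark itself inserts for such a 
comparison is to print the analyzed plan; a small PySpark sketch (the cast you 
see depends on the Spark version and on this PR):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[1]").appName("promote-strings-check").getOrCreate()

# The analyzed plan shows whether the string side or the bigint side gets cast,
# and to which type.
spark.sql("SELECT '19157170390056971' = 19157170390056973L AS eq").explain(extended=True)

spark.stop()
```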







[GitHub] spark issue #18931: [SPARK-21717][SQL] Decouple consume functions of physica...

2017-08-21 Thread maropu
Github user maropu commented on the issue:

https://github.com/apache/spark/pull/18931
  
Sorry for the late response (I reran the benchmarks many times in various settings); see: 
https://docs.google.com/spreadsheets/d/1LsnRIWDoqNtGhrWJ4jKVfYizL9X8KIJc04WisKHJig8/edit#gid=4103073





[GitHub] spark issue #19014: [MINOR][CORE] Add missing kvstore module in Laucher and ...

2017-08-21 Thread jerryshao
Github user jerryshao commented on the issue:

https://github.com/apache/spark/pull/19014
  
CC @vanzin , please help to review, thanks!





[GitHub] spark issue #19014: [MINOR][CORE] Add missing kvstore module in Laucher and ...

2017-08-21 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19014
  
**[Test build #80953 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80953/testReport)**
 for PR 19014 at commit 
[`bf03c76`](https://github.com/apache/spark/commit/bf03c76f670eba3efc721fe1482ed8f05531b5af).





[GitHub] spark pull request #19014: [MINOR][CORE] Add missing kvstore module in Lauch...

2017-08-21 Thread jerryshao
GitHub user jerryshao opened a pull request:

https://github.com/apache/spark/pull/19014

[MINOR][CORE] Add missing kvstore module in Laucher and SparkSubmit code

There are two places in the Launcher and SparkSubmit code that explicitly list 
all the Spark submodules; the newly added kvstore module is missing from both, 
so I'm submitting a minor PR to fix this.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jerryshao/apache-spark missing-kvstore

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/19014.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #19014


commit bf03c76f670eba3efc721fe1482ed8f05531b5af
Author: jerryshao 
Date:   2017-08-22T03:06:45Z

Add missing kvstore module in Laucher and SparkSubmit code

Change-Id: I35109bca61f9c0a246b7a9842a98947bc580c6dd







[GitHub] spark issue #18986: [SPARK-21774][SQL] The rule PromoteStrings should cast a...

2017-08-21 Thread stanzhai
Github user stanzhai commented on the issue:

https://github.com/apache/spark/pull/18986
  
@DonnyZone @gatorsmile @cloud-fan  PostgreSQL will throw an error when 
comparing a string to an int.

```
postgres=# select * from tb;
  a   | b
--+---
 0.1  | 1
 a| 1
 true | 1
(3 rows)

postgres=# select * from tb where a>0;
ERROR:  operator does not exist: character varying > integer
LINE 1: select * from tb where a>0;
^
HINT:  No operator matches the given name and argument type(s). You might 
need to add explicit type casts.
```





[GitHub] spark issue #15435: [SPARK-17139][ML] Add model summary for MultinomialLogis...

2017-08-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15435
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80947/
Test FAILed.





[GitHub] spark issue #15435: [SPARK-17139][ML] Add model summary for MultinomialLogis...

2017-08-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15435
  
Merged build finished. Test FAILed.





[GitHub] spark issue #15435: [SPARK-17139][ML] Add model summary for MultinomialLogis...

2017-08-21 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15435
  
**[Test build #80947 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80947/testReport)**
 for PR 15435 at commit 
[`67c57e5`](https://github.com/apache/spark/commit/67c57e547b654ec2816fe4f33e067072a05c4d5e).
 * This patch **fails SparkR unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request #18892: [SPARK-21520][SQL]Improvement a special case for ...

2017-08-21 Thread heary-cao
Github user heary-cao closed the pull request at:

https://github.com/apache/spark/pull/18892





[GitHub] spark issue #18973: [SPARK-21765] Set isStreaming on leaf nodes for streamin...

2017-08-21 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18973
  
**[Test build #80952 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80952/testReport)**
 for PR 18973 at commit 
[`8857cf5`](https://github.com/apache/spark/commit/8857cf51f142865063c53e4a7089dd027db4d3c3).





[GitHub] spark pull request #18734: [SPARK-21070][PYSPARK] Attempt to update cloudpic...

2017-08-21 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/18734





[GitHub] spark issue #18734: [SPARK-21070][PYSPARK] Attempt to update cloudpickle aga...

2017-08-21 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/18734
  
Merged to master.





[GitHub] spark issue #18962: [SPARK-21714][CORE][YARN] Avoiding re-uploading remote r...

2017-08-21 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18962
  
**[Test build #80951 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80951/testReport)**
 for PR 18962 at commit 
[`16ce99f`](https://github.com/apache/spark/commit/16ce99fc1cea9260a96dae98f031bda9f8ed18f4).





[GitHub] spark issue #18734: [SPARK-21070][PYSPARK] Attempt to update cloudpickle aga...

2017-08-21 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/18734
  
LGTM

I am going to credit this to @rgbkrk per 
(http://spark.apache.org/contributing.html)

> In case several people contributed, prefer to assign to the more 
‘junior’, non-committer contributor

I just double-checked that the tests pass with Python 3.6.0 and that I could 
run the pi example with PyPy manually (SPARK-21753), against the current state 
of the patch.





[GitHub] spark issue #18734: [SPARK-21070][PYSPARK] Attempt to update cloudpickle aga...

2017-08-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18734
  
Merged build finished. Test PASSed.





[GitHub] spark issue #18734: [SPARK-21070][PYSPARK] Attempt to update cloudpickle aga...

2017-08-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18734
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80950/
Test PASSed.





[GitHub] spark issue #18734: [SPARK-21070][PYSPARK] Attempt to update cloudpickle aga...

2017-08-21 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18734
  
**[Test build #80950 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80950/testReport)**
 for PR 18734 at commit 
[`f986c25`](https://github.com/apache/spark/commit/f986c2591a9a0b6962862c5cdfc33a7d65be7eda).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #18957: [SPARK-21744][CORE] Add retry logic for new broadcast in...

2017-08-21 Thread caneGuy
Github user caneGuy commented on the issue:

https://github.com/apache/spark/pull/18957
  
In my opinion, it is better to retry, since YARN has a health checker that 
can detect the bad disk later. The current job should not be failed by this 
bad disk. @cloud-fan 





[GitHub] spark issue #18931: [SPARK-21717][SQL] Decouple consume functions of physica...

2017-08-21 Thread maropu
Github user maropu commented on the issue:

https://github.com/apache/spark/pull/18931
  
ok





[GitHub] spark issue #18931: [SPARK-21717][SQL] Decouple consume functions of physica...

2017-08-21 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/18931
  
@maropu I saw you may open a follow-up to fix the default value; maybe you 
can do that in the same follow-up too.





[GitHub] spark issue #18974: [SPARK-21750][SQL] Use Arrow 0.6.0

2017-08-21 Thread kiszk
Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/18974
  
ping @srowen @ueshin @BryanCutler





[GitHub] spark issue #18734: [SPARK-21070][PYSPARK] Attempt to update cloudpickle aga...

2017-08-21 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18734
  
**[Test build #80950 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80950/testReport)**
 for PR 18734 at commit 
[`f986c25`](https://github.com/apache/spark/commit/f986c2591a9a0b6962862c5cdfc33a7d65be7eda).





[GitHub] spark pull request #18966: [SPARK-21751][SQL] CodeGeneraor.splitExpressions ...

2017-08-21 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/18966#discussion_r134368200
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -582,6 +582,15 @@ object SQLConf {
 .intConf
 .createWithDefault(2667)
 
+  val CODEGEN_MAX_CHARS_PER_FUNCTION = 
buildConf("spark.sql.codegen.maxCharactersPerFunction")
--- End diff --

In this PR, do we change this parameter to use `number of lines` instead of 
`number of characters`, too?





[GitHub] spark issue #18968: [SPARK-21759][SQL] In.checkInputDataTypes should not wro...

2017-08-21 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/18968
  
@gatorsmile @cloud-fan Any more comments on this change? Thanks.





[GitHub] spark issue #18734: [SPARK-21070][PYSPARK] Attempt to update cloudpickle aga...

2017-08-21 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/18734
  
I am merging this because:

cloudpickle looks initially ported from 
https://github.com/cloudpipe/cloudpickle/commit/7aebb7ed42258a9392c2ada9b4bb390d566630cc
 and 
https://github.com/cloudpipe/cloudpickle/commit/c4f885116126b2ac49deae7a31d4941d006f319f
 (-> 
https://github.com/apache/spark/commit/04e44b37cc04f62fbf9e08c7076349e0a4d12ea8),
 where I see both are identical.

After 
https://github.com/apache/spark/commit/04e44b37cc04f62fbf9e08c7076349e0a4d12ea8,
 we have diff - 
https://github.com/apache/spark/commit/e044705b4402f86d0557ecd146f3565388c7eeb4,
 
https://github.com/apache/spark/commit/55204181004c105c7a3e8c31a099b37e48bfd953,
 
https://github.com/apache/spark/commit/ee913e6e2d58dfac20f3f06ff306081bd0e48066,
 
https://github.com/apache/spark/commit/d48935400ca47275f677b527c636976af09332c8,
 
https://github.com/apache/spark/commit/dbfc7aa4d0d5457bc92e1e66d065c6088d476843,
 
https://github.com/apache/spark/commit/20e6280626fe243b170a2e7c5e018c67f3dac1db 
and 
https://github.com/apache/spark/commit/6297697f975960a3006c4e58b4964d9ac40eeaf5

**[SPARK-9116] [SQL] [PYSPARK] support Python only UDT in __main__**, 
https://github.com/apache/spark/commit/e044705b4402f86d0557ecd146f3565388c7eeb4:
 I think this part is the only one we are worried about. It looks like it supports 
`classmethod`, `staticmethod` and `property`. We have a test:


https://github.com/apache/spark/blob/96608310501a43fa4ab9f2697f202d655dba98c5/python/pyspark/sql/tests.py#L141-L173


https://github.com/apache/spark/blob/96608310501a43fa4ab9f2697f202d655dba98c5/python/pyspark/sql/tests.py#L898-L927

**[SPARK-10542] [PYSPARK] fix serialize namedtuple**, 
https://github.com/apache/spark/commit/55204181004c105c7a3e8c31a099b37e48bfd953:
 We keep the changes:


https://github.com/holdenk/spark/blob/f986c2591a9a0b6962862c5cdfc33a7d65be7eda/python/pyspark/cloudpickle.py#L1090-L1095


https://github.com/holdenk/spark/blob/f986c2591a9a0b6962862c5cdfc33a7d65be7eda/python/pyspark/cloudpickle.py#L433-L436

and the related test pass:


https://github.com/apache/spark/blob/77cc0d67d5a7ea526f8efd37b2590923953cb8e0/python/pyspark/tests.py#L211-L219

**[SPARK-13697] [PYSPARK] Fix the missing module name of 
TransformFunctionSerializer.loads**, 
https://github.com/apache/spark/commit/ee913e6e2d58dfac20f3f06ff306081bd0e48066:
 We keep this change:


https://github.com/holdenk/spark/blob/f986c2591a9a0b6962862c5cdfc33a7d65be7eda/python/pyspark/cloudpickle.py#L528


https://github.com/holdenk/spark/blob/f986c2591a9a0b6962862c5cdfc33a7d65be7eda/python/pyspark/cloudpickle.py#L1022-L1029

and the related test pass:


https://github.com/apache/spark/blob/77cc0d67d5a7ea526f8efd37b2590923953cb8e0/python/pyspark/tests.py#L233-L237

We should probably port this one to `cloudpipe/cloudpickle`.


**[SPARK-16077] [PYSPARK] catch the exception from pickle.whichmodule()**, 
https://github.com/apache/spark/commit/d48935400ca47275f677b527c636976af09332c8:
 We keep this change:


https://github.com/holdenk/spark/blob/f986c2591a9a0b6962862c5cdfc33a7d65be7eda/python/pyspark/cloudpickle.py#L325-L330


https://github.com/holdenk/spark/blob/f986c2591a9a0b6962862c5cdfc33a7d65be7eda/python/pyspark/cloudpickle.py#L620-L625

This patch should be even safer, as @rgbkrk and I verified it with some 
tests:

https://github.com/cloudpipe/cloudpickle/pull/112


**[SPARK-17472] [PYSPARK] Better error message for serialization failures 
of large objects in Python**, 
https://github.com/apache/spark/commit/dbfc7aa4d0d5457bc92e1e66d065c6088d476843:
 We keep this change:


https://github.com/holdenk/spark/blob/f986c2591a9a0b6962862c5cdfc33a7d65be7eda/python/pyspark/cloudpickle.py#L240-L249

Probably, we should port this change into `cloudpipe/cloudpickle`.

**[SPARK-19019] [PYTHON] Fix hijacked `collections.namedtuple` and port 
cloudpickle changes for PySpark to work with Python 3.6.0**, 
https://github.com/apache/spark/commit/20e6280626fe243b170a2e7c5e018c67f3dac1db

This change was ported from `cloudpipe/cloudpickle`. I manually verified that 
our PySpark tests pass with Python 3.6.0 locally - 
https://github.com/apache/spark/pull/18734#issuecomment-319558550

**[SPARK-19505][PYTHON] AttributeError on Exception.message in Python3**, 
https://github.com/apache/spark/commit/6297697f975960a3006c4e58b4964d9ac40eeaf5:
 We keep this change:


https://github.com/holdenk/spark/blob/f986c2591a9a0b6962862c5cdfc33a7d65be7eda/python/pyspark/cloudpickle.py#L240-L249





[GitHub] spark issue #18734: [SPARK-21070][PYSPARK] Attempt to update cloudpickle aga...

2017-08-21 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/18734
  
retest this please





[GitHub] spark pull request #18973: [SPARK-21765] Set isStreaming on leaf nodes for s...

2017-08-21 Thread joseph-torres
Github user joseph-torres commented on a diff in the pull request:

https://github.com/apache/spark/pull/18973#discussion_r134367553
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/memory.scala 
---
@@ -118,8 +122,15 @@ case class MemoryStream[A : Encoder](id: Int, 
sqlContext: SQLContext)
   batches.slice(sliceStart, sliceEnd)
 }
 
-logDebug(
-  s"MemoryBatch [$startOrdinal, $endOrdinal]: 
${newBlocks.flatMap(_.collect()).mkString(", ")}")
+logDebug({
--- End diff --

Done.





[GitHub] spark pull request #18973: [SPARK-21765] Set isStreaming on leaf nodes for s...

2017-08-21 Thread joseph-torres
Github user joseph-torres commented on a diff in the pull request:

https://github.com/apache/spark/pull/18973#discussion_r134367543
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala ---
@@ -420,8 +420,10 @@ class SQLContext private[sql](val sparkSession: 
SparkSession)
* converted to Catalyst rows.
*/
   private[sql]
-  def internalCreateDataFrame(catalystRows: RDD[InternalRow], schema: 
StructType) = {
-sparkSession.internalCreateDataFrame(catalystRows, schema)
+  def internalCreateDataFrame(catalystRows: RDD[InternalRow],
--- End diff --

Done.





[GitHub] spark pull request #18973: [SPARK-21765] Set isStreaming on leaf nodes for s...

2017-08-21 Thread joseph-torres
Github user joseph-torres commented on a diff in the pull request:

https://github.com/apache/spark/pull/18973#discussion_r134367548
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamSuite.scala ---
@@ -728,7 +729,16 @@ class FakeDefaultSource extends FakeSource {
 
   override def getBatch(start: Option[Offset], end: Offset): DataFrame 
= {
 val startOffset = 
start.map(_.asInstanceOf[LongOffset].offset).getOrElse(-1L) + 1
-spark.range(startOffset, end.asInstanceOf[LongOffset].offset + 
1).toDF("a")
+val ds = new Dataset[java.lang.Long](
--- End diff --

I've tried addressing this a few different ways, and I can't come up with 
anything cleaner than the current solution. Directly creating a DF doesn't set 
the isStreaming bit, and a bunch of copying and casting is required to get it 
set; using LocalRelation requires explicitly handling the encoding of the rows, 
since LocalRelation requires InternalRow input.





[GitHub] spark issue #18734: [SPARK-21070][PYSPARK] Attempt to update cloudpickle aga...

2017-08-21 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/18734
  
Yea, it looks so. The namedtuple one reminds me of the workaround we have to 
make namedtuples picklable - 
https://github.com/apache/spark/blob/d03aebbe6508ba441dc87f9546f27aeb27553d77/python/pyspark/serializers.py#L395-L446

Maybe we could take a look and see if we could get rid of it or port it. 
Anyway, let me take a final look.
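
For anyone following along, here is a standalone illustration (plain Python, no 
Spark required) of the namedtuple problem that workaround and the cloudpickle 
changes are dealing with:

```python
import pickle
from collections import namedtuple

# A namedtuple defined at module level pickles fine: pickle stores a reference
# ("__main__.Point") and looks the class up again on load.
Point = namedtuple("Point", ["x", "y"])
print(pickle.loads(pickle.dumps(Point(1, 2))))  # Point(x=1, y=2)

# A namedtuple created at runtime (e.g. inside a function or a REPL session)
# is not importable by that name, so plain pickle refuses to serialize it.
def make_dynamic_record():
    DynamicRecord = namedtuple("DynamicRecord", ["a", "b"])
    return DynamicRecord(1, 2)

try:
    pickle.dumps(make_dynamic_record())
except Exception as e:  # PicklingError (exact type varies by Python version)
    # This is the gap the serializers.py hijack / cloudpickle close: they
    # serialize the class definition itself so executors can rebuild it.
    print("plain pickle fails:", e)
```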





[GitHub] spark issue #18973: [SPARK-21765] Set isStreaming on leaf nodes for streamin...

2017-08-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18973
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80943/
Test PASSed.





[GitHub] spark issue #18810: [SPARK-21603][SQL]The wholestage codegen will be much sl...

2017-08-21 Thread maropu
Github user maropu commented on the issue:

https://github.com/apache/spark/pull/18810
  
OK, I'll make a PR as a follow-up.





[GitHub] spark issue #18973: [SPARK-21765] Set isStreaming on leaf nodes for streamin...

2017-08-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18973
  
Merged build finished. Test PASSed.





[GitHub] spark issue #18973: [SPARK-21765] Set isStreaming on leaf nodes for streamin...

2017-08-21 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18973
  
**[Test build #80943 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80943/testReport)**
 for PR 18973 at commit 
[`c837069`](https://github.com/apache/spark/commit/c83706921157bdf2af4f2b697244054bc1e8ffad).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #17373: [SPARK-12664][ML] Expose probability in mlp model

2017-08-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17373
  
Merged build finished. Test PASSed.





[GitHub] spark issue #17373: [SPARK-12664][ML] Expose probability in mlp model

2017-08-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17373
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80946/
Test PASSed.





[GitHub] spark issue #18973: [SPARK-21765] Set isStreaming on leaf nodes for streamin...

2017-08-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18973
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80942/
Test PASSed.





[GitHub] spark issue #18973: [SPARK-21765] Set isStreaming on leaf nodes for streamin...

2017-08-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18973
  
Merged build finished. Test PASSed.





[GitHub] spark issue #17373: [SPARK-12664][ML] Expose probability in mlp model

2017-08-21 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17373
  
**[Test build #80946 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80946/testReport)**
 for PR 17373 at commit 
[`5369b08`](https://github.com/apache/spark/commit/5369b088e7fcb0fa35b0e4c840772cf60515c882).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #18968: [SPARK-21759][SQL] In.checkInputDataTypes should not wro...

2017-08-21 Thread dilipbiswal
Github user dilipbiswal commented on the issue:

https://github.com/apache/spark/pull/18968
  
@viirya Thank you !!





[GitHub] spark issue #18973: [SPARK-21765] Set isStreaming on leaf nodes for streamin...

2017-08-21 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18973
  
**[Test build #80942 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80942/testReport)**
 for PR 18973 at commit 
[`e55abe6`](https://github.com/apache/spark/commit/e55abe6be316f251bc51f845fc9108f4f721c601).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request #18992: [SPARK-19762][ML][FOLLOWUP]Add necessary comments...

2017-08-21 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/18992





[GitHub] spark issue #19013: [SPARK-21728][core] Allow SparkSubmit to use Logging.

2017-08-21 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19013
  
**[Test build #80949 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80949/testReport)**
 for PR 19013 at commit 
[`3d6016e`](https://github.com/apache/spark/commit/3d6016e14eb3fab5cea1bd24452842e59f721cad).





[GitHub] spark issue #18992: [SPARK-19762][ML][FOLLOWUP]Add necessary comments to L2R...

2017-08-21 Thread yanboliang
Github user yanboliang commented on the issue:

https://github.com/apache/spark/pull/18992
  
Merged into master. Thanks.





[GitHub] spark issue #19012: [SPARK-17742][core] Fail launcher app handle if child pr...

2017-08-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19012
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80941/
Test PASSed.





[GitHub] spark issue #19012: [SPARK-17742][core] Fail launcher app handle if child pr...

2017-08-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19012
  
Merged build finished. Test PASSed.





[GitHub] spark issue #18984: [SPARK-21773][BUILD][DOCS] Installs mkdocs if missing in...

2017-08-21 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/18984
  
Thanks for your effort @shaneknapp. I just checked and it's green.





[GitHub] spark issue #19012: [SPARK-17742][core] Fail launcher app handle if child pr...

2017-08-21 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19012
  
**[Test build #80941 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80941/testReport)**
 for PR 19012 at commit 
[`4d5cc53`](https://github.com/apache/spark/commit/4d5cc5313c319f900abf1a7f2da0392bc2c396a8).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.




