date:20170302

[GitHub] spark issue #16981: [SPARK-19637][SQL] Add to_json in FunctionRegistry

2017-03-02 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16981
  
**[Test build #73815 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73815/testReport)** for PR 16981 at commit [`ddc06cf`](https://github.com/apache/spark/commit/ddc06cf46b3b2730dc5ec8f49e12225c60d05b7c).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #16910: [SPARK-19575][SQL]Reading from or writing to a hive serd...

2017-03-02 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16910
  
**[Test build #73829 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73829/testReport)** for PR 16910 at commit [`15c0a77`](https://github.com/apache/spark/commit/15c0a77714eb4ed5221f47d54ed31fcc10a95303).



[GitHub] spark pull request #17081: [SPARK-18726][SQL]resolveRelation for FileFormat ...

2017-03-02 Thread asfgit

Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/17081



[GitHub] spark issue #17081: [SPARK-18726][SQL]resolveRelation for FileFormat DataSou...

2017-03-02 Thread cloud-fan

Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/17081
  
thanks, merging to master!



[GitHub] spark issue #17096: [SPARK-15243][ML][SQL][PYTHON] Add missing support for u...

2017-03-02 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17096
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73822/
Test PASSed.



[GitHub] spark issue #17096: [SPARK-15243][ML][SQL][PYTHON] Add missing support for u...

2017-03-02 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17096
  
**[Test build #73822 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73822/testReport)** for PR 17096 at commit [`cd235a7`](https://github.com/apache/spark/commit/cd235a7f641da8a350b8ace0e4c0691ccac189f2).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #17096: [SPARK-15243][ML][SQL][PYTHON] Add missing support for u...

2017-03-02 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17096
  
Merged build finished. Test PASSed.



[GitHub] spark issue #17147: [Minor][Doc] Fix doc for web UI https configuration

2017-03-02 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17147
  
Merged build finished. Test PASSed.



[GitHub] spark issue #17147: [Minor][Doc] Fix doc for web UI https configuration

2017-03-02 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17147
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73826/
Test PASSed.



[GitHub] spark issue #17147: [Minor][Doc] Fix doc for web UI https configuration

2017-03-02 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17147
  
**[Test build #73826 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73826/testReport)** for PR 17147 at commit [`22aa879`](https://github.com/apache/spark/commit/22aa879bd1ec8f51fbb2af62cc62ce71662542f3).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #16944: [SPARK-19611][SQL] Introduce configurable table schema i...

2017-03-02 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16944
  
**[Test build #73828 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73828/testReport)** for PR 16944 at commit [`95af481`](https://github.com/apache/spark/commit/95af4810b9c85b2b8680d7791cf298ed147e33c6).



[GitHub] spark issue #17001: [SPARK-19667][SQL]create table with hiveenabled in defau...

2017-03-02 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17001
  
**[Test build #73827 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73827/testReport)** for PR 17001 at commit [`e3a467e`](https://github.com/apache/spark/commit/e3a467e52b73dc1f67fb2b669d551a7b9bb904b6).



[GitHub] spark issue #17096: [SPARK-15243][ML][SQL][PYTHON] Add missing support for u...

2017-03-02 Thread HyukjinKwon

Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/17096
  
@holdenk and @viirya, I got rid of the changes in `types.py` and left only the ones I am pretty sure about.

There are two kinds of changes here, and both look like they are used only in a local scope.

One is used with `getattr`, which I guess is fine, as below:

```python
>>> getattr("a", u"__str__")

>>> getattr("a", "__str__")

```

and the other one is used for setting a parameter on the JVM side, a pattern 
that already seems to be used in many more places in the code base.
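
A minimal sketch of the `getattr` point, assuming Python 3, where `u"__str__"` and `"__str__"` are the same `str` type. The variable names are made up for the example; it only shows that both spellings of the attribute name find the same method:

```python
# Sketch only: look up the same attribute with a u-prefixed and a plain name.
target = "a"

via_unicode_literal = getattr(target, u"__str__")
via_plain_literal = getattr(target, "__str__")

# Both lookups find the same method, so calling either returns the string itself.
results = (via_unicode_literal(), via_plain_literal())
print(results)
```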





[GitHub] spark issue #17136: [SPARK-19783][SQL] Treat shorter/longer lengths of token...

2017-03-02 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17136
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73816/
Test FAILed.



[GitHub] spark issue #17136: [SPARK-19783][SQL] Treat shorter/longer lengths of token...

2017-03-02 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17136
  
Merged build finished. Test FAILed.



[GitHub] spark issue #17136: [SPARK-19783][SQL] Treat shorter/longer lengths of token...

2017-03-02 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17136
  
**[Test build #73816 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73816/testReport)** for PR 17136 at commit [`5a01a9d`](https://github.com/apache/spark/commit/5a01a9dcbe1fb922a7e240fdee3bd4b7fa4e471a).
 * This patch **fails SparkR unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #17122: [SPARK-19786][SQL] Facilitate loop optimizations in a JI...

2017-03-02 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17122
  
Merged build finished. Test PASSed.



[GitHub] spark issue #17122: [SPARK-19786][SQL] Facilitate loop optimizations in a JI...

2017-03-02 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17122
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73813/
Test PASSed.



[GitHub] spark issue #17147: [Minor][Doc] Fix doc for web UI https configuration

2017-03-02 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17147
  
**[Test build #73826 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73826/testReport)** for PR 17147 at commit [`22aa879`](https://github.com/apache/spark/commit/22aa879bd1ec8f51fbb2af62cc62ce71662542f3).



[GitHub] spark issue #17122: [SPARK-19786][SQL] Facilitate loop optimizations in a JI...

2017-03-02 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17122
  
**[Test build #73813 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73813/testReport)** for PR 17122 at commit [`7f095c0`](https://github.com/apache/spark/commit/7f095c0bdae1ff15859bec399fdd705bff379be0).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #16944: [SPARK-19611][SQL] Introduce configurable table schema i...

2017-03-02 Thread budde

Github user budde commented on the issue:

https://github.com/apache/spark/pull/16944
  
Retest this please



[GitHub] spark pull request #17001: [SPARK-19667][SQL]create table with hiveenabled i...

2017-03-02 Thread windpiger

Github user windpiger commented on a diff in the pull request:

https://github.com/apache/spark/pull/17001#discussion_r104101806
  
--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveSparkSubmitSuite.scala ---
@@ -905,3 +934,91 @@ object SPARK_18989_DESC_TABLE {
 }
   }
 }
+
+object SPARK_19667_CREATE_TABLE {
+  def main(args: Array[String]): Unit = {
+    val spark = SparkSession.builder().enableHiveSupport().getOrCreate()
+    try {
+      val warehousePath = s"file:${spark.sharedState.warehousePath.stripSuffix("/")}"
+      val defaultDB = spark.sessionState.catalog.getDatabaseMetadata("default")
+      // default database use warehouse path as its location
+      assert(defaultDB.locationUri.stripSuffix("/") == warehousePath)
+      spark.sql("CREATE TABLE t(a string)")
+
+      val table = spark.sessionState.catalog.getTableMetadata(TableIdentifier("t"))
+      // table in default database use the location of default database which is also warehouse path
+      assert(table.location.stripSuffix("/") == s"$warehousePath/t")
+      spark.sql("INSERT INTO TABLE t SELECT 1")
+      assert(spark.sql("SELECT * FROM t").count == 1)
+
+      spark.sql("CREATE DATABASE not_default")
+      spark.sql("USE not_default")
+      spark.sql("CREATE TABLE t1(b string)")
+      val table1 = spark.sessionState.catalog.getTableMetadata(TableIdentifier("t1"))
+      // table in not default database use the location of its own database
+      assert(table1.location.stripSuffix("/") == s"$warehousePath/not_default.db/t1")
+    } finally {
+      spark.sql("USE default")
+    }
+  }
+}
+
+object SPARK_19667_VERIFY_TABLE_PATH {
+  def main(args: Array[String]): Unit = {
+    val spark = SparkSession.builder().enableHiveSupport().getOrCreate()
+    try {
+      val warehousePath = s"file:${spark.sharedState.warehousePath.stripSuffix("/")}"
--- End diff --

I am making this modification.
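
For reference, the location rule that the quoted test asserts can be sketched in a few lines of Python. `expected_table_location` and the `file:/tmp/warehouse/` path are hypothetical names chosen for illustration, not part of Spark; the sketch only mirrors the two assertions in the diff (a table in `default` sits directly under the warehouse path, a table in another database sits under `<database>.db`):

```python
def expected_table_location(warehouse_path, database, table):
    """Hypothetical helper mirroring the paths asserted in the quoted test."""
    base = warehouse_path.rstrip("/")  # drop trailing slashes before joining
    if database == "default":
        # The default database is located at the warehouse path itself.
        return f"{base}/{table}"
    # Other databases get their own "<name>.db" directory under the warehouse.
    return f"{base}/{database}.db/{table}"


default_location = expected_table_location("file:/tmp/warehouse/", "default", "t")
other_location = expected_table_location("file:/tmp/warehouse/", "not_default", "t1")
print(default_location)
print(other_location)
```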



[GitHub] spark pull request #17147: [Minor][Doc] Fix doc for web UI https configurati...

2017-03-02 Thread jerryshao

GitHub user jerryshao opened a pull request:

https://github.com/apache/spark/pull/17147

[Minor][Doc] Fix doc for web UI https configuration

## What changes were proposed in this pull request?

The doc about enabling HTTPS for the web UI is not correct: "spark.ui.https.enabled" 
does not exist, and enabling SSL is actually enough for HTTPS.

## How was this patch tested?

N/A

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jerryshao/apache-spark fix-doc-ssl

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/17147.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #17147


commit 22aa879bd1ec8f51fbb2af62cc62ce71662542f3
Author: jerryshao 
Date:   2017-03-03T07:30:41Z

Fix doc for https web ui

Change-Id: I77e0e0806a94e50e366d199c9a9d98739ed326c7





[GitHub] spark issue #17145: [SPARK-19805][TEST] Log the row type when query result d...

2017-03-02 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17145
  
**[Test build #73825 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73825/testReport)** for PR 17145 at commit [`f5a35f6`](https://github.com/apache/spark/commit/f5a35f6bc3ec032f429137676b99b888ae326acc).



[GitHub] spark pull request #16696: [SPARK-19350] [SQL] Cardinality estimation of Lim...

2017-03-02 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/16696#discussion_r104101253
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/StatisticsCollectionSuite.scala ---
@@ -116,22 +116,22 @@ class StatisticsCollectionSuite extends StatisticsCollectionTestBase with Shared
 withTempView("test") {
--- End diff --

Does this test duplicate the newly added limit test?



[GitHub] spark issue #17145: [SPARK-19805][TEST] Log the row type when query type dos...

2017-03-02 Thread uncleGen

Github user uncleGen commented on the issue:

https://github.com/apache/spark/pull/17145
  
Unrelated failure: `org.apache.spark.sql.kafka010.KafkaSourceStressForDontFailOnDataLossSuite.stress test for failOnDataLoss=false`. Retest this please.



[GitHub] spark pull request #16696: [SPARK-19350] [SQL] Cardinality estimation of Lim...

2017-03-02 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/16696#discussion_r104101031
  
--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/statsEstimation/StatsEstimationSuite.scala ---
@@ -0,0 +1,121 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.statsEstimation
+
+import org.apache.spark.sql.catalyst.CatalystConf
+import org.apache.spark.sql.catalyst.expressions.{Attribute, AttributeMap, AttributeReference, Literal}
+import org.apache.spark.sql.catalyst.plans.logical._
+import org.apache.spark.sql.types.IntegerType
+
+
+class StatsEstimationSuite extends StatsEstimationTestBase {
--- End diff --

`BasicStatsEstimationSuite`?



[GitHub] spark issue #17094: [SPARK-19762][ML] Hierarchy for consolidating ML aggrega...

2017-03-02 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17094
  
**[Test build #73823 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73823/testReport)** for PR 17094 at commit [`d7dceeb`](https://github.com/apache/spark/commit/d7dceebb5fecc22c74a4ba2a334ab8ca492a518b).



[GitHub] spark issue #16696: [SPARK-19350] [SQL] Cardinality estimation of Limit and ...

2017-03-02 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16696
  
**[Test build #73824 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73824/testReport)** for PR 16696 at commit [`5692939`](https://github.com/apache/spark/commit/56929391719053e72791abe127b10a3316b51141).



[GitHub] spark issue #17096: [SPARK-15243][ML][SQL][PYTHON] Add missing support for u...

2017-03-02 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17096
  
**[Test build #73822 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73822/testReport)** for PR 17096 at commit [`cd235a7`](https://github.com/apache/spark/commit/cd235a7f641da8a350b8ace0e4c0691ccac189f2).



[GitHub] spark pull request #16696: [SPARK-19350] [SQL] Cardinality estimation of Lim...

2017-03-02 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/16696#discussion_r104100931
  
--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/statsEstimation/StatsConfSuite.scala ---
@@ -1,64 +0,0 @@
-/*
- * Licensed to the Apache Software Foundation (ASF) under one or more
- * contributor license agreements.  See the NOTICE file distributed with
- * this work for additional information regarding copyright ownership.
- * The ASF licenses this file to You under the Apache License, Version 2.0
- * (the "License"); you may not use this file except in compliance with
- * the License.  You may obtain a copy of the License at
- *
- *http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
-
-package org.apache.spark.sql.catalyst.statsEstimation
-
-import org.apache.spark.sql.catalyst.CatalystConf
-import org.apache.spark.sql.catalyst.expressions.{Attribute, AttributeMap, AttributeReference}
-import org.apache.spark.sql.catalyst.plans.logical.{ColumnStat, LogicalPlan, Statistics}
-import org.apache.spark.sql.types.IntegerType
-
-
-class StatsConfSuite extends StatsEstimationTestBase {
--- End diff --

why remove this test suite?



[GitHub] spark issue #17135: SPARK-19794 Release HDFS Client after read/write checkpo...

2017-03-02 Thread zsxwing

Github user zsxwing commented on the issue:

https://github.com/apache/spark/pull/17135
  
As I remember, `FileSystem` instances are cached internally by default, so 
closing one will probably introduce some performance regression.
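
To illustrate the concern, here is a toy Python model of a process-wide instance cache. It is not Hadoop's implementation: `ToyFileSystem`, `ToyCache`, the `hdfs://nn:8020` URI, and the evict-on-close behaviour are all assumptions made for the example. It shows that once a cached instance is closed, the next lookup has to build a new client instead of reusing the shared one:

```python
class ToyFileSystem:
    """Hypothetical stand-in for an expensive-to-create client."""

    created = 0  # counts how many clients were constructed

    def __init__(self, uri, cache):
        ToyFileSystem.created += 1
        self.uri = uri
        self._cache = cache
        self.closed = False

    def close(self):
        # Assumption: closing also evicts the instance from the shared cache.
        self.closed = True
        self._cache.evict(self.uri)


class ToyCache:
    """Hypothetical cache keyed by URI; one shared instance per key."""

    def __init__(self):
        self._instances = {}

    def get(self, uri):
        if uri not in self._instances:
            self._instances[uri] = ToyFileSystem(uri, self)
        return self._instances[uri]

    def evict(self, uri):
        self._instances.pop(uri, None)


cache = ToyCache()
first = cache.get("hdfs://nn:8020")
second = cache.get("hdfs://nn:8020")  # cache hit: same object, no new client
shared = first is second

first.close()                         # also closes it for every other holder
third = cache.get("hdfs://nn:8020")   # cache miss: a new client is built
print(shared, third is first, ToyFileSystem.created)
```

In this model, `second` is closed as a side effect of `first.close()`, which is the other risk of closing a shared cached instance.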



[GitHub] spark issue #17094: [SPARK-19762][ML] Hierarchy for consolidating ML aggrega...

2017-03-02 Thread sethah

Github user sethah commented on the issue:

https://github.com/apache/spark/pull/17094
  
Removed WIP, think it's ready now :)



[GitHub] spark issue #17145: [SPARK-19805][TEST] Log the row type when query type dos...

2017-03-02 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17145
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73817/
Test FAILed.



[GitHub] spark issue #17145: [SPARK-19805][TEST] Log the row type when query type dos...

2017-03-02 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17145
  
Merged build finished. Test FAILed.



[GitHub] spark issue #17145: [SPARK-19805][TEST] Log the row type when query type dos...

2017-03-02 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17145
  
**[Test build #73817 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73817/testReport)**
 for PR 17145 at commit 
[`f5a35f6`](https://github.com/apache/spark/commit/f5a35f6bc3ec032f429137676b99b888ae326acc).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #16696: [SPARK-19350] [SQL] Cardinality estimation of Limit and ...

2017-03-02 Thread cloud-fan

Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/16696
  
retest this please



[GitHub] spark issue #17094: [SPARK-19762][ML] Hierarchy for consolidating ML aggrega...

2017-03-02 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17094
  
**[Test build #73821 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73821/testReport)**
 for PR 17094 at commit 
[`76eda69`](https://github.com/apache/spark/commit/76eda69de903f2d9aae4ce17a1b9555b0403588d).



[GitHub] spark issue #17094: [SPARK-19762][ML] Hierarchy for consolidating ML aggrega...

2017-03-02 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17094
  
**[Test build #73820 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73820/testReport)**
 for PR 17094 at commit 
[`46630d1`](https://github.com/apache/spark/commit/46630d1bb928ae0dea056e78afd02d76bc0da6af).



[GitHub] spark issue #17094: [SPARK-19762][ML] Hierarchy for consolidating ML aggrega...

2017-03-02 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17094
  
**[Test build #73819 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73819/testReport)**
 for PR 17094 at commit 
[`f7e9169`](https://github.com/apache/spark/commit/f7e91699ac2af2a0baed3bc7fe0befafa21a862f).



[GitHub] spark issue #15505: [SPARK-18890][CORE] Move task serialization from the Tas...

2017-03-02 Thread witgo

Github user witgo commented on the issue:

https://github.com/apache/spark/pull/15505
  

[SPARK-18890_20170303](https://github.com/witgo/spark/commits/SPARK-18890_20170303)'s 
code is older, but the test case running time is 5.2 s.



[GitHub] spark issue #17096: [SPARK-15243][ML][SQL][PYTHON] Add missing support for u...

2017-03-02 Thread HyukjinKwon

Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/17096
  
Let me check if each is fine for sure.



[GitHub] spark issue #17096: [SPARK-15243][ML][SQL][PYTHON] Add missing support for u...

2017-03-02 Thread HyukjinKwon

Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/17096
  
@viirya, thank you so much for taking a look and your time. 

So, basically, the second case compares `str` to `unicode`, as below:

```python
>>> u"測試" == u"測試".encode("utf-8")
False
```

Apparently, it seems we could pass unicode as is? Let me raise another 
issue for this after testing and looking into it. Actually, the support in 
`StructType.add` does not seem to be the problem described in the JIRA. 




[GitHub] spark pull request #17065: [SPARK-17075][SQL][followup] fix some minor issue...

2017-03-02 Thread wzhfy

Github user wzhfy commented on a diff in the pull request:

https://github.com/apache/spark/pull/17065#discussion_r104098256
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/FilterEstimation.scala
 ---
@@ -95,15 +84,16 @@ case class FilterEstimation(plan: Filter, catalystConf: 
CatalystConf) extends Lo
* @param condition the compound logical expression
* @param update a boolean flag to specify if we need to update 
ColumnStat of a column
*   for subsequent conditions
-   * @return a double value to show the percentage of rows meeting a given 
condition.
+   * @return an optional double value to show the percentage of rows 
meeting a given condition.
* It returns None if the condition is not supported.
*/
   def calculateFilterSelectivity(condition: Expression, update: Boolean = 
true): Option[Double] = {
-
 condition match {
   case And(cond1, cond2) =>
-(calculateFilterSelectivity(cond1, update), 
calculateFilterSelectivity(cond2, update))
-match {
+// For ease of debugging, we compute percent1 and percent2 in 2 
statements.
+val percent1 = calculateFilterSelectivity(cond1, update)
+val percent2 = calculateFilterSelectivity(cond2, update)
+(percent1, percent2) match {
   case (Some(p1), Some(p2)) => Some(p1 * p2)
   case (Some(p1), None) => Some(p1)
--- End diff --

@cloud-fan @ron8hu I'm a little confused about this. For a `Not` expression, 
over-estimating the child always turns into under-estimation, whether or not 
it's nested. So should we remove support for nested `Not`, or for `Not` altogether?
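
A minimal sketch of the concern, using hypothetical stand-in types rather than the real Catalyst classes. The first two `And` cases mirror the diff above; the remaining two cases and the `1 - p` rule for `Not` are assumptions made for illustration. Falling back to `Some(p1)` treats the unsupported side as selectivity 1.0, which over-estimates; wrapping the result in `Not` then under-estimates.

```scala
object SelectivitySketch {
  // Hypothetical stand-ins for Catalyst expressions, for illustration only.
  sealed trait Cond
  case class Leaf(selectivity: Double) extends Cond // a predicate we can estimate
  case object Unsupported extends Cond              // a predicate we cannot estimate
  case class And(left: Cond, right: Cond) extends Cond
  case class Not(child: Cond) extends Cond

  def estimate(cond: Cond): Option[Double] = cond match {
    case Leaf(p)     => Some(p)
    case Unsupported => None
    case And(c1, c2) =>
      val percent1 = estimate(c1)
      val percent2 = estimate(c2)
      (percent1, percent2) match {
        case (Some(p1), Some(p2)) => Some(p1 * p2)
        case (Some(p1), None)     => Some(p1) // unknown side treated as 1.0
        case (None, Some(p2))     => Some(p2) // assumed by symmetry
        case (None, None)         => None     // assumed: nothing to estimate
      }
    // Assumed rule for Not: complement of the child's selectivity.
    case Not(c) => estimate(c).map(p => 1.0 - p)
  }
}
```

For example, if the unsupported predicate really had selectivity 0.5, then `And(Leaf(0.25), Unsupported)` is truly 0.125 but is estimated as 0.25 (too high), and its negation is truly 0.875 but is estimated as 0.75 (too low).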



[GitHub] spark issue #16981: [SPARK-19637][SQL] Add to_json in FunctionRegistry

2017-03-02 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16981
  
**[Test build #73818 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73818/testReport)**
 for PR 16981 at commit 
[`4efae36`](https://github.com/apache/spark/commit/4efae36533895d47e0ced19be23adf4579eb285d).



[GitHub] spark issue #16981: [SPARK-19637][SQL] Add to_json in FunctionRegistry

2017-03-02 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16981
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73809/
Test PASSed.



[GitHub] spark issue #16981: [SPARK-19637][SQL] Add to_json in FunctionRegistry

2017-03-02 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16981
  
Merged build finished. Test PASSed.



[GitHub] spark issue #16981: [SPARK-19637][SQL] Add to_json in FunctionRegistry

2017-03-02 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16981
  
**[Test build #73809 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73809/testReport)**
 for PR 16981 at commit 
[`0d087b0`](https://github.com/apache/spark/commit/0d087b0f66571759ae7ea802c41ac0047d154e3c).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #17074: [SPARK-18646][REPL] Set parent classloader as null for E...

2017-03-02 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17074
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73805/
Test FAILed.



[GitHub] spark issue #17074: [SPARK-18646][REPL] Set parent classloader as null for E...

2017-03-02 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17074
  
Merged build finished. Test FAILed.



[GitHub] spark pull request #14789: [SPARK-17209][YARN] Add the ability to manually u...

2017-03-02 Thread jerryshao

Github user jerryshao closed the pull request at:

https://github.com/apache/spark/pull/14789



[GitHub] spark pull request #17095: [SPARK-19763][SQL]qualified external datasource t...

2017-03-02 Thread gatorsmile

Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/17095#discussion_r104095925
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala 
---
@@ -1843,10 +1843,12 @@ class DDLSuite extends QueryTest with 
SharedSQLContext with BeforeAndAfterEach {
  |OPTIONS(path "$dir")
""".stripMargin)
 val table = 
spark.sessionState.catalog.getTableMetadata(TableIdentifier("t"))
-assert(table.location == dir.getAbsolutePath)
+val dirPath = new Path(dir.getAbsolutePath)
+val fs = dirPath.getFileSystem(spark.sessionState.newHadoopConf())
--- End diff --

Can you create a helper function to avoid the duplicated code?



[GitHub] spark issue #14731: [SPARK-17159] [streaming]: optimise check for new files ...

2017-03-02 Thread uncleGen

Github user uncleGen commented on the issue:

https://github.com/apache/spark/pull/14731
  
@srowen Waiting for your final OK



[GitHub] spark pull request #17117: [SPARK-10780][ML] Support initial model for KMean...

2017-03-02 Thread sethah

Github user sethah commented on a diff in the pull request:

https://github.com/apache/spark/pull/17117#discussion_r104084997
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/KMeans.scala 
---
@@ -253,7 +255,18 @@ object KMeansModel extends MLReadable[KMeansModel] {
 @Since("1.5.0")
 class KMeans @Since("1.5.0") (
 @Since("1.5.0") override val uid: String)
-  extends Estimator[KMeansModel] with KMeansParams with 
DefaultParamsWritable {
+  extends Estimator[KMeansModel]
+with KMeansParams with HasInitialModel[KMeansModel] with MLWritable {
+
+  /**
+   * A KMeansModel to use for warm start.
+   * Note the cluster count of initial model must be equal with [[k]],
+   * otherwise, throws IllegalArgumentException.
+   * @group param
+   */
+  @Since("2.2.0")
+  final val initialModel: Param[KMeansModel] =
--- End diff --

I prefer doing this the same way ALS does it, by having separate param 
traits: `KMeansParams extends KMeansModelParams with HasInitialModel`. 
It's more explicit, since now our `KMeans` class would have extra params on top 
of `KMeansParams`. 
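
A rough sketch of that layout, with simplified stand-in traits (real Spark params are `Param[T]` members, and the `paramNames` helper here is purely hypothetical). The point is only that the model mixes in the shared params while the estimator picks up `initialModel` through its own params trait:

```scala
object ParamTraitsSketch {
  // Params shared by the model and the estimator (stand-in for real Param objects).
  trait KMeansModelParams {
    def paramNames: Seq[String] = Seq("k", "featuresCol", "predictionCol")
  }

  // Stand-in for the shared initial-model trait.
  trait HasInitialModel {
    def initialModelParamName: String = "initialModel"
  }

  // Estimator-only params: everything the model has, plus initialModel.
  trait KMeansParams extends KMeansModelParams with HasInitialModel {
    override def paramNames: Seq[String] = super.paramNames :+ initialModelParamName
  }

  class KMeansModel extends KMeansModelParams
  class KMeans extends KMeansParams
}
```

With this split, anything that is typed against `KMeansParams` sees the full set of estimator params, and the model never exposes `initialModel`.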



[GitHub] spark pull request #17117: [SPARK-10780][ML] Support initial model for KMean...

2017-03-02 Thread sethah

Github user sethah commented on a diff in the pull request:

https://github.com/apache/spark/pull/17117#discussion_r104084877
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/KMeans.scala 
---
@@ -123,7 +126,8 @@ class KMeansModel private[ml] (
   @Since("2.0.0")
   override def transform(dataset: Dataset[_]): DataFrame = {
 transformSchema(dataset.schema, logging = true)
-val predictUDF = udf((vector: Vector) => predict(vector))
+val tmpParent: MLlibKMeansModel = parentModel
--- End diff --

Can we change it to `localParent`? That's the convention we have taken 
elsewhere when we want to get a separate pointer to a class member.



[GitHub] spark pull request #17117: [SPARK-10780][ML] Support initial model for KMean...

2017-03-02 Thread sethah

Github user sethah commented on a diff in the pull request:

https://github.com/apache/spark/pull/17117#discussion_r104095197
  
--- Diff: 
mllib/src/test/scala/org/apache/spark/ml/clustering/KMeansSuite.scala ---
@@ -182,6 +224,7 @@ object KMeansSuite {
 "predictionCol" -> "myPrediction",
 "k" -> 3,
 "maxIter" -> 2,
-"tol" -> 0.01
+"tol" -> 0.01,
+"initialModel" -> generateRandomKMeansModel(3, 3)
--- End diff --

It would be nicer to change `testEstimatorAndModelReadWrite` to accept 
`estimatorTestParams` and `modelTestParams` separately so we don't have to 
hard-code certain params to be filtered out inside that method. Though we 
wouldn't have to do that in this PR.



[GitHub] spark pull request #17117: [SPARK-10780][ML] Support initial model for KMean...

2017-03-02 Thread sethah

Github user sethah commented on a diff in the pull request:

https://github.com/apache/spark/pull/17117#discussion_r104091867
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/clustering/KMeans.scala ---
@@ -418,6 +418,8 @@ object KMeans {
   val RANDOM = "random"
   @Since("0.8.0")
   val K_MEANS_PARALLEL = "k-means||"
+  @Since("2.2.0")
+  val K_MEANS_INITIAL_MODEL = "initialModel"
--- End diff --

It can be private I think. That, or we should update the valid options for 
the `setInitializationMode` doc. But I think it's best to make it private.



[GitHub] spark pull request #17117: [SPARK-10780][ML] Support initial model for KMean...

2017-03-02 Thread sethah

Github user sethah commented on a diff in the pull request:

https://github.com/apache/spark/pull/17117#discussion_r104092158
  
--- Diff: 
mllib/src/test/scala/org/apache/spark/ml/clustering/KMeansSuite.scala ---
@@ -22,22 +22,28 @@ import scala.util.Random
 import org.apache.spark.SparkFunSuite
 import org.apache.spark.ml.linalg.{Vector, Vectors}
 import org.apache.spark.ml.param.ParamMap
-import org.apache.spark.ml.util.{DefaultReadWriteTest, MLTestingUtils}
-import org.apache.spark.mllib.clustering.{KMeans => MLlibKMeans}
+import org.apache.spark.ml.util.{DefaultReadWriteTest, Identifiable, 
MLTestingUtils}
+import org.apache.spark.ml.util.TestingUtils._
+import org.apache.spark.mllib.clustering.{KMeans => MLlibKMeans, 
KMeansModel => MLlibKMeansModel}
+import org.apache.spark.mllib.linalg.{Vectors => MLlibVectors}
 import org.apache.spark.mllib.util.MLlibTestSparkContext
 import org.apache.spark.sql.{DataFrame, Dataset, SparkSession}
 
 private[clustering] case class TestRow(features: Vector)
 
 class KMeansSuite extends SparkFunSuite with MLlibTestSparkContext with 
DefaultReadWriteTest {
 
+  import testImplicits._
+
   final val k = 5
   @transient var dataset: Dataset[_] = _
+  @transient var rData: Dataset[_] = _
 
   override def beforeAll(): Unit = {
 super.beforeAll()
 
 dataset = KMeansSuite.generateKMeansData(spark, 50, 3, k)
+rData = 
GaussianMixtureSuite.rData.map(GaussianMixtureSuite.FeatureData).toDF()
--- End diff --

`GaussianMixtureSuite.rData.map(Tuple1.apply).toDF()` ? Mapping the dummy 
case class from another test suite is less clear.
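
A dependency-free sketch of the two approaches, using plain doubles in place of the feature vectors; the `FeatureData` case class below is a local stand-in for the one borrowed from the other suite:

```scala
object Tuple1Sketch {
  // Stand-in for the feature vectors; plain doubles keep the sketch self-contained.
  val rows: Seq[Double] = Seq(1.0, 2.5, 4.0)

  // Approach in the diff: wrap each row in a dummy single-field case class.
  case class FeatureData(features: Double)
  val viaCaseClass: Seq[FeatureData] = rows.map(FeatureData.apply)

  // Suggested alternative: wrap each row in a Tuple1, no extra class needed.
  val viaTuple: Seq[Tuple1[Double]] = rows.map(Tuple1.apply)
}
```

Both produce one single-field record per row. One caveat worth checking in Spark: a `Tuple1` column would presumably be named `_1` rather than `features`, so an explicit `toDF("features")` may be needed.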



[GitHub] spark pull request #17117: [SPARK-10780][ML] Support initial model for KMean...

2017-03-02 Thread sethah

Github user sethah commented on a diff in the pull request:

https://github.com/apache/spark/pull/17117#discussion_r104090529
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/KMeans.scala 
---
@@ -337,15 +366,61 @@ class KMeans @Since("1.5.0") (
 
   @Since("1.5.0")
   override def transformSchema(schema: StructType): StructType = {
+if ($(initMode) == MLlibKMeans.K_MEANS_INITIAL_MODEL) {
+  if (isSet(initialModel)) {
+val initialModelK = $(initialModel).parentModel.k
+if (initialModelK != $(k)) {
+  throw new IllegalArgumentException("The initial model's cluster 
count = " +
+s"$initialModelK, mismatched with k = $k.")
+}
+  } else {
+throw new IllegalArgumentException("Users must set param 
initialModel if you choose " +
+  "'initialModel' as the initialization algorithm.")
+  }
+} else {
+  if (isSet(initialModel)) {
+logWarning(s"Param initialModel will take no effect when initMode 
is $initMode.")
+  }
+}
 validateAndTransformSchema(schema)
   }
+
+  @Since("2.2.0")
+  override def write: MLWriter = new KMeans.KMeansWriter(this)
 }
 
 @Since("1.6.0")
-object KMeans extends DefaultParamsReadable[KMeans] {
+object KMeans extends MLReadable[KMeans] {
 
   @Since("1.6.0")
   override def load(path: String): KMeans = super.load(path)
+
+  @Since("2.2.0")
+  override def read: MLReader[KMeans] = new KMeansReader
+
+  /** [[MLWriter]] instance for [[KMeans]] */
+  private[KMeans] class KMeansWriter(instance: KMeans) extends MLWriter {
+
+override protected def saveImpl(path: String): Unit = {
+  DefaultParamsWriter.saveInitialModel(instance, path)
+  DefaultParamsWriter.saveMetadata(instance, path, sc)
+}
+  }
+
+  private class KMeansReader extends MLReader[KMeans] {
+
+override def load(path: String): KMeans = {
+  val metadata = DefaultParamsReader.loadMetadata(path, sc, 
classOf[KMeans].getName)
+  val instance = new KMeans(metadata.uid)
+
+  DefaultParamsReader.getAndSetParams(instance, metadata)
+  DefaultParamsReader.loadInitialModel[KMeansModel](path, sc) match {
--- End diff --

This can be done as:

```scala
DefaultParamsReader.loadInitialModel[KMeansModel](path, sc).foreach(instance.setInitialModel)
```


I think it's nicer, but I'm not sure if there is a universal preference for 
side effects with options in Spark, so I'll leave it to you to decide.
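
For comparison, a self-contained sketch of both styles. The `Estimator` class and `loadInitialModel` function are hypothetical stand-ins (the real `DefaultParamsReader.loadInitialModel` is assumed, from the diff, to return an `Option`):

```scala
object OptionForeachSketch {
  // Hypothetical stand-in for an estimator with a settable initial model.
  class Estimator {
    var initialModel: Option[String] = None
    def setInitialModel(m: String): this.type = { initialModel = Some(m); this }
  }

  // Stand-in for the loader: returns Some(model) if one was saved, else None.
  def loadInitialModel(saved: Option[String]): Option[String] = saved

  // Explicit pattern match, as in the diff.
  def loadWithMatch(saved: Option[String]): Estimator = {
    val instance = new Estimator
    loadInitialModel(saved) match {
      case Some(m) => instance.setInitialModel(m)
      case None    => ()
    }
    instance
  }

  // Same behavior using Option.foreach for the side effect.
  def loadWithForeach(saved: Option[String]): Estimator = {
    val instance = new Estimator
    loadInitialModel(saved).foreach(instance.setInitialModel)
    instance
  }
}
```

The two functions behave identically; `foreach` simply runs the setter when a value is present and does nothing otherwise.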



[GitHub] spark pull request #17117: [SPARK-10780][ML] Support initial model for KMean...

2017-03-02 Thread sethah

Github user sethah commented on a diff in the pull request:

https://github.com/apache/spark/pull/17117#discussion_r104094526
  
--- Diff: 
mllib/src/test/scala/org/apache/spark/ml/util/DefaultReadWriteTest.scala ---
@@ -111,12 +113,20 @@ trait DefaultReadWriteTest extends TempDirectory { 
self: Suite =>
 val estimator2 = testDefaultReadWrite(estimator)
 testParams.foreach { case (p, v) =>
   val param = estimator.getParam(p)
-  assert(estimator.get(param).get === estimator2.get(param).get)
+  if (param.name == "initialModel") {
+// Estimator's `initialModel` has same type as the model produced 
by this estimator.
--- End diff --

This is an assumption, and is not enforced by the compiler. There is 
nothing in the trait `HasInitialModel[T <: Model[T]]` that prevents us from 
creating an estimator with an initialModel type that is not the same type as 
the model that the estimator produces. We can discuss whether or not we'd like 
to enforce this assumption, but if we do not then this method should probably 
be changed. 
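
A compact sketch of the gap, with simplified stand-ins for `Model` and `Estimator` (not the real Spark classes). `Mismatched` compiles even though its initial model type differs from the model it produces. The self-typed trait `HasSameTypeInitialModel` is one hypothetical way to enforce the assumption, not something proposed in the PR:

```scala
object InitialModelTypeSketch {
  // Simplified stand-ins for Spark ML's Model and Estimator.
  abstract class Model[M <: Model[M]]
  abstract class Estimator[M <: Model[M]] { def fit(): M }

  trait HasInitialModel[T <: Model[T]] { def initialModel: Option[T] = None }

  class KMeansModel extends Model[KMeansModel]
  class GaussianMixtureModel extends Model[GaussianMixtureModel]

  // Compiles: the initial model type is unrelated to the produced model type.
  class Mismatched extends Estimator[KMeansModel]
      with HasInitialModel[GaussianMixtureModel] {
    def fit(): KMeansModel = new KMeansModel
  }

  // Hypothetical enforcement: the self-type ties T to the estimator's model type.
  trait HasSameTypeInitialModel[T <: Model[T]] extends HasInitialModel[T] {
    self: Estimator[T] =>
  }

  class Matched extends Estimator[KMeansModel]
      with HasSameTypeInitialModel[KMeansModel] {
    def fit(): KMeansModel = new KMeansModel
  }

  // The following would be rejected by the compiler (self-type mismatch):
  // class Rejected extends Estimator[KMeansModel]
  //     with HasSameTypeInitialModel[GaussianMixtureModel] { ... }
}
```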



[GitHub] spark pull request #17117: [SPARK-10780][ML] Support initial model for KMean...

2017-03-02 Thread sethah

Github user sethah commented on a diff in the pull request:

https://github.com/apache/spark/pull/17117#discussion_r104090273
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/KMeans.scala 
---
@@ -337,15 +366,61 @@ class KMeans @Since("1.5.0") (
 
   @Since("1.5.0")
   override def transformSchema(schema: StructType): StructType = {
+if ($(initMode) == MLlibKMeans.K_MEANS_INITIAL_MODEL) {
--- End diff --

It might be nice to factor this logic out into a method like 
`assertInitialModelValid` or something similar. Actually, we could add an 
abstract method to the `HasInitialModel` trait that each subclass can implement 
differently. 
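
One way this could look, as a sketch under simplifying assumptions: the trait holds an `Option` rather than a real `Param`, and the `KMeans`/`KMeansModel` classes are minimal stand-ins. The error message is adapted from the diff; `require` throws `IllegalArgumentException` when the check fails.

```scala
object InitialModelValidationSketch {
  // Hypothetical, simplified trait; the real one would wrap a Param[M].
  trait HasInitialModel[M] {
    def initialModel: Option[M]

    /** Each estimator defines what makes its initial model valid; throws otherwise. */
    protected def assertInitialModelValid(): Unit
  }

  case class KMeansModel(k: Int)

  class KMeans(val k: Int, val initialModel: Option[KMeansModel])
      extends HasInitialModel[KMeansModel] {

    override protected def assertInitialModelValid(): Unit =
      initialModel.foreach { m =>
        require(m.k == k,
          s"The initial model's cluster count = ${m.k}, mismatched with k = $k.")
      }

    // transformSchema delegates to the factored-out check.
    def transformSchema(): Unit = assertInitialModelValid()
  }
}
```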



[GitHub] spark pull request #17117: [SPARK-10780][ML] Support initial model for KMean...

2017-03-02 Thread sethah

Github user sethah commented on a diff in the pull request:

https://github.com/apache/spark/pull/17117#discussion_r104092773
  
--- Diff: 
mllib/src/test/scala/org/apache/spark/ml/clustering/KMeansSuite.scala ---
@@ -152,6 +158,35 @@ class KMeansSuite extends SparkFunSuite with 
MLlibTestSparkContext with DefaultR
 val kmeans = new KMeans()
 testEstimatorAndModelReadWrite(kmeans, dataset, 
KMeansSuite.allParamSettings, checkModelData)
   }
+
+  test("training with initial model") {
+val kmeans = new KMeans().setK(2).setSeed(1)
+val model1 = kmeans.fit(rData)
+val model2 = 
kmeans.setInitMode("initialModel").setInitialModel(model1).fit(rData)
+model2.clusterCenters.zip(model1.clusterCenters)
+  .foreach { case (center2, center1) => assert(center2 ~== center1 
absTol 1E-8) }
+  }
+
+  test("training with initial model, error cases") {
+val kmeans = new KMeans().setK(k).setSeed(1).setMaxIter(1)
+
+// Sets initMode with 'initialModel', but does not specify initial 
model.
+intercept[IllegalArgumentException] {
--- End diff --

I'm not sure I agree with the behavior. We discussed it quite a bit in the 
other PR - maybe you can summarize the reason you moved away from the previous 
decisions? At any rate, it seems we currently have the following behavior:

| k | initMode | initialModel | result |
| --- | --- | --- | --- |
| ? | not set | set | ignore initialModel |
| ? | set | not set | error |
| set (k != initialModelK) | set | set | error |
| set (k == initialModelK) | set | set | use initialModel |

If we keep this behavior, we should add a test for the first case. 
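
The table can be read as a small decision function. The sketch below is an assumption-laden restatement in plain Scala: `InitRules`, `resolveInit`, and the outcome types are hypothetical names, and "initMode set" is taken to mean that initMode was set to "initialModel".

```scala
object InitRules {
  sealed trait InitOutcome
  case object IgnoreInitialModel extends InitOutcome
  case object UseInitialModel extends InitOutcome
  final case class Invalid(reason: String) extends InitOutcome

  // initModeIsInitialModel: whether initMode was set to "initialModel".
  // initialModelK: Some(number of centers in the supplied model), or None if no model was set.
  def resolveInit(
      k: Int,
      initModeIsInitialModel: Boolean,
      initialModelK: Option[Int]): InitOutcome =
    (initModeIsInitialModel, initialModelK) match {
      // Row 1: any supplied model is ignored (with no model there is nothing to ignore).
      case (false, _) => IgnoreInitialModel
      // Row 2: initMode asks for an initial model, but none was given.
      case (true, None) => Invalid("initMode is 'initialModel' but no model was set")
      // Row 3: k disagrees with the model's number of centers.
      case (true, Some(mk)) if mk != k => Invalid(s"k = $k but initial model has $mk centers")
      // Row 4: everything is consistent.
      case (true, Some(_)) => UseInitialModel
    }
}
```

The first case is the one the comment asks to cover with an additional test.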



[GitHub] spark issue #15505: [SPARK-18890][CORE] Move task serialization from the Tas...

2017-03-02 Thread witgo

Github user witgo commented on the issue:

https://github.com/apache/spark/pull/15505
  
Yes, maybe multithreaded task serialization code would perform better. 
Let me close the PR.



[GitHub] spark pull request #15505: [SPARK-18890][CORE] Move task serialization from ...

2017-03-02 Thread witgo

Github user witgo closed the pull request at:

https://github.com/apache/spark/pull/15505



[GitHub] spark issue #17133: [SPARK-19793] Use clock.getTimeMillis when mark task as ...

2017-03-02 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17133
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73807/
Test FAILed.



[GitHub] spark issue #17133: [SPARK-19793] Use clock.getTimeMillis when mark task as ...

2017-03-02 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17133
  
Merged build finished. Test FAILed.



[GitHub] spark issue #17133: [SPARK-19793] Use clock.getTimeMillis when mark task as ...

2017-03-02 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17133
  
**[Test build #73807 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73807/testReport)**
 for PR 17133 at commit 
[`37f26e3`](https://github.com/apache/spark/commit/37f26e3e51d77548aa285856d22834683d37e889).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #17067: [SPARK-19602][SQL][TESTS] Add tests for qualified column...

2017-03-02 Thread skambha

Github user skambha commented on the issue:

https://github.com/apache/spark/pull/17067
  
Thanks a lot Xiao. 



[GitHub] spark issue #16883: [SPARK-17498][ML] StringIndexer enhancement for handling...

2017-03-02 Thread imatiach-msft

Github user imatiach-msft commented on the issue:

https://github.com/apache/spark/pull/16883
  
@VinceShieh I added some minor comments.  This is a nice feature!



[GitHub] spark pull request #16883: [SPARK-17498][ML] StringIndexer enhancement for h...

2017-03-02 Thread imatiach-msft

Github user imatiach-msft commented on a diff in the pull request:

https://github.com/apache/spark/pull/16883#discussion_r104094424
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/feature/StringIndexer.scala ---
@@ -163,25 +187,28 @@ class StringIndexerModel (
 }
 transformSchema(dataset.schema, logging = true)
 
+val metadata = NominalAttribute.defaultAttr
+  .withName($(outputCol)).withValues(labels).toMetadata()
+// If we are skipping invalid records, filter them out.
+val (filteredDataset, keepInvalid) = getHandleInvalid match {
--- End diff --

actually, I think returning a tuple here just makes things more confusing.  
Maybe you can move the check outside of the match.  
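
As an illustration of the two shapes being compared, here is a sketch over a plain `Seq[String]` instead of a Dataset. The branch bodies are assumptions, since the diff above shows only the first line of the match, and `HandleInvalidSketch` with its members is a hypothetical name.

```scala
object HandleInvalidSketch {
  val Skip = "skip"
  val Error = "error"
  val Keep = "keep"

  // Tuple-returning match, roughly the shape quoted in the diff (branch bodies assumed).
  def viaTuple(rows: Seq[String], labels: Set[String], mode: String): (Seq[String], Boolean) =
    mode match {
      case Skip => (rows.filter(labels.contains), false)
      case _    => (rows, mode == Keep)
    }

  // Alternative: compute the flag outside, so the filtering step yields a single value.
  def viaSeparateFlag(rows: Seq[String], labels: Set[String], mode: String): (Seq[String], Boolean) = {
    val keepInvalid = mode == Keep
    val filtered = if (mode == Skip) rows.filter(labels.contains) else rows
    (filtered, keepInvalid)
  }
}
```

Both functions return the same result; the second simply separates the two concerns so that neither depends on destructuring a tuple.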



[GitHub] spark pull request #16883: [SPARK-17498][ML] StringIndexer enhancement for h...

2017-03-02 Thread imatiach-msft

Github user imatiach-msft commented on a diff in the pull request:

https://github.com/apache/spark/pull/16883#discussion_r104093892
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/feature/StringIndexer.scala ---
@@ -105,7 +125,11 @@ class StringIndexer @Since("1.4.0") (
 
 @Since("1.6.0")
 object StringIndexer extends DefaultParamsReadable[StringIndexer] {
-
+  private[feature] val SKIP_UNSEEN_LABEL: String = "skip"
+  private[feature] val ERROR_UNSEEN_LABEL: String = "error"
+  private[feature] val KEEP_UNSEEN_LABEL: String = "keep"
--- End diff --

It would make me even happier if these were public and could be used by the 
test code, but I think it's up to the committers (jkbradley)



[GitHub] spark issue #13320: [SPARK-13184][SQL] Add a datasource-specific option minP...

2017-03-02 Thread maropu

Github user maropu commented on the issue:

https://github.com/apache/spark/pull/13320
  
@gatorsmile Could you check this and give me comments, too?



[GitHub] spark pull request #16883: [SPARK-17498][ML] StringIndexer enhancement for h...

2017-03-02 Thread imatiach-msft

Github user imatiach-msft commented on a diff in the pull request:

https://github.com/apache/spark/pull/16883#discussion_r104093629
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/feature/StringIndexer.scala ---
@@ -163,25 +187,28 @@ class StringIndexerModel (
 }
 transformSchema(dataset.schema, logging = true)
 
+val metadata = NominalAttribute.defaultAttr
+  .withName($(outputCol)).withValues(labels).toMetadata()
+// If we are skipping invalid records, filter them out.
+val (filteredDataset, keepInvalid) = getHandleInvalid match {
--- End diff --

minor style comment: instead of keepInvalid, do you think that indexInvalid 
might be a better name (?)



[GitHub] spark issue #17145: [SPARK-19805][TEST] Log the row type when query type dos...

2017-03-02 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17145
  
**[Test build #73817 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73817/testReport)**
 for PR 17145 at commit 
[`f5a35f6`](https://github.com/apache/spark/commit/f5a35f6bc3ec032f429137676b99b888ae326acc).



[GitHub] spark pull request #16883: [SPARK-17498][ML] StringIndexer enhancement for h...

2017-03-02 Thread imatiach-msft

Github user imatiach-msft commented on a diff in the pull request:

https://github.com/apache/spark/pull/16883#discussion_r104093452
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/feature/StringIndexer.scala ---
@@ -163,25 +190,28 @@ class StringIndexerModel (
 }
 transformSchema(dataset.schema, logging = true)
 
+val metadata = NominalAttribute.defaultAttr
+  .withName($(outputCol)).withValues(labels).toMetadata()
--- End diff --

I think he means that "labels" above should also include the invalid 
bucket.  In previous ML frameworks I've worked on we've just called this 
"unknown".



[GitHub] spark pull request #16883: [SPARK-17498][ML] StringIndexer enhancement for h...

2017-03-02 Thread imatiach-msft

Github user imatiach-msft commented on a diff in the pull request:

https://github.com/apache/spark/pull/16883#discussion_r104093159
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/feature/StringIndexer.scala ---
@@ -105,7 +125,11 @@ class StringIndexer @Since("1.4.0") (
 
 @Since("1.6.0")
 object StringIndexer extends DefaultParamsReadable[StringIndexer] {
-
+  private[feature] val SKIP_UNSEEN_LABEL: String = "skip"
+  private[feature] val ERROR_UNSEEN_LABEL: String = "error"
+  private[feature] val KEEP_UNSEEN_LABEL: String = "keep"
--- End diff --

this is very nice, good use of constants, I really like to see this type of 
code :)



[GitHub] spark pull request #16883: [SPARK-17498][ML] StringIndexer enhancement for h...

2017-03-02 Thread imatiach-msft

Github user imatiach-msft commented on a diff in the pull request:

https://github.com/apache/spark/pull/16883#discussion_r104093069
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/feature/StringIndexer.scala ---
@@ -71,18 +92,17 @@ class StringIndexer @Since("1.4.0") (
   def this() = this(Identifiable.randomUID("strIdx"))
 
   /** @group setParam */
-  @Since("1.6.0")
-  def setHandleInvalid(value: String): this.type = set(handleInvalid, 
value)
-  setDefault(handleInvalid, "error")
-
-  /** @group setParam */
   @Since("1.4.0")
   def setInputCol(value: String): this.type = set(inputCol, value)
 
   /** @group setParam */
   @Since("1.4.0")
   def setOutputCol(value: String): this.type = set(outputCol, value)
 
+  /** @group setParam */
+  @Since("2.2.0")
+  def setHandleInvalid(value: String): this.type = set(handleInvalid, 
value)
--- End diff --

Can you keep the order of the params the same as before? Also, why did the 
version change? This method existed before, so it seems it should remain at 
version 1.6 (?)
Also, a minor style comment: keep the setDefault(handleInvalid) below the 
set method.



[GitHub] spark pull request #16883: [SPARK-17498][ML] StringIndexer enhancement for h...

2017-03-02 Thread imatiach-msft

Github user imatiach-msft commented on a diff in the pull request:

https://github.com/apache/spark/pull/16883#discussion_r104092772
  
--- Diff: docs/ml-features.md ---
@@ -576,7 +579,22 @@ will be generated:
  2  | c| 1.0
 
 
-Notice that the row containing "d" does not appear.
+Notice that the rows containing "d" or "e" do not appear.
+
+If you call `setHandleInvalid("keep")`, the following dataset
+will be generated:
+
+
+ id | category | categoryIndex
+|--|---
+ 0  | a| 0.0
+ 1  | b| 2.0
+ 2  | c| 1.0
+ 3  | d| 3.0
+ 4  | e| 3.0
+
+
+Notice that the rows containing "d" or "e" are mapped with indices "3.0"
--- End diff --

doc suggestion:  rows containing "d" or "e" are mapped with indices "3.0" 
=>  rows containing "d" and "e" are mapped to index "3.0"



[GitHub] spark issue #17136: [SPARK-19783][SQL] Treat shorter/longer lengths of token...

2017-03-02 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17136
  
**[Test build #73816 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73816/testReport)**
 for PR 17136 at commit 
[`5a01a9d`](https://github.com/apache/spark/commit/5a01a9dcbe1fb922a7e240fdee3bd4b7fa4e471a).



[GitHub] spark pull request #16883: [SPARK-17498][ML] StringIndexer enhancement for h...

2017-03-02 Thread imatiach-msft

Github user imatiach-msft commented on a diff in the pull request:

https://github.com/apache/spark/pull/16883#discussion_r104092723
  
--- Diff: docs/ml-features.md ---
@@ -576,7 +579,22 @@ will be generated:
  2  | c| 1.0
 
 
-Notice that the row containing "d" does not appear.
+Notice that the rows containing "d" or "e" do not appear.
+
+If you call `setHandleInvalid("keep")`, the following dataset
+will be generated:
+
+
+ id | category | categoryIndex
+|--|---
+ 0  | a| 0.0
+ 1  | b| 2.0
+ 2  | c| 1.0
+ 3  | d| 3.0
+ 4  | e| 3.0
+
+
+Notice that the rows containing "d" or "e" are mapped with indices "3.0"
--- End diff --

doc suggestion: mapped with indices "3.0" => mapped to index "3.0"



[GitHub] spark issue #16944: [SPARK-19611][SQL] Introduce configurable table schema i...

2017-03-02 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16944
  
Merged build finished. Test FAILed.



[GitHub] spark pull request #16883: [SPARK-17498][ML] StringIndexer enhancement for h...

2017-03-02 Thread imatiach-msft

Github user imatiach-msft commented on a diff in the pull request:

https://github.com/apache/spark/pull/16883#discussion_r104092627
  
--- Diff: docs/ml-features.md ---
@@ -542,12 +543,13 @@ column, we should get the following:
 "a" gets index `0` because it is the most frequent, followed by "c" with 
index `1` and "b" with
 index `2`.
 
-Additionally, there are two strategies regarding how `StringIndexer` will 
handle
+Additionally, there are three strategies regarding how `StringIndexer` 
will handle
 unseen labels when you have fit a `StringIndexer` on one dataset and then 
use it
 to transform another:
 
 - throw an exception (which is the default)
 - skip the row containing the unseen label entirely
+- map the unseen labels with indices [numLabels]
--- End diff --

doc suggestion: "map the unseen labels to their own index"
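
The three strategies can be sketched without Spark as a function over plain sequences. This is only an illustration under assumptions: `UnseenLabelSketch.index` is a hypothetical helper, and the exception type thrown for "error" here need not match what `StringIndexerModel` actually throws.

```scala
object UnseenLabelSketch {
  // labels: fitted labels, ordered so that position == index (most frequent first).
  // values: the column to transform. handleInvalid: "error", "skip" or "keep".
  def index(
      labels: Seq[String],
      values: Seq[String],
      handleInvalid: String): Seq[(String, Double)] = {
    val labelToIndex = labels.zipWithIndex.map { case (l, i) => l -> i.toDouble }.toMap
    // With "keep", every unseen label shares one extra index equal to numLabels.
    val unseenIndex = labels.length.toDouble

    handleInvalid match {
      case "error" =>
        values.map { v =>
          v -> labelToIndex.getOrElse(v, throw new IllegalArgumentException(s"Unseen label: $v"))
        }
      case "skip" =>
        // Rows with unseen labels are dropped entirely.
        values.flatMap(v => labelToIndex.get(v).map(v -> _))
      case "keep" =>
        values.map(v => v -> labelToIndex.getOrElse(v, unseenIndex))
      case other =>
        throw new IllegalArgumentException(s"Unknown handleInvalid value: $other")
    }
  }
}
```

With labels ordered `a`, `c`, `b` and input `a, b, c, d, e`, the "keep" branch reproduces the table in the docs diff: `d` and `e` both map to index 3.0.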



[GitHub] spark issue #16944: [SPARK-19611][SQL] Introduce configurable table schema i...

2017-03-02 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16944
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73808/
Test FAILed.



[GitHub] spark issue #15928: [SPARK-18478][SQL] Support codegen'd Hive UDFs

2017-03-02 Thread maropu

Github user maropu commented on the issue:

https://github.com/apache/spark/pull/15928
  
@rxin yea, I got x1.3-1.4 performance gains in this pr.



[GitHub] spark issue #16944: [SPARK-19611][SQL] Introduce configurable table schema i...

2017-03-02 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16944
  
**[Test build #73808 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73808/testReport)**
 for PR 16944 at commit 
[`514ae06`](https://github.com/apache/spark/commit/514ae06e1dbe2640091c90d55354c3500857e6e2).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #15928: [SPARK-18478][SQL] Support codegen'd Hive UDFs

2017-03-02 Thread rxin

Github user rxin commented on the issue:

https://github.com/apache/spark/pull/15928
  
What do you mean? The improvement was small?




[GitHub] spark issue #17136: [SPARK-19783][SQL] Treat shorter/longer lengths of token...

2017-03-02 Thread maropu

Github user maropu commented on the issue:

https://github.com/apache/spark/pull/17136
  
Jenkins, retest this please.



[GitHub] spark issue #15928: [SPARK-18478][SQL] Support codegen'd Hive UDFs

2017-03-02 Thread maropu

Github user maropu commented on the issue:

https://github.com/apache/spark/pull/15928
  
I looked into this, but I had little luck with this fix, so I'll 
close it for now. Thanks!



[GitHub] spark pull request #15928: [SPARK-18478][SQL] Support codegen'd Hive UDFs

2017-03-02 Thread maropu

Github user maropu closed the pull request at:

https://github.com/apache/spark/pull/15928



[GitHub] spark issue #17140: [SPARK-19796][CORE] Fix serialization of long property v...

2017-03-02 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17140
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73802/
Test PASSed.



[GitHub] spark issue #17140: [SPARK-19796][CORE] Fix serialization of long property v...

2017-03-02 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17140
  
Merged build finished. Test PASSed.



[GitHub] spark issue #16981: [SPARK-19637][SQL] Add to_json in FunctionRegistry

2017-03-02 Thread maropu

Github user maropu commented on the issue:

https://github.com/apache/spark/pull/16981
  
@gatorsmile okay, I'll fix the issues you mentioned.



[GitHub] spark issue #16981: [SPARK-19637][SQL] Add to_json in FunctionRegistry

2017-03-02 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16981
  
**[Test build #73815 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73815/testReport)**
 for PR 16981 at commit 
[`ddc06cf`](https://github.com/apache/spark/commit/ddc06cf46b3b2730dc5ec8f49e12225c60d05b7c).



[GitHub] spark issue #17140: [SPARK-19796][CORE] Fix serialization of long property v...

2017-03-02 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17140
  
**[Test build #73802 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73802/testReport)**
 for PR 17140 at commit 
[`99692bf`](https://github.com/apache/spark/commit/99692bf9860f375eab7f7c35d17f83d2c726ae77).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request #17122: [SPARK-19786][SQL] Facilitate loop optimizations ...

2017-03-02 Thread kiszk

Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/17122#discussion_r104091814
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/WholeStageCodegenExec.scala ---
@@ -206,6 +206,18 @@ trait CodegenSupport extends SparkPlan {
   def doConsume(ctx: CodegenContext, input: Seq[ExprCode], row: ExprCode): String = {
     throw new UnsupportedOperationException
   }
+
+  /**
+   * for optimization to suppress shouldStop() in a loop of WholeStageCodegen
+   *
+   * isShouldStopRequired: require to insert shouldStop() into the loop if true
+   */
+  def isShouldStopRequired: Boolean = {
+    return shouldStopRequired && !(this.parent != null && !this.parent.isShouldStopRequired)
--- End diff --

Thank you for your suggestion. However, it caused an assertion failure in the `"SPARK-7150 range api"` test in DataFrameRangeSuite.

In the failing case, `isShouldStopRequired` is called up the operator chain through `parent`:
`RangeExec -> FilterExec -> WholeStageCodegenExec`
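
To illustrate how the expression in the diff walks the `parent` chain, here is a minimal sketch. It is not Spark code: `Node` and `ParentChainSketch` are hypothetical stand-ins, and only the boolean expression is taken from the diff above.

```scala
// Hypothetical, simplified stand-in for a physical operator; not Spark's CodegenSupport.
final class Node(val name: String, val shouldStopRequired: Boolean, val parent: Node = null) {
  // Same expression as in the diff: the node's own flag must be set, and the
  // parent (if there is one) must not report that shouldStop() is unnecessary.
  def isShouldStopRequired: Boolean =
    shouldStopRequired && !(parent != null && !parent.isShouldStopRequired)
}

object ParentChainSketch {
  // Builds RangeExec -> FilterExec -> WholeStageCodegenExec and returns the leaf.
  // Only the root's flag is configurable here; the other two are assumed true.
  def rangeUnder(wholeStageFlag: Boolean): Node = {
    val wholeStage = new Node("WholeStageCodegenExec", wholeStageFlag)
    val filter = new Node("FilterExec", shouldStopRequired = true, parent = wholeStage)
    new Node("RangeExec", shouldStopRequired = true, parent = filter)
  }
}
```

In this sketch, each call recurses through `parent` until it reaches a node without one, so a flag switched off at the root is seen by every descendant that asks.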


---

[GitHub] spark pull request #16981: [SPARK-19637][SQL] Add to_json in FunctionRegistr...

2017-03-02 Thread maropu

Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/16981#discussion_r104091757
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonUtils.scala ---
@@ -55,4 +60,22 @@ object JacksonUtils {
 
     schema.foreach(field => verifyType(field.name, field.dataType))
   }
+
+  def strToStructType(schemaAsJson: String): StructType = Try {
--- End diff --

yes, I'll remove


---

[GitHub] spark pull request #16981: [SPARK-19637][SQL] Add to_json in FunctionRegistr...

2017-03-02 Thread maropu

Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/16981#discussion_r104091471
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonUtils.scala ---
@@ -55,4 +60,22 @@ object JacksonUtils {
 
     schema.foreach(field => verifyType(field.name, field.dataType))
   }
+
+  def strToStructType(schemaAsJson: String): StructType = Try {
+    DataType.fromJson(schemaAsJson).asInstanceOf[StructType]
+  }.getOrElse {
--- End diff --

okay, I'll fix


---

[GitHub] spark pull request #16981: [SPARK-19637][SQL] Add to_json in FunctionRegistr...

2017-03-02 Thread maropu

Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/16981#discussion_r104091422
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/functions.scala ---
@@ -3007,7 +3008,7 @@ object functions {
    * @since 2.1.0
    */
   def from_json(e: Column, schema: String, options: java.util.Map[String, String]): Column =
-    from_json(e, DataType.fromJson(schema).asInstanceOf[StructType], options)
+    from_json(e, JacksonUtils.strToStructType(schema), options)
--- End diff --

okay, I'll do.



---

[GitHub] spark issue #17144: [SPARK-19803][TEST] flaky BlockManagerReplicationSuite t...

2017-03-02 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17144
  
**[Test build #73814 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73814/testReport)** for PR 17144 at commit [`9ec5caf`](https://github.com/apache/spark/commit/9ec5cafb32a8137645dda50c958d95c26f3948bc).


---

[GitHub] spark pull request #16981: [SPARK-19637][SQL] Add to_json in FunctionRegistr...

2017-03-02 Thread gatorsmile

Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16981#discussion_r104091265
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/JsonFunctionsSuite.scala ---
@@ -174,4 +174,22 @@ class JsonFunctionsSuite extends QueryTest with SharedSQLContext {
       .select(to_json($"struct").as("json"))
     checkAnswer(dfTwo, readBackTwo)
   }
+
+  test("SPARK-19637 Support to_json in SQL") {
+    // to_json
--- End diff --

Nit: remove this comment.


---
