[GitHub] spark issue #16579: [SPARK-19218][SQL] Fix SET command to show a result corr...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16579 Merged build finished. Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #16579: [SPARK-19218][SQL] Fix SET command to show a result corr...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16579 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71824/
[GitHub] spark issue #16579: [SPARK-19218][SQL] Fix SET command to show a result corr...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16579 **[Test build #71824 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71824/testReport)** for PR 16579 at commit [`7879201`](https://github.com/apache/spark/commit/7879201961b0f0caa997c9fe6446c0b1b46124f8). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #16638: [SPARK-19115] [SQL] Supporting Create External Table Lik...
Github user ouyangxiaochen commented on the issue: https://github.com/apache/spark/pull/16638 I am sorry that I didn't grasp the key points of your question. In Hive, if there are data files under the specified path when creating an external table, Hive identifies those files as the table's data files. In many Spark applications, external table data is generated by other applications under the external table path. So Hive does nothing with the directory specified in the LOCATION. Thank you for your patience and guidance. @gatorsmile
[GitHub] spark issue #16642: [SPARK-19284][SQL]append to partitioned datasource table...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16642 **[Test build #71829 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71829/testReport)** for PR 16642 at commit [`c200b98`](https://github.com/apache/spark/commit/c200b986fed37015a30f99ba2f870dda84cc2ef6).
[GitHub] spark issue #16566: [SPARK-18821][SparkR]: Bisecting k-means wrapper in Spar...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16566 **[Test build #71828 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71828/testReport)** for PR 16566 at commit [`d36c23a`](https://github.com/apache/spark/commit/d36c23a3736cf985c9692f4a14e00945a2d38732).
[GitHub] spark pull request #16642: [SPARK-19284][SQL]append to partitioned datasourc...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/16642#discussion_r97262909 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/sources/PartitionedWriteSuite.scala --- @@ -92,6 +111,16 @@ class PartitionedWriteSuite extends QueryTest with SharedSQLContext { } } + test("append data to an existed partitioned table without custom partition path") { +withTable("t") { + withSQLConf("spark.sql.sources.commitProtocolClass" -> --- End diff -- nit: SQLConf.FILE_COMMIT_PROTOCOL_CLASS.key -> classOf[OnlyDetectCustomPathFileCommitProtocol].getName
[GitHub] spark pull request #16642: [SPARK-19284][SQL]append to partitioned datasourc...
Github user windpiger commented on a diff in the pull request: https://github.com/apache/spark/pull/16642#discussion_r97262157 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/sources/PartitionedWriteSuite.scala --- @@ -92,6 +96,47 @@ class PartitionedWriteSuite extends QueryTest with SharedSQLContext { } } + test("append data an existed partition in a datasource table," + --- End diff -- thanks~
[GitHub] spark pull request #16642: [SPARK-19284][SQL]append to partitioned datasourc...
Github user windpiger commented on a diff in the pull request: https://github.com/apache/spark/pull/16642#discussion_r97262179 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/sources/PartitionedWriteSuite.scala --- @@ -92,6 +96,47 @@ class PartitionedWriteSuite extends QueryTest with SharedSQLContext { } } + test("append data an existed partition in a datasource table," + +"custom location sent to Task should be None ") { +withTable("t") { + Seq((1, 2)).toDF("a", "b").write.partitionBy("b").saveAsTable("t") + val writer = Seq((3, 2)).toDF("a", "b").write.mode("append").partitionBy("b") + + spark.sessionState.executePlan(writer.createTableCommand(TableIdentifier("t"))) --- End diff -- good idea, thanks!
[GitHub] spark issue #16642: [SPARK-19284][SQL]append to partitioned datasource table...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16642 **[Test build #71827 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71827/testReport)** for PR 16642 at commit [`aff53dc`](https://github.com/apache/spark/commit/aff53dc40330176987056a827f81a01419ce1e1e).
[GitHub] spark issue #16521: [SPARK-19139][core] New auth mechanism for transport lib...
Github user zsxwing commented on the issue: https://github.com/apache/spark/pull/16521 Made one pass. Looks good overall. Just some nits.
[GitHub] spark issue #16666: [SPARK-19319][SparkR]:SparkR Kmeans summary returns erro...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/1 **[Test build #71826 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71826/testReport)** for PR 1 at commit [`d1a2d6c`](https://github.com/apache/spark/commit/d1a2d6c9a83adc184dcc88ec3fd78b63ede39b89).
[GitHub] spark pull request #16666: [SPARK-19319][SparkR]:SparkR Kmeans summary retur...
Github user wangmiao1981 commented on a diff in the pull request: https://github.com/apache/spark/pull/1#discussion_r97260863 --- Diff: R/pkg/R/mllib_clustering.R --- @@ -225,10 +225,12 @@ setMethod("spark.kmeans", signature(data = "SparkDataFrame", formula = "formula" #' @param object a fitted k-means model. #' @return \code{summary} returns summary information of the fitted model, which is a list. -#' The list includes the model's \code{k} (number of cluster centers), +#' The list includes the model's \code{k} (the configured number of cluster centers), #' \code{coefficients} (model cluster centers), -#' \code{size} (number of data points in each cluster), and \code{cluster} -#' (cluster centers of the transformed data). +#' \code{size} (number of data points in each cluster), \code{cluster} +#' (cluster centers of the transformed data), and \code{clusterSize} +#' (the actual number of cluster centers. When using initMode = "random", --- End diff -- OK. I will add it. For bisecting k-means, I haven't found a case like this. This case only occurs when initMode is random, and the behavior is due to a fix in the k-means implementation.
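The behavior discussed in this review thread (with `initMode = "random"`, some of the `k` requested centers can end up with no points assigned, so the actual number of non-empty clusters is smaller than `k`) can be illustrated with a toy sketch. This is not Spark's k-means implementation; the data and centers below are made up purely for illustration:

```python
# Toy illustration, NOT Spark's k-means: one "randomly" initialized center
# sits far from all points, so no point is assigned to it and the actual
# number of non-empty clusters drops below the configured k.
points = [0.0, 0.1, 0.2, 9.9, 10.0]
centers = [5.0, 100.0]  # k = 2, but the second center attracts no points

# Assign each point to its nearest center (one Lloyd's-style assignment step).
assignments = [min(range(len(centers)), key=lambda i: abs(p - centers[i]))
               for p in points]

cluster_size = len(set(assignments))  # actual number of non-empty clusters
print(f"configured k = {len(centers)}, non-empty clusters = {cluster_size}")
```

Every point is nearest to the first center, so only one cluster is non-empty even though `k = 2` was requested, which is why reporting both `k` and `clusterSize` in the summary is useful.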
[GitHub] spark issue #16652: [SPARK-19234][MLLib] AFTSurvivalRegression should fail f...
Github user admackin commented on the issue: https://github.com/apache/spark/pull/16652 I've addressed all the problems, I think: code style now fixed, MLTestingUtils patched (and verified all MLlib test cases still pass), and added a test case for zero-valued labels.
[GitHub] spark issue #16638: spark-19115
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/16638 Please keep updating your PR description. For example, this PR is not relying on `manual tests`. In addition, you also need to summarize what this PR did. List more details to help reviewers understand your changes and impacts. Thanks!
[GitHub] spark issue #16638: spark-19115
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/16638 Let me rephrase it. If the directory specified in the `LOCATION` spec contains other files, how does Hive behave?
[GitHub] spark issue #16638: spark-19115
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/16638 First, please change the PR title to `[SPARK-19115] [SQL] Supporting Create External Table Like Location`
[GitHub] spark issue #16645: [SPARK-19290][SQL] add a new extending interface in Anal...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16645 **[Test build #71825 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71825/testReport)** for PR 16645 at commit [`c55a1f9`](https://github.com/apache/spark/commit/c55a1f95491b10208ccd2cdf5910e6ec813c3522).
[GitHub] spark issue #16671: [SPARK-19327][SparkSQL] a better balance partition metho...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/16671 Connectors from some DBMS vendors use the UNLOAD utility, which performs much better, and build the RDD inside the connector. Normally, JDBC is not a good option for fetching and writing large tables.
[GitHub] spark issue #16654: [SPARK-19303][ML][WIP] Add evaluate method in clustering...
Github user srowen commented on the issue: https://github.com/apache/spark/pull/16654 Metrics evaluate the clustering though; the details of the algorithm are irrelevant. This still clusters points in a continuous space so you can measure WSSSE.
[GitHub] spark issue #16579: [SPARK-19218][SQL] Fix SET command to show a result corr...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16579 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71821/
[GitHub] spark issue #16579: [SPARK-19218][SQL] Fix SET command to show a result corr...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16579 Merged build finished. Test PASSed.
[GitHub] spark issue #16594: [SPARK-17078] [SQL] Show stats when explain
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/16594 :- ) No perfect solution, but we should use the [metric prefix](https://en.wikipedia.org/wiki/Metric_prefix) when the number is huge.
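As a sketch of what metric-prefix formatting for huge row-count estimates could look like (a hypothetical helper for illustration, not Spark's actual formatting code):

```python
def metric_prefix(n):
    """Format a large count with a metric prefix (K, M, G, ...) for readability.

    Hypothetical helper for illustration; not Spark's actual implementation.
    """
    prefixes = ["", "K", "M", "G", "T", "P", "E"]
    value, i = float(n), 0
    # Divide by 1000 until the value fits, tracking which prefix we reached.
    while abs(value) >= 1000 and i < len(prefixes) - 1:
        value /= 1000
        i += 1
    return str(n) if i == 0 else f"{value:.1f}{prefixes[i]}"

print(metric_prefix(1234))           # 1.2K
print(metric_prefix(7_500_000_000))  # 7.5G
print(metric_prefix(42))             # 42
```

Small counts pass through unchanged, while multi-billion-row estimates collapse to a short, readable token in the plan output.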
[GitHub] spark issue #16594: [SPARK-17078] [SQL] Show stats when explain
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/16594 SQL Server has three ways to show the plan: graphical plans, text plans, and XML plans. Actually, it is pretty advanced. When using the text plans, users can set the output formats: 1. SHOWPLAN_ALL - A reasonably complete set of data showing the estimated execution plan for the query. 2. SHOWPLAN_TEXT - Provides a very limited set of data for use with tools like osql.exe. It, too, only shows the estimated execution plan. 3. STATISTICS PROFILE - Similar to SHOWPLAN_ALL except it represents the data for the actual execution plan. I found a 300-page book, `SQL Server Execution Plans`. For details, you can [download and read it](http://download.red-gate.com/ebooks/SQL/eBOOK_SQLServerExecutionPlans_2Ed_G_Fritchey.pdf).
[GitHub] spark issue #16675: [SPARK-19155][ML] Make family case insensitive in GLM
Github user actuaryzhang commented on the issue: https://github.com/apache/spark/pull/16675 @yanboliang Thanks. Seems to have passed tests.
[GitHub] spark pull request #16659: [SPARK-19309][SQL] disable common subexpression e...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/16659
[GitHub] spark issue #16579: [SPARK-19218][SQL] Fix SET command to show a result corr...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16579 **[Test build #71824 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71824/testReport)** for PR 16579 at commit [`7879201`](https://github.com/apache/spark/commit/7879201961b0f0caa997c9fe6446c0b1b46124f8).
[GitHub] spark issue #16659: [SPARK-19309][SQL] disable common subexpression eliminat...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/16659 thanks, merging to master!
[GitHub] spark issue #16579: [SPARK-19218][SQL] Fix SET command to show a result corr...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/16579 The only failure is irrelevant to this PR.
```
[info] - set spark.sql.warehouse.dir *** FAILED *** (5 minutes, 0 seconds)
[info]   Timeout of './bin/spark-submit' '--class' 'org.apache.spark.sql.hive.SetWarehouseLocationTest' '--name' 'SetSparkWarehouseLocationTest' '--master' 'local-cluster[2,1,1024]' '--conf'
```
[GitHub] spark pull request #16668: [SPARK-18788][SPARKR] Add API for getNumPartition...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/16668#discussion_r97254989 --- Diff: R/pkg/R/DataFrame.R --- @@ -3406,3 +3406,28 @@ setMethod("randomSplit", } sapply(sdfs, dataFrame) }) + +#' getNumPartitions +#' +#' Return the number of partitions +#' Note: in order to compute the number of partition the SparkDataFrame has to be converted into a +#' RDD temporarily internally. +#' +#' @param x A SparkDataFrame +#' @family SparkDataFrame functions +#' @aliases getNumPartitions,SparkDataFrame-method +#' @rdname getNumPartitions +#' @name getNumPartitions +#' @export +#' @examples +#'\dontrun{ +#' sparkR.session() +#' df <- createDataFrame(cars, numPartitions = 2) +#' getNumPartitions(df) +#' } +#' @note getNumPartitions since 2.1.1 +setMethod("getNumPartitions", + signature(x = "SparkDataFrame"), + function(x) { +getNumPartitionsRDD(toRDD(x)) --- End diff -- shall we add the `getNumPartitions` to `DataFrame/Dataset` at scala side?
[GitHub] spark issue #16579: [SPARK-19218][SQL] Fix SET command to show a result corr...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/16579 Retest this please.
[GitHub] spark issue #16594: [SPARK-17078] [SQL] Show stats when explain
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/16594 As of MySQL 5.7.3, the EXPLAIN statement is changed so that the effect of the EXTENDED keyword is always enabled.
```
mysql> EXPLAIN EXTENDED
    -> SELECT t1.a, t1.a IN (SELECT t2.a FROM t2) FROM t1\G
*** 1. row ***
           id: 1
  select_type: PRIMARY
        table: t1
         type: index
possible_keys: NULL
          key: PRIMARY
      key_len: 4
          ref: NULL
         rows: 4
     filtered: 100.00
        Extra: Using index
*** 2. row ***
           id: 2
  select_type: SUBQUERY
        table: t2
         type: index
possible_keys: a
          key: a
      key_len: 5
          ref: NULL
         rows: 3
     filtered: 100.00
        Extra: Using index
2 rows in set, 1 warning (0.00 sec)
```
[GitHub] spark issue #16579: [SPARK-19218][SQL] Fix SET command to show a result corr...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16579 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71822/
[GitHub] spark issue #16579: [SPARK-19218][SQL] Fix SET command to show a result corr...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16579 Merged build finished. Test FAILed.
[GitHub] spark issue #16579: [SPARK-19218][SQL] Fix SET command to show a result corr...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16579 **[Test build #71822 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71822/testReport)** for PR 16579 at commit [`7879201`](https://github.com/apache/spark/commit/7879201961b0f0caa997c9fe6446c0b1b46124f8). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #16659: [SPARK-19309][SQL] disable common subexpression eliminat...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16659 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71818/
[GitHub] spark issue #16659: [SPARK-19309][SQL] disable common subexpression eliminat...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16659 Merged build finished. Test PASSed.
[GitHub] spark issue #16659: [SPARK-19309][SQL] disable common subexpression eliminat...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16659 **[Test build #71818 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71818/testReport)** for PR 16659 at commit [`0753ee6`](https://github.com/apache/spark/commit/0753ee6da4d5698d3a30d89e60ec45aca9e18f35). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #16594: [SPARK-17078] [SQL] Show stats when explain
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/16594 PostgreSQL has [a few different options in the EXPLAIN command](https://www.postgresql.org/docs/9.3/static/sql-explain.html):

```
EXPLAIN SELECT * FROM foo WHERE i = 4;

                          QUERY PLAN
--------------------------------------------------------------
 Index Scan using fi on foo  (cost=0.00..5.98 rows=1 width=4)
   Index Cond: (i = 4)
(2 rows)
```

The same plan with cost estimates suppressed:

```
EXPLAIN (COSTS FALSE) SELECT * FROM foo WHERE i = 4;

         QUERY PLAN
----------------------------
 Index Scan using fi on foo
   Index Cond: (i = 4)
(2 rows)
```
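For a quick, runnable analogy to the plan displays above — not Spark or PostgreSQL, but SQLite via Python's `sqlite3` — here is a sketch that reuses the table and index names from the example (`foo`, `fi`); the schema and query are assumptions carried over from the snippet, not anything in the PR:

```python
import sqlite3

# In-memory database mirroring the PostgreSQL example: table foo, index fi on column i.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE foo (i INTEGER)")
conn.execute("CREATE INDEX fi ON foo (i)")

# EXPLAIN QUERY PLAN is SQLite's rough analogue of PostgreSQL's EXPLAIN output.
plan = conn.execute("EXPLAIN QUERY PLAN SELECT * FROM foo WHERE i = 4").fetchall()
for row in plan:
    # The last column is the human-readable plan step; it should mention the index fi.
    print(row[-1])
```

The exact wording of the plan line varies between SQLite versions, which is one reason engines like PostgreSQL offer options (such as `COSTS FALSE`) to control how much of the plan is shown.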
[GitHub] spark pull request #16552: [SPARK-19152][SQL]DataFrameWriter.saveAsTable sup...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/16552#discussion_r97253775

```
--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala ---
@@ -1353,6 +1353,15 @@ class HiveDDLSuite
     sql("INSERT INTO t SELECT 2, 'b'")
     checkAnswer(spark.table("t"), Row(9, "x") :: Row(2, "b") :: Nil)

+    Seq(10 -> "y").toDF("i", "j")
```

--- End diff --

Please add a new test that appends to a Hive table; also test appending to a data source table with the Hive provider, and check the error message.
[GitHub] spark issue #16594: [SPARK-17078] [SQL] Show stats when explain
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/16594 DB2 has a tool to format the contents of the EXPLAIN tables. Below is an example of the output with explanation: ![screenshot 2017-01-22 21 05 45](https://cloud.githubusercontent.com/assets/11567269/22192191/b054c198-e0e6-11e6-8d64-807c5e196e1b.png)
[GitHub] spark issue #16344: [SPARK-18929][ML] Add Tweedie distribution in GLM
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16344 Merged build finished. Test PASSed.
[GitHub] spark issue #16344: [SPARK-18929][ML] Add Tweedie distribution in GLM
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16344 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71823/
[GitHub] spark issue #16344: [SPARK-18929][ML] Add Tweedie distribution in GLM
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16344 **[Test build #71823 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71823/testReport)** for PR 16344 at commit [`54da2cb`](https://github.com/apache/spark/commit/54da2cbbb53ddde3a91ef6d0d98128d8c7f3deb8). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #16672: [SPARK-19329][SQL]insert data to a not exist location da...
Github user windpiger commented on the issue: https://github.com/apache/spark/pull/16672 In Hive:

1. Reading a table with a non-existent path throws no exception and returns 0 rows.
2. Reading a table with a non-permitted path throws a runtime exception:

```
FAILED: SemanticException org.apache.hadoop.hive.ql.metadata.HiveException: Unable to determine if hdfs:/tmp/noownerpermission is encrypted: org.apache.hadoop.security.AccessControlException: Permission denied: user=test, access=READ, inode="/tmp/noownerpermission":hadoop:hadoop:drwxr-x--x
    at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:320)
```

3. Writing to a non-existent path creates it and inserts the data; everything is OK.
4. Writing to a non-permitted path throws an exception.
5. `ALTER TABLE ... SET LOCATION` pointing at a non-permitted path succeeds.
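Behavior 1 above (missing location → zero rows, no error) can be mimicked against a local filesystem path. This is a minimal sketch with a hypothetical `read_rows` helper — it is not Hive or Spark API, just an illustration of the "return empty instead of raising" pattern under discussion:

```python
import os

def read_rows(path):
    """Return the file's lines as 'rows'. A non-existent path yields zero rows
    instead of raising, mirroring Hive's read behavior #1 above."""
    if not os.path.exists(path):
        return []
    with open(path) as f:
        return [line.rstrip("\n") for line in f]

# A path that is assumed not to exist on this machine.
print(len(read_rows("/tmp/definitely-missing-path-for-demo")))
```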
[GitHub] spark issue #16659: [SPARK-19309][SQL] disable common subexpression eliminat...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/16659 LGTM pending test.
[GitHub] spark issue #16587: [SPARK-19229] [SQL] Disallow Creating Hive Source Tables...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/16587 Thanks! Merging to master.
[GitHub] spark issue #16579: [SPARK-19218][SQL] Fix SET command to show a result corr...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/16579 LGTM pending test.
[GitHub] spark issue #16669: [SPARK-16101][SQL] Refactoring CSV read path to be consi...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/16669 Thanks, merging to master!
[GitHub] spark issue #16675: [SPARK-19155][ML] Make family case insensitive in GLM
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16675 Merged build finished. Test PASSed.
[GitHub] spark issue #16675: [SPARK-19155][ML] Make family case insensitive in GLM
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16675 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71820/
[GitHub] spark issue #16675: [SPARK-19155][ML] Make family case insensitive in GLM
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16675 **[Test build #71820 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71820/testReport)** for PR 16675 at commit [`97b0a1c`](https://github.com/apache/spark/commit/97b0a1c9e5f7bfdae2407d5017418f3dda9a1e71). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #16669: [SPARK-16101][SQL] Refactoring CSV read path to b...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/16669
[GitHub] spark pull request #16587: [SPARK-19229] [SQL] Disallow Creating Hive Source...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/16587
[GitHub] spark issue #16594: [SPARK-17078] [SQL] Show stats when explain
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/16594 Let us do some research on how other RDBMSs handle this. For example, Oracle:

```
SQL> explain plan for select * from product;

Explained.

SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
--------------------------------------------------------
Plan hash value: 3917577207
--------------------------------------------------------
| Id  | Operation          | Name    | Rows  | Bytes |
--------------------------------------------------------
|   0 | SELECT STATEMENT   |         | 15856 |  1254K|
|   1 |  TABLE ACCESS FULL | PRODUCT | 15856 |  1254K|
--------------------------------------------------------
```
[GitHub] spark issue #16594: [SPARK-17078] [SQL] Show stats when explain
Github user wzhfy commented on the issue: https://github.com/apache/spark/pull/16594 @rxin Can we add a flag to enable or disable it? Currently there's no other way to see size and row count except debugging.
[GitHub] spark issue #16671: [SPARK-19327][SparkSQL] a better balance partition metho...
Github user djvulee commented on the issue: https://github.com/apache/spark/pull/16671 @HyukjinKwon One assumption behind this design is that the specified column is indexed in most real scenarios, so the table-scan cost is not very high. What I observed is that most large tables are sharded, so the count cost is acceptable; this is why we spend less time on a 5M-row table than on a 1M-row table. If we use `repartition`, there is a bottleneck when loading data from the DB and a high cost for the `repartition` itself. Anyway, this solution is indeed expensive and not a good one; maybe the best way is to use the Spark connectors provided by the DBMS vendors, as @gatorsmile suggested.
[GitHub] spark issue #16587: [SPARK-19229] [SQL] Disallow Creating Hive Source Tables...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/16587 LGTM
[GitHub] spark issue #16579: [SPARK-19218][SQL] Fix SET command to show a result corr...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/16579 LGTM
[GitHub] spark issue #16594: [SPARK-17078] [SQL] Show stats when explain
Github user rxin commented on the issue: https://github.com/apache/spark/pull/16594 Sorry, this explain plan makes no sense -- it is impossible to read.
[GitHub] spark pull request #16579: [SPARK-19218][SQL] Fix SET command to show a resu...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/16579#discussion_r97250719

```
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala ---
@@ -982,6 +982,33 @@ class SQLQuerySuite extends QueryTest with SharedSQLContext {
     spark.sessionState.conf.clear()
   }

+  test("SPARK-19218 SET command should show a result in a sorted order") {
+    val overrideConfs = sql("SET").collect()
+    sql(s"SET test.key3=1")
+    sql(s"SET test.key2=2")
+    sql(s"SET test.key1=3")
+    val result = sql("SET").collect()
+    assert(result ===
+      (overrideConfs ++ Seq(
+        Row("test.key1", "3"),
+        Row("test.key2", "2"),
+        Row("test.key3", "1"))).sortBy(_.getString(0))
+    )
+    spark.sessionState.conf.clear()
+  }
+
+  test("SPARK-19218 `SET -v` should not fail with null value configuration") {
+    import SQLConf._
+    val confEntry = SQLConfigBuilder("spark.test").doc("doc").stringConf.createWithDefault(null)
+
+    try {
+      val result = sql("SET -v").collect()
+      assert(result === result.sortBy(_.getString(0)))
```

--- End diff --

Oh, I understand. Thanks. :)
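The assertion under discussion boils down to "the output of `SET` equals itself sorted by its key column". A minimal stand-in sketch with plain tuples — no Spark involved, and the `test.key*` rows are hypothetical data borrowed from the test:

```python
# Hypothetical (key, value) rows standing in for the output of `SET`.
rows_unsorted = [("test.key3", "1"), ("test.key2", "2"), ("test.key1", "3")]
rows_sorted = sorted(rows_unsorted, key=lambda kv: kv[0])

def is_sorted_by_key(rows):
    """Mirrors the test's check: the result must equal itself sorted by key."""
    return rows == sorted(rows, key=lambda kv: kv[0])

print(is_sorted_by_key(rows_unsorted))  # False
print(is_sorted_by_key(rows_sorted))    # True
```

Comparing the result against its own sorted copy (rather than a hard-coded expected list) is what lets the test tolerate whatever pre-existing configs the suite has set.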
[GitHub] spark issue #16675: [SPARK-19155][ML] Make family case insensitive in GLM
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16675 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71819/
[GitHub] spark issue #16675: [SPARK-19155][ML] Make family case insensitive in GLM
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16675 Merged build finished. Test PASSed.
[GitHub] spark issue #16675: [SPARK-19155][ML] Make family case insensitive in GLM
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16675 **[Test build #71819 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71819/testReport)** for PR 16675 at commit [`c2b4132`](https://github.com/apache/spark/commit/c2b41324f8f6e2e1db3bd121b9e29fd9d6a5d98c). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #16579: [SPARK-19218][SQL] Fix SET command to show a resu...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/16579#discussion_r97250587 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala --- (same hunk as quoted above) --- End diff -- Oh, I meant that the final Jenkins test result is a failure. Never mind; I think this is still useful, since we can better infer which test causes a failure if we don't interfere with other tests.
[GitHub] spark issue #16636: [SPARK-19279] [SQL] Block Creating a Hive Table With an ...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/16636 Ideally the table schema must be specified or inferred before saving to the metastore. However, for Hive serde tables we have to save the table to the metastore first and let the Hive metastore infer the schema. Is it possible to extract the schema-inference logic from the Hive metastore, so that we can make data source tables and Hive serde tables more consistent?
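To make "schema inference" concrete, here is a toy sketch of the idea: pick a type per column from sample rows by widening `int` → `float` → `str`. This is purely illustrative — real Hive/Spark inference works on serialized files and is far more involved:

```python
def infer_type(values):
    """Toy per-column type inference: widen int -> float -> str over samples."""
    def typ(v):
        for t in (int, float):
            try:
                t(v)
                return t.__name__
            except ValueError:
                pass
        return "str"
    order = {"int": 0, "float": 1, "str": 2}
    # The widest type seen among the samples wins.
    return max((typ(v) for v in values), key=order.__getitem__)

# Hypothetical sample rows (all values arrive as strings, as in a CSV).
rows = [["1", "2.5", "a"], ["2", "3", "b"]]
schema = [infer_type(col) for col in zip(*rows)]
print(schema)  # ['int', 'float', 'str']
```

Extracting logic like this into a shared component is what the comment above is asking about, so both table kinds could infer a schema before the metastore write.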
[GitHub] spark pull request #16579: [SPARK-19218][SQL] Fix SET command to show a resu...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/16579#discussion_r97250343 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala --- (same hunk as quoted above) --- End diff -- Maybe we are confusing *terms*: you meant the other test *statements*, while I meant the other test *cases*.
[GitHub] spark pull request #16579: [SPARK-19218][SQL] Fix SET command to show a resu...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/16579#discussion_r97250196 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala --- (same hunk as quoted above) --- End diff -- :) The point is that `the other test cases` are still running.
[GitHub] spark pull request #16579: [SPARK-19218][SQL] Fix SET command to show a resu...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/16579#discussion_r97250113 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala --- (same hunk as quoted above) --- End diff -- It failed, didn't it?
[GitHub] spark pull request #16579: [SPARK-19218][SQL] Fix SET command to show a resu...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/16579#discussion_r97249959 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala --- (same hunk as quoted above) --- End diff -- The whole Jenkins test run does not fail. You can see the test report in the PR description, here: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71539/testReport/
[GitHub] spark pull request #16579: [SPARK-19218][SQL] Fix SET command to show a resu...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/16579#discussion_r97249854

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala --- (same hunk as quoted above)
--- End diff --

Hmm, when it throws an exception, the whole test fails, so does it still matter whether it interferes with other tests or not? :-) It is harmless to keep this try block, anyway.
[GitHub] spark issue #16344: [SPARK-18929][ML] Add Tweedie distribution in GLM
Github user actuaryzhang commented on the issue: https://github.com/apache/spark/pull/16344

@yanboliang Thanks so much for your detailed review. Your suggestions make a lot of sense, and I have included all of them in the new commit. Let me know if any other change is needed.
[GitHub] spark issue #16671: [SPARK-19327][SparkSQL] a better balance partition metho...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/16671

FWIW, I am negative on this approach too. It does not look like a good solution to require full table scans to resolve skew between partitions. As said, it is not good for a large table. Then, if we _should_ resolve the skew and the data is not expected to be very large, why don't we just repartition?
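The effect of the explicit repartition suggested above can be sketched without Spark. This is a toy model only: partitions are plain `Seq`s and the round-robin redistribution stands in for the shuffle that `df.repartition(n)` would trigger (Spark's actual row placement differs, but the balancing effect is the same idea).

```scala
// Skewed layout: one partition holds 90 of the 100 rows.
val skewed: Seq[Seq[Int]] = Seq((1 to 90).toSeq, (91 to 100).toSeq)

// Round-robin redistribution into n partitions, a stand-in for the
// shuffle an explicit repartition performs.
def repartition(parts: Seq[Seq[Int]], n: Int): Seq[Seq[Int]] =
  parts.flatten.zipWithIndex
    .groupBy(_._2 % n)
    .toSeq.sortBy(_._1)
    .map(_._2.map(_._1))

val balanced = repartition(skewed, 4)
println(balanced.map(_.size))  // every partition ends up with the same share
```

The trade-off the thread is weighing: this balancing costs a full shuffle of the data, which is only acceptable when the data is small enough.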
[GitHub] spark issue #16344: [SPARK-18929][ML] Add Tweedie distribution in GLM
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16344

**[Test build #71823 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71823/testReport)** for PR 16344 at commit [`54da2cb`](https://github.com/apache/spark/commit/54da2cbbb53ddde3a91ef6d0d98128d8c7f3deb8).
[GitHub] spark pull request #16579: [SPARK-19218][SQL] Fix SET command to show a resu...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/16579#discussion_r97249644

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala --- (same hunk as quoted above)
--- End diff --

Yes, but we need to clean up `spark.test` in order not to interrupt the other test cases here.
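The cleanup concern above is exactly what `try ... finally` guarantees: the cleanup runs even when the assertion throws. A minimal, Spark-free sketch of the difference (the mutable `conf` map here is only a stand-in for `spark.sessionState.conf`):

```scala
import scala.collection.mutable

// Stand-in for the session conf that the test mutates.
val conf = mutable.Map[String, String]()

// Runs a test body, returning the failure message if it throws.
def runTest(body: => Unit): Option[String] =
  try { body; None } catch { case e: Throwable => Option(e.getMessage) }

// Without finally: a failing assertion skips the cleanup line.
val leaked = runTest {
  conf("spark.test") = "doc"
  assert(false, "simulated regression")
  conf.remove("spark.test")          // never reached
}
println(conf.contains("spark.test"))  // the entry leaks into later tests

// With finally: cleanup runs no matter what the body does.
conf.clear()
val cleaned = runTest {
  conf("spark.test") = "doc"
  try assert(false, "simulated regression")
  finally conf.remove("spark.test")  // always reached
}
println(conf.contains("spark.test"))  // no leak this time
```

So the `try` block is not about catching the failure (the test still fails either way); it is about leaving shared state clean for the next test case.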
[GitHub] spark pull request #16579: [SPARK-19218][SQL] Fix SET command to show a resu...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/16579#discussion_r97249538

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala --- (same hunk as quoted above)
--- End diff --

But you don't actually catch anything, so if there is any regression in the future, is it any different with or without the try? You still see an exception.
[GitHub] spark pull request #16579: [SPARK-19218][SQL] Fix SET command to show a resu...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/16579#discussion_r97249317

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala --- (same hunk as quoted above)
--- End diff --

However, IMO, it's needed in case some regression occurs here in the future.
[GitHub] spark issue #16579: [SPARK-19218][SQL] Fix SET command to show a result corr...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16579

**[Test build #71822 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71822/testReport)** for PR 16579 at commit [`7879201`](https://github.com/apache/spark/commit/7879201961b0f0caa997c9fe6446c0b1b46124f8).
[GitHub] spark issue #16675: [SPARK-19155][ML] Make family case insensitive in GLM
Github user yanboliang commented on the issue: https://github.com/apache/spark/pull/16675

Looks good, I'll merge it if it passes tests. Thanks.
[GitHub] spark pull request #16579: [SPARK-19218][SQL] Fix SET command to show a resu...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/16579#discussion_r97249218

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala --- (same hunk as quoted above)
--- End diff --

Ah, I see what you meant. Previously, `SET -v` raised exceptions, so this case used `try` and `catch`. But, as you mentioned, it no longer does.
[GitHub] spark pull request #16579: [SPARK-19218][SQL] Fix SET command to show a resu...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/16579#discussion_r97249076

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala --- (same hunk as quoted above)
--- End diff --

Oh, I meant that you actually don't need a `try {} finally {}` here; you don't catch anything.
[GitHub] spark issue #16579: [SPARK-19218][SQL] Fix SET command to show a result corr...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16579

**[Test build #71821 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71821/testReport)** for PR 16579 at commit [`7061cd9`](https://github.com/apache/spark/commit/7061cd9ccd5684301efb2c6c6a8b05af36f65417).
[GitHub] spark issue #16594: [SPARK-17078] [SQL] Show stats when explain
Github user wzhfy commented on the issue: https://github.com/apache/spark/pull/16594

@hvanhovell I've updated the description, which shows a simple example. The explained plan becomes hard to read when joining many tables and sizeInBytes is computed the simple (non-cbo) way, i.e. we just multiply all the sizes of these tables; then sizeInBytes becomes a super large value (it can run to more than a hundred digits). E.g. part of the explained plan of tpcds q31 looks like this (not using cbo):
```
== Optimized Logical Plan ==
Sort [ca_county#67 ASC NULLS FIRST], true: sizeInBytes=230,651,011,002,878,340,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000, isBroadcastable=false
+- Project [ca_county#67, d_year#38, CheckOverflow((web_sales#769 / web_sales#6), DecimalType(37,20)) AS web_q1_q2_increase#1, CheckOverflow((store_sales#387 / store_sales#5), DecimalType(37,20)) AS store_q1_q2_increase#2, CheckOverflow((web_sales#960 / web_sales#769), DecimalType(37,20)) AS web_q2_q3_increase#3, CheckOverflow((store_sales#578 / store_sales#387), DecimalType(37,20)) AS store_q2_q3_increase#4]: sizeInBytes=230,651,011,002,878,340,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000, isBroadcastable=false
   +- Join Inner, ((ca_county#271 = ca_county#1132) && (CASE WHEN (web_sales#769 > 0.00) THEN CheckOverflow((web_sales#960 / web_sales#769), DecimalType(37,20)) ELSE null END > CASE WHEN (store_sales#387 > 0.00) THEN CheckOverflow((store_sales#578 / store_sales#387), DecimalType(37,20)) ELSE null END)): sizeInBytes=288,313,763,753,597,950,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000, isBroadcastable=false
      :- Project [ca_county#67, d_year#38, store_sales#5, store_sales#387, store_sales#578, ca_county#271, web_sales#6, web_sales#769]: sizeInBytes=19,387,614,432,995,145,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000, isBroadcastable=false
      :  +- Join Inner, ((ca_county#271 = ca_county#941) && (CASE WHEN (web_sales#6 > 0.00) THEN CheckOverflow((web_sales#769 / web_sales#6), DecimalType(37,20)) ELSE null END > CASE WHEN (store_sales#5 > 0.00) THEN CheckOverflow((store_sales#387 / store_sales#5), DecimalType(37,20)) ELSE null END)): sizeInBytes=23,602,313,222,776,697,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000, isBroadcastable=false
      :     :- Join Inner, (ca_county#67 = ca_county#271): sizeInBytes=1,587,133,900,693,866,200,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000, isBroadcastable=false
      :     :  :- Project [ca_county#67, d_year#38, store_sales#5, store_sales#387, store_sales#578]: sizeInBytes=106,726,573,575,883,570,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000, isBroadcastable=false
      :     :  :  +- Join Inner, (ca_county#559 = ca_county#750): sizeInBytes=182,959,840,415,800,400,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000, isBroadcastable=false
      :     :  :     :- Join Inner, (ca_county#67 = ca_county#559): sizeInBytes=3,338,025,720,406,215,000,000,000,000,000,000,000,000,000,000, isBroadcastable=false
      :     :  :     :  :- Aggregate [ca_county#67, d_qoy#42, d_year#38], [ca_county#67, d_year#38, MakeDecimal(sum(UnscaledValue(ss_ext_sales_price#24)),17,2) AS store_sales#5]: sizeInBytes=60,900,882,318,058,550,000,000, isBroadcastable=false
      :     :  :     :  :  +- Project [ss_ext_sales_price#24, d_year#38, d_qoy#42, ca_county#67]: sizeInBytes=66,990,970,549,864,410,000,000, isBroadcastable=false
      :     :  :     :  :     +- Join Inner, (ss_addr_sk#15 = ca_address_sk#60): sizeInBytes=79,171,147,013,476,130,000,000, isBroadcastable=false
      :     :  :     :  :        :- Project [ss_addr_sk#15, ss_ext_sales_price#24, d_year#38, d_qoy#42]: sizeInBytes=3,963,069,503,456,967, isBroadcastable=false
      :     :  :     :  :        :  +- Join Inner, (ss_sold_date_sk#9 = d_date_sk#32): sizeInBytes=5,095,375,075,873,244, isBroadcastable=false
      :     :  :     :  :        :     :- Project [ss_sold_date_sk#9, ss_addr_sk#15, ss_ext_sales_price#24]: sizeInBytes=39,847,153,628, isBroadcastable=false
      :     :  :     :  :        :     :  +- Filter (isnotnull(ss_sold_date_sk#9) && isnotnull(ss_addr_sk#15)): sizeInBytes=245,724,114,045, isBroadcastable=false
      :     :  :     :  :        :     :     +-
```
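The blow-up described above is easy to reproduce in isolation: under the simple (non-cbo) estimate, a join's sizeInBytes is the product of its children's sizes, so the digit counts add with every join. A small sketch with one made-up per-table size (the value and join count below are illustrative, not taken from q31):

```scala
// Made-up size estimate for each base table, in bytes (11 digits, ~40 GB).
val tableSize = BigInt("39847153628")

// The simple estimate for a multi-way join just multiplies the children's
// sizes, so ten 11-digit estimates compound into a 100+ digit number.
val joinEstimate = Seq.fill(10)(tableSize).product

println(joinEstimate.toString.length)  // digit count of the final estimate
```

That is why a many-table plan printed with these annotations becomes unreadable: each join node carries a number whose length is roughly the sum of its inputs' lengths.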
[GitHub] spark issue #16579: [SPARK-19218][SQL] Fix SET command to show a result corr...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/16579

Thank you, @viirya. I noticed that `spark.sessionState.conf.clear()` is useless, so I removed it.
[GitHub] spark issue #16675: [SPARK-19155][ML] Make family case insensitive in GLM
Github user actuaryzhang commented on the issue: https://github.com/apache/spark/pull/16675

@yanboliang Thanks for the quick response. How about the new commit, where I just change the value from `getFamily` to lower case when necessary, i.e., in the calculation of p-value and dispersion?
[GitHub] spark issue #16675: [SPARK-19155][ML] Make family case insensitive in GLM
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16675

**[Test build #71820 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71820/testReport)** for PR 16675 at commit [`97b0a1c`](https://github.com/apache/spark/commit/97b0a1c9e5f7bfdae2407d5017418f3dda9a1e71).
[GitHub] spark pull request #16579: [SPARK-19218][SQL] Fix SET command to show a resu...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/16579#discussion_r97248522

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala ---
@@ -982,6 +982,33 @@ class SQLQuerySuite extends QueryTest with SharedSQLContext {
     spark.sessionState.conf.clear()
   }
+
+  test("SPARK-19218 SET command should show a result in a sorted order") {
+    val overrideConfs = sql("SET").collect()
+    sql(s"SET test.key3=1")
+    sql(s"SET test.key2=2")
+    sql(s"SET test.key1=3")
+    val result = sql("SET").collect()
+    assert(result ===
+      (overrideConfs ++ Seq(
+        Row("test.key1", "3"),
+        Row("test.key2", "2"),
+        Row("test.key3", "1"))).sortBy(_.getString(0))
+    )
+  }
+
+  test("SPARK-19218 `SET -v` should not fail with null value configuration") {
+    import SQLConf._
+    val confEntry = SQLConfigBuilder("spark.test").doc("doc").stringConf.createWithDefault(null)
+
+    try {
+      val result = sql("SET -v").collect()
+      assert(result === result.sortBy(_.getString(0)))
+      spark.sessionState.conf.clear()
+    } finally {
--- End diff --

nit: `try ... finally` seems redundant.
[GitHub] spark issue #16675: [SPARK-19155][ML] Make family case insensitive in GLM
Github user yanboliang commented on the issue: https://github.com/apache/spark/pull/16675

@actuaryzhang I think the change is not appropriate; the function `getFamily` should return the raw value that users specified. This is the reason I didn't change it in #16516. Thanks.
[GitHub] spark issue #16579: [SPARK-19218][SQL] Fix SET command to show a result corr...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16579

Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71816/
[GitHub] spark issue #16579: [SPARK-19218][SQL] Fix SET command to show a result corr...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16579

Merged build finished. Test PASSed.
[GitHub] spark issue #16579: [SPARK-19218][SQL] Fix SET command to show a result corr...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16579

**[Test build #71816 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71816/testReport)** for PR 16579 at commit [`387ab59`](https://github.com/apache/spark/commit/387ab590b8af301433e888e2d7731213e4e254a5).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark issue #16675: [SPARK-19155][ML] Make family case insensitive in GLM
Github user actuaryzhang commented on the issue: https://github.com/apache/spark/pull/16675

I would prefer that `getFamily` returns lower case values directly, because using `getFamily.toLowerCase` can get very cumbersome, and I use this a lot in another PR #16344. If we want to keep `getFamily` to retrieve the raw value of family, then I can create a private method `getFamilyLowerCase`. Please advise.
[GitHub] spark issue #16675: [SPARK-19155][ML] Make family case insensitive in GLM
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16675

**[Test build #71819 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71819/testReport)** for PR 16675 at commit [`c2b4132`](https://github.com/apache/spark/commit/c2b41324f8f6e2e1db3bd121b9e29fd9d6a5d98c).
[GitHub] spark pull request #16636: [SPARK-19279] [SQL] Block Creating a Hive Table W...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/16636#discussion_r97247351

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala ---
@@ -1527,6 +1527,21 @@ class DDLSuite extends QueryTest with SharedSQLContext with BeforeAndAfterEach {
   }
 }
+
+  test("create a data source table without schema") {
+    import testImplicits._
+    withTempPath { tempDir =>
+      withTable("tab1", "tab2") {
+        (("a", "b") :: Nil).toDF().write.json(tempDir.getCanonicalPath)
+
+        val e = intercept[AnalysisException] { sql("CREATE TABLE tab1 USING json") }.getMessage
--- End diff --

We should also test a data source that can infer schema without files (e.g. the LibSVM data source has a fixed schema). Ideally we should only fail if the given data source can't infer schema without files.
[GitHub] spark issue #16479: [SPARK-19085][SQL] cleanup OutputWriterFactory and Outpu...
Github user koertkuipers commented on the issue: https://github.com/apache/spark/pull/16479

I will just copy the conversion code over for now. Thanks.
[GitHub] spark pull request #16675: [SPARK-19155][ML] make getFamily case insensitive
GitHub user actuaryzhang opened a pull request: https://github.com/apache/spark/pull/16675 [SPARK-19155][ML] make getFamily case insensitive

## What changes were proposed in this pull request?

This is a supplement to PR #16516, which did not make the value from `getFamily` case insensitive. This affects the calculation of `dispersion` and `pValue`, since the value of family is checked there: `model.getFamily == Binomial.name || model.getFamily == Poisson.name`. Current tests of poisson/binomial glm with weight fail when specifying 'Poisson' or 'Binomial'. A simple fix is to convert the value of `getFamily` to lower case:
```
def getFamily: String = $(family).toLowerCase
```

## How was this patch tested?

Updated existing tests for 'Poisson' and 'Binomial'.

@yanboliang @felixcheung @imatiach-msft

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/actuaryzhang/spark family

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/16675.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #16675

commit d33e2f135ae62df20337e2752753bcda2756a73d
Author: actuaryzhang
Date: 2017-01-23T02:59:12Z

    make getFamily case insensitive
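The proposed fix can be sketched without Spark. The objects and the `Model` class below are stand-ins mirroring the PR's `Binomial.name`/`Poisson.name` comparison, not Spark ML's real types; the point is that lower-casing once at the accessor makes every downstream equality check case-insensitive:

```scala
// Stand-ins for the GLM family objects the PR's check refers to.
object Binomial { val name = "binomial" }
object Poisson  { val name = "poisson" }

// Hypothetical model with the proposed fix: normalize case in the accessor.
final case class Model(family: String) {
  def getFamily: String = family.toLowerCase
}

// The dispersion/pValue guard from the PR description, verbatim in shape.
def usesUnitDispersion(model: Model): Boolean =
  model.getFamily == Binomial.name || model.getFamily == Poisson.name

println(usesUnitDispersion(Model("Poisson")))   // user-specified mixed case now matches
println(usesUnitDispersion(Model("binomial")))
println(usesUnitDispersion(Model("gaussian")))
```

The review discussion above weighs the cost: once `getFamily` lower-cases, it no longer returns the raw value the user specified, which is why the alternative of calling `.toLowerCase` only at the comparison sites was also considered.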
[GitHub] spark issue #16659: [SPARK-19309][SQL] disable common subexpression eliminat...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16659 **[Test build #71818 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71818/testReport)** for PR 16659 at commit [`0753ee6`](https://github.com/apache/spark/commit/0753ee6da4d5698d3a30d89e60ec45aca9e18f35).
[GitHub] spark issue #16344: [SPARK-18929][ML] Add Tweedie distribution in GLM
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16344 Merged build finished. Test PASSed.
[GitHub] spark issue #16344: [SPARK-18929][ML] Add Tweedie distribution in GLM
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16344 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71817/