[GitHub] spark issue #16990: [SPARK-19660][CORE][SQL] Replace the configuration prope...

2017-02-20 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16990
  
**[Test build #73205 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73205/testReport)**
 for PR 16990 at commit 
[`d8e5862`](https://github.com/apache/spark/commit/d8e58627e866f114fd9df6bdb953947b47aaea95).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #17000: [SPARK-18946][ML] sliceAggregate which is a new aggregat...

2017-02-20 Thread MLnick
Github user MLnick commented on the issue:

https://github.com/apache/spark/pull/17000
  
Is the speedup coming mostly from the `MultivariateOnlineSummarizer` stage?

See https://issues.apache.org/jira/browse/SPARK-19634, which covers porting 
this operation to a DataFrame UDAF and computing only the required metrics 
(instead of forcing computation of all of them, as is done currently). I wonder 
how that will compare?





[GitHub] spark issue #17001: [SPARK-19667][SQL]create table with hiveenabled in defau...

2017-02-20 Thread windpiger
Github user windpiger commented on the issue:

https://github.com/apache/spark/pull/17001
  
cc @gatorsmile @cloud-fan 





[GitHub] spark issue #17001: [SPARK-19667][SQL][WIP]create table with hiveenabled in ...

2017-02-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17001
  
Merged build finished. Test PASSed.





[GitHub] spark issue #17001: [SPARK-19667][SQL][WIP]create table with hiveenabled in ...

2017-02-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17001
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73201/
Test PASSed.





[GitHub] spark issue #17001: [SPARK-19667][SQL][WIP]create table with hiveenabled in ...

2017-02-20 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17001
  
**[Test build #73201 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73201/testReport)**
 for PR 17001 at commit 
[`a2c9168`](https://github.com/apache/spark/commit/a2c91682b3824160bd1095e5b61e932a022f3672).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #16949: [SPARK-16122][CORE] Add rest api for job environment

2017-02-20 Thread uncleGen
Github user uncleGen commented on the issue:

https://github.com/apache/spark/pull/16949
  
cc @srowen also.





[GitHub] spark pull request #16971: [SPARK-19573][SQL] Make NaN/null handling consist...

2017-02-20 Thread MLnick
Github user MLnick commented on a diff in the pull request:

https://github.com/apache/spark/pull/16971#discussion_r102146260
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/stat/StatFunctions.scala 
---
@@ -78,7 +80,13 @@ object StatFunctions extends Logging {
 def apply(summaries: Array[QuantileSummaries], row: Row): 
Array[QuantileSummaries] = {
   var i = 0
   while (i < summaries.length) {
-summaries(i) = summaries(i).insert(row.getDouble(i))
+val item = row(i)
--- End diff --

This works, though perhaps we can do:

```scala
if (!row.isNullAt(i)) {
  val v = row.getDouble(i)
  if (!v.isNaN) {
summaries(i) = summaries(i).insert(v)
  }
}
```
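Standalone, the skip-invalid-values logic in the suggestion behaves like this (a minimal sketch using a `ListBuffer` in place of `QuantileSummaries`; names are illustrative, not Spark's actual code):

```scala
import scala.collection.mutable.ListBuffer

object NaNFilterDemo {
  // Insert only non-null, non-NaN values, mirroring the suggested guard above.
  def insertValid(acc: ListBuffer[Double], value: java.lang.Double): Unit =
    if (value != null && !value.isNaN) acc += value

  def main(args: Array[String]): Unit = {
    val acc = ListBuffer.empty[Double]
    Seq[java.lang.Double](1.0, null, Double.NaN, -1.0).foreach(insertValid(acc, _))
    println(acc) // ListBuffer(1.0, -1.0)
  }
}
```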





[GitHub] spark pull request #16971: [SPARK-19573][SQL] Make NaN/null handling consist...

2017-02-20 Thread MLnick
Github user MLnick commented on a diff in the pull request:

https://github.com/apache/spark/pull/16971#discussion_r102145908
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/DataFrameStatFunctions.scala ---
@@ -89,18 +89,17 @@ final class DataFrameStatFunctions private[sql](df: 
DataFrame) {
*   Note that values greater than 1 are accepted but give the same 
result as 1.
* @return the approximate quantiles at the given probabilities of each 
column
*
-   * @note Rows containing any null or NaN values will be removed before 
calculation. If
-   *   the dataframe is empty or all rows contain null or NaN, null is 
returned.
+   * @note null and NaN values will be removed from the numerical column 
before calculation. If
+   *   the dataframe is empty, or all rows in some column contain null or 
NaN, null is returned.
*
* @since 2.2.0
*/
   def approxQuantile(
   cols: Array[String],
   probabilities: Array[Double],
   relativeError: Double): Array[Array[Double]] = {
-// TODO: Update NaN/null handling to keep consistent with the 
single-column version
 try {
-  StatFunctions.multipleApproxQuantiles(df.select(cols.map(col): 
_*).na.drop(), cols,
+  StatFunctions.multipleApproxQuantiles(df.select(cols.map(col): _*), 
cols,
 probabilities, relativeError).map(_.toArray).toArray
 } catch {
   case e: NoSuchElementException => null
--- End diff --

This went in with the other PR, but I still question whether we should be 
returning `null` here. Is this standard in Spark SQL? What about returning an 
empty `Array` instead? cc @gatorsmile 





[GitHub] spark pull request #16971: [SPARK-19573][SQL] Make NaN/null handling consist...

2017-02-20 Thread MLnick
Github user MLnick commented on a diff in the pull request:

https://github.com/apache/spark/pull/16971#discussion_r102145412
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/stat/StatFunctions.scala 
---
@@ -54,6 +54,8 @@ object StatFunctions extends Logging {
*   Note that values greater than 1 are accepted but give the same 
result as 1.
*
* @return for each column, returns the requested approximations
+   *
+   * @note null and NaN values will be removed from the numerical column 
before calculation.
--- End diff --

I think "will be ignored" is more accurate than "will be removed"





[GitHub] spark pull request #16971: [SPARK-19573][SQL] Make NaN/null handling consist...

2017-02-20 Thread MLnick
Github user MLnick commented on a diff in the pull request:

https://github.com/apache/spark/pull/16971#discussion_r102145538
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/DataFrameStatFunctions.scala ---
@@ -89,18 +89,17 @@ final class DataFrameStatFunctions private[sql](df: 
DataFrame) {
*   Note that values greater than 1 are accepted but give the same 
result as 1.
* @return the approximate quantiles at the given probabilities of each 
column
*
-   * @note Rows containing any null or NaN values will be removed before 
calculation. If
-   *   the dataframe is empty or all rows contain null or NaN, null is 
returned.
+   * @note null and NaN values will be removed from the numerical column 
before calculation. If
--- End diff --

Again, "ignored" is slightly better than "removed from"





[GitHub] spark pull request #16971: [SPARK-19573][SQL] Make NaN/null handling consist...

2017-02-20 Thread MLnick
Github user MLnick commented on a diff in the pull request:

https://github.com/apache/spark/pull/16971#discussion_r102146144
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/DataFrameStatSuite.scala ---
@@ -214,20 +214,29 @@ class DataFrameStatSuite extends QueryTest with 
SharedSQLContext {
 val q1 = 0.5
 val q2 = 0.8
 val epsilon = 0.1
-val rows = spark.sparkContext.parallelize(Seq(Row(Double.NaN, 1.0), 
Row(1.0, 1.0),
+val rows = spark.sparkContext.parallelize(Seq(Row(Double.NaN, 1.0), 
Row(1.0, -1.0),
   Row(-1.0, Double.NaN), Row(Double.NaN, Double.NaN), Row(null, null), 
Row(null, 1.0),
   Row(-1.0, null), Row(Double.NaN, null)))
 val schema = StructType(Seq(StructField("input1", DoubleType, nullable 
= true),
   StructField("input2", DoubleType, nullable = true)))
 val dfNaN = spark.createDataFrame(rows, schema)
-val resNaN = dfNaN.stat.approxQuantile("input1", Array(q1, q2), 
epsilon)
-assert(resNaN.count(_.isNaN) === 0)
-assert(resNaN.count(_ == null) === 0)
+val resNaN1 = dfNaN.stat.approxQuantile("input1", Array(q1, q2), 
epsilon)
+assert(resNaN1.count(_.isNaN) === 0)
+assert(resNaN1.count(_ == null) === 0)
 
-val resNaN2 = dfNaN.stat.approxQuantile(Array("input1", "input2"),
+val resNaN2 = dfNaN.stat.approxQuantile("input2", Array(q1, q2), 
epsilon)
+assert(resNaN2.count(_.isNaN) === 0)
+assert(resNaN2.count(_ == null) === 0)
+
+val resNaNAll = dfNaN.stat.approxQuantile(Array("input1", "input2"),
   Array(q1, q2), epsilon)
-assert(resNaN2.flatten.count(_.isNaN) === 0)
-assert(resNaN2.flatten.count(_ == null) === 0)
+assert(resNaNAll.flatten.count(_.isNaN) === 0)
+assert(resNaNAll.flatten.count(_ == null) === 0)
+
+assert(resNaN1(0) === resNaNAll(0)(0))
+assert(resNaN1(1) === resNaNAll(0)(1))
+assert(resNaN2(0) === resNaNAll(1)(0))
+assert(resNaN2(1) === resNaNAll(1)(1))
--- End diff --

Do we need a test for a column that is all nulls (verifying that it returns null)?





[GitHub] spark issue #16594: [SPARK-17078] [SQL] Show stats when explain

2017-02-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16594
  
Merged build finished. Test PASSed.





[GitHub] spark issue #16594: [SPARK-17078] [SQL] Show stats when explain

2017-02-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16594
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73200/
Test PASSed.





[GitHub] spark issue #16594: [SPARK-17078] [SQL] Show stats when explain

2017-02-20 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16594
  
**[Test build #73200 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73200/testReport)**
 for PR 16594 at commit 
[`491ec8f`](https://github.com/apache/spark/commit/491ec8f3529bfb552fdae9dcd9c13bc2984f91ce).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #17000: [SPARK-18946][ML] sliceAggregate which is a new aggregat...

2017-02-20 Thread ZunwenYou
Github user ZunwenYou commented on the issue:

https://github.com/apache/spark/pull/17000
  
Hi, @hhbyyh 

In our experiment, the class **_MultivariateOnlineSummarizer_** contains 8 
arrays; if the dimension reaches 20 million, the memory footprint of 
MultivariateOnlineSummarizer is about 1280 MB (8 bytes * 20M elements * 8 arrays).

The experiment configuration is as follows:
spark.driver.maxResultSize 6g
spark.kryoserializer.buffer.max 2047m
driver-memory 20g 
num-executors 100 
executor-cores 2 
executor-memory 15g

RDD and aggregate parameters:
RDD partition number 300
treeAggregate depth 5
With this configuration, treeAggregate runs in four stages, with 300, 75, 18, 
and 4 tasks respectively.
At the last stage of treeAggregate, tasks are killed because executors throw 
the exception _**java.lang.OutOfMemoryError: Requested array size exceeds VM 
limit**_. 
Even after setting treeAggregate depth=7 and executor-memory=30g, the last 
stage still failed.
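The memory figure above can be reproduced with a quick back-of-the-envelope calculation (a minimal sketch; the array count and element size are taken from the comment, not measured, and the object name is illustrative):

```scala
object SummarizerMemoryEstimate {
  // Dense state footprint: dimensions * internal Double arrays * bytes per Double.
  def footprintBytes(dims: Long, numArrays: Int = 8, bytesPerElem: Int = 8): Long =
    dims * numArrays * bytesPerElem

  def main(args: Array[String]): Unit = {
    val bytes = footprintBytes(20L * 1000 * 1000) // 20 million dimensions
    println(bytes / (1000 * 1000))                // 1280 (MB per summarizer copy)
  }
}
```

At treeAggregate depth 5, each merge task holds several such partial states at once, which is consistent with the reported OOM even on 15g executors.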





[GitHub] spark pull request #16826: [SPARK-19540][SQL] Add ability to clone SparkSess...

2017-02-20 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16826#discussion_r102143747
  
--- Diff: 
sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveSparkSubmitSuite.scala ---
@@ -217,7 +217,8 @@ class HiveSparkSubmitSuite
 runSparkSubmit(args)
   }
 
-  test("set hive.metastore.warehouse.dir") {
+  // TODO: SPARK-19540 re-enable this test
+  ignore("set hive.metastore.warehouse.dir") {
--- End diff --

What are the reasons we need to ignore them? 





[GitHub] spark pull request #16826: [SPARK-19540][SQL] Add ability to clone SparkSess...

2017-02-20 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16826#discussion_r102143663
  
--- Diff: 
sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveSessionStateSuite.scala 
---
@@ -0,0 +1,54 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.hive
+
+import org.apache.spark.{SparkContext, SparkFunSuite}
+import org.apache.spark.sql.{QueryTest, Row, SparkSession}
+import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
+import org.apache.spark.sql.catalyst.rules.Rule
+import org.apache.spark.sql.test.SharedSQLContext
+
+class HiveSessionStateSuite  extends SparkFunSuite with SharedSQLContext {
--- End diff --

I am worried about the test case coverage. How about also extending the 
test suite for `SessionState`? That would run the tests automatically without 
copying and pasting the test cases here.





[GitHub] spark issue #17009: [SPARK-19674][SQL]Ignore non-existing driver accumulator...

2017-02-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17009
  
Merged build finished. Test PASSed.





[GitHub] spark issue #17009: [SPARK-19674][SQL]Ignore non-existing driver accumulator...

2017-02-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17009
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73199/
Test PASSed.





[GitHub] spark issue #17009: [SPARK-19674][SQL]Ignore non-existing driver accumulator...

2017-02-20 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17009
  
**[Test build #73199 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73199/testReport)**
 for PR 17009 at commit 
[`f24bf52`](https://github.com/apache/spark/commit/f24bf52e9712bf7879deef4a4565fcc5d9497237).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #16826: [SPARK-19540][SQL] Add ability to clone SparkSession whe...

2017-02-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16826
  
Merged build finished. Test FAILed.





[GitHub] spark issue #16826: [SPARK-19540][SQL] Add ability to clone SparkSession whe...

2017-02-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16826
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73202/
Test FAILed.





[GitHub] spark issue #16826: [SPARK-19540][SQL] Add ability to clone SparkSession whe...

2017-02-20 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16826
  
**[Test build #73202 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73202/testReport)**
 for PR 16826 at commit 
[`847b484`](https://github.com/apache/spark/commit/847b484ca1ef416ae16952c7de156c6cade23cf1).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request #16826: [SPARK-19540][SQL] Add ability to clone SparkSess...

2017-02-20 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/16826#discussion_r102142543
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala 
---
@@ -117,9 +122,11 @@ class SparkSession private(
   @InterfaceStability.Unstable
   @transient
   lazy val sessionState: SessionState = {
-SparkSession.reflect[SessionState, SparkSession](
-  SparkSession.sessionStateClassName(sparkContext.conf),
-  self)
+parentSessionState
+  .map(_.copy(this))
--- End diff --

As this is a lazy val, the cloned `SparkSession` will not copy the parent 
`SessionState` immediately. So if the parent `SessionState` is changed before 
this lazy variable is initialized, e.g., new functions are registered, the 
cloned session will also pick up those new functions.

This does not match the description `...Changes to base session are not 
propagated to cloned session, cloned is independent after creation...` below.
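The timing hazard can be demonstrated outside Spark with a plain lazy val (class names here are hypothetical; this sketches only the deferred-copy issue, not Spark's actual session code):

```scala
// A lazy "clone" only snapshots its parent on first access, so mutations
// made to the parent before that access leak into the clone.
class ParentState { var functions: Set[String] = Set("f1") }

class ClonedState(parent: ParentState) {
  lazy val functions: Set[String] = parent.functions // deferred copy
}

object LazyCloneDemo {
  def main(args: Array[String]): Unit = {
    val parent = new ParentState
    val clone  = new ClonedState(parent)
    parent.functions += "f2"    // registered before the lazy val fires
    println(clone.functions)    // Set(f1, f2) -- the clone saw the late change
  }
}
```

Forcing the copy eagerly in the constructor (or documenting the lazy semantics) would resolve the mismatch with the scaladoc.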






[GitHub] spark issue #17005: [SPARK-14659][ML] RFormula supports setting base level b...

2017-02-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17005
  
Merged build finished. Test PASSed.





[GitHub] spark issue #17005: [SPARK-14659][ML] RFormula supports setting base level b...

2017-02-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17005
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73203/
Test PASSed.





[GitHub] spark issue #17005: [SPARK-14659][ML] RFormula supports setting base level b...

2017-02-20 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17005
  
**[Test build #73203 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73203/testReport)**
 for PR 17005 at commit 
[`7cef3bc`](https://github.com/apache/spark/commit/7cef3bc6f200457b782dedd1247ed8bf81beb902).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request #11601: [SPARK-13568] [ML] Create feature transformer to ...

2017-02-20 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request:

https://github.com/apache/spark/pull/11601#discussion_r102141627
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Imputer.scala ---
@@ -0,0 +1,225 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.ml.feature
+
+import org.apache.hadoop.fs.Path
+
+import org.apache.spark.SparkException
+import org.apache.spark.annotation.{Experimental, Since}
+import org.apache.spark.ml.{Estimator, Model}
+import org.apache.spark.ml.param._
+import org.apache.spark.ml.param.shared.{HasInputCol, HasOutputCol}
+import org.apache.spark.ml.util._
+import org.apache.spark.sql.{DataFrame, Dataset}
+import org.apache.spark.sql.functions._
+import org.apache.spark.sql.types._
+
+/**
+ * Params for [[Imputer]] and [[ImputerModel]].
+ */
+private[feature] trait ImputerParams extends Params with HasInputCol with HasOutputCol {
+
+  /**
+   * The imputation strategy.
+   * If "mean", then replace missing values using the mean value of the feature.
+   * If "median", then replace missing values using the approximate median value of the feature.
+   * Default: mean
+   *
+   * @group param
+   */
+  final val strategy: Param[String] = new Param(this, "strategy", "strategy for imputation. " +
+    "If mean, then replace missing values using the mean value of the feature. " +
+    "If median, then replace missing values using the median value of the feature.",
+    ParamValidators.inArray[String](Imputer.supportedStrategyNames.toArray))
+
+  /** @group getParam */
+  def getStrategy: String = $(strategy)
+
+  /**
+   * The placeholder for the missing values. All occurrences of missingValue will be imputed.
+   * Note that null values are always treated as missing.
+   * Default: Double.NaN
+   *
+   * @group param
+   */
+  final val missingValue: DoubleParam = new DoubleParam(this, "missingValue",
+    "The placeholder for the missing values. All occurrences of missingValue will be imputed")
+
+  /** @group getParam */
+  def getMissingValue: Double = $(missingValue)
+
+  /** Validates and transforms the input schema. */
+  protected def validateAndTransformSchema(schema: StructType): StructType = {
+    val inputType = schema($(inputCol)).dataType
+    SchemaUtils.checkColumnTypes(schema, $(inputCol), Seq(DoubleType, FloatType))
+    SchemaUtils.appendColumn(schema, $(outputCol), inputType)
+  }
+}
+
+/**
+ * :: Experimental ::
+ * Imputation estimator for completing missing values, either using the mean or the median
+ * of the column in which the missing values are located. The input column should be of
+ * DoubleType or FloatType. Currently Imputer does not support categorical features yet
+ * (SPARK-15041) and possibly creates incorrect values for a categorical feature.
+ *
+ * Note that the mean/median value is computed after filtering out missing values.
+ * All Null values in the input column are treated as missing, and so are also imputed.
+ */
+@Experimental
+class Imputer @Since("2.1.0")(override val uid: String)
+  extends Estimator[ImputerModel] with ImputerParams with DefaultParamsWritable {
+
+  @Since("2.1.0")
+  def this() = this(Identifiable.randomUID("imputer"))
+
+  /** @group setParam */
+  @Since("2.1.0")
+  def setInputCol(value: String): this.type = set(inputCol, value)
+
+  /** @group setParam */
+  @Since("2.1.0")
+  def setOutputCol(value: String): this.type = set(outputCol, value)
+
+  /**
+   * Imputation strategy. Available options are ["mean", "median"].
+   * @group setParam
+   */
+  @Since("2.1.0")
+  def setStrategy(value: String): this.type = set(strategy, value)
+
+  /** @group setParam */
+  @Since("2.1.0")
+  def setMissingValue(value: Double): this.type = set(missingValue, value)

[GitHub] spark issue #16020: [SPARK-18596][ML] add checking and caching to bisecting ...

2017-02-20 Thread hhbyyh
Github user hhbyyh commented on the issue:

https://github.com/apache/spark/pull/16020
  
Closing this as it's better resolved in
https://issues.apache.org/jira/browse/SPARK-18608.
Thanks for the comments and discussion.





[GitHub] spark pull request #16979: [SPARK-19617][SS]Fix the race condition when star...

2017-02-20 Thread zsxwing
Github user zsxwing commented on a diff in the pull request:

https://github.com/apache/spark/pull/16979#discussion_r102141431
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/HDFSMetadataLog.scala ---
@@ -63,8 +63,34 @@ class HDFSMetadataLog[T <: AnyRef : ClassTag](sparkSession: SparkSession, path:
   val metadataPath = new Path(path)
   protected val fileManager = createFileManager()
 
-  if (!fileManager.exists(metadataPath)) {
-    fileManager.mkdirs(metadataPath)
+  runUninterruptiblyIfLocal {
+    if (!fileManager.exists(metadataPath)) {
+      fileManager.mkdirs(metadataPath)
+    }
+  }
+
+  private def runUninterruptiblyIfLocal[T](body: => T): T = {
+    if (fileManager.isLocalFileSystem && Thread.currentThread.isInstanceOf[UninterruptibleThread]) {
--- End diff --

Have to change the condition here because StreamExecution will create an 
HDFSMetadataLog on a non-UninterruptibleThread.
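The runtime thread-type check being discussed can be illustrated with a minimal, Spark-free sketch. The trait and function names below are illustrative assumptions, not Spark's API (Spark's actual `UninterruptibleThread.runUninterruptibly` defers `Thread.interrupt()` calls while the body runs):

```scala
// Sketch: dispatch on the runtime type of the current thread, falling back
// to running the body directly when the thread does not support
// uninterruptible sections.
trait Uninterruptible {
  def runUninterruptibly[T](body: => T): T
}

def runIfSupported[T](body: => T): T = Thread.currentThread match {
  case u: Uninterruptible => u.runUninterruptibly(body) // e.g. defer interrupts here
  case _                  => body                       // plain thread: just run it
}
```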





[GitHub] spark pull request #16020: [SPARK-18596][ML] add checking and caching to bis...

2017-02-20 Thread hhbyyh
Github user hhbyyh closed the pull request at:

https://github.com/apache/spark/pull/16020





[GitHub] spark pull request #17003: [SPARK-19646][BUILD][HOTFIX] Fix compile error fr...

2017-02-20 Thread srowen
Github user srowen closed the pull request at:

https://github.com/apache/spark/pull/17003





[GitHub] spark issue #17010: [SPARK-19673][SQL] "ThriftServer default app name is cha...

2017-02-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17010
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73204/
Test PASSed.





[GitHub] spark issue #17010: [SPARK-19673][SQL] "ThriftServer default app name is cha...

2017-02-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17010
  
Merged build finished. Test PASSed.





[GitHub] spark issue #17010: [SPARK-19673][SQL] "ThriftServer default app name is cha...

2017-02-20 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17010
  
**[Test build #73204 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73204/testReport)**
 for PR 17010 at commit 
[`c4a02bc`](https://github.com/apache/spark/commit/c4a02bca4594ca10473050a85165b4bf96a4ba4e).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request #16826: [SPARK-19540][SQL] Add ability to clone SparkSess...

2017-02-20 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16826#discussion_r102141351
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/ExperimentalMethods.scala ---
@@ -46,4 +46,10 @@ class ExperimentalMethods private[sql]() {
 
   @volatile var extraOptimizations: Seq[Rule[LogicalPlan]] = Nil
 
+  def copy: ExperimentalMethods = {
--- End diff --

`def copy` -> `def copy()`
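For context, the Scala convention behind this suggestion: a parameterless `def` without parentheses signals a pure accessor, while a method that performs an effect or allocates a new object is declared and called with `()`. A small illustrative sketch (not Spark code):

```scala
// Convention sketch: effectful methods carry (), pure accessors do not.
class Counter {
  private var n = 0
  def increment(): Unit = n += 1 // mutates state -> declared with ()
  def value: Int = n             // pure accessor -> no parentheses
}
```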





[GitHub] spark issue #17003: [SPARK-19646][BUILD][HOTFIX] Fix compile error from cher...

2017-02-20 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/17003
  
(@srowen just a kind reminder that this seems not to be closed yet.)





[GitHub] spark pull request #16826: [SPARK-19540][SQL] Add ability to clone SparkSess...

2017-02-20 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16826#discussion_r102140755
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/internal/SessionState.scala ---
@@ -22,38 +22,58 @@ import java.io.File
 import org.apache.hadoop.conf.Configuration
 import org.apache.hadoop.fs.Path
 
+import org.apache.spark.SparkContext
 import org.apache.spark.sql._
-import org.apache.spark.sql.catalyst.TableIdentifier
 import org.apache.spark.sql.catalyst.analysis.{Analyzer, FunctionRegistry}
 import org.apache.spark.sql.catalyst.catalog._
 import org.apache.spark.sql.catalyst.optimizer.Optimizer
 import org.apache.spark.sql.catalyst.parser.ParserInterface
 import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
 import org.apache.spark.sql.execution._
-import org.apache.spark.sql.execution.command.AnalyzeTableCommand
 import org.apache.spark.sql.execution.datasources._
-import org.apache.spark.sql.streaming.{StreamingQuery, StreamingQueryManager}
+import org.apache.spark.sql.streaming.StreamingQueryManager
 import org.apache.spark.sql.util.ExecutionListenerManager
 
 
 /**
  * A class that holds all session-specific state in a given [[SparkSession]].
  */
-private[sql] class SessionState(sparkSession: SparkSession) {
+private[sql] class SessionState(
+sparkContext: SparkContext,
+val conf: SQLConf,
+val experimentalMethods: ExperimentalMethods,
+val functionRegistry: FunctionRegistry,
+val catalog: SessionCatalog,
+val sqlParser: ParserInterface,
+val analyzer: Analyzer,
+val streamingQueryManager: StreamingQueryManager,
+val queryExecutionCreator: LogicalPlan => QueryExecution,
+val jarClassLoader: NonClosableMutableURLClassLoader) {
--- End diff --

`jarClassLoader` is from SharedState. Do we still need it as an input param 
for SessionState?





[GitHub] spark issue #17006: [SPARK-17636] Parquet filter push down doesn't handle st...

2017-02-20 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/17006
  
For adding a test, I usually go to the blame button and check where the 
tests were added in the recent commits. It seems `FilteredScanSuite` is the 
right place assuming from 
https://github.com/ndimiduk/spark/commit/7bc9a8c6249300ded31ea931c463d0a8f798e193.

Yes, I saw that JIRA was closed. I think anyone can definitely re-open the 
JIRA if they are pretty sure, and it could be fixed separately as a coherent 
unit. I think that's fine.







[GitHub] spark issue #17010: [SPARK-19673][SQL] "ThriftServer default app name is cha...

2017-02-20 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17010
  
**[Test build #73204 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73204/testReport)**
 for PR 17010 at commit 
[`c4a02bc`](https://github.com/apache/spark/commit/c4a02bca4594ca10473050a85165b4bf96a4ba4e).





[GitHub] spark issue #17010: [SPARK-19673][SQL] "ThriftServer default app name is cha...

2017-02-20 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/17010
  
cc @watermen @yhuai @liancheng since this PR is related to 
https://github.com/apache/spark/pull/7030





[GitHub] spark issue #17010: [SPARK-19673][SQL] "ThriftServer default app name is cha...

2017-02-20 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/17010
  
ok to test





[GitHub] spark pull request #16594: [SPARK-17078] [SQL] Show stats when explain

2017-02-20 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16594#discussion_r102138925
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/Statistics.scala ---
@@ -54,11 +57,29 @@ case class Statistics(
 
   /** Readable string representation for the Statistics. */
   def simpleString: String = {
-    Seq(s"sizeInBytes=$sizeInBytes",
-      if (rowCount.isDefined) s"rowCount=${rowCount.get}" else "",
+    Seq(s"sizeInBytes=${format(sizeInBytes, isSize = true)}",
+      if (rowCount.isDefined) s"rowCount=${format(rowCount.get, isSize = false)}" else "",
       s"isBroadcastable=$isBroadcastable"
     ).filter(_.nonEmpty).mkString(", ")
   }
+
+  /** Show the given number in a readable format. */
+  def format(number: BigInt, isSize: Boolean): String = {
+    val decimalValue = BigDecimal(number, new MathContext(3, RoundingMode.HALF_UP))
+    if (isSize) {
+      // The largest unit in Utils.bytesToString is TB
+      val PB = 1L << 50
+      if (number < 2 * PB) {
+        // The number is not very large, so we can use Utils.bytesToString to show it.
+        Utils.bytesToString(number.toLong)
+      } else {
+        // The number is too large, show it in scientific notation.
+        decimalValue.toString() + " B"
+      }
+    } else {
+      decimalValue.toString()
--- End diff --

https://en.wikipedia.org/wiki/Metric_prefix

Even if we do not have a unit, we can still use K, M, G, T, P, E?
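A minimal sketch of this metric-prefix idea for unitless counts -- an illustration of the suggestion only, not Spark's `Utils.bytesToString` or the PR's code -- is to scale by powers of 1000 and append the matching prefix:

```scala
// Sketch: repeatedly divide by 1000 and pick the matching metric prefix,
// rounding to one decimal place (locale-independent on purpose).
def withMetricPrefix(number: BigInt): String = {
  val prefixes = Seq("", "K", "M", "G", "T", "P", "E")
  var value = BigDecimal(number)
  var i = 0
  while (value.abs >= 1000 && i < prefixes.size - 1) {
    value /= 1000
    i += 1
  }
  val rounded = math.round(value.toDouble * 10) / 10.0
  s"$rounded${prefixes(i)}"
}
```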





[GitHub] spark pull request #16594: [SPARK-17078] [SQL] Show stats when explain

2017-02-20 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16594#discussion_r102138379
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/Statistics.scala ---
@@ -54,11 +57,29 @@ case class Statistics(
 
   /** Readable string representation for the Statistics. */
   def simpleString: String = {
-    Seq(s"sizeInBytes=$sizeInBytes",
-      if (rowCount.isDefined) s"rowCount=${rowCount.get}" else "",
+    Seq(s"sizeInBytes=${format(sizeInBytes, isSize = true)}",
+      if (rowCount.isDefined) s"rowCount=${format(rowCount.get, isSize = false)}" else "",
       s"isBroadcastable=$isBroadcastable"
     ).filter(_.nonEmpty).mkString(", ")
   }
+
+  /** Show the given number in a readable format. */
+  def format(number: BigInt, isSize: Boolean): String = {
+    val decimalValue = BigDecimal(number, new MathContext(3, RoundingMode.HALF_UP))
+    if (isSize) {
+      // The largest unit in Utils.bytesToString is TB
--- End diff --

How about improving `bytesToString` and making it support PB or higher? 





[GitHub] spark pull request #17007: [SPARK-19671]change 'var' to 'val' for better Spe...

2017-02-20 Thread 10110346
Github user 10110346 closed the pull request at:

https://github.com/apache/spark/pull/17007





[GitHub] spark issue #16997: Updated the SQL programming guide to explain about the E...

2017-02-20 Thread srowen
Github user srowen commented on the issue:

https://github.com/apache/spark/pull/16997
  
You are still bold-facing code elements, and have now back-ticked a string, 
which isn't code. There are still typos like "create dataset" instead of 
"create a Dataset". Do you mean to indicate that a class name will appear in 
the message? Then write something like "[class name]". There is no object 
name here. Please review carefully before you ask for another review.





[GitHub] spark pull request #16594: [SPARK-17078] [SQL] Show stats when explain

2017-02-20 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16594#discussion_r102137730
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/commands.scala ---
@@ -92,7 +92,8 @@ case class ExecutedCommandExec(cmd: RunnableCommand) extends SparkPlan {
 case class ExplainCommand(
     logicalPlan: LogicalPlan,
     extended: Boolean = false,
-    codegen: Boolean = false)
+    codegen: Boolean = false,
+    cost: Boolean = false)
--- End diff --

Please add `@param` like the other parameters.





[GitHub] spark pull request #16594: [SPARK-17078] [SQL] Show stats when explain

2017-02-20 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16594#discussion_r102137661
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala ---
@@ -282,7 +282,8 @@ class SparkSqlAstBuilder(conf: SQLConf) extends AstBuilder {
     if (statement == null) {
       null  // This is enough since ParseException will raise later.
     } else if (isExplainableStatement(statement)) {
-      ExplainCommand(statement, extended = ctx.EXTENDED != null, codegen = ctx.CODEGEN != null)
+      ExplainCommand(statement, extended = ctx.EXTENDED != null, codegen = ctx.CODEGEN != null,
+        cost = ctx.COST != null)
--- End diff --

Need to fix the style.





[GitHub] spark pull request #16826: [SPARK-19540][SQL] Add ability to clone SparkSess...

2017-02-20 Thread kunalkhamar
Github user kunalkhamar commented on a diff in the pull request:

https://github.com/apache/spark/pull/16826#discussion_r102137355
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala ---
@@ -1178,4 +1178,34 @@ class SessionCatalog(
 }
   }
 
+  /**
+   * Get an identical copy of the `SessionCatalog`.
+   * The temporary tables and function registry are retained.
+   * The table relation cache will not be populated.
+   * @note `externalCatalog` and `globalTempViewManager` are from shared state, don't need deep copy.
+   * `FunctionResourceLoader` is effectively stateless, also does not need deep copy.
+   * All arguments passed in should be associated with a particular `SparkSession`.
+   */
+  def copy(
+  conf: CatalystConf,
+  hadoopConf: Configuration,
+  functionRegistry: FunctionRegistry,
+  parser: ParserInterface): SessionCatalog = {
+
+val catalog = new SessionCatalog(
+  externalCatalog,
+  globalTempViewManager,
+  functionResourceLoader,
+  functionRegistry,
+  conf,
+  hadoopConf,
+  parser)
+
+catalog.currentDb = currentDb
--- End diff --

yes!





[GitHub] spark issue #17005: [SPARK-14659][ML] RFormula supports setting base level b...

2017-02-20 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17005
  
**[Test build #73203 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73203/testReport)**
 for PR 17005 at commit 
[`7cef3bc`](https://github.com/apache/spark/commit/7cef3bc6f200457b782dedd1247ed8bf81beb902).





[GitHub] spark pull request #16826: [SPARK-19540][SQL] Add ability to clone SparkSess...

2017-02-20 Thread kunalkhamar
Github user kunalkhamar commented on a diff in the pull request:

https://github.com/apache/spark/pull/16826#discussion_r102137396
  
--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveSparkSubmitSuite.scala ---
@@ -217,6 +217,7 @@ class HiveSparkSubmitSuite
 runSparkSubmit(args)
   }
 
+  /*  TODO: SPARK-19540 re-enable this test
   test("set hive.metastore.warehouse.dir") {
--- End diff --

updated to use `ignore`





[GitHub] spark pull request #16594: [SPARK-17078] [SQL] Show stats when explain

2017-02-20 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16594#discussion_r102137390
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/QueryExecution.scala ---
@@ -197,20 +197,32 @@ class QueryExecution(val sparkSession: SparkSession, val logical: LogicalPlan) {
   """.stripMargin.trim
   }
 
-  override def toString: String = {
+  override def toString: String = completeString(appendStats = false)
+
+  def toStringWithStats: String = completeString(appendStats = true)
+
+  def completeString(appendStats: Boolean): String = {
--- End diff --

private?





[GitHub] spark issue #17006: [SPARK-17636] Parquet filter push down doesn't handle st...

2017-02-20 Thread ndimiduk
Github user ndimiduk commented on the issue:

https://github.com/apache/spark/pull/17006
  
Thanks for having a look @HyukjinKwon. I agree a test is needed. I'm new to 
the Spark code base, so please direct me -- I assume there's an existing suite 
or suites that I can extend with coverage for this case. Also, please track 
the history I refer to on the ticket -- my use-case is not Parquet, but the 
Elasticsearch connector. This patch is enough to push the necessary 
information down to ES, at least with my basic testing.

My original ticket was closed as a dupe of the Parquet issue; notice also 
my comment requesting SPARK-17636 be renamed so as to not be Parquet-specific. 
Would it be better to re-open my original ticket and address the catalyst 
changes there? After that, 17636 can make subsequent changes to get fixes into 
Parquet.





[GitHub] spark issue #17005: [SPARK-14659][ML] RFormula supports setting base level b...

2017-02-20 Thread actuaryzhang
Github user actuaryzhang commented on the issue:

https://github.com/apache/spark/pull/17005
  
@HyukjinKwon Thanks. I'll try retesting this.





[GitHub] spark pull request #16594: [SPARK-17078] [SQL] Show stats when explain

2017-02-20 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16594#discussion_r102137142
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/Statistics.scala ---
@@ -54,11 +57,29 @@ case class Statistics(
 
   /** Readable string representation for the Statistics. */
   def simpleString: String = {
-    Seq(s"sizeInBytes=$sizeInBytes",
-      if (rowCount.isDefined) s"rowCount=${rowCount.get}" else "",
+    Seq(s"sizeInBytes=${format(sizeInBytes, isSize = true)}",
+      if (rowCount.isDefined) s"rowCount=${format(rowCount.get, isSize = false)}" else "",
       s"isBroadcastable=$isBroadcastable"
     ).filter(_.nonEmpty).mkString(", ")
   }
+
+  /** Show the given number in a readable format. */
+  def format(number: BigInt, isSize: Boolean): String = {
+    val decimalValue = BigDecimal(number, new MathContext(3, RoundingMode.HALF_UP))
+    if (isSize) {
+      // The largest unit in Utils.bytesToString is TB
+      val PB = 1L << 50
+      if (number < 2 * PB) {
+        // The number is not very large, so we can use Utils.bytesToString to show it.
+        Utils.bytesToString(number.toLong)
+      } else {
+        // The number is too large, show it in scientific notation.
+        decimalValue.toString() + " B"
+      }
+    } else {
+      decimalValue.toString()
--- End diff --

Always represent it using scientific notation? Or only do it when the 
number is too large?
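To make the question concrete, here is a self-contained sketch of the helper under review. `StatFormat` and its `bytesToString` are simplified stand-ins (the real code uses Spark's `Utils.bytesToString`): values are rounded to 3 significant digits via `MathContext`, and `BigDecimal.toString` switches to scientific notation whenever rounding leaves a negative scale — which is why row counts beyond three digits come out in scientific form.

```scala
import java.math.{MathContext, RoundingMode}

// Standalone sketch of the format() helper discussed above.
// bytesToString is a simplified stand-in for Spark's Utils.bytesToString.
object StatFormat {
  private val KB = 1L << 10
  private val MB = 1L << 20
  private val GB = 1L << 30
  private val TB = 1L << 40
  private val PB = 1L << 50

  private def bytesToString(size: Long): String =
    if (size >= 2 * TB) f"${size.toDouble / TB}%.1f TB"
    else if (size >= 2 * GB) f"${size.toDouble / GB}%.1f GB"
    else if (size >= 2 * MB) f"${size.toDouble / MB}%.1f MB"
    else if (size >= 2 * KB) f"${size.toDouble / KB}%.1f KB"
    else s"$size B"

  def format(number: BigInt, isSize: Boolean): String = {
    // Round to 3 significant digits, as the patch does.
    val decimalValue = BigDecimal(number, new MathContext(3, RoundingMode.HALF_UP))
    if (isSize) {
      // Sizes below ~2 PB keep the human-readable unit form;
      // beyond that, fall back to scientific notation.
      if (number < 2 * PB) bytesToString(number.toLong)
      else decimalValue.toString() + " B"
    } else {
      decimalValue.toString()
    }
  }
}
```

For example, `format(BigInt(500), isSize = false)` stays `"500"`, while `format(BigInt(123456), isSize = false)` becomes `"1.23E+5"` — illustrating the "always scientific for large counts" behavior being questioned.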





[GitHub] spark issue #16826: [SPARK-19540][SQL] Add ability to clone SparkSession whe...

2017-02-20 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16826
  
**[Test build #73202 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73202/testReport)**
 for PR 16826 at commit 
[`847b484`](https://github.com/apache/spark/commit/847b484ca1ef416ae16952c7de156c6cade23cf1).





[GitHub] spark issue #17001: [SPARK-19667][SQL][WIP]create table with hiveenabled in ...

2017-02-20 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17001
  
**[Test build #73201 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73201/testReport)**
 for PR 17001 at commit 
[`a2c9168`](https://github.com/apache/spark/commit/a2c91682b3824160bd1095e5b61e932a022f3672).





[GitHub] spark issue #17010: [SPARK-19673][SQL] "ThriftServer default app name is cha...

2017-02-20 Thread lvdongr
Github user lvdongr commented on the issue:

https://github.com/apache/spark/pull/17010
  
Before Spark 1.4.x, the ThriftServer name was "SparkSQL:localhostname", while 
https://issues.apache.org/jira/browse/SPARK-8650 changed this rule as a side 
effect. Since then, the ThriftServer shows the class name of HiveThriftServer2, 
which is not appropriate.





[GitHub] spark issue #16996: [SPARK-19664][SQL]put hive.metastore.warehouse.dir in ha...

2017-02-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16996
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73198/
Test PASSed.





[GitHub] spark issue #16996: [SPARK-19664][SQL]put hive.metastore.warehouse.dir in ha...

2017-02-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16996
  
Merged build finished. Test PASSed.





[GitHub] spark issue #16996: [SPARK-19664][SQL]put hive.metastore.warehouse.dir in ha...

2017-02-20 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16996
  
**[Test build #73198 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73198/testReport)**
 for PR 16996 at commit 
[`91b9fd2`](https://github.com/apache/spark/commit/91b9fd2cec7ef8ef9ab4bcf2c5468ef19139f647).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #16997: Updated the SQL programming guide to explain about the E...

2017-02-20 Thread HarshSharma8
Github user HarshSharma8 commented on the issue:

https://github.com/apache/spark/pull/16997
  
Hello Sean,
I have updated the content with back-ticks. Can you have a look at this?
Also, I am not sure which object name you are asking about.


Thank You


Best Regards |
*Harsh Sharma*
Sr. Software Consultant
Facebook  | Twitter
 | Linked In

harshs...@gmail.com
Skype*: khandal60*
*+91-8447307237*







[GitHub] spark issue #16594: [SPARK-17078] [SQL] Show stats when explain

2017-02-20 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16594
  
**[Test build #73200 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73200/testReport)**
 for PR 16594 at commit 
[`491ec8f`](https://github.com/apache/spark/commit/491ec8f3529bfb552fdae9dcd9c13bc2984f91ce).





[GitHub] spark issue #16997: Updated the SQL programming guide to explain about the E...

2017-02-20 Thread HarshSharma8
Github user HarshSharma8 commented on the issue:

https://github.com/apache/spark/pull/16997
  
Hello Sean,
I apologize for using bold instead of back-ticks, and I'm updating the content
accordingly.


Thank You



On Tue, Feb 21, 2017 at 10:58 AM, Sean Owen 
wrote:

> *@srowen* commented on this pull request.
> --
>
> In docs/sql-programming-guide.md
> :
>
> > @@ -297,6 +297,9 @@ reflection and become the names of the columns. 
Case classes can also be nested
>  types such as `Seq`s or `Array`s. This RDD can be implicitly converted 
to a DataFrame and then be
>  registered as a table. Tables can be used in subsequent SQL statements.
>
> +Spark Encoders are used to convert a JVM object to Spark SQL 
representation. To create dataset, spark requires an encoder which takes the 
form of Encoder[T] where T is the type which has to be encoded. 
Creation of a dataset with a custom type of object, may result into 
java.lang.UnsupportedOperationException: No Encoder found for 
Object-Name.
>
> Yes, @HarshSharma8  this still doesn't
> address the comments. Use back-ticks for code, not bold, too. What is
> Object-Name?
>
> —
> You are receiving this because you were mentioned.
> Reply to this email directly, view it on GitHub
> , or 
mute
> the thread
> 

> .
>






[GitHub] spark issue #11211: [SPARK-13330][PYSPARK] PYTHONHASHSEED is not propgated t...

2017-02-20 Thread zjffdu
Github user zjffdu commented on the issue:

https://github.com/apache/spark/pull/11211
  
ping @holdenk @HyukjinKwon PR is updated, please help review. Thanks





[GitHub] spark issue #17010: [SPARK-19673][SQL] "ThriftServer default app name is cha...

2017-02-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17010
  
Can one of the admins verify this patch?





[GitHub] spark pull request #16999: [SPARK-18922][TESTS] Fix new test failures on Win...

2017-02-20 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/16999





[GitHub] spark pull request #16851: [SPARK-19508][Core] Improve error message when bi...

2017-02-20 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/16851





[GitHub] spark issue #16851: [SPARK-19508][Core] Improve error message when binding s...

2017-02-20 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/16851
  
Thanks! @srowen 





[GitHub] spark pull request #17010: [SPARK-19673][SQL] "ThriftServer default app name...

2017-02-20 Thread lvdongr
GitHub user lvdongr opened a pull request:

https://github.com/apache/spark/pull/17010

[SPARK-19673][SQL] "ThriftServer default app name is changed wrong"

## What changes were proposed in this pull request?
In Spark 1.x, the ThriftServer name was SparkSQL:localHostName, while the 
default name has since changed to the class name of HiveThriftServer2, which is 
not appropriate.

## How was this patch tested?
manual tests


Please review http://spark.apache.org/contributing.html before opening a 
pull request.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/lvdongr/spark ThriftserverName

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/17010.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #17010


commit c4a02bca4594ca10473050a85165b4bf96a4ba4e
Author: lvdongr 
Date:   2017-02-21T04:37:12Z

[SPARK-19673][SQL] "ThriftServer default app name is changed wrong"







[GitHub] spark issue #16997: Updated the SQL programming guide to explain about the E...

2017-02-20 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/16997
  
BTW, could we maybe make the title complete (not `opera…`)?





[GitHub] spark issue #16999: [SPARK-18922][TESTS] Fix new test failures on Windows du...

2017-02-20 Thread srowen
Github user srowen commented on the issue:

https://github.com/apache/spark/pull/16999
  
Merged to master





[GitHub] spark pull request #16997: Updated the SQL programming guide to explain abou...

2017-02-20 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/16997#discussion_r102134319
  
--- Diff: docs/sql-programming-guide.md ---
@@ -297,6 +297,9 @@ reflection and become the names of the columns. Case 
classes can also be nested
 types such as `Seq`s or `Array`s. This RDD can be implicitly converted to 
a DataFrame and then be
 registered as a table. Tables can be used in subsequent SQL statements.
 
+Spark Encoders are used to convert a JVM object to Spark SQL 
representation. To create dataset, spark requires an encoder which takes the 
form of Encoder[T] where T is the type which has to be encoded. 
Creation of a dataset with a custom type of object, may result into 
java.lang.UnsupportedOperationException: No Encoder found for 
Object-Name.
--- End diff --

It is trivial, but maybe `spark` -> `Spark`? I am not an expert in grammar, 
but to my knowledge, capitalizing a proper noun is correct.
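The guide text under review can be illustrated with a minimal sketch. This is a hedged example, not the guide's wording: `Point` and `RawPoint` are hypothetical types, and the Kryo encoder is one way (not the only way) to handle a non-Product class that would otherwise trigger the "No Encoder found" error.

```scala
import org.apache.spark.sql.{Encoder, Encoders, SparkSession}

// A case class (a Product type): Spark derives Encoder[Point] implicitly.
case class Point(x: Double, y: Double)

// A plain class: no implicit encoder exists, so creating a Dataset of it
// fails unless an encoder (e.g. Kryo-based) is supplied explicitly.
class RawPoint(val x: Double, val y: Double)

object EncoderExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("encoder-example")
      .getOrCreate()
    import spark.implicits._

    // Works: the implicit product encoder is derived for the case class.
    val ds = Seq(Point(1.0, 2.0)).toDS()

    // Without this implicit, createDataset over RawPoint would not compile;
    // a Kryo encoder serializes the object as an opaque binary column.
    implicit val rawEnc: Encoder[RawPoint] = Encoders.kryo[RawPoint]
    val rawDs = spark.createDataset(Seq(new RawPoint(1.0, 2.0)))

    spark.stop()
  }
}
```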





[GitHub] spark issue #16851: [SPARK-19508][Core] Improve error message when binding s...

2017-02-20 Thread srowen
Github user srowen commented on the issue:

https://github.com/apache/spark/pull/16851
  
Merged to master





[GitHub] spark issue #17009: [SPARK-19674][SQL]Ignore non-existing driver accumulator...

2017-02-20 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17009
  
**[Test build #73199 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73199/testReport)**
 for PR 17009 at commit 
[`f24bf52`](https://github.com/apache/spark/commit/f24bf52e9712bf7879deef4a4565fcc5d9497237).





[GitHub] spark issue #17009: [SPARK-19674][SQL]Ignore non-existing driver accumulator...

2017-02-20 Thread carsonwang
Github user carsonwang commented on the issue:

https://github.com/apache/spark/pull/17009
  
cc @cloud-fan @zsxwing 





[GitHub] spark issue #17005: [SPARK-14659][ML] RFormula supports setting base level b...

2017-02-20 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/17005
  
(It seems it was due to 
https://github.com/apache/spark/commit/73f065569d352081b7d64c254af70ce996860c53.)





[GitHub] spark pull request #16395: [SPARK-17075][SQL] implemented filter estimation

2017-02-20 Thread ron8hu
Github user ron8hu commented on a diff in the pull request:

https://github.com/apache/spark/pull/16395#discussion_r102133390
  
--- Diff: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/statsEstimation/FilterEstimationSuite.scala
 ---
@@ -0,0 +1,389 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.statsEstimation
+
+import java.sql.{Date, Timestamp}
+
+import org.apache.spark.sql.catalyst.expressions._
+import org.apache.spark.sql.catalyst.plans.logical._
+import 
org.apache.spark.sql.catalyst.plans.logical.statsEstimation.EstimationUtils._
+import org.apache.spark.sql.catalyst.util.DateTimeUtils
+import org.apache.spark.sql.types._
+
+/**
+ * In this test suite, we test predicates containing the following 
operators:
+ * =, <, <=, >, >=, AND, OR, IS NULL, IS NOT NULL, IN, NOT IN
+ */
+class FilterEstimationSuite extends StatsEstimationTestBase {
+
+  // Suppose our test table has 10 rows and 6 columns.
+  // First column cint has values: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10
+  // Hence, distinctCount:10, min:1, max:10, nullCount:0, avgLen:4, 
maxLen:4
+  val arInt = AttributeReference("cint", IntegerType)()
+  val childColStatInt = ColumnStat(distinctCount = 10, min = Some(1), max 
= Some(10),
+nullCount = 0, avgLen = 4, maxLen = 4)
+
+  // Second column cdate has 10 values from 2017-01-01 through 2017-01-10.
+  val dMin = Date.valueOf("2017-01-01")
+  val dMax = Date.valueOf("2017-01-10")
+  val arDate = AttributeReference("cdate", DateType)()
+  val childColStatDate = ColumnStat(distinctCount = 10, min = Some(dMin), 
max = Some(dMax),
+nullCount = 0, avgLen = 4, maxLen = 4)
+
+  // Third column ctimestamp has 10 values from "2017-01-01 01:00:00" 
through
+  // "2017-01-01 10:00:00" for 10 distinct timestamps (or hours).
+  val tsMin = Timestamp.valueOf("2017-01-01 01:00:00")
+  val tsMax = Timestamp.valueOf("2017-01-01 10:00:00")
+  val arTimestamp = AttributeReference("ctimestamp", TimestampType)()
+  val childColStatTimestamp = ColumnStat(distinctCount = 10, min = 
Some(tsMin), max = Some(tsMax),
+nullCount = 0, avgLen = 8, maxLen = 8)
+
+  // Fourth column cdecimal has 10 values from 0.20 through 2.00 at 
increment of 0.2.
+  val decMin = new java.math.BigDecimal("0.20")
+  val decMax = new java.math.BigDecimal("2.00")
+  val arDecimal = AttributeReference("cdecimal", DecimalType(12, 2))()
+  val childColStatDecimal = ColumnStat(distinctCount = 10, min = 
Some(decMin), max = Some(decMax),
+nullCount = 0, avgLen = 8, maxLen = 8)
+
+  // Fifth column cdouble has 10 double values: 1.0, 2.0, 3.0, 4.0, 5.0, 
6.0, 7.0, 8.0, 9.0, 10.0
+  val arDouble = AttributeReference("cdouble", DoubleType)()
+  val childColStatDouble = ColumnStat(distinctCount = 10, min = Some(1.0), 
max = Some(10.0),
+nullCount = 0, avgLen = 8, maxLen = 8)
+
+  // Sixth column cstring has 10 String values:
+  // "A0", "A1", "A2", "A3", "A4", "A5", "A6", "A7", "A8", "A9"
+  val arString = AttributeReference("cstring", StringType)()
+  val childColStatString = ColumnStat(distinctCount = 10, min = None, max 
= None,
+nullCount = 0, avgLen = 2, maxLen = 2)
+
+  test("cint = 2") {
+validateEstimatedStats(
+  arInt,
+  Filter(EqualTo(arInt, Literal(2)), childStatsTestPlan(Seq(arInt))),
+  ColumnStat(distinctCount = 1, min = Some(2), max = Some(2),
+nullCount = 0, avgLen = 4, maxLen = 4),
+  Some(1L)
+)
+  }
+
+  test("cint = 0") {
+// This is an out-of-range case since 0 is outside the range [min, max]
+validateEstimatedStats(
+  arInt,
+  Filter(EqualTo(arInt, Literal(0)), childStatsTestPlan(Seq(arInt))),
+  ColumnStat(distinctCount = 10, min = Some(1), max = Some(10),
+nullCount = 0, avgLen = 4, maxLen = 4),
+  Some(0L)
+)
+  }
+
+ 

[GitHub] spark pull request #16995: [SPARK-19340][SQL] CSV file will result in an exc...

2017-02-20 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/16995#discussion_r102133102
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala
 ---
@@ -374,34 +374,42 @@ case class DataSource(
   globPath
 }.toArray
 
-val (dataSchema, partitionSchema) = 
getOrInferFileFormatSchema(format)
-
-val fileCatalog = if 
(sparkSession.sqlContext.conf.manageFilesourcePartitions &&
-catalogTable.isDefined && 
catalogTable.get.tracksPartitionsInCatalog) {
-  val defaultTableSize = 
sparkSession.sessionState.conf.defaultSizeInBytes
-  new CatalogFileIndex(
-sparkSession,
-catalogTable.get,
-
catalogTable.get.stats.map(_.sizeInBytes.toLong).getOrElse(defaultTableSize))
-} else {
-  new InMemoryFileIndex(sparkSession, globbedPaths, options, 
Some(partitionSchema))
-}
-
-HadoopFsRelation(
-  fileCatalog,
-  partitionSchema = partitionSchema,
-  dataSchema = dataSchema.asNullable,
-  bucketSpec = bucketSpec,
-  format,
-  caseInsensitiveOptions)(sparkSession)
-
+createHadoopRelation(format, globbedPaths)
   case _ =>
 throw new AnalysisException(
   s"$className is not a valid Spark SQL Data Source.")
 }
 
 relation
   }
+  /**
+   * Creates Hadoop relation based on format and globbed file paths
+   * @param format format of the data source file
+   * @param globPaths Path to the file resolved by Hadoop library
+   * @return Hadoop relation object
+   */
+  def createHadoopRelation(format: FileFormat,
+   globPaths: Array[Path]): BaseRelation = {
--- End diff --

Let's make this inlined.





[GitHub] spark pull request #16750: [SPARK-18937][SQL] Timezone support in CSV/JSON p...

2017-02-20 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/16750#discussion_r102132903
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/sources/ResolvedDataSourceSuite.scala
 ---
@@ -19,11 +19,15 @@ package org.apache.spark.sql.sources
 
 import org.apache.spark.SparkFunSuite
 import org.apache.spark.sql.AnalysisException
+import org.apache.spark.sql.catalyst.util.DateTimeUtils
 import org.apache.spark.sql.execution.datasources.DataSource
 
 class ResolvedDataSourceSuite extends SparkFunSuite {
   private def getProvidingClass(name: String): Class[_] =
-DataSource(sparkSession = null, className = name).providingClass
+DataSource(
+  sparkSession = null,
+  className = name,
+  options = Map("timeZone" -> 
DateTimeUtils.defaultTimeZone().getID)).providingClass
--- End diff --

Unfortunately, we can't use the default session timezone because 
sparkSession is null here..





[GitHub] spark pull request #16750: [SPARK-18937][SQL] Timezone support in CSV/JSON p...

2017-02-20 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/16750#discussion_r102132850
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVSuite.scala
 ---
@@ -859,6 +859,48 @@ class CSVSuite extends QueryTest with SharedSQLContext 
with SQLTestUtils {
 }
   }
 
+  test("Write timestamps correctly with timestampFormat option and 
timeZone option") {
+withTempDir { dir =>
+  // With dateFormat option and timeZone option.
+  val timestampsWithFormatPath = 
s"${dir.getCanonicalPath}/timestampsWithFormat.csv"
+  val timestampsWithFormat = spark.read
+.format("csv")
+.option("header", "true")
+.option("inferSchema", "true")
+.option("timestampFormat", "dd/MM/yyyy HH:mm")
+.load(testFile(datesFile))
+  timestampsWithFormat.write
+.format("csv")
+.option("header", "true")
+.option("timestampFormat", "yyyy/MM/dd HH:mm")
+.option("timeZone", "GMT")
+.save(timestampsWithFormatPath)
+
+  // This will load back the timestamps as string.
+  val stringTimestampsWithFormat = spark.read
+.format("csv")
+.option("header", "true")
+.option("inferSchema", "false")
--- End diff --

I see, I'll specify the schema in the next PR.
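The effect of combining a timestamp pattern with a time zone, as the `timestampFormat` and `timeZone` options above do, can be illustrated outside Spark. A minimal Java sketch (not Spark code; the `format` helper is hypothetical): the same instant renders differently under different zones.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

public class TimestampFormatDemo {
    // Format an epoch-millisecond instant with the given pattern and zone,
    // mirroring what a CSV writer's timestampFormat/timeZone options control.
    static String format(long epochMillis, String pattern, String tz) {
        SimpleDateFormat fmt = new SimpleDateFormat(pattern);
        fmt.setTimeZone(TimeZone.getTimeZone(tz));
        return fmt.format(new Date(epochMillis));
    }

    public static void main(String[] args) {
        long instant = 0L; // 1970-01-01T00:00:00Z
        System.out.println(format(instant, "yyyy/MM/dd HH:mm", "GMT"));   // 1970/01/01 00:00
        System.out.println(format(instant, "yyyy/MM/dd HH:mm", "GMT+9")); // 1970/01/01 09:00
    }
}
```

This is also why reading the data back without the matching `timeZone` option can shift the parsed timestamps.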





[GitHub] spark issue #17007: [SPARK-19671]change 'var' to 'val' for better Specificat...

2017-02-20 Thread srowen
Github user srowen commented on the issue:

https://github.com/apache/spark/pull/17007
  
This is too trivial -- we have generally rejected changes like this. If you 
would, please close this and the JIRA. Have a look at 
http://spark.apache.org/contributing.html too





[GitHub] spark issue #16910: [SPARK-19575][SQL]Reading from or writing to a hive serd...

2017-02-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16910
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73196/
Test PASSed.





[GitHub] spark issue #16910: [SPARK-19575][SQL]Reading from or writing to a hive serd...

2017-02-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16910
  
Merged build finished. Test PASSed.





[GitHub] spark issue #16910: [SPARK-19575][SQL]Reading from or writing to a hive serd...

2017-02-20 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16910
  
**[Test build #73196 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73196/testReport)**
 for PR 16910 at commit 
[`b4caca7`](https://github.com/apache/spark/commit/b4caca761c9caffc39324b01b9e1aeecf2cc69fe).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #16910: [SPARK-19575][SQL]Reading from or writing to a hive serd...

2017-02-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16910
  
Merged build finished. Test PASSed.





[GitHub] spark issue #16910: [SPARK-19575][SQL]Reading from or writing to a hive serd...

2017-02-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16910
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73194/
Test PASSed.





[GitHub] spark issue #16910: [SPARK-19575][SQL]Reading from or writing to a hive serd...

2017-02-20 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16910
  
**[Test build #73194 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73194/testReport)**
 for PR 16910 at commit 
[`6fb2b57`](https://github.com/apache/spark/commit/6fb2b57fc2c43b1ad61c77e2b5017cbb5a0af386).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #16910: [SPARK-19575][SQL]Reading from or writing to a hive serd...

2017-02-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16910
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73193/
Test PASSed.





[GitHub] spark issue #16910: [SPARK-19575][SQL]Reading from or writing to a hive serd...

2017-02-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16910
  
Merged build finished. Test PASSed.





[GitHub] spark issue #16910: [SPARK-19575][SQL]Reading from or writing to a hive serd...

2017-02-20 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16910
  
**[Test build #73193 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73193/testReport)**
 for PR 16910 at commit 
[`4493a8f`](https://github.com/apache/spark/commit/4493a8f96320720e82dd8a66f61a3b4ebf920116).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #17008: [SPARK-19669][HOTFIX][SQL] sessionState access privilege...

2017-02-20 Thread windpiger
Github user windpiger commented on the issue:

https://github.com/apache/spark/pull/17008
  
passed locally, retest this please





[GitHub] spark pull request #16826: [SPARK-19540][SQL] Add ability to clone SparkSess...

2017-02-20 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/16826#discussion_r102130841
  
--- Diff: 
sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveSparkSubmitSuite.scala ---
@@ -217,6 +217,7 @@ class HiveSparkSubmitSuite
 runSparkSubmit(args)
   }
 
+  /*  TODO: SPARK-19540 re-enable this test
   test("set hive.metastore.warehouse.dir") {
--- End diff --

Is it better to use `ignore`?





[GitHub] spark issue #17006: [SPARK-17636] Parquet filter push down doesn't handle st...

2017-02-20 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/17006
  
See 
https://github.com/apache/spark/blob/7730426cb95eec2652a9ea979ae2c4faf7e585f2/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilters.scala#L158-L160
 too.





[GitHub] spark issue #17006: [SPARK-17636] Parquet filter push down doesn't handle st...

2017-02-20 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/17006
  
BTW, even if we push the dot-separated names, this would not support the 
push-down in Parquet because `ParquetFilters` itself does not handle nested 
columns (see 
https://github.com/apache/spark/blob/7730426cb95eec2652a9ea979ae2c4faf7e585f2/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilters.scala#L187-L234).
 

As it checks the field names in the schema, it pushes down nothing. Also, I 
think we definitely need a test.
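The point about nested columns can be sketched in isolation. An illustrative Java snippet (hypothetical helper, not the actual `ParquetFilters` code): a filter on a dotted name like `address.city` is matched against the set of top-level field names, so a nested reference never matches and no filter is pushed down.

```java
import java.util.Set;

public class NestedPushdownDemo {
    // A filter is eligible for push-down only if its column name appears
    // verbatim among the top-level field names of the schema.
    static boolean canPushDown(String column, Set<String> topLevelFields) {
        return topLevelFields.contains(column);
    }

    public static void main(String[] args) {
        Set<String> fields = Set.of("id", "address"); // top-level schema fields
        System.out.println(canPushDown("id", fields));           // true
        System.out.println(canPushDown("address.city", fields)); // false: nested name never matches
    }
}
```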





[GitHub] spark issue #16996: [SPARK-19664][SQL]put hive.metastore.warehouse.dir in ha...

2017-02-20 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/16996
  
cc @yhuai who did the original change. I am not sure whether we need to 
overwrite the original value of hadoopConf, although the change does not hurt 
anything IMO. 





[GitHub] spark issue #17008: [SPARK-19669][HOTFIX][SQL] sessionState access privilege...

2017-02-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17008
  
Merged build finished. Test FAILed.





[GitHub] spark issue #17008: [SPARK-19669][HOTFIX][SQL] sessionState access privilege...

2017-02-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17008
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73192/
Test FAILed.





[GitHub] spark issue #17008: [SPARK-19669][HOTFIX][SQL] sessionState access privilege...

2017-02-20 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17008
  
**[Test build #73192 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73192/testReport)**
 for PR 17008 at commit 
[`99db6cf`](https://github.com/apache/spark/commit/99db6cf1a5766213fa42726398067375c783).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.




