[GitHub] spark pull request #16429: [SPARK-19019][PYTHON] Fix hijacked `collections.n...
Github user azmras commented on a diff in the pull request: https://github.com/apache/spark/pull/16429#discussion_r94367710
--- Diff: python/pyspark/serializers.py ---
@@ -382,18 +382,30 @@ def _hijack_namedtuple():
         return

     global _old_namedtuple  # or it will put in closure
+    global _old_namedtuple_kwdefaults  # or it will put in closure too

     def _copy_func(f):
         return types.FunctionType(f.__code__, f.__globals__, f.__name__,
                                   f.__defaults__, f.__closure__)

+    def _kwdefaults(f):
+        kargs = getattr(f, "__kwdefaults__", None)
--- End diff --
After applying the patch, can you run `sc.parallelize(range(100), 8)` and confirm that it works? For me it does not, and serialization of objects breaks. Thanks for your efforts.
[GitHub] spark issue #16233: [SPARK-18801][SQL] Support resolve a nested view
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16233 **[Test build #70806 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70806/testReport)** for PR 16233 at commit [`19bc8eb`](https://github.com/apache/spark/commit/19bc8ebf27a54bf260e92dd3dd7114ded19cacfb).
[GitHub] spark pull request #16233: [SPARK-18801][SQL] Support resolve a nested view
Github user jiangxb1987 commented on a diff in the pull request: https://github.com/apache/spark/pull/16233#discussion_r94367053
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala ---
@@ -377,6 +378,39 @@ case class InsertIntoTable(
   override lazy val resolved: Boolean = childrenResolved && table.resolved
 }

+/** Factory for constructing new `View` nodes. */
+object View {
+  def apply(desc: CatalogTable): View = View(desc, desc.schema.toAttributes, None)
+}
+
+/**
+ * A container for holding the view description(CatalogTable), and the output of the view. The
+ * child will be defined if the view is resolved with Hive support, else it should be None.
+ * This operator will be removed at the end of analysis stage.
+ *
+ * @param desc A view description(CatalogTable) that provides necessary information to resolve the
+ *             view.
+ * @param output The output of a view operator, this is generated during planning the view, so that
+ *               we are able to decouple the output from the underlying structure.
+ * @param child The logical plan of a view operator, it should be non-empty if the view is resolved
+ *              with Hive support, else it should be None.
+ */
+case class View(
+    desc: CatalogTable,
+    output: Seq[Attribute],
+    child: Option[LogicalPlan] = None) extends LogicalPlan with MultiInstanceRelation {
--- End diff --
When Hive support is not provided, we don't parse the plan from `CatalogTable.viewText`, so the child will be None. Do you have any suggestions on how we should update the param comment to make this clearer?
[GitHub] spark pull request #16233: [SPARK-18801][SQL] Support resolve a nested view
Github user jiangxb1987 commented on a diff in the pull request: https://github.com/apache/spark/pull/16233#discussion_r94366859
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala ---
@@ -377,6 +378,39 @@ case class InsertIntoTable(
   override lazy val resolved: Boolean = childrenResolved && table.resolved
 }

+/** Factory for constructing new `View` nodes. */
+object View {
+  def apply(desc: CatalogTable): View = View(desc, desc.schema.toAttributes, None)
+}
+
+/**
+ * A container for holding the view description(CatalogTable), and the output of the view. The
+ * child will be defined if the view is resolved with Hive support, else it should be None.
+ * This operator will be removed at the end of analysis stage.
+ *
+ * @param desc A view description(CatalogTable) that provides necessary information to resolve the
+ *             view.
+ * @param output The output of a view operator, this is generated during planning the view, so that
+ *               we are able to decouple the output from the underlying structure.
+ * @param child The logical plan of a view operator, it should be non-empty if the view is resolved
+ *              with Hive support, else it should be None.
+ */
+case class View(
+    desc: CatalogTable,
+    output: Seq[Attribute],
--- End diff --
It may look a little over-engineered for now, but it lets us decouple the planning of the query from the planning of the view, which allows us to cache resolved views in the future. So perhaps we'd better keep this.
[GitHub] spark pull request #16233: [SPARK-18801][SQL] Support resolve a nested view
Github user jiangxb1987 commented on a diff in the pull request: https://github.com/apache/spark/pull/16233#discussion_r94366640
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -510,32 +510,94 @@ class Analyzer(
    * Replaces [[UnresolvedRelation]]s with concrete relations from the catalog.
    */
   object ResolveRelations extends Rule[LogicalPlan] {
-    private def lookupTableFromCatalog(u: UnresolvedRelation): LogicalPlan = {
+
+    def apply(plan: LogicalPlan): LogicalPlan = plan resolveOperators {
+      case i @ InsertIntoTable(u: UnresolvedRelation, parts, child, _, _) if child.resolved =>
+        i.copy(table = EliminateSubqueryAliases(lookupTableFromCatalog(u)))
+      case u: UnresolvedRelation => resolveRelation(u)
+    }
+
+    // If the unresolved relation is running directly on files, we just return the original
+    // UnresolvedRelation, the plan will get resolved later. Else we look up the table from catalog
+    // and change the default database name if it is a view.
+    //
+    // Note this is compatible with the views defined by older versions of Spark(before 2.2), which
+    // have empty defaultDatabase and all the relations in viewText have database part defined.
+    def resolveRelation(
+        plan: LogicalPlan,
+        defaultDatabase: Option[String] = None): LogicalPlan = plan match {
--- End diff --
We use the param `defaultDatabase` to look up a view whose database part is empty. Now that we have added the `AnalysisContext`, I think the param can be removed, and we can always get the default database from `AnalysisContext.get.defaultDatabase`.
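For readers following this thread, here is a minimal, self-contained sketch of the thread-local `AnalysisContext` idea being discussed. The names and structure below are assumptions for illustration only, not the actual code in this PR:

```scala
// Hedged sketch: a thread-local context that carries the view's default database
// while that view's plan is being resolved. Illustrative only.
case class AnalysisContext(defaultDatabase: Option[String] = None)

object AnalysisContext {
  private val contexts = new ThreadLocal[AnalysisContext] {
    override def initialValue(): AnalysisContext = AnalysisContext()
  }

  def get: AnalysisContext = contexts.get()

  // Resolve `body` (e.g. a nested view's parsed plan) with the given default database in scope,
  // then restore the previous context so sibling relations are unaffected.
  def withDefaultDatabase[A](database: Option[String])(body: => A): A = {
    val previous = contexts.get()
    contexts.set(AnalysisContext(defaultDatabase = database))
    try body finally contexts.set(previous)
  }
}
```

With something like this, `resolveRelation` could read `AnalysisContext.get.defaultDatabase` instead of threading a `defaultDatabase` parameter through every call, which is the simplification proposed in the comment above.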
[GitHub] spark issue #15314: [SPARK-17747][ML] WeightCol support non-double numeric d...
Github user zhengruifeng commented on the issue: https://github.com/apache/spark/pull/15314 ping @srowen @jkbradley
[GitHub] spark issue #12135: [SPARK-14352][SQL] approxQuantile should support multi c...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/12135 **[Test build #70805 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70805/testReport)** for PR 12135 at commit [`1b2df22`](https://github.com/apache/spark/commit/1b2df228050857bc404892aa8aeeb997062795a3).
[GitHub] spark issue #16371: [SPARK-18932][SQL] Support partial aggregation for colle...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/16371 LGTM
[GitHub] spark issue #15880: [SPARK-17913][SQL] compare long and string type column m...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/15880
Just for your reference, below is the conversion chart for MS SQL Server. It covers both implicit and explicit conversion rules.
![screenshot 2017-01-02 23 18 56](https://cloud.githubusercontent.com/assets/11567269/21601706/e822a07c-d141-11e6-8dcc-6328835a77dd.png)
Source: https://msdn.microsoft.com/en-us/library/ms191530.aspx
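As a concrete illustration of the kind of comparison this PR targets, the following spark-shell snippet (assuming a `spark` session is in scope; the exact casting behavior is precisely what is under discussion, so no output is asserted) shows where the implicit cast lands when a long column meets a string literal:

```scala
// Illustrative only: a LongType column compared against a string literal.
// The analyzed plan shows which side receives the implicit Cast; that choice
// is exactly what this PR changes.
val df = spark.range(5).toDF("id")   // id is LongType
df.createOrReplaceTempView("t")
spark.sql("SELECT * FROM t WHERE id = '3'").explain(true)
// Inspect the analyzed plan for Cast(id ...) vs Cast('3' ...) to see the chosen conversion.
```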
[GitHub] spark pull request #16233: [SPARK-18801][SQL] Support resolve a nested view
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/16233#discussion_r94365473
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala ---
@@ -377,6 +378,39 @@ case class InsertIntoTable(
   override lazy val resolved: Boolean = childrenResolved && table.resolved
 }

+/** Factory for constructing new `View` nodes. */
+object View {
+  def apply(desc: CatalogTable): View = View(desc, desc.schema.toAttributes, None)
+}
+
+/**
+ * A container for holding the view description(CatalogTable), and the output of the view. The
+ * child will be defined if the view is resolved with Hive support, else it should be None.
+ * This operator will be removed at the end of analysis stage.
+ *
+ * @param desc A view description(CatalogTable) that provides necessary information to resolve the
+ *             view.
+ * @param output The output of a view operator, this is generated during planning the view, so that
+ *               we are able to decouple the output from the underlying structure.
+ * @param child The logical plan of a view operator, it should be non-empty if the view is resolved
+ *              with Hive support, else it should be None.
+ */
+case class View(
+    desc: CatalogTable,
+    output: Seq[Attribute],
+    child: Option[LogicalPlan] = None) extends LogicalPlan with MultiInstanceRelation {
--- End diff --
When will the child be `None`?
[GitHub] spark pull request #16233: [SPARK-18801][SQL] Support resolve a nested view
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/16233#discussion_r94365429
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala ---
@@ -377,6 +378,39 @@ case class InsertIntoTable(
   override lazy val resolved: Boolean = childrenResolved && table.resolved
 }

+/** Factory for constructing new `View` nodes. */
+object View {
+  def apply(desc: CatalogTable): View = View(desc, desc.schema.toAttributes, None)
+}
+
+/**
+ * A container for holding the view description(CatalogTable), and the output of the view. The
+ * child will be defined if the view is resolved with Hive support, else it should be None.
+ * This operator will be removed at the end of analysis stage.
+ *
+ * @param desc A view description(CatalogTable) that provides necessary information to resolve the
+ *             view.
+ * @param output The output of a view operator, this is generated during planning the view, so that
+ *               we are able to decouple the output from the underlying structure.
+ * @param child The logical plan of a view operator, it should be non-empty if the view is resolved
+ *              with Hive support, else it should be None.
+ */
+case class View(
+    desc: CatalogTable,
+    output: Seq[Attribute],
--- End diff --
Why can't we just use `def output = child.output`? If we want to reorder the columns according to the original view schema, we can wrap the child in a `Project`, e.g.
```
// The relation is a view, so we wrap the relation by:
// 1. Add a [[View]] operator over the relation to keep track of the view desc;
// 2. Wrap the logical plan in a [[SubqueryAlias]] which tracks the name of the view.
val viewPlan = sparkSession.sessionState.sqlParser.parsePlan(viewText)
val child = View(
  desc = table,
  child = Some(Project(schema.map(f => UnresolvedAttribute(Seq(f.name))), viewPlan)))
SubqueryAlias(alias.getOrElse(table.identifier.table), child, Option(table.identifier))
```
[GitHub] spark issue #15240: [SPARK-17556] [CORE] [SQL] Executor side broadcast for b...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15240 **[Test build #70804 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70804/testReport)** for PR 15240 at commit [`cdab885`](https://github.com/apache/spark/commit/cdab8854466fe816663b4fa1a981e0654c526658).
[GitHub] spark issue #15324: [SPARK-16872][ML] Gaussian Naive Bayes Classifier
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15324 **[Test build #70803 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70803/testReport)** for PR 15324 at commit [`df29d10`](https://github.com/apache/spark/commit/df29d10e61afe2f5e43346679fe30041b9e46a8f).
[GitHub] spark issue #15240: [SPARK-17556] [CORE] [SQL] Executor side broadcast for b...
Github user scwf commented on the issue: https://github.com/apache/spark/pull/15240 retest this please
[GitHub] spark issue #15324: [SPARK-16872][ML] Gaussian Naive Bayes Classifier
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15324 **[Test build #70799 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70799/testReport)** for PR 15324 at commit [`6e2d066`](https://github.com/apache/spark/commit/6e2d06624c2bf6c46b5bc319836a35b488b4b3e2).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark issue #15324: [SPARK-16872][ML] Gaussian Naive Bayes Classifier
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15324 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70799/ Test FAILed.
[GitHub] spark issue #15324: [SPARK-16872][ML] Gaussian Naive Bayes Classifier
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15324 Merged build finished. Test FAILed.
[GitHub] spark issue #15324: [SPARK-16872][ML] Gaussian Naive Bayes Classifier
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15324 **[Test build #70802 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70802/testReport)** for PR 15324 at commit [`8e0de62`](https://github.com/apache/spark/commit/8e0de623a9fc8a19f0704e7127cd4bc4573d1f59).
[GitHub] spark issue #16371: [SPARK-18932][SQL] Support partial aggregation for colle...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16371 **[Test build #70801 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70801/testReport)** for PR 16371 at commit [`15a10ee`](https://github.com/apache/spark/commit/15a10eebe272428841772a58d06f2e889d70b75c).
[GitHub] spark pull request #16233: [SPARK-18801][SQL] Support resolve a nested view
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/16233#discussion_r94364696
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -510,32 +510,94 @@ class Analyzer(
    * Replaces [[UnresolvedRelation]]s with concrete relations from the catalog.
    */
   object ResolveRelations extends Rule[LogicalPlan] {
-    private def lookupTableFromCatalog(u: UnresolvedRelation): LogicalPlan = {
+
+    def apply(plan: LogicalPlan): LogicalPlan = plan resolveOperators {
+      case i @ InsertIntoTable(u: UnresolvedRelation, parts, child, _, _) if child.resolved =>
+        i.copy(table = EliminateSubqueryAliases(lookupTableFromCatalog(u)))
+      case u: UnresolvedRelation => resolveRelation(u)
+    }
+
+    // If the unresolved relation is running directly on files, we just return the original
+    // UnresolvedRelation, the plan will get resolved later. Else we look up the table from catalog
+    // and change the default database name if it is a view.
+    //
+    // Note this is compatible with the views defined by older versions of Spark(before 2.2), which
+    // have empty defaultDatabase and all the relations in viewText have database part defined.
+    def resolveRelation(
+        plan: LogicalPlan,
+        defaultDatabase: Option[String] = None): LogicalPlan = plan match {
--- End diff --
Where do we use the `defaultDatabase` parameter?
[GitHub] spark issue #16233: [SPARK-18801][SQL] Support resolve a nested view
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16233 **[Test build #70800 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70800/testReport)** for PR 16233 at commit [`de4a80e`](https://github.com/apache/spark/commit/de4a80e5cd726b8b93c6cc8ac29bb8ec4504b370).
[GitHub] spark issue #15730: [SPARK-18218][ML][MLLib] Optimize BlockMatrix multiplica...
Github user brkyvz commented on the issue: https://github.com/apache/spark/pull/15730 @WeichenXu123 Thanks! Will take a look once I get back from vacation (in a week). Happy new year!
[GitHub] spark pull request #16371: [SPARK-18932][SQL] Support partial aggregation fo...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/16371#discussion_r94363919
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/collect.scala ---
@@ -88,19 +92,19 @@ abstract class Collect extends ImperativeAggregate {
 case class CollectList(
     child: Expression,
     mutableAggBufferOffset: Int = 0,
-    inputAggBufferOffset: Int = 0) extends Collect {
+    inputAggBufferOffset: Int = 0) extends Collect[ArrayBuffer[Any]] {

   def this(child: Expression) = this(child, 0, 0)

-  override def withNewMutableAggBufferOffset(newMutableAggBufferOffset: Int): ImperativeAggregate =
+  override def withNewMutableAggBufferOffset(newMutableAggBufferOffset: Int): CollectList =
     copy(mutableAggBufferOffset = newMutableAggBufferOffset)

   override def withNewInputAggBufferOffset(newInputAggBufferOffset: Int): ImperativeAggregate =
     copy(inputAggBufferOffset = newInputAggBufferOffset)

-  override def prettyName: String = "collect_list"
+  override def createAggregationBuffer(): ArrayBuffer[Any] = new ArrayBuffer[Any]()

-  override protected[this] val buffer: mutable.ArrayBuffer[Any] = mutable.ArrayBuffer.empty
--- End diff --
`mutable.ArrayBuffer.empty` looks better.
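For readers following the Collect refactoring in this thread, here is a minimal, self-contained sketch of the buffer lifecycle that `createAggregationBuffer` participates in. It mirrors the shape of the API quoted above but is not the real Catalyst interface; names and signatures are simplified for illustration:

```scala
import scala.collection.mutable

// Simplified sketch of a typed aggregation buffer for collect_list:
// create one ArrayBuffer per group, append input values during partial
// aggregation, and merge partial buffers during final aggregation.
class CollectListSketch {
  def createAggregationBuffer(): mutable.ArrayBuffer[Any] = mutable.ArrayBuffer.empty

  def update(buffer: mutable.ArrayBuffer[Any], value: Any): mutable.ArrayBuffer[Any] = {
    buffer += value      // each task grows its own partial buffer
    buffer
  }

  def merge(buffer: mutable.ArrayBuffer[Any], other: mutable.ArrayBuffer[Any]): mutable.ArrayBuffer[Any] = {
    buffer ++= other     // combine partial buffers from different partitions
    buffer
  }

  def eval(buffer: mutable.ArrayBuffer[Any]): Seq[Any] = buffer.toSeq
}
```

The point of the change under review is that the buffer becomes part of the aggregation state passed between these calls, which is what makes partial aggregation possible, instead of being a field on the expression itself.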
[GitHub] spark pull request #16371: [SPARK-18932][SQL] Support partial aggregation fo...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/16371#discussion_r94363879
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/collect.scala ---
@@ -88,19 +92,19 @@ abstract class Collect extends ImperativeAggregate {
 case class CollectList(
     child: Expression,
     mutableAggBufferOffset: Int = 0,
-    inputAggBufferOffset: Int = 0) extends Collect {
+    inputAggBufferOffset: Int = 0) extends Collect[ArrayBuffer[Any]] {

   def this(child: Expression) = this(child, 0, 0)

-  override def withNewMutableAggBufferOffset(newMutableAggBufferOffset: Int): ImperativeAggregate =
+  override def withNewMutableAggBufferOffset(newMutableAggBufferOffset: Int): CollectList =
--- End diff --
unnecessary change
[GitHub] spark pull request #16371: [SPARK-18932][SQL] Support partial aggregation fo...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/16371#discussion_r94363844
--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/RewriteDistinctAggregatesSuite.scala ---
@@ -16,16 +16,16 @@
  */
 package org.apache.spark.sql.catalyst.optimizer

-import org.apache.spark.sql.catalyst.SimpleCatalystConf
+import org.apache.spark.sql.catalyst.{InternalRow, SimpleCatalystConf}
 import org.apache.spark.sql.catalyst.analysis.{Analyzer, EmptyFunctionRegistry}
 import org.apache.spark.sql.catalyst.catalog.{InMemoryCatalog, SessionCatalog}
 import org.apache.spark.sql.catalyst.dsl.expressions._
 import org.apache.spark.sql.catalyst.dsl.plans._
-import org.apache.spark.sql.catalyst.expressions.{If, Literal}
-import org.apache.spark.sql.catalyst.expressions.aggregate.{CollectSet, Count}
+import org.apache.spark.sql.catalyst.expressions.{Expression, If, Literal}
+import org.apache.spark.sql.catalyst.expressions.aggregate.{CollectSet, Count, ImperativeAggregate, TypedImperativeAggregate}
 import org.apache.spark.sql.catalyst.plans.PlanTest
 import org.apache.spark.sql.catalyst.plans.logical.{Aggregate, Expand, LocalRelation, LogicalPlan}
-import org.apache.spark.sql.types.{IntegerType, StringType}
+import org.apache.spark.sql.types.{DataType, IntegerType, StringType}
--- End diff --
Please revert the unnecessary changes to the `import` statements.
[GitHub] spark issue #16320: [SPARK-18877][SQL] `CSVInferSchema.inferField` on Decima...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/16320 The test case coverage in the suite `CSVInferSchemaSuite.scala` looks haphazard; I am afraid future code changes could easily break the existing type inference rules. Could you improve it in a separate PR?
[GitHub] spark issue #15324: [SPARK-16872][ML] Gaussian Naive Bayes Classifier
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15324 **[Test build #70799 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70799/testReport)** for PR 15324 at commit [`6e2d066`](https://github.com/apache/spark/commit/6e2d06624c2bf6c46b5bc319836a35b488b4b3e2).
[GitHub] spark issue #16320: [SPARK-18877][SQL] `CSVInferSchema.inferField` on Decima...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16320 Merged build finished. Test PASSed.
[GitHub] spark issue #16320: [SPARK-18877][SQL] `CSVInferSchema.inferField` on Decima...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16320 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70795/ Test PASSed.
[GitHub] spark issue #16320: [SPARK-18877][SQL] `CSVInferSchema.inferField` on Decima...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16320 **[Test build #70795 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70795/testReport)** for PR 16320 at commit [`393d3a9`](https://github.com/apache/spark/commit/393d3a9ceaa6d92a301b5a2917e28d29518c1638).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark issue #15730: [SPARK-18218][ML][MLLib] Optimize BlockMatrix multiplica...
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/15730 @brkyvz I updated the code and attached a screenshot of a run; waiting for your review, thanks!
[GitHub] spark issue #16320: [SPARK-18877][SQL] `CSVInferSchema.inferField` on Decima...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16320 **[Test build #70798 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70798/testReport)** for PR 16320 at commit [`e59631b`](https://github.com/apache/spark/commit/e59631bd54872a03eaa63cc74d0e245300bbc781).
[GitHub] spark issue #16320: [SPARK-18877][SQL] `CSVInferSchema.inferField` on Decima...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/16320 Yep, I added the test case too. @gatorsmile
[GitHub] spark pull request #15819: [SPARK-18372][SQL][Branch-1.6].Staging directory ...
Github user merlintang commented on a diff in the pull request: https://github.com/apache/spark/pull/15819#discussion_r94361979
--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/client/VersionsSuite.scala ---
@@ -216,5 +218,37 @@ class VersionsSuite extends SparkFunSuite with Logging {
         "as 'COMPACT' WITH DEFERRED REBUILD")
       client.reset()
     }
+
+    test(s"$version: CREATE TABLE AS SELECT") {
+      withTable("tbl") {
+        sqlContext.sql("CREATE TABLE tbl AS SELECT 1 AS a")
+        assert(sqlContext.table("tbl").collect().toSeq == Seq(Row(1)))
+      }
+    }
+
+    test(s"$version: Delete the temporary staging directory and files after each insert") {
+      withTempDir { tmpDir =>
+        withTable("tab", "tbl") {
+          sqlContext.sql(
+            s"""
+               |CREATE TABLE tab(c1 string)
+               |location '${tmpDir.toURI.toString}'
+             """.stripMargin)
+
+          sqlContext.sql("CREATE TABLE tbl AS SELECT 1 AS a")
--- End diff --
Sorry Xiao, one of my best friends is named Tao. :) Sorry. It is updated. Thanks again.
[GitHub] spark pull request #16255: [SPARK-18609][SQL]Fix when CTE with Join between ...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/16255#discussion_r94361717
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
@@ -200,6 +200,8 @@ object RemoveAliasOnlyProject extends Rule[LogicalPlan] {
       case plan: Project if plan eq proj => plan.child
       case plan => plan transformExpressions {
         case a: Attribute if attrMap.contains(a) => attrMap(a)
+        case b: Alias if attrMap.exists(_._1.exprId == b.exprId)
+          && b.child.isInstanceOf[NamedExpression] => b.child
--- End diff --
What is the reasoning behind this? Why do we treat `Alias` differently here?
[GitHub] spark issue #15880: [SPARK-17913][SQL] compare long and string type column m...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15880 **[Test build #70797 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70797/testReport)** for PR 15880 at commit [`821cca6`](https://github.com/apache/spark/commit/821cca6cd836f11ea917c89938f288f126d633ab).
[GitHub] spark issue #16448: [SPARK-19048] [SQL] Delete Partition Location when Dropp...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16448 **[Test build #70796 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70796/testReport)** for PR 16448 at commit [`5441f15`](https://github.com/apache/spark/commit/5441f15cc86f0f22dbe766d3bf553a5f8183dc2a).
[GitHub] spark issue #16320: [SPARK-18877][SQL] `CSVInferSchema.inferField` on Decima...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/16320 I assumed this one. Right?
```scala
val path = "/tmp/test1"
Seq(s"${Long.MaxValue}1", "2015-12-01 00:00:00", "1").toDF().coalesce(1).write.text(path)
spark.read.option("inferSchema", true).csv(path).printSchema()
```
[GitHub] spark pull request #16422: [SPARK-17642] [SQL] support DESC EXTENDED/FORMATT...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/16422#discussion_r94361604
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala ---
@@ -300,10 +300,21 @@ class SparkSqlAstBuilder(conf: SQLConf) extends AstBuilder {
    * Create a [[DescribeTableCommand]] logical plan.
    */
   override def visitDescribeTable(ctx: DescribeTableContext): LogicalPlan = withOrigin(ctx) {
-    // Describe column are not supported yet. Return null and let the parser decide
-    // what to do with this (create an exception or pass it on to a different system).
     if (ctx.describeColName != null) {
-      null
+      if (ctx.partitionSpec != null) {
+        throw new ParseException("DESC TABLE COLUMN for a specific partition is not supported", ctx)
+      } else {
+        val columnName = ctx.describeColName.getText
+        if (columnName.contains(".")) {
+          throw new ParseException(
+            "DESC TABLE COLUMN for an inner column of a nested type is not supported", ctx)
--- End diff --
Sure, you can try it.
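For reference, the parser branches quoted above distinguish roughly the following kinds of statements. These are illustrative only and assume a session that already has a table `tbl` with a regular column `col` and a struct column `structCol`; the two unsupported forms are expected to raise the `ParseException` messages shown in the diff:

```scala
// Illustrative statements for the branches in visitDescribeTable (hypothetical table/column names).
spark.sql("DESC EXTENDED tbl col")              // column-level describe: the case this PR adds
spark.sql("DESC tbl PARTITION (p = 1) col")     // per-partition column describe: ParseException
spark.sql("DESC tbl structCol.innerField")      // nested-field describe: ParseException
```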
[GitHub] spark issue #15880: [SPARK-17913][SQL] compare long and string type column m...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/15880 retest this please
[GitHub] spark issue #16320: [SPARK-18877][SQL] `CSVInferSchema.inferField` on Decima...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/16320 Could you please add the test case?
[GitHub] spark pull request #16337: [SPARK-18871][SQL] New test cases for IN/NOT IN s...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/16337#discussion_r94360417 --- Diff: sql/core/src/test/resources/sql-tests/results/subquery/in-subquery/simple-in.sql.out --- @@ -0,0 +1,176 @@ +-- Automatically generated by SQLQueryTestSuite +-- Number of queries: 10 + + +-- !query 0 +create temporary view t1 as select * from values + ("t1a", 6S, 8, 10L, float(15.0), 20D, 20E2, timestamp '2014-04-04 01:00:00.000', date '2014-04-04'), + ("t1b", 8S, 16, 19L, float(17.0), 25D, 26E2, timestamp '2014-05-04 01:01:00.000', date '2014-05-04'), + ("t1a", 16S, 12, 21L, float(15.0), 20D, 20E2, timestamp '2014-06-04 01:02:00.001', date '2014-06-04'), + ("t1a", 16S, 12, 10L, float(15.0), 20D, 20E2, timestamp '2014-07-04 01:01:00.000', date '2014-07-04'), + ("t1c", 8S, 16, 19L, float(17.0), 25D, 26E2, timestamp '2014-05-04 01:02:00.001', date '2014-05-05'), + ("t1d", null, 16, 22L, float(17.0), 25D, 26E2, timestamp '2014-06-04 01:01:00.000', null), + ("t1d", null, 16, 19L, float(17.0), 25D, 26E2, timestamp '2014-07-04 01:02:00.001', null), + ("t1e", 10S, null, 25L, float(17.0), 25D, 26E2, timestamp '2014-08-04 01:01:00.000', date '2014-08-04'), + ("t1e", 10S, null, 19L, float(17.0), 25D, 26E2, timestamp '2014-09-04 01:02:00.001', date '2014-09-04'), + ("t1d", 10S, null, 12L, float(17.0), 25D, 26E2, timestamp '2015-05-04 01:01:00.000', date '2015-05-04'), + ("t1a", 6S, 8, 10L, float(15.0), 20D, 20E2, timestamp '2014-04-04 01:02:00.001', date '2014-04-04'), + ("t1e", 10S, null, 19L, float(17.0), 25D, 26E2, timestamp '2014-05-04 01:01:00.000', date '2014-05-04') + as t1(t1a, t1b, t1c, t1d, t1e, t1f, t1g, t1h, t1i) +-- !query 0 schema +struct<> +-- !query 0 output + + + +-- !query 1 +create temporary view t2 as select * from values + ("t2a", 6S, 12, 14L, float(15), 20D, 20E2, timestamp '2014-04-04 01:01:00.000', date '2014-04-04'), + ("t1b", 10S, 12, 19L, float(17), 25D, 26E2, timestamp '2014-05-04 01:01:00.000', date '2014-05-04'), + ("t1b", 8S, 16, 119L, float(17), 25D, 26E2, timestamp '2015-05-04 01:01:00.000', date '2015-05-04'), + ("t1c", 12S, 16, 219L, float(17), 25D, 26E2, timestamp '2016-05-04 01:01:00.000', date '2016-05-04'), + ("t1b", null, 16, 319L, float(17), 25D, 26E2, timestamp '2017-05-04 01:01:00.000', null), + ("t2e", 8S, null, 419L, float(17), 25D, 26E2, timestamp '2014-06-04 01:01:00.000', date '2014-06-04'), + ("t1f", 19S, null, 519L, float(17), 25D, 26E2, timestamp '2014-05-04 01:01:00.000', date '2014-05-04'), + ("t1b", 10S, 12, 19L, float(17), 25D, 26E2, timestamp '2014-06-04 01:01:00.000', date '2014-06-04'), + ("t1b", 8S, 16, 19L, float(17), 25D, 26E2, timestamp '2014-07-04 01:01:00.000', date '2014-07-04'), + ("t1c", 12S, 16, 19L, float(17), 25D, 26E2, timestamp '2014-08-04 01:01:00.000', date '2014-08-05'), + ("t1e", 8S, null, 19L, float(17), 25D, 26E2, timestamp '2014-09-04 01:01:00.000', date '2014-09-04'), + ("t1f", 19S, null, 19L, float(17), 25D, 26E2, timestamp '2014-10-04 01:01:00.000', date '2014-10-04'), + ("t1b", null, 16, 19L, float(17), 25D, 26E2, timestamp '2014-05-04 01:01:00.000', null) + as t2(t2a, t2b, t2c, t2d, t2e, t2f, t2g, t2h, t2i) +-- !query 1 schema +struct<> +-- !query 1 output + + + +-- !query 2 +create temporary view t3 as select * from values + ("t3a", 6S, 12, 110L, float(15), 20D, 20E2, timestamp '2014-04-04 01:02:00.000', date '2014-04-04'), + ("t3a", 6S, 12, 10L, float(15), 20D, 20E2, timestamp '2014-05-04 01:02:00.000', date '2014-05-04'), + ("t1b", 10S, 12, 219L, float(17), 25D, 26E2, timestamp 
'2014-05-04 01:02:00.000', date '2014-05-04'), + ("t1b", 10S, 12, 19L, float(17), 25D, 26E2, timestamp '2014-05-04 01:02:00.000', date '2014-05-04'), + ("t1b", 8S, 16, 319L, float(17), 25D, 26E2, timestamp '2014-06-04 01:02:00.000', date '2014-06-04'), + ("t1b", 8S, 16, 19L, float(17), 25D, 26E2, timestamp '2014-07-04 01:02:00.000', date '2014-07-04'), + ("t3c", 17S, 16, 519L, float(17), 25D, 26E2, timestamp '2014-08-04 01:02:00.000', date '2014-08-04'), + ("t3c", 17S, 16, 19L, float(17), 25D, 26E2, timestamp '2014-09-04 01:02:00.000', date '2014-09-05'), + ("t1b", null, 16, 419L, float(17), 25D, 26E2, timestamp '2014-10-04 01:02:00.000', null), + ("t1b", null, 16, 19L, float(17), 25D, 26E2, timestamp '2014-11-04 01:02:00.000', null), + ("t3b", 8S, null, 719L, float(17), 25D, 26E2, timestamp '2014-05-04 01:02:00.000', date '2014-05-04'), + ("t3b", 8S, null, 19L, float(17), 25D, 26E2, timestamp '2015-05-04 01:02:00.000', date '2015-05-04') + as t3(t3a, t3b, t3c, t3d, t3e, t3f, t3g, t3h, t3i) +-- !query 2 schema +struct<> +-- !query 2 output + + + +-- !query 3
[GitHub] spark pull request #15819: [SPARK-18372][SQL][Branch-1.6].Staging directory ...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/15819#discussion_r94360355
--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/client/VersionsSuite.scala ---
@@ -216,5 +218,37 @@ class VersionsSuite extends SparkFunSuite with Logging {
         "as 'COMPACT' WITH DEFERRED REBUILD")
       client.reset()
     }
+
+    test(s"$version: CREATE TABLE AS SELECT") {
+      withTable("tbl") {
+        sqlContext.sql("CREATE TABLE tbl AS SELECT 1 AS a")
+        assert(sqlContext.table("tbl").collect().toSeq == Seq(Row(1)))
+      }
+    }
+
+    test(s"$version: Delete the temporary staging directory and files after each insert") {
+      withTempDir { tmpDir =>
+        withTable("tab", "tbl") {
+          sqlContext.sql(
+            s"""
+               |CREATE TABLE tab(c1 string)
+               |location '${tmpDir.toURI.toString}'
+             """.stripMargin)
+
+          sqlContext.sql("CREATE TABLE tbl AS SELECT 1 AS a")
--- End diff --
You just want one column. Then, you can do it by
```scala
Seq(Tuple1("a")).toDF("value").registerTempTable("tbl")
```
[GitHub] spark pull request #15819: [SPARK-18372][SQL][Branch-1.6].Staging directory ...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/15819#discussion_r94360173
--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/client/VersionsSuite.scala ---
@@ -216,5 +218,37 @@ class VersionsSuite extends SparkFunSuite with Logging {
         "as 'COMPACT' WITH DEFERRED REBUILD")
       client.reset()
     }
+
+    test(s"$version: CREATE TABLE AS SELECT") {
+      withTable("tbl") {
+        sqlContext.sql("CREATE TABLE tbl AS SELECT 1 AS a")
+        assert(sqlContext.table("tbl").collect().toSeq == Seq(Row(1)))
+      }
+    }
+
+    test(s"$version: Delete the temporary staging directory and files after each insert") {
+      withTempDir { tmpDir =>
+        withTable("tab", "tbl") {
+          sqlContext.sql(
+            s"""
+               |CREATE TABLE tab(c1 string)
+               |location '${tmpDir.toURI.toString}'
+             """.stripMargin)
+
+          sqlContext.sql("CREATE TABLE tbl AS SELECT 1 AS a")
--- End diff --
How about the following line?
```scala
Seq((1, "a")).toDF("key", "value").registerTempTable("tbl")
```
BTW, I am Xiao Li. : )
[GitHub] spark pull request #16337: [SPARK-18871][SQL] New test cases for IN/NOT IN s...
Github user kevinyu98 commented on a diff in the pull request: https://github.com/apache/spark/pull/16337#discussion_r94360158 --- Diff: sql/core/src/test/resources/sql-tests/results/subquery/in-subquery/simple-in.sql.out --- @@ -0,0 +1,176 @@ +-- Automatically generated by SQLQueryTestSuite +-- Number of queries: 10 + + +-- !query 0 +create temporary view t1 as select * from values + ("t1a", 6S, 8, 10L, float(15.0), 20D, 20E2, timestamp '2014-04-04 01:00:00.000', date '2014-04-04'), + ("t1b", 8S, 16, 19L, float(17.0), 25D, 26E2, timestamp '2014-05-04 01:01:00.000', date '2014-05-04'), + ("t1a", 16S, 12, 21L, float(15.0), 20D, 20E2, timestamp '2014-06-04 01:02:00.001', date '2014-06-04'), + ("t1a", 16S, 12, 10L, float(15.0), 20D, 20E2, timestamp '2014-07-04 01:01:00.000', date '2014-07-04'), + ("t1c", 8S, 16, 19L, float(17.0), 25D, 26E2, timestamp '2014-05-04 01:02:00.001', date '2014-05-05'), + ("t1d", null, 16, 22L, float(17.0), 25D, 26E2, timestamp '2014-06-04 01:01:00.000', null), + ("t1d", null, 16, 19L, float(17.0), 25D, 26E2, timestamp '2014-07-04 01:02:00.001', null), + ("t1e", 10S, null, 25L, float(17.0), 25D, 26E2, timestamp '2014-08-04 01:01:00.000', date '2014-08-04'), + ("t1e", 10S, null, 19L, float(17.0), 25D, 26E2, timestamp '2014-09-04 01:02:00.001', date '2014-09-04'), + ("t1d", 10S, null, 12L, float(17.0), 25D, 26E2, timestamp '2015-05-04 01:01:00.000', date '2015-05-04'), + ("t1a", 6S, 8, 10L, float(15.0), 20D, 20E2, timestamp '2014-04-04 01:02:00.001', date '2014-04-04'), + ("t1e", 10S, null, 19L, float(17.0), 25D, 26E2, timestamp '2014-05-04 01:01:00.000', date '2014-05-04') + as t1(t1a, t1b, t1c, t1d, t1e, t1f, t1g, t1h, t1i) +-- !query 0 schema +struct<> +-- !query 0 output + + + +-- !query 1 +create temporary view t2 as select * from values + ("t2a", 6S, 12, 14L, float(15), 20D, 20E2, timestamp '2014-04-04 01:01:00.000', date '2014-04-04'), + ("t1b", 10S, 12, 19L, float(17), 25D, 26E2, timestamp '2014-05-04 01:01:00.000', date '2014-05-04'), + ("t1b", 8S, 16, 119L, float(17), 25D, 26E2, timestamp '2015-05-04 01:01:00.000', date '2015-05-04'), + ("t1c", 12S, 16, 219L, float(17), 25D, 26E2, timestamp '2016-05-04 01:01:00.000', date '2016-05-04'), + ("t1b", null, 16, 319L, float(17), 25D, 26E2, timestamp '2017-05-04 01:01:00.000', null), + ("t2e", 8S, null, 419L, float(17), 25D, 26E2, timestamp '2014-06-04 01:01:00.000', date '2014-06-04'), + ("t1f", 19S, null, 519L, float(17), 25D, 26E2, timestamp '2014-05-04 01:01:00.000', date '2014-05-04'), + ("t1b", 10S, 12, 19L, float(17), 25D, 26E2, timestamp '2014-06-04 01:01:00.000', date '2014-06-04'), + ("t1b", 8S, 16, 19L, float(17), 25D, 26E2, timestamp '2014-07-04 01:01:00.000', date '2014-07-04'), + ("t1c", 12S, 16, 19L, float(17), 25D, 26E2, timestamp '2014-08-04 01:01:00.000', date '2014-08-05'), + ("t1e", 8S, null, 19L, float(17), 25D, 26E2, timestamp '2014-09-04 01:01:00.000', date '2014-09-04'), + ("t1f", 19S, null, 19L, float(17), 25D, 26E2, timestamp '2014-10-04 01:01:00.000', date '2014-10-04'), + ("t1b", null, 16, 19L, float(17), 25D, 26E2, timestamp '2014-05-04 01:01:00.000', null) + as t2(t2a, t2b, t2c, t2d, t2e, t2f, t2g, t2h, t2i) +-- !query 1 schema +struct<> +-- !query 1 output + + + +-- !query 2 +create temporary view t3 as select * from values + ("t3a", 6S, 12, 110L, float(15), 20D, 20E2, timestamp '2014-04-04 01:02:00.000', date '2014-04-04'), + ("t3a", 6S, 12, 10L, float(15), 20D, 20E2, timestamp '2014-05-04 01:02:00.000', date '2014-05-04'), + ("t1b", 10S, 12, 219L, float(17), 25D, 26E2, timestamp 
'2014-05-04 01:02:00.000', date '2014-05-04'), + ("t1b", 10S, 12, 19L, float(17), 25D, 26E2, timestamp '2014-05-04 01:02:00.000', date '2014-05-04'), + ("t1b", 8S, 16, 319L, float(17), 25D, 26E2, timestamp '2014-06-04 01:02:00.000', date '2014-06-04'), + ("t1b", 8S, 16, 19L, float(17), 25D, 26E2, timestamp '2014-07-04 01:02:00.000', date '2014-07-04'), + ("t3c", 17S, 16, 519L, float(17), 25D, 26E2, timestamp '2014-08-04 01:02:00.000', date '2014-08-04'), + ("t3c", 17S, 16, 19L, float(17), 25D, 26E2, timestamp '2014-09-04 01:02:00.000', date '2014-09-05'), + ("t1b", null, 16, 419L, float(17), 25D, 26E2, timestamp '2014-10-04 01:02:00.000', null), + ("t1b", null, 16, 19L, float(17), 25D, 26E2, timestamp '2014-11-04 01:02:00.000', null), + ("t3b", 8S, null, 719L, float(17), 25D, 26E2, timestamp '2014-05-04 01:02:00.000', date '2014-05-04'), + ("t3b", 8S, null, 19L, float(17), 25D, 26E2, timestamp '2015-05-04 01:02:00.000', date '2015-05-04') + as t3(t3a, t3b, t3c, t3d, t3e, t3f, t3g, t3h, t3i) +-- !query 2 schema +struct<> +-- !query 2 output + + + +-- !query 3
[GitHub] spark pull request #16337: [SPARK-18871][SQL] New test cases for IN/NOT IN s...
Github user kevinyu98 commented on a diff in the pull request: https://github.com/apache/spark/pull/16337#discussion_r94359988 --- Diff: sql/core/src/test/resources/sql-tests/results/subquery/in-subquery/simple-in.sql.out --- @@ -0,0 +1,176 @@ +-- Automatically generated by SQLQueryTestSuite +-- Number of queries: 10 + + +-- !query 0 +create temporary view t1 as select * from values + ("t1a", 6S, 8, 10L, float(15.0), 20D, 20E2, timestamp '2014-04-04 01:00:00.000', date '2014-04-04'), + ("t1b", 8S, 16, 19L, float(17.0), 25D, 26E2, timestamp '2014-05-04 01:01:00.000', date '2014-05-04'), + ("t1a", 16S, 12, 21L, float(15.0), 20D, 20E2, timestamp '2014-06-04 01:02:00.001', date '2014-06-04'), + ("t1a", 16S, 12, 10L, float(15.0), 20D, 20E2, timestamp '2014-07-04 01:01:00.000', date '2014-07-04'), + ("t1c", 8S, 16, 19L, float(17.0), 25D, 26E2, timestamp '2014-05-04 01:02:00.001', date '2014-05-05'), + ("t1d", null, 16, 22L, float(17.0), 25D, 26E2, timestamp '2014-06-04 01:01:00.000', null), + ("t1d", null, 16, 19L, float(17.0), 25D, 26E2, timestamp '2014-07-04 01:02:00.001', null), + ("t1e", 10S, null, 25L, float(17.0), 25D, 26E2, timestamp '2014-08-04 01:01:00.000', date '2014-08-04'), + ("t1e", 10S, null, 19L, float(17.0), 25D, 26E2, timestamp '2014-09-04 01:02:00.001', date '2014-09-04'), + ("t1d", 10S, null, 12L, float(17.0), 25D, 26E2, timestamp '2015-05-04 01:01:00.000', date '2015-05-04'), + ("t1a", 6S, 8, 10L, float(15.0), 20D, 20E2, timestamp '2014-04-04 01:02:00.001', date '2014-04-04'), + ("t1e", 10S, null, 19L, float(17.0), 25D, 26E2, timestamp '2014-05-04 01:01:00.000', date '2014-05-04') --- End diff -- Sorry, I forgot to mention that I have made two changes to the data, so we need to re-run the DB2 verification test. 1. I removed the "=" in "date("2014-05-0=4")". 2. I changed t1g/t2g/t3g from 26 to 2600 (26E2). Thanks for checking. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #15819: [SPARK-18372][SQL][Branch-1.6].Staging directory ...
Github user merlintang commented on a diff in the pull request: https://github.com/apache/spark/pull/15819#discussion_r94359244 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/client/VersionsSuite.scala --- @@ -216,5 +218,37 @@ class VersionsSuite extends SparkFunSuite with Logging { "as 'COMPACT' WITH DEFERRED REBUILD") client.reset() } + +test(s"$version: CREATE TABLE AS SELECT") { + withTable("tbl") { +sqlContext.sql("CREATE TABLE tbl AS SELECT 1 AS a") +assert(sqlContext.table("tbl").collect().toSeq == Seq(Row(1))) + } +} + +test(s"$version: Delete the temporary staging directory and files after each insert") { + withTempDir { tmpDir => +withTable("tab", "tbl") { + sqlContext.sql( +s""" + |CREATE TABLE tab(c1 string) + |location '${tmpDir.toURI.toString}' + """.stripMargin) + + sqlContext.sql("CREATE TABLE tbl AS SELECT 1 AS a") --- End diff -- thanks Tao, I have created a DataFrame and then registered it as a temp table, as follows: val df = sqlContext.createDataFrame((1 to 2).map(i => (i, "a"))).toDF("key", "value") df.select("value").repartition(1).registerTempTable("tbl") It works, but it feels a bit fuzzy. What do you think? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
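For readers following this thread, a rough sketch (not the code from the patch) of how the registered temp table could drive the CTAS in the VersionsSuite test; `sqlContext` and `withTable` are assumed to be the suite's existing helpers, and the table name `tbl_src` is made up for illustration:

```scala
// Build a small DataFrame and expose it as a temp table (Spark 1.6 API).
val df = sqlContext.createDataFrame((1 to 2).map(i => (i, "a"))).toDF("key", "value")
df.select("value").repartition(1).registerTempTable("tbl_src")

// CTAS from the temp table instead of a literal SELECT 1, so the insert goes through
// a real write path and therefore creates (and should clean up) a staging directory.
sqlContext.sql("CREATE TABLE tbl AS SELECT value FROM tbl_src")
assert(sqlContext.table("tbl").count() == 2)
```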
[GitHub] spark issue #16401: [SPARK-18998] [SQL] Add a cbo conf to switch between def...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/16401 thanks, merging to master! --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #16456: [SPARK-18994] clean up the local directories for applica...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16456 Can one of the admins verify this patch? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16456: [SPARK-18994] clean up the local directories for ...
GitHub user liujianhuiouc opened a pull request: https://github.com/apache/spark/pull/16456 [SPARK-18994] clean up the local directories for the application in a Future by another thread ## What changes were proposed in this pull request? clean up the directories of the app asynchronously, in a Future block, so the deletion happens on another thread You can merge this pull request into a Git repository by running: $ git pull https://github.com/liujianhuiouc/spark-1 spark-18994 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/16456.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #16456 commit 0351f2c5b875ed2da6a17cdff4ac690cf145bb6b Author: liujianhui Date: 2017-01-03T04:21:25Z [spark-18994] asyn to delete the app directories --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
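A minimal sketch of the asynchronous-cleanup idea described above; the executor and method names here are illustrative and are not the identifiers used in the actual patch:

```scala
import java.io.File
import java.util.concurrent.Executors
import scala.concurrent.{ExecutionContext, Future}
import scala.util.{Failure, Success}

// A dedicated single-thread executor, so slow disk I/O does not block the caller.
val cleanupExecutor = ExecutionContext.fromExecutorService(Executors.newSingleThreadExecutor())

def deleteRecursively(f: File): Unit = {
  if (f.isDirectory) Option(f.listFiles()).getOrElse(Array.empty[File]).foreach(deleteRecursively)
  f.delete()
}

// Kick off deletion of an application's local directories in a Future and return immediately.
def cleanupAppDirsAsync(dirs: Seq[File]): Future[Unit] = {
  val cleanup = Future(dirs.foreach(deleteRecursively))(cleanupExecutor)
  cleanup.onComplete {
    case Success(_) => println(s"Cleaned up ${dirs.size} application directories")
    case Failure(e) => println(s"Application directory cleanup failed: $e")
  }(cleanupExecutor)
  cleanup
}
```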
[GitHub] spark pull request #16401: [SPARK-18998] [SQL] Add a cbo conf to switch betw...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/16401 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #15324: [SPARK-16872][ML] Gaussian Naive Bayes Classifier
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15324 Merged build finished. Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #15324: [SPARK-16872][ML] Gaussian Naive Bayes Classifier
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15324 **[Test build #70794 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70794/testReport)** for PR 15324 at commit [`cd4e68e`](https://github.com/apache/spark/commit/cd4e68ebd541734b96aba5c8199e4dd4f4504918). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #15324: [SPARK-16872][ML] Gaussian Naive Bayes Classifier
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15324 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70794/ Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16452: [ML] fix getThresholds logic error
Github user mpjlu closed the pull request at: https://github.com/apache/spark/pull/16452 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #16452: [ML] fix getThresholds logic error
Github user mpjlu commented on the issue: https://github.com/apache/spark/pull/16452 @sethah , thanks, I got it wrong. I will close it. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #16320: [SPARK-18877][SQL] `CSVInferSchema.inferField` on Decima...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/16320 Thank you again, @cloud-fan and @HyukjinKwon . I updated the fallback datatype. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #16320: [SPARK-18877][SQL] `CSVInferSchema.inferField` on Decima...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16320 **[Test build #70795 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70795/testReport)** for PR 16320 at commit [`393d3a9`](https://github.com/apache/spark/commit/393d3a9ceaa6d92a301b5a2917e28d29518c1638). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16320: [SPARK-18877][SQL] `CSVInferSchema.inferField` on...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/16320#discussion_r94358461 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVInferSchema.scala --- @@ -85,7 +85,9 @@ private[csv] object CSVInferSchema { case NullType => tryParseInteger(field, options) case IntegerType => tryParseInteger(field, options) case LongType => tryParseLong(field, options) -case _: DecimalType => tryParseDecimal(field, options) +case _: DecimalType => + // DecimalTypes have different precisions and scales, so we try to find the common type. + findTightestCommonType(typeSoFar, tryParseDecimal(field, options)).getOrElse(NullType) --- End diff -- You're correct. I'll change into `StringType`. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
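To make the proposed fallback concrete, here is a small self-contained sketch; `tightest` is a toy stand-in for Catalyst's `findTightestCommonType`, not the real implementation:

```scala
import org.apache.spark.sql.types._

// Toy stand-in: promote within a tiny ladder, return None when no common type exists.
def tightest(a: DataType, b: DataType): Option[DataType] = (a, b) match {
  case (x, y) if x == y => Some(x)
  case (NullType, x) => Some(x)
  case (x, NullType) => Some(x)
  case (IntegerType, LongType) | (LongType, IntegerType) => Some(LongType)
  case _ => None
}

// The behaviour under discussion: when the decimal inferred for this field cannot be
// merged with the type seen so far, fall back to StringType instead of NullType, so
// later rows cannot silently end up with a narrower (wrong) column type.
def mergeWithDecimal(typeSoFar: DataType, decimalForField: DataType): DataType =
  tightest(typeSoFar, decimalForField).getOrElse(StringType)

// e.g. mergeWithDecimal(TimestampType, DecimalType(20, 0)) == StringType
```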
[GitHub] spark pull request #16320: [SPARK-18877][SQL] `CSVInferSchema.inferField` on...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/16320#discussion_r94358447 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVInferSchema.scala --- @@ -85,7 +85,9 @@ private[csv] object CSVInferSchema { case NullType => tryParseInteger(field, options) case IntegerType => tryParseInteger(field, options) case LongType => tryParseLong(field, options) -case _: DecimalType => tryParseDecimal(field, options) +case _: DecimalType => + // DecimalTypes have different precisions and scales, so we try to find the common type. + findTightestCommonType(typeSoFar, tryParseDecimal(field, options)).getOrElse(NullType) --- End diff -- Yes, otherwise, it might end up with an incorrect data type. For example, ```scala val path = "/tmp/test1" Seq(s"${Long.MaxValue}1", "2015-12-01 00:00:00", "1").toDF().coalesce(1).write.text(path) spark.read.option("inferSchema", true).csv(path).printSchema() ``` ``` root |-- _c0: integer (nullable = true) ``` --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #16454: [SPARK-19055][SQL][PySpark] Fix SparkSession initializat...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16454 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70793/ Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16320: [SPARK-18877][SQL] `CSVInferSchema.inferField` on...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/16320#discussion_r94358365 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVInferSchema.scala --- @@ -85,7 +85,9 @@ private[csv] object CSVInferSchema { case NullType => tryParseInteger(field, options) case IntegerType => tryParseInteger(field, options) case LongType => tryParseLong(field, options) -case _: DecimalType => tryParseDecimal(field, options) +case _: DecimalType => + // DecimalTypes have different precisions and scales, so we try to find the common type. + findTightestCommonType(typeSoFar, tryParseDecimal(field, options)).getOrElse(NullType) --- End diff -- Thank you for review, @cloud-fan . I used `NullType` since `mergeRowTypes` does. ```scala def mergeRowTypes(first: Array[DataType], second: Array[DataType]): Array[DataType] = { first.zipAll(second, NullType, NullType).map { case (a, b) => findTightestCommonType(a, b).getOrElse(NullType) } } ``` --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #16454: [SPARK-19055][SQL][PySpark] Fix SparkSession initializat...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16454 Merged build finished. Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #16454: [SPARK-19055][SQL][PySpark] Fix SparkSession initializat...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16454 **[Test build #70793 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70793/testReport)** for PR 16454 at commit [`80bba5e`](https://github.com/apache/spark/commit/80bba5ead0601f3ef4b05fff5391d07a61e06341). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #16453: [SPARK-19054][ML] Eliminate extra pass in NB
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16453 **[Test build #70792 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70792/testReport)** for PR 16453 at commit [`4937b7d`](https://github.com/apache/spark/commit/4937b7dd731893ec4345a57db952cc8a35efd9b2). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #16453: [SPARK-19054][ML] Eliminate extra pass in NB
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16453 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70792/ Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #16453: [SPARK-19054][ML] Eliminate extra pass in NB
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16453 Merged build finished. Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #16452: [ML] fix getThresholds logic error
Github user sethah commented on the issue: https://github.com/apache/spark/pull/16452 @mpjlu This is the behavior I get: scala scala> import org.apache.spark.ml.classification.LogisticRegression import org.apache.spark.ml.classification.LogisticRegression scala> val lr = new LogisticRegression() lr: org.apache.spark.ml.classification.LogisticRegression = logreg_2465e281c48e scala> lr.getThresholds java.util.NoSuchElementException: Failed to find a default value for thresholds ... So, it throws an exception when nothing is set, as intended it seems. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
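For context, a simplified model of the getter logic behind that behaviour. It strips away the Param machinery, so `thresholdSet`/`thresholdsSet` below stand in for `isSet(threshold)`/`isSet(thresholds)` (explicitly set values only; defaults do not count):

```scala
def getThresholds(
    thresholdSet: Option[Double],
    thresholdsSet: Option[Array[Double]]): Array[Double] = {
  (thresholdsSet, thresholdSet) match {
    // thresholds was set explicitly: return it as-is.
    case (Some(ts), _) => ts
    // only the binary threshold t was set: derive the two-class array (1 - t, t).
    case (None, Some(t)) => Array(1.0 - t, t)
    // neither was set: mirrors the NoSuchElementException shown in the REPL output above.
    case (None, None) =>
      throw new NoSuchElementException("Failed to find a default value for thresholds")
  }
}
```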
[GitHub] spark pull request #16438: [SPARK-19029] [SQL] Remove databaseName from Simp...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/16438 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #15923: [SPARK-4105] retry the fetch or stage if shuffle ...
Github user zsxwing commented on a diff in the pull request: https://github.com/apache/spark/pull/15923#discussion_r94358078 --- Diff: core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala --- @@ -305,40 +316,84 @@ final class ShuffleBlockFetcherIterator( */ override def next(): (BlockId, InputStream) = { numBlocksProcessed += 1 -val startFetchWait = System.currentTimeMillis() -currentResult = results.take() -val result = currentResult -val stopFetchWait = System.currentTimeMillis() -shuffleMetrics.incFetchWaitTime(stopFetchWait - startFetchWait) - -result match { - case SuccessFetchResult(_, address, size, buf, isNetworkReqDone) => -if (address != blockManager.blockManagerId) { - shuffleMetrics.incRemoteBytesRead(buf.size) - shuffleMetrics.incRemoteBlocksFetched(1) -} -bytesInFlight -= size -if (isNetworkReqDone) { - reqsInFlight -= 1 - logDebug("Number of requests in flight " + reqsInFlight) -} - case _ => -} -// Send fetch requests up to maxBytesInFlight -fetchUpToMaxBytes() -result match { - case FailureFetchResult(blockId, address, e) => -throwFetchFailedException(blockId, address, e) +var result: FetchResult = null +var input: InputStream = null +// Take the next fetched result and try to decompress it to detect data corruption, +// then fetch it one more time if it's corrupt, throw FailureFetchResult if the second fetch --- End diff -- @Tagar Spark doesn't use Netty's Snappy compression. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
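In other words, shuffle blocks are compressed by Spark's own CompressionCodec, selected through Spark configuration rather than by a Netty-level Snappy handler. A minimal configuration sketch using the standard settings (nothing here is specific to this patch):

```scala
import org.apache.spark.SparkConf

val conf = new SparkConf()
  .setAppName("shuffle-codec-example")
  .set("spark.shuffle.compress", "true")      // compress map outputs (default: true)
  .set("spark.io.compression.codec", "lz4")   // Spark-side codec: "lz4", "lzf" or "snappy"
```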
[GitHub] spark issue #16438: [SPARK-19029] [SQL] Remove databaseName from SimpleCatal...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/16438 LGTM, merging to master! --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16422: [SPARK-17642] [SQL] support DESC EXTENDED/FORMATT...
Github user wzhfy commented on a diff in the pull request: https://github.com/apache/spark/pull/16422#discussion_r94357987 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala --- @@ -300,10 +300,21 @@ class SparkSqlAstBuilder(conf: SQLConf) extends AstBuilder { * Create a [[DescribeTableCommand]] logical plan. */ override def visitDescribeTable(ctx: DescribeTableContext): LogicalPlan = withOrigin(ctx) { -// Describe column are not supported yet. Return null and let the parser decide -// what to do with this (create an exception or pass it on to a different system). if (ctx.describeColName != null) { - null + if (ctx.partitionSpec != null) { +throw new ParseException("DESC TABLE COLUMN for a specific partition is not supported", ctx) + } else { +val columnName = ctx.describeColName.getText +if (columnName.contains(".")) { + throw new ParseException( +"DESC TABLE COLUMN for an inner column of a nested type is not supported", ctx) --- End diff -- In this case, `formatted` becomes the table identifier. Should I postpone detection of the nested column to the run() method of DescColumnCommand? Then the existence of the table identifier would be checked first. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
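A hedged sketch of the alternative being asked about, written as if it lived inside Spark's sql package: keep the parser permissive and perform the nested-column check inside the command's run(), after the table lookup, so a missing table is reported first. The command name and shape below are hypothetical, not the code from this PR:

```scala
import org.apache.spark.sql.{AnalysisException, Row, SparkSession}
import org.apache.spark.sql.catalyst.TableIdentifier
import org.apache.spark.sql.execution.command.RunnableCommand

// Hypothetical command shape; only the ordering of the two checks matters here.
case class DescribeColumnCommand(table: TableIdentifier, colName: String)
  extends RunnableCommand {

  override def run(sparkSession: SparkSession): Seq[Row] = {
    // 1. Look up the table first, so a non-existent table is reported as such...
    val metadata = sparkSession.sessionState.catalog
      .getTempViewOrPermanentTableMetadata(table)
    // 2. ...and only then reject nested columns, which are not supported yet.
    if (colName.contains(".")) {
      throw new AnalysisException(
        "DESC TABLE COLUMN for an inner column of a nested type is not supported")
    }
    metadata.schema.find(_.name == colName) match {
      case Some(field) => Seq(Row(field.name, field.dataType.simpleString))
      case None => throw new AnalysisException(s"Column $colName does not exist")
    }
  }
}
```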
[GitHub] spark issue #16455: [MINOR][DOCS] Remove the duplicated word/ typo in Stream...
Github user neurons commented on the issue: https://github.com/apache/spark/pull/16455 @tdas could you accept this small PR? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16320: [SPARK-18877][SQL] `CSVInferSchema.inferField` on...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/16320#discussion_r94357821 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVInferSchema.scala --- @@ -85,7 +85,9 @@ private[csv] object CSVInferSchema { case NullType => tryParseInteger(field, options) case IntegerType => tryParseInteger(field, options) case LongType => tryParseLong(field, options) -case _: DecimalType => tryParseDecimal(field, options) +case _: DecimalType => + // DecimalTypes have different precisions and scales, so we try to find the common type. + findTightestCommonType(typeSoFar, tryParseDecimal(field, options)).getOrElse(NullType) --- End diff -- Looks like the fallback policy here is to use `StringType`, shoud we follow? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #16455: [MINOR][DOCS] Remove the duplicated word/ typo in Stream...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16455 Can one of the admins verify this patch? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #16401: [SPARK-18998] [SQL] Add a cbo conf to switch between def...
Github user wzhfy commented on the issue: https://github.com/apache/spark/pull/16401 @cloud-fan At the current stage, we have Filter, Agg, Join, Project, etc. Although there are only four plans, the `if` code is still repeated. Moreover, in the future, when we have other kinds of statistics, we can support more plans, e.g. Union. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
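Purely as a toy model of the shape being debated (not Catalyst code): the base node keeps the default size-only estimate, and each operator that has a smarter estimate guards it behind the cbo flag, which is the repeated `if` mentioned above:

```scala
// Toy model only: Statistics and plan nodes are simplified so the if (cboEnabled)
// pattern is visible; this is not the actual Catalyst API.
case class Statistics(sizeInBytes: BigInt, rowCount: Option[BigInt] = None)

trait PlanNode {
  def children: Seq[PlanNode]
  def cboEnabled: Boolean
  // Default estimation: multiply the children's sizes, like the old size-only behaviour.
  def stats: Statistics = Statistics(children.map(_.stats.sizeInBytes).product)
}

case class Scan(sizeInBytes: BigInt, rows: BigInt, cboEnabled: Boolean) extends PlanNode {
  def children: Seq[PlanNode] = Nil
  override def stats: Statistics = Statistics(sizeInBytes, Some(rows))
}

case class Filter(child: PlanNode, selectivity: Double, cboEnabled: Boolean) extends PlanNode {
  def children: Seq[PlanNode] = Seq(child)
  override def stats: Statistics =
    if (cboEnabled) {
      // CBO path: scale both size and row count by the estimated selectivity.
      val base = child.stats
      Statistics(
        sizeInBytes = (BigDecimal(base.sizeInBytes) * selectivity).toBigInt,
        rowCount = base.rowCount.map(rc => (BigDecimal(rc) * selectivity).toBigInt))
    } else {
      super.stats // fall back to the default estimation
    }
}

// Filter(Scan(sizeInBytes = 1000, rows = 100, cboEnabled = true), 0.1, cboEnabled = true).stats
//   => Statistics(100, Some(10))
```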
[GitHub] spark issue #15324: [SPARK-16872][ML] Gaussian Naive Bayes Classifier
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15324 **[Test build #70794 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70794/testReport)** for PR 15324 at commit [`cd4e68e`](https://github.com/apache/spark/commit/cd4e68ebd541734b96aba5c8199e4dd4f4504918). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16455: [MINOR][DOCS] Remove the duplicated word/ typo in...
GitHub user neurons opened a pull request: https://github.com/apache/spark/pull/16455 [MINOR][DOCS] Remove the duplicated word/ typo in Streaming Docs ## What changes were proposed in this pull request? In the section **Handling Late Data and Watermarking** in the Structured Streaming Programming Guide, the word "received" occurs twice in a row. Fixed this typo. ## How was this patch tested? N/A You can merge this pull request into a Git repository by running: $ git pull https://github.com/neurons/spark np.structure_streaming_doc Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/16455.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #16455 commit 1ccbd79d18ff9c5914c1acda64dce7338a86670f Author: Niranjan Padmanabhan Date: 2017-01-03T03:41:21Z Remove the duplicated word --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #16454: [SPARK-19055][SQL][PySpark] Fix SparkSession initializat...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16454 **[Test build #70793 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70793/testReport)** for PR 16454 at commit [`80bba5e`](https://github.com/apache/spark/commit/80bba5ead0601f3ef4b05fff5391d07a61e06341). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #16119: [SPARK-18687][Pyspark][SQL]Backward compatibility - crea...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/16119 The test failure is caused by another issue. I've submitted another PR to fix it: #16454. Once that is fixed, this test will pass. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16454: [SPARK-19055][SQL][PySpark] Fix SparkSession init...
GitHub user viirya opened a pull request: https://github.com/apache/spark/pull/16454 [SPARK-19055][SQL][PySpark] Fix SparkSession initialization when SparkContext is stopped ## What changes were proposed in this pull request? During SparkSession initialization, we store the created SparkSession instance in a class variable _instantiatedContext. Next time we can use SparkSession.builder.getOrCreate() to retrieve the existing SparkSession instance. However, when the active SparkContext is stopped and we create a new SparkContext to use, the existing SparkSession is still associated with the stopped SparkContext, so operations on that existing SparkSession will fail. We need to detect such a case in SparkSession and renew the class variable _instantiatedContext if needed. ## How was this patch tested? New test added in PySpark. Please review http://spark.apache.org/contributing.html before opening a pull request. You can merge this pull request into a Git repository by running: $ git pull https://github.com/viirya/spark-1 fix-pyspark-sparksession Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/16454.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #16454 commit 80bba5ead0601f3ef4b05fff5391d07a61e06341 Author: Liang-Chi Hsieh Date: 2017-01-03T03:06:21Z Fix SparkSession initialization when previous SparkContext is stopped and new SparkContext is created. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #16448: [SPARK-19048] [SQL] Delete Partition Location when Dropp...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/16448 LGTM --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16448: [SPARK-19048] [SQL] Delete Partition Location whe...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/16448#discussion_r94357410 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalogSuite.scala --- @@ -346,6 +346,46 @@ abstract class ExternalCatalogSuite extends SparkFunSuite with BeforeAndAfterEac assert(new Path(partitionLocation) == defaultPartitionLocation) } + test("create/drop partitions in managed tables with location") { +val catalog = newBasicCatalog() +val table = CatalogTable( + identifier = TableIdentifier("tbl", Some("db1")), + tableType = CatalogTableType.MANAGED, + storage = CatalogStorageFormat(None, None, None, None, false, Map.empty), --- End diff -- nit: `CatalogStorageFormat.empty` --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #16404: [SPARK-18969][SQL] Support grouping by nondeterministic ...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/16404 Found a bug filed in JIRA: https://issues.apache.org/jira/browse/SPARK-19035. This PR does not resolve it. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #16401: [SPARK-18998] [SQL] Add a cbo conf to switch between def...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/16401 > Then we need to modify all the existing implementation of statistics and do if(cboEnabled) test in each of them. That would be tedious. hm? I think we only need to do `if(cboEnabled)` for a few operators that will estimate statistics, e.g. Filter, Aggregate, Join, etc. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #16437: [SPARK-19028] [SQL] Fixed non-thread-safe functions used...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/16437 @gatorsmile it conflicts with branch 2.0, please send a new PR, thanks! --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #16452: [ML] fix getThresholds logic error
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16452 Merged build finished. Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #16452: [ML] fix getThresholds logic error
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16452 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70790/ Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #16452: [ML] fix getThresholds logic error
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16452 **[Test build #70790 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70790/testReport)** for PR 16452 at commit [`eece313`](https://github.com/apache/spark/commit/eece313b9fe7048f2e9aa260d0e5f183529bac65). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #16453: [SPARK-19054][ML] Eliminate extra pass in NB
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16453 **[Test build #70792 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70792/testReport)** for PR 16453 at commit [`4937b7d`](https://github.com/apache/spark/commit/4937b7dd731893ec4345a57db952cc8a35efd9b2). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16453: [SPARK-19054][ML] Eliminate extra pass in NB
GitHub user zhengruifeng opened a pull request: https://github.com/apache/spark/pull/16453 [SPARK-19054][ML] Eliminate extra pass in NB ## What changes were proposed in this pull request? eliminate unnecessary extra pass in NB's train ## How was this patch tested? existing tests You can merge this pull request into a Git repository by running: $ git pull https://github.com/zhengruifeng/spark nb_getNC Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/16453.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #16453 commit 4937b7dd731893ec4345a57db952cc8a35efd9b2 Author: Zheng RuiFeng Date: 2017-01-03T02:52:53Z create pr --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
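A hedged sketch of the kind of change this implies (not the actual patch): fold class discovery into the same pass that aggregates the per-label sufficient statistics, for example with a single aggregateByKey, so no separate pass is needed just to find the number of classes:

```scala
import org.apache.spark.rdd.RDD

// One pass over (label, features) pairs yields, per label, the example count and
// the per-feature sums; the number of classes then falls out of the result map.
def aggregateByLabel(data: RDD[(Double, Array[Double])], numFeatures: Int)
  : Map[Double, (Long, Array[Double])] = {
  data
    .aggregateByKey((0L, Array.fill(numFeatures)(0.0)))(
      seqOp = { case ((cnt, sums), features) =>
        var i = 0
        while (i < numFeatures) { sums(i) += features(i); i += 1 }
        (cnt + 1L, sums)
      },
      combOp = { case ((c1, s1), (c2, s2)) =>
        var i = 0
        while (i < numFeatures) { s1(i) += s2(i); i += 1 }
        (c1 + c2, s1)
      })
    .collect()
    .toMap
}

// Usage sketch:
// val byLabel = aggregateByLabel(labeledData, numFeatures)
// val numClasses = byLabel.size  // or byLabel.keys.max.toInt + 1 for 0-based label indices
```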
[GitHub] spark issue #15829: [SPARK-18379][SQL] Make the parallelism of parallelParti...
Github user yhuai commented on the issue: https://github.com/apache/spark/pull/15829 Sure. Thanks! --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #16452: [ML] fix getThresholds logic error
Github user mpjlu commented on the issue: https://github.com/apache/spark/pull/16452 If neither threshold nor thresholds is set, master will return thresholds. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #16452: [ML] fix getThresholds logic error
Github user sethah commented on the issue: https://github.com/apache/spark/pull/16452 What is not right? Could you be more specific? The behavior for master branch seems to align with the comments, but maybe I'm missing it. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #15324: [SPARK-16872][ML] Gaussian Naive Bayes Classifier
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15324 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70791/ Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #15324: [SPARK-16872][ML] Gaussian Naive Bayes Classifier
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15324 **[Test build #70791 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70791/testReport)** for PR 15324 at commit [`a59c558`](https://github.com/apache/spark/commit/a59c558625ad6f640a5d417c97770e55f4583e14). * This patch **fails to build**. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #15324: [SPARK-16872][ML] Gaussian Naive Bayes Classifier
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15324 Merged build finished. Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #15923: [SPARK-4105] retry the fetch or stage if shuffle ...
Github user Tagar commented on a diff in the pull request: https://github.com/apache/spark/pull/15923#discussion_r94356127 --- Diff: core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala --- @@ -305,40 +316,84 @@ final class ShuffleBlockFetcherIterator( */ override def next(): (BlockId, InputStream) = { numBlocksProcessed += 1 -val startFetchWait = System.currentTimeMillis() -currentResult = results.take() -val result = currentResult -val stopFetchWait = System.currentTimeMillis() -shuffleMetrics.incFetchWaitTime(stopFetchWait - startFetchWait) - -result match { - case SuccessFetchResult(_, address, size, buf, isNetworkReqDone) => -if (address != blockManager.blockManagerId) { - shuffleMetrics.incRemoteBytesRead(buf.size) - shuffleMetrics.incRemoteBlocksFetched(1) -} -bytesInFlight -= size -if (isNetworkReqDone) { - reqsInFlight -= 1 - logDebug("Number of requests in flight " + reqsInFlight) -} - case _ => -} -// Send fetch requests up to maxBytesInFlight -fetchUpToMaxBytes() -result match { - case FailureFetchResult(blockId, address, e) => -throwFetchFailedException(blockId, address, e) +var result: FetchResult = null +var input: InputStream = null +// Take the next fetched result and try to decompress it to detect data corruption, +// then fetch it one more time if it's corrupt, throw FailureFetchResult if the second fetch --- End diff -- Is netty/shuffle data compressed with the Snappy algorithm by default? If so, it might be a good idea to enable checksum checking at the Netty level too. https://netty.io/4.0/api/io/netty/handler/codec/compression/SnappyFramedDecoder.html > Note that by default, validation of the checksum header in each chunk is DISABLED for performance improvements. If performance is less of an issue, or if you would prefer the safety that checksum validation brings, please use the SnappyFramedDecoder(boolean) constructor with the argument set to true. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #15324: [SPARK-16872][ML] Gaussian Naive Bayes Classifier
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15324 **[Test build #70791 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70791/testReport)** for PR 15324 at commit [`a59c558`](https://github.com/apache/spark/commit/a59c558625ad6f640a5d417c97770e55f4583e14). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org