date:20160720

[GitHub] spark issue #14296: [SPARK-16639][SQL] The query with having condition that ...

2016-07-20 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14296
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62655/
Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #14296: [SPARK-16639][SQL] The query with having condition that ...

2016-07-20 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14296
  
**[Test build #62655 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62655/consoleFull)**
 for PR 14296 at commit 
[`5704709`](https://github.com/apache/spark/commit/5704709fcddf90aa62810d82e4546eadd274e630).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #14296: [SPARK-16639][SQL] The query with having condition that ...

2016-07-20 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14296
  
Merged build finished. Test FAILed.



[GitHub] spark issue #14294: [SPARK-16646][SQL] LEAST and GREATEST doesn't accept num...

2016-07-20 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14294
  
**[Test build #62657 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62657/consoleFull)**
 for PR 14294 at commit 
[`abbcb4e`](https://github.com/apache/spark/commit/abbcb4e9884c7aafcb0705a7b9d0b80988d8a8c7).



[GitHub] spark issue #14294: [SPARK-16646][SQL] LEAST and GREATEST doesn't accept num...

2016-07-20 Thread HyukjinKwon

Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/14294
  
retest this please



[GitHub] spark issue #14294: [SPARK-16646][SQL] LEAST and GREATEST doesn't accept num...

2016-07-20 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14294
  
Merged build finished. Test FAILed.



[GitHub] spark issue #14294: [SPARK-16646][SQL] LEAST and GREATEST doesn't accept num...

2016-07-20 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14294
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62653/
Test FAILed.



[GitHub] spark issue #14294: [SPARK-16646][SQL] LEAST and GREATEST doesn't accept num...

2016-07-20 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14294
  
**[Test build #62653 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62653/consoleFull)**
 for PR 14294 at commit 
[`abbcb4e`](https://github.com/apache/spark/commit/abbcb4e9884c7aafcb0705a7b9d0b80988d8a8c7).
 * This patch **fails PySpark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request #14277: [SPARK-16640][SQL] Add codegen for Elt function

2016-07-20 Thread viirya

Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/14277#discussion_r71649272
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala
 ---
@@ -204,6 +204,29 @@ case class Elt(children: Seq[Expression])
   }
 }
   }
+
+  override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): 
ExprCode = {
+val index = children.head.map(_.genCode(ctx))(0)
+val strings = children.tail.map(_.genCode(ctx))
+val stringValues = strings.map { eval =>
+  s"${eval.isNull} ? null : ${eval.value}"
+}.mkString(", ")
+val indexVal = ctx.freshName("index")
+val stringArray = ctx.freshName("strings");
+
+ev.copy(index.code + "\n" + strings.map(_.code).mkString("\n") + s"""
+  int $indexVal = ${index.value} - 1;
+  UTF8String[] $stringArray = {$stringValues};
--- End diff --

After re-checking the whole-stage codegen code, I find that is not correct. 
If any expression in a sub-plan falls back, that sub-plan will not use 
whole-stage codegen. The rest of the plan can still use whole-stage codegen, 
and the sub-plans that cannot are wrapped with an input adapter.



[GitHub] spark issue #14210: [SPARK-16556] [SPARK-16559] [SQL] Fix Two Bugs in Bucket...

2016-07-20 Thread gatorsmile

Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/14210
  
cc @cloud-fan How about this simple fix?



[GitHub] spark issue #14277: [SPARK-16640][SQL] Add codegen for Elt function

2016-07-20 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14277
  
**[Test build #62656 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62656/consoleFull)**
 for PR 14277 at commit 
[`264e9e5`](https://github.com/apache/spark/commit/264e9e56709ea7cded609fa1c36c973f442fd793).



[GitHub] spark pull request #14277: [SPARK-16640][SQL] Add codegen for Elt function

2016-07-20 Thread viirya

Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/14277#discussion_r71648748
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala
 ---
@@ -204,6 +204,29 @@ case class Elt(children: Seq[Expression])
   }
 }
   }
+
+  override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): 
ExprCode = {
+val index = children.head.map(_.genCode(ctx))(0)
+val strings = children.tail.map(_.genCode(ctx))
+val stringValues = strings.map { eval =>
+  s"${eval.isNull} ? null : ${eval.value}"
+}.mkString(", ")
+val indexVal = ctx.freshName("index")
+val stringArray = ctx.freshName("strings");
+
+ev.copy(index.code + "\n" + strings.map(_.code).mkString("\n") + s"""
+  int $indexVal = ${index.value} - 1;
+  UTF8String[] $stringArray = {$stringValues};
--- End diff --

Let me do a simple benchmark for this. Actually, I think there is no harm in 
having codegen for this.



[GitHub] spark pull request #14277: [SPARK-16640][SQL] Add codegen for Elt function

2016-07-20 Thread viirya

Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/14277#discussion_r71648516
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala
 ---
@@ -204,6 +204,29 @@ case class Elt(children: Seq[Expression])
   }
 }
   }
+
+  override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): 
ExprCode = {
+val index = children.head.map(_.genCode(ctx))(0)
+val strings = children.tail.map(_.genCode(ctx))
+val stringValues = strings.map { eval =>
+  s"${eval.isNull} ? null : ${eval.value}"
+}.mkString(", ")
+val indexVal = ctx.freshName("index")
+val stringArray = ctx.freshName("strings");
+
+ev.copy(index.code + "\n" + strings.map(_.code).mkString("\n") + s"""
+  int $indexVal = ${index.value} - 1;
+  UTF8String[] $stringArray = {$stringValues};
--- End diff --

ok. `CodegenFallback` does it.



[GitHub] spark pull request #14277: [SPARK-16640][SQL] Add codegen for Elt function

2016-07-20 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/14277#discussion_r71648156
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala
 ---
@@ -204,6 +204,29 @@ case class Elt(children: Seq[Expression])
   }
 }
   }
+
+  override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): 
ExprCode = {
+val index = children.head.map(_.genCode(ctx))(0)
+val strings = children.tail.map(_.genCode(ctx))
+val stringValues = strings.map { eval =>
+  s"${eval.isNull} ? null : ${eval.value}"
+}.mkString(", ")
+val indexVal = ctx.freshName("index")
+val stringArray = ctx.freshName("strings");
+
+ev.copy(index.code + "\n" + strings.map(_.code).mkString("\n") + s"""
+  int $indexVal = ${index.value} - 1;
+  UTF8String[] $stringArray = {$stringValues};
--- End diff --

No, expressions can always be code-generated. By default we just pass the 
expression reference into the generated Java code and call its `eval` method, 
so this is only about the performance of the expression.



[GitHub] spark pull request #14296: [SPARK-16639][SQL] The query with having conditio...

2016-07-20 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/14296#discussion_r71647958
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
 ---
@@ -1202,11 +1203,16 @@ class Analyzer(
   if (resolvedOperator.resolved) {
 // Try to replace all aggregate expressions in the filter by 
an alias.
 val aggregateExpressions = ArrayBuffer.empty[NamedExpression]
-val transformedAggregateFilter = 
resolvedAggregateFilter.transform {
+val transformedAggregateFilter = 
resolvedAggregateFilter.transformDown {
   case ae: AggregateExpression =>
 val alias = Alias(ae, ae.toString)()
 aggregateExpressions += alias
 alias.toAttribute
+  case ne: NamedExpression => ne
+  case e: Expression =>
+val alias = Alias(e, e.toString)()
+aggregateExpressions += alias
+alias.toAttribute
--- End diff --

It looks like we are blindly pushing all expressions into `Aggregate`. How 
about we just push the whole filter condition?



[GitHub] spark issue #14296: [SPARK-16639][SQL] The query with having condition that ...

2016-07-20 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14296
  
**[Test build #62655 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62655/consoleFull)**
 for PR 14296 at commit 
[`5704709`](https://github.com/apache/spark/commit/5704709fcddf90aa62810d82e4546eadd274e630).



[GitHub] spark pull request #14296: [SPARK-16639][SQL] The query with having conditio...

2016-07-20 Thread viirya

GitHub user viirya opened a pull request:

https://github.com/apache/spark/pull/14296

[SPARK-16639][SQL] The query with having condition that contains grouping 
by column should work

## What changes were proposed in this pull request?

A query whose HAVING condition references a GROUP BY expression currently 
fails during analysis. E.g.,

    create table tbl(a int, b string);
    select count(b) from tbl group by a + 1 having a + 1 = 2;

A HAVING condition should be able to use the grouping expressions.
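
For reference, the result the query is expected to produce can be sketched with plain Scala collections. This is only an emulation of the intended semantics, not Spark code; the `Row` case class and the `query` helper are hypothetical names introduced for illustration.

```scala
object HavingSketch {
  // Hypothetical stand-in for a row of tbl(a int, b string).
  case class Row(a: Int, b: String)

  /** Emulates: select count(b) from tbl group by a + 1 having a + 1 = 2 */
  def query(tbl: Seq[Row]): Seq[Long] =
    tbl.groupBy(_.a + 1)                     // GROUP BY a + 1
      .filter { case (key, _) => key == 2 }  // HAVING a + 1 = 2, evaluated on the group key
      .values
      .map(rows => rows.count(_.b != null).toLong) // count(b) ignores NULLs
      .toSeq

  def main(args: Array[String]): Unit =
    println(query(Seq(Row(1, "x"), Row(1, null), Row(2, "y"))))
}
```

The HAVING predicate only needs the value of the grouping key, which is why it should be resolvable against the grouping expression rather than the raw column `a`.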

## How was this patch tested?

Jenkins tests.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/viirya/spark-1 having-contains-grouping-column

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/14296.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #14296


commit 5704709fcddf90aa62810d82e4546eadd274e630
Author: Liang-Chi Hsieh 
Date:   2016-07-21T05:03:01Z

The query with having condition that contains grouping by column should 
work.





[GitHub] spark issue #14295: [SPARK-16648][SQL] Overrides TreeNode.withNewChildren in...

2016-07-20 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14295
  
**[Test build #62654 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62654/consoleFull)**
 for PR 14295 at commit 
[`efbed91`](https://github.com/apache/spark/commit/efbed9110b771a4f506292f73d4f0e6cb77e52d0).



[GitHub] spark pull request #14295: [SPARK-16648][SQL] Overrides TreeNode.withNewChil...

2016-07-20 Thread liancheng

GitHub user liancheng opened a pull request:

https://github.com/apache/spark/pull/14295

[SPARK-16648][SQL] Overrides TreeNode.withNewChildren in Last

## What changes were proposed in this pull request?

Default `TreeNode.withNewChildren` implementation doesn't work for `Last` 
when both constructor arguments are the same, e.g.:

```sql
LAST_VALUE(FALSE) -- The 2nd argument defaults to FALSE
LAST_VALUE(FALSE, FALSE)
LAST_VALUE(TRUE, TRUE)
```

This is because although `Last` is a unary expression, both of its 
constructor arguments are `Expression`s. When they have the same value, 
`TreeNode.withNewChildren` treats both of them as child nodes by mistake.
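
The failure mode can be reproduced with a toy model. The sketch below is not Spark's `TreeNode`; `Node`, `Lit`, `LastLike` and `replacedArgs` are hypothetical names, and the matching rule (any constructor argument equal to a child is treated as a child) is a simplified assumption about how the default implementation decides what to replace.

```scala
object WithNewChildrenSketch {
  sealed trait Node extends Product {
    def children: Seq[Node]

    /** Rebuilds the constructor args, swapping in a new child for any arg that equals a child. */
    def replacedArgs(newChildren: Seq[Node]): Seq[Any] = {
      val remaining = newChildren.toBuffer
      productIterator.map {
        case arg: Node if children.contains(arg) => remaining.remove(0)
        case other => other
      }.toList
    }
  }

  case class Lit(value: Boolean) extends Node { def children: Seq[Node] = Nil }

  // Like `Last`: two Node-typed constructor arguments, but only `child` is a child.
  case class LastLike(child: Node, ignoreNulls: Node) extends Node {
    def children: Seq[Node] = Seq(child)
  }

  def main(args: Array[String]): Unit = {
    // Distinct arguments: only `child` is replaced.
    println(LastLike(Lit(true), Lit(false)).replacedArgs(Seq(Lit(true))))
    // Equal arguments: `ignoreNulls` is mistaken for a child and no new child is left for it.
    val attempt = scala.util.Try(LastLike(Lit(false), Lit(false)).replacedArgs(Seq(Lit(true))))
    println(attempt.isFailure)
  }
}
```

Overriding the replacement logic so that only the first argument is ever swapped avoids the ambiguity in this toy model.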

## How was this patch tested?

New test case added in `WindowQuerySuite`.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/liancheng/spark spark-16648-last-value

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/14295.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #14295


commit efbed9110b771a4f506292f73d4f0e6cb77e52d0
Author: Cheng Lian 
Date:   2016-07-21T04:48:16Z

Overrides TreeNode.withNewChildren in Last





[GitHub] spark pull request #14277: [SPARK-16640][SQL] Add codegen for Elt function

2016-07-20 Thread viirya

Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/14277#discussion_r71646763
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala
 ---
@@ -204,6 +204,29 @@ case class Elt(children: Seq[Expression])
   }
 }
   }
+
+  override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): 
ExprCode = {
+val index = children.head.map(_.genCode(ctx))(0)
+val strings = children.tail.map(_.genCode(ctx))
+val stringValues = strings.map { eval =>
+  s"${eval.isNull} ? null : ${eval.value}"
+}.mkString(", ")
+val indexVal = ctx.freshName("index")
+val stringArray = ctx.freshName("strings");
+
+ev.copy(index.code + "\n" + strings.map(_.code).mkString("\n") + s"""
+  int $indexVal = ${index.value} - 1;
+  UTF8String[] $stringArray = {$stringValues};
--- End diff --

My main concern is not the performance improvement for the elt function. As 
you know, if any expression in a plan doesn't have codegen, the plan can't use 
whole-stage codegen. So I think it is better to have as many codegen 
expressions as we can.



[GitHub] spark pull request #13275: [SPARK-15499][PySpark][Tests] Add python testsuit...

2016-07-20 Thread WeichenXu123

Github user WeichenXu123 closed the pull request at:

https://github.com/apache/spark/pull/13275



[GitHub] spark pull request #14277: [SPARK-16640][SQL] Add codegen for Elt function

2016-07-20 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/14277#discussion_r71646550
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala
 ---
@@ -204,6 +204,29 @@ case class Elt(children: Seq[Expression])
   }
 }
   }
+
+  override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): 
ExprCode = {
+val index = children.head.map(_.genCode(ctx))(0)
+val strings = children.tail.map(_.genCode(ctx))
+val stringValues = strings.map { eval =>
+  s"${eval.isNull} ? null : ${eval.value}"
+}.mkString(", ")
+val indexVal = ctx.freshName("index")
+val stringArray = ctx.freshName("strings");
+
+ev.copy(index.code + "\n" + strings.map(_.code).mkString("\n") + s"""
+  int $indexVal = ${index.value} - 1;
+  UTF8String[] $stringArray = {$stringValues};
--- End diff --

A `switch` or `if` should work, or we can just fall back to `eval`. Can you 
benchmark how much speedup we can get from codegen for this expression? If it's 
only a little, I think it's OK not to have codegen here.
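
One possible reading of the `switch` suggestion, as a rough sketch: emit a Java `switch` over the 1-based index instead of building a `UTF8String[]`. The `genSwitch` helper below is hypothetical and not part of Spark; it assumes the child expressions are already available as Java expression strings and that an out-of-range index yields null.

```scala
object EltSwitchSketch {
  /** Builds Java source for a `switch` that assigns the selected string expression to `result`. */
  def genSwitch(indexVar: String, stringExprs: Seq[String], result: String): String = {
    val cases = stringExprs.zipWithIndex.map { case (expr, i) =>
      // ELT is 1-based, so the first string is selected by index 1.
      s"  case ${i + 1}: $result = $expr; break;"
    }
    // Any other index falls through to the default branch and produces null.
    (Seq(s"switch ($indexVar) {") ++ cases ++
      Seq(s"  default: $result = null;", "}")).mkString("\n")
  }

  def main(args: Array[String]): Unit =
    println(genSwitch("index", Seq("str1", "str2"), "value"))
}
```

Whether this beats the array version or the `eval` fallback would still need the benchmark asked for above.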



[GitHub] spark pull request #14240: [SPARK-16594] [SQL] Remove Physical Plan Differen...

2016-07-20 Thread gatorsmile

Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/14240#discussion_r71646312
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/sources/PrunedScanSuite.scala ---
@@ -114,16 +114,15 @@ class PrunedScanSuite extends DataSourceTest with 
SharedSQLContext {
   testPruning("SELECT * FROM oneToTenPruned", "a", "b")
   testPruning("SELECT a, b FROM oneToTenPruned", "a", "b")
   testPruning("SELECT b, a FROM oneToTenPruned", "b", "a")
-  testPruning("SELECT b, b FROM oneToTenPruned", "b")
+  testPruning("SELECT b, b FROM oneToTenPruned", "b", "b")
+  testPruning("SELECT b as alias_b, b FROM oneToTenPruned", "b")
--- End diff --

I like your point! Your concern is valid. We should be careful when making 
assumptions about external data source implementations! : ) 



[GitHub] spark issue #14293: [GIT] add pydev & Rstudio project file to gitignore list

2016-07-20 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14293
  
Merged build finished. Test PASSed.



[GitHub] spark issue #14293: [GIT] add pydev & Rstudio project file to gitignore list

2016-07-20 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14293
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62650/
Test PASSed.



[GitHub] spark issue #14293: [GIT] add pydev & Rstudio project file to gitignore list

2016-07-20 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14293
  
**[Test build #62650 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62650/consoleFull)**
 for PR 14293 at commit 
[`da5a4a2`](https://github.com/apache/spark/commit/da5a4a2754a5d108c753dc23d91a66fe9013668e).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request #14240: [SPARK-16594] [SQL] Remove Physical Plan Differen...

2016-07-20 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/14240#discussion_r71644651
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/sources/PrunedScanSuite.scala ---
@@ -114,16 +114,15 @@ class PrunedScanSuite extends DataSourceTest with 
SharedSQLContext {
   testPruning("SELECT * FROM oneToTenPruned", "a", "b")
   testPruning("SELECT a, b FROM oneToTenPruned", "a", "b")
   testPruning("SELECT b, a FROM oneToTenPruned", "b", "a")
-  testPruning("SELECT b, b FROM oneToTenPruned", "b")
+  testPruning("SELECT b, b FROM oneToTenPruned", "b", "b")
+  testPruning("SELECT b as alias_b, b FROM oneToTenPruned", "b")
--- End diff --

So we are assuming that `PrunedScan.buildScan` can handle duplicated 
columns? I feel it's dangerous to assume that: we didn't document this 
behavior, so external implementations may break it.

For the Hive one, it's internal and we are safe to make this assumption. 
If we do want to make them consistent, I'd like to change the Hive one.
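
To make the risk concrete, here is a toy sketch of two ways an external source might implement the scan. It is not the real `PrunedScan` trait: the in-memory `data` and both helper methods are hypothetical, and only the `requiredColumns: Array[String]` parameter mirrors `buildScan`.

```scala
object PrunedScanSketch {
  // Hypothetical single-row table with columns a and b.
  private val data: Seq[Map[String, Any]] = Seq(Map("a" -> 1, "b" -> 2))

  /** Projects every requested name in order, so duplicated columns are honored. */
  def buildScanPositional(requiredColumns: Array[String]): Seq[Seq[Any]] =
    data.map(row => requiredColumns.toSeq.map(row))

  /** Dedupes the requested names first, so it returns fewer values than were asked for. */
  def buildScanDeduping(requiredColumns: Array[String]): Seq[Seq[Any]] =
    data.map(row => requiredColumns.distinct.toSeq.map(row))

  def main(args: Array[String]): Unit = {
    println(buildScanPositional(Array("b", "b")))
    println(buildScanDeduping(Array("b", "b")))
  }
}
```

A source written like the second method would return rows that are too narrow if it were handed `"b", "b"`, which is the kind of breakage the undocumented assumption could cause.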



[GitHub] spark issue #14294: [SPARK-16646][SQL] LEAST and GREATEST doesn't accept num...

2016-07-20 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14294
  
**[Test build #62653 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62653/consoleFull)**
 for PR 14294 at commit 
[`abbcb4e`](https://github.com/apache/spark/commit/abbcb4e9884c7aafcb0705a7b9d0b80988d8a8c7).



[GitHub] spark issue #14294: [SPARK-16646][SQL] LEAST and GREATEST doesn't accept num...

2016-07-20 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14294
  
Merged build finished. Test FAILed.



[GitHub] spark issue #14294: [SPARK-16646][SQL] LEAST and GREATEST doesn't accept num...

2016-07-20 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14294
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62652/
Test FAILed.



[GitHub] spark issue #14294: [SPARK-16646][SQL] LEAST and GREATEST doesn't accept num...

2016-07-20 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14294
  
**[Test build #62652 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62652/consoleFull)**
 for PR 14294 at commit 
[`9730afb`](https://github.com/apache/spark/commit/9730afbc2a303b4961474eb8d21b709a2cd7d596).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request #14240: [SPARK-16594] [SQL] Remove Physical Plan Differen...

2016-07-20 Thread gatorsmile

Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/14240#discussion_r71643068
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/sources/PrunedScanSuite.scala ---
@@ -114,16 +114,15 @@ class PrunedScanSuite extends DataSourceTest with 
SharedSQLContext {
   testPruning("SELECT * FROM oneToTenPruned", "a", "b")
   testPruning("SELECT a, b FROM oneToTenPruned", "a", "b")
   testPruning("SELECT b, a FROM oneToTenPruned", "b", "a")
-  testPruning("SELECT b, b FROM oneToTenPruned", "b")
+  testPruning("SELECT b, b FROM oneToTenPruned", "b", "b")
+  testPruning("SELECT b as alias_b, b FROM oneToTenPruned", "b")
--- End diff --

After this PR, Data Source Table Scan will also return two columns, just 
like Hive Table Scan does. This is also shown in the [test 
case](https://github.com/gatorsmile/spark/blob/ba7dcff8829c379666c417b87fc1e5022ed93141/sql/core/src/test/scala/org/apache/spark/sql/sources/PrunedScanSuite.scala#L117).



[GitHub] spark issue #14207: [SPARK-16552] [SQL] Store the Inferred Schemas into Exte...

2016-07-20 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14207
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62647/
Test PASSed.



[GitHub] spark issue #14207: [SPARK-16552] [SQL] Store the Inferred Schemas into Exte...

2016-07-20 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14207
  
Merged build finished. Test PASSed.



[GitHub] spark issue #14207: [SPARK-16552] [SQL] Store the Inferred Schemas into Exte...

2016-07-20 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14207
  
**[Test build #62647 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62647/consoleFull)**
 for PR 14207 at commit 
[`264ad35`](https://github.com/apache/spark/commit/264ad35a1a749e14f8d8a33e4977cddda0916204).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #14289: [SPARK-16656] [SQL] Try to make CreateTableAsSelectSuite...

2016-07-20 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14289
  
Merged build finished. Test PASSed.



[GitHub] spark issue #14289: [SPARK-16656] [SQL] Try to make CreateTableAsSelectSuite...

2016-07-20 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14289
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62646/
Test PASSed.



[GitHub] spark issue #14289: [SPARK-16656] [SQL] Try to make CreateTableAsSelectSuite...

2016-07-20 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14289
  
**[Test build #62646 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62646/consoleFull)**
 for PR 14289 at commit 
[`a3fce03`](https://github.com/apache/spark/commit/a3fce03aba6e7f0faf56706714da09ae942e8fb1).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #14098: [SPARK-16380][SQL][Example]:Update SQL examples and prog...

2016-07-20 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14098
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62651/
Test PASSed.



[GitHub] spark issue #14098: [SPARK-16380][SQL][Example]:Update SQL examples and prog...

2016-07-20 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14098
  
Merged build finished. Test PASSed.



[GitHub] spark issue #14098: [SPARK-16380][SQL][Example]:Update SQL examples and prog...

2016-07-20 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14098
  
**[Test build #62651 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62651/consoleFull)**
 for PR 14098 at commit 
[`8563ecb`](https://github.com/apache/spark/commit/8563ecb7d980cde7ccb515ed2e75da929d233568).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #14035: [SPARK-16356][ML] Add testImplicits for ML unit tests an...

2016-07-20 Thread HyukjinKwon

Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/14035
  
Hm, I can close this if it looks inappropriate or seems likely to cause a 
lot of conflicts across PRs. Could you give some feedback, please, @mengxr and 
@yanboliang?



[GitHub] spark issue #14117: [SPARK-16461][SQL] Support partition batch pruning with ...

2016-07-20 Thread HyukjinKwon

Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/14117
  
Could you please take a look @rxin ?



[GitHub] spark issue #14294: [SPARK-16646][SQL] LEAST and GREATEST doesn't accept num...

2016-07-20 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14294
  
**[Test build #62652 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62652/consoleFull)**
 for PR 14294 at commit 
[`9730afb`](https://github.com/apache/spark/commit/9730afbc2a303b4961474eb8d21b709a2cd7d596).



[GitHub] spark issue #14098: [SPARK-16380][SQL][Example]:Update SQL examples and prog...

2016-07-20 Thread wangmiao1981

Github user wangmiao1981 commented on the issue:

https://github.com/apache/spark/pull/14098
  
@liancheng I addressed all your comments, except: 1) 2-space indents: I 
tried them, but they failed the Python style tests, so I left the 4-space 
indents; 2) `col('...')`: I haven't changed it yet. Do you have a final 
decision on this part?

It is ready for review now.

Thanks!



[GitHub] spark issue #14294: [SPARK-16646][SQL] LEAST and GREATEST doesn't accept num...

2016-07-20 Thread HyukjinKwon

Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/14294
  
@liancheng and @davies Would this change be appropriate? Could you please 
take a look?



[GitHub] spark issue #14174: [SPARK-16524][SQL] Add RowBatch and RowBasedHashMapGener...

2016-07-20 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14174
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62643/
Test PASSed.



[GitHub] spark issue #14174: [SPARK-16524][SQL] Add RowBatch and RowBasedHashMapGener...

2016-07-20 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14174
  
Merged build finished. Test PASSed.



[GitHub] spark pull request #14294: [SPARK-16646][SQL] LEAST and GREATEST doesn't acc...

2016-07-20 Thread HyukjinKwon

GitHub user HyukjinKwon opened a pull request:

https://github.com/apache/spark/pull/14294

[SPARK-16646][SQL] LEAST and GREATEST doesn't accept numeric arguments with 
different data types

## What changes were proposed in this pull request?

This PR makes `LEAST` and `GREATEST` accept other numeric types 
(`Decimal`). This worked fine in `HiveContext` in 1.6 because, for example, 
`1.5` was `DoubleType` there.

In that case, the tightest common type of `IntegerType` and `DoubleType` 
is `DoubleType`, so the call works.

Currently, however, `1.5` seems to be treated as `DecimalType` and is cast 
to `decimal(2, 1)`. As a result, no tightest common type is found for 
`IntegerType` and `DecimalType(2, 1)`.

This PR introduces a new function, `findTightestCommonTypeToDecimal`, which 
deals with decimals interacting with each other or with primitive types. This 
logic was borrowed from JSON schema inference.
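
As a rough illustration of the widening rule described above (a self-contained sketch, not the code in this PR): the types `IntT`, `DoubleT`, `DecT` and the function `widen` are hypothetical stand-ins for Spark's type classes. The sketch assumes an integer is viewed as `decimal(10, 0)`, that the result keeps the larger scale and the larger number of integer digits, and that it falls back to double when the precision would exceed 38.

```scala
object DecimalWidening {
  // Hypothetical miniature type model; these are not Spark's classes.
  sealed trait NumType
  case object IntT extends NumType     // 32-bit integer
  case object DoubleT extends NumType
  final case class DecT(precision: Int, scale: Int) extends NumType

  val MaxPrecision = 38

  // A 32-bit integer has at most 10 decimal digits: view it as decimal(10, 0).
  private def asDecimal(t: NumType): Option[DecT] = t match {
    case IntT    => Some(DecT(10, 0))
    case d: DecT => Some(d)
    case DoubleT => None
  }

  // Tightest common type of two numeric types under the assumptions above.
  def widen(a: NumType, b: NumType): NumType =
    if (a == b) a
    else (asDecimal(a), asDecimal(b)) match {
      case (Some(x), Some(y)) =>
        val scale = math.max(x.scale, y.scale)
        val intDigits = math.max(x.precision - x.scale, y.precision - y.scale)
        if (intDigits + scale > MaxPrecision) DoubleT
        else DecT(intDigits + scale, scale)
      case _ => DoubleT // any combination involving a double widens to double
    }
}
```

Under these assumptions, `widen(IntT, DecT(2, 1))` yields `DecT(11, 1)`: ten integer digits for the integer plus one fractional digit for `1.5`.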

## How was this patch tested?

Unit tests in `TypeCoercionSuite` and `DataFrameFunctionsSuite`.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/HyukjinKwon/spark SPARK-16646

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/14294.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #14294


commit 9730afbc2a303b4961474eb8d21b709a2cd7d596
Author: hyukjinkwon 
Date:   2016-07-21T02:05:53Z

LEAST and GREATEST doesn't accept numeric arguments with different data 
types





[GitHub] spark issue #14174: [SPARK-16524][SQL] Add RowBatch and RowBasedHashMapGener...

2016-07-20 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14174
  
**[Test build #62643 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62643/consoleFull)**
 for PR 14174 at commit 
[`79ca51a`](https://github.com/apache/spark/commit/79ca51a1ba25f0fcf11d7181c276399dd09334f8).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `   |public class $generatedClassName `



[GitHub] spark issue #14098: [WIP][SPARK-16380][SQL][Example]:Update SQL examples and...

2016-07-20 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14098
  
Merged build finished. Test PASSed.



[GitHub] spark issue #14098: [WIP][SPARK-16380][SQL][Example]:Update SQL examples and...

2016-07-20 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14098
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62649/
Test PASSed.



[GitHub] spark issue #14098: [WIP][SPARK-16380][SQL][Example]:Update SQL examples and...

2016-07-20 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14098
  
**[Test build #62651 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62651/consoleFull)**
 for PR 14098 at commit 
[`8563ecb`](https://github.com/apache/spark/commit/8563ecb7d980cde7ccb515ed2e75da929d233568).



[GitHub] spark issue #14098: [WIP][SPARK-16380][SQL][Example]:Update SQL examples and...

2016-07-20 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14098
  
**[Test build #62649 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62649/consoleFull)**
 for PR 14098 at commit 
[`95b16f5`](https://github.com/apache/spark/commit/95b16f5bba48455af738a364861cba7ba56a791a).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request #14259: [SPARK-16622][SQL] Fix NullPointerException when ...

2016-07-20 Thread viirya

Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/14259#discussion_r71638351
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala
 ---
@@ -166,11 +185,11 @@ case class Invoke(
 } else {
   ""
 }
-
 val code = s"""
   ${obj.code}
   ${argGen.map(_.code).mkString("\n")}
   $setIsNull
+  $callFunc
--- End diff --

uh. Looks good. Let me update this. Thanks!



[GitHub] spark pull request #14277: [SPARK-16640][SQL] Add codegen for Elt function

2016-07-20 Thread viirya

Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/14277#discussion_r71638204
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala
 ---
@@ -204,6 +204,29 @@ case class Elt(children: Seq[Expression])
   }
 }
   }
+
+  override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): 
ExprCode = {
+val index = children.head.map(_.genCode(ctx))(0)
+val strings = children.tail.map(_.genCode(ctx))
+val stringValues = strings.map { eval =>
+  s"${eval.isNull} ? null : ${eval.value}"
+}.mkString(", ")
+val indexVal = ctx.freshName("index")
+val stringArray = ctx.freshName("strings");
+
+ev.copy(index.code + "\n" + strings.map(_.code).mkString("\n") + s"""
+  int $indexVal = ${index.value} - 1;
+  UTF8String[] $stringArray = {$stringValues};
--- End diff --

Because the index is only evaluated at runtime, I think we can't just 
evaluate the string we need. Alternatively, we could wrap these string 
expressions in an if/else-if or switch block, so that only the string 
expression selected by the index is evaluated. What do you think?
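
The trade-off can be sketched in plain Scala rather than generated Java. This is only an illustration, not the PR's codegen: `EltSketch`, `expr`, `eltEager`, and `eltLazy` are hypothetical names, each candidate string expression is modeled as a thunk, and returning `null` for an out-of-range index is an assumption about the intended behavior.

```scala
object EltSketch {
  // Counts how many candidate expressions were actually evaluated.
  var evaluations = 0

  // Models one string expression as a thunk that records its evaluation.
  def expr(s: String): () => String = () => { evaluations += 1; s }

  // Eager variant: build the whole array first, then index into it.
  def eltEager(index: Int, candidates: Seq[() => String]): String = {
    val all = candidates.map(_.apply()) // evaluates every candidate
    if (index < 1 || index > all.length) null else all(index - 1)
  }

  // Lazy variant: behaves like a switch on the index, so only the
  // selected candidate is evaluated.
  def eltLazy(index: Int, candidates: Seq[() => String]): String =
    if (index < 1 || index > candidates.length) null
    else candidates(index - 1).apply()
}
```

Both variants return the same value for a 1-based index; the difference is that the eager one pays for every candidate while the lazy one pays only for the selected one.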



[GitHub] spark issue #14292: [SPARK-14131][SQL[STREAMING] Improved fix for avoiding p...

2016-07-20 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14292
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62644/
Test FAILed.



[GitHub] spark issue #14279: [SPARK-16216][SQL] Write Timestamp and Date in ISO 8601 ...

2016-07-20 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14279
  
Merged build finished. Test PASSed.



[GitHub] spark issue #14279: [SPARK-16216][SQL] Write Timestamp and Date in ISO 8601 ...

2016-07-20 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14279
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62642/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #14222: [SPARK-16391][SQL] KeyValueGroupedDataset.reduceGroups s...

2016-07-20 Thread viirya

Github user viirya commented on the issue:

https://github.com/apache/spark/pull/14222
  
@rxin Is there anything I need to update for this? Thanks.



[GitHub] spark issue #14292: [SPARK-14131][SQL[STREAMING] Improved fix for avoiding p...

2016-07-20 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14292
  
Merged build finished. Test FAILed.



[GitHub] spark issue #14279: [SPARK-16216][SQL] Write Timestamp and Date in ISO 8601 ...

2016-07-20 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14279
  
**[Test build #62642 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62642/consoleFull)**
 for PR 14279 at commit 
[`f80dd53`](https://github.com/apache/spark/commit/f80dd530a986044f59f4d39344dd76c80ca60467).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #14292: [SPARK-14131][SQL[STREAMING] Improved fix for avoiding p...

2016-07-20 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14292
  
**[Test build #62644 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62644/consoleFull)**
 for PR 14292 at commit 
[`d64e0c1`](https://github.com/apache/spark/commit/d64e0c15bbda3c32bff4947b04f386dea9e73515).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #14098: [WIP][SPARK-16380][SQL][Example]:Update SQL examples and...

2016-07-20 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14098
  
Merged build finished. Test PASSed.



[GitHub] spark issue #14098: [WIP][SPARK-16380][SQL][Example]:Update SQL examples and...

2016-07-20 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14098
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62648/
Test PASSed.



[GitHub] spark issue #14098: [WIP][SPARK-16380][SQL][Example]:Update SQL examples and...

2016-07-20 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14098
  
**[Test build #62648 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62648/consoleFull)**
 for PR 14098 at commit 
[`ac47d8d`](https://github.com/apache/spark/commit/ac47d8d4ba2e80b4e5fdac9f586a7ccf583fab90).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #14264: [SPARK-11976][SPARKR] Support "." character in DataFrame...

2016-07-20 Thread sun-rui

Github user sun-rui commented on the issue:

https://github.com/apache/spark/pull/14264
  
@rerngvit, sorry, I mean https://issues.apache.org/jira/browse/SPARK-11977. 
If your PR can enable access to columns with "." in their names without 
backticks, please first submit a PR for SPARK-11977, as the change is to 
Spark Core and is not SparkR-specific. After that PR gets merged, you can then 
submit a PR for SPARK-11976 that contains SparkR-only changes.



[GitHub] spark issue #14293: [GIT] add pydev & Rstudio project file to gitignore list

2016-07-20 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14293
  
**[Test build #62650 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62650/consoleFull)**
 for PR 14293 at commit 
[`da5a4a2`](https://github.com/apache/spark/commit/da5a4a2754a5d108c753dc23d91a66fe9013668e).



[GitHub] spark pull request #14293: [GIT] add pydev & Rstudio project file to gitigno...

2016-07-20 Thread WeichenXu123

GitHub user WeichenXu123 opened a pull request:

https://github.com/apache/spark/pull/14293

[GIT] add pydev & Rstudio project file to gitignore list

## What changes were proposed in this pull request?

Add PyDev and RStudio project files to the gitignore list. I think the two 
IDEs are used by many developers, so they won't need a personal 
gitignore_global config.

## How was this patch tested?

N/A

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/WeichenXu123/spark update_gitignore

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/14293.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #14293


commit da5a4a2754a5d108c753dc23d91a66fe9013668e
Author: WeichenXu 
Date:   2016-07-14T17:52:14Z

add pydev & Rstudio project file to gitignore list





[GitHub] spark issue #14292: [SPARK-14131][SQL[STREAMING] Improved fix for avoiding p...

2016-07-20 Thread tdas

Github user tdas commented on the issue:

https://github.com/apache/spark/pull/14292
  
test this



[GitHub] spark issue #14098: [WIP][SPARK-16380][SQL][Example]:Update SQL examples and...

2016-07-20 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14098
  
**[Test build #62649 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62649/consoleFull)**
 for PR 14098 at commit 
[`95b16f5`](https://github.com/apache/spark/commit/95b16f5bba48455af738a364861cba7ba56a791a).



[GitHub] spark issue #14098: [WIP][SPARK-16380][SQL][Example]:Update SQL examples and...

2016-07-20 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14098
  
**[Test build #62648 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62648/consoleFull)**
 for PR 14098 at commit 
[`ac47d8d`](https://github.com/apache/spark/commit/ac47d8d4ba2e80b4e5fdac9f586a7ccf583fab90).



[GitHub] spark pull request #14281: [SPARK-16644][SQL] Aggregate should not propagate...

2016-07-20 Thread asfgit

Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/14281



[GitHub] spark issue #14281: [SPARK-16644][SQL] Aggregate should not propagate constr...

2016-07-20 Thread yhuai

Github user yhuai commented on the issue:

https://github.com/apache/spark/pull/14281
  
Thanks. I am merging this to master and branch 2.0.



[GitHub] spark pull request #13868: [SPARK-15899] [SQL] Fix the construction of the f...

2016-07-20 Thread kiszk

Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/13868#discussion_r71635375
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -691,7 +692,8 @@ private[sql] class SQLConf extends Serializable with CatalystConf with Logging {
   def variableSubstituteDepth: Int = getConf(VARIABLE_SUBSTITUTE_DEPTH)
 
   def warehousePath: String = {
-    getConf(WAREHOUSE_PATH).replace("${system:user.dir}", System.getProperty("user.dir"))
+    new Path(getConf(WAREHOUSE_PATH).replace("${system:user.dir}",
+      System.getProperty("user.dir"))).toUri.toString
--- End diff --

Is it better to use `URLDecoder.decode()` to decode a URL-encoded string 
(e.g. `%20`)?
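
A minimal sketch of the question above, using only the JDK rather than Hadoop's `Path`. The object name, the `toUriString` helper and the sample path are hypothetical and chosen for illustration. It shows how a path containing a space becomes `%20` once it is turned into a URI string, and two ways of getting the plain path back.

```scala
import java.net.{URI, URLDecoder}

object WarehousePathEncoding {
  // Hypothetical helper: builds a "file:" URI string from a plain path.
  // The multi-argument URI constructor percent-encodes characters that are
  // illegal in a URI path, such as spaces.
  def toUriString(path: String): String =
    new URI("file", null, path, null).toString

  def main(args: Array[String]): Unit = {
    val encoded = toUriString("/tmp/spark warehouse")
    println(encoded)                             // file:/tmp/spark%20warehouse

    // Option 1: decode the whole string, as suggested in the comment.
    println(URLDecoder.decode(encoded, "UTF-8")) // file:/tmp/spark warehouse

    // Option 2: parse the string as a URI and read the already-decoded path.
    println(new URI(encoded).getPath)            // /tmp/spark warehouse
  }
}
```

One caveat with `URLDecoder.decode()`: it implements form decoding, so it also turns a literal `+` into a space, whereas `URI.getPath` leaves `+` untouched. Which behavior is wanted for the warehouse path is left open here.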



[GitHub] spark issue #14207: [SPARK-16552] [SQL] Store the Inferred Schemas into Exte...

2016-07-20 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14207
  
**[Test build #62647 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62647/consoleFull)**
 for PR 14207 at commit 
[`264ad35`](https://github.com/apache/spark/commit/264ad35a1a749e14f8d8a33e4977cddda0916204).



[GitHub] spark pull request #14022: [SPARK-16272][core] Allow config values to refere...

2016-07-20 Thread asfgit

Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/14022



[GitHub] spark pull request #14240: [SPARK-16594] [SQL] Remove Physical Plan Differen...

2016-07-20 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/14240#discussion_r71634679
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/sources/PrunedScanSuite.scala ---
@@ -114,16 +114,15 @@ class PrunedScanSuite extends DataSourceTest with SharedSQLContext {
   testPruning("SELECT * FROM oneToTenPruned", "a", "b")
   testPruning("SELECT a, b FROM oneToTenPruned", "a", "b")
   testPruning("SELECT b, a FROM oneToTenPruned", "b", "a")
-  testPruning("SELECT b, b FROM oneToTenPruned", "b")
+  testPruning("SELECT b, b FROM oneToTenPruned", "b", "b")
+  testPruning("SELECT b as alias_b, b FROM oneToTenPruned", "b")
--- End diff --

After your PR, will the data source table return only one column in this case?



[GitHub] spark issue #14022: [SPARK-16272][core] Allow config values to reference con...

2016-07-20 Thread vanzin

Github user vanzin commented on the issue:

https://github.com/apache/spark/pull/14022
  
Merging to master.



[GitHub] spark issue #14292: [SPARK-14131][SQL[STREAMING] Improved fix for avoiding p...

2016-07-20 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14292
  
**[Test build #62645 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62645/consoleFull)**
 for PR 14292 at commit 
[`7a3e3fa`](https://github.com/apache/spark/commit/7a3e3fa4332370823e77b68c6281257574cffc8e).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #14292: [SPARK-14131][SQL[STREAMING] Improved fix for avoiding p...

2016-07-20 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14292
  
Merged build finished. Test FAILed.



[GitHub] spark issue #14292: [SPARK-14131][SQL[STREAMING] Improved fix for avoiding p...

2016-07-20 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14292
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62645/
Test FAILed.



[GitHub] spark pull request #14240: [SPARK-16594] [SQL] Remove Physical Plan Differen...

2016-07-20 Thread gatorsmile

Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/14240#discussion_r71634358
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/sources/PrunedScanSuite.scala ---
@@ -114,16 +114,15 @@ class PrunedScanSuite extends DataSourceTest with SharedSQLContext {
   testPruning("SELECT * FROM oneToTenPruned", "a", "b")
   testPruning("SELECT a, b FROM oneToTenPruned", "a", "b")
   testPruning("SELECT b, a FROM oneToTenPruned", "b", "a")
-  testPruning("SELECT b, b FROM oneToTenPruned", "b")
+  testPruning("SELECT b, b FROM oneToTenPruned", "b", "b")
+  testPruning("SELECT b as alias_b, b FROM oneToTenPruned", "b")
--- End diff --

```SQL
SELECT b, b FROM oneToTenPruned
```
For Hive Table Scan, we exclude `ProjectExec` from the physical plan but 
[the Hive table scan returns two duplicate columns](https://github.com/apache/spark/blob/865ec32dd997e63aea01a871d1c7b4947f43c111/sql/core/src/main/scala/org/apache/spark/sql/execution/SparkPlanner.scala#L93). 
No matter whether `projectSet.size == projects.size` is added or not, the 
result is always right. 
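
To make the `projectSet.size == projects.size` condition concrete, here is a small sketch of what such a size comparison detects. It is not the `SparkPlanner` code: plain strings stand in for Catalyst attributes, and the object and method names are hypothetical.

```scala
object DuplicateProjectionCheck {
  // Plain strings stand in for projected attributes (an assumption made for
  // illustration). Converting to a Set drops duplicates, so the sizes differ
  // exactly when the projection list names some column more than once.
  def hasNoDuplicates(projects: Seq[String]): Boolean = {
    val projectSet = projects.toSet
    projectSet.size == projects.size
  }

  def main(args: Array[String]): Unit = {
    println(hasNoDuplicates(Seq("a", "b"))) // true:  SELECT a, b
    println(hasNoDuplicates(Seq("b", "b"))) // false: SELECT b, b
  }
}
```

For `SELECT b, b` the check is false, which is the duplicate-column case discussed in this thread.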







[GitHub] spark issue #14289: [SPARK-16656] [SQL] Try to make CreateTableAsSelectSuite...

2016-07-20 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14289
  
**[Test build #62646 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62646/consoleFull)**
 for PR 14289 at commit 
[`a3fce03`](https://github.com/apache/spark/commit/a3fce03aba6e7f0faf56706714da09ae942e8fb1).



[GitHub] spark issue #14289: [SPARK-16656] [SQL] Try to make CreateTableAsSelectSuite...

2016-07-20 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14289
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62641/
Test PASSed.



[GitHub] spark issue #14289: [SPARK-16656] [SQL] Try to make CreateTableAsSelectSuite...

2016-07-20 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14289
  
Merged build finished. Test PASSed.



[GitHub] spark issue #14289: [SPARK-16656] [SQL] Try to make CreateTableAsSelectSuite...

2016-07-20 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14289
  
**[Test build #62641 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62641/consoleFull)**
 for PR 14289 at commit 
[`184d679`](https://github.com/apache/spark/commit/184d679ab8ff7fbceda76367b5caecd1e3794df8).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request #14207: [SPARK-16552] [SQL] Store the Inferred Schemas in...

2016-07-20 Thread gatorsmile

Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/14207#discussion_r71633897
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala ---
@@ -252,6 +252,222 @@ class DDLSuite extends QueryTest with SharedSQLContext with BeforeAndAfterEach {
     }
   }
 
+  private def createDataSourceTable(
+      path: File,
+      userSpecifiedSchema: Option[String],
+      userSpecifiedPartitionCols: Option[String]): (StructType, Seq[String]) = {
+    var tableSchema = StructType(Nil)
+    var partCols = Seq.empty[String]
+
+    val tabName = "tab1"
+    withTable(tabName) {
+      val partitionClause =
+        userSpecifiedPartitionCols.map(p => s"PARTITIONED BY ($p)").getOrElse("")
+      val schemaClause = userSpecifiedSchema.map(s => s"($s)").getOrElse("")
+      sql(
+        s"""
+           |CREATE TABLE $tabName $schemaClause
+           |USING parquet
+           |OPTIONS (
+           |  path '$path'
+           |)
+           |$partitionClause
+         """.stripMargin)
+      val tableMetadata = spark.sessionState.catalog.getTableMetadata(TableIdentifier(tabName))
+
+      tableSchema = DDLUtils.getSchemaFromTableProperties(tableMetadata)
+      partCols = DDLUtils.getPartitionColumnsFromTableProperties(tableMetadata)
+    }
+    (tableSchema, partCols)
+  }
+
+  test("Create partitioned data source table without user specified schema") {
+    import testImplicits._
+    val df = sparkContext.parallelize(1 to 10).map(i => (i, i.toString)).toDF("num", "str")
+
+    // Case 1: with partitioning columns but no schema: Option("inexistentColumns")
+    // Case 2: without schema and partitioning columns: None
+    Seq(Option("inexistentColumns"), None).foreach { partitionCols =>
+      withTempPath { pathToPartitionedTable =>
+        df.write.format("parquet").partitionBy("num")
+          .save(pathToPartitionedTable.getCanonicalPath)
+        val (tableSchema, partCols) =
+          createDataSourceTable(
+            pathToPartitionedTable,
+            userSpecifiedSchema = None,
+            userSpecifiedPartitionCols = partitionCols)
+        assert(tableSchema ==
+          StructType(StructField("str", StringType, nullable = true) ::
+            StructField("num", IntegerType, nullable = true) :: Nil))
+        assert(partCols == Seq("num"))
+      }
+    }
+  }
+
+  test("Create partitioned data source table with user specified schema") {
+    import testImplicits._
+    val df = sparkContext.parallelize(1 to 10).map(i => (i, i.toString)).toDF("num", "str")
+
+    // Case 1: with partitioning columns but no schema: Option("num")
+    // Case 2: without schema and partitioning columns: None
+    Seq(Option("num"), None).foreach { partitionCols =>
+      withTempPath { pathToPartitionedTable =>
+        df.write.format("parquet").partitionBy("num")
+          .save(pathToPartitionedTable.getCanonicalPath)
+        val (tableSchema, partCols) =
+          createDataSourceTable(
+            pathToPartitionedTable,
+            userSpecifiedSchema = Option("num int, str string"),
+            userSpecifiedPartitionCols = partitionCols)
+        assert(tableSchema ==
+          StructType(StructField("num", IntegerType, nullable = true) ::
+            StructField("str", StringType, nullable = true) :: Nil))
+        assert(partCols.mkString(", ") == partitionCols.getOrElse(""))
+      }
+    }
+  }
+
+  test("Create non-partitioned data source table without user specified schema") {
+    import testImplicits._
+    val df = sparkContext.parallelize(1 to 10).map(i => (i, i.toString)).toDF("num", "str")
+
+    // Case 1: with partitioning columns but no schema: Option("inexistentColumns")
+    // Case 2: without schema and partitioning columns: None
+    Seq(Option("inexistentColumns"), None).foreach { partitionCols =>
+      withTempPath { pathToNonPartitionedTable =>
+        df.write.format("parquet").save(pathToNonPartitionedTable.getCanonicalPath)
+        val (tableSchema, partCols) =
+          createDataSourceTable(
+            pathToNonPartitionedTable,
+            userSpecifiedSchema = None,
+            userSpecifiedPartitionCols = partitionCols)
+        assert(tableSchema ==
+          StructType(StructField("num", IntegerType, nullable = true) ::
+            StructField("str", StringType, nullable = true) :: Nil))
+        assert(partCols.isEmpty)
+      }
+    }
+  }
+
+  test("Create non-partitioned data source table with user specified schema") {
+    import testImplicits._
[GitHub] spark issue #14264: [SPARK-11976][SPARKR] Support "." character in DataFrame...

2016-07-20 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14264
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62640/
Test FAILed.



[GitHub] spark pull request #14207: [SPARK-16552] [SQL] Store the Inferred Schemas in...

2016-07-20 Thread gatorsmile

Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/14207#discussion_r71633776
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala ---
@@ -252,6 +252,222 @@ class DDLSuite extends QueryTest with SharedSQLContext with BeforeAndAfterEach {
     }
   }
 
+  private def createDataSourceTable(
+      path: File,
+      userSpecifiedSchema: Option[String],
+      userSpecifiedPartitionCols: Option[String]): (StructType, Seq[String]) = {
+    var tableSchema = StructType(Nil)
+    var partCols = Seq.empty[String]
+
+    val tabName = "tab1"
+    withTable(tabName) {
+      val partitionClause =
+        userSpecifiedPartitionCols.map(p => s"PARTITIONED BY ($p)").getOrElse("")
+      val schemaClause = userSpecifiedSchema.map(s => s"($s)").getOrElse("")
+      sql(
+        s"""
+           |CREATE TABLE $tabName $schemaClause
+           |USING parquet
+           |OPTIONS (
+           |  path '$path'
+           |)
+           |$partitionClause
+         """.stripMargin)
+      val tableMetadata = spark.sessionState.catalog.getTableMetadata(TableIdentifier(tabName))
+
+      tableSchema = DDLUtils.getSchemaFromTableProperties(tableMetadata)
+      partCols = DDLUtils.getPartitionColumnsFromTableProperties(tableMetadata)
+    }
+    (tableSchema, partCols)
+  }
+
+  test("Create partitioned data source table without user specified schema") {
+    import testImplicits._
+    val df = sparkContext.parallelize(1 to 10).map(i => (i, i.toString)).toDF("num", "str")
+
+    // Case 1: with partitioning columns but no schema: Option("inexistentColumns")
+    // Case 2: without schema and partitioning columns: None
+    Seq(Option("inexistentColumns"), None).foreach { partitionCols =>
+      withTempPath { pathToPartitionedTable =>
+        df.write.format("parquet").partitionBy("num")
+          .save(pathToPartitionedTable.getCanonicalPath)
+        val (tableSchema, partCols) =
+          createDataSourceTable(
+            pathToPartitionedTable,
+            userSpecifiedSchema = None,
+            userSpecifiedPartitionCols = partitionCols)
+        assert(tableSchema ==
+          StructType(StructField("str", StringType, nullable = true) ::
--- End diff --

Sure, will do. 



[GitHub] spark issue #14264: [SPARK-11976][SPARKR] Support "." character in DataFrame...

2016-07-20 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14264
  
**[Test build #62640 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62640/consoleFull)**
 for PR 14264 at commit 
[`60d0145`](https://github.com/apache/spark/commit/60d014501661d1c400675124c99ac7896a184250).
 * This patch **fails SparkR unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #14264: [SPARK-11976][SPARKR] Support "." character in DataFrame...

2016-07-20 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14264
  
Merged build finished. Test FAILed.



[GitHub] spark issue #14164: [SPARK-16629] Allow comparisons between UDTs and Datatyp...

2016-07-20 Thread cloud-fan

Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/14164
  
can you add a regression test in your PR? thanks



[GitHub] spark issue #14207: [SPARK-16552] [SQL] Store the Inferred Schemas into Exte...

2016-07-20 Thread cloud-fan

Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/14207
  
cc @yhuai @liancheng to take another look



[GitHub] spark pull request #14207: [SPARK-16552] [SQL] Store the Inferred Schemas in...

2016-07-20 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/14207#discussion_r71633259
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala ---
@@ -252,6 +252,222 @@ class DDLSuite extends QueryTest with SharedSQLContext with BeforeAndAfterEach {
 }
   }
 
+  private def createDataSourceTable(
+  path: File,
+  userSpecifiedSchema: Option[String],
+  userSpecifiedPartitionCols: Option[String]): (StructType, Seq[String]) = {
+var tableSchema = StructType(Nil)
+var partCols = Seq.empty[String]
+
+val tabName = "tab1"
+withTable(tabName) {
+  val partitionClause =
+userSpecifiedPartitionCols.map(p => s"PARTITIONED BY ($p)").getOrElse("")
+  val schemaClause = userSpecifiedSchema.map(s => s"($s)").getOrElse("")
+  sql(
+s"""
+   |CREATE TABLE $tabName $schemaClause
+   |USING parquet
+   |OPTIONS (
+   |  path '$path'
+   |)
+   |$partitionClause
+ """.stripMargin)
+  val tableMetadata = spark.sessionState.catalog.getTableMetadata(TableIdentifier(tabName))
+
+  tableSchema = DDLUtils.getSchemaFromTableProperties(tableMetadata)
+  partCols = DDLUtils.getPartitionColumnsFromTableProperties(tableMetadata)
+}
+(tableSchema, partCols)
+  }
+
+  test("Create partitioned data source table without user specified schema") {
+import testImplicits._
+val df = sparkContext.parallelize(1 to 10).map(i => (i, i.toString)).toDF("num", "str")
+
+// Case 1: with partitioning columns but no schema: Option("inexistentColumns")
+// Case 2: without schema and partitioning columns: None
+Seq(Option("inexistentColumns"), None).foreach { partitionCols =>
+  withTempPath { pathToPartitionedTable =>
+df.write.format("parquet").partitionBy("num")
+  .save(pathToPartitionedTable.getCanonicalPath)
+val (tableSchema, partCols) =
+  createDataSourceTable(
+pathToPartitionedTable,
+userSpecifiedSchema = None,
+userSpecifiedPartitionCols = partitionCols)
+assert(tableSchema ==
+  StructType(StructField("str", StringType, nullable = true) ::
+StructField("num", IntegerType, nullable = true) :: Nil))
+assert(partCols == Seq("num"))
+  }
+}
+  }
+
+  test("Create partitioned data source table with user specified schema") {
+import testImplicits._
+val df = sparkContext.parallelize(1 to 10).map(i => (i, i.toString)).toDF("num", "str")
+
+// Case 1: with schema and partitioning columns: Option("num")
+// Case 2: with schema but without partitioning columns: None
+Seq(Option("num"), None).foreach { partitionCols =>
+  withTempPath { pathToPartitionedTable =>
+df.write.format("parquet").partitionBy("num")
+  .save(pathToPartitionedTable.getCanonicalPath)
+val (tableSchema, partCols) =
+  createDataSourceTable(
+pathToPartitionedTable,
+userSpecifiedSchema = Option("num int, str string"),
+userSpecifiedPartitionCols = partitionCols)
+assert(tableSchema ==
+  StructType(StructField("num", IntegerType, nullable = true) ::
+StructField("str", StringType, nullable = true) :: Nil))
+assert(partCols.mkString(", ") == partitionCols.getOrElse(""))
+  }
+}
+  }
+
+  test("Create non-partitioned data source table without user specified schema") {
+import testImplicits._
+val df = sparkContext.parallelize(1 to 10).map(i => (i, i.toString)).toDF("num", "str")
+
+// Case 1: with partitioning columns but no schema: Option("inexistentColumns")
+// Case 2: without schema and partitioning columns: None
+Seq(Option("inexistentColumns"), None).foreach { partitionCols =>
+  withTempPath { pathToNonPartitionedTable =>
+df.write.format("parquet").save(pathToNonPartitionedTable.getCanonicalPath)
+val (tableSchema, partCols) =
+  createDataSourceTable(
+pathToNonPartitionedTable,
+userSpecifiedSchema = None,
+userSpecifiedPartitionCols = partitionCols)
+assert(tableSchema ==
+  StructType(StructField("num", IntegerType, nullable = true) ::
+StructField("str", StringType, nullable = true) :: Nil))
+assert(partCols.isEmpty)
+  }
+}
+  }
+
+  test("Create non-partitioned data source table with user specified schema") {
+import testImplicits._

[GitHub] spark pull request #14207: [SPARK-16552] [SQL] Store the Inferred Schemas in...

2016-07-20 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/14207#discussion_r71632699
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala ---
@@ -252,6 +252,222 @@ class DDLSuite extends QueryTest with SharedSQLContext with BeforeAndAfterEach {
 }
   }
 
+  private def createDataSourceTable(
+  path: File,
+  userSpecifiedSchema: Option[String],
+  userSpecifiedPartitionCols: Option[String]): (StructType, Seq[String]) = {
+var tableSchema = StructType(Nil)
+var partCols = Seq.empty[String]
+
+val tabName = "tab1"
+withTable(tabName) {
+  val partitionClause =
+userSpecifiedPartitionCols.map(p => s"PARTITIONED BY ($p)").getOrElse("")
+  val schemaClause = userSpecifiedSchema.map(s => s"($s)").getOrElse("")
+  sql(
+s"""
+   |CREATE TABLE $tabName $schemaClause
+   |USING parquet
+   |OPTIONS (
+   |  path '$path'
+   |)
+   |$partitionClause
+ """.stripMargin)
+  val tableMetadata = spark.sessionState.catalog.getTableMetadata(TableIdentifier(tabName))
+
+  tableSchema = DDLUtils.getSchemaFromTableProperties(tableMetadata)
+  partCols = DDLUtils.getPartitionColumnsFromTableProperties(tableMetadata)
+}
+(tableSchema, partCols)
+  }
+
+  test("Create partitioned data source table without user specified schema") {
+import testImplicits._
+val df = sparkContext.parallelize(1 to 10).map(i => (i, i.toString)).toDF("num", "str")
+
+// Case 1: with partitioning columns but no schema: Option("inexistentColumns")
+// Case 2: without schema and partitioning columns: None
+Seq(Option("inexistentColumns"), None).foreach { partitionCols =>
+  withTempPath { pathToPartitionedTable =>
+df.write.format("parquet").partitionBy("num")
+  .save(pathToPartitionedTable.getCanonicalPath)
+val (tableSchema, partCols) =
+  createDataSourceTable(
+pathToPartitionedTable,
+userSpecifiedSchema = None,
+userSpecifiedPartitionCols = partitionCols)
+assert(tableSchema ==
+  StructType(StructField("str", StringType, nullable = true) ::
--- End diff --

nit: `new StructType().add...`
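
For illustration, the builder form this nit appears to point at could look like the sketch below. It assumes `spark-sql` is on the classpath; the object name `StructTypeBuilderSketch` and the two `val` names are hypothetical and are not part of the PR. It relies on `StructType.add(name, dataType)` creating a nullable field, so both forms should describe the same schema.

```scala
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

// Hypothetical wrapper object, used only to make the sketch self-contained.
object StructTypeBuilderSketch {
  // Cons-list form, as written in the diff under review.
  val listStyle: StructType =
    StructType(
      StructField("str", StringType, nullable = true) ::
        StructField("num", IntegerType, nullable = true) :: Nil)

  // Builder form: each add(name, dataType) appends a nullable field
  // and returns a new StructType.
  val builderStyle: StructType =
    new StructType()
      .add("str", StringType)
      .add("num", IntegerType)

  def main(args: Array[String]): Unit = {
    // Both forms are expected to compare equal field by field.
    assert(listStyle == builderStyle)
  }
}
```

The builder form avoids the trailing `:: Nil` and the repeated `nullable = true`, which is presumably why it was suggested for the assertions in this test suite.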



[GitHub] spark issue #14022: [SPARK-16272][core] Allow config values to reference con...

2016-07-20 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14022
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62637/
Test PASSed.



[GitHub] spark issue #14022: [SPARK-16272][core] Allow config values to reference con...

2016-07-20 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14022
  
Merged build finished. Test PASSed.



[GitHub] spark issue #14022: [SPARK-16272][core] Allow config values to reference con...

2016-07-20 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14022
  
**[Test build #62637 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62637/consoleFull)**
 for PR 14022 at commit 
[`ed5c18b`](https://github.com/apache/spark/commit/ed5c18baddbd7ceb4157f5a31bf150d2ef9e7d19).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #14210: [SPARK-16556] [SPARK-16559] [SQL] Fix Two Bugs in Bucket...

2016-07-20 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14210
  
Merged build finished. Test PASSed.


