[GitHub] spark issue #13881: [SPARK-3723] [MLlib] Adding instrumentation to random fo...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13881 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/61151/ Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #13881: [SPARK-3723] [MLlib] Adding instrumentation to random fo...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13881 Merged build finished. Test PASSed.
[GitHub] spark issue #13881: [SPARK-3723] [MLlib] Adding instrumentation to random fo...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13881

**[Test build #61151 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61151/consoleFull)** for PR 13881 at commit [`7fb031e`](https://github.com/apache/spark/commit/7fb031eff488ca657e89220193866af0b39a358a).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #13883: [SPARK-16179] [PYSPARK] fix bugs for Python udf in gener...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/13883 Thanks - can you describe the bug in the PR description?
[GitHub] spark issue #13883: [SPARK-16179] [PYSPARK] fix bugs for Python udf in gener...
Github user davies commented on the issue: https://github.com/apache/spark/pull/13883 https://gist.github.com/vlad17/964c0a93510d79cb130c33700f6139b7
[GitHub] spark pull request #13884: [SPARK-16181][SQL] outer join with isNull filter ...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/13884#discussion_r68355286

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
@@ -688,6 +688,14 @@ object FoldablePropagation extends Rule[LogicalPlan] {
        case c: Command =>
          stop = true
          c
+       // For outer join, although its output attributes are derived from its children, they are
+       // actually different attributes: the output of outer join is not always picked from its
+       // children, but can also be null.
+       // TODO(cloud-fan): It seems more reasonable to use new attributes as the output attributes
+       // of outer join.
--- End diff --

Yea, I think we should consider it for 2.1.
[GitHub] spark pull request #13884: [SPARK-16181][SQL] outer join with isNull filter ...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/13884#discussion_r68355194

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala ---
@@ -1541,4 +1541,13 @@ class DataFrameSuite extends QueryTest with SharedSQLContext {
     val df = Seq(1, 1, 2).toDF("column.with.dot")
     checkAnswer(df.distinct(), Row(1) :: Row(2) :: Nil)
   }
+
+  test("SPARK-16181: outer join with isNull filter") {
+    val left = Seq("x").toDF("col")
+    val right = Seq("y").toDF("col").withColumn("new", lit(true))
+    val joined = left.join(right, left("col") === right("col"), "left_outer")
+
+    checkAnswer(joined, Row("x", null, null))
+    checkAnswer(joined.filter($"new".isNull), Row("x", null, null))
--- End diff --

Ah, this is subtle. `new` is replaced back to the original `col` from the right side, which is not nullable. Then `isNull` just returns false.
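The subtlety yhuai points out can be modeled without Spark. Below is a toy Python sketch (hypothetical helper, not Catalyst code) of why answering `IS NULL` from the pre-join, non-nullable right-side attribute gives the wrong answer after a left outer join:

```python
# Toy model (plain Python, not Catalyst) of the bug: after a left outer
# join, right-side columns become nullable even if they were
# non-nullable in the right child.

def left_outer_join(left_rows, right_rows):
    """Join on 'col'; unmatched left rows get None for every
    right-side column."""
    joined = []
    for l in left_rows:
        match = next((r for r in right_rows if r["col"] == l["col"]), None)
        joined.append({
            "left_col": l["col"],
            "right_col": match["col"] if match else None,
            "new": match["new"] if match else None,
        })
    return joined

rows = left_outer_join([{"col": "x"}], [{"col": "y", "new": True}])

# Correct semantics: the filter `new IS NULL` keeps the unmatched row.
kept = [r for r in rows if r["new"] is None]

# The mis-optimization amounts to answering `new IS NULL` from the
# pre-join attribute, which was non-nullable, i.e. constant-folding
# the predicate to False and dropping the row.
mis_optimized = [r for r in rows if False]
```

The sketch only captures the nullability flip; Catalyst's actual fix teaches `FoldablePropagation` to stop treating an outer join's output attributes as identical to its children's.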
[GitHub] spark issue #13883: [SPARK-16179] [PYSPARK] fix bugs for Python udf in gener...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/13883 What is the bug?
[GitHub] spark issue #13881: [SPARK-3723] [MLlib] Adding instrumentation to random fo...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13881 **[Test build #61151 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61151/consoleFull)** for PR 13881 at commit [`7fb031e`](https://github.com/apache/spark/commit/7fb031eff488ca657e89220193866af0b39a358a).
[GitHub] spark issue #13881: [SPARK-3723] [MLlib] Adding instrumentation to random fo...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13881

**[Test build #61150 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61150/consoleFull)** for PR 13881 at commit [`f5a6893`](https://github.com/apache/spark/commit/f5a6893a1314de5f6a33bd6fb912a77a6cb19fa1).

* This patch **fails Scala style tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #13881: [SPARK-3723] [MLlib] Adding instrumentation to random fo...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13881 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/61150/ Test FAILed.
[GitHub] spark issue #13881: [SPARK-3723] [MLlib] Adding instrumentation to random fo...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13881 Merged build finished. Test FAILed.
[GitHub] spark issue #13881: [SPARK-3723] [MLlib] Adding instrumentation to random fo...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13881 **[Test build #61150 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61150/consoleFull)** for PR 13881 at commit [`f5a6893`](https://github.com/apache/spark/commit/f5a6893a1314de5f6a33bd6fb912a77a6cb19fa1).
[GitHub] spark issue #13680: [SPARK-15962][SQL] Introduce additional implementation wi...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/13680 Having 2 implementations is also kind of a branch: the virtual function call needs to be dispatched between these 2 implementations, while a single implementation can be marked as final and doesn't have this overhead.
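The design choice under discussion can be sketched in plain Python (a toy model of the tradeoff, not Spark's actual vector classes): one implementation pays a per-call branch in `isNullAt`, while two specialized implementations avoid the branch at the cost of dispatching between them at the call site.

```python
# Toy model of the tradeoff (hypothetical classes, not Spark's real API).

class BranchingVector:
    """Single implementation: isNullAt carries a branch on every call."""
    def __init__(self, values, nulls=None):
        self.values = values
        self.nulls = nulls  # None means "no nulls tracked"

    def is_null_at(self, i):
        if self.nulls is None:  # the per-call conditional branch
            return False
        return self.nulls[i]


class NoNullsVector:
    """Specialization 1: statically known to contain no nulls."""
    def __init__(self, values):
        self.values = values

    def is_null_at(self, i):
        return False


class NullableVector:
    """Specialization 2: direct null-flag lookup, no extra branch."""
    def __init__(self, values, nulls):
        self.values = values
        self.nulls = nulls

    def is_null_at(self, i):
        return self.nulls[i]
```

On the JVM, a call site that sees both specializations must dispatch between them, which is cloud-fan's point that "having 2 implementations is also kind of a branch"; only a single (ideally `final`) implementation lets the JIT devirtualize the call.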
[GitHub] spark pull request #13701: [SPARK-15639][SQL] Try to push down filter at Row...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/13701#discussion_r68353312

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileSourceStrategy.scala ---
@@ -85,8 +85,15 @@ private[sql] object FileSourceStrategy extends Strategy with Logging {
         ExpressionSet(normalizedFilters.filter(_.references.subsetOf(partitionSet)))
       logInfo(s"Pruning directories with: ${partitionKeyFilters.mkString(",")}")

-      val dataColumns =
-        l.resolve(fsRelation.dataSchema, fsRelation.sparkSession.sessionState.analyzer.resolver)
+      val dataColumns = l.resolve(fsRelation.dataSchema,
+        fsRelation.sparkSession.sessionState.analyzer.resolver).map { c =>
+        fsRelation.dataSchema.find(_.name == c.name).map { f =>
+          c match {
+            case a: AttributeReference => a.withMetadata(f.metadata)
+            case _ => c
+          }
+        }.getOrElse(c)
+      }
--- End diff --

We use metadata in the merged schema to mark optional fields (those not existing in all partitions), and this metadata is lost after resolving. If we don't add it back, the pushed-down filters will fail with a non-existing field error.
[GitHub] spark issue #13844: [SPARK-16133][ML] model loading backward compatibility f...
Github user mengxr commented on the issue: https://github.com/apache/spark/pull/13844 LGTM. Merged into master and branch-2.0. Thanks!
[GitHub] spark issue #13884: [SPARK-16181][SQL] outer join with isNull filter may ret...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13884 **[Test build #61149 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61149/consoleFull)** for PR 13884 at commit [`9316d7f`](https://github.com/apache/spark/commit/9316d7f0baec6d59e8a5a88cd872eca3e6720f9d).
[GitHub] spark pull request #13883: [SPARK-16179] [PYSPARK] fix bugs for Python udf i...
GitHub user davies opened a pull request: https://github.com/apache/spark/pull/13883

[SPARK-16179] [PYSPARK] fix bugs for Python udf in generate

## What changes were proposed in this pull request?

This PR fixes a bug that occurs when a Python UDF is used in explode (a generator): GenerateExec requires that all the attributes in its expressions be resolvable from its children at creation time, so we should replace the children first, then replace its expressions.

## How was this patch tested?

Added regression tests.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/davies/spark udf_in_generate

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/13883.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #13883

commit b9fd4bfb93dea18331987b83336b11f4f1f6e388
Author: Davies Liu
Date: 2016-06-24T04:43:35Z

    fix udf in generate
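The ordering constraint in the fix ("replace the children first, then the expressions") can be illustrated with a toy plan-node model (hypothetical names, not GenerateExec's real API): a node's expressions must resolve against whatever its children output at the moment the node is constructed.

```python
# Toy plan-node model (not Spark's GenerateExec) showing why children
# must be replaced before expressions that reference the new child's output.

class Leaf:
    def __init__(self, output):
        self.output = output  # attribute names this node produces

class PlanNode:
    """Expressions may only reference attributes the children produce;
    the invariant is checked at construction time, as in GenerateExec."""
    def __init__(self, expr_refs, children):
        available = {a for c in children for a in c.output}
        unresolved = set(expr_refs) - available
        if unresolved:
            raise ValueError(f"unresolved attributes: {unresolved}")
        self.expr_refs = expr_refs
        self.children = children

old_child = Leaf(["a"])                # before rewriting the Python UDF
new_child = Leaf(["a", "pythonUDF0"])  # after: child also emits the UDF result

# Wrong order: new expressions against the old child fail to resolve,
# because "pythonUDF0" is not yet available.
try:
    PlanNode(["pythonUDF0"], [old_child])
    wrong_order_ok = True
except ValueError:
    wrong_order_ok = False

# Right order (what the fix does): install the new child first.
fixed = PlanNode(["pythonUDF0"], [new_child])
```

The attribute name `pythonUDF0` is illustrative; the point is only that construction-time resolution forces the child swap to happen before the expression swap.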
[GitHub] spark issue #13844: [SPARK-16133][ML] model loading backward compatibility f...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13844

**[Test build #61145 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61145/consoleFull)** for PR 13844 at commit [`718023d`](https://github.com/apache/spark/commit/718023d9fa899af580cc45851db7d53c83fe1efa).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #13884: [SPARK-16181][SQL] outer join with isNull filter ...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/13884#discussion_r68353553

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
@@ -688,6 +688,14 @@ object FoldablePropagation extends Rule[LogicalPlan] {
        case c: Command =>
          stop = true
          c
+       // For outer join, although its output attributes are derived from its children, they are
+       // actually different attributes: the output of outer join is not always picked from its
+       // children, but can also be null.
+       // TODO(cloud-fan): It seems more reasonable to use new attributes as the output attributes
+       // of outer join.
--- End diff --

cc @marmbrus @yhuai @rxin
[GitHub] spark pull request #13884: [SPARK-16181][SQL] outer join with isNull filter ...
GitHub user cloud-fan opened a pull request: https://github.com/apache/spark/pull/13884

[SPARK-16181][SQL] outer join with isNull filter may return wrong result

## What changes were proposed in this pull request?

The root cause is that the output attributes of an outer join are derived from its children, while they are actually different attributes (an outer join can return null). We have already added some special logic to handle it, e.g. `PushPredicateThroughJoin` won't push down predicates through the outer join side, and `FixNullability`. This PR adds one more piece of special logic, in `FoldablePropagation`.

## How was this patch tested?

New test in `DataFrameSuite`.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/cloud-fan/spark bug

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/13884.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #13884

commit 9316d7f0baec6d59e8a5a88cd872eca3e6720f9d
Author: Wenchen Fan
Date: 2016-06-24T04:48:58Z

    fix bug
[GitHub] spark issue #13701: [SPARK-15639][SQL] Try to push down filter at RowGroups ...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/13701 @yhuai As I mentioned in the description, I am not sure whether we can manipulate row groups as we want, but I have manually tested it to show the number of rows actually scanned.
[GitHub] spark pull request #13844: [SPARK-16133][ML] model loading backward compatib...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/13844
[GitHub] spark issue #13883: [SPARK-16179] [PYSPARK] fix bugs for Python udf in gener...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13883 **[Test build #61148 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61148/consoleFull)** for PR 13883 at commit [`b9fd4bf`](https://github.com/apache/spark/commit/b9fd4bfb93dea18331987b83336b11f4f1f6e388).
[GitHub] spark issue #13844: [SPARK-16133][ML] model loading backward compatibility f...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13844 Merged build finished. Test PASSed.
[GitHub] spark issue #13844: [SPARK-16133][ML] model loading backward compatibility f...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13844 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/61145/ Test PASSed.
[GitHub] spark issue #13701: [SPARK-15639][SQL] Try to push down filter at RowGroups ...
Github user yhuai commented on the issue: https://github.com/apache/spark/pull/13701 Thank you for the testing. Can you also test the case where a file contains multiple row groups and we can avoid scanning the unneeded ones? Also, since it is not fixing a critical bug, let's not merge it into branch-2.0.
[GitHub] spark pull request #13877: [SPARK-16142] [R] group naiveBayes method docs in...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/13877
[GitHub] spark issue #13877: [SPARK-16142] [R] group naiveBayes method docs in a sing...
Github user mengxr commented on the issue: https://github.com/apache/spark/pull/13877 Merged into master and branch-2.0. Thanks for reviewing!
[GitHub] spark issue #13881: [SPARK-3723] [MLlib] Adding instrumentation to random fo...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13881 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/61144/ Test PASSed.
[GitHub] spark issue #13881: [SPARK-3723] [MLlib] Adding instrumentation to random fo...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13881 Merged build finished. Test PASSed.
[GitHub] spark issue #13881: [SPARK-3723] [MLlib] Adding instrumentation to random fo...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13881

**[Test build #61144 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61144/consoleFull)** for PR 13881 at commit [`bd7d24d`](https://github.com/apache/spark/commit/bd7d24d4f5a79eca6ff9629706c254beba74bc45).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #13701: [SPARK-15639][SQL] Try to push down filter at Row...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/13701#discussion_r68352913

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileSourceStrategy.scala ---
@@ -85,8 +85,15 @@ private[sql] object FileSourceStrategy extends Strategy with Logging {
         ExpressionSet(normalizedFilters.filter(_.references.subsetOf(partitionSet)))
       logInfo(s"Pruning directories with: ${partitionKeyFilters.mkString(",")}")

-      val dataColumns =
-        l.resolve(fsRelation.dataSchema, fsRelation.sparkSession.sessionState.analyzer.resolver)
+      val dataColumns = l.resolve(fsRelation.dataSchema,
+        fsRelation.sparkSession.sessionState.analyzer.resolver).map { c =>
+        fsRelation.dataSchema.find(_.name == c.name).map { f =>
+          c match {
+            case a: AttributeReference => a.withMetadata(f.metadata)
+            case _ => c
+          }
+        }.getOrElse(c)
+      }
--- End diff --

I guess a better question is whether it is part of the bug fix?
[GitHub] spark pull request #13701: [SPARK-15639][SQL] Try to push down filter at Row...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/13701#discussion_r68352884

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileSourceStrategy.scala ---
@@ -85,8 +85,15 @@ private[sql] object FileSourceStrategy extends Strategy with Logging {
         ExpressionSet(normalizedFilters.filter(_.references.subsetOf(partitionSet)))
       logInfo(s"Pruning directories with: ${partitionKeyFilters.mkString(",")}")

-      val dataColumns =
-        l.resolve(fsRelation.dataSchema, fsRelation.sparkSession.sessionState.analyzer.resolver)
+      val dataColumns = l.resolve(fsRelation.dataSchema,
+        fsRelation.sparkSession.sessionState.analyzer.resolver).map { c =>
+        fsRelation.dataSchema.find(_.name == c.name).map { f =>
+          c match {
+            case a: AttributeReference => a.withMetadata(f.metadata)
+            case _ => c
+          }
+        }.getOrElse(c)
+      }
--- End diff --

Do we need this?
[GitHub] spark issue #13865: [SPARK-13709][SQL] Initialize deserializer with both tab...
Github user yhuai commented on the issue: https://github.com/apache/spark/pull/13865 lgtm
[GitHub] spark issue #13701: [SPARK-15639][SQL] Try to push down filter at RowGroups ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13701 **[Test build #61147 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61147/consoleFull)** for PR 13701 at commit [`36fd059`](https://github.com/apache/spark/commit/36fd0596302a4ef7e411c2fe45a279082adaf69a).
[GitHub] spark issue #13680: [SPARK-15962][SQL] Introduce additonal implementation wi...
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/13680

@cloud-fan, for the first issue, we are on the same page. Your proposal is what I am thinking about as a possible solution; I will do that. For the second issue, it seems to be a design choice between:
1. introduce one conditional branch in `isNullAt()` within a single implementation
2. have two implementations, each without a conditional branch in `isNullAt()`
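To make that trade-off concrete, here is a hypothetical sketch of the two options; none of these class or trait names come from Spark.

```scala
// Option 1 vs. Option 2 for a columnar row's null check (illustrative only).
trait NullCheck { def isNullAt(ordinal: Int): Boolean }

// Option 1: one implementation, paying one conditional branch per call.
final class BranchingRow(nullBits: Array[Boolean], hasNulls: Boolean)
    extends NullCheck {
  def isNullAt(ordinal: Int): Boolean =
    if (hasNulls) nullBits(ordinal) else false
}

// Option 2: two implementations, each branch-free in isNullAt();
// the right one is chosen once, up front.
final class NullableRow(nullBits: Array[Boolean]) extends NullCheck {
  def isNullAt(ordinal: Int): Boolean = nullBits(ordinal)
}

final class NonNullableRow extends NullCheck {
  def isNullAt(ordinal: Int): Boolean = false
}
```

Option 2 trades extra code for a branch-free hot path; whether it wins in practice depends on whether call sites stay monomorphic enough for the JIT to devirtualize them.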
[GitHub] spark issue #13865: [SPARK-13709][SQL] Initialize deserializer with both tab...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13865 **[Test build #61146 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61146/consoleFull)** for PR 13865 at commit [`85e0eed`](https://github.com/apache/spark/commit/85e0eedd1d610d5c2cf486a43cda3401df856c33).
[GitHub] spark issue #13865: [SPARK-13709][SQL] Initialize deserializer with both tab...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/13865 LGTM
[GitHub] spark issue #13701: [SPARK-15639][SQL] Try to push down filter at RowGroups ...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/13701 retest this please.
[GitHub] spark issue #13877: [SPARK-16142] [R] group naiveBayes method docs in a sing...
Github user shivaram commented on the issue: https://github.com/apache/spark/pull/13877 The new document in the screenshot looks pretty good to me.
[GitHub] spark pull request #13865: [SPARK-13709][SQL] Initialize deserializer with b...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/13865#discussion_r68352383

--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/QueryPartitionSuite.scala ---
@@ -65,4 +68,77 @@ class QueryPartitionSuite extends QueryTest with SQLTestUtils with TestHiveSingl
     sql("DROP TABLE IF EXISTS createAndInsertTest")
   }
 }
+
+  test("SPARK-13709: reading partitioned Avro table with nested schema") {
+    withTempDir { dir =>
+      val path = dir.getCanonicalPath
+      val tableName = "spark_13709"
+      val tempTableName = "spark_13709_temp"
+
+      new File(path, tableName).mkdir()
+      new File(path, tempTableName).mkdir()
+
+      val avroSchema =
+        """{
+          |  "name": "test_record",
+          |  "type": "record",
+          |  "fields": [ {
+          |    "name": "f0",
+          |    "type": "int"
+          |  }, {
+          |    "name": "f1",
+          |    "type": {
+          |      "type": "record",
+          |      "name": "inner",
+          |      "fields": [ {
+          |        "name": "f10",
+          |        "type": "int"
+          |      }, {
+          |        "name": "f11",
+          |        "type": "double"
+          |      } ]
+          |    }
+          |  } ]
+          |}
+        """.stripMargin
+
+      withTable(tableName, tempTableName) {
+        // Creates the external partitioned Avro table to be tested.
+        sql(
+          s"""CREATE EXTERNAL TABLE $tableName
+             |PARTITIONED BY (ds STRING)
+             |ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
+             |STORED AS
+             |  INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
+             |  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
+             |LOCATION '$path/$tableName'
+             |TBLPROPERTIES ('avro.schema.literal' = '$avroSchema')
+           """.stripMargin
+        )
+
+        // Creates an temporary Avro table used to prepare testing Avro file.
+        sql(
+          s"""CREATE EXTERNAL TABLE $tempTableName
+             |ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
+             |STORED AS
+             |  INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
+             |  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
+             |LOCATION '$path/$tempTableName'
+             |TBLPROPERTIES ('avro.schema.literal' = '$avroSchema')
+           """.stripMargin
+        )
+
+        // Generates Avro data.
+        sql(s"INSERT OVERWRITE TABLE $tempTableName SELECT 1, STRUCT(2, 2.5)")
+
+        // Adds generated Avro data as a new partition to the testing table.
+        sql(s"ALTER TABLE $tableName ADD PARTITION (ds = 'foo') LOCATION '$path/$tempTableName'")
+
+        checkAnswer(
+          sql(s"SELECT * FROM $tableName"),
--- End diff --

it's inside `withTable`, tables will be dropped automatically.
[GitHub] spark pull request #13865: [SPARK-13709][SQL] Initialize deserializer with b...
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/13865#discussion_r68352349

--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/QueryPartitionSuite.scala ---
(same diff hunk as quoted above)
--- End diff --

Yea, sure.
[GitHub] spark pull request #13865: [SPARK-13709][SQL] Initialize deserializer with b...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/13865#discussion_r68352336

--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/QueryPartitionSuite.scala ---
(same diff hunk as quoted above)
--- End diff --

oh, nvm. We have withTable.
[GitHub] spark pull request #13865: [SPARK-13709][SQL] Initialize deserializer with b...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/13865#discussion_r68352322

--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/QueryPartitionSuite.scala ---
(same diff hunk as quoted above)
--- End diff --

drop the table at the end of this test?
[GitHub] spark pull request #13865: [SPARK-13709][SQL] Initialize deserializer with b...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/13865#discussion_r68352282

--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/QueryPartitionSuite.scala ---
(same diff hunk as quoted above)
--- End diff --

yea, it is a good idea to add comments to explain why this one failed.
[GitHub] spark pull request #13865: [SPARK-13709][SQL] Initialize deserializer with b...
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/13865#discussion_r68352258

--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/QueryPartitionSuite.scala ---
(same diff hunk as quoted above)
--- End diff --

Yea, when reading data from a partition, the Avro deserializer needs to know the Avro schema defined in the table properties (`avro.schema.literal`). However, originally we only initialized the deserializer using the partition properties, which don't contain `avro.schema.literal`. This PR fixes it by merging the two sets of properties.
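The merge described in that comment can be sketched in isolation. `mergeProperties` is an illustrative name (the actual fix lives in Spark's Hive table-reading path), and the override order — partition-level values winning on conflict — is an assumption for this sketch:

```scala
// Hedged sketch of merging table-level and partition-level SerDe
// properties so that table-only keys such as 'avro.schema.literal'
// survive when a partition's own properties omit them.
def mergeProperties(
    tableProps: Map[String, String],
    partitionProps: Map[String, String]): Map[String, String] =
  // Start from the table properties; partition entries override on
  // key conflicts (assumed, not verified against Spark's code).
  tableProps ++ partitionProps
```

With this shape, a partition that carries no `avro.schema.literal` still sees the table's schema literal after the merge, which is the failure mode the PR addresses.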
[GitHub] spark issue #13701: [SPARK-15639][SQL] Try to push down filter at RowGroups ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13701 Merged build finished. Test FAILed.
[GitHub] spark issue #13701: [SPARK-15639][SQL] Try to push down filter at RowGroups ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13701 **[Test build #61143 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61143/consoleFull)** for PR 13701 at commit [`36fd059`](https://github.com/apache/spark/commit/36fd0596302a4ef7e411c2fe45a279082adaf69a).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #13701: [SPARK-15639][SQL] Try to push down filter at RowGroups ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13701 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/61143/ Test FAILed.
[GitHub] spark pull request #13877: [SPARK-16142] [R] group naiveBayes method docs in...
Github user shivaram commented on a diff in the pull request: https://github.com/apache/spark/pull/13877#discussion_r68352034

--- Diff: R/pkg/R/mllib.R ---
@@ -390,23 +376,41 @@ setMethod("predict", signature(object = "KMeansModel"),
     return(dataFrame(callJMethod(object@jobj, "transform", newData@sdf)))
   })

-#' Fit a Bernoulli naive Bayes model
+#' Naive Bayes Models
 #'
-#' Fit a Bernoulli naive Bayes model on a Spark DataFrame (only categorical data is supported).
+#' \code{spark.naiveBayes} fits a Bernoulli naive Bayes model against a SparkDataFrame.
+#' Users can call \code{summary} to print a summary of the fitted model, \code{predict} to make
+#' predictions on new data, and \code{write.ml}/\code{read.ml} to save/load fitted models.
+#' Only categorical data is supported.
 #'
-#' @param data SparkDataFrame for training
+#' @param data A \code{SparkDataFrame} of observations and labels for model fitting
 #' @param formula A symbolic description of the model to be fitted. Currently only a few formula
 #'                operators are supported, including '~', '.', ':', '+', and '-'.
 #' @param smoothing Smoothing parameter
-#' @return a fitted naive Bayes model
+#' @return \code{spark.naiveBayes} returns a fitted naive Bayes model
 #' @rdname spark.naiveBayes
+#' @name spark.naiveBayes
 #' @seealso e1071: \url{https://cran.r-project.org/web/packages/e1071/}
--- End diff --

We could use the `\link` tag as discussed in http://stackoverflow.com/questions/25489042/linking-to-other-packages-in-documentation-in-roxygen2-in-r
[GitHub] spark issue #13844: [SPARK-16133][ML] model loading backward compatibility f...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13844 **[Test build #61145 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61145/consoleFull)** for PR 13844 at commit [`718023d`](https://github.com/apache/spark/commit/718023d9fa899af580cc45851db7d53c83fe1efa).
[GitHub] spark issue #13844: [SPARK-16133][ML] model loading backward compatibility f...
Github user hhbyyh commented on the issue: https://github.com/apache/spark/pull/13844 @mengxr Thanks for your review. Sent update for the style issue.
[GitHub] spark issue #13881: [SPARK-3723] [MLlib] Adding instrumentation to random fo...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13881 **[Test build #61144 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61144/consoleFull)** for PR 13881 at commit [`bd7d24d`](https://github.com/apache/spark/commit/bd7d24d4f5a79eca6ff9629706c254beba74bc45).
[GitHub] spark pull request #13844: [SPARK-16133][ML] model loading backward compatib...
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/13844#discussion_r68350637

--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/MinMaxScaler.scala ---
@@ -232,7 +233,9 @@ object MinMaxScalerModel extends MLReadable[MinMaxScalerModel] {
     override def load(path: String): MinMaxScalerModel = {
       val metadata = DefaultParamsReader.loadMetadata(path, sc, className)
       val dataPath = new Path(path, "data").toString
-      val Row(originalMin: Vector, originalMax: Vector) = sparkSession.read.parquet(dataPath)
+      val data = sparkSession.read.parquet(dataPath)
+      val Row(originalMin: Vector, originalMax: Vector) = MLUtils.convertVectorColumnsToML(
+        data, "originalMin", "originalMax")
--- End diff --

Sorry to miss it. Will update right now.
[GitHub] spark issue #13881: [SPARK-3723] [MLlib] Adding instrumentation to random fo...
Github user mengxr commented on the issue: https://github.com/apache/spark/pull/13881 ok to test
[GitHub] spark issue #13720: [SPARK-16004] [SQL] improve the display of CatalogTable ...
Github user bomeng commented on the issue: https://github.com/apache/spark/pull/13720 ok, i will work on it based on comments. Thanks.
[GitHub] spark pull request #13879: [SPARK-16177] [ML] model loading backward compati...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/13879
[GitHub] spark pull request #13844: [SPARK-16133][ML] model loading backward compatib...
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/13844#discussion_r68350339

--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/MinMaxScaler.scala ---

@@ -232,7 +233,9 @@ object MinMaxScalerModel extends MLReadable[MinMaxScalerModel] {
   override def load(path: String): MinMaxScalerModel = {
     val metadata = DefaultParamsReader.loadMetadata(path, sc, className)
     val dataPath = new Path(path, "data").toString
-    val Row(originalMin: Vector, originalMax: Vector) = sparkSession.read.parquet(dataPath)
+    val data = sparkSession.read.parquet(dataPath)
+    val Row(originalMin: Vector, originalMax: Vector) = MLUtils.convertVectorColumnsToML(
+      data, "originalMin", "originalMax")

--- End diff --

@hhbyyh Could you fix this style issue?
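The style issue flagged above is the continuation line of the wrapped call starting at column zero. A minimal, self-contained sketch of the requested layout (plain Scala, no Spark dependency; `Frame` and this local `convertVectorColumnsToML` are hypothetical stand-ins with the same shape as the DataFrame and MLUtils helper in the patch):

```scala
object IndentDemo {
  // Hypothetical stand-ins for the DataFrame and MLUtils helper in the patch.
  case class Frame(cols: Seq[String])
  def convertVectorColumnsToML(data: Frame, cols: String*): Frame =
    Frame(data.cols ++ cols)

  def main(args: Array[String]): Unit = {
    val data = Frame(Nil)
    // Wrapped call: the continuation arguments are indented relative to the
    // call site, instead of starting at column zero as in the original diff.
    val converted = convertVectorColumnsToML(
      data, "originalMin", "originalMax")
    assert(converted.cols == Seq("originalMin", "originalMax"))
  }
}
```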
[GitHub] spark issue #13879: [SPARK-16177] [ML] model loading backward compatibility ...
Github user mengxr commented on the issue: https://github.com/apache/spark/pull/13879 LGTM. Merged into master and branch-2.0. Thanks!
[GitHub] spark issue #13701: [SPARK-15639][SQL] Try to push down filter at RowGroups ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13701 **[Test build #61143 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61143/consoleFull)** for PR 13701 at commit [`36fd059`](https://github.com/apache/spark/commit/36fd0596302a4ef7e411c2fe45a279082adaf69a).
[GitHub] spark pull request #13874: [SQL][minor] ParserUtils.operationNotAllowed shou...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/13874
[GitHub] spark issue #13874: [SQL][minor] ParserUtils.operationNotAllowed should thro...
Github user hvanhovell commented on the issue: https://github.com/apache/spark/pull/13874 LGTM - thanks! Merging to master/2.0.
[GitHub] spark issue #13701: [SPARK-15639][SQL] Try to push down filter at RowGroups ...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/13701 ping @liancheng @yhuai again...
[GitHub] spark issue #13874: [SQL][minor] ParserUtils.operationNotAllowed should thro...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13874 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/61141/
[GitHub] spark issue #13874: [SQL][minor] ParserUtils.operationNotAllowed should thro...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13874 Merged build finished. Test PASSed.
[GitHub] spark issue #13874: [SQL][minor] ParserUtils.operationNotAllowed should thro...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13874

**[Test build #61141 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61141/consoleFull)** for PR 13874 at commit [`ec0506f`](https://github.com/apache/spark/commit/ec0506f5a27c9581857d49cf296e4b0bde76297d).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #13699: [SPARK-15958] Make initial buffer size for the Sorter co...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13699 Merged build finished. Test PASSed.
[GitHub] spark issue #13699: [SPARK-15958] Make initial buffer size for the Sorter co...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13699 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/61140/
[GitHub] spark issue #13699: [SPARK-15958] Make initial buffer size for the Sorter co...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13699

**[Test build #61140 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61140/consoleFull)** for PR 13699 at commit [`cf464a3`](https://github.com/apache/spark/commit/cf464a3eae5d2fa86f4946c302b15df9d9ee1a21).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #13130: [SPARK-15340][SQL]Limit the size of the map used to cach...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13130 Merged build finished. Test FAILed.
[GitHub] spark issue #13130: [SPARK-15340][SQL]Limit the size of the map used to cach...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13130 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/61142/
[GitHub] spark issue #13130: [SPARK-15340][SQL]Limit the size of the map used to cach...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13130

**[Test build #61142 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61142/consoleFull)** for PR 13130 at commit [`82d78a3`](https://github.com/apache/spark/commit/82d78a36161167c76aebf313ce9541ce51989948).
* This patch **fails to build**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #13778: [SPARK-16062][SPARK-15989][SQL] Fix two bugs of Python-o...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/13778 ping @vlad17 @davies @liancheng Anything else?
[GitHub] spark issue #13130: [SPARK-15340][SQL]Limit the size of the map used to cach...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13130 **[Test build #61142 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61142/consoleFull)** for PR 13130 at commit [`82d78a3`](https://github.com/apache/spark/commit/82d78a36161167c76aebf313ce9541ce51989948).
[GitHub] spark issue #13130: [SPARK-15340][SQL]Limit the size of the map used to cach...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/13130 retest this please
[GitHub] spark issue #13786: [SPARK-15294][R] Add `pivot` to SparkR
Github user Div333 commented on the issue: https://github.com/apache/spark/pull/13786 Hello everyone, thanks a lot for implementing the pivot functionality. I have started using SparkR recently and would like to know whether the pivot method is included in the library, as I can't find it in the documentation. I am also looking to use an unpivot functionality; it would be great if that were included.
[GitHub] spark issue #12601: [SPARK-14525][SQL] Make DataFrameWrite.save work for jdb...
Github user JustinPihony commented on the issue: https://github.com/apache/spark/pull/12601 Bump @HyukjinKwon. I have replies to your comments. Could you please review them so that I can push my changes?
[GitHub] spark issue #13882: Branch 1.6
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/13882 Please close this PR.
[GitHub] spark pull request #13720: [SPARK-16004] [SQL] improve the display of Catalo...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/13720#discussion_r68343766

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala ---

@@ -522,7 +523,7 @@ case class DescribeTableCommand(table: TableIdentifier, isExtended: Boolean, isF
   private def describeSchema(schema: Seq[CatalogColumn], buffer: ArrayBuffer[Row]): Unit = {
     schema.foreach { column =>
-      append(buffer, column.name, column.dataType.toLowerCase, column.comment.orNull)
+      append(buffer, column.name, column.dataType.toLowerCase, column.comment.getOrElse(""))

--- End diff --

Yea. If it is null, let's keep it as null. Changing a null to an empty string actually destroys the information.
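The information loss yhuai describes follows directly from `Option` semantics and can be demonstrated in plain Scala (a minimal sketch; `Column` here is a hypothetical stand-in, not Spark's `CatalogColumn`):

```scala
// Hypothetical stand-in for a catalog column with an optional comment.
case class Column(name: String, comment: Option[String])

object NullVsEmptyDemo {
  def asOrNull(c: Column): String = c.comment.orNull         // keeps "no comment" as null
  def asOrEmpty(c: Column): String = c.comment.getOrElse("") // maps "no comment" to ""

  def main(args: Array[String]): Unit = {
    val noComment    = Column("id", None)
    val emptyComment = Column("name", Some(""))
    // orNull preserves the distinction between "no comment" and "empty comment"...
    assert(asOrNull(noComment) == null)
    assert(asOrNull(emptyComment) == "")
    // ...while getOrElse("") collapses both cases to the same value,
    // so a later consumer can no longer tell them apart.
    assert(asOrEmpty(noComment) == asOrEmpty(emptyComment))
  }
}
```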
[GitHub] spark issue #13882: Branch 1.6
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13882 Can one of the admins verify this patch?
[GitHub] spark pull request #13865: [SPARK-13709][SQL] Initialize deserializer with b...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/13865#discussion_r68343609

--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/QueryPartitionSuite.scala ---

@@ -65,4 +68,77 @@ class QueryPartitionSuite extends QueryTest with SQLTestUtils with TestHiveSingl
     sql("DROP TABLE IF EXISTS createAndInsertTest")
   }
 }
+
+  test("SPARK-13709: reading partitioned Avro table with nested schema") {
+    withTempDir { dir =>
+      val path = dir.getCanonicalPath
+      val tableName = "spark_13709"
+      val tempTableName = "spark_13709_temp"
+
+      new File(path, tableName).mkdir()
+      new File(path, tempTableName).mkdir()
+
+      val avroSchema =
+        """{
+          |  "name": "test_record",
+          |  "type": "record",
+          |  "fields": [ {
+          |    "name": "f0",
+          |    "type": "int"
+          |  }, {
+          |    "name": "f1",
+          |    "type": {
+          |      "type": "record",
+          |      "name": "inner",
+          |      "fields": [ {
+          |        "name": "f10",
+          |        "type": "int"
+          |      }, {
+          |        "name": "f11",
+          |        "type": "double"
+          |      } ]
+          |    }
+          |  } ]
+          |}
+        """.stripMargin
+
+      withTable(tableName, tempTableName) {
+        // Creates the external partitioned Avro table to be tested.
+        sql(
+          s"""CREATE EXTERNAL TABLE $tableName
+             |PARTITIONED BY (ds STRING)
+             |ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
+             |STORED AS
+             |  INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
+             |  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
+             |LOCATION '$path/$tableName'
+             |TBLPROPERTIES ('avro.schema.literal' = '$avroSchema')
+           """.stripMargin)
+
+        // Creates a temporary Avro table used to prepare the testing Avro file.
+        sql(
+          s"""CREATE EXTERNAL TABLE $tempTableName
+             |ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
+             |STORED AS
+             |  INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
+             |  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
+             |LOCATION '$path/$tempTableName'
+             |TBLPROPERTIES ('avro.schema.literal' = '$avroSchema')
+           """.stripMargin)
+
+        // Generates Avro data.
+        sql(s"INSERT OVERWRITE TABLE $tempTableName SELECT 1, STRUCT(2, 2.5)")
+
+        // Adds the generated Avro data as a new partition of the table under test.
+        sql(s"ALTER TABLE $tableName ADD PARTITION (ds = 'foo') LOCATION '$path/$tempTableName'")
+
+        checkAnswer(
+          sql(s"SELECT * FROM $tableName"),

--- End diff --

Can you explain a bit more how this query fails without your patch?
[GitHub] spark pull request #13882: Branch 1.6
GitHub user liu549676915 opened a pull request: https://github.com/apache/spark/pull/13882 Branch 1.6

## What changes were proposed in this pull request?

(Please fill in changes proposed in this fix)

## How was this patch tested?

(Please explain how this patch was tested. E.g. unit tests, integration tests, manual tests) (If this patch involves UI changes, please attach a screenshot; otherwise, remove this)

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/apache/spark branch-1.6

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/13882.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #13882

commit 0afad6678431846a6eebda8d5891da9115884915
Author: RJ Nowling
Date: 2016-01-05T23:05:04Z
[SPARK-12450][MLLIB] Un-persist broadcasted variables in KMeans. Author: RJ Nowling. Closes #10415 from rnowling/spark-12450. (cherry picked from commit 78015a8b7cc316343e302eeed6fe30af9f2961e8) Signed-off-by: Joseph K. Bradley

commit bf3dca2df4dd3be264691be1321e0c700d4f4e32
Author: BrianLondon
Date: 2016-01-05T23:15:07Z
[SPARK-12453][STREAMING] Remove explicit dependency on aws-java-sdk. Successfully ran the Kinesis demo on a live, AWS-hosted Kinesis stream against the master and 1.6 branches. For reasons I don't entirely understand it required a manual merge to 1.5, which I did as shown here: https://github.com/BrianLondon/spark/commit/075c22e89bc99d5e99be21f40e0d72154a1e23a2 The demo ran successfully on the 1.5 branch as well. According to `mvn dependency:tree` it is still pulling a fairly old version of the aws-java-sdk (1.9.37), but this appears to have fixed the Kinesis regression in 1.5.2. Author: BrianLondon. Closes #10492 from BrianLondon/remove-only. (cherry picked from commit ff89975543b153d0d235c0cac615d45b34aa8fe7) Signed-off-by: Sean Owen

commit c3135d02176cdd679b4a0e4883895b9e9f001a55
Author: Yanbo Liang
Date: 2016-01-06T06:35:41Z
[SPARK-12393][SPARKR] Add read.text and write.text for SparkR. Add ```read.text``` and ```write.text``` for SparkR. cc sun-rui felixcheung shivaram Author: Yanbo Liang. Closes #10348 from yanboliang/spark-12393. (cherry picked from commit d1fea41363c175a67b97cb7b3fe89f9043708739) Signed-off-by: Shivaram Venkataraman

commit 175681914af953b7ce1b2971fef83a2445de1f94
Author: zero323
Date: 2016-01-06T19:58:33Z
[SPARK-12006][ML][PYTHON] Fix GMM failure if initialModel is not None. If the initial model passed to GMM is not empty it causes `net.razorvine.pickle.PickleException`. It can be fixed by converting `initialModel.weights` to `list`. Author: zero323. Closes #9986 from zero323/SPARK-12006. (cherry picked from commit fcd013cf70e7890aa25a8fe3cb6c8b36bf0e1f04) Signed-off-by: Joseph K. Bradley

commit d821fae0ecca6393d3632977797d72ba594d26a9
Author: Shixiong Zhu
Date: 2016-01-06T20:03:01Z
[SPARK-12617][PYSPARK] Move Py4jCallbackConnectionCleaner to Streaming, because the callback server starts only in StreamingContext. Author: Shixiong Zhu. Closes #10621 from zsxwing/SPARK-12617-2. (cherry picked from commit 1e6648d62fb82b708ea54c51cd23bfe4f542856e) Signed-off-by: Shixiong Zhu

commit 8f0ead3e79beb2c5f2731ceaa34fe1c133763386
Author: huangzhaowei
Date: 2016-01-06T20:48:57Z
[SPARK-12672][STREAMING][UI] Use the uiRoot function instead of the default root path to gain the streaming batch url. Author: huangzhaowei. Closes #10617 from SaintBacchus/SPARK-12672.

commit 39b0a348008b6ab532768b90fd578b77711af98c
Author: Shixiong Zhu
Date: 2016-01-06T21:53:25Z
Revert "[SPARK-12672][STREAMING][UI] Use the uiRoot function instead of the default root path to gain the streaming batch url." This reverts commit 8f0ead3e79beb2c5f2731ceaa34fe1c133763386. Will merge #10618 instead.

commit 11b901b22b1cdaa6d19b1b73885627ac601be275
Author: Liang-Chi Hsieh
Date: 2015-12-14T17:59:42Z
[SPARK-12016] [MLLIB] [PYSPARK] Wrap Word2VecModel when loading it in pyspark JIRA:
[GitHub] spark issue #13720: [SPARK-16004] [SQL] improve the display of CatalogTable ...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/13720 For the test: currently we only have one `desc table` test, in `HiveDDLSuite`; it would be good to have a dedicated test suite for it.
[GitHub] spark pull request #13720: [SPARK-16004] [SQL] improve the display of Catalo...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/13720#discussion_r68342801

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala ---

@@ -522,7 +523,7 @@ case class DescribeTableCommand(table: TableIdentifier, isExtended: Boolean, isF
   private def describeSchema(schema: Seq[CatalogColumn], buffer: ArrayBuffer[Row]): Unit = {
     schema.foreach { column =>
-      append(buffer, column.name, column.dataType.toLowerCase, column.comment.orNull)
+      append(buffer, column.name, column.dataType.toLowerCase, column.comment.getOrElse(""))

--- End diff --

This is a behavior change. The result is not only used for display, but also used as a table that can be queried later. I'm not sure it's worth it. cc @yhuai
[GitHub] spark issue #13874: [SQL][minor] ParserUtils.operationNotAllowed should thro...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13874 **[Test build #61141 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61141/consoleFull)** for PR 13874 at commit [`ec0506f`](https://github.com/apache/spark/commit/ec0506f5a27c9581857d49cf296e4b0bde76297d).
[GitHub] spark issue #13880: SPARK-16178: Remove unnecessary Hive partition check.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13880 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/61139/
[GitHub] spark issue #13880: SPARK-16178: Remove unnecessary Hive partition check.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13880 Merged build finished. Test PASSed.
[GitHub] spark issue #13880: SPARK-16178: Remove unnecessary Hive partition check.
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13880

**[Test build #61139 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61139/consoleFull)** for PR 13880 at commit [`919f520`](https://github.com/apache/spark/commit/919f52001f78f9b1de8a0088a3de312dd6447fae).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #13680: [SPARK-15962][SQL] Introduce additonal implementation wi...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/13680 @kiszk we should definitely put zero into the corresponding field when setting null. It will be a little harder than in `UnsafeRow`, as we need `setNullBoolean`, `setNullInt`, etc., but it's still doable. As for clearing out all null bits: yes, it's a big overhead for arrays with small elements, such as boolean arrays, but I'm not sure this is worth two different implementations. cc @rxin
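The "zero the slot when setting null" idea discussed above can be sketched in self-contained Scala (a hypothetical illustration of the technique, not Spark's actual `UnsafeArrayData` layout or API):

```scala
// A fixed-size int array with a null bitmask. Setting an element to null both
// flips its null bit and writes zero into the value slot, so stale bytes never
// leak into later equality or hashing over the raw values.
class NullableIntArray(numElements: Int) {
  private val values = new Array[Int](numElements)
  private val nullBits = new Array[Long]((numElements + 63) / 64)

  def setInt(i: Int, v: Int): Unit = {
    nullBits(i >> 6) &= ~(1L << (i & 63)) // clear the null bit
    values(i) = v
  }

  // A type-specific null setter, in the spirit of the setNullInt mentioned above.
  def setNullInt(i: Int): Unit = {
    nullBits(i >> 6) |= 1L << (i & 63) // mark the element null
    values(i) = 0                      // and zero out its value slot
  }

  def isNullAt(i: Int): Boolean = (nullBits(i >> 6) & (1L << (i & 63))) != 0
  def getInt(i: Int): Int = values(i)
}
```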
[GitHub] spark pull request #13832: [SPARK-16123] Avoid NegativeArraySizeException wh...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/13832
[GitHub] spark issue #13832: [SPARK-16123] Avoid NegativeArraySizeException while res...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13832 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/61138/
[GitHub] spark issue #13832: [SPARK-16123] Avoid NegativeArraySizeException while res...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13832 Merged build finished. Test PASSed.
[GitHub] spark issue #13832: [SPARK-16123] Avoid NegativeArraySizeException while res...
Github user hvanhovell commented on the issue: https://github.com/apache/spark/pull/13832 Merging to master/2.0. Thanks!
[GitHub] spark issue #13832: [SPARK-16123] Avoid NegativeArraySizeException while res...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13832 **[Test build #61138 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61138/consoleFull)** for PR 13832 at commit [`e2a1e1e`](https://github.com/apache/spark/commit/e2a1e1e757ada0f51bac8cf8b8a77b20d2d26c8e). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #13876: [SPARK-16174][SQL] Improve OptimizeIn optimizer to remov...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13876 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/61137/
[GitHub] spark issue #13860: [SPARK-16157] [SQL] Add New Methods for comments in Stru...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/13860 @hvanhovell Sure, will do it! It sounds like you also like the suggestion by @cloud-fan . Let me do it too. Thanks!
[GitHub] spark issue #13876: [SPARK-16174][SQL] Improve OptimizeIn optimizer to remov...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13876 Merged build finished. Test PASSed.
[GitHub] spark issue #13876: [SPARK-16174][SQL] Improve OptimizeIn optimizer to remov...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13876 **[Test build #61137 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61137/consoleFull)** for PR 13876 at commit [`5a9f4ec`](https://github.com/apache/spark/commit/5a9f4ecdb349453a42ad2b06293183c55c0b1c44). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #13837: [SPARK-16126] [SQL] Better Error Message When usi...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/13837#discussion_r68341391
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetOptions.scala ---
@@ -40,7 +40,7 @@ private[sql] class ParquetOptions(
     if (!shortParquetCompressionCodecNames.contains(codecName)) {
       val availableCodecs = shortParquetCompressionCodecNames.keys.map(_.toLowerCase)
       throw new IllegalArgumentException(s"Codec [$codecName] " +
-        s"is not available. Available codecs are ${availableCodecs.mkString(", ")}.")
+        s"is not available. Known codecs are ${availableCodecs.mkString(", ")}.")
--- End diff --
Just to make it consistent with the output of the other cases. See the code: https://github.com/apache/spark/blob/d6dc12ef0146ae409834c78737c116050961f350/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/CompressionCodecs.scala#L49-L51