[GitHub] spark issue #14753: [SPARK-17187][SQL] Supports using arbitrary Java object ...

2016-08-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14753
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14753: [SPARK-17187][SQL] Supports using arbitrary Java object ...

2016-08-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14753
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64331/
Test PASSed.





[GitHub] spark issue #14753: [SPARK-17187][SQL] Supports using arbitrary Java object ...

2016-08-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14753
  
**[Test build #64331 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64331/consoleFull)**
 for PR 14753 at commit 
[`7190eb0`](https://github.com/apache/spark/commit/7190eb0c2a4dce2c5b84c29fb90bb2def23a3520).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request #14753: [SPARK-17187][SQL] Supports using arbitrary Java ...

2016-08-23 Thread yhuai
Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/14753#discussion_r75997361
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/SortBasedAggregationIterator.scala
 ---
@@ -131,6 +150,11 @@ class SortBasedAggregationIterator(
 firstRowInNextGroup = currentRow.copy()
   }
 }
+
+// Serializes the generic object stored in aggregation buffer for 
TypedImperativeAggregate
+// aggregation functions.
+serializeTypedAggregateBuffer(sortBasedAggregationBuffer)
--- End diff --

(basically, when we call `eval`, we always get the original object)





[GitHub] spark issue #14780: [SPARK-17206][SQL] Support ANALYZE TABLE on analyzable t...

2016-08-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14780
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64330/
Test PASSed.





[GitHub] spark issue #14780: [SPARK-17206][SQL] Support ANALYZE TABLE on analyzable t...

2016-08-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14780
  
Merged build finished. Test PASSed.





[GitHub] spark pull request #14753: [SPARK-17187][SQL] Supports using arbitrary Java ...

2016-08-23 Thread yhuai
Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/14753#discussion_r75997312
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/SortBasedAggregationIterator.scala
 ---
@@ -131,6 +150,11 @@ class SortBasedAggregationIterator(
 firstRowInNextGroup = currentRow.copy()
   }
 }
+
+// Serializes the generic object stored in aggregation buffer for 
TypedImperativeAggregate
+// aggregation functions.
+serializeTypedAggregateBuffer(sortBasedAggregationBuffer)
--- End diff --

An alternative approach is to call the serialization just before we output 
the buffer 
(https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/AggregationIterator.scala#L233-L239).
 Then, we will not need to check the class at 
https://github.com/apache/spark/pull/14753/files#diff-9463c978126246071e528ddfa7a376d5R507.
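
A minimal, Spark-free sketch of the alternative described above, in which serialization happens only at the point where the buffer is emitted. Everything here is hypothetical and for illustration: `Counter`, `serialize`, `generateOutput`, and the `typedSlots` index set are assumed names, the buffer is modeled as a plain `Array[Any]`, and the class check at the linked diff line is not reproduced because it is not visible in this thread.

```scala
object OutputSerializationSketch {
  // Hypothetical stand-in for the object a typed aggregate keeps in its buffer slot.
  final class Counter(var n: Long)

  // Hypothetical serializer; the PR only states that the object becomes Array[Byte].
  def serialize(c: Counter): Array[Byte] = BigInt(c.n).toByteArray

  // Serialization is applied only here, immediately before the row is emitted.
  // `typedSlots` holds the indices of slots owned by typed aggregates (assumed
  // to be known up front), so no per-value class inspection is needed.
  def generateOutput(buffer: Array[Any], typedSlots: Set[Int]): Array[Any] =
    buffer.zipWithIndex.map {
      case (value, i) if typedSlots(i) => serialize(value.asInstanceOf[Counter])
      case (value, _) => value
    }
}
```

Because `generateOutput` returns a new array, the in-memory buffer still holds the original `Counter` object after output, so anything reading the buffer earlier never sees bytes.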
 





[GitHub] spark issue #14780: [SPARK-17206][SQL] Support ANALYZE TABLE on analyzable t...

2016-08-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14780
  
**[Test build #64330 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64330/consoleFull)**
 for PR 14780 at commit 
[`cfbfefc`](https://github.com/apache/spark/commit/cfbfefc07364506fbafea0d853786e81c93cdebd).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #14753: [SPARK-17187][SQL] Supports using arbitrary Java object ...

2016-08-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14753
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64329/
Test PASSed.





[GitHub] spark issue #14753: [SPARK-17187][SQL] Supports using arbitrary Java object ...

2016-08-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14753
  
Merged build finished. Test PASSed.





[GitHub] spark issue #14753: [SPARK-17187][SQL] Supports using arbitrary Java object ...

2016-08-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14753
  
**[Test build #64329 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64329/consoleFull)**
 for PR 14753 at commit 
[`b843f2f`](https://github.com/apache/spark/commit/b843f2f0169d9021529b82377de09c20142b856a).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request #14778: [SPARK-17174][SQL] Correct usages and documenatio...

2016-08-23 Thread HyukjinKwon
Github user HyukjinKwon closed the pull request at:

https://github.com/apache/spark/pull/14778





[GitHub] spark issue #14778: [SPARK-17174][SQL] Correct usages and documenations for ...

2016-08-23 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/14778
  
Hm.. I will close this for now and will ask what we want with this in the 
JIRA. Thanks again.





[GitHub] spark issue #13460: [SPARK-15615] [SQL] Support Json input from Dataset[Stri...

2016-08-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13460
  
Can one of the admins verify this patch?





[GitHub] spark issue #14781: [SPARK-17167] [2.0] [SQL] Issue Exceptions when Analyze ...

2016-08-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14781
  
**[Test build #64333 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64333/consoleFull)**
 for PR 14781 at commit 
[`77666fd`](https://github.com/apache/spark/commit/77666fd83f5bab69dc7191a6e1a8f7d253300de0).





[GitHub] spark issue #14778: [SPARK-17174][SQL] Correct usages and documenations for ...

2016-08-23 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/14778
  
I see. Yes, I think that makes sense. Let me convert this PR into fixing 
typos as below:

```
Returns returns date with the time portion of the day truncated to the unit 
specified by the format model fmt.
```

to 

```
Returns date with the time portion of the day truncated to the unit 
specified by the format model fmt.
```

and 

```
Extracts the date part of the date or datetime expression expr
```

to

```
Extracts the date part of the date or timestamp expression
```

with another look for those.





[GitHub] spark pull request #14729: [SPARK-17167] [SQL] Issue Exceptions when Analyze...

2016-08-23 Thread gatorsmile
Github user gatorsmile closed the pull request at:

https://github.com/apache/spark/pull/14729





[GitHub] spark issue #14729: [SPARK-17167] [SQL] Issue Exceptions when Analyze Table ...

2016-08-23 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/14729
  
The PR https://github.com/apache/spark/pull/14781 is opened. This one will 
be closed. Thanks!





[GitHub] spark issue #14753: [SPARK-17187][SQL] Supports using arbitrary Java object ...

2016-08-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14753
  
**[Test build #64332 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64332/consoleFull)**
 for PR 14753 at commit 
[`5904bcd`](https://github.com/apache/spark/commit/5904bcd2eb523b6f3e744925a0e9d9da52f6ae0b).





[GitHub] spark issue #14781: [SPARK-17167] [2.0] [SQL] Issue Exceptions when Analyze ...

2016-08-23 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/14781
  
cc @hvanhovell 





[GitHub] spark issue #14753: [SPARK-17187][SQL] Supports using arbitrary Java object ...

2016-08-23 Thread clockfly
Github user clockfly commented on the issue:

https://github.com/apache/spark/pull/14753
  
@viirya, thanks





[GitHub] spark pull request #14781: [SPARK-17167] [2.0] [SQL] Issue Exceptions when A...

2016-08-23 Thread gatorsmile
GitHub user gatorsmile opened a pull request:

https://github.com/apache/spark/pull/14781

[SPARK-17167] [2.0] [SQL] Issue Exceptions when Analyze Table on In-Memory 
Cataloged Tables

### What changes were proposed in this pull request?
Currently, `Analyze Table` is only used for Hive-serde tables. We should 
issue exceptions in all the other cases. When the tables are data source 
tables, we already issue an exception. However, when the tables are in-memory 
cataloged tables, we do not issue any exception.

This PR is to issue an exception when the tables are in-memory cataloged. 
For example,  
```SQL
CREATE TABLE tbl(a INT, b INT) USING parquet
```
`tbl` is a `SimpleCatalogRelation` when Hive support is not enabled.
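
The intended behavior can be sketched without Spark as a single match on the kind of relation. This is only an illustration under stated assumptions: the `Relation` hierarchy, `AnalysisExceptionSketch`, and `analyzeTable` are hypothetical stand-ins (only the name `SimpleCatalogRelation` comes from the description above), and the message text is invented.

```scala
object AnalyzeTableSketch {
  // Hypothetical relation kinds; only SimpleCatalogRelation is named in the PR text.
  sealed trait Relation
  case object HiveSerdeRelation extends Relation
  case object DataSourceRelation extends Relation
  case object SimpleCatalogRelation extends Relation

  // Stand-in for Spark's AnalysisException so the sketch stays self-contained.
  final class AnalysisExceptionSketch(msg: String) extends Exception(msg)

  // Only Hive-serde tables are analyzed; every other kind raises an exception,
  // including in-memory cataloged tables, which previously slipped through.
  def analyzeTable(name: String, relation: Relation): String = relation match {
    case HiveSerdeRelation => s"statistics collected for $name"
    case other =>
      throw new AnalysisExceptionSketch(
        s"ANALYZE TABLE is not supported for $name ($other)")
  }
}
```

With this shape, adding a new relation kind fails closed: it falls into the default branch and is rejected until it is handled explicitly.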

### How was this patch tested?
Added two test cases. One of them is just to improve the test coverage when 
the analyzed table is a data source table.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/gatorsmile/spark analyzeInMemoryTable2

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/14781.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #14781









[GitHub] spark issue #14623: [SPARK-17044][SQL] Make test files for window functions ...

2016-08-23 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/14623
  
Hi, @rxin .
Could you review this PR again?





[GitHub] spark issue #8880: [SPARK-5682][Core] Add encrypted shuffle in spark

2016-08-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/8880
  
Merged build finished. Test PASSed.





[GitHub] spark issue #8880: [SPARK-5682][Core] Add encrypted shuffle in spark

2016-08-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/8880
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64328/
Test PASSed.





[GitHub] spark issue #8880: [SPARK-5682][Core] Add encrypted shuffle in spark

2016-08-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/8880
  
**[Test build #64328 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64328/consoleFull)**
 for PR 8880 at commit 
[`2204453`](https://github.com/apache/spark/commit/22044539a54329572e2d60123a1cb5f42e5f7626).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request #14753: [SPARK-17187][SQL] Supports using arbitrary Java ...

2016-08-23 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/14753#discussion_r75994611
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/interfaces.scala
 ---
@@ -389,3 +389,153 @@ abstract class DeclarativeAggregate
 def right: AttributeReference = 
inputAggBufferAttributes(aggBufferAttributes.indexOf(a))
   }
 }
+
+/**
+ * Aggregation function which allows **arbitrary** user-defined java 
object to be used as internal
+ * aggregation buffer object.
+ *
+ * {{{
+ *          aggregation buffer for normal aggregation function `avg`
+ *              |
+ *              v
+ *  +--------------+---------------+-----------------------------------+
+ *  |  sum1 (Long) | count1 (Long) | generic user-defined java objects |
+ *  +--------------+---------------+-----------------------------------+
+ *                                    ^
+ *                                    |
+ *              Aggregation buffer object for `TypedImperativeAggregate` aggregation function
+ * }}}
+ *
+ * Work flow (Partial mode aggregate at Mapper side, and Final mode 
aggregate at Reducer side):
+ *
+ * Stage 1: Partial aggregate at Mapper side:
+ *
+ *  1. The framework calls `createAggregationBuffer(): T` to create an 
empty internal aggregation
+ * buffer object.
+ *  2. Upon each input row, the framework calls
+ * `update(buffer: T, input: InternalRow): Unit` to update the 
aggregation buffer object T.
+ *  3. After processing all rows of current group (group by key), the 
framework will serialize
+ * aggregation buffer object T to storage format (Array[Byte]) and 
persist the Array[Byte]
+ * to disk if needed.
+ *  4. The framework moves on to next group, until all groups have been 
processed.
+ *
+ * Shuffling exchange data to Reducer tasks...
+ *
+ * Stage 2: Final mode aggregate at Reducer side:
+ *
+ *  1. The framework calls `createAggregationBuffer(): T` to create an 
empty internal aggregation
+ * buffer object (type T) for merging.
+ *  2. For each aggregation output of Stage 1, The framework de-serializes 
the storage
+ * format (Array[Byte]) and produces one input aggregation object 
(type T).
+ *  3. For each input aggregation object, the framework calls 
`merge(buffer: T, input: T): Unit`
+ * to merge the input aggregation object into aggregation buffer 
object.
+ *  4. After processing all input aggregation objects of current group 
(group by key), the framework
+ * calls method `eval(buffer: T)` to generate the final output for 
this group.
+ *  5. The framework moves on to next group, until all groups have been 
processed.
+ *
+ * NOTE: SQL with TypedImperativeAggregate functions is planned in sort 
based aggregation,
+ * instead of hash based aggregation, as TypedImperativeAggregate use 
BinaryType as aggregation
+ * buffer's storage format, which is not supported by hash based 
aggregation. Hash based
+ * aggregation only support aggregation buffer of mutable types (like 
LongType, IntType that have
+ * fixed length and can be mutated in place in UnsafeRow)
+ */
+abstract class TypedImperativeAggregate[T] extends ImperativeAggregate {
+
+  /**
+   * Creates an empty aggregation buffer object. This is called before 
processing each key group
+   * (group by key).
+   *
+   * @return an aggregation buffer object
+   */
+  def createAggregationBuffer(): T
+
+  /**
+   * In-place updates the aggregation buffer object with an input row. 
buffer = buffer + input.
+   * This is typically called when doing Partial or Complete mode 
aggregation.
+   *
+   * @param buffer The aggregation buffer object.
+   * @param input an input row
+   */
+  def update(buffer: T, input: InternalRow): Unit
+
+  /**
+   * Merges an input aggregation object into aggregation buffer object. 
buffer = buffer + input.
+   * This is typically called when doing PartialMerge or Final mode 
aggregation.
+   *
+   * @param buffer the aggregation buffer object used to store the 
aggregation result.
+   * @param input an input aggregation object. Input aggregation object 
can be produced by
+   *  de-serializing the partial aggregate's output from 
Mapper side.
+   */
+  def merge(buffer: T, input: T): Unit
+
+  /**
+   * Generates the final aggregation result value for current key group 
with the aggregation buffer
+   * object.
+   *
+   * @param buffer aggregation buffer object.
+   * @return The aggregation result of current key group
+   */
+  
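
The two-stage work flow described in the quoted Scaladoc can be sketched without Spark roughly as follows. This is an illustration under assumptions, not the PR's code: `TypedAggregateSketch` takes plain values instead of `InternalRow`, the `serialize`/`deserialize` method names and the 16-byte layout are invented for the example, and `SumCount`, `AvgSketch`, and `WorkflowSketch` are hypothetical.

```scala
import java.nio.ByteBuffer

// Hypothetical, Spark-free stand-in for the interface quoted above.
// `serialize`/`deserialize` are assumed names; the Scaladoc only says the
// buffer object is converted to and from Array[Byte].
trait TypedAggregateSketch[T, In, Out] {
  def createAggregationBuffer(): T
  def update(buffer: T, input: In): Unit
  def merge(buffer: T, input: T): Unit
  def eval(buffer: T): Out
  def serialize(buffer: T): Array[Byte]
  def deserialize(bytes: Array[Byte]): T
}

// An arbitrary mutable JVM object used as the aggregation buffer.
final class SumCount(var sum: Long, var count: Long)

object AvgSketch extends TypedAggregateSketch[SumCount, Long, Double] {
  def createAggregationBuffer(): SumCount = new SumCount(0L, 0L)
  def update(buffer: SumCount, input: Long): Unit = {
    buffer.sum += input
    buffer.count += 1
  }
  def merge(buffer: SumCount, input: SumCount): Unit = {
    buffer.sum += input.sum
    buffer.count += input.count
  }
  def eval(buffer: SumCount): Double =
    if (buffer.count == 0) Double.NaN else buffer.sum.toDouble / buffer.count
  // Storage format: two big-endian longs (sum, then count).
  def serialize(buffer: SumCount): Array[Byte] =
    ByteBuffer.allocate(16).putLong(buffer.sum).putLong(buffer.count).array()
  def deserialize(bytes: Array[Byte]): SumCount = {
    val bb = ByteBuffer.wrap(bytes)
    new SumCount(bb.getLong(), bb.getLong())
  }
}

object WorkflowSketch {
  // Stage 1 (mapper side): create a buffer, update it per row, then serialize.
  def partial(rows: Seq[Long]): Array[Byte] = {
    val buffer = AvgSketch.createAggregationBuffer()
    rows.foreach(row => AvgSketch.update(buffer, row))
    AvgSketch.serialize(buffer)
  }

  // Stage 2 (reducer side): deserialize each partial result, merge, then eval.
  def finalResult(partials: Seq[Array[Byte]]): Double = {
    val buffer = AvgSketch.createAggregationBuffer()
    partials.foreach(bytes => AvgSketch.merge(buffer, AvgSketch.deserialize(bytes)))
    AvgSketch.eval(buffer)
  }
}
```

For example, `WorkflowSketch.finalResult(Seq(WorkflowSketch.partial(Seq(1L, 2L, 3L)), WorkflowSketch.partial(Seq(6L))))` merges a buffer of (sum 6, count 3) with one of (sum 6, count 1) and evaluates to 3.0.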

[GitHub] spark pull request #14753: [SPARK-17187][SQL] Supports using arbitrary Java ...

2016-08-23 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/14753#discussion_r75994597
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/interfaces.scala
 ---
@@ -389,3 +389,153 @@ abstract class DeclarativeAggregate
 def right: AttributeReference = 
inputAggBufferAttributes(aggBufferAttributes.indexOf(a))
   }
 }
+
+/**
+ * Aggregation function which allows **arbitrary** user-defined java 
object to be used as internal
+ * aggregation buffer object.
+ *
+ * {{{
+ *          aggregation buffer for normal aggregation function `avg`
+ *              |
+ *              v
+ *  +--------------+---------------+-----------------------------------+
+ *  |  sum1 (Long) | count1 (Long) | generic user-defined java objects |
+ *  +--------------+---------------+-----------------------------------+
+ *                                    ^
+ *                                    |
+ *              Aggregation buffer object for `TypedImperativeAggregate` aggregation function
+ * }}}
+ *
+ * Work flow (Partial mode aggregate at Mapper side, and Final mode 
aggregate at Reducer side):
+ *
+ * Stage 1: Partial aggregate at Mapper side:
+ *
+ *  1. The framework calls `createAggregationBuffer(): T` to create an 
empty internal aggregation
+ * buffer object.
+ *  2. Upon each input row, the framework calls
+ * `update(buffer: T, input: InternalRow): Unit` to update the 
aggregation buffer object T.
+ *  3. After processing all rows of current group (group by key), the 
framework will serialize
+ * aggregation buffer object T to storage format (Array[Byte]) and 
persist the Array[Byte]
+ * to disk if needed.
+ *  4. The framework moves on to next group, until all groups have been 
processed.
+ *
+ * Shuffling exchange data to Reducer tasks...
+ *
+ * Stage 2: Final mode aggregate at Reducer side:
+ *
+ *  1. The framework calls `createAggregationBuffer(): T` to create an 
empty internal aggregation
+ * buffer object (type T) for merging.
+ *  2. For each aggregation output of Stage 1, The framework de-serializes 
the storage
+ * format (Array[Byte]) and produces one input aggregation object 
(type T).
+ *  3. For each input aggregation object, the framework calls 
`merge(buffer: T, input: T): Unit`
+ * to merge the input aggregation object into aggregation buffer 
object.
+ *  4. After processing all input aggregation objects of current group 
(group by key), the framework
+ * calls method `eval(buffer: T)` to generate the final output for 
this group.
+ *  5. The framework moves on to next group, until all groups have been 
processed.
+ *
+ * NOTE: SQL with TypedImperativeAggregate functions is planned in sort 
based aggregation,
+ * instead of hash based aggregation, as TypedImperativeAggregate use 
BinaryType as aggregation
+ * buffer's storage format, which is not supported by hash based 
aggregation. Hash based
+ * aggregation only support aggregation buffer of mutable types (like 
LongType, IntType that have
+ * fixed length and can be mutated in place in UnsafeRow)
+ */
+abstract class TypedImperativeAggregate[T] extends ImperativeAggregate {
+
+  /**
+   * Creates an empty aggregation buffer object. This is called before processing each key group
+   * (group by key).
+   *
+   * @return an aggregation buffer object
+   */
+  def createAggregationBuffer(): T
+
+  /**
+   * In-place updates the aggregation buffer object with an input row. buffer = buffer + input.
+   * This is typically called when doing Partial or Complete mode aggregation.
+   *
+   * @param buffer The aggregation buffer object.
+   * @param input an input row
+   */
+  def update(buffer: T, input: InternalRow): Unit
+
+  /**
+   * Merges an input aggregation object into aggregation buffer object. buffer = buffer + input.
+   * This is typically called when doing PartialMerge or Final mode aggregation.
+   *
+   * @param buffer the aggregation buffer object used to store the aggregation result.
+   * @param input an input aggregation object. Input aggregation object can be produced by
+   *              de-serializing the partial aggregate's output from Mapper side.
+   */
+  def merge(buffer: T, input: T): Unit
+
+  /**
+   * Generates the final aggregation result value for current key group with the aggregation buffer
+   * object.
+   *
+   * @param buffer aggregation buffer object.
+   * @return The aggregation result of current key group
+   */
+  

[GitHub] spark pull request #14753: [SPARK-17187][SQL] Supports using arbitrary Java ...

2016-08-23 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/14753#discussion_r75994519
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/SortBasedAggregationIterator.scala
 ---
@@ -90,6 +91,24 @@ class SortBasedAggregationIterator(
   // compared to MutableRow (aggregation buffer) directly.
   private[this] val safeProj: Projection = 
FromUnsafeProjection(valueAttributes.map(_.dataType))
 
+  // Aggregation function which uses generic aggregation buffer object.
+  // @see [[TypedImperativeAggregate]] for more information
+  private val typedImperativeAggregates: Array[TypedImperativeAggregate[_]] = {
+    aggregateFunctions.collect {
+      case (ag: TypedImperativeAggregate[_]) => ag
+    }
+  }
+
+  // For TypedImperativeAggregate with generic aggregation buffer object, we need to call
+  // serializeAggregateBufferInPlace(...) explicitly to convert the aggregation buffer object
+  // to Spark Sql internally supported serializable storage format.
+  private def serializeTypedAggregateBuffer(aggregationBuffer: MutableRow): Unit = {
--- End diff --

The `aggregationBuffer` parameter is unused. Or should the 
`sortBasedAggregationBuffer` below be replaced with `aggregationBuffer`?





[GitHub] spark issue #14778: [SPARK-17174][SQL] Correct usages and documenations for ...

2016-08-23 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/14778
  
This is just my personal impression; it is always better to ask the 
committers for advice. The Spark community has been narrowing the gap between 
DBMSs and SparkSQL. :)





[GitHub] spark pull request #14753: [SPARK-17187][SQL] Supports using arbitrary Java ...

2016-08-23 Thread yhuai
Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/14753#discussion_r75994509
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/interfaces.scala
 ---
@@ -389,3 +389,175 @@ abstract class DeclarativeAggregate
 def right: AttributeReference = 
inputAggBufferAttributes(aggBufferAttributes.indexOf(a))
   }
 }
+
+/**
+ * Aggregation function which allows **arbitrary** user-defined java object to be used as internal
+ * aggregation buffer object.
+ *
+ * {{{
+ *                aggregation buffer for normal aggregation function `avg`
+ *                    |
+ *                    v
+ *                  +--------------+---------------+-----------------------------------+
+ *                  |  sum1 (Long) | count1 (Long) | generic user-defined java objects |
+ *                  +--------------+---------------+-----------------------------------+
+ *                                                     ^
+ *                                                     |
+ *                    Aggregation buffer object for `TypedImperativeAggregate` aggregation function
+ * }}}
+ *
+ * Work flow (Partial mode aggregate at Mapper side, and Final mode aggregate at Reducer side):
+ *
+ * Stage 1: Partial aggregate at Mapper side:
+ *
+ *  1. The framework calls `createAggregationBuffer(): T` to create an empty internal aggregation
+ *     buffer object.
+ *  2. Upon each input row, the framework calls
+ *     `update(buffer: T, input: InternalRow): Unit` to update the aggregation buffer object T.
+ *  3. After processing all rows of current group (group by key), the framework will serialize
+ *     aggregation buffer object T to SparkSQL internally supported underlying storage format, and
+ *     persist the serializable format to disk if needed.
+ *  4. The framework moves on to next group, until all groups have been processed.
+ *
+ * Shuffling exchange data to Reducer tasks...
+ *
+ * Stage 2: Final mode aggregate at Reducer side:
+ *
+ *  1. The framework calls `createAggregationBuffer(): T` to create an empty internal aggregation
+ *     buffer object (type T) for merging.
+ *  2. For each aggregation output of Stage 1, The framework de-serializes the storage
+ *     format and generates one input aggregation object (type T).
+ *  3. For each input aggregation object, the framework calls `merge(buffer: T, input: T): Unit`
+ *     to merge the input aggregation object into aggregation buffer object.
+ *  4. After processing all input aggregation objects of current group (group by key), the framework
+ *     calls method `eval(buffer: T)` to generate the final output for this group.
+ *  5. The framework moves on to next group, until all groups have been processed.
+ */
+abstract class TypedImperativeAggregate[T] extends ImperativeAggregate {
+
+  /**
+   * Creates an empty aggregation buffer object. This is called before processing each key group
+   * (group by key).
+   *
+   * @return an aggregation buffer object
+   */
+  def createAggregationBuffer(): T
+
+  /**
+   * In-place updates the aggregation buffer object with an input row. buffer = buffer + input.
+   * This is typically called when doing Partial or Complete mode aggregation.
+   *
+   * @param buffer The aggregation buffer object.
+   * @param input an input row
+   */
+  def update(buffer: T, input: InternalRow): Unit
+
+  /**
+   * Merges an input aggregation object into aggregation buffer object. buffer = buffer + input.
+   * This is typically called when doing PartialMerge or Final mode aggregation.
+   *
+   * @param buffer the aggregation buffer object used to store the aggregation result.
+   * @param input an input aggregation object. Input aggregation object can be produced by
+   *              de-serializing the partial aggregate's output from Mapper side.
+   */
+  def merge(buffer: T, input: T): Unit
+
+  /**
+   * Generates the final aggregation result value for current key group with the aggregation buffer
+   * object.
+   *
+   * @param buffer aggregation buffer object.
+   * @return The aggregation result of current key group
+   */
+  def eval(buffer: T): Any
+
+  /** Returns the class of aggregation buffer object */
+  def aggregationBufferClass: Class[T]
--- End diff --

Oh, I was thinking we should just avoid using Scala features unless we have 
to. 
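
For readers following the work flow documented in the quoted diff, here is a minimal, Spark-free sketch of the two stages. It is an illustration under stated assumptions, not the PR's implementation: `TypedAggregate`, `Row`, `AvgBuffer`, `LongAverage`, `serialize`, `deserialize`, `partial` and `finalResult` are all hypothetical names, and `Row` stands in for `InternalRow`.

```scala
// Hypothetical, Spark-free stand-in for the interface quoted in the diff.
object TypedAggregateSketch {
  // Assumption: a plain Seq replaces Spark's InternalRow.
  type Row = Seq[Any]

  trait TypedAggregate[T] {
    def createAggregationBuffer(): T
    def update(buffer: T, input: Row): Unit
    def merge(buffer: T, input: T): Unit
    def eval(buffer: T): Any
    // Assumed names for the serialization step described in Stage 1, step 3.
    def serialize(buffer: T): Array[Byte]
    def deserialize(bytes: Array[Byte]): T
  }

  // The buffer is an arbitrary mutable object rather than fixed-width fields.
  final class AvgBuffer(var sum: Long, var count: Long)

  object LongAverage extends TypedAggregate[AvgBuffer] {
    def createAggregationBuffer(): AvgBuffer = new AvgBuffer(0L, 0L)

    def update(buffer: AvgBuffer, input: Row): Unit = {
      buffer.sum += input.head.asInstanceOf[Long]
      buffer.count += 1
    }

    def merge(buffer: AvgBuffer, input: AvgBuffer): Unit = {
      buffer.sum += input.sum
      buffer.count += input.count
    }

    def eval(buffer: AvgBuffer): Any =
      if (buffer.count == 0) null else buffer.sum.toDouble / buffer.count

    def serialize(buffer: AvgBuffer): Array[Byte] =
      java.nio.ByteBuffer.allocate(16).putLong(buffer.sum).putLong(buffer.count).array()

    def deserialize(bytes: Array[Byte]): AvgBuffer = {
      val bb = java.nio.ByteBuffer.wrap(bytes)
      new AvgBuffer(bb.getLong(), bb.getLong())
    }
  }

  // Stage 1: build one partial buffer per group and serialize it for the shuffle.
  def partial[T](agg: TypedAggregate[T], rows: Seq[Row]): Array[Byte] = {
    val buffer = agg.createAggregationBuffer()
    rows.foreach(agg.update(buffer, _))
    agg.serialize(buffer)
  }

  // Stage 2: deserialize each partial result, merge it, then evaluate.
  def finalResult[T](agg: TypedAggregate[T], partials: Seq[Array[Byte]]): Any = {
    val buffer = agg.createAggregationBuffer()
    partials.foreach(p => agg.merge(buffer, agg.deserialize(p)))
    agg.eval(buffer)
  }
}
```

In this sketch, two mapper-side calls to `partial` over the rows `1, 2` and `6` followed by one `finalResult` call produce the average of all three values.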



[GitHub] spark issue #14778: [SPARK-17174][SQL] Correct usages and documenations for ...

2016-08-23 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/14778
  
If possible, why not make the code itself more consistent instead of changing 
the function descriptions?





[GitHub] spark issue #14778: [SPARK-17174][SQL] Correct usages and documenations for ...

2016-08-23 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/14778
  
Hmm, @HyukjinKwon.

Compared with `MySQL`, which returns a timestamp for timestamp input, e.g., 
`date_add(current_timestamp(), INTERVAL 1 DAY)`, this might look weird. 
Currently, however, this follows the `Hive` definition and behavior. According 
to the current definition, these functions already declare their input and 
output as `start_date` and `date`, which refer to the `day` part of a point in 
time. For example, we usually don't say that the `current_date()` function 
ignores the time part. Sorry, but I'm not sure about this PR.

```
hive> select date_add(current_timestamp, 1);
OK
2016-08-24
Time taken: 0.077 seconds, Fetched: 1 row(s)

hive> describe function date_add;
OK
date_add(start_date, num_days) - Returns the date that is num_days after 
start_date.
Time taken: 0.039 seconds, Fetched: 1 row(s)
```
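
To make the semantics in the Hive output above concrete, here is a minimal Scala sketch using only `java.time`. The helper name `dateAdd` is hypothetical, and this is neither Hive's nor Spark's implementation; it only mirrors the described behavior of taking the day part first and then adding whole days.

```scala
import java.time.{LocalDate, LocalDateTime}

object DateAddSketch {
  // Hypothetical helper: drop the time-of-day, then add `numDays` whole days,
  // so a timestamp input yields a date output.
  def dateAdd(start: LocalDateTime, numDays: Int): LocalDate =
    start.toLocalDate.plusDays(numDays.toLong)
}
```

For example, `DateAddSketch.dateAdd(LocalDateTime.of(2016, 8, 23, 21, 30, 0), 1)` returns the date `2016-08-24`, with no time component.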





[GitHub] spark issue #14777: [SPARK-17205] Literal.sql should handle Infinity and NaN

2016-08-23 Thread JoshRosen
Github user JoshRosen commented on the issue:

https://github.com/apache/spark/pull/14777
  
Don't merge this yet; may have found more decimal bugs 





[GitHub] spark issue #14279: [SPARK-16216][SQL] Read/write timestamps and dates in IS...

2016-08-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14279
  
Merged build finished. Test PASSed.





[GitHub] spark issue #14279: [SPARK-16216][SQL] Read/write timestamps and dates in IS...

2016-08-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14279
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64326/
Test PASSed.





[GitHub] spark issue #14279: [SPARK-16216][SQL] Read/write timestamps and dates in IS...

2016-08-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14279
  
**[Test build #64326 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64326/consoleFull)**
 for PR 14279 at commit 
[`af8250e`](https://github.com/apache/spark/commit/af8250e12490c77f02587275eff9aa225e5dcdba).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request #14711: [SPARK-16822] [DOC] [Support latex in scaladoc wi...

2016-08-23 Thread jagadeesanas2
Github user jagadeesanas2 closed the pull request at:

https://github.com/apache/spark/pull/14711





[GitHub] spark issue #14753: [SPARK-17187][SQL] Supports using arbitrary Java object ...

2016-08-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14753
  
**[Test build #64331 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64331/consoleFull)**
 for PR 14753 at commit 
[`7190eb0`](https://github.com/apache/spark/commit/7190eb0c2a4dce2c5b84c29fb90bb2def23a3520).





[GitHub] spark issue #14778: [SPARK-17174][SQL] Correct usages and documenations for ...

2016-08-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14778
  
Merged build finished. Test PASSed.





[GitHub] spark issue #14778: [SPARK-17174][SQL] Correct usages and documenations for ...

2016-08-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14778
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64325/
Test PASSed.





[GitHub] spark issue #14778: [SPARK-17174][SQL] Correct usages and documenations for ...

2016-08-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14778
  
**[Test build #64325 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64325/consoleFull)**
 for PR 14778 at commit 
[`71ddb42`](https://github.com/apache/spark/commit/71ddb42f9debc795746ff5946c303f6444df7425).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #14780: [SPARK-17206]SQL] Support ANALYZE TABLE on analyzable te...

2016-08-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14780
  
**[Test build #64330 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64330/consoleFull)**
 for PR 14780 at commit 
[`cfbfefc`](https://github.com/apache/spark/commit/cfbfefc07364506fbafea0d853786e81c93cdebd).





[GitHub] spark issue #14780: [SPARK-17206]SQL] Support ANALYZE TABLE on analyzable te...

2016-08-23 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/14780
  
@hvanhovell Based on the prior discussion, I opened a JIRA and this PR. Could 
you review it and let me know whether it is heading in the right direction? Thanks.





[GitHub] spark issue #10896: [SPARK-12978][SQL] Skip unnecessary final group-by when ...

2016-08-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/10896
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64324/
Test PASSed.





[GitHub] spark issue #10896: [SPARK-12978][SQL] Skip unnecessary final group-by when ...

2016-08-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/10896
  
Merged build finished. Test PASSed.





[GitHub] spark pull request #14780: [SPARK-17206]SQL] Support ANALYZE TABLE on analyz...

2016-08-23 Thread viirya
GitHub user viirya opened a pull request:

https://github.com/apache/spark/pull/14780

[SPARK-17206]SQL] Support ANALYZE TABLE on analyzable temporary view

## What changes were proposed in this pull request?

Currently, the `ANALYZE TABLE` DDL command doesn't work on temporary views. 
However, for the specific kinds of temporary view that are analyzable, we can 
support the command, so that the CBO can work with temporary views too.

## How was this patch tested?

Jenkins tests.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/viirya/spark-1 analyze-temp-table

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/14780.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #14780


commit cfbfefc07364506fbafea0d853786e81c93cdebd
Author: Liang-Chi Hsieh 
Date:   2016-08-22T09:19:14Z

Support ANALYZE TABLE on analyzable temporary table.







[GitHub] spark issue #10896: [SPARK-12978][SQL] Skip unnecessary final group-by when ...

2016-08-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/10896
  
**[Test build #64324 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64324/consoleFull)**
 for PR 10896 at commit 
[`d5e0ed3`](https://github.com/apache/spark/commit/d5e0ed3d0efdc8047948e48cdc6fb1257cc381f0).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #14753: [SPARK-17187][SQL] Supports using arbitrary Java object ...

2016-08-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14753
  
**[Test build #64329 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64329/consoleFull)**
 for PR 14753 at commit 
[`b843f2f`](https://github.com/apache/spark/commit/b843f2f0169d9021529b82377de09c20142b856a).





[GitHub] spark issue #14779: [SparkR][Minor] Add more examples to window function doc...

2016-08-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14779
  
Merged build finished. Test PASSed.





[GitHub] spark issue #14779: [SparkR][Minor] Add more examples to window function doc...

2016-08-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14779
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64327/
Test PASSed.





[GitHub] spark issue #14779: [SparkR][Minor] Add more examples to window function doc...

2016-08-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14779
  
**[Test build #64327 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64327/consoleFull)**
 for PR 14779 at commit 
[`fe76c69`](https://github.com/apache/spark/commit/fe76c69f78721e8825dbf4b27af728a147102c72).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #14769: [MINOR][SQL] Remove implemented functions from comments ...

2016-08-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14769
  
**[Test build #3231 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3231/consoleFull)**
 for PR 14769 at commit 
[`8f3e25f`](https://github.com/apache/spark/commit/8f3e25fe3fb88ba51c8c01013786041f58e80427).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #14537: [SPARK-16948][SQL] Querying empty partitioned orc tables...

2016-08-23 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/14537
  
Based on your reply to @cloud-fan's question, my follow-up question 
is:

How about the non-partitioned empty ORC table? 





[GitHub] spark pull request #14537: [SPARK-16948][SQL] Querying empty partitioned orc...

2016-08-23 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/14537#discussion_r75988035
  
--- Diff: 
sql/hive/src/test/scala/org/apache/spark/sql/hive/orc/OrcQuerySuite.scala ---
@@ -372,6 +372,29 @@ class OrcQuerySuite extends QueryTest with 
BeforeAndAfterAll with OrcTest {
 }
   }
 
+  test("SPARK-16948. Check empty orc partitioned tables in ORC") {
+withSQLConf((HiveUtils.CONVERT_METASTORE_ORC.key, "true")) {
+  withTempPath { dir =>
--- End diff --

Could you remove this line?





[GitHub] spark pull request #14537: [SPARK-16948][SQL] Querying empty partitioned orc...

2016-08-23 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/14537#discussion_r75987946
  
--- Diff: 
sql/hive/src/test/scala/org/apache/spark/sql/hive/orc/OrcQuerySuite.scala ---
@@ -372,6 +372,29 @@ class OrcQuerySuite extends QueryTest with 
BeforeAndAfterAll with OrcTest {
 }
   }
 
+  test("SPARK-16948. Check empty orc partitioned tables in ORC") {
+withSQLConf((HiveUtils.CONVERT_METASTORE_ORC.key, "true")) {
+  withTempPath { dir =>
+withTable("empty_orc_partitioned") {
+  spark.sql(
+s"""CREATE TABLE empty_orc_partitioned(key INT, value STRING)
+| PARTITIONED BY (p INT) STORED AS ORC
+  """.stripMargin)
--- End diff --

A comment about the style
```Scala
sql(
  """
    |CREATE TABLE empty_orc_partitioned(key INT, value STRING)
    |PARTITIONED BY (p INT) STORED AS ORC
  """.stripMargin)
```
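
The suggested layout relies on how `stripMargin` treats the `|` prefix. A small, self-contained sketch of that behavior follows; the object name `StripMarginSketch` is hypothetical, and the DDL text is copied from the test under review.

```scala
object StripMarginSketch {
  // stripMargin removes leading whitespace up to and including the first '|'
  // on each line, so the SQL can be indented to match the surrounding code.
  val ddl: String =
    """
      |CREATE TABLE empty_orc_partitioned(key INT, value STRING)
      |PARTITIONED BY (p INT) STORED AS ORC
    """.stripMargin
}
```

After stripping, each SQL line starts at column zero; only the leading newline and the trailing indentation of the closing line remain, which `trim` would remove if needed.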





[GitHub] spark pull request #14537: [SPARK-16948][SQL] Querying empty partitioned orc...

2016-08-23 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/14537#discussion_r75987870
  
--- Diff: 
sql/hive/src/test/scala/org/apache/spark/sql/hive/orc/OrcQuerySuite.scala ---
@@ -372,6 +372,29 @@ class OrcQuerySuite extends QueryTest with 
BeforeAndAfterAll with OrcTest {
 }
   }
 
+  test("SPARK-16948. Check empty orc partitioned tables in ORC") {
+withSQLConf((HiveUtils.CONVERT_METASTORE_ORC.key, "true")) {
+  withTempPath { dir =>
+withTable("empty_orc_partitioned") {
+  spark.sql(
+s"""CREATE TABLE empty_orc_partitioned(key INT, value STRING)
+| PARTITIONED BY (p INT) STORED AS ORC
+  """.stripMargin)
+
+  val emptyDF = Seq.empty[(Int, String)].toDF("key", 
"value").coalesce(1)
+  emptyDF.createOrReplaceTempView("empty")
+
+  // Query empty table
+  val df = spark.sql(
+s"""SELECT key, value FROM empty_orc_partitioned
+| WHERE key > 10
+  """.stripMargin)
+  checkAnswer(df, emptyDF)
--- End diff --

A comment about the style. 
```Scala
checkAnswer(
  sql("SELECT key, value FROM empty_orc_partitioned WHERE key > 10"),
  emptyDF)
```





[GitHub] spark pull request #14537: [SPARK-16948][SQL] Querying empty partitioned orc...

2016-08-23 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/14537#discussion_r75987730
  
--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/orc/OrcQuerySuite.scala ---
@@ -372,6 +372,29 @@ class OrcQuerySuite extends QueryTest with BeforeAndAfterAll with OrcTest {
     }
   }
 
+  test("SPARK-16948. Check empty orc partitioned tables in ORC") {
+    withSQLConf((HiveUtils.CONVERT_METASTORE_ORC.key, "true")) {
+      withTempPath { dir =>
+        withTable("empty_orc_partitioned") {
+          spark.sql(
+            s"""CREATE TABLE empty_orc_partitioned(key INT, value STRING)
+               | PARTITIONED BY (p INT) STORED AS ORC
+             """.stripMargin)
+
+          val emptyDF = Seq.empty[(Int, String)].toDF("key", "value").coalesce(1)
+          emptyDF.createOrReplaceTempView("empty")
--- End diff --

Could you remove this line?





[GitHub] spark issue #14777: [SPARK-17205] Literal.sql should handle Infinity and NaN

2016-08-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14777
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64320/
Test PASSed.





[GitHub] spark issue #14777: [SPARK-17205] Literal.sql should handle Infinity and NaN

2016-08-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14777
  
Merged build finished. Test PASSed.





[GitHub] spark issue #14777: [SPARK-17205] Literal.sql should handle Infinity and NaN

2016-08-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14777
  
**[Test build #64320 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64320/consoleFull)**
 for PR 14777 at commit 
[`26e036a`](https://github.com/apache/spark/commit/26e036af512e7e21a1365cdf665cb5e9dca39c66).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request #14757: [SPARK-17190] [SQL] Removal of HiveSharedState

2016-08-23 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/14757#discussion_r75986531
  
--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveExternalCatalogSuite.scala ---
@@ -21,26 +21,26 @@ import org.apache.hadoop.conf.Configuration
 
 import org.apache.spark.SparkConf
 import org.apache.spark.sql.catalyst.catalog._
-import org.apache.spark.sql.hive.client.HiveClient
 
 /**
  * Test suite for the [[HiveExternalCatalog]].
  */
 class HiveExternalCatalogSuite extends ExternalCatalogSuite {
--- End diff --

Before this PR, `HiveExternalCatalogSuite` used a client created by 
[HiveUtils.newClientForExecution](https://github.com/apache/spark/blob/7bb64aae27f670531699f59d3f410e38866609b7/sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveExternalCatalogSuite.scala#L34).
That client is configured through `newTemporaryConfiguration`, 
[which creates a new path for the metastore](https://github.com/apache/spark/blob/2ae7b88a07140e012b6c60db3c4a2a8ca360c684/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveUtils.scala#L366-L379).
Thus, we can say the suite was pointing to a different metastore.







[GitHub] spark issue #8880: [SPARK-5682][Core] Add encrypted shuffle in spark

2016-08-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/8880
  
**[Test build #64328 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64328/consoleFull)**
 for PR 8880 at commit 
[`2204453`](https://github.com/apache/spark/commit/22044539a54329572e2d60123a1cb5f42e5f7626).





[GitHub] spark issue #14779: [SparkR][Minor] Add more examples to window function doc...

2016-08-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14779
  
**[Test build #64327 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64327/consoleFull)**
 for PR 14779 at commit 
[`fe76c69`](https://github.com/apache/spark/commit/fe76c69f78721e8825dbf4b27af728a147102c72).





[GitHub] spark issue #14702: [SPARK-15694] Implement ScriptTransformation in sql/core...

2016-08-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14702
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64321/
Test FAILed.





[GitHub] spark issue #14702: [SPARK-15694] Implement ScriptTransformation in sql/core...

2016-08-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14702
  
**[Test build #64321 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64321/consoleFull)**
 for PR 14702 at commit 
[`9afbd5e`](https://github.com/apache/spark/commit/9afbd5e2d2b08087596dc5d575935e4894b390bc).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #14702: [SPARK-15694] Implement ScriptTransformation in sql/core...

2016-08-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14702
  
Merged build finished. Test FAILed.





[GitHub] spark pull request #14779: [SparkR][Minor] Add more examples to window funct...

2016-08-23 Thread junyangq
GitHub user junyangq opened a pull request:

https://github.com/apache/spark/pull/14779

[SparkR][Minor] Add more examples to window function docs

## What changes were proposed in this pull request?

This PR adds more examples to window function docs to make them more 
accessible to the users.

It also fixes default value issues for `lag` and `lead`.

## How was this patch tested?

Manual test, R unit test.




You can merge this pull request into a Git repository by running:

$ git pull https://github.com/junyangq/spark SPARKR-FixWindowFunctionDocs

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/14779.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #14779


commit fe76c69f78721e8825dbf4b27af728a147102c72
Author: Junyang Qian 
Date:   2016-08-24T02:25:43Z

Add more examples to window function docs.







[GitHub] spark issue #14279: [SPARK-16216][SQL] Read/write timestamps and dates in IS...

2016-08-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14279
  
**[Test build #64326 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64326/consoleFull)**
 for PR 14279 at commit 
[`af8250e`](https://github.com/apache/spark/commit/af8250e12490c77f02587275eff9aa225e5dcdba).





[GitHub] spark issue #14747: [SPARK-17086][ML] Fix InvalidArgumentException issue in ...

2016-08-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14747
  
Merged build finished. Test PASSed.





[GitHub] spark issue #14747: [SPARK-17086][ML] Fix InvalidArgumentException issue in ...

2016-08-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14747
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64323/
Test PASSed.





[GitHub] spark issue #14747: [SPARK-17086][ML] Fix InvalidArgumentException issue in ...

2016-08-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14747
  
**[Test build #64323 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64323/consoleFull)**
 for PR 14747 at commit 
[`f800af2`](https://github.com/apache/spark/commit/f800af2ee50bea258025eced519d09301505af75).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #14778: [SPARK-17174][SQL] Correct usages and documenations for ...

2016-08-23 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/14778
  
@dongjoon-hyun Thank you for your quick response. I will wait :)





[GitHub] spark issue #14778: [SPARK-17174][SQL] Correct usages and documenations for ...

2016-08-23 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/14778
  
Oh, sorry. I'm outside for dinner~





[GitHub] spark issue #14640: [SPARK-17055] [MLLIB] add labelKFold to CrossValidator

2016-08-23 Thread hqzizania
Github user hqzizania commented on the issue:

https://github.com/apache/spark/pull/14640
  
This work may be similar to 
[SPARK-8971](https://github.com/apache/spark/pull/14321), which is another 
variation of KFold and is very significant in some cases. I suppose it is okay 
to add it to .mllib like the latter PR, but we could also add its use to 
CrossValidator in .ml. @sethah @MLnick @yanboliang 
BTW, fortunately, it seems easier to implement than kFoldStratified, as it 
does not need to change underlying code such as rdd/PairRDDFunctions.
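
For readers unfamiliar with the idea, here is a minimal sketch of a label-based k-fold split in plain Scala collections, assuming the usual meaning of the term: rows that share a label are never split between the training and test sides of a fold. The names `LabelKFoldSketch`, `labelKFold` and `labelOf` are hypothetical and are not taken from the PR, and the round-robin assignment is only one possible strategy.

```scala
object LabelKFoldSketch {
  // Returns k (train, test) pairs; all rows sharing a label land in the same test fold.
  def labelKFold[T, L](rows: Seq[T], k: Int)(labelOf: T => L): Seq[(Seq[T], Seq[T])] = {
    require(k >= 2, "k must be at least 2")
    // Assign each distinct label to a fold, round-robin in first-seen order.
    val foldOfLabel: Map[L, Int] =
      rows.map(labelOf).distinct.zipWithIndex.map { case (label, i) => label -> (i % k) }.toMap
    (0 until k).map { fold =>
      // Rows whose label belongs to this fold form the test set; the rest are training data.
      val (test, train) = rows.partition(row => foldOfLabel(labelOf(row)) == fold)
      (train, test)
    }
  }
}
```

Because the split only groups rows by label, a sketch like this needs nothing beyond ordinary collection operations, which is consistent with the remark that no changes to lower-level code are required.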





[GitHub] spark pull request #14726: [SPARK-16862] Configurable buffer size in `Unsafe...

2016-08-23 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/14726





[GitHub] spark issue #14773: [SPARK-17203][SQL] data source options should always be ...

2016-08-23 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/14773
  
Maybe these options should just be case insensitive in general?
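
As a rough illustration of what case-insensitive options could look like, the sketch below normalises keys once so that lookups ignore capitalisation. `CaseInsensitiveOptions` is a hypothetical wrapper written for this example only; it is not Spark's API and not what the PR implements.

```scala
import java.util.Locale

// Hypothetical wrapper, not Spark's API: keys are lower-cased once at construction,
// so later lookups ignore the caller's capitalisation.
final class CaseInsensitiveOptions(options: Map[String, String]) {
  private val normalized: Map[String, String] =
    options.map { case (key, value) => key.toLowerCase(Locale.ROOT) -> value }

  // Look up an option regardless of how the key was capitalised.
  def get(key: String): Option[String] =
    normalized.get(key.toLowerCase(Locale.ROOT))
}
```

With such a wrapper, `new CaseInsensitiveOptions(Map("Header" -> "true")).get("HEADER")` would find the value stored under `Header`.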






[GitHub] spark issue #14778: [SPARK-17174][SQL] Correct usages and documenations for ...

2016-08-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14778
  
**[Test build #64325 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64325/consoleFull)**
 for PR 14778 at commit 
[`71ddb42`](https://github.com/apache/spark/commit/71ddb42f9debc795746ff5946c303f6444df7425).





[GitHub] spark issue #14726: [SPARK-16862] Configurable buffer size in `UnsafeSorterS...

2016-08-23 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/14726
  
Thanks - merging in master.






[GitHub] spark issue #14778: [SPARK-17174][SQL] Correct usages and documenations for ...

2016-08-23 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/14778
  
Hi @dongjoon-hyun, do you mind if I ask you for a quick look first, before I 
cc other committers? (I saw that a related PR was merged.)





[GitHub] spark pull request #14778: [SPARK-17174][SQL] Correct usages and documenatio...

2016-08-23 Thread HyukjinKwon
GitHub user HyukjinKwon opened a pull request:

https://github.com/apache/spark/pull/14778

[SPARK-17174][SQL] Correct usages and documenations for functions returning 
date types which truncates time part

## What changes were proposed in this pull request?

This PR fixes the documentation of functions returning `DateType` so that it 
mentions that the time part is truncated.

Currently, the functions `add_months`, `date_add`, `date_sub`, `last_day`, 
`next_day`, `to_date` and `trunc` accept a `TimestampType` value or a string 
representation that includes a time part, as below:

```scala
import java.sql.Timestamp
// Assumes a SparkSession named `spark` with `import spark.implicits._` in scope.

val df = Seq(Tuple1(Timestamp.valueOf("2012-07-16 12:12:12"))).toDF("ts")
df.selectExpr("ts", "add_months(ts, 1)", "date_add(ts, 1)", "date_sub(ts, 1)").show()
df.selectExpr("ts", "last_day(ts)", """next_day(ts, "TU")""", "to_date(ts)", """trunc(ts, "MM")""").show()
```

However, for those functions, the time part is truncated as below:

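```
+--------------------+-------------------------------+-----------------------------+-----------------------------+
|                  ts|add_months(CAST(ts AS DATE), 1)|date_add(CAST(ts AS DATE), 1)|date_sub(CAST(ts AS DATE), 1)|
+--------------------+-------------------------------+-----------------------------+-----------------------------+
|2012-07-16 12:12:...|                     2012-08-16|                   2012-07-17|                   2012-07-15|
+--------------------+-------------------------------+-----------------------------+-----------------------------+

+--------------------+--------------------------+------------------------------+-------------------------+---------------------------+
|                  ts|last_day(CAST(ts AS DATE))|next_day(CAST(ts AS DATE), TU)|to_date(CAST(ts AS DATE))|trunc(CAST(ts AS DATE), MM)|
+--------------------+--------------------------+------------------------------+-------------------------+---------------------------+
|2012-07-16 12:12:...|                2012-07-31|                    2012-07-17|               2012-07-16|                 2012-07-01|
+--------------------+--------------------------+------------------------------+-------------------------+---------------------------+
```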

From a user's perspective, this may be surprising (which is why this JIRA was 
opened). For reference, Hive documents this behaviour: 
https://github.com/apache/hive/blob/26b5c7b56a4f28ce3eabc0207566cce46b29b558/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFAddMonths.java#L48-L51
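
The truncation can also be illustrated without Spark. The sketch below uses only `java.sql.Timestamp` and `java.time`; the object and helper names are hypothetical, and each helper drops the time part first, mirroring the implicit `CAST(ts AS DATE)`. It only shows why the time of day disappears and is not a reimplementation of Spark's functions, whose edge-case handling may differ.

```scala
import java.sql.Timestamp
import java.time.{DayOfWeek, LocalDate}
import java.time.temporal.TemporalAdjusters

// Plain-JVM illustration (no Spark): every helper first discards the time part,
// then applies ordinary calendar arithmetic to the remaining date.
object DateTruncationSketch {
  // Hypothetical helper: drop hours, minutes and seconds.
  def toDate(ts: Timestamp): LocalDate = ts.toLocalDateTime.toLocalDate

  def addMonths(ts: Timestamp, n: Int): LocalDate = toDate(ts).plusMonths(n)
  def dateAdd(ts: Timestamp, days: Int): LocalDate = toDate(ts).plusDays(days)
  def lastDay(ts: Timestamp): LocalDate =
    toDate(ts).`with`(TemporalAdjusters.lastDayOfMonth())
  def nextDay(ts: Timestamp, day: DayOfWeek): LocalDate =
    toDate(ts).`with`(TemporalAdjusters.next(day))
  def truncToMonth(ts: Timestamp): LocalDate = toDate(ts).withDayOfMonth(1)
}
```

For the timestamp `2012-07-16 12:12:12` used above, these helpers return the same dates as the table, with no time component left to display.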

## How was this patch tested?

N/A



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/HyukjinKwon/spark SPARK-17174-doc

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/14778.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #14778


commit 71ddb42f9debc795746ff5946c303f6444df7425
Author: hyukjinkwon 
Date:   2016-08-24T01:35:40Z

Fix documenations for date functions







[GitHub] spark issue #10896: [SPARK-12978][SQL] Skip unnecessary final group-by when ...

2016-08-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/10896
  
**[Test build #64324 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64324/consoleFull)**
 for PR 10896 at commit 
[`d5e0ed3`](https://github.com/apache/spark/commit/d5e0ed3d0efdc8047948e48cdc6fb1257cc381f0).





[GitHub] spark issue #14769: [MINOR][SQL] Remove implemented functions from comments ...

2016-08-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14769
  
**[Test build #3231 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3231/consoleFull)**
 for PR 14769 at commit 
[`8f3e25f`](https://github.com/apache/spark/commit/8f3e25fe3fb88ba51c8c01013786041f58e80427).





[GitHub] spark issue #14637: [SPARK-16967] move mesos to module

2016-08-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14637
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64317/
Test FAILed.





[GitHub] spark issue #14637: [SPARK-16967] move mesos to module

2016-08-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14637
  
Merged build finished. Test FAILed.





[GitHub] spark issue #14637: [SPARK-16967] move mesos to module

2016-08-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14637
  
**[Test build #64317 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64317/consoleFull)**
 for PR 14637 at commit 
[`cdc5753`](https://github.com/apache/spark/commit/cdc5753d9813f3358625bec1c674f54e0d69835e).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #14747: [SPARK-17086][ML] Fix InvalidArgumentException issue in ...

2016-08-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14747
  
**[Test build #64323 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64323/consoleFull)**
 for PR 14747 at commit 
[`f800af2`](https://github.com/apache/spark/commit/f800af2ee50bea258025eced519d09301505af75).





[GitHub] spark issue #14753: [SPARK-17187][SQL] Supports using arbitrary Java object ...

2016-08-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14753
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64316/
Test PASSed.





[GitHub] spark issue #14753: [SPARK-17187][SQL] Supports using arbitrary Java object ...

2016-08-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14753
  
Merged build finished. Test PASSed.





[GitHub] spark issue #14753: [SPARK-17187][SQL] Supports using arbitrary Java object ...

2016-08-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14753
  
**[Test build #64316 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64316/consoleFull)**
 for PR 14753 at commit 
[`2873765`](https://github.com/apache/spark/commit/2873765dcc3cb2d57935a68f77f8e6e2585929c9).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #10896: [SPARK-12978][SQL] Skip unnecessary final group-by when ...

2016-08-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/10896
  
Merged build finished. Test FAILed.





[GitHub] spark issue #10896: [SPARK-12978][SQL] Skip unnecessary final group-by when ...

2016-08-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/10896
  
**[Test build #64322 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64322/consoleFull)** for PR 10896 at commit [`8a81e23`](https://github.com/apache/spark/commit/8a81e23cebe25315be1e8d94dbf9b52258bc31f9).
 * This patch **fails to build**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #10896: [SPARK-12978][SQL] Skip unnecessary final group-by when ...

2016-08-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/10896
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64322/





[GitHub] spark issue #10896: [SPARK-12978][SQL] Skip unnecessary final group-by when ...

2016-08-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/10896
  
**[Test build #64322 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64322/consoleFull)** for PR 10896 at commit [`8a81e23`](https://github.com/apache/spark/commit/8a81e23cebe25315be1e8d94dbf9b52258bc31f9).





[GitHub] spark issue #10896: [SPARK-12978][SQL] Skip unnecessary final group-by when ...

2016-08-23 Thread maropu
Github user maropu commented on the issue:

https://github.com/apache/spark/pull/10896
  
okay, done





[GitHub] spark pull request #14776: [SparkR][Minor] Fix doc for show method

2016-08-23 Thread junyangq
Github user junyangq commented on a diff in the pull request:

https://github.com/apache/spark/pull/14776#discussion_r75980038
  
--- Diff: R/pkg/R/DataFrame.R ---
@@ -212,9 +212,9 @@ setMethod("showDF",
 
 #' show
 #'
-#' Print the SparkDataFrame column names and types
+#' Print class and type information of a SparkR object.
--- End diff --

not sure if there is a better name for this collection





[GitHub] spark pull request #10896: [SPARK-12978][SQL] Skip unnecessary final group-b...

2016-08-23 Thread maropu
Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/10896#discussion_r75979792
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/AggUtils.scala ---
@@ -19,34 +19,94 @@ package org.apache.spark.sql.execution.aggregate
 
 import org.apache.spark.sql.catalyst.expressions._
 import org.apache.spark.sql.catalyst.expressions.aggregate._
+import org.apache.spark.sql.catalyst.plans.physical.Distribution
 import org.apache.spark.sql.execution.SparkPlan
 import org.apache.spark.sql.execution.streaming.{StateStoreRestoreExec, StateStoreSaveExec}
 
 /**
+ * A pattern that finds aggregate operators to support partial aggregations.
+ */
+object PartialAggregate {
+
+  def unapply(plan: SparkPlan): Option[Distribution] = plan match {
+case agg: AggregateExec
+if agg.aggregateExpressions.map(_.aggregateFunction).forall(_.supportsPartial) =>
--- End diff --

yea, okay.





[GitHub] spark issue #14702: [SPARK-15694] Implement ScriptTransformation in sql/core...

2016-08-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14702
  
**[Test build #64321 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64321/consoleFull)** for PR 14702 at commit [`9afbd5e`](https://github.com/apache/spark/commit/9afbd5e2d2b08087596dc5d575935e4894b390bc).





[GitHub] spark pull request #10896: [SPARK-12978][SQL] Skip unnecessary final group-b...

2016-08-23 Thread maropu
Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/10896#discussion_r75979651
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/AggUtils.scala ---
@@ -19,34 +19,94 @@ package org.apache.spark.sql.execution.aggregate
 
 import org.apache.spark.sql.catalyst.expressions._
 import org.apache.spark.sql.catalyst.expressions.aggregate._
+import org.apache.spark.sql.catalyst.plans.physical.Distribution
 import org.apache.spark.sql.execution.SparkPlan
 import org.apache.spark.sql.execution.streaming.{StateStoreRestoreExec, StateStoreSaveExec}
 
 /**
+ * A pattern that finds aggregate operators to support partial aggregations.
+ */
+object PartialAggregate {
+
+  def unapply(plan: SparkPlan): Option[Distribution] = plan match {
+case agg: AggregateExec
+if agg.aggregateExpressions.map(_.aggregateFunction).forall(_.supportsPartial) =>
+  Some(agg.requiredChildDistribution.head)
+case _ =>
+  None
+  }
+}
+
+/**
  * Utility functions used by the query planner to convert our plan to new aggregation code path.
  */
 object AggUtils {
 
-  def planAggregateWithoutPartial(
+  private def createPartialAggregateExec(
   groupingExpressions: Seq[NamedExpression],
   aggregateExpressions: Seq[AggregateExpression],
-  resultExpressions: Seq[NamedExpression],
-  child: SparkPlan): Seq[SparkPlan] = {
+  child: SparkPlan): SparkPlan = {
+val groupingAttributes = groupingExpressions.map(_.toAttribute)
+val functionsWithDistinct = aggregateExpressions.filter(_.isDistinct)
+val partialAggregateExpressions = aggregateExpressions.map {
+  case agg @ AggregateExpression(_, _, false, _) if functionsWithDistinct.length > 0 =>
+agg.copy(mode = PartialMerge)
+  case agg =>
+agg.copy(mode = Partial)
+}
+val partialAggregateAttributes =
+  partialAggregateExpressions.flatMap(_.aggregateFunction.aggBufferAttributes)
+val partialResultExpressions =
+  groupingAttributes ++
+    partialAggregateExpressions.flatMap(_.aggregateFunction.inputAggBufferAttributes)
 
-val completeAggregateExpressions = aggregateExpressions.map(_.copy(mode = Complete))
-val completeAggregateAttributes = completeAggregateExpressions.map(_.resultAttribute)
-SortAggregateExec(
-  requiredChildDistributionExpressions = Some(groupingExpressions),
+createAggregateExec(
+  requiredChildDistributionExpressions = None,
   groupingExpressions = groupingExpressions,
-  aggregateExpressions = completeAggregateExpressions,
-  aggregateAttributes = completeAggregateAttributes,
-  initialInputBufferOffset = 0,
-  resultExpressions = resultExpressions,
-  child = child
-) :: Nil
+  aggregateExpressions = partialAggregateExpressions,
+  aggregateAttributes = partialAggregateAttributes,
+  initialInputBufferOffset = if (functionsWithDistinct.length > 0) {
+groupingExpressions.length + functionsWithDistinct.head.aggregateFunction.children.length
+  } else {
+0
+  },
+  resultExpressions = partialResultExpressions,
+  child = child)
+  }
+
+  private def updateMergeAggregateMode(aggregateExpressions: Seq[AggregateExpression]) = {
+def updateMode(mode: AggregateMode) = mode match {
+  case Partial => PartialMerge
+  case Complete => Final
+  case mode => mode
+}
+aggregateExpressions.map(e => e.copy(mode = updateMode(e.mode)))
   }
 
-  private def createAggregate(
+  /**
+   * Builds new merge and map-side [[AggregateExec]]s from an input aggregate operator.
+   * If an aggregation needs a shuffle for satisfying its own distribution and supports partial
+   * aggregations, a map-side aggregation is appended before the shuffle in
+   * [[org.apache.spark.sql.execution.exchange.EnsureRequirements]].
+   */
+  def createMapMergeAggregatePair(operator: SparkPlan): (SparkPlan, SparkPlan) = operator match {
+case agg: AggregateExec =>
+  val mapSideAgg = createPartialAggregateExec(
+agg.groupingExpressions, agg.aggregateExpressions, agg.child)
+  val mergeAgg = createAggregateExec(
+requiredChildDistributionExpressions = agg.requiredChildDistributionExpressions,
+groupingExpressions = agg.groupingExpressions.map(_.toAttribute),
+aggregateExpressions = updateMergeAggregateMode(agg.aggregateExpressions),
+aggregateAttributes = agg.aggregateAttributes,
+initialInputBufferOffset = agg.groupingExpressions.length,
+

[GitHub] spark issue #14776: [SparkR][Minor] Fix doc for show method

2016-08-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14776
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64319/




