[GitHub] spark issue #14753: [SPARK-17187][SQL] Supports using arbitrary Java object ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14753 Merged build finished. Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #14753: [SPARK-17187][SQL] Supports using arbitrary Java object ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14753 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64331/ Test PASSed.
[GitHub] spark issue #14753: [SPARK-17187][SQL] Supports using arbitrary Java object ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14753 **[Test build #64331 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64331/consoleFull)** for PR 14753 at commit [`7190eb0`](https://github.com/apache/spark/commit/7190eb0c2a4dce2c5b84c29fb90bb2def23a3520). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #14753: [SPARK-17187][SQL] Supports using arbitrary Java ...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/14753#discussion_r75997361
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/SortBasedAggregationIterator.scala ---
@@ -131,6 +150,11 @@ class SortBasedAggregationIterator(
         firstRowInNextGroup = currentRow.copy()
       }
     }
+
+    // Serializes the generic object stored in aggregation buffer for TypedImperativeAggregate
+    // aggregation functions.
+    serializeTypedAggregateBuffer(sortBasedAggregationBuffer)
--- End diff --
(basically, when we call `eval`, we always get the original object)
[GitHub] spark issue #14780: [SPARK-17206][SQL] Support ANALYZE TABLE on analyzable t...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14780 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64330/ Test PASSed.
[GitHub] spark issue #14780: [SPARK-17206][SQL] Support ANALYZE TABLE on analyzable t...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14780 Merged build finished. Test PASSed.
[GitHub] spark pull request #14753: [SPARK-17187][SQL] Supports using arbitrary Java ...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/14753#discussion_r75997312
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/SortBasedAggregationIterator.scala ---
@@ -131,6 +150,11 @@ class SortBasedAggregationIterator(
         firstRowInNextGroup = currentRow.copy()
       }
     }
+
+    // Serializes the generic object stored in aggregation buffer for TypedImperativeAggregate
+    // aggregation functions.
+    serializeTypedAggregateBuffer(sortBasedAggregationBuffer)
--- End diff --
An alternative approach is to call the serialization just before we output the buffer (https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/AggregationIterator.scala#L233-L239). Then, we will not need to check the class at https://github.com/apache/spark/pull/14753/files#diff-9463c978126246071e528ddfa7a376d5R507.
[GitHub] spark issue #14780: [SPARK-17206][SQL] Support ANALYZE TABLE on analyzable t...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14780 **[Test build #64330 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64330/consoleFull)** for PR 14780 at commit [`cfbfefc`](https://github.com/apache/spark/commit/cfbfefc07364506fbafea0d853786e81c93cdebd). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #14753: [SPARK-17187][SQL] Supports using arbitrary Java object ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14753 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64329/ Test PASSed.
[GitHub] spark issue #14753: [SPARK-17187][SQL] Supports using arbitrary Java object ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14753 Merged build finished. Test PASSed.
[GitHub] spark issue #14753: [SPARK-17187][SQL] Supports using arbitrary Java object ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14753 **[Test build #64329 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64329/consoleFull)** for PR 14753 at commit [`b843f2f`](https://github.com/apache/spark/commit/b843f2f0169d9021529b82377de09c20142b856a). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #14778: [SPARK-17174][SQL] Correct usages and documenatio...
Github user HyukjinKwon closed the pull request at: https://github.com/apache/spark/pull/14778
[GitHub] spark issue #14778: [SPARK-17174][SQL] Correct usages and documenations for ...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/14778 Hm.. I will close this for now and ask in the JIRA what we want to do with this. Thanks again.
[GitHub] spark issue #13460: [SPARK-15615] [SQL] Support Json input from Dataset[Stri...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13460 Can one of the admins verify this patch?
[GitHub] spark issue #14781: [SPARK-17167] [2.0] [SQL] Issue Exceptions when Analyze ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14781 **[Test build #64333 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64333/consoleFull)** for PR 14781 at commit [`77666fd`](https://github.com/apache/spark/commit/77666fd83f5bab69dc7191a6e1a8f7d253300de0).
[GitHub] spark issue #14778: [SPARK-17174][SQL] Correct usages and documenations for ...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/14778 I see. Yes, I think that makes sense. Let me convert this PR into fixing typos as below:

```
Returns returns date with the time portion of the day truncated to the unit specified by the format model fmt.
```

to

```
Returns date with the time portion of the day truncated to the unit specified by the format model fmt.
```

and

```
Extracts the date part of the date or datetime expression expr
```

to

```
Extracts the date part of the date or timestamp expression
```

with another look for those.
[GitHub] spark pull request #14729: [SPARK-17167] [SQL] Issue Exceptions when Analyze...
Github user gatorsmile closed the pull request at: https://github.com/apache/spark/pull/14729
[GitHub] spark issue #14729: [SPARK-17167] [SQL] Issue Exceptions when Analyze Table ...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/14729 The PR https://github.com/apache/spark/pull/14781 has been opened. This one will be closed. Thanks!
[GitHub] spark issue #14753: [SPARK-17187][SQL] Supports using arbitrary Java object ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14753 **[Test build #64332 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64332/consoleFull)** for PR 14753 at commit [`5904bcd`](https://github.com/apache/spark/commit/5904bcd2eb523b6f3e744925a0e9d9da52f6ae0b).
[GitHub] spark issue #14781: [SPARK-17167] [2.0] [SQL] Issue Exceptions when Analyze ...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/14781 cc @hvanhovell
[GitHub] spark issue #14753: [SPARK-17187][SQL] Supports using arbitrary Java object ...
Github user clockfly commented on the issue: https://github.com/apache/spark/pull/14753 @viirya, thanks
[GitHub] spark pull request #14781: [SPARK-17167] [2.0] [SQL] Issue Exceptions when A...
GitHub user gatorsmile opened a pull request: https://github.com/apache/spark/pull/14781 [SPARK-17167] [2.0] [SQL] Issue Exceptions when Analyze Table on In-Memory Cataloged Tables

### What changes were proposed in this pull request?

Currently, `Analyze Table` is only supported for Hive-serde tables; we should issue exceptions in all the other cases. When the table is a data source table, we already issue an exception. However, when the table is an In-Memory Cataloged table, we do not issue any exception. This PR issues an exception when the analyzed table is in-memory cataloged. For example,

```SQL
CREATE TABLE tbl(a INT, b INT) USING parquet
```

`tbl` is a `SimpleCatalogRelation` when the hive support is not enabled.

### How was this patch tested?

Added two test cases. One of them just improves the test coverage for the case where the analyzed table is a data source table.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/gatorsmile/spark analyzeInMemoryTable2

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/14781.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #14781
[GitHub] spark issue #14623: [SPARK-17044][SQL] Make test files for window functions ...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/14623 Hi, @rxin. Could you review this PR again?
[GitHub] spark issue #8880: [SPARK-5682][Core] Add encrypted shuffle in spark
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/8880 Merged build finished. Test PASSed.
[GitHub] spark issue #8880: [SPARK-5682][Core] Add encrypted shuffle in spark
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/8880 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64328/ Test PASSed.
[GitHub] spark issue #8880: [SPARK-5682][Core] Add encrypted shuffle in spark
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/8880 **[Test build #64328 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64328/consoleFull)** for PR 8880 at commit [`2204453`](https://github.com/apache/spark/commit/22044539a54329572e2d60123a1cb5f42e5f7626). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #14753: [SPARK-17187][SQL] Supports using arbitrary Java ...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/14753#discussion_r75994611
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/interfaces.scala ---
@@ -389,3 +389,153 @@ abstract class DeclarativeAggregate
     def right: AttributeReference = inputAggBufferAttributes(aggBufferAttributes.indexOf(a))
   }
 }
+
+/**
+ * Aggregation function which allows **arbitrary** user-defined java object to be used as internal
+ * aggregation buffer object.
+ *
+ * {{{
+ *   aggregation buffer for normal aggregation function `avg`
+ *       |
+ *       v
+ * +-------------+---------------+-----------------------------------+
+ * | sum1 (Long) | count1 (Long) | generic user-defined java objects |
+ * +-------------+---------------+-----------------------------------+
+ *                                                 ^
+ *                                                 |
+ *   Aggregation buffer object for `TypedImperativeAggregate` aggregation function
+ * }}}
+ *
+ * Work flow (Partial mode aggregate at Mapper side, and Final mode aggregate at Reducer side):
+ *
+ * Stage 1: Partial aggregate at Mapper side:
+ *
+ * 1. The framework calls `createAggregationBuffer(): T` to create an empty internal aggregation
+ *    buffer object.
+ * 2. Upon each input row, the framework calls
+ *    `update(buffer: T, input: InternalRow): Unit` to update the aggregation buffer object T.
+ * 3. After processing all rows of current group (group by key), the framework will serialize
+ *    aggregation buffer object T to storage format (Array[Byte]) and persist the Array[Byte]
+ *    to disk if needed.
+ * 4. The framework moves on to next group, until all groups have been processed.
+ *
+ * Shuffling exchange data to Reducer tasks...
+ *
+ * Stage 2: Final mode aggregate at Reducer side:
+ *
+ * 1. The framework calls `createAggregationBuffer(): T` to create an empty internal aggregation
+ *    buffer object (type T) for merging.
+ * 2. For each aggregation output of Stage 1, the framework de-serializes the storage
+ *    format (Array[Byte]) and produces one input aggregation object (type T).
+ * 3. For each input aggregation object, the framework calls `merge(buffer: T, input: T): Unit`
+ *    to merge the input aggregation object into aggregation buffer object.
+ * 4. After processing all input aggregation objects of current group (group by key), the framework
+ *    calls method `eval(buffer: T)` to generate the final output for this group.
+ * 5. The framework moves on to next group, until all groups have been processed.
+ *
+ * NOTE: SQL with TypedImperativeAggregate functions is planned in sort based aggregation,
+ * instead of hash based aggregation, as TypedImperativeAggregate uses BinaryType as the
+ * aggregation buffer's storage format, which is not supported by hash based aggregation.
+ * Hash based aggregation only supports aggregation buffers of mutable types (like LongType,
+ * IntType that have fixed length and can be mutated in place in UnsafeRow).
+ */
+abstract class TypedImperativeAggregate[T] extends ImperativeAggregate {
+
+  /**
+   * Creates an empty aggregation buffer object. This is called before processing each key group
+   * (group by key).
+   *
+   * @return an aggregation buffer object
+   */
+  def createAggregationBuffer(): T
+
+  /**
+   * In-place updates the aggregation buffer object with an input row. buffer = buffer + input.
+   * This is typically called when doing Partial or Complete mode aggregation.
+   *
+   * @param buffer The aggregation buffer object.
+   * @param input an input row
+   */
+  def update(buffer: T, input: InternalRow): Unit
+
+  /**
+   * Merges an input aggregation object into aggregation buffer object. buffer = buffer + input.
+   * This is typically called when doing PartialMerge or Final mode aggregation.
+   *
+   * @param buffer the aggregation buffer object used to store the aggregation result.
+   * @param input an input aggregation object. Input aggregation object can be produced by
+   *              de-serializing the partial aggregate's output from Mapper side.
+   */
+  def merge(buffer: T, input: T): Unit
+
+  /**
+   * Generates the final aggregation result value for current key group with the aggregation buffer
+   * object.
+   *
+   * @param buffer aggregation buffer object.
+   * @return The aggregation result of current key group
+   */
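The quoted Javadoc describes a two-stage lifecycle: create/update/serialize on the mapper, then deserialize/merge/eval on the reducer. Below is a minimal, Spark-free Java sketch of that lifecycle under stated assumptions: `TypedAvgSketch`, `SumCount`, and `partial` are illustrative names invented for this example, not Spark APIs; only the method roles (createAggregationBuffer/update/serialize/merge/eval) mirror the interface above.

```java
// Hypothetical sketch of the TypedImperativeAggregate lifecycle for an average:
// an arbitrary Java object (SumCount) is the aggregation buffer, serialized to
// byte[] (the Array[Byte] storage format) between the map and reduce stages.
import java.nio.ByteBuffer;

public class TypedAvgSketch {
    // The generic user-defined buffer object (type T in the interface).
    static final class SumCount {
        long sum;
        long count;
    }

    // Stage 1/2, step 1: an empty buffer for a new key group.
    static SumCount createAggregationBuffer() {
        return new SumCount();
    }

    // Stage 1, step 2: in-place update of the buffer with one input value.
    static void update(SumCount buffer, long input) {
        buffer.sum += input;
        buffer.count += 1;
    }

    // Stage 1, step 3: serialize the buffer object to the byte[] storage format.
    static byte[] serialize(SumCount buffer) {
        return ByteBuffer.allocate(16)
                .putLong(buffer.sum)
                .putLong(buffer.count)
                .array();
    }

    // Stage 2, step 2: de-serialize a partial result shuffled from a mapper.
    static SumCount deserialize(byte[] bytes) {
        ByteBuffer bb = ByteBuffer.wrap(bytes);
        SumCount b = new SumCount();
        b.sum = bb.getLong();
        b.count = bb.getLong();
        return b;
    }

    // Stage 2, step 3: merge an input aggregation object into the buffer.
    static void merge(SumCount buffer, SumCount input) {
        buffer.sum += input.sum;
        buffer.count += input.count;
    }

    // Stage 2, step 4: produce the final value for the group.
    static double eval(SumCount buffer) {
        return buffer.count == 0 ? 0.0 : (double) buffer.sum / buffer.count;
    }

    // One mapper's partial aggregate over its rows for a single group.
    static byte[] partial(long[] rows) {
        SumCount buffer = createAggregationBuffer();
        for (long row : rows) {
            update(buffer, row);
        }
        return serialize(buffer);
    }

    public static void main(String[] args) {
        byte[] fromMapper1 = partial(new long[]{1, 2, 3});
        byte[] fromMapper2 = partial(new long[]{4, 5});
        SumCount finalBuffer = createAggregationBuffer();
        merge(finalBuffer, deserialize(fromMapper1));
        merge(finalBuffer, deserialize(fromMapper2));
        System.out.println(eval(finalBuffer)); // avg of 1..5 = 3.0
    }
}
```

This also illustrates why the framework must serialize before shuffling: the `SumCount` object lives only in the mapper's heap, while the byte[] form is what actually crosses task boundaries.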
[GitHub] spark pull request #14753: [SPARK-17187][SQL] Supports using arbitrary Java ...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/14753#discussion_r75994519
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/SortBasedAggregationIterator.scala ---
@@ -90,6 +91,24 @@ class SortBasedAggregationIterator(
   // compared to MutableRow (aggregation buffer) directly.
   private[this] val safeProj: Projection = FromUnsafeProjection(valueAttributes.map(_.dataType))
+
+  // Aggregation function which uses generic aggregation buffer object.
+  // @see [[TypedImperativeAggregate]] for more information
+  private val typedImperativeAggregates: Array[TypedImperativeAggregate[_]] = {
+    aggregateFunctions.collect {
+      case (ag: TypedImperativeAggregate[_]) => ag
+    }
+  }
+
+  // For TypedImperativeAggregate with generic aggregation buffer object, we need to call
+  // serializeAggregateBufferInPlace(...) explicitly to convert the aggregation buffer object
+  // to Spark Sql internally supported serializable storage format.
+  private def serializeTypedAggregateBuffer(aggregationBuffer: MutableRow): Unit = {
--- End diff --
Unused parameter `aggregationBuffer`. Or replace the following `sortBasedAggregationBuffer` with `aggregationBuffer`?
[GitHub] spark issue #14778: [SPARK-17174][SQL] Correct usages and documenations for ...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/14778 This is just my personal impression; you had better ask the committers for advice. The Spark community has been reducing the gap between DBMSs and Spark SQL. :)
[GitHub] spark pull request #14753: [SPARK-17187][SQL] Supports using arbitrary Java ...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/14753#discussion_r75994509 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/interfaces.scala ---

```scala
@@ -389,3 +389,175 @@ abstract class DeclarativeAggregate
     def right: AttributeReference = inputAggBufferAttributes(aggBufferAttributes.indexOf(a))
   }
 }
+
+/**
+ * Aggregation function which allows **arbitrary** user-defined java object to be used as internal
+ * aggregation buffer object.
+ *
+ * {{{
+ *   aggregation buffer for normal aggregation function `avg`
+ *   |
+ *   v
+ * +--------------+---------------+-----------------------------------+
+ * | sum1 (Long)  | count1 (Long) | generic user-defined java objects |
+ * +--------------+---------------+-----------------------------------+
+ *                                  ^
+ *                                  |
+ *   Aggregation buffer object for `TypedImperativeAggregate` aggregation function
+ * }}}
+ *
+ * Work flow (Partial mode aggregate at Mapper side, and Final mode aggregate at Reducer side):
+ *
+ * Stage 1: Partial aggregate at Mapper side:
+ *
+ * 1. The framework calls `createAggregationBuffer(): T` to create an empty internal aggregation
+ *    buffer object.
+ * 2. Upon each input row, the framework calls
+ *    `update(buffer: T, input: InternalRow): Unit` to update the aggregation buffer object T.
+ * 3. After processing all rows of current group (group by key), the framework will serialize
+ *    aggregation buffer object T to SparkSQL internally supported underlying storage format, and
+ *    persist the serializable format to disk if needed.
+ * 4. The framework moves on to next group, until all groups have been processed.
+ *
+ * Shuffling exchange data to Reducer tasks...
+ *
+ * Stage 2: Final mode aggregate at Reducer side:
+ *
+ * 1. The framework calls `createAggregationBuffer(): T` to create an empty internal aggregation
+ *    buffer object (type T) for merging.
+ * 2. For each aggregation output of Stage 1, the framework de-serializes the storage
+ *    format and generates one input aggregation object (type T).
+ * 3. For each input aggregation object, the framework calls `merge(buffer: T, input: T): Unit`
+ *    to merge the input aggregation object into aggregation buffer object.
+ * 4. After processing all input aggregation objects of current group (group by key), the framework
+ *    calls method `eval(buffer: T)` to generate the final output for this group.
+ * 5. The framework moves on to next group, until all groups have been processed.
+ */
+abstract class TypedImperativeAggregate[T] extends ImperativeAggregate {
+
+  /**
+   * Creates an empty aggregation buffer object. This is called before processing each key group
+   * (group by key).
+   *
+   * @return an aggregation buffer object
+   */
+  def createAggregationBuffer(): T
+
+  /**
+   * In-place updates the aggregation buffer object with an input row. buffer = buffer + input.
+   * This is typically called when doing Partial or Complete mode aggregation.
+   *
+   * @param buffer The aggregation buffer object.
+   * @param input an input row
+   */
+  def update(buffer: T, input: InternalRow): Unit
+
+  /**
+   * Merges an input aggregation object into aggregation buffer object. buffer = buffer + input.
+   * This is typically called when doing PartialMerge or Final mode aggregation.
+   *
+   * @param buffer the aggregation buffer object used to store the aggregation result.
+   * @param input an input aggregation object. Input aggregation object can be produced by
+   *              de-serializing the partial aggregate's output from Mapper side.
+   */
+  def merge(buffer: T, input: T): Unit
+
+  /**
+   * Generates the final aggregation result value for current key group with the aggregation buffer
+   * object.
+   *
+   * @param buffer aggregation buffer object.
+   * @return The aggregation result of current key group
+   */
+  def eval(buffer: T): Any
+
+  /** Returns the class of aggregation buffer object */
+  def aggregationBufferClass: Class[T]
```

End diff -- oh, I was thinking about just avoiding Scala features unless we have to.
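The two-stage workflow described in the `TypedImperativeAggregate` doc above can be simulated outside Spark. Below is a hedged, plain-Python sketch of the lifecycle (`createAggregationBuffer` / `update` / `merge` / `eval`) for a hypothetical typed `max` aggregate; the driver loop and `pickle` serialization stand in for the framework and the SQL storage format, and are not Spark's actual implementation.

```python
# Plain-Python model of the TypedImperativeAggregate lifecycle (not Spark code).
import pickle

class TypedMaxAggregate:
    def create_aggregation_buffer(self):
        # Empty buffer object for a new key group.
        return {"max": None}

    def update(self, buffer, input_row):
        # buffer = buffer + input (Partial or Complete mode).
        value = input_row[0]
        if buffer["max"] is None or value > buffer["max"]:
            buffer["max"] = value

    def merge(self, buffer, other):
        # buffer = buffer + input aggregation object (PartialMerge or Final mode).
        if other["max"] is not None:
            self.update(buffer, (other["max"],))

    def eval(self, buffer):
        # Final result for the current key group.
        return buffer["max"]

agg = TypedMaxAggregate()

# Stage 1: each "mapper" partition builds a buffer, then serializes it
# (pickle stands in for the SQL-internal storage format).
serialized = []
for partition in [[(3,), (7,)], [(5,), (2,)]]:
    buf = agg.create_aggregation_buffer()
    for row in partition:
        agg.update(buf, row)
    serialized.append(pickle.dumps(buf))

# Stage 2: the "reducer" deserializes each partial result, merges, and evaluates.
final = agg.create_aggregation_buffer()
for payload in serialized:
    agg.merge(final, pickle.loads(payload))

print(agg.eval(final))  # 7
```

The point of the interface is that the buffer is an arbitrary Java object between serialization boundaries; only at the group boundary does it get converted to the internal row format.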
[GitHub] spark issue #14778: [SPARK-17174][SQL] Correct usages and documenations for ...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/14778 If possible, why don't you make the code more consistent instead of changing the function descriptions?
[GitHub] spark issue #14778: [SPARK-17174][SQL] Correct usages and documenations for ...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/14778 Ur, @HyukjinKwon. If you compare this with `MySQL`, which returns a timestamp for timestamp input, e.g., `date_add(current_timestamp(), INTERVAL 1 DAY)`, this might look weird. But currently this follows the `Hive` definition and behavior. According to the current definition, these functions already define their input and output as `start_date` and `date`, which mean the `day` part of a certain time. For example, we usually don't say the `current_date()` function ignores the time part. Sorry, but I'm not sure about this PR.

```
hive> select date_add(current_timestamp, 1);
OK
2016-08-24
Time taken: 0.077 seconds, Fetched: 1 row(s)
hive> describe function date_add;
OK
date_add(start_date, num_days) - Returns the date that is num_days after start_date.
Time taken: 0.039 seconds, Fetched: 1 row(s)
```
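The Hive semantics quoted above can be modeled in a few lines: `date_add(start_date, num_days)` keeps only the day part of its input, so a timestamp argument still yields a bare date. This is a hedged plain-Python illustration of that behavior, not Spark's or Hive's implementation.

```python
# Model of Hive's date_add semantics: the time component is truncated.
from datetime import date, datetime, timedelta

def date_add(start, num_days):
    # Accept either a date or a timestamp; only the day part is kept.
    day = start.date() if isinstance(start, datetime) else start
    return day + timedelta(days=num_days)

# Mirrors `hive> select date_add(current_timestamp, 1)` returning a bare date.
print(date_add(datetime(2016, 8, 23, 14, 30), 1))  # 2016-08-24
```

This matches the Hive console output in the comment above, and differs from MySQL's `date_add`, which preserves the timestamp type.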
[GitHub] spark issue #14777: [SPARK-17205] Literal.sql should handle Infinity and NaN
Github user JoshRosen commented on the issue: https://github.com/apache/spark/pull/14777 Don't merge this yet; may have found more decimal bugs
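The SPARK-17205 issue referenced above is that naively printing a double literal produces `Infinity`/`NaN` tokens that are not valid SQL. The sketch below is a hedged plain-Python illustration of the problem and one possible `CAST`-based fix; the helper name and the exact fallback syntax are assumptions for illustration, not the actual patch.

```python
# Illustrative generator for a SQL double literal that handles special values.
import math

def double_literal_sql(v):
    # Plain repr of inf/nan would emit invalid SQL tokens, so fall back to a
    # CAST from a string literal (one plausible fix; hypothetical helper).
    if math.isnan(v):
        return "CAST('NaN' AS DOUBLE)"
    if math.isinf(v):
        sign = "-" if v < 0 else ""
        return "CAST('{}Infinity' AS DOUBLE)".format(sign)
    return repr(v)  # finite doubles round-trip as-is

print(double_literal_sql(float("inf")))  # CAST('Infinity' AS DOUBLE)
print(double_literal_sql(float("nan")))  # CAST('NaN' AS DOUBLE)
```

Decimal literals raise a similar round-tripping question, which is presumably what the "more decimal bugs" remark refers to.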
[GitHub] spark issue #14279: [SPARK-16216][SQL] Read/write timestamps and dates in IS...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14279 Merged build finished. Test PASSed.
[GitHub] spark issue #14279: [SPARK-16216][SQL] Read/write timestamps and dates in IS...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14279 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64326/ Test PASSed.
[GitHub] spark issue #14279: [SPARK-16216][SQL] Read/write timestamps and dates in IS...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14279 **[Test build #64326 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64326/consoleFull)** for PR 14279 at commit [`af8250e`](https://github.com/apache/spark/commit/af8250e12490c77f02587275eff9aa225e5dcdba). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #14711: [SPARK-16822] [DOC] [Support latex in scaladoc wi...
Github user jagadeesanas2 closed the pull request at: https://github.com/apache/spark/pull/14711
[GitHub] spark issue #14753: [SPARK-17187][SQL] Supports using arbitrary Java object ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14753 **[Test build #64331 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64331/consoleFull)** for PR 14753 at commit [`7190eb0`](https://github.com/apache/spark/commit/7190eb0c2a4dce2c5b84c29fb90bb2def23a3520).
[GitHub] spark issue #14778: [SPARK-17174][SQL] Correct usages and documenations for ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14778 Merged build finished. Test PASSed.
[GitHub] spark issue #14778: [SPARK-17174][SQL] Correct usages and documenations for ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14778 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64325/ Test PASSed.
[GitHub] spark issue #14778: [SPARK-17174][SQL] Correct usages and documenations for ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14778 **[Test build #64325 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64325/consoleFull)** for PR 14778 at commit [`71ddb42`](https://github.com/apache/spark/commit/71ddb42f9debc795746ff5946c303f6444df7425). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #14780: [SPARK-17206]SQL] Support ANALYZE TABLE on analyzable te...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14780 **[Test build #64330 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64330/consoleFull)** for PR 14780 at commit [`cfbfefc`](https://github.com/apache/spark/commit/cfbfefc07364506fbafea0d853786e81c93cdebd).
[GitHub] spark issue #14780: [SPARK-17206]SQL] Support ANALYZE TABLE on analyzable te...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/14780 @hvanhovell Based on the prior discussion, I opened a JIRA and this PR. Can you review whether it is going in the right direction? Thanks.
[GitHub] spark issue #10896: [SPARK-12978][SQL] Skip unnecessary final group-by when ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/10896 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64324/ Test PASSed.
[GitHub] spark issue #10896: [SPARK-12978][SQL] Skip unnecessary final group-by when ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/10896 Merged build finished. Test PASSed.
[GitHub] spark pull request #14780: [SPARK-17206]SQL] Support ANALYZE TABLE on analyz...
GitHub user viirya opened a pull request: https://github.com/apache/spark/pull/14780 [SPARK-17206]SQL] Support ANALYZE TABLE on analyzable temporary view

## What changes were proposed in this pull request?

Currently the `ANALYZE TABLE` DDL command can't work on a temporary view. However, for the specified type of temporary view which is analyzable, we can support the DDL command for it, so the CBO can work with temporary views too.

## How was this patch tested?

Jenkins tests.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/viirya/spark-1 analyze-temp-table

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/14780.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #14780

commit cfbfefc07364506fbafea0d853786e81c93cdebd
Author: Liang-Chi Hsieh
Date: 2016-08-22T09:19:14Z

Support ANALYZE TABLE on analyzable temporary table.
[GitHub] spark issue #10896: [SPARK-12978][SQL] Skip unnecessary final group-by when ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/10896 **[Test build #64324 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64324/consoleFull)** for PR 10896 at commit [`d5e0ed3`](https://github.com/apache/spark/commit/d5e0ed3d0efdc8047948e48cdc6fb1257cc381f0). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #14753: [SPARK-17187][SQL] Supports using arbitrary Java object ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14753 **[Test build #64329 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64329/consoleFull)** for PR 14753 at commit [`b843f2f`](https://github.com/apache/spark/commit/b843f2f0169d9021529b82377de09c20142b856a).
[GitHub] spark issue #14779: [SparkR][Minor] Add more examples to window function doc...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14779 Merged build finished. Test PASSed.
[GitHub] spark issue #14779: [SparkR][Minor] Add more examples to window function doc...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14779 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64327/ Test PASSed.
[GitHub] spark issue #14779: [SparkR][Minor] Add more examples to window function doc...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14779 **[Test build #64327 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64327/consoleFull)** for PR 14779 at commit [`fe76c69`](https://github.com/apache/spark/commit/fe76c69f78721e8825dbf4b27af728a147102c72). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #14769: [MINOR][SQL] Remove implemented functions from comments ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14769 **[Test build #3231 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3231/consoleFull)** for PR 14769 at commit [`8f3e25f`](https://github.com/apache/spark/commit/8f3e25fe3fb88ba51c8c01013786041f58e80427). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #14537: [SPARK-16948][SQL] Querying empty partitioned orc tables...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/14537 Based on what you replied to @cloud-fan 's question, my follow-up question is: How about the non-partitioned empty ORC table?
[GitHub] spark pull request #14537: [SPARK-16948][SQL] Querying empty partitioned orc...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/14537#discussion_r75988035 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/orc/OrcQuerySuite.scala ---

```scala
@@ -372,6 +372,29 @@ class OrcQuerySuite extends QueryTest with BeforeAndAfterAll with OrcTest {
     }
   }
+
+  test("SPARK-16948. Check empty orc partitioned tables in ORC") {
+    withSQLConf((HiveUtils.CONVERT_METASTORE_ORC.key, "true")) {
+      withTempPath { dir =>
```

End diff -- Could you remove this line?
[GitHub] spark pull request #14537: [SPARK-16948][SQL] Querying empty partitioned orc...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/14537#discussion_r75987946 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/orc/OrcQuerySuite.scala ---

```scala
+  test("SPARK-16948. Check empty orc partitioned tables in ORC") {
+    withSQLConf((HiveUtils.CONVERT_METASTORE_ORC.key, "true")) {
+      withTempPath { dir =>
+        withTable("empty_orc_partitioned") {
+          spark.sql(
+            s"""CREATE TABLE empty_orc_partitioned(key INT, value STRING)
+               | PARTITIONED BY (p INT) STORED AS ORC
+             """.stripMargin)
```

End diff -- A comment about the style:

```scala
sql(
  """
    |CREATE TABLE empty_orc_partitioned(key INT, value STRING)
    |PARTITIONED BY (p INT) STORED AS ORC
  """.stripMargin)
```
[GitHub] spark pull request #14537: [SPARK-16948][SQL] Querying empty partitioned orc...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/14537#discussion_r75987870 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/orc/OrcQuerySuite.scala ---

```scala
+  test("SPARK-16948. Check empty orc partitioned tables in ORC") {
+    withSQLConf((HiveUtils.CONVERT_METASTORE_ORC.key, "true")) {
+      withTempPath { dir =>
+        withTable("empty_orc_partitioned") {
+          spark.sql(
+            s"""CREATE TABLE empty_orc_partitioned(key INT, value STRING)
+               | PARTITIONED BY (p INT) STORED AS ORC
+             """.stripMargin)
+
+          val emptyDF = Seq.empty[(Int, String)].toDF("key", "value").coalesce(1)
+          emptyDF.createOrReplaceTempView("empty")
+
+          // Query empty table
+          val df = spark.sql(
+            s"""SELECT key, value FROM empty_orc_partitioned
+               | WHERE key > 10
+             """.stripMargin)
+          checkAnswer(df, emptyDF)
```

End diff -- A comment about the style:

```scala
checkAnswer(
  sql("SELECT key, value FROM empty_orc_partitioned WHERE key > 10"),
  emptyDF)
```
[GitHub] spark pull request #14537: [SPARK-16948][SQL] Querying empty partitioned orc...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/14537#discussion_r75987730 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/orc/OrcQuerySuite.scala ---

```scala
+  test("SPARK-16948. Check empty orc partitioned tables in ORC") {
+    withSQLConf((HiveUtils.CONVERT_METASTORE_ORC.key, "true")) {
+      withTempPath { dir =>
+        withTable("empty_orc_partitioned") {
+          spark.sql(
+            s"""CREATE TABLE empty_orc_partitioned(key INT, value STRING)
+               | PARTITIONED BY (p INT) STORED AS ORC
+             """.stripMargin)
+
+          val emptyDF = Seq.empty[(Int, String)].toDF("key", "value").coalesce(1)
+          emptyDF.createOrReplaceTempView("empty")
```

End diff -- Could you remove this line?
[GitHub] spark issue #14777: [SPARK-17205] Literal.sql should handle Infinity and NaN
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14777 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64320/ Test PASSed.
[GitHub] spark issue #14777: [SPARK-17205] Literal.sql should handle Infinity and NaN
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14777 Merged build finished. Test PASSed.
[GitHub] spark issue #14777: [SPARK-17205] Literal.sql should handle Infinity and NaN
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14777 **[Test build #64320 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64320/consoleFull)** for PR 14777 at commit [`26e036a`](https://github.com/apache/spark/commit/26e036af512e7e21a1365cdf665cb5e9dca39c66). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #14757: [SPARK-17190] [SQL] Removal of HiveSharedState
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/14757#discussion_r75986531

--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveExternalCatalogSuite.scala ---

```scala
@@ -21,26 +21,26 @@
 import org.apache.hadoop.conf.Configuration

 import org.apache.spark.SparkConf
 import org.apache.spark.sql.catalyst.catalog._
-import org.apache.spark.sql.hive.client.HiveClient

 /**
  * Test suite for the [[HiveExternalCatalog]].
  */
 class HiveExternalCatalogSuite extends ExternalCatalogSuite {
```

--- End diff --

Before this PR, `HiveExternalCatalogSuite` used [HiveUtils.newClientForExecution](https://github.com/apache/spark/blob/7bb64aae27f670531699f59d3f410e38866609b7/sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveExternalCatalogSuite.scala#L34). The configuration for `newClientForExecution` comes from `newTemporaryConfiguration`, [which creates a new path for the metastore](https://github.com/apache/spark/blob/2ae7b88a07140e012b6c60db3c4a2a8ca360c684/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveUtils.scala#L366-L379). Thus, we can say it points to a different metastore.
[GitHub] spark issue #8880: [SPARK-5682][Core] Add encrypted shuffle in spark
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/8880 **[Test build #64328 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64328/consoleFull)** for PR 8880 at commit [`2204453`](https://github.com/apache/spark/commit/22044539a54329572e2d60123a1cb5f42e5f7626).
[GitHub] spark issue #14779: [SparkR][Minor] Add more examples to window function doc...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14779 **[Test build #64327 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64327/consoleFull)** for PR 14779 at commit [`fe76c69`](https://github.com/apache/spark/commit/fe76c69f78721e8825dbf4b27af728a147102c72).
[GitHub] spark issue #14702: [SPARK-15694] Implement ScriptTransformation in sql/core...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14702 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64321/ Test FAILed.
[GitHub] spark issue #14702: [SPARK-15694] Implement ScriptTransformation in sql/core...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14702 **[Test build #64321 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64321/consoleFull)** for PR 14702 at commit [`9afbd5e`](https://github.com/apache/spark/commit/9afbd5e2d2b08087596dc5d575935e4894b390bc). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #14702: [SPARK-15694] Implement ScriptTransformation in sql/core...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14702 Merged build finished. Test FAILed.
[GitHub] spark pull request #14779: [SparkR][Minor] Add more examples to window funct...
GitHub user junyangq opened a pull request: https://github.com/apache/spark/pull/14779

[SparkR][Minor] Add more examples to window function docs

## What changes were proposed in this pull request?

This PR adds more examples to the window function docs to make them more accessible to users. It also fixes default-value issues for `lag` and `lead`.

## How was this patch tested?

Manual test, R unit tests.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/junyangq/spark SPARKR-FixWindowFunctionDocs

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/14779.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #14779

commit fe76c69f78721e8825dbf4b27af728a147102c72
Author: Junyang Qian
Date: 2016-08-24T02:25:43Z

    Add more examples to window function docs.
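The `lag`/`lead` defaults the PR mentions follow the usual window-function semantics: shift a column by an offset and fill the vacated rows with a default value. A minimal Python sketch of those semantics over plain lists (illustrative only, not the SparkR API):

```python
def lag(values, offset=1, default=None):
    """Shift values forward by `offset`; the first `offset` rows
    take `default` (the window-function lag semantics)."""
    return [default] * offset + list(values)[:-offset] if offset else list(values)

def lead(values, offset=1, default=None):
    """Shift values backward by `offset`; the last `offset` rows
    take `default` (the window-function lead semantics)."""
    return list(values)[offset:] + [default] * offset if offset else list(values)

print(lag([1, 2, 3]))   # [None, 1, 2]
print(lead([1, 2, 3]))  # [2, 3, None]
```

The doc fix matters because users often assume the default fill value is 0 rather than a null, and the two choices produce different aggregates downstream.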
[GitHub] spark issue #14279: [SPARK-16216][SQL] Read/write timestamps and dates in IS...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14279 **[Test build #64326 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64326/consoleFull)** for PR 14279 at commit [`af8250e`](https://github.com/apache/spark/commit/af8250e12490c77f02587275eff9aa225e5dcdba).
[GitHub] spark issue #14747: [SPARK-17086][ML] Fix InvalidArgumentException issue in ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14747 Merged build finished. Test PASSed.
[GitHub] spark issue #14747: [SPARK-17086][ML] Fix InvalidArgumentException issue in ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14747 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64323/ Test PASSed.
[GitHub] spark issue #14747: [SPARK-17086][ML] Fix InvalidArgumentException issue in ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14747 **[Test build #64323 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64323/consoleFull)** for PR 14747 at commit [`f800af2`](https://github.com/apache/spark/commit/f800af2ee50bea258025eced519d09301505af75). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #14778: [SPARK-17174][SQL] Correct usages and documenations for ...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/14778 @dongjoon-hyun Thank you for your quick response. I will wait :)
[GitHub] spark issue #14778: [SPARK-17174][SQL] Correct usages and documenations for ...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/14778 Oh, sorry. I'm outside for dinner~
[GitHub] spark issue #14640: [SPARK-17055] [MLLIB] add labelKFold to CrossValidator
Github user hqzizania commented on the issue: https://github.com/apache/spark/pull/14640 This work may be similar to [SPARK-8971](https://github.com/apache/spark/pull/14321), which is another variation of k-fold and very useful in some cases. I suppose it is okay to add it to .mllib like the latter PR, but we could add its use to CrossValidator in .ml. @sethah @MLnick @yanboliang BTW, fortunately, it seems easier to implement than `kFoldStratified`, as it does not need to change underlying code such as rdd/PairRDDFunctions.
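The idea behind labelKFold is that every sample sharing a label must land in the same fold, so no label group leaks across the train/test split. A small Python sketch of that assignment (a greedy illustration of the concept, not the MLlib implementation):

```python
from collections import defaultdict

def label_k_fold(labels, k):
    """Assign sample indices to k folds so that all samples sharing a
    label end up in the same fold (label-grouped cross-validation)."""
    by_label = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)
    # Greedily place each label group into the currently smallest fold,
    # largest groups first, to keep fold sizes roughly balanced.
    folds = [[] for _ in range(k)]
    for group in sorted(by_label.values(), key=len, reverse=True):
        min(folds, key=len).extend(group)
    return folds

folds = label_k_fold(["a", "a", "b", "c", "b", "d"], 2)
```

Because the split only touches the label-to-index grouping, no changes to lower-level shuffle code are needed, which matches the comment's point about not modifying rdd/PairRDDFunctions.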
[GitHub] spark pull request #14726: [SPARK-16862] Configurable buffer size in `Unsafe...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/14726
[GitHub] spark issue #14773: [SPARK-17203][SQL] data source options should always be ...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/14773 Maybe these options should just be case-insensitive in general?
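The case-insensitivity suggested here amounts to normalizing option keys on both insertion and lookup. A minimal Python sketch of such a map (illustrative only; the class name is an assumption, not Spark's actual options class):

```python
class CaseInsensitiveOptions:
    """Option lookup that ignores key case. Keys are normalized to
    lowercase on insertion and on lookup; the original spelling of a
    key is not preserved."""
    def __init__(self, options):
        self._data = {k.lower(): v for k, v in options.items()}

    def get(self, key, default=None):
        return self._data.get(key.lower(), default)

opts = CaseInsensitiveOptions({"Header": "true", "inferSchema": "true"})
print(opts.get("header"))       # true
print(opts.get("INFERSCHEMA"))  # true
```

The trade-off is that two keys differing only in case silently collapse into one entry, which is usually the desired behavior for user-facing options.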
[GitHub] spark issue #14778: [SPARK-17174][SQL] Correct usages and documenations for ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14778 **[Test build #64325 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64325/consoleFull)** for PR 14778 at commit [`71ddb42`](https://github.com/apache/spark/commit/71ddb42f9debc795746ff5946c303f6444df7425).
[GitHub] spark issue #14726: [SPARK-16862] Configurable buffer size in `UnsafeSorterS...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/14726 Thanks - merging in master.
[GitHub] spark issue #14778: [SPARK-17174][SQL] Correct usages and documenations for ...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/14778 Hi @dongjoon-hyun, do you mind taking a quick look first, before I cc other committers? (I saw a related PR was merged.)
[GitHub] spark pull request #14778: [SPARK-17174][SQL] Correct usages and documenatio...
GitHub user HyukjinKwon opened a pull request: https://github.com/apache/spark/pull/14778

[SPARK-17174][SQL] Correct usages and documenations for functions returning date types which truncates time part

## What changes were proposed in this pull request?

This PR fixes the documentation for functions returning `DateType` to mention that the time part is truncated. Currently, the functions `add_months`, `date_add`, `date_sub`, `last_day`, `next_day`, `to_date` and `trunc` can take a `TimestampType` or a string representation that includes a time part, as below:

```scala
val df = Seq(Tuple1(Timestamp.valueOf("2012-07-16 12:12:12"))).toDF("ts")
df.selectExpr("ts", "add_months(ts, 1)", "date_add(ts, 1)", "date_sub(ts, 1)").show()
df.selectExpr("ts", "last_day(ts)", """next_day(ts, "TU")""", "to_date(ts)", """trunc(ts, "MM")""").show()
```

However, for those functions the time part is truncated, as below:

```
+--------------------+-------------------------------+-----------------------------+-----------------------------+
|                  ts|add_months(CAST(ts AS DATE), 1)|date_add(CAST(ts AS DATE), 1)|date_sub(CAST(ts AS DATE), 1)|
+--------------------+-------------------------------+-----------------------------+-----------------------------+
|2012-07-16 12:12:...|                     2012-08-16|                   2012-07-17|                   2012-07-15|
+--------------------+-------------------------------+-----------------------------+-----------------------------+

+--------------------+--------------------------+------------------------------+-------------------------+---------------------------+
|                  ts|last_day(CAST(ts AS DATE))|next_day(CAST(ts AS DATE), TU)|to_date(CAST(ts AS DATE))|trunc(CAST(ts AS DATE), MM)|
+--------------------+--------------------------+------------------------------+-------------------------+---------------------------+
|2012-07-16 12:12:...|                2012-07-31|                    2012-07-17|               2012-07-16|                 2012-07-01|
+--------------------+--------------------------+------------------------------+-------------------------+---------------------------+
```

From a user's perspective, this can be surprising (which is why this JIRA was opened). As a reference, Hive documents this behaviour: https://github.com/apache/hive/blob/26b5c7b56a4f28ce3eabc0207566cce46b29b558/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFAddMonths.java#L48-L51

## How was this patch tested?

N/A

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/HyukjinKwon/spark SPARK-17174-doc

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/14778.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #14778

commit 71ddb42f9debc795746ff5946c303f6444df7425
Author: hyukjinkwon
Date: 2016-08-24T01:35:40Z

    Fix documenations for date functions
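The truncation the PR documents is the implicit `CAST(ts AS DATE)` visible in the output column names: the timestamp is first collapsed to a date, and only then is the date function applied. A small Python sketch of those two steps (illustrative; the function names here are assumptions, not Spark APIs):

```python
from datetime import datetime, date

def to_date(ts: datetime) -> date:
    """Mimic the implicit CAST(ts AS DATE): drop the time component."""
    return ts.date()

def trunc_month(d: date) -> date:
    """Truncate a date to the first day of its month, like trunc(ts, 'MM')."""
    return d.replace(day=1)

ts = datetime(2012, 7, 16, 12, 12, 12)
print(to_date(ts))               # 2012-07-16
print(trunc_month(to_date(ts)))  # 2012-07-01
```

Once the cast has happened, the 12:12:12 time part is unrecoverable, which is exactly the behavior users found surprising and the documentation fix calls out.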
[GitHub] spark issue #10896: [SPARK-12978][SQL] Skip unnecessary final group-by when ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/10896 **[Test build #64324 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64324/consoleFull)** for PR 10896 at commit [`d5e0ed3`](https://github.com/apache/spark/commit/d5e0ed3d0efdc8047948e48cdc6fb1257cc381f0).
[GitHub] spark issue #14769: [MINOR][SQL] Remove implemented functions from comments ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14769 **[Test build #3231 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3231/consoleFull)** for PR 14769 at commit [`8f3e25f`](https://github.com/apache/spark/commit/8f3e25fe3fb88ba51c8c01013786041f58e80427).
[GitHub] spark issue #14637: [SPARK-16967] move mesos to module
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14637 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64317/ Test FAILed.
[GitHub] spark issue #14637: [SPARK-16967] move mesos to module
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14637 Merged build finished. Test FAILed.
[GitHub] spark issue #14637: [SPARK-16967] move mesos to module
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14637 **[Test build #64317 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64317/consoleFull)** for PR 14637 at commit [`cdc5753`](https://github.com/apache/spark/commit/cdc5753d9813f3358625bec1c674f54e0d69835e). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #14747: [SPARK-17086][ML] Fix InvalidArgumentException issue in ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14747 **[Test build #64323 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64323/consoleFull)** for PR 14747 at commit [`f800af2`](https://github.com/apache/spark/commit/f800af2ee50bea258025eced519d09301505af75).
[GitHub] spark issue #14753: [SPARK-17187][SQL] Supports using arbitrary Java object ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14753 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64316/ Test PASSed.
[GitHub] spark issue #14753: [SPARK-17187][SQL] Supports using arbitrary Java object ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14753 Merged build finished. Test PASSed.
[GitHub] spark issue #14753: [SPARK-17187][SQL] Supports using arbitrary Java object ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14753 **[Test build #64316 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64316/consoleFull)** for PR 14753 at commit [`2873765`](https://github.com/apache/spark/commit/2873765dcc3cb2d57935a68f77f8e6e2585929c9). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #10896: [SPARK-12978][SQL] Skip unnecessary final group-by when ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/10896 Merged build finished. Test FAILed.
[GitHub] spark issue #10896: [SPARK-12978][SQL] Skip unnecessary final group-by when ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/10896 **[Test build #64322 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64322/consoleFull)** for PR 10896 at commit [`8a81e23`](https://github.com/apache/spark/commit/8a81e23cebe25315be1e8d94dbf9b52258bc31f9). * This patch **fails to build**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #10896: [SPARK-12978][SQL] Skip unnecessary final group-by when ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/10896 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64322/ Test FAILed.
[GitHub] spark issue #10896: [SPARK-12978][SQL] Skip unnecessary final group-by when ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/10896 **[Test build #64322 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64322/consoleFull)** for PR 10896 at commit [`8a81e23`](https://github.com/apache/spark/commit/8a81e23cebe25315be1e8d94dbf9b52258bc31f9).
[GitHub] spark issue #10896: [SPARK-12978][SQL] Skip unnecessary final group-by when ...
Github user maropu commented on the issue: https://github.com/apache/spark/pull/10896 okay, done
[GitHub] spark pull request #14776: [SparkR][Minor] Fix doc for show method
Github user junyangq commented on a diff in the pull request: https://github.com/apache/spark/pull/14776#discussion_r75980038

--- Diff: R/pkg/R/DataFrame.R ---

```r
@@ -212,9 +212,9 @@ setMethod("showDF",
 #' show
 #'
-#' Print the SparkDataFrame column names and types
+#' Print class and type information of a SparkR object.
```

--- End diff --

Not sure if there is a better name for this collection.
[GitHub] spark pull request #10896: [SPARK-12978][SQL] Skip unnecessary final group-b...
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/10896#discussion_r75979792

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/AggUtils.scala ---

```scala
@@ -19,34 +19,94 @@ package org.apache.spark.sql.execution.aggregate

 import org.apache.spark.sql.catalyst.expressions._
 import org.apache.spark.sql.catalyst.expressions.aggregate._
+import org.apache.spark.sql.catalyst.plans.physical.Distribution
 import org.apache.spark.sql.execution.SparkPlan
 import org.apache.spark.sql.execution.streaming.{StateStoreRestoreExec, StateStoreSaveExec}

 /**
+ * A pattern that finds aggregate operators to support partial aggregations.
+ */
+object PartialAggregate {
+
+  def unapply(plan: SparkPlan): Option[Distribution] = plan match {
+    case agg: AggregateExec
+        if agg.aggregateExpressions.map(_.aggregateFunction).forall(_.supportsPartial) =>
```

--- End diff --

yea, okay.
[GitHub] spark issue #14702: [SPARK-15694] Implement ScriptTransformation in sql/core...
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/14702

    **[Test build #64321 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64321/consoleFull)** for PR 14702 at commit [`9afbd5e`](https://github.com/apache/spark/commit/9afbd5e2d2b08087596dc5d575935e4894b390bc).
[GitHub] spark pull request #10896: [SPARK-12978][SQL] Skip unnecessary final group-b...
Github user maropu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10896#discussion_r75979651

    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/AggUtils.scala ---
    @@ -19,34 +19,94 @@ package org.apache.spark.sql.execution.aggregate

     import org.apache.spark.sql.catalyst.expressions._
     import org.apache.spark.sql.catalyst.expressions.aggregate._
    +import org.apache.spark.sql.catalyst.plans.physical.Distribution
     import org.apache.spark.sql.execution.SparkPlan
     import org.apache.spark.sql.execution.streaming.{StateStoreRestoreExec, StateStoreSaveExec}

     /**
    + * A pattern that finds aggregate operators to support partial aggregations.
    + */
    +object PartialAggregate {
    +
    +  def unapply(plan: SparkPlan): Option[Distribution] = plan match {
    +    case agg: AggregateExec
    +        if agg.aggregateExpressions.map(_.aggregateFunction).forall(_.supportsPartial) =>
    +      Some(agg.requiredChildDistribution.head)
    +    case _ =>
    +      None
    +  }
    +}
    +
    +/**
      * Utility functions used by the query planner to convert our plan to new aggregation code path.
      */
     object AggUtils {

    -  def planAggregateWithoutPartial(
    +  private def createPartialAggregateExec(
           groupingExpressions: Seq[NamedExpression],
           aggregateExpressions: Seq[AggregateExpression],
    -      resultExpressions: Seq[NamedExpression],
    -      child: SparkPlan): Seq[SparkPlan] = {
    +      child: SparkPlan): SparkPlan = {
    +    val groupingAttributes = groupingExpressions.map(_.toAttribute)
    +    val functionsWithDistinct = aggregateExpressions.filter(_.isDistinct)
    +    val partialAggregateExpressions = aggregateExpressions.map {
    +      case agg @ AggregateExpression(_, _, false, _) if functionsWithDistinct.length > 0 =>
    +        agg.copy(mode = PartialMerge)
    +      case agg =>
    +        agg.copy(mode = Partial)
    +    }
    +    val partialAggregateAttributes =
    +      partialAggregateExpressions.flatMap(_.aggregateFunction.aggBufferAttributes)
    +    val partialResultExpressions =
    +      groupingAttributes ++
    +        partialAggregateExpressions.flatMap(_.aggregateFunction.inputAggBufferAttributes)

    -    val completeAggregateExpressions = aggregateExpressions.map(_.copy(mode = Complete))
    -    val completeAggregateAttributes = completeAggregateExpressions.map(_.resultAttribute)
    -    SortAggregateExec(
    -      requiredChildDistributionExpressions = Some(groupingExpressions),
    +    createAggregateExec(
    +      requiredChildDistributionExpressions = None,
           groupingExpressions = groupingExpressions,
    -      aggregateExpressions = completeAggregateExpressions,
    -      aggregateAttributes = completeAggregateAttributes,
    -      initialInputBufferOffset = 0,
    -      resultExpressions = resultExpressions,
    -      child = child
    -    ) :: Nil
    +      aggregateExpressions = partialAggregateExpressions,
    +      aggregateAttributes = partialAggregateAttributes,
    +      initialInputBufferOffset = if (functionsWithDistinct.length > 0) {
    +        groupingExpressions.length + functionsWithDistinct.head.aggregateFunction.children.length
    +      } else {
    +        0
    +      },
    +      resultExpressions = partialResultExpressions,
    +      child = child)
    +  }
    +
    +  private def updateMergeAggregateMode(aggregateExpressions: Seq[AggregateExpression]) = {
    +    def updateMode(mode: AggregateMode) = mode match {
    +      case Partial => PartialMerge
    +      case Complete => Final
    +      case mode => mode
    +    }
    +    aggregateExpressions.map(e => e.copy(mode = updateMode(e.mode)))
       }

    -  private def createAggregate(
    +  /**
    +   * Builds new merge and map-side [[AggregateExec]]s from an input aggregate operator.
    +   * If an aggregation needs a shuffle for satisfying its own distribution and supports partial
    +   * aggregations, a map-side aggregation is appended before the shuffle in
    +   * [[org.apache.spark.sql.execution.exchange.EnsureRequirements]].
    +   */
    +  def createMapMergeAggregatePair(operator: SparkPlan): (SparkPlan, SparkPlan) = operator match {
    +    case agg: AggregateExec =>
    +      val mapSideAgg = createPartialAggregateExec(
    +        agg.groupingExpressions, agg.aggregateExpressions, agg.child)
    +      val mergeAgg = createAggregateExec(
    +        requiredChildDistributionExpressions = agg.requiredChildDistributionExpressions,
    +        groupingExpressions = agg.groupingExpressions.map(_.toAttribute),
    +        aggregateExpressions = updateMergeAggregateMode(agg.aggregateExpressions),
    +        aggregateAttributes = agg.aggregateAttributes,
    +        initialInputBufferOffset = agg.groupingExpressions.length,
    +
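The `updateMergeAggregateMode` helper in the diff above promotes map-side aggregate modes to their merge-side counterparts: `Partial` becomes `PartialMerge`, `Complete` becomes `Final`, and any other mode is left unchanged. A runnable sketch of just that rewrite follows; the ADT below mirrors Spark's `AggregateMode` in shape only and is not the actual Spark class.

```scala
// Stand-in for org.apache.spark.sql.catalyst.expressions.aggregate.AggregateMode.
sealed trait AggregateMode
case object Partial      extends AggregateMode
case object PartialMerge extends AggregateMode
case object Complete     extends AggregateMode
case object Final        extends AggregateMode

// Promote a map-side mode to the mode the merge-side (post-shuffle)
// aggregate must run in; modes already on the merge side pass through.
def updateMode(mode: AggregateMode): AggregateMode = mode match {
  case Partial  => PartialMerge
  case Complete => Final
  case other    => other
}
```

The pass-through cases matter: `createMapMergeAggregatePair` may be applied to operators whose expressions were already rewritten, so the mapping has to be idempotent.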
[GitHub] spark issue #14776: [SparkR][Minor] Fix doc for show method
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/14776

    Test PASSed.
    Refer to this link for build results (access rights to CI server needed):
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64319/
    Test PASSed.