[GitHub] spark pull request: [SPARK-9057] [STREAMING] Twitter example joini...
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/8431#issuecomment-164249212

@Agent007 does this require adding this large data file to the repo?
[GitHub] spark pull request: [SPARK-12310] [SparkR] Add write.json and writ...
GitHub user yanboliang opened a pull request: https://github.com/apache/spark/pull/10281

[SPARK-12310] [SparkR] Add write.json and write.parquet for SparkR

Add ```write.json``` and ```write.parquet``` for SparkR, and deprecate ```saveAsParquetFile```.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/yanboliang/spark spark-12310

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/10281.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #10281

commit e19a100236d3cf6213a0268ca461d99f099ddf6b
Author: Yanbo Liang
Date: 2015-12-13T10:59:11Z

    Add write.json and write.parquet for SparkR
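For readers unfamiliar with the underlying API: the new SparkR wrappers delegate to the JVM-side `DataFrameWriter`. A minimal Scala sketch of the equivalent calls (Spark 1.6-era API; the paths are illustrative placeholders, not from the PR):

```scala
import org.apache.spark.sql.SQLContext

// Equivalent JVM-side calls behind the new SparkR wrappers.
def writeBoth(sqlContext: SQLContext): Unit = {
  val df = sqlContext.read.json("path/to/file.json")
  df.write.json("/tmp/sparkr-json-out")       // write.json(df, path) in R
  df.write.parquet("/tmp/sparkr-parquet-out") // write.parquet(df, path), replacing saveAsParquetFile
}
```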
[GitHub] spark pull request: [SPARK-12016][MLlib][PySpark] Wrap Word2VecMod...
Github user viirya commented on the pull request: https://github.com/apache/spark/pull/10100#issuecomment-164245365

ping @davies Can you take a look at this? It should be a straightforward one.
[GitHub] spark pull request: [Minor] [ML] Rename weights to coefficients fo...
GitHub user yanboliang opened a pull request: https://github.com/apache/spark/pull/10280

[Minor] [ML] Rename weights to coefficients for examples/DeveloperApiExample

Rename ```weights``` to ```coefficients``` for examples/DeveloperApiExample.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/yanboliang/spark spark-coefficients

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/10280.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #10280

commit f19b748e980edbb0a13d32146437c4f38a986e13
Author: Yanbo Liang
Date: 2015-12-13T08:39:35Z

    Rename weights to coefficients for examples/DeveloperApiExample
[GitHub] spark pull request: [SPARK-12213] [SQL] use multiple partitions fo...
Github user hvanhovell commented on the pull request: https://github.com/apache/spark/pull/10228#issuecomment-164251993

@yhuai I think having the two clearly separated paths (this PR) is an improvement on the current situation. I also admit that I am responsible for introducing the second path. Your comment on having four aggregate steps without the exchange triggered me, and I was thinking out loud about how we could do this using the rewriting rule (the removal of one of the paths would have been a bonus).
[GitHub] spark pull request: [Minor] [DOC] Fix broken word2vec link
GitHub user BenFradet opened a pull request: https://github.com/apache/spark/pull/10282

[Minor] [DOC] Fix broken word2vec link

Follow-up of [SPARK-12199](https://issues.apache.org/jira/browse/SPARK-12199) and #10193, where a broken link has been left as is.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/BenFradet/spark SPARK-12199

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/10282.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #10282

commit 8937801896a1b7cc350abbe89e720506b75dac56
Author: BenFradet
Date: 2015-12-13T12:09:56Z

    fix broken word2vec link
[GitHub] spark pull request: [SPARK-11923][ML] Python API for ml.feature.Ch...
Github user yanboliang commented on the pull request: https://github.com/apache/spark/pull/10186#issuecomment-164238348

LGTM
[GitHub] spark pull request: [SPARK-12204][SPARKR] Implement drop method fo...
Github user sun-rui commented on the pull request: https://github.com/apache/spark/pull/10201#issuecomment-164320958

According to https://issues.apache.org/jira/browse/SPARK-6635 and https://issues.apache.org/jira/browse/SPARK-10073, the feature landed for Scala in Spark 1.4.0 and for Python in 1.5.0. But it seems both just had the API updated, without any migration guide for the compatibility break. Do we need to write one specifically for SparkR?
[GitHub] spark pull request: [SPARK-12310] [SparkR] Add write.json and writ...
Github user sun-rui commented on a diff in the pull request: https://github.com/apache/spark/pull/10281#discussion_r47456558

--- Diff: R/pkg/inst/tests/testthat/test_sparkSQL.R ---
@@ -1448,23 +1457,28 @@ test_that("write.df() on DataFrame and works with read.parquet", {
   expect_equal(count(df), count(parquetDF))
 })

-test_that("read.parquet()/parquetFile() works with multiple input paths", {
+test_that("read.parquet()/parquetFile() and write.parquet()/saveAsParquetFile()", {
--- End diff --

How about renaming this test case to "read/write Parquet files", where read.parquet(), parquetFile(), read.df(), write.parquet(), saveAsParquetFile() and write.df() are all tested reading/writing Parquet files? This test case can then be merged with the one "write.df() as parquet file".
[GitHub] spark pull request: [SPARK-12275][SQL] No plan for BroadcastHint i...
Github user yucai commented on the pull request: https://github.com/apache/spark/pull/10265#issuecomment-164325959

retest this please
[GitHub] spark pull request: [SPARK-5682][Core] Add encrypted shuffle in sp...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8880#issuecomment-164327277

Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-12315][SQL] isnotnull operator not push...
GitHub user HyukjinKwon opened a pull request: https://github.com/apache/spark/pull/10287

[SPARK-12315][SQL] isnotnull operator not pushed down for JDBC datasource.

The `IsNotNull` filter is not being pushed down for the JDBC datasource. It looks like it is part of the SQL standard according to [SQL-92](http://www.contrib.andrew.cmu.edu/~shadow/sql/sql1992.txt), [SQL:1999](http://web.cs.ualberta.ca/~yuan/courses/db_readings/ansi-iso-9075-2-1999.pdf), [SQL:2003](http://www.wiscorp.com/sql_2003_standard.zip) and [SQL:201x](http://www.wiscorp.com/sql20nn.zip), and I believe most databases support it.

In this PR, I simply added a case for the `IsNotNull` filter to produce a proper filter string.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/HyukjinKwon/spark SPARK-12315

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/10287.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #10287

commit eacb4eac990ebf18b51f9269e4115b1b83a7be62
Author: hyukjinkwon
Date: 2015-12-14T02:53:49Z

    Add a case for isnotnull operator
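For context, a hedged sketch of the kind of filter-to-SQL translation the JDBC datasource performs; the real logic lives in `JDBCRDD` and covers more filter types and proper value quoting than shown here:

```scala
import org.apache.spark.sql.sources._

// Sketch only: translate a pushed-down filter into a WHERE-clause fragment.
// Unhandled filters return null and are evaluated by Spark instead.
def compileFilter(f: Filter): String = f match {
  case EqualTo(attr, value) => s"$attr = $value"     // real code quotes/escapes value
  case IsNull(attr)         => s"$attr IS NULL"
  case IsNotNull(attr)      => s"$attr IS NOT NULL"  // the case this PR adds
  case _                    => null
}
```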
[GitHub] spark pull request: [SPARK-12310] [SparkR] Add write.json and writ...
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/10281#discussion_r47458635

--- Diff: R/pkg/inst/tests/testthat/test_sparkSQL.R ---
@@ -371,13 +371,13 @@ test_that("Collect DataFrame with complex types", {
   expect_equal(bob$height, 176.5)
 })

-test_that("read.json()/jsonFile() on a local file returns a DataFrame", {
+test_that("read.json()/jsonFile() and write.json()", {
   df <- read.json(sqlContext, jsonPath)
   expect_is(df, "DataFrame")
   expect_equal(count(df), 3)

   # read.json()/jsonFile() works with multiple input paths
   jsonPath2 <- tempfile(pattern="jsonPath2", fileext=".json")
-  write.df(df, jsonPath2, "json", mode="overwrite")
+  write.json(df, jsonPath2)
--- End diff --

I agree we should have a test for ```write.df```, but such a test still exists even if I remove this one, for example [here](https://github.com/apache/spark/blob/master/R/pkg/inst/tests/testthat/test_sparkSQL.R#L870).
[GitHub] spark pull request: [SPARK-12310] [SparkR] Add write.json and writ...
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/10281#discussion_r47458647

--- Diff: R/pkg/R/DataFrame.R ---
@@ -614,12 +641,25 @@ setMethod("toJSON",
 #' sqlContext <- sparkRSQL.init(sc)
 #' path <- "path/to/file.json"
 #' df <- read.json(sqlContext, path)
-#' saveAsParquetFile(df, "/tmp/sparkr-tmp/")
+#' write.parquet(df, "/tmp/sparkr-tmp1/")
+#' saveAsParquetFile(df, "/tmp/sparkr-tmp2/")
 #'}
+setMethod("write.parquet",
+          signature(x = "DataFrame", path = "character"),
+          function(x, path) {
+            write <- callJMethod(x@sdf, "write")
+            invisible(callJMethod(write, "parquet", path))
+          })
+
+#' @family DataFrame functions
--- End diff --

Good point.
[GitHub] spark pull request: [SPARK-12186] [WEB UI] Send the complete reque...
Github user mindprince commented on the pull request: https://github.com/apache/spark/pull/10180#issuecomment-164333244

Can someone review this, or merge it if it seems okay?
[GitHub] spark pull request: [SPARK-5682][Core] Add encrypted shuffle in sp...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/8880#issuecomment-164334418

**[Test build #47637 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47637/consoleFull)** for PR 8880 at commit [`f88f011`](https://github.com/apache/spark/commit/f88f011081ad07377758c5b187fe50fd1c7f4791).
[GitHub] spark pull request: [SPARK-5682][Core] Add encrypted shuffle in sp...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/8880#issuecomment-164334532

**[Test build #47637 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47637/consoleFull)** for PR 8880 at commit [`f88f011`](https://github.com/apache/spark/commit/f88f011081ad07377758c5b187fe50fd1c7f4791).

* This patch **fails Scala style tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-5682][Core] Add encrypted shuffle in sp...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8880#issuecomment-164334536

Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-12275][SQL] No plan for BroadcastHint i...
Github user yucai commented on the pull request: https://github.com/apache/spark/pull/10265#issuecomment-164339684

@andrewor14 @yhuai could you kindly help re-trigger the testing? Many thanks!
[GitHub] spark pull request: [SPARK-6518][MLlib][Example][DOC] Add example ...
Github user yu-iskw commented on the pull request: https://github.com/apache/spark/pull/9952#issuecomment-164341602

Oh, I'm terribly sorry about that. Pushing the update had failed.

- Modified what you pointed out
- Added a Java example and its doc
[GitHub] spark pull request: [SPARK-12057] [SQL] Prevent failure on corrupt...
Github user yhuai commented on the pull request: https://github.com/apache/spark/pull/10288#issuecomment-164343229

@simplyianm @srowen I created this PR based on #10043. Please see https://github.com/yhuai/spark/commit/d8722bb30d347fa97c7d358284eb433edb1be6ec for my change. This PR should fix the test. It also makes the schema inference more robust to records that we cannot parse.
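A minimal sketch of the general idea (not the PR's actual code): during schema inference, a record that fails to parse should degrade gracefully rather than failing the whole job. Here `parse` is a stand-in for the real per-record type inference:

```scala
import scala.util.Try
import org.apache.spark.sql.types.{DataType, StringType}

// Sketch: fall back to StringType for records that cannot be parsed,
// so a single corrupt record does not abort schema inference.
def inferRecordType(parse: String => DataType)(record: String): DataType =
  Try(parse(record)).getOrElse(StringType)
```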
[GitHub] spark pull request: [SPARK-12275][SQL] No plan for BroadcastHint i...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10265#issuecomment-164346090

**[Test build #47642 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47642/consoleFull)** for PR 10265 at commit [`1b8d570`](https://github.com/apache/spark/commit/1b8d5701cf3804040748219729a092af3af1e70e).
[GitHub] spark pull request: [Minor] [DOC] Fix broken word2vec link
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10282#issuecomment-164255724

**[Test build #47627 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47627/consoleFull)** for PR 10282 at commit [`8937801`](https://github.com/apache/spark/commit/8937801896a1b7cc350abbe89e720506b75dac56).
[GitHub] spark pull request: [Minor] [ML] Rename weights to coefficients fo...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10280#issuecomment-164256735

Merged build finished. Test PASSed.
[GitHub] spark pull request: [Minor] [ML] Rename weights to coefficients fo...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10280#issuecomment-164256737

Test PASSed. Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/47625/
Test PASSed.
[GitHub] spark pull request: [Minor] [ML] Rename weights to coefficients fo...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10280#issuecomment-164256562

**[Test build #47625 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47625/consoleFull)** for PR 10280 at commit [`f19b748`](https://github.com/apache/spark/commit/f19b748e980edbb0a13d32146437c4f38a986e13).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-12213] [SQL] use multiple partitions fo...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10228#issuecomment-164263964

**[Test build #47622 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47622/consoleFull)** for PR 10228 at commit [`51ca055`](https://github.com/apache/spark/commit/51ca0553c1b4f3307512954858741ad89cea89f6).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-12213] [SQL] use multiple partitions fo...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10228#issuecomment-164264208

Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-12213] [SQL] use multiple partitions fo...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10228#issuecomment-164264212

Test PASSed. Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/47622/
Test PASSed.
[GitHub] spark pull request: [SPARK-12293][SQL] Support UnsafeRow in LocalT...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10283#issuecomment-164270890

**[Test build #47628 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47628/consoleFull)** for PR 10283 at commit [`ca4326a`](https://github.com/apache/spark/commit/ca4326ab0c4d26af3eddcc9550624db91d195e8a).
[GitHub] spark pull request: [Minor] [DOC] Fix broken word2vec link
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10282#issuecomment-164258023

Test PASSed. Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/47627/
Test PASSed.
[GitHub] spark pull request: [SPARK-12310] [SparkR] Add write.json and writ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10281#issuecomment-164258061

Test PASSed. Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/47626/
Test PASSed.
[GitHub] spark pull request: [SPARK-12310] [SparkR] Add write.json and writ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10281#issuecomment-164257767

**[Test build #47626 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47626/consoleFull)** for PR 10281 at commit [`e19a100`](https://github.com/apache/spark/commit/e19a100236d3cf6213a0268ca461d99f099ddf6b).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [Minor] [DOC] Fix broken word2vec link
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10282#issuecomment-164258021

Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-12310] [SparkR] Add write.json and writ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10281#issuecomment-164258058

Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-9057] [STREAMING] Twitter example joini...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/8431#issuecomment-164269210

[Test build #2214 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2214/console) for PR 8431 at commit [`9998df6`](https://github.com/apache/spark/commit/9998df67b1c15c7cc350ef59c10af8275b947e02).

* This patch **passes all tests**.
* This patch **does not merge cleanly**.
* This patch adds the following public classes _(experimental)_:
  * `public class JavaTwitterHashTagJoinSentiments`
[GitHub] spark pull request: [Minor] [ML] Rename weights to coefficients fo...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10280#issuecomment-164255583

**[Test build #47625 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47625/consoleFull)** for PR 10280 at commit [`f19b748`](https://github.com/apache/spark/commit/f19b748e980edbb0a13d32146437c4f38a986e13).
[GitHub] spark pull request: [SPARK-12309] [ML] Use sqlContext from MLlibTe...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10279#issuecomment-164255590

**[Test build #47624 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47624/consoleFull)** for PR 10279 at commit [`c696d2b`](https://github.com/apache/spark/commit/c696d2bb14ecbfa44da9ec97cc495eff0728f2ac).
[GitHub] spark pull request: [SPARK-12310] [SparkR] Add write.json and writ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10281#issuecomment-164255617

**[Test build #47626 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47626/consoleFull)** for PR 10281 at commit [`e19a100`](https://github.com/apache/spark/commit/e19a100236d3cf6213a0268ca461d99f099ddf6b).
[GitHub] spark pull request: [SPARK-12218] [SQL] Fixed the Parquet's filter...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10278#issuecomment-164265237

Test PASSed. Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/47623/
Test PASSed.
[GitHub] spark pull request: [SPARK-12218] [SQL] Fixed the Parquet's filter...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10278#issuecomment-164265233

Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-12218] [SQL] Fixed the Parquet's filter...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10278#issuecomment-164264969

**[Test build #47623 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47623/consoleFull)** for PR 10278 at commit [`50733c6`](https://github.com/apache/spark/commit/50733c6239b721ecb1f0691bb3d4680235c15a18).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-12293][SQL] Support UnsafeRow in LocalT...
GitHub user viirya opened a pull request: https://github.com/apache/spark/pull/10283

[SPARK-12293][SQL] Support UnsafeRow in LocalTableScan

JIRA: https://issues.apache.org/jira/browse/SPARK-12293

Make LocalTableScan support UnsafeRow.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/viirya/spark-1 unsaferow-localtablescan

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/10283.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #10283

commit ca4326ab0c4d26af3eddcc9550624db91d195e8a
Author: Liang-Chi Hsieh
Date: 2015-12-13T15:46:13Z

    Support UnsafeRow in LocalTableScan.
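For context, a hedged sketch (Spark 1.6-era internals, not the PR's exact code) of the row conversion such a change implies: a plan that promises `UnsafeRow` output can convert generic rows with an `UnsafeProjection`:

```scala
import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.catalyst.expressions.{Attribute, UnsafeProjection, UnsafeRow}

// Sketch: convert generic InternalRows to UnsafeRows for a local relation.
def toUnsafe(output: Seq[Attribute], rows: Seq[InternalRow]): Seq[UnsafeRow] = {
  val proj = UnsafeProjection.create(output.map(_.dataType).toArray)
  rows.map(r => proj(r).copy()) // copy(): the projection reuses its output buffer
}
```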
[GitHub] spark pull request: [SPARK-12213] [SQL] use multiple partitions fo...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10228#issuecomment-164255399

**[Test build #47622 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47622/consoleFull)** for PR 10228 at commit [`51ca055`](https://github.com/apache/spark/commit/51ca0553c1b4f3307512954858741ad89cea89f6).
[GitHub] spark pull request: [Minor] [DOC] Fix broken word2vec link
Github user BenFradet commented on the pull request: https://github.com/apache/spark/pull/10282#issuecomment-164252960

cc @jkbradley @yinxusen @mengxr
[GitHub] spark pull request: [SPARK-9057] [STREAMING] Twitter example joini...
Github user Agent007 commented on the pull request: https://github.com/apache/spark/pull/8431#issuecomment-164253461

Hi @srowen, yes, it has been discussed in this pull request. The AFINN word sentiment file is only 27.4 KB and is used to calculate the Twitter hash tag sentiments for the streaming example. At first, the file was downloaded at runtime, as suggested by @brennonyork, but @tdas wanted the file in the spark/data directory instead.
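For readers of the thread: the AFINN-111 file is a plain-text list of `word<TAB>score` pairs. A hedged Scala sketch of the kind of join the example performs (the path and tags below are illustrative assumptions, not the example's exact code):

```scala
import org.apache.spark.SparkContext

// Sketch: score hash tags by joining them against AFINN word sentiments.
def scoreHashTags(sc: SparkContext): Unit = {
  val wordScores = sc.textFile("data/streaming/AFINN-111.txt") // assumed location
    .map(_.split("\t"))
    .map(fields => (fields(0), fields(1).toInt))
  val hashTags = sc.parallelize(Seq("#happy", "#abandon")).map(_.stripPrefix("#"))
  val scored = hashTags.map(tag => (tag, 1)).join(wordScores)
    .map { case (word, (count, score)) => (word, count * score) }
  scored.collect().foreach(println)
}
```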
[GitHub] spark pull request: [Minor] [DOC] Fix broken word2vec link
Github user yinxusen commented on the pull request: https://github.com/apache/spark/pull/10282#issuecomment-164253470

LGTM thanks!
[GitHub] spark pull request: [SPARK-12218] [SQL] Fixed the Parquet's filter...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10278#issuecomment-164255495

**[Test build #47623 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47623/consoleFull)** for PR 10278 at commit [`50733c6`](https://github.com/apache/spark/commit/50733c6239b721ecb1f0691bb3d4680235c15a18).
[GitHub] spark pull request: [SPARK-9026] [SPARK-4514] Modifications to Job...
Github user reggert commented on a diff in the pull request: https://github.com/apache/spark/pull/9264#discussion_r47446916

--- Diff: core/src/test/scala/org/apache/spark/rdd/AsyncRDDActionsSuite.scala ---
@@ -197,4 +197,34 @@ class AsyncRDDActionsSuite extends SparkFunSuite with BeforeAndAfterAll with Tim
       Await.result(f, Duration(20, "milliseconds"))
     }
   }
+
+  private def testAsyncAction[R](action: RDD[Int] => FutureAction[R]): Unit = {
+    val executionContextInvoked = Promise[Unit]
+    val fakeExecutionContext = new ExecutionContext {
+      override def execute(runnable: Runnable): Unit = {
+        executionContextInvoked.success(())
+      }
+      override def reportFailure(t: Throwable): Unit = ()
+    }
+    val starter = Smuggle(new Semaphore(0))
+    starter.drainPermits()
+    val rdd = sc.parallelize(1 to 100, 4).mapPartitions { itr => starter.acquire(1); itr }
+    val f = action(rdd)
+    f.onComplete(_ => ())(fakeExecutionContext)
+    // Here we verify that registering the callback didn't cause a thread to be consumed.
+    assert(!executionContextInvoked.isCompleted)
+    // Now allow the executors to proceed with task processing.
+    starter.release(rdd.partitions.length)
+    // Waiting for the result verifies that the tasks were successfully processed.
+    // This mainly exists to verify that we didn't break task deserialization.
--- End diff --

Oh, is that all? :-) Comment line removed.
[GitHub] spark pull request: [SPARK-9057] [STREAMING] Twitter example joini...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/8431#issuecomment-164255270

[Test build #2214 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2214/consoleFull) for PR 8431 at commit [`9998df6`](https://github.com/apache/spark/commit/9998df67b1c15c7cc350ef59c10af8275b947e02).
[GitHub] spark pull request: [SPARK-12213] [SQL] use multiple partitions fo...
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/10228#discussion_r47444204

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/AggregationIterator.scala ---
@@ -165,237 +137,100 @@ abstract class AggregationIterator(
   // Initializing functions used to process a row.
   protected val processRow: (MutableRow, InternalRow) => Unit = {
-    val rowToBeProcessed = new JoinedRow
-    val aggregationBufferSchema = allAggregateFunctions.flatMap(_.aggBufferAttributes)
-    aggregationMode match {
-      // Partial-only
-      case (Some(Partial), None) =>
-        val updateExpressions = nonCompleteAggregateFunctions.flatMap {
-          case ae: DeclarativeAggregate => ae.updateExpressions
-          case agg: AggregateFunction => Seq.fill(agg.aggBufferAttributes.length)(NoOp)
-        }
-        val expressionAggUpdateProjection =
-          newMutableProjection(updateExpressions, aggregationBufferSchema ++ valueAttributes)()
-
-        (currentBuffer: MutableRow, row: InternalRow) => {
-          expressionAggUpdateProjection.target(currentBuffer)
-          // Process all expression-based aggregate functions.
-          expressionAggUpdateProjection(rowToBeProcessed(currentBuffer, row))
-          // Process all imperative aggregate functions.
-          var i = 0
-          while (i < nonCompleteImperativeAggregateFunctions.length) {
-            nonCompleteImperativeAggregateFunctions(i).update(currentBuffer, row)
-            i += 1
-          }
-        }
-
-      // PartialMerge-only or Final-only
-      case (Some(PartialMerge), None) | (Some(Final), None) =>
-        val inputAggregationBufferSchema = if (initialInputBufferOffset == 0) {
-          // If initialInputBufferOffset, the input value does not contain
-          // grouping keys.
-          // This part is pretty hacky.
-          allAggregateFunctions.flatMap(_.inputAggBufferAttributes).toSeq
-        } else {
-          groupingKeyAttributes ++ allAggregateFunctions.flatMap(_.inputAggBufferAttributes)
-        }
-        // val inputAggregationBufferSchema =
-        //   groupingKeyAttributes ++
-        //     allAggregateFunctions.flatMap(_.cloneBufferAttributes)
-        val mergeExpressions = nonCompleteAggregateFunctions.flatMap {
-          case ae: DeclarativeAggregate => ae.mergeExpressions
-          case agg: AggregateFunction => Seq.fill(agg.aggBufferAttributes.length)(NoOp)
-        }
-        // This projection is used to merge buffer values for all expression-based aggregates.
-        val expressionAggMergeProjection =
-          newMutableProjection(
-            mergeExpressions,
-            aggregationBufferSchema ++ inputAggregationBufferSchema)()
-
-        (currentBuffer: MutableRow, row: InternalRow) => {
-          // Process all expression-based aggregate functions.
-          expressionAggMergeProjection.target(currentBuffer)(rowToBeProcessed(currentBuffer, row))
-          // Process all imperative aggregate functions.
-          var i = 0
-          while (i < nonCompleteImperativeAggregateFunctions.length) {
-            nonCompleteImperativeAggregateFunctions(i).merge(currentBuffer, row)
-            i += 1
+    val joinedRow = new JoinedRow
+    if (aggregateExpressions.nonEmpty) {
+      val mergeExpressions = aggregateFunctions.zipWithIndex.flatMap {
+        case (ae: DeclarativeAggregate, i) =>
+          aggregateExpressions(i).mode match {
+            case Partial | Complete => ae.updateExpressions
+            case PartialMerge | Final => ae.mergeExpressions
+          }
-      // Final-Complete
-      case (Some(Final), Some(Complete)) =>
-        val completeAggregateFunctions: Array[AggregateFunction] =
-          allAggregateFunctions.takeRight(completeAggregateExpressions.length)
-        // All imperative aggregate functions with mode Complete.
-        val completeImperativeAggregateFunctions: Array[ImperativeAggregate] =
-          completeAggregateFunctions.collect { case func: ImperativeAggregate => func }
-
-        // The first initialInputBufferOffset values of the input aggregation buffer is
-        // for grouping expressions and distinct columns.
-        val groupingAttributesAndDistinctColumns = valueAttributes.take(initialInputBufferOffset)
-
-        val completeOffsetExpressions =
-          Seq.fill(completeAggregateFunctions.map(_.aggBufferAttributes.length).sum)(NoOp)
-        // We do not touch buffer values of aggregate functions with the Final mode.
-        val finalOffsetExpressions =
[GitHub] spark pull request: [SPARK-12213] [SQL] use multiple partitions fo...
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/10228#discussion_r47444202

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/AggregationIterator.scala ---
@@ -165,237 +137,100 @@ abstract class AggregationIterator(
   // Initializing functions used to process a row.
   protected val processRow: (MutableRow, InternalRow) => Unit = {
-    val rowToBeProcessed = new JoinedRow
-    val aggregationBufferSchema = allAggregateFunctions.flatMap(_.aggBufferAttributes)
-    aggregationMode match {
-      // Partial-only
-      case (Some(Partial), None) =>
-        val updateExpressions = nonCompleteAggregateFunctions.flatMap {
-          case ae: DeclarativeAggregate => ae.updateExpressions
-          case agg: AggregateFunction => Seq.fill(agg.aggBufferAttributes.length)(NoOp)
-        }
-        val expressionAggUpdateProjection =
-          newMutableProjection(updateExpressions, aggregationBufferSchema ++ valueAttributes)()
-
-        (currentBuffer: MutableRow, row: InternalRow) => {
-          expressionAggUpdateProjection.target(currentBuffer)
-          // Process all expression-based aggregate functions.
-          expressionAggUpdateProjection(rowToBeProcessed(currentBuffer, row))
-          // Process all imperative aggregate functions.
-          var i = 0
-          while (i < nonCompleteImperativeAggregateFunctions.length) {
-            nonCompleteImperativeAggregateFunctions(i).update(currentBuffer, row)
-            i += 1
-          }
-        }
-
-      // PartialMerge-only or Final-only
-      case (Some(PartialMerge), None) | (Some(Final), None) =>
-        val inputAggregationBufferSchema = if (initialInputBufferOffset == 0) {
-          // If initialInputBufferOffset, the input value does not contain
-          // grouping keys.
-          // This part is pretty hacky.
-          allAggregateFunctions.flatMap(_.inputAggBufferAttributes).toSeq
-        } else {
-          groupingKeyAttributes ++ allAggregateFunctions.flatMap(_.inputAggBufferAttributes)
-        }
-        // val inputAggregationBufferSchema =
-        //   groupingKeyAttributes ++
-        //     allAggregateFunctions.flatMap(_.cloneBufferAttributes)
-        val mergeExpressions = nonCompleteAggregateFunctions.flatMap {
-          case ae: DeclarativeAggregate => ae.mergeExpressions
-          case agg: AggregateFunction => Seq.fill(agg.aggBufferAttributes.length)(NoOp)
-        }
-        // This projection is used to merge buffer values for all expression-based aggregates.
-        val expressionAggMergeProjection =
-          newMutableProjection(
-            mergeExpressions,
-            aggregationBufferSchema ++ inputAggregationBufferSchema)()
-
-        (currentBuffer: MutableRow, row: InternalRow) => {
-          // Process all expression-based aggregate functions.
-          expressionAggMergeProjection.target(currentBuffer)(rowToBeProcessed(currentBuffer, row))
-          // Process all imperative aggregate functions.
-          var i = 0
-          while (i < nonCompleteImperativeAggregateFunctions.length) {
-            nonCompleteImperativeAggregateFunctions(i).merge(currentBuffer, row)
-            i += 1
+    val joinedRow = new JoinedRow
+    if (aggregateExpressions.nonEmpty) {
+      val mergeExpressions = aggregateFunctions.zipWithIndex.flatMap {
--- End diff --

```zip(aggregateExpressions)``` ?
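In plain terms, the suggestion replaces positional indexing with a direct pairwise zip; a tiny self-contained illustration (names are placeholders):

```scala
// The two forms are equivalent; zip avoids the explicit index.
val fns   = Seq("sum", "avg")
val exprs = Seq(1, 2)
val viaIndex = fns.zipWithIndex.map { case (f, i) => (f, exprs(i)) }
val viaZip   = fns.zip(exprs)
assert(viaIndex == viaZip)
```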
[GitHub] spark pull request: [SPARK-12309] [ML] Use sqlContext from MLlibTe...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10279#issuecomment-164260918

Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-12309] [ML] Use sqlContext from MLlibTe...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10279#issuecomment-164260893

**[Test build #47624 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47624/consoleFull)** for PR 10279 at commit [`c696d2b`](https://github.com/apache/spark/commit/c696d2bb14ecbfa44da9ec97cc495eff0728f2ac).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-12309] [ML] Use sqlContext from MLlibTe...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10279#issuecomment-164260919

Test PASSed. Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/47624/
Test PASSed.
[GitHub] spark pull request: [SPARK-12293][SQL] Support UnsafeRow in LocalT...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10283#issuecomment-164272319

**[Test build #47628 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47628/consoleFull)** for PR 10283 at commit [`ca4326a`](https://github.com/apache/spark/commit/ca4326ab0c4d26af3eddcc9550624db91d195e8a).

* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `case class LocalRelation(output: Seq[Attribute], data: Seq[UnsafeRow] = Nil)`
[GitHub] spark pull request: [SPARK-12293][SQL] Support UnsafeRow in LocalT...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10283#issuecomment-164272329

Test FAILed. Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/47628/
[GitHub] spark pull request: [SPARK-12293][SQL] Support UnsafeRow in LocalT...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10283#issuecomment-164272328

Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-9026] [SPARK-4514] Modifications to Job...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/9264#issuecomment-164276707

**[Test build #47629 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47629/consoleFull)** for PR 9264 at commit [`539ac43`](https://github.com/apache/spark/commit/539ac43c3be54abb61b5a44450cc13cc196113b2).
[GitHub] spark pull request: [SPARK-12310] [SparkR] Add write.json and writ...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/10281#discussion_r47449369

--- Diff: R/pkg/inst/tests/testthat/test_sparkSQL.R ---
@@ -454,6 +454,8 @@ test_that("insertInto() on a registered table", {
   expect_equal(count(sql(sqlContext, "select * from table1")), 2)
   expect_equal(first(sql(sqlContext, "select * from table1 order by age"))$name, "Bob")
   dropTempTable(sqlContext, "table1")
+
+  unlink(parquetPath2)

--- End diff --

Should there be an `unlink(jsonPath2)` as well? If so, could you please add it?
[GitHub] spark pull request: [SPARK-12204][SPARKR] Implement drop method fo...
Github user felixcheung commented on the pull request: https://github.com/apache/spark/pull/10201#issuecomment-164294560

@shivaram I checked the release notes and the programming/migration guides, but I don't see any reference to withColumn for Spark 1.4.0 or 1.4.1. Perhaps the behavior change happened before the 1.4.0 release?
[GitHub] spark pull request: [SPARK-8641][SQL] Native Spark Window function...
Github user yhuai commented on the pull request: https://github.com/apache/spark/pull/9819#issuecomment-164284282

@hvanhovell just a quick heads up. I am going to review this PR today. Will post a comment once I finish my review.
[GitHub] spark pull request: [SPARK-12213] [SQL] use multiple partitions fo...
Github user yhuai commented on the pull request: https://github.com/apache/spark/pull/10228#issuecomment-164284205

@hvanhovell Yeah, it's worth thinking about how to use a single rule to handle aggregation queries with distinct once we have this improvement. The logical rewrite rules are probably a good place, because rewriting logical plans is easier. If that is the right approach, we can change our physical planner so that it respects the aggregation mode of an agg expression in a logical agg operator (right now, the physical planner always ignores the mode). Then, when we create the physical plan, we can tell that, for example, a logical agg operator is used to merge aggregation buffers.
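A minimal illustration of what "respecting the mode" could look like, assuming Spark 1.6's Catalyst representation `AggregateExpression(aggregateFunction, mode, isDistinct)`; this sketch is mine, not code from the PR:

```scala
import org.apache.spark.sql.catalyst.expressions.aggregate._
import org.apache.spark.sql.catalyst.plans.logical.Aggregate

// Walk a logical Aggregate and branch on the AggregateMode carried by each
// AggregateExpression, instead of ignoring it during physical planning.
def describeAggModes(agg: Aggregate): Unit = {
  agg.aggregateExpressions.foreach { named =>
    named.foreach {
      case AggregateExpression(func, Partial, _) =>
        println(s"$func: build partial aggregation buffers")
      case AggregateExpression(func, PartialMerge, _) =>
        println(s"$func: merge pre-computed aggregation buffers")
      case AggregateExpression(func, Final, _) =>
        println(s"$func: produce final values from merged buffers")
      case AggregateExpression(func, Complete, _) =>
        println(s"$func: compute the result in a single pass")
      case _ => // not an aggregate expression
    }
  }
}
```

A planner that branches this way could tell, for example, that an agg operator whose expressions are all in `PartialMerge` mode exists purely to merge buffers.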
[GitHub] spark pull request: [SPARK-9026] [SPARK-4514] Modifications to Job...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9264#issuecomment-164285868

Test PASSed. Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/47629/
[GitHub] spark pull request: [SPARK-8641][SQL] Native Spark Window function...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/9819#discussion_r47449872

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/windowExpressions.scala ---
@@ -246,85 +260,238 @@ object SpecifiedWindowFrame {
   }
 }
 
+case class UnresolvedWindowExpression(
+    child: Expression,
+    windowSpec: WindowSpecReference) extends UnaryExpression with Unevaluable {
+
+  override def dataType: DataType = throw new UnresolvedException(this, "dataType")
+  override def foldable: Boolean = throw new UnresolvedException(this, "foldable")
+  override def nullable: Boolean = throw new UnresolvedException(this, "nullable")
+  override lazy val resolved = false
+}
+
+case class WindowExpression(
+    windowFunction: Expression,
+    windowSpec: WindowSpecDefinition) extends Expression with Unevaluable {
+
+  override def children: Seq[Expression] = windowFunction :: windowSpec :: Nil
+
+  override def dataType: DataType = windowFunction.dataType
+  override def foldable: Boolean = windowFunction.foldable
+  override def nullable: Boolean = windowFunction.nullable
+
+  override def toString: String = s"$windowFunction $windowSpec"
+}
+
 /**
- * Every window function needs to maintain a output buffer for its output.
- * It should expect that for a n-row window frame, it will be called n times
- * to retrieve value corresponding with these n rows.
+ * A window function is a function that can only be evaluated in the context of a window operator.
  */
 trait WindowFunction extends Expression {
-  def init(): Unit
+  /** Frame in which the window operator must be executed. */
+  def frame: WindowFrame = UnspecifiedFrame
+}
 
-  def reset(): Unit
+/**
+ * An offset window function is a window function that returns the value of the input column offset
+ * by a number of rows within the partition. For instance: an OffsetWindowfunction for value x with
+ * offset -2, will get the value of x 2 rows back in the partition.
+ */
+abstract class OffsetWindowFunction
+  extends Expression with WindowFunction with Unevaluable with ImplicitCastInputTypes {
+  val input: Expression
+  val default: Expression
+  val offset: Expression
+  val offsetSign: Int

--- End diff --

Add scala doc for `default`, `offset`, and `offsetSign`?
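For what it's worth, a sketch of what that Scala doc might say, inferred from how `frame` uses these fields in the hunk below; the wording, and the LEAD/LAG reading of `offsetSign`, are my assumptions rather than text from the PR:

```scala
// Doc-comment sketch only; in context these types come from
// org.apache.spark.sql.catalyst.expressions.
abstract class OffsetWindowFunction extends Expression with WindowFunction {
  /** Expression evaluated against the row `offsetSign * offset` rows from the current row. */
  val input: Expression
  /** Value returned when the target row falls outside the partition; may be Literal(null). */
  val default: Expression
  /** Foldable integer expression giving the number of rows to skip within the partition. */
  val offset: Expression
  /** Direction of the offset: presumably +1 for LEAD (forward) and -1 for LAG (backward). */
  val offsetSign: Int
}
```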
[GitHub] spark pull request: [SPARK-8641][SQL] Native Spark Window function...
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/9819#discussion_r47450253

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/windowExpressions.scala ---
@@ -246,85 +260,238 @@ object SpecifiedWindowFrame {
   }
 }
 
+case class UnresolvedWindowExpression(
+    child: Expression,
+    windowSpec: WindowSpecReference) extends UnaryExpression with Unevaluable {
+
+  override def dataType: DataType = throw new UnresolvedException(this, "dataType")
+  override def foldable: Boolean = throw new UnresolvedException(this, "foldable")
+  override def nullable: Boolean = throw new UnresolvedException(this, "nullable")
+  override lazy val resolved = false
+}
+
+case class WindowExpression(
+    windowFunction: Expression,
+    windowSpec: WindowSpecDefinition) extends Expression with Unevaluable {
+
+  override def children: Seq[Expression] = windowFunction :: windowSpec :: Nil
+
+  override def dataType: DataType = windowFunction.dataType
+  override def foldable: Boolean = windowFunction.foldable
+  override def nullable: Boolean = windowFunction.nullable
+
+  override def toString: String = s"$windowFunction $windowSpec"
+}
+
 /**
- * Every window function needs to maintain a output buffer for its output.
- * It should expect that for a n-row window frame, it will be called n times
- * to retrieve value corresponding with these n rows.
+ * A window function is a function that can only be evaluated in the context of a window operator.
  */
 trait WindowFunction extends Expression {
-  def init(): Unit
+  /** Frame in which the window operator must be executed. */
+  def frame: WindowFrame = UnspecifiedFrame
+}
 
-  def reset(): Unit
+/**
+ * An offset window function is a window function that returns the value of the input column offset
+ * by a number of rows within the partition. For instance: an OffsetWindowfunction for value x with
+ * offset -2, will get the value of x 2 rows back in the partition.
+ */
+abstract class OffsetWindowFunction
+  extends Expression with WindowFunction with Unevaluable with ImplicitCastInputTypes {
+  val input: Expression
+  val default: Expression
+  val offset: Expression
+  val offsetSign: Int
+
+  override def children: Seq[Expression] = Seq(input, offset, default)
 
-  def prepareInputParameters(input: InternalRow): AnyRef
+  override def foldable: Boolean = input.foldable && (default == null || default.foldable)
 
-  def update(input: AnyRef): Unit
+  override def nullable: Boolean = input.nullable && (default == null || default.nullable)
 
-  def batchUpdate(inputs: Array[AnyRef]): Unit
+  override lazy val frame = {
+    // This will be triggered by the Analyzer.
+    val offsetValue = offset.eval() match {
+      case o: Int => o
+      case x => throw new AnalysisException(
+        s"Offset expression must be a foldable integer expression: $x")
+    }
+    val boundary = ValueFollowing(offsetSign * offsetValue)
+    SpecifiedWindowFrame(RowFrame, boundary, boundary)
+  }
 
-  def evaluate(): Unit
+  override def dataType: DataType = input.dataType
 
-  def get(index: Int): Any
+  override def inputTypes: Seq[AbstractDataType] =
+    Seq(AnyDataType, IntegerType, TypeCollection(input.dataType, NullType))
 
-  def newInstance(): WindowFunction
+  override def toString: String = s"$prettyName($input, $offset, $default)"
 }
 
-case class UnresolvedWindowFunction(
-    name: String,
-    children: Seq[Expression])
-  extends Expression with WindowFunction with Unevaluable {
+case class Lead(input: Expression, offset: Expression, default: Expression)
+    extends OffsetWindowFunction {
 
-  override def dataType: DataType = throw new UnresolvedException(this, "dataType")
-  override def foldable: Boolean = throw new UnresolvedException(this, "foldable")
-  override def nullable: Boolean = throw new UnresolvedException(this, "nullable")
-  override lazy val resolved = false
+  def this(input: Expression, offset: Expression) = this(input, offset, Literal(null))
 
-  override def init(): Unit = throw new UnresolvedException(this, "init")
-  override def reset(): Unit = throw new UnresolvedException(this, "reset")
-  override def prepareInputParameters(input: InternalRow): AnyRef =
-    throw new UnresolvedException(this, "prepareInputParameters")
-  override def update(input: AnyRef): Unit = throw new UnresolvedException(this, "update")
-  override def batchUpdate(inputs: Array[AnyRef]): Unit =
-    throw new UnresolvedException(this, "batchUpdate")
-

[The archived message is cut off here, before the review comment.]
[GitHub] spark pull request: [SPARK-8641][SQL] Native Spark Window function...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/9819#discussion_r47450842

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/windowExpressions.scala ---
@@ -246,85 +260,238 @@ object SpecifiedWindowFrame {
[Quotes the same hunk shown in full above, down to:]
+  override def foldable: Boolean = input.foldable && (default == null || default.foldable)

--- End diff --

I guess the results still depend on the number of rows in that partition. I tried a few queries in postgres:

```
yhuai=# select lead(1, 2) over();
 lead
------

(1 row)

yhuai=# select lead(1, 2) over() from (select 100 as k union all select 99 as k) tmp;
 lead
------


(2 rows)

yhuai=# select lead(1, 2) over() from (select 100 as k union all select 99 as k union all select 98 as k) tmp;
 lead
------
    1


(3 rows)
```
[GitHub] spark pull request: [SPARK-9026] [SPARK-4514] Modifications to Job...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9264#issuecomment-164285867

Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-9026] [SPARK-4514] Modifications to Job...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/9264#issuecomment-164285830

**[Test build #47629 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47629/consoleFull)** for PR 9264 at commit [`539ac43`](https://github.com/apache/spark/commit/539ac43c3be54abb61b5a44450cc13cc196113b2).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `trait JobSubmitter`
  * `class ComplexFutureAction[T](run: JobSubmitter => Future[T])`
[GitHub] spark pull request: [SPARK-12310] [SparkR] Add write.json and writ...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/10281#discussion_r47449353

--- Diff: R/pkg/inst/tests/testthat/test_sparkSQL.R ---
@@ -371,13 +371,13 @@ test_that("Collect DataFrame with complex types", {
   expect_equal(bob$height, 176.5)
 })
 
-test_that("read.json()/jsonFile() on a local file returns a DataFrame", {
+test_that("read.json()/jsonFile() and write.json()", {
   df <- read.json(sqlContext, jsonPath)
   expect_is(df, "DataFrame")
   expect_equal(count(df), 3)
   # read.json()/jsonFile() works with multiple input paths
   jsonPath2 <- tempfile(pattern="jsonPath2", fileext=".json")
-  write.df(df, jsonPath2, "json", mode="overwrite")
+  write.json(df, jsonPath2)

--- End diff --

I think it actually makes sense to have a test for `write.df` with an additional option (in this case `mode="overwrite"`). Is there a reason for removing it?
[GitHub] spark pull request: [SPARK-12310] [SparkR] Add write.json and writ...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/10281#discussion_r47449336

--- Diff: R/pkg/R/DataFrame.R ---
@@ -614,12 +641,25 @@ setMethod("toJSON",
 #' sqlContext <- sparkRSQL.init(sc)
 #' path <- "path/to/file.json"
 #' df <- read.json(sqlContext, path)
-#' saveAsParquetFile(df, "/tmp/sparkr-tmp/")
+#' write.parquet(df, "/tmp/sparkr-tmp1/")
+#' saveAsParquetFile(df, "/tmp/sparkr-tmp2/")
 #'}
+setMethod("write.parquet",
+          signature(x = "DataFrame", path = "character"),
+          function(x, path) {
+            write <- callJMethod(x@sdf, "write")
+            invisible(callJMethod(write, "parquet", path))
+          })
+
+#' @family DataFrame functions

--- End diff --

Putting @family here would create multiple "See Also" sections in the resulting HTML page; we only need one for all of the methods that share the same @rdname.
[GitHub] spark pull request: [SPARK-8641][SQL] Native Spark Window function...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/9819#discussion_r47449863

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/windowExpressions.scala ---
@@ -246,85 +260,238 @@ object SpecifiedWindowFrame {
[Quotes the same windowExpressions.scala hunk shown in full above; the archived message is cut off before the review comment.]
[GitHub] spark pull request: [SPARK-8641][SQL] Native Spark Window function...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/9819#discussion_r47449859

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/windowExpressions.scala ---
@@ -246,85 +260,238 @@ object SpecifiedWindowFrame {
[Quotes the same windowExpressions.scala hunk shown in full above; the archived message is cut off before the review comment.]
[GitHub] spark pull request: [SPARK-8641][SQL] Native Spark Window function...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/9819#discussion_r47449589

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/windowExpressions.scala ---
@@ -246,85 +260,238 @@ object SpecifiedWindowFrame {
[Quotes the same hunk shown in full above, down to:]
+  override def foldable: Boolean = input.foldable && (default == null || default.foldable)

--- End diff --

Do we expect an `OffsetWindowFunction` to be foldable?
[GitHub] spark pull request: [SPARK-8641][SQL] Native Spark Window function...
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/9819#discussion_r47450230

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/windowExpressions.scala ---
@@ -246,85 +260,238 @@ object SpecifiedWindowFrame {
[Quotes the same hunk shown in full above, down to:]
+  override def foldable: Boolean = input.foldable && (default == null || default.foldable)

--- End diff --

It can be foldable. For instance, ```LEAD(1, 2) OVER (PARTITION BY ... ORDER BY ...)``` should always return ```2```. However, there is currently no Optimizer rule to eliminate this. Should I add one?
[GitHub] spark pull request: [SPARK-5682][Core] Add encrypted shuffle in sp...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/8880#issuecomment-164318656

**[Test build #47632 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47632/consoleFull)** for PR 8880 at commit [`b912465`](https://github.com/apache/spark/commit/b912465481623fb50068b8f29622991616233cf4).
[GitHub] spark pull request: [SPARK-8641][SQL] Native Spark Window function...
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/9819#discussion_r47451483

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/windowExpressions.scala ---
@@ -246,85 +260,238 @@ object SpecifiedWindowFrame {
[Quotes the same hunk shown in full above, down to:]
+  override def foldable: Boolean = input.foldable && (default == null || default.foldable)

--- End diff --

Ahhh... Yes, you have a point. Without a foldable default value it is definitely not foldable. Let's make it non-foldable for now...
[GitHub] spark pull request: [SPARK-8641][SQL] Native Spark Window function...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/9819#discussion_r47452110

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/windowExpressions.scala ---
@@ -246,85 +260,238 @@ object SpecifiedWindowFrame {
[Quotes the same windowExpressions.scala hunk shown in full above; the archived message is cut off before the review comment.]
[GitHub] spark pull request: [SPARK-8641][SQL] Native Spark Window function...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/9819#discussion_r47452210

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/windowExpressions.scala ---
@@ -246,85 +260,238 @@ object SpecifiedWindowFrame {
[Quotes the same windowExpressions.scala hunk shown in full above; the archived message is cut off before the review comment.]
[GitHub] spark pull request: [SPARK-12062] [CORE] Change Master to async reb...
GitHub user BryanCutler opened a pull request: https://github.com/apache/spark/pull/10284

[SPARK-12062] [CORE] Change Master to async rebuild UI when application completes

This change rebuilds the event history of completed apps asynchronously, so the RPC thread is not blocked and new workers can still register or be removed even when an event log is very large and takes a long time to rebuild.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/BryanCutler/spark async-MasterUI-SPARK-12062

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/10284.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

This closes #10284

commit fe67dd79e86c635ccecde7aa9ca288d159cf474f
Author: Bryan Cutler
Date: 2015-12-08T02:36:46Z
[SPARK-12062] Changed Master rebuildSparkUI to run async from the RPC thread

commit b748dc10b6c47ea88794a5d596c44e77266607c3
Author: Bryan Cutler
Date: 2015-12-08T18:51:27Z
rebuildUI thread was not being shutdown

commit 7d60de1728d46c75c47d7fec77c898e93fcc5e52
Author: Bryan Cutler
Date: 2015-12-08T20:16:43Z
remove line that was accidentally left in for testing

commit af20e77940c33044dc8f7075e941219e737bb744
Author: Bryan Cutler
Date: 2015-12-12T00:24:50Z
cleanup of log file opening logic

commit 54089aa1c1dc73b8f09e54973dd7ef77f361547c
Author: Bryan Cutler
Date: 2015-12-12T00:47:36Z
minor cleanup

commit 80076374068e58fc9d89cf78c3fa90bfeb23ecb6
Author: Bryan Cutler
Date: 2015-12-13T21:52:06Z
changed catching Exception to NonFatal, which is better

commit 456c806eee6577fcb2721879613f20351e57ff14
Author: Bryan Cutler
Date: 2015-12-13T23:10:58Z
fixed indentation
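The general pattern is small enough to sketch (this is illustrative, with hypothetical names, not the PR's actual code):

```scala
import java.util.concurrent.Executors

import scala.concurrent.{ExecutionContext, Future}

object AsyncRebuildSketch {
  // One dedicated thread: rebuilds are serialized with each other,
  // but the RPC thread is never blocked while an event log is replayed.
  private val rebuildThread = Executors.newSingleThreadExecutor()
  private implicit val rebuildContext: ExecutionContext =
    ExecutionContext.fromExecutor(rebuildThread)

  /** Replay an application's (possibly huge) event log off the RPC thread. */
  def rebuildUiAsync(appId: String)(rebuild: String => Unit): Future[Unit] =
    Future { rebuild(appId) }

  def shutdown(): Unit = rebuildThread.shutdown()
}
```

When the future completes, the master can be notified with a message; the test output below lists a new `AttachCompletedRebuildUI(appId: String)` case class, which suggests exactly that.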
[GitHub] spark pull request: [SPARK-8641][SQL] Native Spark Window function...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/9819#discussion_r47452245

--- Diff: sql/hive/compatibility/src/test/scala/org/apache/spark/sql/hive/execution/HiveWindowFunctionQuerySuite.scala ---
@@ -472,7 +475,7 @@ class HiveWindowFunctionQuerySuite extends HiveComparisonTest with BeforeAndAfte
       |window w1 as (distribute by p_mfgr sort by p_mfgr, p_name
       |  rows between 2 preceding and 2 following)
     """.stripMargin, reset = false)
-
+  */

--- End diff --

Instead of disabling it, how about we convert it to a regular `test`? (It will not use Hive for the golden answers, and we can use `checkAnswer`.)
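A sketch of the suggested shape, assuming a `QueryTest`-style suite where `checkAnswer` and `sqlContext` are in scope (the data and query here are illustrative, not taken from the Hive golden file):

```scala
import org.apache.spark.sql.Row

test("sum over rows between 2 preceding and 2 following") {
  import sqlContext.implicits._
  Seq(1, 2, 3).toDF("x").registerTempTable("t")
  // With only three rows, every frame covers the whole partition, so each sum is 6.
  checkAnswer(
    sqlContext.sql(
      "select sum(x) over (order by x rows between 2 preceding and 2 following) from t"),
    Row(6L) :: Row(6L) :: Row(6L) :: Nil)
}
```

The expected rows are spelled out inline, so the test neither shells out to Hive nor depends on a golden-answer file.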
[GitHub] spark pull request: [SPARK-12062] [CORE] Change Master to async reb...
Github user BryanCutler commented on the pull request: https://github.com/apache/spark/pull/10284#issuecomment-164309595

I tested this by making an artificially large event log file, then stopping the worker and making sure it could re-register before the UI rebuild completed. That worked fine for me, at least on a local cluster.
[GitHub] spark pull request: [SPARK-8641][SQL] Native Spark Window function...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/9819#discussion_r47452622

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/Window.scala ---
@@ -19,10 +19,13 @@ package org.apache.spark.sql.execution
 
 import org.apache.spark.sql.catalyst.InternalRow
 import org.apache.spark.sql.catalyst.expressions._
+import org.apache.spark.sql.catalyst.expressions.aggregate._
 import org.apache.spark.sql.catalyst.plans.physical._
 import org.apache.spark.sql.types.IntegerType
 import org.apache.spark.rdd.RDD
-import org.apache.spark.util.collection.CompactBuffer
+
+import scala.collection.mutable
+import scala.collection.mutable.ArrayBuffer

--- End diff --

order imports
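For reference, Spark's convention groups imports as java, scala, third-party, and then org.apache.spark, alphabetized within each group, so the block above would presumably end up as:

```scala
import scala.collection.mutable
import scala.collection.mutable.ArrayBuffer

import org.apache.spark.rdd.RDD
import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.catalyst.expressions._
import org.apache.spark.sql.catalyst.expressions.aggregate._
import org.apache.spark.sql.catalyst.plans.physical._
import org.apache.spark.sql.types.IntegerType
```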
[GitHub] spark pull request: [SPARK-8641][SQL] Native Spark Window function...
Github user yhuai commented on the pull request: https://github.com/apache/spark/pull/9819#issuecomment-164312709

Can you add Scala doc to explain how we evaluate a regular agg function when it is used as a window function? (Maybe I missed it.)
[GitHub] spark pull request: [SPARK-8641][SQL] Native Spark Window function...
Github user yhuai commented on the pull request: https://github.com/apache/spark/pull/9819#issuecomment-164313553

@hvanhovell This is very cool! I have finished my review.
[GitHub] spark pull request: [SPARK-12062] [CORE] Change Master to async reb...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10284#issuecomment-164315793 **[Test build #47630 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47630/consoleFull)** for PR 10284 at commit [`456c806`](https://github.com/apache/spark/commit/456c806eee6577fcb2721879613f20351e57ff14).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `case class AttachCompletedRebuildUI(appId: String)`
[GitHub] spark pull request: [SPARK-12062] [CORE] Change Master to async reb...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10284#issuecomment-164315823 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/47630/ Test FAILed.
[GitHub] spark pull request: [SPARK-12062] [CORE] Change Master to async reb...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10284#issuecomment-164315821 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-5682][Core] Add encrypted shuffle in sp...
Github user winningsix commented on the pull request: https://github.com/apache/spark/pull/8880#issuecomment-164317758
> > we have to do it in the level of Yarn module since we need to obtain the security key from the credentials stored in user group information
>
> Yeah, I see... I was going to suggest moving the implementation of YarnSparkHadoopUtil.getCurrentUserCredentials to core, but hadoop 1 doesn't have the needed API. So we have to wait for SPARK-11807 to do that. Let's leave the test in the yarn module in the meantime and file a bug (blocked by SPARK-11807) to move it to the right place later. Can you do that?

The ticket has been filed: https://issues.apache.org/jira/browse/SPARK-12278
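For context on the API constraint being discussed: on Hadoop 2, a secret key can be read from the current user's UGI credentials roughly as follows. This is a minimal sketch of the general pattern; the alias name is hypothetical, and Hadoop 1 lacks a usable `getCredentials` path, which is why the code stays in the YARN module for now:

```scala
import org.apache.hadoop.io.Text
import org.apache.hadoop.security.UserGroupInformation

// Minimal sketch: fetch a named secret key from the credentials held by
// the current user's UGI. "sparkShuffleKey" is an illustrative alias only.
val credentials = UserGroupInformation.getCurrentUser.getCredentials
val key: Array[Byte] = credentials.getSecretKey(new Text("sparkShuffleKey"))
```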
[GitHub] spark pull request: [SPARK-12281][Core] Fix a race condition when r...
Github user jerryshao commented on the pull request: https://github.com/apache/spark/pull/10269#issuecomment-164319280 Thanks @zsxwing for finding this incorrect state transition. For now I think `assert` should be OK.
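For context, guarding a state transition with `assert` makes an illegal transition fail fast instead of silently corrupting state. A hypothetical sketch of the pattern; the state names are illustrative, not the PR's actual code:

```scala
// Hypothetical lifecycle with an asserted state transition.
object ReceiverLifecycle {
  object ReceiverState extends Enumeration {
    val Scheduled, Active, Stopped = Value
  }
  import ReceiverState._

  private var state = Scheduled

  def markActive(): Unit = {
    // Fail fast on an impossible transition such as Stopped -> Active.
    assert(state == Scheduled, s"Cannot transition to Active from $state")
    state = Active
  }
}
```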
[GitHub] spark pull request: [SPARK-12062] [CORE] Change Master to async reb...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10284#issuecomment-164310014 **[Test build #47630 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47630/consoleFull)** for PR 10284 at commit [`456c806`](https://github.com/apache/spark/commit/456c806eee6577fcb2721879613f20351e57ff14).
[GitHub] spark pull request: [SPARK-8641][SQL] Native Spark Window function...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/9819#discussion_r47452553
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -1097,6 +1118,42 @@ class Analyzer(
       }
     }
   }
+
+  /**
+   * Check and add proper window frames for all window functions.
+   */
+  object ResolveWindowFrame extends Rule[LogicalPlan] {
+    def apply(plan: LogicalPlan): LogicalPlan = plan transform {
+      case logical: LogicalPlan => logical transformExpressions {
+        case WindowExpression(wf: WindowFunction,
+          WindowSpecDefinition(_, _, f: SpecifiedWindowFrame))
+            if wf.frame != UnspecifiedFrame && wf.frame != f =>
+          failAnalysis(s"Window Frame $f must match the required frame ${wf.frame}")
+        case WindowExpression(wf: WindowFunction,
+          s @ WindowSpecDefinition(_, o, UnspecifiedFrame))
+            if wf.frame != UnspecifiedFrame =>
+          WindowExpression(wf, s.copy(frameSpecification = wf.frame))
+        case we @ WindowExpression(e, s @ WindowSpecDefinition(_, o, UnspecifiedFrame)) =>
+          val frame = SpecifiedWindowFrame.defaultWindowFrame(o.nonEmpty, acceptWindowFrame = true)
+          we.copy(windowSpec = s.copy(frameSpecification = frame))
+      }
+    }
+  }
+
+  /**
+   * Check and add order to [[AggregateWindowFunction]]s.
+   */
+  object ResolveWindowOrder extends Rule[LogicalPlan] {
+    def apply(plan: LogicalPlan): LogicalPlan = plan transform {
+      case logical: LogicalPlan => logical transformExpressions {
+        case WindowExpression(wf: WindowFunction, spec) if spec.orderSpec.isEmpty =>
+          failAnalysis(s"WindowFunction $wf requires window to be ordered")
--- End diff --
Is it required?
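For readers wondering what `defaultWindowFrame(o.nonEmpty, ...)` resolves to: under standard SQL semantics, an unspecified frame defaults to RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW when the window is ordered, and to the whole partition when it is not. Expressed with Spark's public window API; the DataFrame and column names here are hypothetical:

```scala
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.sum

// An ordered spec with no explicit frame behaves like a running total;
// an unordered spec aggregates over the entire partition.
val ordered = Window.partitionBy("dept").orderBy("salary")
val wholePartition = Window.partitionBy("dept")

// Assuming a DataFrame `df` with dept and salary columns:
// df.withColumn("runningTotal", sum("salary").over(ordered))
// df.withColumn("deptTotal", sum("salary").over(wholePartition))
```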
[GitHub] spark pull request: [SPARK-8641][SQL] Native Spark Window function...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/9819#discussion_r47453068
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/windowExpressions.scala ---
@@ -246,85 +260,238 @@ object SpecifiedWindowFrame {
   }
 }
+case class UnresolvedWindowExpression(
+    child: Expression,
+    windowSpec: WindowSpecReference) extends UnaryExpression with Unevaluable {
+
+  override def dataType: DataType = throw new UnresolvedException(this, "dataType")
+  override def foldable: Boolean = throw new UnresolvedException(this, "foldable")
+  override def nullable: Boolean = throw new UnresolvedException(this, "nullable")
+  override lazy val resolved = false
+}
+
+case class WindowExpression(
+    windowFunction: Expression,
+    windowSpec: WindowSpecDefinition) extends Expression with Unevaluable {
+
+  override def children: Seq[Expression] = windowFunction :: windowSpec :: Nil
+
+  override def dataType: DataType = windowFunction.dataType
+  override def foldable: Boolean = windowFunction.foldable
+  override def nullable: Boolean = windowFunction.nullable
+
+  override def toString: String = s"$windowFunction $windowSpec"
+}
+
 /**
- * Every window function needs to maintain a output buffer for its output.
- * It should expect that for a n-row window frame, it will be called n times
- * to retrieve value corresponding with these n rows.
+ * A window function is a function that can only be evaluated in the context of a window operator.
  */
 trait WindowFunction extends Expression {
-  def init(): Unit
+  /** Frame in which the window operator must be executed. */
+  def frame: WindowFrame = UnspecifiedFrame
+}
 
-  def reset(): Unit
+/**
+ * An offset window function is a window function that returns the value of the input column offset
+ * by a number of rows within the partition. For instance: an OffsetWindowfunction for value x with
+ * offset -2, will get the value of x 2 rows back in the partition.
+ */
+abstract class OffsetWindowFunction
+  extends Expression with WindowFunction with Unevaluable with ImplicitCastInputTypes {
+  val input: Expression
+  val default: Expression
+  val offset: Expression
+  val offsetSign: Int
+
+  override def children: Seq[Expression] = Seq(input, offset, default)
 
-  def prepareInputParameters(input: InternalRow): AnyRef
+  override def foldable: Boolean = input.foldable && (default == null || default.foldable)
 
-  def update(input: AnyRef): Unit
+  override def nullable: Boolean = input.nullable && (default == null || default.nullable)
 
-  def batchUpdate(inputs: Array[AnyRef]): Unit
+  override lazy val frame = {
+    // This will be triggered by the Analyzer.
+    val offsetValue = offset.eval() match {
+      case o: Int => o
+      case x => throw new AnalysisException(
+        s"Offset expression must be a foldable integer expression: $x")
+    }
+    val boundary = ValueFollowing(offsetSign * offsetValue)
+    SpecifiedWindowFrame(RowFrame, boundary, boundary)
+  }
 
-  def evaluate(): Unit
+  override def dataType: DataType = input.dataType
 
-  def get(index: Int): Any
+  override def inputTypes: Seq[AbstractDataType] =
+    Seq(AnyDataType, IntegerType, TypeCollection(input.dataType, NullType))
 
-  def newInstance(): WindowFunction
+  override def toString: String = s"$prettyName($input, $offset, $default)"
 }
 
-case class UnresolvedWindowFunction(
-    name: String,
-    children: Seq[Expression])
-  extends Expression with WindowFunction with Unevaluable {
+case class Lead(input: Expression, offset: Expression, default: Expression)
+    extends OffsetWindowFunction {
 
-  override def dataType: DataType = throw new UnresolvedException(this, "dataType")
-  override def foldable: Boolean = throw new UnresolvedException(this, "foldable")
-  override def nullable: Boolean = throw new UnresolvedException(this, "nullable")
-  override lazy val resolved = false
+  def this(input: Expression, offset: Expression) = this(input, offset, Literal(null))
 
-  override def init(): Unit = throw new UnresolvedException(this, "init")
-  override def reset(): Unit = throw new UnresolvedException(this, "reset")
-  override def prepareInputParameters(input: InternalRow): AnyRef =
-    throw new UnresolvedException(this, "prepareInputParameters")
-  override def update(input: AnyRef): Unit = throw new UnresolvedException(this, "update")
-  override def batchUpdate(inputs: Array[AnyRef]): Unit =
-    throw new UnresolvedException(this, "batchUpdate")
-  override
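To make the `frame` computation in the diff above concrete: an offset function resolves to a single-row frame at the offset position, so `lead(x, 2)` (offsetSign = 1) reads the row two positions ahead, i.e. ROWS BETWEEN 2 FOLLOWING AND 2 FOLLOWING, while `lag(x, 2)` (offsetSign = -1) reads two positions back. In the public API, with hypothetical column names:

```scala
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{lag, lead}

// Both functions resolve to a one-row frame at the offset position.
val w = Window.partitionBy("userId").orderBy("eventTime")
// Assuming a DataFrame `df` with an x column:
// df.select(lead("x", 2).over(w), lag("x", 2).over(w))
```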
[GitHub] spark pull request: [SPARK-8641][SQL] Native Spark Window function...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/9819#discussion_r47453214
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/Window.scala ---
@@ -736,15 +691,148 @@ private[execution] final class UnboundedFollowingWindowFunctionFrame(
 
     // Only recalculate and update when the buffer changes.
     if (bufferUpdated) {
-      evaluatePrepared(buffer, inputIndex, buffer.length)
-      fill(target, outputIndex)
+      processor.initialize(input.size)
+      processor.update(input, inputIndex, input.size)
+      processor.evaluate(target)
     }
 
     // Move to the next row.
     outputIndex += 1
   }
+}
+
+/**
+ * This class prepares and manages the processing of a number of aggregate functions.
+ *
+ * This implementation only supports evaluation in [[Complete]] mode. This is enough for
+ * Window processing.
+ *
+ * Processing of distinct aggregates is currently not supported.
+ *
+ * The implementation is split into an object which takes care of construction, and a the actual
+ * processor class. Construction might be expensive and could be separated into a 'driver' and a
+ * 'executor' part.
+ */
+private[execution] object AggregateProcessor {
+  def apply(functions: Array[Expression],
+      ordinal: Int,
+      inputAttributes: Seq[Attribute],
+      newMutableProjection: (Seq[Expression], Seq[Attribute]) => () => MutableProjection):
--- End diff --
format
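The "format" remark presumably targets the parameter layout: Spark's usual multi-line signature style puts `def apply(` on its own line with each parameter on a four-space continuation indent. A sketch of that layout under those assumptions; the object name and stubbed body are illustrative, not the committed change:

```scala
import org.apache.spark.sql.catalyst.expressions.{Attribute, Expression, MutableProjection}

// Illustrative layout only: one parameter per line, four-space
// continuation indent, opening parenthesis on the `def` line.
object AggregateProcessorStyle {
  def apply(
      functions: Array[Expression],
      ordinal: Int,
      inputAttributes: Seq[Attribute],
      newMutableProjection: (Seq[Expression], Seq[Attribute]) => () => MutableProjection): Unit = ()
}
```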
[GitHub] spark pull request: [SPARK-12213] [SQL] use multiple partitions fo...
Github user yhuai commented on the pull request: https://github.com/apache/spark/pull/10228#issuecomment-164313814 @hvanhovell How about we merge this first, and then take a look at how to use a single rule to handle aggregation queries with DISTINCT?
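For context on the query shape under discussion, a hedged example: a query mixing one DISTINCT aggregate with a regular aggregate, which this PR aims to plan with multiple shuffle partitions rather than collapsing the distinct computation onto a single one. Table and column names are illustrative, and `sqlContext` is assumed to be in scope as in spark-shell:

```scala
// Hypothetical query shape: one distinct aggregate plus a regular one.
sqlContext.sql(
  """SELECT country, count(DISTINCT userId), sum(amount)
    |FROM sales
    |GROUP BY country""".stripMargin)
```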
[GitHub] spark pull request: [SPARK-8641][SQL] Native Spark Window function...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/9819#discussion_r47451906
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/windowExpressions.scala ---
@@ -246,85 +260,238 @@ object SpecifiedWindowFrame {
   }
 }
+case class UnresolvedWindowExpression(
+    child: Expression,
+    windowSpec: WindowSpecReference) extends UnaryExpression with Unevaluable {
+
+  override def dataType: DataType = throw new UnresolvedException(this, "dataType")
+  override def foldable: Boolean = throw new UnresolvedException(this, "foldable")
+  override def nullable: Boolean = throw new UnresolvedException(this, "nullable")
+  override lazy val resolved = false
+}
+
+case class WindowExpression(
+    windowFunction: Expression,
+    windowSpec: WindowSpecDefinition) extends Expression with Unevaluable {
+
+  override def children: Seq[Expression] = windowFunction :: windowSpec :: Nil
+
+  override def dataType: DataType = windowFunction.dataType
+  override def foldable: Boolean = windowFunction.foldable
+  override def nullable: Boolean = windowFunction.nullable
+
+  override def toString: String = s"$windowFunction $windowSpec"
+}
+
 /**
- * Every window function needs to maintain a output buffer for its output.
- * It should expect that for a n-row window frame, it will be called n times
- * to retrieve value corresponding with these n rows.
+ * A window function is a function that can only be evaluated in the context of a window operator.
  */
 trait WindowFunction extends Expression {
-  def init(): Unit
+  /** Frame in which the window operator must be executed. */
+  def frame: WindowFrame = UnspecifiedFrame
+}
 
-  def reset(): Unit
+/**
+ * An offset window function is a window function that returns the value of the input column offset
+ * by a number of rows within the partition. For instance: an OffsetWindowfunction for value x with
+ * offset -2, will get the value of x 2 rows back in the partition.
+ */
+abstract class OffsetWindowFunction
+  extends Expression with WindowFunction with Unevaluable with ImplicitCastInputTypes {
+  val input: Expression
+  val default: Expression
+  val offset: Expression
+  val offsetSign: Int
+
+  override def children: Seq[Expression] = Seq(input, offset, default)
 
-  def prepareInputParameters(input: InternalRow): AnyRef
+  override def foldable: Boolean = input.foldable && (default == null || default.foldable)
 
-  def update(input: AnyRef): Unit
+  override def nullable: Boolean = input.nullable && (default == null || default.nullable)
 
-  def batchUpdate(inputs: Array[AnyRef]): Unit
+  override lazy val frame = {
+    // This will be triggered by the Analyzer.
+    val offsetValue = offset.eval() match {
+      case o: Int => o
+      case x => throw new AnalysisException(
+        s"Offset expression must be a foldable integer expression: $x")
+    }
+    val boundary = ValueFollowing(offsetSign * offsetValue)
+    SpecifiedWindowFrame(RowFrame, boundary, boundary)
+  }
 
-  def evaluate(): Unit
+  override def dataType: DataType = input.dataType
 
-  def get(index: Int): Any
+  override def inputTypes: Seq[AbstractDataType] =
+    Seq(AnyDataType, IntegerType, TypeCollection(input.dataType, NullType))
 
-  def newInstance(): WindowFunction
+  override def toString: String = s"$prettyName($input, $offset, $default)"
 }
 
-case class UnresolvedWindowFunction(
-    name: String,
-    children: Seq[Expression])
-  extends Expression with WindowFunction with Unevaluable {
+case class Lead(input: Expression, offset: Expression, default: Expression)
+    extends OffsetWindowFunction {
 
-  override def dataType: DataType = throw new UnresolvedException(this, "dataType")
-  override def foldable: Boolean = throw new UnresolvedException(this, "foldable")
-  override def nullable: Boolean = throw new UnresolvedException(this, "nullable")
-  override lazy val resolved = false
+  def this(input: Expression, offset: Expression) = this(input, offset, Literal(null))
 
-  override def init(): Unit = throw new UnresolvedException(this, "init")
-  override def reset(): Unit = throw new UnresolvedException(this, "reset")
-  override def prepareInputParameters(input: InternalRow): AnyRef =
-    throw new UnresolvedException(this, "prepareInputParameters")
-  override def update(input: AnyRef): Unit = throw new UnresolvedException(this, "update")
-  override def batchUpdate(inputs: Array[AnyRef]): Unit =
-    throw new UnresolvedException(this, "batchUpdate")
-  override
[GitHub] spark pull request: [SPARK-8641][SQL] Native Spark Window function...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/9819#discussion_r47452123
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/windowExpressions.scala ---
@@ -246,85 +260,238 @@ object SpecifiedWindowFrame {
   }
 }
+case class UnresolvedWindowExpression(
+    child: Expression,
+    windowSpec: WindowSpecReference) extends UnaryExpression with Unevaluable {
+
+  override def dataType: DataType = throw new UnresolvedException(this, "dataType")
+  override def foldable: Boolean = throw new UnresolvedException(this, "foldable")
+  override def nullable: Boolean = throw new UnresolvedException(this, "nullable")
+  override lazy val resolved = false
+}
+
+case class WindowExpression(
+    windowFunction: Expression,
+    windowSpec: WindowSpecDefinition) extends Expression with Unevaluable {
+
+  override def children: Seq[Expression] = windowFunction :: windowSpec :: Nil
+
+  override def dataType: DataType = windowFunction.dataType
+  override def foldable: Boolean = windowFunction.foldable
+  override def nullable: Boolean = windowFunction.nullable
+
+  override def toString: String = s"$windowFunction $windowSpec"
+}
+
 /**
- * Every window function needs to maintain a output buffer for its output.
- * It should expect that for a n-row window frame, it will be called n times
- * to retrieve value corresponding with these n rows.
+ * A window function is a function that can only be evaluated in the context of a window operator.
  */
 trait WindowFunction extends Expression {
-  def init(): Unit
+  /** Frame in which the window operator must be executed. */
+  def frame: WindowFrame = UnspecifiedFrame
+}
 
-  def reset(): Unit
+/**
+ * An offset window function is a window function that returns the value of the input column offset
+ * by a number of rows within the partition. For instance: an OffsetWindowfunction for value x with
+ * offset -2, will get the value of x 2 rows back in the partition.
+ */
+abstract class OffsetWindowFunction
+  extends Expression with WindowFunction with Unevaluable with ImplicitCastInputTypes {
+  val input: Expression
+  val default: Expression
+  val offset: Expression
+  val offsetSign: Int
+
+  override def children: Seq[Expression] = Seq(input, offset, default)
 
-  def prepareInputParameters(input: InternalRow): AnyRef
+  override def foldable: Boolean = input.foldable && (default == null || default.foldable)
 
-  def update(input: AnyRef): Unit
+  override def nullable: Boolean = input.nullable && (default == null || default.nullable)
 
-  def batchUpdate(inputs: Array[AnyRef]): Unit
+  override lazy val frame = {
+    // This will be triggered by the Analyzer.
+    val offsetValue = offset.eval() match {
+      case o: Int => o
+      case x => throw new AnalysisException(
+        s"Offset expression must be a foldable integer expression: $x")
+    }
+    val boundary = ValueFollowing(offsetSign * offsetValue)
+    SpecifiedWindowFrame(RowFrame, boundary, boundary)
+  }
 
-  def evaluate(): Unit
+  override def dataType: DataType = input.dataType
 
-  def get(index: Int): Any
+  override def inputTypes: Seq[AbstractDataType] =
+    Seq(AnyDataType, IntegerType, TypeCollection(input.dataType, NullType))
 
-  def newInstance(): WindowFunction
+  override def toString: String = s"$prettyName($input, $offset, $default)"
 }
 
-case class UnresolvedWindowFunction(
-    name: String,
-    children: Seq[Expression])
-  extends Expression with WindowFunction with Unevaluable {
+case class Lead(input: Expression, offset: Expression, default: Expression)
+    extends OffsetWindowFunction {
 
-  override def dataType: DataType = throw new UnresolvedException(this, "dataType")
-  override def foldable: Boolean = throw new UnresolvedException(this, "foldable")
-  override def nullable: Boolean = throw new UnresolvedException(this, "nullable")
-  override lazy val resolved = false
+  def this(input: Expression, offset: Expression) = this(input, offset, Literal(null))
 
-  override def init(): Unit = throw new UnresolvedException(this, "init")
-  override def reset(): Unit = throw new UnresolvedException(this, "reset")
-  override def prepareInputParameters(input: InternalRow): AnyRef =
-    throw new UnresolvedException(this, "prepareInputParameters")
-  override def update(input: AnyRef): Unit = throw new UnresolvedException(this, "update")
-  override def batchUpdate(inputs: Array[AnyRef]): Unit =
-    throw new UnresolvedException(this, "batchUpdate")
-  override
[GitHub] spark pull request: [SPARK-8641][SQL] Native Spark Window function...
Github user yhuai commented on the pull request: https://github.com/apache/spark/pull/9819#issuecomment-164310795 Do we have a test case that uses a UDAF as a window function?
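Such a test would exercise the generic path that evaluates arbitrary aggregates over window frames. Roughly, using the public UDAF API; the UDAF here is a hypothetical plain sum, and the DataFrame and column names are illustrative:

```scala
import org.apache.spark.sql.Row
import org.apache.spark.sql.expressions.{MutableAggregationBuffer, UserDefinedAggregateFunction, Window}
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types._

// Hypothetical UDAF (a plain sum) defined through the public API so it
// goes through the aggregate-as-window-function code path.
object MySumUDAF extends UserDefinedAggregateFunction {
  def inputSchema: StructType = StructType(StructField("value", DoubleType) :: Nil)
  def bufferSchema: StructType = StructType(StructField("sum", DoubleType) :: Nil)
  def dataType: DataType = DoubleType
  def deterministic: Boolean = true
  def initialize(buffer: MutableAggregationBuffer): Unit = buffer.update(0, 0.0)
  def update(buffer: MutableAggregationBuffer, input: Row): Unit = {
    if (!input.isNullAt(0)) buffer.update(0, buffer.getDouble(0) + input.getDouble(0))
  }
  def merge(buffer1: MutableAggregationBuffer, buffer2: Row): Unit = {
    buffer1.update(0, buffer1.getDouble(0) + buffer2.getDouble(0))
  }
  def evaluate(buffer: Row): Any = buffer.getDouble(0)
}

// Applying the UDAF over a window, assuming a DataFrame `df`:
// val w = Window.partitionBy("dept").orderBy("salary")
// df.select(MySumUDAF(col("salary")).over(w))
```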
[GitHub] spark pull request: [SPARK-8641][SQL] Native Spark Window function...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/9819#discussion_r47453065
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/windowExpressions.scala ---
@@ -246,85 +260,238 @@ object SpecifiedWindowFrame {
   }
 }
+case class UnresolvedWindowExpression(
+    child: Expression,
+    windowSpec: WindowSpecReference) extends UnaryExpression with Unevaluable {
+
+  override def dataType: DataType = throw new UnresolvedException(this, "dataType")
+  override def foldable: Boolean = throw new UnresolvedException(this, "foldable")
+  override def nullable: Boolean = throw new UnresolvedException(this, "nullable")
+  override lazy val resolved = false
+}
+
+case class WindowExpression(
+    windowFunction: Expression,
+    windowSpec: WindowSpecDefinition) extends Expression with Unevaluable {
+
+  override def children: Seq[Expression] = windowFunction :: windowSpec :: Nil
+
+  override def dataType: DataType = windowFunction.dataType
+  override def foldable: Boolean = windowFunction.foldable
+  override def nullable: Boolean = windowFunction.nullable
+
+  override def toString: String = s"$windowFunction $windowSpec"
+}
+
 /**
- * Every window function needs to maintain a output buffer for its output.
- * It should expect that for a n-row window frame, it will be called n times
- * to retrieve value corresponding with these n rows.
+ * A window function is a function that can only be evaluated in the context of a window operator.
  */
 trait WindowFunction extends Expression {
-  def init(): Unit
+  /** Frame in which the window operator must be executed. */
+  def frame: WindowFrame = UnspecifiedFrame
--- End diff --
Actually, is it used as the default frame?
[GitHub] spark pull request: [SPARK-8641][SQL] Native Spark Window function...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/9819#discussion_r47453040
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/windowExpressions.scala ---
@@ -246,85 +260,238 @@ object SpecifiedWindowFrame {
   }
 }
+case class UnresolvedWindowExpression(
+    child: Expression,
+    windowSpec: WindowSpecReference) extends UnaryExpression with Unevaluable {
+
+  override def dataType: DataType = throw new UnresolvedException(this, "dataType")
+  override def foldable: Boolean = throw new UnresolvedException(this, "foldable")
+  override def nullable: Boolean = throw new UnresolvedException(this, "nullable")
+  override lazy val resolved = false
+}
+
+case class WindowExpression(
+    windowFunction: Expression,
+    windowSpec: WindowSpecDefinition) extends Expression with Unevaluable {
+
+  override def children: Seq[Expression] = windowFunction :: windowSpec :: Nil
+
+  override def dataType: DataType = windowFunction.dataType
+  override def foldable: Boolean = windowFunction.foldable
+  override def nullable: Boolean = windowFunction.nullable
+
+  override def toString: String = s"$windowFunction $windowSpec"
+}
+
 /**
- * Every window function needs to maintain a output buffer for its output.
- * It should expect that for a n-row window frame, it will be called n times
- * to retrieve value corresponding with these n rows.
+ * A window function is a function that can only be evaluated in the context of a window operator.
  */
 trait WindowFunction extends Expression {
-  def init(): Unit
+  /** Frame in which the window operator must be executed. */
+  def frame: WindowFrame = UnspecifiedFrame
--- End diff --
I guess we need to say it is also used as the default frame?