[GitHub] spark pull request: [SPARK-9057] [STREAMING] Twitter example joini...

2015-12-13 Thread srowen
Github user srowen commented on the pull request:

https://github.com/apache/spark/pull/8431#issuecomment-164249212
  
@Agent007 does this require adding this large data file to the repo?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12310] [SparkR] Add write.json and writ...

2015-12-13 Thread yanboliang
GitHub user yanboliang opened a pull request:

https://github.com/apache/spark/pull/10281

[SPARK-12310] [SparkR] Add write.json and write.parquet for SparkR

Add ```write.json``` and ```write.parquet``` for SparkR, and deprecate ```saveAsParquetFile```.
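
For context, these R wrappers delegate to the JVM-side `DataFrameWriter` (the diff later in this thread invokes it via `callJMethod`). A minimal Scala sketch of the equivalent calls, with illustrative paths that are not from this PR:

```scala
import org.apache.spark.sql.SQLContext

// Sketch of the DataFrameWriter calls the new SparkR wrappers map onto
// (Spark 1.6-era API); the paths here are placeholders.
def writeBoth(sqlContext: SQLContext): Unit = {
  val df = sqlContext.read.json("path/to/file.json")
  df.write.json("/tmp/sparkr-json")       // what write.json(df, path) invokes
  df.write.parquet("/tmp/sparkr-parquet") // what write.parquet(df, path) invokes
}
```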

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/yanboliang/spark spark-12310

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/10281.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #10281


commit e19a100236d3cf6213a0268ca461d99f099ddf6b
Author: Yanbo Liang 
Date:   2015-12-13T10:59:11Z

Add write.json and write.parquet for SparkR







[GitHub] spark pull request: [SPARK-12016][MLlib][PySpark] Wrap Word2VecMod...

2015-12-13 Thread viirya
Github user viirya commented on the pull request:

https://github.com/apache/spark/pull/10100#issuecomment-164245365
  
ping @davies Can you take a look at this? It should be a straightforward one.





[GitHub] spark pull request: [Minor] [ML] Rename weights to coefficients fo...

2015-12-13 Thread yanboliang
GitHub user yanboliang opened a pull request:

https://github.com/apache/spark/pull/10280

[Minor] [ML] Rename weights to coefficients for examples/DeveloperApiExample

Rename ```weights``` to ```coefficients``` for examples/DeveloperApiExample.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/yanboliang/spark spark-coefficients

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/10280.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #10280


commit f19b748e980edbb0a13d32146437c4f38a986e13
Author: Yanbo Liang 
Date:   2015-12-13T08:39:35Z

Rename weights to coefficients for examples/DeveloperApiExample







[GitHub] spark pull request: [SPARK-12213] [SQL] use multiple partitions fo...

2015-12-13 Thread hvanhovell
Github user hvanhovell commented on the pull request:

https://github.com/apache/spark/pull/10228#issuecomment-164251993
  
@yhuai I think having the two clearly separated paths (this PR) is an improvement over the current situation. I also admit that I am responsible for introducing the second path. Your comment on having four aggregate steps without the exchange triggered me, and I was thinking out loud about how we could do this using the rewriting rule (the removal of one of the paths would have been a bonus).





[GitHub] spark pull request: [Minor] [DOC] Fix broken word2vec link

2015-12-13 Thread BenFradet
GitHub user BenFradet opened a pull request:

https://github.com/apache/spark/pull/10282

[Minor] [DOC] Fix broken word2vec link

Follow-up of [SPARK-12199](https://issues.apache.org/jira/browse/SPARK-12199) and #10193, where a broken link was left as-is.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/BenFradet/spark SPARK-12199

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/10282.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #10282


commit 8937801896a1b7cc350abbe89e720506b75dac56
Author: BenFradet 
Date:   2015-12-13T12:09:56Z

fix broken word2vec link







[GitHub] spark pull request: [SPARK-11923][ML] Python API for ml.feature.Ch...

2015-12-13 Thread yanboliang
Github user yanboliang commented on the pull request:

https://github.com/apache/spark/pull/10186#issuecomment-164238348
  
LGTM





[GitHub] spark pull request: [SPARK-12204][SPARKR] Implement drop method fo...

2015-12-13 Thread sun-rui
Github user sun-rui commented on the pull request:

https://github.com/apache/spark/pull/10201#issuecomment-164320958
  
According to https://issues.apache.org/jira/browse/SPARK-6635 and https://issues.apache.org/jira/browse/SPARK-10073, the feature was added for Scala in Spark 1.4.0 and for Python in 1.5.0. But it seems both just had the API updated, without any migration guide for the compatibility break. Do we need to provide one specifically for SparkR?





[GitHub] spark pull request: [SPARK-12310] [SparkR] Add write.json and writ...

2015-12-13 Thread sun-rui
Github user sun-rui commented on a diff in the pull request:

https://github.com/apache/spark/pull/10281#discussion_r47456558
  
--- Diff: R/pkg/inst/tests/testthat/test_sparkSQL.R ---
@@ -1448,23 +1457,28 @@ test_that("write.df() on DataFrame and works with read.parquet", {
   expect_equal(count(df), count(parquetDF))
 })
 
-test_that("read.parquet()/parquetFile() works with multiple input paths", {
+test_that("read.parquet()/parquetFile() and write.parquet()/saveAsParquetFile()", {
--- End diff --

How about renaming this test case to "read/write Parquet files", where read.parquet(), parquetFile(), read.df(), write.parquet(), saveAsParquetFile(), and write.df() are all exercised to read/write Parquet files? This test case can then be merged with the one "write.df() as parquet file".





[GitHub] spark pull request: [SPARK-12275][SQL] No plan for BroadcastHint i...

2015-12-13 Thread yucai
Github user yucai commented on the pull request:

https://github.com/apache/spark/pull/10265#issuecomment-164325959
  
retest this please





[GitHub] spark pull request: [SPARK-5682][Core] Add encrypted shuffle in sp...

2015-12-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8880#issuecomment-164327277
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-12315][SQL] isnotnull operator not push...

2015-12-13 Thread HyukjinKwon
GitHub user HyukjinKwon opened a pull request:

https://github.com/apache/spark/pull/10287

[SPARK-12315][SQL] isnotnull operator not pushed down for JDBC datasource.

`IsNotNull` filter is not being pushed down for the JDBC datasource.

It looks like it is part of the SQL standard according to [SQL-92](http://www.contrib.andrew.cmu.edu/~shadow/sql/sql1992.txt), [SQL:1999](http://web.cs.ualberta.ca/~yuan/courses/db_readings/ansi-iso-9075-2-1999.pdf), [SQL:2003](http://www.wiscorp.com/sql_2003_standard.zip) and [SQL:201x](http://www.wiscorp.com/sql20nn.zip), and I believe most databases support this.

In this PR, I simply added a case for the `IsNotNull` filter to produce a proper filter string.
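
The shape of such a case, as a hedged sketch (the helper name and the other cases are illustrative context, not the exact JDBCRDD code):

```scala
import org.apache.spark.sql.sources._

// Illustrative filter-to-SQL compilation for JDBC pushdown. Only the
// IsNotNull case is what this PR adds; the other cases are context, and
// the function name is hypothetical.
def compileFilterSketch(f: Filter): String = f match {
  case EqualTo(attr, value) => s"$attr = '$value'"
  case IsNull(attr)         => s"$attr IS NULL"
  case IsNotNull(attr)      => s"$attr IS NOT NULL" // the new case
  case _                    => null // unhandled filters are not pushed down
}
```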


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/HyukjinKwon/spark SPARK-12315

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/10287.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #10287


commit eacb4eac990ebf18b51f9269e4115b1b83a7be62
Author: hyukjinkwon 
Date:   2015-12-14T02:53:49Z

Add a case for isnotnull operator







[GitHub] spark pull request: [SPARK-12310] [SparkR] Add write.json and writ...

2015-12-13 Thread yanboliang
Github user yanboliang commented on a diff in the pull request:

https://github.com/apache/spark/pull/10281#discussion_r47458635
  
--- Diff: R/pkg/inst/tests/testthat/test_sparkSQL.R ---
@@ -371,13 +371,13 @@ test_that("Collect DataFrame with complex types", {
   expect_equal(bob$height, 176.5)
 })
 
-test_that("read.json()/jsonFile() on a local file returns a DataFrame", {
+test_that("read.json()/jsonFile() and write.json()", {
   df <- read.json(sqlContext, jsonPath)
   expect_is(df, "DataFrame")
   expect_equal(count(df), 3)
   # read.json()/jsonFile() works with multiple input paths
   jsonPath2 <- tempfile(pattern="jsonPath2", fileext=".json")
-  write.df(df, jsonPath2, "json", mode="overwrite")
+  write.json(df, jsonPath2)
--- End diff --

I agree we should have a test for ```write.df```, but such a test still exists even if I remove this one; see [here](https://github.com/apache/spark/blob/master/R/pkg/inst/tests/testthat/test_sparkSQL.R#L870).
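
For reference, the distinction under discussion, sketched against the JVM-side API that both R wrappers delegate to (`df` and the paths are parameters purely for illustration):

```scala
import org.apache.spark.sql.{DataFrame, SaveMode}

// The generic writer (what write.df(df, path, "json", mode = "overwrite")
// maps onto) versus the format-specific shorthand this PR adds for R.
def saveJsonBothWays(df: DataFrame, genericPath: String, shorthandPath: String): Unit = {
  df.write.format("json").mode(SaveMode.Overwrite).save(genericPath)
  df.write.json(shorthandPath)
}
```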





[GitHub] spark pull request: [SPARK-12310] [SparkR] Add write.json and writ...

2015-12-13 Thread yanboliang
Github user yanboliang commented on a diff in the pull request:

https://github.com/apache/spark/pull/10281#discussion_r47458647
  
--- Diff: R/pkg/R/DataFrame.R ---
@@ -614,12 +641,25 @@ setMethod("toJSON",
 #' sqlContext <- sparkRSQL.init(sc)
 #' path <- "path/to/file.json"
 #' df <- read.json(sqlContext, path)
-#' saveAsParquetFile(df, "/tmp/sparkr-tmp/")
+#' write.parquet(df, "/tmp/sparkr-tmp1/")
+#' saveAsParquetFile(df, "/tmp/sparkr-tmp2/")
 #'}
+setMethod("write.parquet",
+  signature(x = "DataFrame", path = "character"),
+  function(x, path) {
+write <- callJMethod(x@sdf, "write")
+invisible(callJMethod(write, "parquet", path))
+  })
+
+#' @family DataFrame functions
--- End diff --

Good point.





[GitHub] spark pull request: [SPARK-12186] [WEB UI] Send the complete reque...

2015-12-13 Thread mindprince
Github user mindprince commented on the pull request:

https://github.com/apache/spark/pull/10180#issuecomment-164333244
  
Can someone review this or, if it seems okay, merge it?





[GitHub] spark pull request: [SPARK-5682][Core] Add encrypted shuffle in sp...

2015-12-13 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/8880#issuecomment-164334418
  
**[Test build #47637 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47637/consoleFull)**
 for PR 8880 at commit 
[`f88f011`](https://github.com/apache/spark/commit/f88f011081ad07377758c5b187fe50fd1c7f4791).





[GitHub] spark pull request: [SPARK-5682][Core] Add encrypted shuffle in sp...

2015-12-13 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/8880#issuecomment-164334532
  
**[Test build #47637 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47637/consoleFull)**
 for PR 8880 at commit 
[`f88f011`](https://github.com/apache/spark/commit/f88f011081ad07377758c5b187fe50fd1c7f4791).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-5682][Core] Add encrypted shuffle in sp...

2015-12-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8880#issuecomment-164334536
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-12275][SQL] No plan for BroadcastHint i...

2015-12-13 Thread yucai
Github user yucai commented on the pull request:

https://github.com/apache/spark/pull/10265#issuecomment-164339684
  
@andrewor14, @yhuai could you kindly help re-trigger the tests? Many thanks!





[GitHub] spark pull request: [SPARK-6518][MLlib][Example][DOC] Add example ...

2015-12-13 Thread yu-iskw
Github user yu-iskw commented on the pull request:

https://github.com/apache/spark/pull/9952#issuecomment-164341602
  
Oh, I'm terribly sorry about that. Pushing the update failed.
- Modified what you pointed out
- Added a Java example and its doc





[GitHub] spark pull request: [SPARK-12057] [SQL] Prevent failure on corrupt...

2015-12-13 Thread yhuai
Github user yhuai commented on the pull request:

https://github.com/apache/spark/pull/10288#issuecomment-164343229
  
@simplyianm @srowen I created this PR based on #10043. Please see https://github.com/yhuai/spark/commit/d8722bb30d347fa97c7d358284eb433edb1be6ec for my change. This PR should fix the test. It also makes the schema inference more robust to records that we cannot parse.
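
The general idea of "robust" inference, skipping records that fail to parse instead of aborting the whole schema pass, can be sketched like this (simplified, using Jackson directly; not the PR's actual code):

```scala
import com.fasterxml.jackson.core.{JsonFactory, JsonProcessingException}

// Illustrative sketch: keep only records that parse as valid JSON, so a
// corrupt line cannot fail schema inference. Names are hypothetical.
def parseableRecords(records: Seq[String]): Seq[String] = {
  val factory = new JsonFactory()
  records.filter { record =>
    try {
      val parser = factory.createParser(record)
      try { while (parser.nextToken() != null) {}; true }
      finally { parser.close() }
    } catch {
      case _: JsonProcessingException => false // skip unparseable records
    }
  }
}
```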





[GitHub] spark pull request: [SPARK-12275][SQL] No plan for BroadcastHint i...

2015-12-13 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10265#issuecomment-164346090
  
**[Test build #47642 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47642/consoleFull)**
 for PR 10265 at commit 
[`1b8d570`](https://github.com/apache/spark/commit/1b8d5701cf3804040748219729a092af3af1e70e).





[GitHub] spark pull request: [Minor] [DOC] Fix broken word2vec link

2015-12-13 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10282#issuecomment-164255724
  
**[Test build #47627 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47627/consoleFull)**
 for PR 10282 at commit 
[`8937801`](https://github.com/apache/spark/commit/8937801896a1b7cc350abbe89e720506b75dac56).





[GitHub] spark pull request: [Minor] [ML] Rename weights to coefficients fo...

2015-12-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10280#issuecomment-164256735
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [Minor] [ML] Rename weights to coefficients fo...

2015-12-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10280#issuecomment-164256737
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/47625/
Test PASSed.





[GitHub] spark pull request: [Minor] [ML] Rename weights to coefficients fo...

2015-12-13 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10280#issuecomment-164256562
  
**[Test build #47625 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47625/consoleFull)**
 for PR 10280 at commit 
[`f19b748`](https://github.com/apache/spark/commit/f19b748e980edbb0a13d32146437c4f38a986e13).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-12213] [SQL] use multiple partitions fo...

2015-12-13 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10228#issuecomment-164263964
  
**[Test build #47622 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47622/consoleFull)**
 for PR 10228 at commit 
[`51ca055`](https://github.com/apache/spark/commit/51ca0553c1b4f3307512954858741ad89cea89f6).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-12213] [SQL] use multiple partitions fo...

2015-12-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10228#issuecomment-164264208
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-12213] [SQL] use multiple partitions fo...

2015-12-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10228#issuecomment-164264212
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/47622/
Test PASSed.





[GitHub] spark pull request: [SPARK-12293][SQL] Support UnsafeRow in LocalT...

2015-12-13 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10283#issuecomment-164270890
  
**[Test build #47628 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47628/consoleFull)**
 for PR 10283 at commit 
[`ca4326a`](https://github.com/apache/spark/commit/ca4326ab0c4d26af3eddcc9550624db91d195e8a).





[GitHub] spark pull request: [Minor] [DOC] Fix broken word2vec link

2015-12-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10282#issuecomment-164258023
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/47627/
Test PASSed.





[GitHub] spark pull request: [SPARK-12310] [SparkR] Add write.json and writ...

2015-12-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10281#issuecomment-164258061
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/47626/
Test PASSed.





[GitHub] spark pull request: [SPARK-12310] [SparkR] Add write.json and writ...

2015-12-13 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10281#issuecomment-164257767
  
**[Test build #47626 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47626/consoleFull)**
 for PR 10281 at commit 
[`e19a100`](https://github.com/apache/spark/commit/e19a100236d3cf6213a0268ca461d99f099ddf6b).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [Minor] [DOC] Fix broken word2vec link

2015-12-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10282#issuecomment-164258021
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-12310] [SparkR] Add write.json and writ...

2015-12-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10281#issuecomment-164258058
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-9057] [STREAMING] Twitter example joini...

2015-12-13 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/8431#issuecomment-164269210
  
  [Test build #2214 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2214/console)
 for   PR 8431 at commit 
[`9998df6`](https://github.com/apache/spark/commit/9998df67b1c15c7cc350ef59c10af8275b947e02).
 * This patch **passes all tests**.
 * This patch **does not merge cleanly**.
 * This patch adds the following public classes _(experimental)_:
  * `public class JavaTwitterHashTagJoinSentiments `






[GitHub] spark pull request: [Minor] [ML] Rename weights to coefficients fo...

2015-12-13 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10280#issuecomment-164255583
  
**[Test build #47625 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47625/consoleFull)**
 for PR 10280 at commit 
[`f19b748`](https://github.com/apache/spark/commit/f19b748e980edbb0a13d32146437c4f38a986e13).





[GitHub] spark pull request: [SPARK-12309] [ML] Use sqlContext from MLlibTe...

2015-12-13 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10279#issuecomment-164255590
  
**[Test build #47624 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47624/consoleFull)**
 for PR 10279 at commit 
[`c696d2b`](https://github.com/apache/spark/commit/c696d2bb14ecbfa44da9ec97cc495eff0728f2ac).





[GitHub] spark pull request: [SPARK-12310] [SparkR] Add write.json and writ...

2015-12-13 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10281#issuecomment-164255617
  
**[Test build #47626 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47626/consoleFull)**
 for PR 10281 at commit 
[`e19a100`](https://github.com/apache/spark/commit/e19a100236d3cf6213a0268ca461d99f099ddf6b).





[GitHub] spark pull request: [SPARK-12218] [SQL] Fixed the Parquet's filter...

2015-12-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10278#issuecomment-164265237
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/47623/
Test PASSed.





[GitHub] spark pull request: [SPARK-12218] [SQL] Fixed the Parquet's filter...

2015-12-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10278#issuecomment-164265233
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-12218] [SQL] Fixed the Parquet's filter...

2015-12-13 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10278#issuecomment-164264969
  
**[Test build #47623 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47623/consoleFull)**
 for PR 10278 at commit 
[`50733c6`](https://github.com/apache/spark/commit/50733c6239b721ecb1f0691bb3d4680235c15a18).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-12293][SQL] Support UnsafeRow in LocalT...

2015-12-13 Thread viirya
GitHub user viirya opened a pull request:

https://github.com/apache/spark/pull/10283

[SPARK-12293][SQL] Support UnsafeRow in LocalTableScan

JIRA: https://issues.apache.org/jira/browse/SPARK-12293

Make LocalTableScan support UnsafeRow.
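
The core of the change is converting the relation's rows into the `UnsafeRow` format. A minimal sketch of that conversion via `UnsafeProjection` (illustrative; not necessarily how this PR wires it into `LocalTableScan`):

```scala
import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.catalyst.expressions.{Attribute, UnsafeProjection, UnsafeRow}

// Convert generic InternalRows to UnsafeRows using a projection built
// from the output schema. copy() is required because the projection
// reuses a single result buffer across rows.
def toUnsafeRows(output: Seq[Attribute], rows: Seq[InternalRow]): Seq[UnsafeRow] = {
  val projection = UnsafeProjection.create(output.map(_.dataType).toArray)
  rows.map(row => projection(row).copy())
}
```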

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/viirya/spark-1 unsaferow-localtablescan

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/10283.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #10283


commit ca4326ab0c4d26af3eddcc9550624db91d195e8a
Author: Liang-Chi Hsieh 
Date:   2015-12-13T15:46:13Z

Support UnsafeRow in LocalTableScan.







[GitHub] spark pull request: [SPARK-12213] [SQL] use multiple partitions fo...

2015-12-13 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10228#issuecomment-164255399
  
**[Test build #47622 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47622/consoleFull)**
 for PR 10228 at commit 
[`51ca055`](https://github.com/apache/spark/commit/51ca0553c1b4f3307512954858741ad89cea89f6).





[GitHub] spark pull request: [Minor] [DOC] Fix broken word2vec link

2015-12-13 Thread BenFradet
Github user BenFradet commented on the pull request:

https://github.com/apache/spark/pull/10282#issuecomment-164252960
  
cc @jkbradley @yinxusen @mengxr 





[GitHub] spark pull request: [SPARK-9057] [STREAMING] Twitter example joini...

2015-12-13 Thread Agent007
Github user Agent007 commented on the pull request:

https://github.com/apache/spark/pull/8431#issuecomment-164253461
  
Hi @srowen Yes, it has been discussed in this pull request. The AFINN word 
sentiment file is only 27.4 KB and is used to calculate the Twitter hash tag 
sentiments for the streaming example. At first, the file was downloaded at 
runtime as suggested by @brennonyork but @tdas wanted the file in the 
spark/data directory instead.





[GitHub] spark pull request: [Minor] [DOC] Fix broken word2vec link

2015-12-13 Thread yinxusen
Github user yinxusen commented on the pull request:

https://github.com/apache/spark/pull/10282#issuecomment-164253470
  
LGTM thanks!





[GitHub] spark pull request: [SPARK-12218] [SQL] Fixed the Parquet's filter...

2015-12-13 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10278#issuecomment-164255495
  
**[Test build #47623 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47623/consoleFull)**
 for PR 10278 at commit 
[`50733c6`](https://github.com/apache/spark/commit/50733c6239b721ecb1f0691bb3d4680235c15a18).





[GitHub] spark pull request: [SPARK-9026] [SPARK-4514] Modifications to Job...

2015-12-13 Thread reggert
Github user reggert commented on a diff in the pull request:

https://github.com/apache/spark/pull/9264#discussion_r47446916
  
--- Diff: core/src/test/scala/org/apache/spark/rdd/AsyncRDDActionsSuite.scala ---
@@ -197,4 +197,34 @@ class AsyncRDDActionsSuite extends SparkFunSuite with BeforeAndAfterAll with Tim
   Await.result(f, Duration(20, "milliseconds"))
 }
   }
+
+  private def testAsyncAction[R](action: RDD[Int] => FutureAction[R]): Unit = {
+    val executionContextInvoked = Promise[Unit]
+    val fakeExecutionContext = new ExecutionContext {
+      override def execute(runnable: Runnable): Unit = {
+        executionContextInvoked.success(())
+      }
+      override def reportFailure(t: Throwable): Unit = ()
+    }
+    val starter = Smuggle(new Semaphore(0))
+    starter.drainPermits()
+    val rdd = sc.parallelize(1 to 100, 4).mapPartitions {itr => starter.acquire(1); itr}
+    val f = action(rdd)
+    f.onComplete(_ => ())(fakeExecutionContext)
+    // Here we verify that registering the callback didn't cause a thread to be consumed.
+    assert(!executionContextInvoked.isCompleted)
+    // Now allow the executors to proceed with task processing.
+    starter.release(rdd.partitions.length)
+    // Waiting for the result verifies that the tasks were successfully processed.
+    // This mainly exists to verify that we didn't break task deserialization.
--- End diff --

Oh, is that all? :-)

Comment line removed.





[GitHub] spark pull request: [SPARK-9057] [STREAMING] Twitter example joini...

2015-12-13 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/8431#issuecomment-164255270
  
  [Test build #2214 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2214/consoleFull)
 for   PR 8431 at commit 
[`9998df6`](https://github.com/apache/spark/commit/9998df67b1c15c7cc350ef59c10af8275b947e02).





[GitHub] spark pull request: [SPARK-12213] [SQL] use multiple partitions fo...

2015-12-13 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request:

https://github.com/apache/spark/pull/10228#discussion_r47444204
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/AggregationIterator.scala ---
@@ -165,237 +137,100 @@ abstract class AggregationIterator(
 
   // Initializing functions used to process a row.
   protected val processRow: (MutableRow, InternalRow) => Unit = {
-    val rowToBeProcessed = new JoinedRow
-    val aggregationBufferSchema = allAggregateFunctions.flatMap(_.aggBufferAttributes)
-    aggregationMode match {
-      // Partial-only
-      case (Some(Partial), None) =>
-        val updateExpressions = nonCompleteAggregateFunctions.flatMap {
-          case ae: DeclarativeAggregate => ae.updateExpressions
-          case agg: AggregateFunction => Seq.fill(agg.aggBufferAttributes.length)(NoOp)
-        }
-        val expressionAggUpdateProjection =
-          newMutableProjection(updateExpressions, aggregationBufferSchema ++ valueAttributes)()
-
-        (currentBuffer: MutableRow, row: InternalRow) => {
-          expressionAggUpdateProjection.target(currentBuffer)
-          // Process all expression-based aggregate functions.
-          expressionAggUpdateProjection(rowToBeProcessed(currentBuffer, row))
-          // Process all imperative aggregate functions.
-          var i = 0
-          while (i < nonCompleteImperativeAggregateFunctions.length) {
-            nonCompleteImperativeAggregateFunctions(i).update(currentBuffer, row)
-            i += 1
-          }
-        }
-
-      // PartialMerge-only or Final-only
-      case (Some(PartialMerge), None) | (Some(Final), None) =>
-        val inputAggregationBufferSchema = if (initialInputBufferOffset == 0) {
-          // If initialInputBufferOffset, the input value does not contain
-          // grouping keys.
-          // This part is pretty hacky.
-          allAggregateFunctions.flatMap(_.inputAggBufferAttributes).toSeq
-        } else {
-          groupingKeyAttributes ++ allAggregateFunctions.flatMap(_.inputAggBufferAttributes)
-        }
-        // val inputAggregationBufferSchema =
-        //  groupingKeyAttributes ++
-        //    allAggregateFunctions.flatMap(_.cloneBufferAttributes)
-        val mergeExpressions = nonCompleteAggregateFunctions.flatMap {
-          case ae: DeclarativeAggregate => ae.mergeExpressions
-          case agg: AggregateFunction => Seq.fill(agg.aggBufferAttributes.length)(NoOp)
-        }
-        // This projection is used to merge buffer values for all expression-based aggregates.
-        val expressionAggMergeProjection =
-          newMutableProjection(
-            mergeExpressions,
-            aggregationBufferSchema ++ inputAggregationBufferSchema)()
-
-        (currentBuffer: MutableRow, row: InternalRow) => {
-          // Process all expression-based aggregate functions.
-          expressionAggMergeProjection.target(currentBuffer)(rowToBeProcessed(currentBuffer, row))
-          // Process all imperative aggregate functions.
-          var i = 0
-          while (i < nonCompleteImperativeAggregateFunctions.length) {
-            nonCompleteImperativeAggregateFunctions(i).merge(currentBuffer, row)
-            i += 1
+    val joinedRow = new JoinedRow
+    if (aggregateExpressions.nonEmpty) {
+      val mergeExpressions = aggregateFunctions.zipWithIndex.flatMap {
+        case (ae: DeclarativeAggregate, i) =>
+          aggregateExpressions(i).mode match {
+            case Partial | Complete => ae.updateExpressions
+            case PartialMerge | Final => ae.mergeExpressions
           }
-        }
-
-      // Final-Complete
-      case (Some(Final), Some(Complete)) =>
-        val completeAggregateFunctions: Array[AggregateFunction] =
-          allAggregateFunctions.takeRight(completeAggregateExpressions.length)
-        // All imperative aggregate functions with mode Complete.
-        val completeImperativeAggregateFunctions: Array[ImperativeAggregate] =
-          completeAggregateFunctions.collect { case func: ImperativeAggregate => func }
-
-        // The first initialInputBufferOffset values of the input aggregation buffer is
-        // for grouping expressions and distinct columns.
-        val groupingAttributesAndDistinctColumns = valueAttributes.take(initialInputBufferOffset)
-
-        val completeOffsetExpressions =
-          Seq.fill(completeAggregateFunctions.map(_.aggBufferAttributes.length).sum)(NoOp)
-        // We do not touch buffer values of aggregate functions with the Final mode.
-        val finalOffsetExpressions =

[GitHub] spark pull request: [SPARK-12213] [SQL] use multiple partitions fo...

2015-12-13 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request:

https://github.com/apache/spark/pull/10228#discussion_r47444202
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/AggregationIterator.scala ---
@@ -165,237 +137,100 @@ abstract class AggregationIterator(
 
   // Initializing functions used to process a row.
   protected val processRow: (MutableRow, InternalRow) => Unit = {
-    val rowToBeProcessed = new JoinedRow
-    val aggregationBufferSchema = allAggregateFunctions.flatMap(_.aggBufferAttributes)
-    aggregationMode match {
-      // Partial-only
-      case (Some(Partial), None) =>
-        val updateExpressions = nonCompleteAggregateFunctions.flatMap {
-          case ae: DeclarativeAggregate => ae.updateExpressions
-          case agg: AggregateFunction => Seq.fill(agg.aggBufferAttributes.length)(NoOp)
-        }
-        val expressionAggUpdateProjection =
-          newMutableProjection(updateExpressions, aggregationBufferSchema ++ valueAttributes)()
-
-        (currentBuffer: MutableRow, row: InternalRow) => {
-          expressionAggUpdateProjection.target(currentBuffer)
-          // Process all expression-based aggregate functions.
-          expressionAggUpdateProjection(rowToBeProcessed(currentBuffer, row))
-          // Process all imperative aggregate functions.
-          var i = 0
-          while (i < nonCompleteImperativeAggregateFunctions.length) {
-            nonCompleteImperativeAggregateFunctions(i).update(currentBuffer, row)
-            i += 1
-          }
-        }
-
-      // PartialMerge-only or Final-only
-      case (Some(PartialMerge), None) | (Some(Final), None) =>
-        val inputAggregationBufferSchema = if (initialInputBufferOffset == 0) {
-          // If initialInputBufferOffset, the input value does not contain
-          // grouping keys.
-          // This part is pretty hacky.
-          allAggregateFunctions.flatMap(_.inputAggBufferAttributes).toSeq
-        } else {
-          groupingKeyAttributes ++ allAggregateFunctions.flatMap(_.inputAggBufferAttributes)
-        }
-        // val inputAggregationBufferSchema =
-        //  groupingKeyAttributes ++
-        //    allAggregateFunctions.flatMap(_.cloneBufferAttributes)
-        val mergeExpressions = nonCompleteAggregateFunctions.flatMap {
-          case ae: DeclarativeAggregate => ae.mergeExpressions
-          case agg: AggregateFunction => Seq.fill(agg.aggBufferAttributes.length)(NoOp)
-        }
-        // This projection is used to merge buffer values for all expression-based aggregates.
-        val expressionAggMergeProjection =
-          newMutableProjection(
-            mergeExpressions,
-            aggregationBufferSchema ++ inputAggregationBufferSchema)()
-
-        (currentBuffer: MutableRow, row: InternalRow) => {
-          // Process all expression-based aggregate functions.
-          expressionAggMergeProjection.target(currentBuffer)(rowToBeProcessed(currentBuffer, row))
-          // Process all imperative aggregate functions.
-          var i = 0
-          while (i < nonCompleteImperativeAggregateFunctions.length) {
-            nonCompleteImperativeAggregateFunctions(i).merge(currentBuffer, row)
-            i += 1
+    val joinedRow = new JoinedRow
+    if (aggregateExpressions.nonEmpty) {
+      val mergeExpressions = aggregateFunctions.zipWithIndex.flatMap {
--- End diff --

```zip(aggregateExpressions)``` ?
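
Spelled out as a self-contained toy (the case classes are simplified stand-ins, not Spark's aggregate types): zipping the two sequences pairs each function with its expression directly, instead of carrying an index and looking the expression up by position.

```scala
// Toy illustration of the zipWithIndex -> zip suggestion above.
object ZipVsZipWithIndex {
  case class AggExpr(mode: String)
  case class AggFunc(name: String)

  def main(args: Array[String]): Unit = {
    val aggregateFunctions = Seq(AggFunc("sum"), AggFunc("avg"))
    val aggregateExpressions = Seq(AggExpr("Partial"), AggExpr("Final"))

    // As in the diff: carry an index and look the expression up by position.
    val byIndex = aggregateFunctions.zipWithIndex.map {
      case (f, i) => (f.name, aggregateExpressions(i).mode)
    }
    // As suggested: zip the sequences and pattern-match the pair directly.
    val byZip = aggregateFunctions.zip(aggregateExpressions).map {
      case (f, e) => (f.name, e.mode)
    }
    assert(byIndex == byZip) // same pairs, no positional indexing
  }
}
```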





[GitHub] spark pull request: [SPARK-12309] [ML] Use sqlContext from MLlibTe...

2015-12-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10279#issuecomment-164260918
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-12309] [ML] Use sqlContext from MLlibTe...

2015-12-13 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10279#issuecomment-164260893
  
**[Test build #47624 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47624/consoleFull)**
 for PR 10279 at commit 
[`c696d2b`](https://github.com/apache/spark/commit/c696d2bb14ecbfa44da9ec97cc495eff0728f2ac).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-12309] [ML] Use sqlContext from MLlibTe...

2015-12-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10279#issuecomment-164260919
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/47624/
Test PASSed.





[GitHub] spark pull request: [SPARK-12293][SQL] Support UnsafeRow in LocalT...

2015-12-13 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10283#issuecomment-164272319
  
**[Test build #47628 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47628/consoleFull)**
 for PR 10283 at commit 
[`ca4326a`](https://github.com/apache/spark/commit/ca4326ab0c4d26af3eddcc9550624db91d195e8a).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
   * `case class LocalRelation(output: Seq[Attribute], data: Seq[UnsafeRow] = Nil)`





[GitHub] spark pull request: [SPARK-12293][SQL] Support UnsafeRow in LocalT...

2015-12-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10283#issuecomment-164272329
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/47628/
Test FAILed.





[GitHub] spark pull request: [SPARK-12293][SQL] Support UnsafeRow in LocalT...

2015-12-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10283#issuecomment-164272328
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-9026] [SPARK-4514] Modifications to Job...

2015-12-13 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9264#issuecomment-164276707
  
**[Test build #47629 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47629/consoleFull)**
 for PR 9264 at commit 
[`539ac43`](https://github.com/apache/spark/commit/539ac43c3be54abb61b5a44450cc13cc196113b2).





[GitHub] spark pull request: [SPARK-12310] [SparkR] Add write.json and writ...

2015-12-13 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/10281#discussion_r47449369
  
--- Diff: R/pkg/inst/tests/testthat/test_sparkSQL.R ---
@@ -454,6 +454,8 @@ test_that("insertInto() on a registered table", {
   expect_equal(count(sql(sqlContext, "select * from table1")), 2)
   expect_equal(first(sql(sqlContext, "select * from table1 order by 
age"))$name, "Bob")
   dropTempTable(sqlContext, "table1")
+
+  unlink(parquetPath2)
--- End diff --

Should there be an `unlink(jsonPath2)` as well? If so, could you please add that?





[GitHub] spark pull request: [SPARK-12204][SPARKR] Implement drop method fo...

2015-12-13 Thread felixcheung
Github user felixcheung commented on the pull request:

https://github.com/apache/spark/pull/10201#issuecomment-164294560
  
@shivaram I checked the release notes and the programming/migration guides, and 
I don't see any reference to withColumn for Spark 1.4.0 or 1.4.1. Perhaps the 
behavior change happened before the 1.4.0 release?






[GitHub] spark pull request: [SPARK-8641][SQL] Native Spark Window function...

2015-12-13 Thread yhuai
Github user yhuai commented on the pull request:

https://github.com/apache/spark/pull/9819#issuecomment-164284282
  
@hvanhovell just a quick heads up. I am going to review this PR today. Will 
post a comment once I finish my review.





[GitHub] spark pull request: [SPARK-12213] [SQL] use multiple partitions fo...

2015-12-13 Thread yhuai
Github user yhuai commented on the pull request:

https://github.com/apache/spark/pull/10228#issuecomment-164284205
  
@hvanhovell Yea, it's great to think about how to use a single rule to 
handle aggregation queries with distinct after we have this improvement. The 
logical rewriter rules are probably a good place, because rewriting logical 
plans is easier. If that is the right approach, we can make some changes to our 
physical planner so that it respects the aggregation mode of an agg expression 
in a logical agg operator (right now, our physical planner always ignores the 
mode). Then, when we create the physical plan, we can understand that, for 
example, a logical agg operator is used to merge aggregation buffers.
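
For concreteness, the kind of logical rewrite being discussed can be sketched
in the DataFrame API: a distinct aggregate is equivalent to two plain
aggregations, one to de-duplicate and one to combine. This is only the idea,
not the planner rule itself, and it assumes a DataFrame `df` with columns `k`
and `x`:

```scala
import org.apache.spark.sql.functions.count

// select k, count(distinct x) from t group by k, as two ordinary aggregates:
val rewritten =
  df.select("k", "x")
    .distinct()                  // first aggregate: de-duplicate (k, x) pairs
    .groupBy("k")
    .agg(count("x").as("cnt"))   // second aggregate: count survivors per group
```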





[GitHub] spark pull request: [SPARK-9026] [SPARK-4514] Modifications to Job...

2015-12-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9264#issuecomment-164285868
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/47629/
Test PASSed.





[GitHub] spark pull request: [SPARK-8641][SQL] Native Spark Window function...

2015-12-13 Thread yhuai
Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/9819#discussion_r47449872
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/windowExpressions.scala
 ---
@@ -246,85 +260,238 @@ object SpecifiedWindowFrame {
   }
 }
 
+case class UnresolvedWindowExpression(
+child: Expression,
+windowSpec: WindowSpecReference) extends UnaryExpression with 
Unevaluable {
+
+  override def dataType: DataType = throw new UnresolvedException(this, 
"dataType")
+  override def foldable: Boolean = throw new UnresolvedException(this, 
"foldable")
+  override def nullable: Boolean = throw new UnresolvedException(this, 
"nullable")
+  override lazy val resolved = false
+}
+
+case class WindowExpression(
+windowFunction: Expression,
+windowSpec: WindowSpecDefinition) extends Expression with Unevaluable {
+
+  override def children: Seq[Expression] = windowFunction :: windowSpec :: 
Nil
+
+  override def dataType: DataType = windowFunction.dataType
+  override def foldable: Boolean = windowFunction.foldable
+  override def nullable: Boolean = windowFunction.nullable
+
+  override def toString: String = s"$windowFunction $windowSpec"
+}
+
 /**
- * Every window function needs to maintain a output buffer for its output.
- * It should expect that for a n-row window frame, it will be called n 
times
- * to retrieve value corresponding with these n rows.
+ * A window function is a function that can only be evaluated in the 
context of a window operator.
  */
 trait WindowFunction extends Expression {
-  def init(): Unit
+  /** Frame in which the window operator must be executed. */
+  def frame: WindowFrame = UnspecifiedFrame
+}
 
-  def reset(): Unit
+/**
+ * An offset window function is a window function that returns the value 
of the input column offset
+ * by a number of rows within the partition. For instance: an 
OffsetWindowfunction for value x with
+ * offset -2, will get the value of x 2 rows back in the partition.
+ */
+abstract class OffsetWindowFunction
+  extends Expression with WindowFunction with Unevaluable with 
ImplicitCastInputTypes {
+  val input: Expression
+  val default: Expression
+  val offset: Expression
+  val offsetSign: Int
--- End diff --

Add Scaladoc for `default`, `offset`, and `offsetSign`?
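
One possible shape for that documentation, sketched as a standalone stub (the
wording is illustrative, and `Expression` below is a stand-in for the Catalyst
type so the snippet compiles on its own; the meaning of `offsetSign` is
inferred from how `frame` multiplies it into the boundary):

```scala
trait Expression // stand-in for org.apache.spark.sql.catalyst.expressions.Expression

abstract class OffsetWindowFunctionSketch {
  /** The expression whose value is returned from the offset row. */
  val input: Expression
  /** The value to return when the offset row does not exist in the partition. */
  val default: Expression
  /** A foldable integer giving the number of rows to jump within the partition. */
  val offset: Expression
  /** The direction of the jump: +1 for LEAD (rows ahead), -1 for LAG (rows back). */
  val offsetSign: Int
}
```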





[GitHub] spark pull request: [SPARK-8641][SQL] Native Spark Window function...

2015-12-13 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request:

https://github.com/apache/spark/pull/9819#discussion_r47450253
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/windowExpressions.scala
 ---
@@ -246,85 +260,238 @@ object SpecifiedWindowFrame {
   }
 }
 
+case class UnresolvedWindowExpression(
+child: Expression,
+windowSpec: WindowSpecReference) extends UnaryExpression with 
Unevaluable {
+
+  override def dataType: DataType = throw new UnresolvedException(this, 
"dataType")
+  override def foldable: Boolean = throw new UnresolvedException(this, 
"foldable")
+  override def nullable: Boolean = throw new UnresolvedException(this, 
"nullable")
+  override lazy val resolved = false
+}
+
+case class WindowExpression(
+windowFunction: Expression,
+windowSpec: WindowSpecDefinition) extends Expression with Unevaluable {
+
+  override def children: Seq[Expression] = windowFunction :: windowSpec :: 
Nil
+
+  override def dataType: DataType = windowFunction.dataType
+  override def foldable: Boolean = windowFunction.foldable
+  override def nullable: Boolean = windowFunction.nullable
+
+  override def toString: String = s"$windowFunction $windowSpec"
+}
+
 /**
- * Every window function needs to maintain a output buffer for its output.
- * It should expect that for a n-row window frame, it will be called n 
times
- * to retrieve value corresponding with these n rows.
+ * A window function is a function that can only be evaluated in the 
context of a window operator.
  */
 trait WindowFunction extends Expression {
-  def init(): Unit
+  /** Frame in which the window operator must be executed. */
+  def frame: WindowFrame = UnspecifiedFrame
+}
 
-  def reset(): Unit
+/**
+ * An offset window function is a window function that returns the value 
of the input column offset
+ * by a number of rows within the partition. For instance: an 
OffsetWindowfunction for value x with
+ * offset -2, will get the value of x 2 rows back in the partition.
+ */
+abstract class OffsetWindowFunction
+  extends Expression with WindowFunction with Unevaluable with 
ImplicitCastInputTypes {
+  val input: Expression
+  val default: Expression
+  val offset: Expression
+  val offsetSign: Int
+
+  override def children: Seq[Expression] = Seq(input, offset, default)
 
-  def prepareInputParameters(input: InternalRow): AnyRef
+  override def foldable: Boolean = input.foldable && (default == null || 
default.foldable)
 
-  def update(input: AnyRef): Unit
+  override def nullable: Boolean = input.nullable && (default == null || 
default.nullable)
 
-  def batchUpdate(inputs: Array[AnyRef]): Unit
+  override lazy val frame = {
+// This will be triggered by the Analyzer.
+val offsetValue = offset.eval() match {
+  case o: Int => o
+  case x => throw new AnalysisException(
+s"Offset expression must be a foldable integer expression: $x")
+}
+val boundary = ValueFollowing(offsetSign * offsetValue)
+SpecifiedWindowFrame(RowFrame, boundary, boundary)
+  }
 
-  def evaluate(): Unit
+  override def dataType: DataType = input.dataType
 
-  def get(index: Int): Any
+  override def inputTypes: Seq[AbstractDataType] =
+Seq(AnyDataType, IntegerType, TypeCollection(input.dataType, NullType))
 
-  def newInstance(): WindowFunction
+  override def toString: String = s"$prettyName($input, $offset, $default)"
 }
 
-case class UnresolvedWindowFunction(
-name: String,
-children: Seq[Expression])
-  extends Expression with WindowFunction with Unevaluable {
+case class Lead(input: Expression, offset: Expression, default: Expression)
+extends OffsetWindowFunction {
 
-  override def dataType: DataType = throw new UnresolvedException(this, 
"dataType")
-  override def foldable: Boolean = throw new UnresolvedException(this, 
"foldable")
-  override def nullable: Boolean = throw new UnresolvedException(this, 
"nullable")
-  override lazy val resolved = false
+  def this(input: Expression, offset: Expression) = this(input, offset, 
Literal(null))
 
-  override def init(): Unit = throw new UnresolvedException(this, "init")
-  override def reset(): Unit = throw new UnresolvedException(this, "reset")
-  override def prepareInputParameters(input: InternalRow): AnyRef =
-throw new UnresolvedException(this, "prepareInputParameters")
-  override def update(input: AnyRef): Unit = throw new 
UnresolvedException(this, "update")
-  override def batchUpdate(inputs: Array[AnyRef]): Unit =
-throw new UnresolvedException(this, "batchUpdate")
-  

[GitHub] spark pull request: [SPARK-8641][SQL] Native Spark Window function...

2015-12-13 Thread yhuai
Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/9819#discussion_r47450842
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/windowExpressions.scala
 ---
@@ -246,85 +260,238 @@ object SpecifiedWindowFrame {
   }
 }
 
+case class UnresolvedWindowExpression(
+child: Expression,
+windowSpec: WindowSpecReference) extends UnaryExpression with 
Unevaluable {
+
+  override def dataType: DataType = throw new UnresolvedException(this, 
"dataType")
+  override def foldable: Boolean = throw new UnresolvedException(this, 
"foldable")
+  override def nullable: Boolean = throw new UnresolvedException(this, 
"nullable")
+  override lazy val resolved = false
+}
+
+case class WindowExpression(
+windowFunction: Expression,
+windowSpec: WindowSpecDefinition) extends Expression with Unevaluable {
+
+  override def children: Seq[Expression] = windowFunction :: windowSpec :: 
Nil
+
+  override def dataType: DataType = windowFunction.dataType
+  override def foldable: Boolean = windowFunction.foldable
+  override def nullable: Boolean = windowFunction.nullable
+
+  override def toString: String = s"$windowFunction $windowSpec"
+}
+
 /**
- * Every window function needs to maintain a output buffer for its output.
- * It should expect that for a n-row window frame, it will be called n 
times
- * to retrieve value corresponding with these n rows.
+ * A window function is a function that can only be evaluated in the 
context of a window operator.
  */
 trait WindowFunction extends Expression {
-  def init(): Unit
+  /** Frame in which the window operator must be executed. */
+  def frame: WindowFrame = UnspecifiedFrame
+}
 
-  def reset(): Unit
+/**
+ * An offset window function is a window function that returns the value 
of the input column offset
+ * by a number of rows within the partition. For instance: an 
OffsetWindowfunction for value x with
+ * offset -2, will get the value of x 2 rows back in the partition.
+ */
+abstract class OffsetWindowFunction
+  extends Expression with WindowFunction with Unevaluable with 
ImplicitCastInputTypes {
+  val input: Expression
+  val default: Expression
+  val offset: Expression
+  val offsetSign: Int
+
+  override def children: Seq[Expression] = Seq(input, offset, default)
 
-  def prepareInputParameters(input: InternalRow): AnyRef
+  override def foldable: Boolean = input.foldable && (default == null || 
default.foldable)
--- End diff --

I guess the results still depend on the number of rows of that partition. I 
tried a few queries in postgres
```
yhuai=# select lead(1, 2) over();
 lead 
--
 
(1 row)

yhuai=# select lead(1, 2) over() from (select 100 as k union all select 99 
as k) tmp;
 lead 
--
 
 
(2 rows)

yhuai=# select lead(1, 2) over() from (select 100 as k union all select 99 
as k union all select 98 as k) tmp;
 lead 
--
1
 
 
(3 rows)
```
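
The same boundary behavior is easy to reproduce on the Spark side. A sketch
using the DataFrame API (assumes a `sqlContext` is in scope; the point is that
`lead` over a constant still returns NULL for the last rows of the partition,
so the result depends on partition size):

```scala
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{lead, lit}

// lead(1, 2) over an ordered window of three rows:
val df = sqlContext.range(3).toDF("k")
df.select(lead(lit(1), 2).over(Window.orderBy("k"))).show()
// Only the first row has an offset row two ahead, so it gets 1;
// the remaining two rows are null -- matching the postgres output above.
```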





[GitHub] spark pull request: [SPARK-9026] [SPARK-4514] Modifications to Job...

2015-12-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9264#issuecomment-164285867
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-9026] [SPARK-4514] Modifications to Job...

2015-12-13 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9264#issuecomment-164285830
  
**[Test build #47629 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47629/consoleFull)**
 for PR 9264 at commit 
[`539ac43`](https://github.com/apache/spark/commit/539ac43c3be54abb61b5a44450cc13cc196113b2).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
   * `trait JobSubmitter`
   * `class ComplexFutureAction[T](run : JobSubmitter => Future[T])`





[GitHub] spark pull request: [SPARK-12310] [SparkR] Add write.json and writ...

2015-12-13 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/10281#discussion_r47449353
  
--- Diff: R/pkg/inst/tests/testthat/test_sparkSQL.R ---
@@ -371,13 +371,13 @@ test_that("Collect DataFrame with complex types", {
   expect_equal(bob$height, 176.5)
 })
 
-test_that("read.json()/jsonFile() on a local file returns a DataFrame", {
+test_that("read.json()/jsonFile() and write.json()", {
   df <- read.json(sqlContext, jsonPath)
   expect_is(df, "DataFrame")
   expect_equal(count(df), 3)
   # read.json()/jsonFile() works with multiple input paths
   jsonPath2 <- tempfile(pattern="jsonPath2", fileext=".json")
-  write.df(df, jsonPath2, "json", mode="overwrite")
+  write.json(df, jsonPath2)
--- End diff --

I think it actually makes sense to have a test for `write.df` with an 
additional option (in this case `mode="overwrite"`) - is there a reason for 
removing this?
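
For context, these R methods are thin wrappers over the JVM-side
`DataFrameWriter`, so the two test variants exercise different Scala paths;
roughly (assuming a DataFrame `df`, paths illustrative):

```scala
// write.json(df, path) delegates to the format-specific shortcut:
df.write.json("/tmp/jsonPath2")

// write.df(df, path, "json", mode = "overwrite") goes through the generic
// entry point and is the only variant that exercises the mode option:
df.write.format("json").mode("overwrite").save("/tmp/jsonPath2")
```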





[GitHub] spark pull request: [SPARK-12310] [SparkR] Add write.json and writ...

2015-12-13 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/10281#discussion_r47449336
  
--- Diff: R/pkg/R/DataFrame.R ---
@@ -614,12 +641,25 @@ setMethod("toJSON",
 #' sqlContext <- sparkRSQL.init(sc)
 #' path <- "path/to/file.json"
 #' df <- read.json(sqlContext, path)
-#' saveAsParquetFile(df, "/tmp/sparkr-tmp/")
+#' write.parquet(df, "/tmp/sparkr-tmp1/")
+#' saveAsParquetFile(df, "/tmp/sparkr-tmp2/")
 #'}
+setMethod("write.parquet",
+  signature(x = "DataFrame", path = "character"),
+  function(x, path) {
+write <- callJMethod(x@sdf, "write")
+invisible(callJMethod(write, "parquet", path))
+  })
+
+#' @family DataFrame functions
--- End diff --

putting @family here would create multiple "See Also" sections in the 
resulting html page; we only need one for all entries with the same @rdname





[GitHub] spark pull request: [SPARK-8641][SQL] Native Spark Window function...

2015-12-13 Thread yhuai
Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/9819#discussion_r47449863
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/windowExpressions.scala
 ---
@@ -246,85 +260,238 @@ object SpecifiedWindowFrame {
   }
 }
 
+case class UnresolvedWindowExpression(
+child: Expression,
+windowSpec: WindowSpecReference) extends UnaryExpression with 
Unevaluable {
+
+  override def dataType: DataType = throw new UnresolvedException(this, 
"dataType")
+  override def foldable: Boolean = throw new UnresolvedException(this, 
"foldable")
+  override def nullable: Boolean = throw new UnresolvedException(this, 
"nullable")
+  override lazy val resolved = false
+}
+
+case class WindowExpression(
+windowFunction: Expression,
+windowSpec: WindowSpecDefinition) extends Expression with Unevaluable {
+
+  override def children: Seq[Expression] = windowFunction :: windowSpec :: 
Nil
+
+  override def dataType: DataType = windowFunction.dataType
+  override def foldable: Boolean = windowFunction.foldable
+  override def nullable: Boolean = windowFunction.nullable
+
+  override def toString: String = s"$windowFunction $windowSpec"
+}
+
 /**
- * Every window function needs to maintain a output buffer for its output.
- * It should expect that for a n-row window frame, it will be called n 
times
- * to retrieve value corresponding with these n rows.
+ * A window function is a function that can only be evaluated in the 
context of a window operator.
  */
 trait WindowFunction extends Expression {
-  def init(): Unit
+  /** Frame in which the window operator must be executed. */
+  def frame: WindowFrame = UnspecifiedFrame
+}
 
-  def reset(): Unit
+/**
+ * An offset window function is a window function that returns the value 
of the input column offset
+ * by a number of rows within the partition. For instance: an 
OffsetWindowfunction for value x with
+ * offset -2, will get the value of x 2 rows back in the partition.
+ */
+abstract class OffsetWindowFunction
+  extends Expression with WindowFunction with Unevaluable with 
ImplicitCastInputTypes {
+  val input: Expression
+  val default: Expression
+  val offset: Expression
+  val offsetSign: Int
+
+  override def children: Seq[Expression] = Seq(input, offset, default)
 
-  def prepareInputParameters(input: InternalRow): AnyRef
+  override def foldable: Boolean = input.foldable && (default == null || 
default.foldable)
 
-  def update(input: AnyRef): Unit
+  override def nullable: Boolean = input.nullable && (default == null || 
default.nullable)
 
-  def batchUpdate(inputs: Array[AnyRef]): Unit
+  override lazy val frame = {
+// This will be triggered by the Analyzer.
+val offsetValue = offset.eval() match {
+  case o: Int => o
+  case x => throw new AnalysisException(
+s"Offset expression must be a foldable integer expression: $x")
+}
+val boundary = ValueFollowing(offsetSign * offsetValue)
+SpecifiedWindowFrame(RowFrame, boundary, boundary)
+  }
 
-  def evaluate(): Unit
+  override def dataType: DataType = input.dataType
 
-  def get(index: Int): Any
+  override def inputTypes: Seq[AbstractDataType] =
+Seq(AnyDataType, IntegerType, TypeCollection(input.dataType, NullType))
 
-  def newInstance(): WindowFunction
+  override def toString: String = s"$prettyName($input, $offset, $default)"
 }
 
-case class UnresolvedWindowFunction(
-name: String,
-children: Seq[Expression])
-  extends Expression with WindowFunction with Unevaluable {
+case class Lead(input: Expression, offset: Expression, default: Expression)
+extends OffsetWindowFunction {
 
-  override def dataType: DataType = throw new UnresolvedException(this, 
"dataType")
-  override def foldable: Boolean = throw new UnresolvedException(this, 
"foldable")
-  override def nullable: Boolean = throw new UnresolvedException(this, 
"nullable")
-  override lazy val resolved = false
+  def this(input: Expression, offset: Expression) = this(input, offset, 
Literal(null))
 
-  override def init(): Unit = throw new UnresolvedException(this, "init")
-  override def reset(): Unit = throw new UnresolvedException(this, "reset")
-  override def prepareInputParameters(input: InternalRow): AnyRef =
-throw new UnresolvedException(this, "prepareInputParameters")
-  override def update(input: AnyRef): Unit = throw new 
UnresolvedException(this, "update")
-  override def batchUpdate(inputs: Array[AnyRef]): Unit =
-throw new UnresolvedException(this, "batchUpdate")
-  override 

[GitHub] spark pull request: [SPARK-8641][SQL] Native Spark Window function...

2015-12-13 Thread yhuai
Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/9819#discussion_r47449859
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/windowExpressions.scala
 ---
@@ -246,85 +260,238 @@ object SpecifiedWindowFrame {
   }
 }
 
+case class UnresolvedWindowExpression(
+child: Expression,
+windowSpec: WindowSpecReference) extends UnaryExpression with 
Unevaluable {
+
+  override def dataType: DataType = throw new UnresolvedException(this, 
"dataType")
+  override def foldable: Boolean = throw new UnresolvedException(this, 
"foldable")
+  override def nullable: Boolean = throw new UnresolvedException(this, 
"nullable")
+  override lazy val resolved = false
+}
+
+case class WindowExpression(
+windowFunction: Expression,
+windowSpec: WindowSpecDefinition) extends Expression with Unevaluable {
+
+  override def children: Seq[Expression] = windowFunction :: windowSpec :: 
Nil
+
+  override def dataType: DataType = windowFunction.dataType
+  override def foldable: Boolean = windowFunction.foldable
+  override def nullable: Boolean = windowFunction.nullable
+
+  override def toString: String = s"$windowFunction $windowSpec"
+}
+
 /**
- * Every window function needs to maintain a output buffer for its output.
- * It should expect that for a n-row window frame, it will be called n 
times
- * to retrieve value corresponding with these n rows.
+ * A window function is a function that can only be evaluated in the 
context of a window operator.
  */
 trait WindowFunction extends Expression {
-  def init(): Unit
+  /** Frame in which the window operator must be executed. */
+  def frame: WindowFrame = UnspecifiedFrame
+}
 
-  def reset(): Unit
+/**
+ * An offset window function is a window function that returns the value 
of the input column offset
+ * by a number of rows within the partition. For instance: an 
OffsetWindowfunction for value x with
+ * offset -2, will get the value of x 2 rows back in the partition.
+ */
+abstract class OffsetWindowFunction
+  extends Expression with WindowFunction with Unevaluable with 
ImplicitCastInputTypes {
+  val input: Expression
+  val default: Expression
+  val offset: Expression
+  val offsetSign: Int
+
+  override def children: Seq[Expression] = Seq(input, offset, default)
 
-  def prepareInputParameters(input: InternalRow): AnyRef
+  override def foldable: Boolean = input.foldable && (default == null || 
default.foldable)
 
-  def update(input: AnyRef): Unit
+  override def nullable: Boolean = input.nullable && (default == null || 
default.nullable)
 
-  def batchUpdate(inputs: Array[AnyRef]): Unit
+  override lazy val frame = {
+// This will be triggered by the Analyzer.
+val offsetValue = offset.eval() match {
+  case o: Int => o
+  case x => throw new AnalysisException(
+s"Offset expression must be a foldable integer expression: $x")
+}
+val boundary = ValueFollowing(offsetSign * offsetValue)
+SpecifiedWindowFrame(RowFrame, boundary, boundary)
+  }
 
-  def evaluate(): Unit
+  override def dataType: DataType = input.dataType
 
-  def get(index: Int): Any
+  override def inputTypes: Seq[AbstractDataType] =
+Seq(AnyDataType, IntegerType, TypeCollection(input.dataType, NullType))
 
-  def newInstance(): WindowFunction
+  override def toString: String = s"$prettyName($input, $offset, $default)"
 }
 
-case class UnresolvedWindowFunction(
-name: String,
-children: Seq[Expression])
-  extends Expression with WindowFunction with Unevaluable {
+case class Lead(input: Expression, offset: Expression, default: Expression)
+extends OffsetWindowFunction {
 
-  override def dataType: DataType = throw new UnresolvedException(this, 
"dataType")
-  override def foldable: Boolean = throw new UnresolvedException(this, 
"foldable")
-  override def nullable: Boolean = throw new UnresolvedException(this, 
"nullable")
-  override lazy val resolved = false
+  def this(input: Expression, offset: Expression) = this(input, offset, 
Literal(null))
 
-  override def init(): Unit = throw new UnresolvedException(this, "init")
-  override def reset(): Unit = throw new UnresolvedException(this, "reset")
-  override def prepareInputParameters(input: InternalRow): AnyRef =
-throw new UnresolvedException(this, "prepareInputParameters")
-  override def update(input: AnyRef): Unit = throw new 
UnresolvedException(this, "update")
-  override def batchUpdate(inputs: Array[AnyRef]): Unit =
-throw new UnresolvedException(this, "batchUpdate")
-  override 

[GitHub] spark pull request: [SPARK-8641][SQL] Native Spark Window function...

2015-12-13 Thread yhuai
Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/9819#discussion_r47449589
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/windowExpressions.scala
 ---
@@ -246,85 +260,238 @@ object SpecifiedWindowFrame {
   }
 }
 
+case class UnresolvedWindowExpression(
+child: Expression,
+windowSpec: WindowSpecReference) extends UnaryExpression with 
Unevaluable {
+
+  override def dataType: DataType = throw new UnresolvedException(this, 
"dataType")
+  override def foldable: Boolean = throw new UnresolvedException(this, 
"foldable")
+  override def nullable: Boolean = throw new UnresolvedException(this, 
"nullable")
+  override lazy val resolved = false
+}
+
+case class WindowExpression(
+windowFunction: Expression,
+windowSpec: WindowSpecDefinition) extends Expression with Unevaluable {
+
+  override def children: Seq[Expression] = windowFunction :: windowSpec :: 
Nil
+
+  override def dataType: DataType = windowFunction.dataType
+  override def foldable: Boolean = windowFunction.foldable
+  override def nullable: Boolean = windowFunction.nullable
+
+  override def toString: String = s"$windowFunction $windowSpec"
+}
+
 /**
- * Every window function needs to maintain a output buffer for its output.
- * It should expect that for a n-row window frame, it will be called n 
times
- * to retrieve value corresponding with these n rows.
+ * A window function is a function that can only be evaluated in the 
context of a window operator.
  */
 trait WindowFunction extends Expression {
-  def init(): Unit
+  /** Frame in which the window operator must be executed. */
+  def frame: WindowFrame = UnspecifiedFrame
+}
 
-  def reset(): Unit
+/**
+ * An offset window function is a window function that returns the value 
of the input column offset
+ * by a number of rows within the partition. For instance: an 
OffsetWindowfunction for value x with
+ * offset -2, will get the value of x 2 rows back in the partition.
+ */
+abstract class OffsetWindowFunction
+  extends Expression with WindowFunction with Unevaluable with 
ImplicitCastInputTypes {
+  val input: Expression
+  val default: Expression
+  val offset: Expression
+  val offsetSign: Int
+
+  override def children: Seq[Expression] = Seq(input, offset, default)
 
-  def prepareInputParameters(input: InternalRow): AnyRef
+  override def foldable: Boolean = input.foldable && (default == null || 
default.foldable)
--- End diff --

Do we expect an `OffsetWindowFunction` to be foldable?





[GitHub] spark pull request: [SPARK-8641][SQL] Native Spark Window function...

2015-12-13 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request:

https://github.com/apache/spark/pull/9819#discussion_r47450230
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/windowExpressions.scala
 ---
@@ -246,85 +260,238 @@ object SpecifiedWindowFrame {
   }
 }
 
+case class UnresolvedWindowExpression(
+child: Expression,
+windowSpec: WindowSpecReference) extends UnaryExpression with 
Unevaluable {
+
+  override def dataType: DataType = throw new UnresolvedException(this, 
"dataType")
+  override def foldable: Boolean = throw new UnresolvedException(this, 
"foldable")
+  override def nullable: Boolean = throw new UnresolvedException(this, 
"nullable")
+  override lazy val resolved = false
+}
+
+case class WindowExpression(
+windowFunction: Expression,
+windowSpec: WindowSpecDefinition) extends Expression with Unevaluable {
+
+  override def children: Seq[Expression] = windowFunction :: windowSpec :: 
Nil
+
+  override def dataType: DataType = windowFunction.dataType
+  override def foldable: Boolean = windowFunction.foldable
+  override def nullable: Boolean = windowFunction.nullable
+
+  override def toString: String = s"$windowFunction $windowSpec"
+}
+
 /**
- * Every window function needs to maintain a output buffer for its output.
- * It should expect that for a n-row window frame, it will be called n 
times
- * to retrieve value corresponding with these n rows.
+ * A window function is a function that can only be evaluated in the 
context of a window operator.
  */
 trait WindowFunction extends Expression {
-  def init(): Unit
+  /** Frame in which the window operator must be executed. */
+  def frame: WindowFrame = UnspecifiedFrame
+}
 
-  def reset(): Unit
+/**
+ * An offset window function is a window function that returns the value 
of the input column offset
+ * by a number of rows within the partition. For instance: an 
OffsetWindowfunction for value x with
+ * offset -2, will get the value of x 2 rows back in the partition.
+ */
+abstract class OffsetWindowFunction
+  extends Expression with WindowFunction with Unevaluable with 
ImplicitCastInputTypes {
+  val input: Expression
+  val default: Expression
+  val offset: Expression
+  val offsetSign: Int
+
+  override def children: Seq[Expression] = Seq(input, offset, default)
 
-  def prepareInputParameters(input: InternalRow): AnyRef
+  override def foldable: Boolean = input.foldable && (default == null || 
default.foldable)
--- End diff --

It can be foldable, for instance: ```LEAD(1, 2) OVER (PARTITION BY ... 
ORDER BY ...)``` should always return ```1```.

However, there is currently no Optimizer rule to eliminate this. Should I 
add one?





[GitHub] spark pull request: [SPARK-5682][Core] Add encrypted shuffle in sp...

2015-12-13 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/8880#issuecomment-164318656
  
**[Test build #47632 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47632/consoleFull)**
 for PR 8880 at commit 
[`b912465`](https://github.com/apache/spark/commit/b912465481623fb50068b8f29622991616233cf4).





[GitHub] spark pull request: [SPARK-8641][SQL] Native Spark Window function...

2015-12-13 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request:

https://github.com/apache/spark/pull/9819#discussion_r47451483
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/windowExpressions.scala
 ---
@@ -246,85 +260,238 @@ object SpecifiedWindowFrame {
   }
 }
 
+case class UnresolvedWindowExpression(
+child: Expression,
+windowSpec: WindowSpecReference) extends UnaryExpression with 
Unevaluable {
+
+  override def dataType: DataType = throw new UnresolvedException(this, 
"dataType")
+  override def foldable: Boolean = throw new UnresolvedException(this, 
"foldable")
+  override def nullable: Boolean = throw new UnresolvedException(this, 
"nullable")
+  override lazy val resolved = false
+}
+
+case class WindowExpression(
+windowFunction: Expression,
+windowSpec: WindowSpecDefinition) extends Expression with Unevaluable {
+
+  override def children: Seq[Expression] = windowFunction :: windowSpec :: 
Nil
+
+  override def dataType: DataType = windowFunction.dataType
+  override def foldable: Boolean = windowFunction.foldable
+  override def nullable: Boolean = windowFunction.nullable
+
+  override def toString: String = s"$windowFunction $windowSpec"
+}
+
 /**
- * Every window function needs to maintain a output buffer for its output.
- * It should expect that for a n-row window frame, it will be called n 
times
- * to retrieve value corresponding with these n rows.
+ * A window function is a function that can only be evaluated in the 
context of a window operator.
  */
 trait WindowFunction extends Expression {
-  def init(): Unit
+  /** Frame in which the window operator must be executed. */
+  def frame: WindowFrame = UnspecifiedFrame
+}
 
-  def reset(): Unit
+/**
+ * An offset window function is a window function that returns the value 
of the input column offset
+ * by a number of rows within the partition. For instance: an 
OffsetWindowfunction for value x with
+ * offset -2, will get the value of x 2 rows back in the partition.
+ */
+abstract class OffsetWindowFunction
+  extends Expression with WindowFunction with Unevaluable with 
ImplicitCastInputTypes {
+  val input: Expression
+  val default: Expression
+  val offset: Expression
+  val offsetSign: Int
+
+  override def children: Seq[Expression] = Seq(input, offset, default)
 
-  def prepareInputParameters(input: InternalRow): AnyRef
+  override def foldable: Boolean = input.foldable && (default == null || 
default.foldable)
--- End diff --

Ahhh... Yes, you have a point. Without a foldable default value it is 
definitely not foldable. Let's make it non-foldable for now...





[GitHub] spark pull request: [SPARK-8641][SQL] Native Spark Window function...

2015-12-13 Thread yhuai
Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/9819#discussion_r47452110
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/windowExpressions.scala
 ---
@@ -246,85 +260,238 @@ object SpecifiedWindowFrame {
   }
 }
 
+case class UnresolvedWindowExpression(
+child: Expression,
+windowSpec: WindowSpecReference) extends UnaryExpression with 
Unevaluable {
+
+  override def dataType: DataType = throw new UnresolvedException(this, 
"dataType")
+  override def foldable: Boolean = throw new UnresolvedException(this, 
"foldable")
+  override def nullable: Boolean = throw new UnresolvedException(this, 
"nullable")
+  override lazy val resolved = false
+}
+
+case class WindowExpression(
+windowFunction: Expression,
+windowSpec: WindowSpecDefinition) extends Expression with Unevaluable {
+
+  override def children: Seq[Expression] = windowFunction :: windowSpec :: 
Nil
+
+  override def dataType: DataType = windowFunction.dataType
+  override def foldable: Boolean = windowFunction.foldable
+  override def nullable: Boolean = windowFunction.nullable
+
+  override def toString: String = s"$windowFunction $windowSpec"
+}
+
 /**
- * Every window function needs to maintain a output buffer for its output.
- * It should expect that for a n-row window frame, it will be called n 
times
- * to retrieve value corresponding with these n rows.
+ * A window function is a function that can only be evaluated in the 
context of a window operator.
  */
 trait WindowFunction extends Expression {
-  def init(): Unit
+  /** Frame in which the window operator must be executed. */
+  def frame: WindowFrame = UnspecifiedFrame
+}
 
-  def reset(): Unit
+/**
+ * An offset window function is a window function that returns the value 
of the input column offset
+ * by a number of rows within the partition. For instance: an 
OffsetWindowfunction for value x with
+ * offset -2, will get the value of x 2 rows back in the partition.
+ */
+abstract class OffsetWindowFunction
+  extends Expression with WindowFunction with Unevaluable with 
ImplicitCastInputTypes {
+  val input: Expression
+  val default: Expression
+  val offset: Expression
+  val offsetSign: Int
+
+  override def children: Seq[Expression] = Seq(input, offset, default)
 
-  def prepareInputParameters(input: InternalRow): AnyRef
+  override def foldable: Boolean = input.foldable && (default == null || 
default.foldable)
 
-  def update(input: AnyRef): Unit
+  override def nullable: Boolean = input.nullable && (default == null || 
default.nullable)
 
-  def batchUpdate(inputs: Array[AnyRef]): Unit
+  override lazy val frame = {
+// This will be triggered by the Analyzer.
+val offsetValue = offset.eval() match {
+  case o: Int => o
+  case x => throw new AnalysisException(
+s"Offset expression must be a foldable integer expression: $x")
+}
+val boundary = ValueFollowing(offsetSign * offsetValue)
+SpecifiedWindowFrame(RowFrame, boundary, boundary)
+  }
 
-  def evaluate(): Unit
+  override def dataType: DataType = input.dataType
 
-  def get(index: Int): Any
+  override def inputTypes: Seq[AbstractDataType] =
+Seq(AnyDataType, IntegerType, TypeCollection(input.dataType, NullType))
 
-  def newInstance(): WindowFunction
+  override def toString: String = s"$prettyName($input, $offset, $default)"
 }
 
-case class UnresolvedWindowFunction(
-name: String,
-children: Seq[Expression])
-  extends Expression with WindowFunction with Unevaluable {
+case class Lead(input: Expression, offset: Expression, default: Expression)
+extends OffsetWindowFunction {
 
-  override def dataType: DataType = throw new UnresolvedException(this, 
"dataType")
-  override def foldable: Boolean = throw new UnresolvedException(this, 
"foldable")
-  override def nullable: Boolean = throw new UnresolvedException(this, 
"nullable")
-  override lazy val resolved = false
+  def this(input: Expression, offset: Expression) = this(input, offset, 
Literal(null))
 
-  override def init(): Unit = throw new UnresolvedException(this, "init")
-  override def reset(): Unit = throw new UnresolvedException(this, "reset")
-  override def prepareInputParameters(input: InternalRow): AnyRef =
-throw new UnresolvedException(this, "prepareInputParameters")
-  override def update(input: AnyRef): Unit = throw new 
UnresolvedException(this, "update")
-  override def batchUpdate(inputs: Array[AnyRef]): Unit =
-throw new UnresolvedException(this, "batchUpdate")
-  override 

[GitHub] spark pull request: [SPARK-8641][SQL] Native Spark Window function...

2015-12-13 Thread yhuai
Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/9819#discussion_r47452210
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/windowExpressions.scala
 ---
@@ -246,85 +260,238 @@ object SpecifiedWindowFrame {
   }
 }
 
+case class UnresolvedWindowExpression(
+child: Expression,
+windowSpec: WindowSpecReference) extends UnaryExpression with 
Unevaluable {
+
+  override def dataType: DataType = throw new UnresolvedException(this, 
"dataType")
+  override def foldable: Boolean = throw new UnresolvedException(this, 
"foldable")
+  override def nullable: Boolean = throw new UnresolvedException(this, 
"nullable")
+  override lazy val resolved = false
+}
+
+case class WindowExpression(
+windowFunction: Expression,
+windowSpec: WindowSpecDefinition) extends Expression with Unevaluable {
+
+  override def children: Seq[Expression] = windowFunction :: windowSpec :: 
Nil
+
+  override def dataType: DataType = windowFunction.dataType
+  override def foldable: Boolean = windowFunction.foldable
+  override def nullable: Boolean = windowFunction.nullable
+
+  override def toString: String = s"$windowFunction $windowSpec"
+}
+
 /**
- * Every window function needs to maintain a output buffer for its output.
- * It should expect that for a n-row window frame, it will be called n 
times
- * to retrieve value corresponding with these n rows.
+ * A window function is a function that can only be evaluated in the 
context of a window operator.
  */
 trait WindowFunction extends Expression {
-  def init(): Unit
+  /** Frame in which the window operator must be executed. */
+  def frame: WindowFrame = UnspecifiedFrame
+}
 
-  def reset(): Unit
+/**
+ * An offset window function is a window function that returns the value 
of the input column offset
+ * by a number of rows within the partition. For instance: an 
OffsetWindowfunction for value x with
+ * offset -2, will get the value of x 2 rows back in the partition.
+ */
+abstract class OffsetWindowFunction
+  extends Expression with WindowFunction with Unevaluable with 
ImplicitCastInputTypes {
+  val input: Expression
+  val default: Expression
+  val offset: Expression
+  val offsetSign: Int
+
+  override def children: Seq[Expression] = Seq(input, offset, default)
 
-  def prepareInputParameters(input: InternalRow): AnyRef
+  override def foldable: Boolean = input.foldable && (default == null || 
default.foldable)
 
-  def update(input: AnyRef): Unit
+  override def nullable: Boolean = input.nullable && (default == null || 
default.nullable)
 
-  def batchUpdate(inputs: Array[AnyRef]): Unit
+  override lazy val frame = {
+// This will be triggered by the Analyzer.
+val offsetValue = offset.eval() match {
+  case o: Int => o
+  case x => throw new AnalysisException(
+s"Offset expression must be a foldable integer expression: $x")
+}
+val boundary = ValueFollowing(offsetSign * offsetValue)
+SpecifiedWindowFrame(RowFrame, boundary, boundary)
+  }
 
-  def evaluate(): Unit
+  override def dataType: DataType = input.dataType
 
-  def get(index: Int): Any
+  override def inputTypes: Seq[AbstractDataType] =
+Seq(AnyDataType, IntegerType, TypeCollection(input.dataType, NullType))
 
-  def newInstance(): WindowFunction
+  override def toString: String = s"$prettyName($input, $offset, $default)"
 }
 
-case class UnresolvedWindowFunction(
-name: String,
-children: Seq[Expression])
-  extends Expression with WindowFunction with Unevaluable {
+case class Lead(input: Expression, offset: Expression, default: Expression)
+extends OffsetWindowFunction {
 
-  override def dataType: DataType = throw new UnresolvedException(this, 
"dataType")
-  override def foldable: Boolean = throw new UnresolvedException(this, 
"foldable")
-  override def nullable: Boolean = throw new UnresolvedException(this, 
"nullable")
-  override lazy val resolved = false
+  def this(input: Expression, offset: Expression) = this(input, offset, 
Literal(null))
 
-  override def init(): Unit = throw new UnresolvedException(this, "init")
-  override def reset(): Unit = throw new UnresolvedException(this, "reset")
-  override def prepareInputParameters(input: InternalRow): AnyRef =
-throw new UnresolvedException(this, "prepareInputParameters")
-  override def update(input: AnyRef): Unit = throw new 
UnresolvedException(this, "update")
-  override def batchUpdate(inputs: Array[AnyRef]): Unit =
-throw new UnresolvedException(this, "batchUpdate")
-  override 

[GitHub] spark pull request: [SPARK-12062] [CORE] Change Master to async reb...

2015-12-13 Thread BryanCutler
GitHub user BryanCutler opened a pull request:

https://github.com/apache/spark/pull/10284

[SPARK-12062] [CORE] Change Master to async rebuild UI when application 
completes

This change rebuilds the event history of completed apps asynchronously, so the 
RPC thread is not blocked and new workers can still register or be removed even 
when the event log history is very large and takes a long time to rebuild.
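
The mechanism, sketched below; `rebuildSparkUI` stands in for the Master's
existing (potentially slow) event-log replay, and the names are placeholders
rather than the patch's actual identifiers:

```scala
import java.util.concurrent.Executors
import scala.concurrent.{ExecutionContext, Future}
import scala.util.control.NonFatal

object MasterUiSketch {
  // A single dedicated thread: at most one replay runs at a time, and the
  // RPC thread is never the one doing the replay.
  private val rebuildExecutor =
    ExecutionContext.fromExecutorService(Executors.newSingleThreadExecutor())

  // Placeholder for the real replay of a completed app's event log.
  private def rebuildSparkUI(appId: String): Unit = ()

  def asyncRebuildSparkUI(appId: String): Future[Unit] = Future {
    try rebuildSparkUI(appId)
    catch { case NonFatal(e) => () } // log and continue; a bad log must not wedge the Master
  }(rebuildExecutor)
}
```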

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/BryanCutler/spark async-MasterUI-SPARK-12062

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/10284.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #10284


commit fe67dd79e86c635ccecde7aa9ca288d159cf474f
Author: Bryan Cutler 
Date:   2015-12-08T02:36:46Z

[SPARK-12062] Changed Master rebuildSparkUI to run async from the RPC thread

commit b748dc10b6c47ea88794a5d596c44e77266607c3
Author: Bryan Cutler 
Date:   2015-12-08T18:51:27Z

rebuildUI thread was not being shutdown

commit 7d60de1728d46c75c47d7fec77c898e93fcc5e52
Author: Bryan Cutler 
Date:   2015-12-08T20:16:43Z

remove line that was accidentally left in for testing

commit af20e77940c33044dc8f7075e941219e737bb744
Author: Bryan Cutler 
Date:   2015-12-12T00:24:50Z

cleanup of log file opening logic

commit 54089aa1c1dc73b8f09e54973dd7ef77f361547c
Author: Bryan Cutler 
Date:   2015-12-12T00:47:36Z

minor cleanup

commit 80076374068e58fc9d89cf78c3fa90bfeb23ecb6
Author: Bryan Cutler 
Date:   2015-12-13T21:52:06Z

changed catching Exception to NonFatal, which is better

commit 456c806eee6577fcb2721879613f20351e57ff14
Author: Bryan Cutler 
Date:   2015-12-13T23:10:58Z

fixed indentation







[GitHub] spark pull request: [SPARK-8641][SQL] Native Spark Window function...

2015-12-13 Thread yhuai
Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/9819#discussion_r47452245
  
--- Diff: 
sql/hive/compatibility/src/test/scala/org/apache/spark/sql/hive/execution/HiveWindowFunctionQuerySuite.scala
 ---
@@ -472,7 +475,7 @@ class HiveWindowFunctionQuerySuite extends 
HiveComparisonTest with BeforeAndAfte
   |window w1 as (distribute by p_mfgr sort by p_mfgr, p_name
   | rows between 2 preceding and 2 following)
 """.stripMargin, reset = false)
-
+  */
--- End diff --

Instead of disabling it, how about we convert it to a regular `test` (it 
will not use Hive for the golden answers and we can use `checkAnswer`)?
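A converted test could look roughly like this — a sketch assuming the suite can mix in `QueryTest` for `checkAnswer`, with purely illustrative expected rows, not real golden answers:

```scala
import org.apache.spark.sql.Row

test("window query checked with checkAnswer instead of Hive golden files") {
  val df = sql(
    """select p_mfgr, p_name,
      |       rank() over (partition by p_mfgr order by p_name) as r
      |from part
    """.stripMargin)
  // Expected answers are computed once and inlined, instead of being
  // regenerated from Hive on every run; these rows are placeholders.
  val expected = Seq(
    Row("Manufacturer#1", "almond antique burnished rose metallic", 1),
    Row("Manufacturer#1", "almond antique chartreuse lavender yellow", 2))
  checkAnswer(df, expected)
}
```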





[GitHub] spark pull request: [SPARK-12062] [CORE] Change Master to async reb...

2015-12-13 Thread BryanCutler
Github user BryanCutler commented on the pull request:

https://github.com/apache/spark/pull/10284#issuecomment-164309595
  
I tested this by creating an artificially large event log file, then stopping
the worker and verifying that it could re-register before the UI rebuild
completed. This worked fine for me, at least on a local cluster.





[GitHub] spark pull request: [SPARK-8641][SQL] Native Spark Window function...

2015-12-13 Thread yhuai
Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/9819#discussion_r47452622
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/Window.scala ---
@@ -19,10 +19,13 @@ package org.apache.spark.sql.execution
 
 import org.apache.spark.sql.catalyst.InternalRow
 import org.apache.spark.sql.catalyst.expressions._
+import org.apache.spark.sql.catalyst.expressions.aggregate._
 import org.apache.spark.sql.catalyst.plans.physical._
 import org.apache.spark.sql.types.IntegerType
 import org.apache.spark.rdd.RDD
-import org.apache.spark.util.collection.CompactBuffer
+
+import scala.collection.mutable
+import scala.collection.mutable.ArrayBuffer
--- End diff --

order imports
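For reference, the Spark style groups imports as java, scala, third-party, and then org.apache.spark, each group alphabetized and separated by a blank line, so the block above would become something like:

```scala
import scala.collection.mutable
import scala.collection.mutable.ArrayBuffer

import org.apache.spark.rdd.RDD
import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.catalyst.expressions._
import org.apache.spark.sql.catalyst.expressions.aggregate._
import org.apache.spark.sql.catalyst.plans.physical._
import org.apache.spark.sql.types.IntegerType
```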





[GitHub] spark pull request: [SPARK-8641][SQL] Native Spark Window function...

2015-12-13 Thread yhuai
Github user yhuai commented on the pull request:

https://github.com/apache/spark/pull/9819#issuecomment-164312709
  
Can you add Scaladoc to explain how we evaluate a regular aggregate function
when it is used as a window function? (Maybe I missed it.)
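For readers of this thread: judging from the `AggregateProcessor` hunk quoted further down, a regular aggregate used as a window function is evaluated in `Complete` mode, once per output row whose frame changed. A self-contained sketch with invented names (`Processor`, `evaluateFrame`), not code from the patch:

```scala
// Reset the buffers, replay the rows of the current frame, emit one value.
trait Processor {
  def initialize(partitionSize: Int): Unit
  def update(rows: IndexedSeq[Any], from: Int, until: Int): Unit
  def evaluate(): Any
}

def evaluateFrame(
    processor: Processor,
    rows: IndexedSeq[Any],
    frameStart: Int,
    frameEnd: Int): Any = {
  processor.initialize(rows.size)              // zero the aggregation buffers
  processor.update(rows, frameStart, frameEnd) // feed every row in the frame
  processor.evaluate()                         // produce the value for this row
}
```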





[GitHub] spark pull request: [SPARK-8641][SQL] Native Spark Window function...

2015-12-13 Thread yhuai
Github user yhuai commented on the pull request:

https://github.com/apache/spark/pull/9819#issuecomment-164313553
  
@hvanhovell This is very cool! I have finished my review. 





[GitHub] spark pull request: [SPARK-12062] [CORE] Change Master to async reb...

2015-12-13 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10284#issuecomment-164315793
  
**[Test build #47630 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47630/consoleFull)**
 for PR 10284 at commit 
[`456c806`](https://github.com/apache/spark/commit/456c806eee6577fcb2721879613f20351e57ff14).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
   * `case class AttachCompletedRebuildUI(appId: String)`





[GitHub] spark pull request: [SPARK-12062] [CORE] Change Master to async reb...

2015-12-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10284#issuecomment-164315823
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/47630/
Test FAILed.





[GitHub] spark pull request: [SPARK-12062] [CORE] Change Master to async reb...

2015-12-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10284#issuecomment-164315821
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-5682][Core] Add encrypted shuffle in sp...

2015-12-13 Thread winningsix
Github user winningsix commented on the pull request:

https://github.com/apache/spark/pull/8880#issuecomment-164317758
  
>>we have to do it at the level of the Yarn module since we need to obtain the
security key from the credentials stored in the user group information

>Yeah, I see... I was going to suggest moving the implementation of 
YarnSparkHadoopUtil.getCurrentUserCredentials to core, but hadoop 1 doesn't 
have the needed API. So we have to wait for SPARK-11807 to do that. Let's leave 
the test in the yarn module in the meantime and file a bug (blocked by 
SPARK-11807) to move it to the right place later. Can you do that?

The ticket is filed at https://issues.apache.org/jira/browse/SPARK-12278





[GitHub] spark pull request: [SPARK-12281][Core]Fix a race condition when r...

2015-12-13 Thread jerryshao
Github user jerryshao commented on the pull request:

https://github.com/apache/spark/pull/10269#issuecomment-164319280
  
Thanks @zsxwing for finding this wrong state transition issue. For now I think
`assert` should be OK.
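For illustration, the `assert` approach amounts to failing fast on an impossible transition. A minimal sketch with invented names, not code from the patch:

```scala
// Reject any transition that is not in the allowed map, loudly.
class StateMachine[S](initial: S, allowed: Map[S, Set[S]]) {
  private var current: S = initial

  def transition(next: S): Unit = {
    assert(allowed(current).contains(next),
      s"Unexpected state transition: $current -> $next")
    current = next
  }
}
```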





[GitHub] spark pull request: [SPARK-12062] [CORE] Change Master to async reb...

2015-12-13 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10284#issuecomment-164310014
  
**[Test build #47630 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47630/consoleFull)**
 for PR 10284 at commit 
[`456c806`](https://github.com/apache/spark/commit/456c806eee6577fcb2721879613f20351e57ff14).





[GitHub] spark pull request: [SPARK-8641][SQL] Native Spark Window function...

2015-12-13 Thread yhuai
Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/9819#discussion_r47452553
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
 ---
@@ -1097,6 +1118,42 @@ class Analyzer(
   }
 }
   }
+
+  /**
+   * Check and add proper window frames for all window functions.
+   */
+  object ResolveWindowFrame extends Rule[LogicalPlan] {
+def apply(plan: LogicalPlan): LogicalPlan = plan transform {
+  case logical: LogicalPlan => logical transformExpressions {
+case WindowExpression(wf: WindowFunction,
+WindowSpecDefinition(_, _, f: SpecifiedWindowFrame))
+  if wf.frame != UnspecifiedFrame && wf.frame != f =>
+  failAnalysis(s"Window Frame $f must match the required frame 
${wf.frame}")
+case WindowExpression(wf: WindowFunction,
+s @ WindowSpecDefinition(_, o, UnspecifiedFrame))
+  if wf.frame != UnspecifiedFrame =>
+  WindowExpression(wf, s.copy(frameSpecification = wf.frame))
+case we @ WindowExpression(e, s @ WindowSpecDefinition(_, o, 
UnspecifiedFrame)) =>
+  val frame = SpecifiedWindowFrame.defaultWindowFrame(o.nonEmpty, 
acceptWindowFrame = true)
+  we.copy(windowSpec = s.copy(frameSpecification = frame))
+  }
+}
+  }
+
+  /**
+* Check and add order to [[AggregateWindowFunction]]s.
+*/
+  object ResolveWindowOrder extends Rule[LogicalPlan] {
+def apply(plan: LogicalPlan): LogicalPlan = plan transform {
+  case logical: LogicalPlan => logical transformExpressions {
+case WindowExpression(wf: WindowFunction, spec) if 
spec.orderSpec.isEmpty =>
+  failAnalysis(s"WindowFunction $wf requires window to be ordered")
--- End diff --

Is it required?





[GitHub] spark pull request: [SPARK-8641][SQL] Native Spark Window function...

2015-12-13 Thread yhuai
Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/9819#discussion_r47453068
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/windowExpressions.scala
 ---
@@ -246,85 +260,238 @@ object SpecifiedWindowFrame {
   }
 }
 
+case class UnresolvedWindowExpression(
+child: Expression,
+windowSpec: WindowSpecReference) extends UnaryExpression with 
Unevaluable {
+
+  override def dataType: DataType = throw new UnresolvedException(this, 
"dataType")
+  override def foldable: Boolean = throw new UnresolvedException(this, 
"foldable")
+  override def nullable: Boolean = throw new UnresolvedException(this, 
"nullable")
+  override lazy val resolved = false
+}
+
+case class WindowExpression(
+windowFunction: Expression,
+windowSpec: WindowSpecDefinition) extends Expression with Unevaluable {
+
+  override def children: Seq[Expression] = windowFunction :: windowSpec :: 
Nil
+
+  override def dataType: DataType = windowFunction.dataType
+  override def foldable: Boolean = windowFunction.foldable
+  override def nullable: Boolean = windowFunction.nullable
+
+  override def toString: String = s"$windowFunction $windowSpec"
+}
+
 /**
- * Every window function needs to maintain a output buffer for its output.
- * It should expect that for a n-row window frame, it will be called n 
times
- * to retrieve value corresponding with these n rows.
+ * A window function is a function that can only be evaluated in the 
context of a window operator.
  */
 trait WindowFunction extends Expression {
-  def init(): Unit
+  /** Frame in which the window operator must be executed. */
+  def frame: WindowFrame = UnspecifiedFrame
+}
 
-  def reset(): Unit
+/**
+ * An offset window function is a window function that returns the value 
of the input column offset
+ * by a number of rows within the partition. For instance, an OffsetWindowFunction for value x with
+ * offset -2 will get the value of x 2 rows back in the partition.
+ */
+abstract class OffsetWindowFunction
+  extends Expression with WindowFunction with Unevaluable with 
ImplicitCastInputTypes {
+  val input: Expression
+  val default: Expression
+  val offset: Expression
+  val offsetSign: Int
+
+  override def children: Seq[Expression] = Seq(input, offset, default)
 
-  def prepareInputParameters(input: InternalRow): AnyRef
+  override def foldable: Boolean = input.foldable && (default == null || 
default.foldable)
 
-  def update(input: AnyRef): Unit
+  override def nullable: Boolean = input.nullable && (default == null || 
default.nullable)
 
-  def batchUpdate(inputs: Array[AnyRef]): Unit
+  override lazy val frame = {
+// This will be triggered by the Analyzer.
+val offsetValue = offset.eval() match {
+  case o: Int => o
+  case x => throw new AnalysisException(
+s"Offset expression must be a foldable integer expression: $x")
+}
+val boundary = ValueFollowing(offsetSign * offsetValue)
+SpecifiedWindowFrame(RowFrame, boundary, boundary)
+  }
 
-  def evaluate(): Unit
+  override def dataType: DataType = input.dataType
 
-  def get(index: Int): Any
+  override def inputTypes: Seq[AbstractDataType] =
+Seq(AnyDataType, IntegerType, TypeCollection(input.dataType, NullType))
 
-  def newInstance(): WindowFunction
+  override def toString: String = s"$prettyName($input, $offset, $default)"
 }
 
-case class UnresolvedWindowFunction(
-name: String,
-children: Seq[Expression])
-  extends Expression with WindowFunction with Unevaluable {
+case class Lead(input: Expression, offset: Expression, default: Expression)
+extends OffsetWindowFunction {
 
-  override def dataType: DataType = throw new UnresolvedException(this, 
"dataType")
-  override def foldable: Boolean = throw new UnresolvedException(this, 
"foldable")
-  override def nullable: Boolean = throw new UnresolvedException(this, 
"nullable")
-  override lazy val resolved = false
+  def this(input: Expression, offset: Expression) = this(input, offset, 
Literal(null))
 
-  override def init(): Unit = throw new UnresolvedException(this, "init")
-  override def reset(): Unit = throw new UnresolvedException(this, "reset")
-  override def prepareInputParameters(input: InternalRow): AnyRef =
-throw new UnresolvedException(this, "prepareInputParameters")
-  override def update(input: AnyRef): Unit = throw new 
UnresolvedException(this, "update")
-  override def batchUpdate(inputs: Array[AnyRef]): Unit =
-throw new UnresolvedException(this, "batchUpdate")
-  override 

[GitHub] spark pull request: [SPARK-8641][SQL] Native Spark Window function...

2015-12-13 Thread yhuai
Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/9819#discussion_r47453214
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/Window.scala ---
@@ -736,15 +691,148 @@ private[execution] final class 
UnboundedFollowingWindowFunctionFrame(
 
 // Only recalculate and update when the buffer changes.
 if (bufferUpdated) {
-  evaluatePrepared(buffer, inputIndex, buffer.length)
-  fill(target, outputIndex)
+  processor.initialize(input.size)
+  processor.update(input, inputIndex, input.size)
+  processor.evaluate(target)
 }
 
 // Move to the next row.
 outputIndex += 1
   }
+}
+
+/**
+ * This class prepares and manages the processing of a number of aggregate 
functions.
+ *
+ * This implementation only supports evaluation in [[Complete]] mode. This 
is enough for
+ * Window processing.
+ *
+ * Processing of distinct aggregates is currently not supported.
+ *
+ * The implementation is split into an object which takes care of construction, and the actual
+ * processor class. Construction might be expensive and could be separated into a 'driver' and an
+ * 'executor' part.
+ */
+private[execution] object AggregateProcessor {
+  def apply(functions: Array[Expression],
+ordinal: Int,
+inputAttributes: Seq[Attribute],
+newMutableProjection: (Seq[Expression], Seq[Attribute]) => () => 
MutableProjection):
--- End diff --

format
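i.e., following the usual Spark convention of double-indenting wrapped parameters, something like:

```scala
private[execution] object AggregateProcessor {
  def apply(
      functions: Array[Expression],
      ordinal: Int,
      inputAttributes: Seq[Attribute],
      newMutableProjection: (Seq[Expression], Seq[Attribute]) => () => MutableProjection)
    : AggregateProcessor = {
    // ... construction unchanged ...
  }
}
```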





[GitHub] spark pull request: [SPARK-12213] [SQL] use multiple partitions fo...

2015-12-13 Thread yhuai
Github user yhuai commented on the pull request:

https://github.com/apache/spark/pull/10228#issuecomment-164313814
  
@hvanhovell How about we merge this first and then take a look at how to use a
single rule to handle aggregation queries with distinct?





[GitHub] spark pull request: [SPARK-8641][SQL] Native Spark Window function...

2015-12-13 Thread yhuai
Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/9819#discussion_r47451906
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/windowExpressions.scala
 ---

[GitHub] spark pull request: [SPARK-8641][SQL] Native Spark Window function...

2015-12-13 Thread yhuai
Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/9819#discussion_r47452123
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/windowExpressions.scala
 ---

[GitHub] spark pull request: [SPARK-8641][SQL] Native Spark Window function...

2015-12-13 Thread yhuai
Github user yhuai commented on the pull request:

https://github.com/apache/spark/pull/9819#issuecomment-164310795
  
Do we have a test case that uses a UDAF as a window function?
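Something along these lines would cover it — a hypothetical sketch where `geometricMean` stands in for any registered `UserDefinedAggregateFunction`, `df` for a test DataFrame, and `expectedRows` for precomputed answers:

```scala
import org.apache.spark.sql.expressions.Window

// Apply a UDAF over a running per-group window, ordered within the partition.
val w = Window.partitionBy($"group").orderBy($"id")
  .rowsBetween(Long.MinValue, 0) // unbounded preceding to current row
val result = df.select($"id", geometricMean($"value").over(w).as("gm"))
checkAnswer(result, expectedRows)
```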





[GitHub] spark pull request: [SPARK-8641][SQL] Native Spark Window function...

2015-12-13 Thread yhuai
Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/9819#discussion_r47453065
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/windowExpressions.scala
 ---
@@ -246,85 +260,238 @@ object SpecifiedWindowFrame {
   }
 }
 
+case class UnresolvedWindowExpression(
+child: Expression,
+windowSpec: WindowSpecReference) extends UnaryExpression with 
Unevaluable {
+
+  override def dataType: DataType = throw new UnresolvedException(this, 
"dataType")
+  override def foldable: Boolean = throw new UnresolvedException(this, 
"foldable")
+  override def nullable: Boolean = throw new UnresolvedException(this, 
"nullable")
+  override lazy val resolved = false
+}
+
+case class WindowExpression(
+windowFunction: Expression,
+windowSpec: WindowSpecDefinition) extends Expression with Unevaluable {
+
+  override def children: Seq[Expression] = windowFunction :: windowSpec :: 
Nil
+
+  override def dataType: DataType = windowFunction.dataType
+  override def foldable: Boolean = windowFunction.foldable
+  override def nullable: Boolean = windowFunction.nullable
+
+  override def toString: String = s"$windowFunction $windowSpec"
+}
+
 /**
- * Every window function needs to maintain a output buffer for its output.
- * It should expect that for a n-row window frame, it will be called n 
times
- * to retrieve value corresponding with these n rows.
+ * A window function is a function that can only be evaluated in the 
context of a window operator.
  */
 trait WindowFunction extends Expression {
-  def init(): Unit
+  /** Frame in which the window operator must be executed. */
+  def frame: WindowFrame = UnspecifiedFrame
--- End diff --

Actually, is it used as the default frame?





[GitHub] spark pull request: [SPARK-8641][SQL] Native Spark Window function...

2015-12-13 Thread yhuai
Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/9819#discussion_r47453040
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/windowExpressions.scala
 ---
@@ -246,85 +260,238 @@ object SpecifiedWindowFrame {
   }
 }
 
+case class UnresolvedWindowExpression(
+child: Expression,
+windowSpec: WindowSpecReference) extends UnaryExpression with 
Unevaluable {
+
+  override def dataType: DataType = throw new UnresolvedException(this, 
"dataType")
+  override def foldable: Boolean = throw new UnresolvedException(this, 
"foldable")
+  override def nullable: Boolean = throw new UnresolvedException(this, 
"nullable")
+  override lazy val resolved = false
+}
+
+case class WindowExpression(
+windowFunction: Expression,
+windowSpec: WindowSpecDefinition) extends Expression with Unevaluable {
+
+  override def children: Seq[Expression] = windowFunction :: windowSpec :: 
Nil
+
+  override def dataType: DataType = windowFunction.dataType
+  override def foldable: Boolean = windowFunction.foldable
+  override def nullable: Boolean = windowFunction.nullable
+
+  override def toString: String = s"$windowFunction $windowSpec"
+}
+
 /**
- * Every window function needs to maintain a output buffer for its output.
- * It should expect that for a n-row window frame, it will be called n 
times
- * to retrieve value corresponding with these n rows.
+ * A window function is a function that can only be evaluated in the 
context of a window operator.
  */
 trait WindowFunction extends Expression {
-  def init(): Unit
+  /** Frame in which the window operator must be executed. */
+  def frame: WindowFrame = UnspecifiedFrame
--- End diff --

I guess we need to say it is also used as the default frame?




