[GitHub] spark pull request: [SPARK-12530][Build] Fix build break at Spark-...
Github user yhuai commented on the pull request: https://github.com/apache/spark/pull/10488#issuecomment-167834296 Can you update the pr? Once it passes jenkins, I will merge it. Thanks! --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-12362][SQL][WIP] Inline Hive Parser
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10509#issuecomment-167842481 **[Test build #48417 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48417/consoleFull)** for PR 10509 at commit [`cb60ba0`](https://github.com/apache/spark/commit/cb60ba045ff6663ed83c308b2423bdb87152a092).
[GitHub] spark pull request: [SPARK-12362][SQL][WIP] Inline Hive Parser
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10509#issuecomment-167847055 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/48414/ Test FAILed.
[GitHub] spark pull request: [SPARK-12507][Streaming][Document]Expose close...
Github user BenFradet commented on the pull request: https://github.com/apache/spark/pull/10453#issuecomment-167848381 I have a few comments on phrasing, but otherwise it LGTM.
[GitHub] spark pull request: [SPARK-12079][BUILD][SQL] Run Catalyst subproj...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/10077#issuecomment-167848248 Closing this for now; this is blocked on an investigation into custom log appenders in tests in order to fix the log interleaving problems, as well as an investigation into the build hang issue.
[GitHub] spark pull request: [SPARK-12507][Streaming][Document]Expose close...
Github user BenFradet commented on a diff in the pull request: https://github.com/apache/spark/pull/10453#discussion_r48559339 --- Diff: docs/streaming-programming-guide.md --- @@ -2029,6 +2029,11 @@ If the data is being received by the receivers faster than what can be processed you can limit the rate by setting the [configuration parameter](configuration.html#spark-streaming) `spark.streaming.receiver.maxRate`. +If using S3 for checkpointing, please remember to enable `spark.streaming.driver.writeAheadLog.closeFileAfterWrite` +and `spark.streaming.receiver.writeAheadLog.closeFileAfterWrite`. You can also enable +`spark.streaming.driver.writeAheadLog.allowBatching` to improve the performance of writing write +ahead logs in driver. See [Spark Streaming Configuration](configuration.html#spark-streaming) or more details. --- End diff -- `on the driver` and `for more details`
[GitHub] spark pull request: [SPARK-12079][BUILD][SQL] Run Catalyst subproj...
Github user JoshRosen closed the pull request at: https://github.com/apache/spark/pull/10077
[GitHub] spark pull request: [SPARK-12507][Streaming][Document]Expose close...
Github user BenFradet commented on a diff in the pull request: https://github.com/apache/spark/pull/10453#discussion_r48559275 --- Diff: docs/configuration.md --- @@ -1600,6 +1600,33 @@ Apart from these, the following properties are also available, and may be useful How many batches the Spark Streaming UI and status APIs remember before garbage collecting. + + spark.streaming.driver.writeAheadLog.closeFileAfterWrite + false + +Whether to close the file after writing a write ahead log record in driver. Because S3 doesn't +support flushing of data, when using S3 for checkpointing, you should enable it to achieve read +after write consistency. + + + + spark.streaming.receiver.writeAheadLog.closeFileAfterWrite + false + +Whether to close the file after writing a write ahead log record in receivers. Because S3 +doesn't support flushing of data, when using S3 for checkpointing, you should enable it to +achieve read after write consistency. + + + + spark.streaming.driver.writeAheadLog.allowBatching + false + +Whether to batch write ahead logs in driver to write. When using S3 for checkpointing, write +operations in driver usually take too long. Enable batching write ahead logs will improve +the performance of writing. --- End diff -- I'd say `will improve the performance of write operations`
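For reference, the write-ahead-log properties quoted in the diff above can be collected as plain key/value pairs, as one would pass them to `SparkConf.set`. This is a hypothetical sketch: the property names come from the quoted docs, but the value `"true"` (enabling them for S3-backed checkpointing) and the map itself are illustrative.

```scala
// Sketch (assumption: you want S3-safe checkpointing, so all three are "true").
// These are the property keys from the configuration.md diff under review.
val s3CheckpointProps: Map[String, String] = Map(
  "spark.streaming.driver.writeAheadLog.closeFileAfterWrite" -> "true",
  "spark.streaming.receiver.writeAheadLog.closeFileAfterWrite" -> "true",
  "spark.streaming.driver.writeAheadLog.allowBatching" -> "true"
)

// Each pair would be applied via SparkConf.set(key, value) before creating
// the StreamingContext.
s3CheckpointProps.foreach { case (k, v) => println(s"$k=$v") }
```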
[GitHub] spark pull request: [SPARK-12539][SQL][WIP] support writing bucket...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10498#issuecomment-167823882 **[Test build #48415 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48415/consoleFull)** for PR 10498 at commit [`a9dc997`](https://github.com/apache/spark/commit/a9dc99722bfea886c6381abbd2e1e9366fcf9064).
[GitHub] spark pull request: [SPARK-11638] [Mesos + Docker Bridge networkin...
Github user radekg commented on the pull request: https://github.com/apache/spark/pull/9608#issuecomment-167826085 I have 3 tests failing locally but I don't think these are related to my changes. `scalastyle` seems to be ok now. Failing tests: ``` - launch simple application with spark-submit *** FAILED *** Process returned with exit code 1. See the log4j logs for more detail. (SparkSubmitSuite.scala:583) warning: /Library/Java/JavaVirtualMachines/jdk1.8.0_66.jdk/Contents/Home/lib/ct.sym(META-INF/sym/rt.jar/java/io/Serializable.class): major version 52 is newer than 51, the highest major version supported by this compiler. It is recommended that the compiler be upgraded. warning: /Library/Java/JavaVirtualMachines/jdk1.8.0_66.jdk/Contents/Home/lib/ct.sym(META-INF/sym/rt.jar/java/lang/Object.class): major version 52 is newer than 51, the highest major version supported by this compiler. It is recommended that the compiler be upgraded. warning: /Library/Java/JavaVirtualMachines/jdk1.8.0_66.jdk/Contents/Home/lib/ct.sym(META-INF/sym/rt.jar/java/lang/String.class): major version 52 is newer than 51, the highest major version supported by this compiler. It is recommended that the compiler be upgraded. 
/Library/Java/JavaVirtualMachines/jdk1.8.0_66.jdk/Contents/Home/lib/ct.sym(META-INF/sym/rt.jar/java/io/Serializable.class): warning: Cannot find annotation method 'value()' in type 'Profile+Annotation': class file for jdk.Profile+Annotation not found /Library/Java/JavaVirtualMachines/jdk1.8.0_66.jdk/Contents/Home/lib/ct.sym(META-INF/sym/rt.jar/java/lang/Object.class): warning: Cannot find annotation method 'value()' in type 'Profile+Annotation' /Library/Java/JavaVirtualMachines/jdk1.8.0_66.jdk/Contents/Home/lib/ct.sym(META-INF/sym/rt.jar/java/lang/String.class): warning: Cannot find annotation method 'value()' in type 'Profile+Annotation' warning: /Library/Java/JavaVirtualMachines/jdk1.8.0_66.jdk/Contents/Home/lib/ct.sym(META-INF/sym/rt.jar/java/lang/Override.class): major version 52 is newer than 51, the highest major version supported by this compiler. It is recommended that the compiler be upgraded. warning: /Library/Java/JavaVirtualMachines/jdk1.8.0_66.jdk/Contents/Home/lib/ct.sym(META-INF/sym/rt.jar/java/lang/annotation/Annotation.class): major version 52 is newer than 51, the highest major version supported by this compiler. It is recommended that the compiler be upgraded. warning: /Library/Java/JavaVirtualMachines/jdk1.8.0_66.jdk/Contents/Home/lib/ct.sym(META-INF/sym/rt.jar/java/lang/annotation/Target.class): major version 52 is newer than 51, the highest major version supported by this compiler. It is recommended that the compiler be upgraded. warning: /Library/Java/JavaVirtualMachines/jdk1.8.0_66.jdk/Contents/Home/lib/ct.sym(META-INF/sym/rt.jar/java/lang/annotation/ElementType.class): major version 52 is newer than 51, the highest major version supported by this compiler. It is recommended that the compiler be upgraded. warning: /Library/Java/JavaVirtualMachines/jdk1.8.0_66.jdk/Contents/Home/lib/ct.sym(META-INF/sym/rt.jar/java/lang/annotation/Retention.class): major version 52 is newer than 51, the highest major version supported by this compiler. 
It is recommended that the compiler be upgraded. warning: /Library/Java/JavaVirtualMachines/jdk1.8.0_66.jdk/Contents/Home/lib/ct.sym(META-INF/sym/rt.jar/java/lang/annotation/RetentionPolicy.class): major version 52 is newer than 51, the highest major version supported by this compiler. It is recommended that the compiler be upgraded. /Library/Java/JavaVirtualMachines/jdk1.8.0_66.jdk/Contents/Home/lib/ct.sym(META-INF/sym/rt.jar/java/lang/Override.class): warning: Cannot find annotation method 'value()' in type 'Profile+Annotation' /Library/Java/JavaVirtualMachines/jdk1.8.0_66.jdk/Contents/Home/lib/ct.sym(META-INF/sym/rt.jar/java/lang/annotation/Annotation.class): warning: Cannot find annotation method 'value()' in type 'Profile+Annotation' /Library/Java/JavaVirtualMachines/jdk1.8.0_66.jdk/Contents/Home/lib/ct.sym(META-INF/sym/rt.jar/java/lang/annotation/Target.class): warning: Cannot find annotation method 'value()' in type 'Profile+Annotation' /Library/Java/JavaVirtualMachines/jdk1.8.0_66.jdk/Contents/Home/lib/ct.sym(META-INF/sym/rt.jar/java/lang/annotation/ElementType.class): warning: Cannot find annotation method 'value()' in type 'Profile+Annotation' /Library/Java/JavaVirtualMachines/jdk1.8.0_66.jdk/Contents/Home/lib/ct.sym(META-INF/sym/rt.jar/java/lang/annotation/Retention.class): warning: Cannot find annotation method 'value()' in type 'Profile+Annotation' /Library/Java/JavaVirtualMachines/jdk1.8.0_66.jdk/Contents/Home/lib/ct.sym(META-INF/sym/rt.jar/java/lang/annotation/RetentionPolicy.class): warning: Cannot find annotation method 'value()' in type
[GitHub] spark pull request: [SPARK-12530][Build] Fix build break at Spark-...
Github user kiszk commented on the pull request: https://github.com/apache/spark/pull/10488#issuecomment-167848836 @yhuai, do you mean that I should update all of the string concatenations in @ExpressionDescription to use multi-line string literals, rather than only the original one? If so, I will do this update.
[GitHub] spark pull request: [SPARK-12552][Core]Correctly count the driver ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10506#issuecomment-167820998 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-12552][Core]Correctly count the driver ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10506#issuecomment-167821001 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/48413/ Test PASSed.
[GitHub] spark pull request: [SPARK-12539][SQL][WIP] support writing bucket...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10498#issuecomment-167843318 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/48415/ Test PASSed.
[GitHub] spark pull request: [SPARK-12539][SQL][WIP] support writing bucket...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10498#issuecomment-167843317 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-12539][SQL][WIP] support writing bucket...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10498#issuecomment-167843065 **[Test build #48415 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48415/consoleFull)** for PR 10498 at commit [`a9dc997`](https://github.com/apache/spark/commit/a9dc99722bfea886c6381abbd2e1e9366fcf9064).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `case class BucketSpec(`
[GitHub] spark pull request: [SPARK-12362][SQL][WIP] Inline Hive Parser
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10509#issuecomment-167848994 **[Test build #48420 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48420/consoleFull)** for PR 10509 at commit [`feee2ba`](https://github.com/apache/spark/commit/feee2ba1fc4ecf649328604b5cc29e972d0f4ae9).
[GitHub] spark pull request: [SPARK-7874][MESOS] Don’t allocate more than...
Github user blbradley commented on the pull request: https://github.com/apache/spark/pull/9027#issuecomment-167848940 @dragos Where can you see that fine-grained mode is slated for removal? All I see is #9795.
[GitHub] spark pull request: [SPARK-12554][Core]Standalone scheduler hangs ...
Github user andrewor14 commented on the pull request: https://github.com/apache/spark/pull/10507#issuecomment-167856135 ok to test
[GitHub] spark pull request: [SPARK-12530][Build] Fix build break at Spark-...
Github user yhuai commented on the pull request: https://github.com/apache/spark/pull/10488#issuecomment-167834026 @kiszk Thank you for the investigation. Yeah, let's use multi-line string literals. If we have to have a line with more than 100 characters, let's use `// scalastyle:off line.size.limit` and `// scalastyle:on line.size.limit` to bypass just the line length requirement.
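The suggestion above can be sketched as follows: build a long usage string as a single triple-quoted (multi-line) literal with `stripMargin` instead of concatenating short literals, and wrap any over-long line in the scalastyle on/off markers. The usage text and the `extendedUsage` name are illustrative, not taken from the PR.

```scala
// Sketch of yhuai's suggestion. The scalastyle markers disable only the
// 100-character line-length check for the enclosed lines.
// scalastyle:off line.size.limit
val extendedUsage: String =
  """_FUNC_(expr) - Illustrative usage text that would otherwise exceed the 100-character limit if written as one concatenated literal.
    |Examples:
    |  > SELECT _FUNC_(1);
    |""".stripMargin
// scalastyle:on line.size.limit

// stripMargin removes everything up to and including the leading '|',
// so the resulting string has no indentation artifacts.
println(extendedUsage)
```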
[GitHub] spark pull request: [SPARK-12362][SQL][WIP] Inline Hive Parser
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10509#issuecomment-167847328 **[Test build #48419 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48419/consoleFull)** for PR 10509 at commit [`feee2ba`](https://github.com/apache/spark/commit/feee2ba1fc4ecf649328604b5cc29e972d0f4ae9).
[GitHub] spark pull request: [SPARK-12507][Streaming][Document]Expose close...
Github user zsxwing commented on a diff in the pull request: https://github.com/apache/spark/pull/10453#discussion_r48559444 --- Diff: docs/configuration.md --- @@ -1600,6 +1600,33 @@ Apart from these, the following properties are also available, and may be useful How many batches the Spark Streaming UI and status APIs remember before garbage collecting. + + spark.streaming.driver.writeAheadLog.closeFileAfterWrite + false + +Whether to close the file after writing a write ahead log record in driver. Because S3 doesn't +support flushing of data, when using S3 for checkpointing, you should enable it to achieve read +after write consistency. + + + + spark.streaming.receiver.writeAheadLog.closeFileAfterWrite + false + +Whether to close the file after writing a write ahead log record in receivers. Because S3 +doesn't support flushing of data, when using S3 for checkpointing, you should enable it to +achieve read after write consistency. + + + + spark.streaming.driver.writeAheadLog.allowBatching + false --- End diff -- for me: the default value is `true`. That's why I want to expose this one since the behavior is different from 1.5.0.
[GitHub] spark pull request: [SPARK-12362][SQL][WIP] Inline Hive Parser
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10509#issuecomment-167850051 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-12362][SQL][WIP] Inline Hive Parser
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10509#issuecomment-167850043 **[Test build #48420 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48420/consoleFull)** for PR 10509 at commit [`feee2ba`](https://github.com/apache/spark/commit/feee2ba1fc4ecf649328604b5cc29e972d0f4ae9).
* This patch **fails to build**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `public final class LZ4BlockInputStream extends FilterInputStream `
  * `class JavaWordBlacklist `
  * `class JavaDroppedWordsCounter `
  * `case class AssertNotNull(`
  * ` * Abstract class all optimizers should inherit of, contains the standard batches (extending`
  * `abstract class Optimizer extends RuleExecutor[LogicalPlan] `
[GitHub] spark pull request: [SPARK-12362][SQL][WIP] Inline Hive Parser
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10509#issuecomment-167850053 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/48420/ Test FAILed.
[GitHub] spark pull request: [SPARK-12530][Build] Fix build break at Spark-...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/10488
[GitHub] spark pull request: [SPARK-12548][Build] Add more exceptions to Gu...
Github user andrewor14 commented on the pull request: https://github.com/apache/spark/pull/10442#issuecomment-167855574 Also, I just noticed this is opened against the wrong branch. Please close this PR and re-open it against master if we do decide to continue work on this issue.
[GitHub] spark pull request: [SPARK-7889] [CORE] HistoryServer to refresh c...
Github user squito commented on a diff in the pull request: https://github.com/apache/spark/pull/6935#discussion_r48548650 --- Diff: core/src/test/scala/org/apache/spark/deploy/history/ApplicationCacheSuite.scala --- @@ -0,0 +1,476 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.spark.deploy.history + +import java.util.{Date, NoSuchElementException} + +import javax.servlet.Filter + +import scala.collection.mutable +import scala.collection.mutable.ListBuffer +import scala.language.postfixOps + +import com.codahale.metrics.Counter +import com.google.common.cache.LoadingCache +import com.google.common.util.concurrent.UncheckedExecutionException +import org.eclipse.jetty.servlet.ServletContextHandler +import org.mockito.Mockito._ +import org.scalatest.Matchers +import org.scalatest.mock.MockitoSugar + +import org.apache.spark.status.api.v1.{ApplicationAttemptInfo => AttemptInfo, ApplicationInfo} +import org.apache.spark.ui.SparkUI +import org.apache.spark.util.{Clock, ManualClock, Utils} +import org.apache.spark.{Logging, SparkFunSuite} + +class ApplicationCacheSuite extends SparkFunSuite with Logging with MockitoSugar with Matchers { + + /** + * subclass with access to the cache internals + * @param refreshInterval interval between refreshes in milliseconds. + * @param retainedApplications number of retained applications + */ + class TestApplicationCache( + operations: ApplicationCacheOperations = new StubCacheOperations(), + refreshInterval: Long, + retainedApplications: Int, + clock: Clock = new ManualClock(0)) + extends ApplicationCache(operations, refreshInterval, retainedApplications, clock) { + +def cache(): LoadingCache[CacheKey, CacheEntry] = appCache + } + + /** + * Stub cache operations. 
+ * The state is kept in a map of [[CacheKey]] to [[CacheEntry]], + * the `probeTime` field in the cache entry setting the timestamp of the entry + */ + class StubCacheOperations extends ApplicationCacheOperations with Logging { + +/** map to UI instances, including timestamps, which are used in update probes */ +val instances = mutable.HashMap.empty[CacheKey, CacheEntry] + +/** Map of attached spark UIs */ +val attached = mutable.HashMap.empty[CacheKey, SparkUI] + +var getAppUICount = 0L +var attachCount = 0L +var detachCount = 0L +var updateProbeCount = 0L + +/** + * Get the application UI + * @param appId application ID + * @param attemptId attempt ID + * @return If found, the Spark UI and any history information to be used in the cache + */ +override def getAppUI(appId: String, attemptId: Option[String]): Option[LoadedAppUI] = { + logDebug(s"getAppUI($appId, $attemptId)") + getAppUICount += 1 + instances.get(CacheKey(appId, attemptId)).map( e => +LoadedAppUI(e.ui, Some(new StubHistoryProviderUpdateState(e.probeTime +} + +override def attachSparkUI(appId: String, attemptId: Option[String], ui: SparkUI, +completed: Boolean): Unit = { + logDebug(s"attachSparkUI($appId, $attemptId, $ui)") + attachCount += 1 + attached += (CacheKey(appId, attemptId) -> ui) +} + +def putAndAttach(appId: String, attemptId: Option[String], completed: Boolean, started: Long, +ended: Long, timestamp: Long): SparkUI = { + val ui = putAppUI(appId, attemptId, completed, started, ended, timestamp) + attachSparkUI(appId, attemptId, ui, completed) + ui +} + +def putAppUI(appId: String, attemptId: Option[String], completed: Boolean, started: Long, +ended: Long, timestamp: Long): SparkUI = { + val ui = newUI(appId, attemptId, completed, started, ended) + putInstance(appId, attemptId, ui, completed,
[GitHub] spark pull request: [SPARK-12552][Core]Correctly count the driver ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10506#issuecomment-167820776 **[Test build #48413 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48413/consoleFull)** for PR 10506 at commit [`710f5de`](https://github.com/apache/spark/commit/710f5de578449c9f8156540bdc26b4b12d2567d5). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-12355][SQL] Implement unhandledFilter i...
Github user yhuai commented on the pull request: https://github.com/apache/spark/pull/10502#issuecomment-167835640 @HyukjinKwon Thank you for the PR! Can you post some benchmarking results (with your testing code)? It will be good to have these numbers to help others understand the benefit it can provide.
[GitHub] spark pull request: [SPARK-12362][SQL][WIP] Inline Hive Parser
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10509#issuecomment-167843425 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/48417/ Test FAILed.
[GitHub] spark pull request: [SPARK-12362][SQL][WIP] Inline Hive Parser
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10509#issuecomment-167843416 **[Test build #48417 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48417/consoleFull)** for PR 10509 at commit [`cb60ba0`](https://github.com/apache/spark/commit/cb60ba045ff6663ed83c308b2423bdb87152a092). * This patch **fails to build**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-12362][SQL][WIP] Inline Hive Parser
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10509#issuecomment-167843424 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-12362][SQL][WIP] Inline Hive Parser
Github user hvanhovell commented on the pull request: https://github.com/apache/spark/pull/10509#issuecomment-167843520 retest this please
[GitHub] spark pull request: [SPARK-12503] [SQL] Pushing Limit Through Unio...
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/10451#discussion_r48558035 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala --- @@ -153,6 +153,17 @@ object SetOperationPushDown extends Rule[LogicalPlan] with PredicateHelper { ) ) +// Adding extra Limit below UNION ALL iff both left and right childs are not Limit and no Limit +// was pushed down before. This heuristic is valid assuming there does not exist any Limit +// push-down rule that is unable to infer the value of maxRows. Any operator that a Limit can +// be pushed passed should override this function. +case Limit(exp, Union(left, right)) + if left.maxRows.isEmpty || right.maxRows.isEmpty => --- End diff -- Is there a reason to not check left and right separately?
[GitHub] spark pull request: [SPARK-12503] [SQL] Pushing Limit Through Unio...
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/10451#issuecomment-167845210 Thanks for working on this. I think it's getting pretty close. A few minor cleanups that might be nice: - I think we should consider pulling all the Limit rules into their own `LimitPushDown` rule. The reasoning here is twofold: we can clearly comment in one central place the requirements with respect to implementing maxRows. It will be easier to turn off if it is ever doing the wrong thing. - We should do a pass through and add `maxRows` to any other logical plans where it makes sense. Off the top of my head: - Filter = `child.maxRows` - Union = `for(leftMax <- left.maxRows; rightMax <- right.maxRows) yield Add(leftMax, rightMax)` - Distinct = `child.maxRows` - Aggregate - `child.maxRows`
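The `maxRows` suggestions above can be sketched on a toy plan algebra. This is a self-contained, hypothetical model (the node names only stand in for Catalyst's logical operators), showing how the bound is known only when both Union children are bounded:

```scala
// Toy logical-plan nodes; maxRows mirrors the semantics suggested above.
sealed trait Plan { def maxRows: Option[Long] }
case class Relation(name: String) extends Plan {
  val maxRows: Option[Long] = None // unbounded source
}
case class Limit(n: Long, child: Plan) extends Plan {
  // Bounded by n, or tighter if the child is already more constrained.
  val maxRows: Option[Long] = Some(child.maxRows.fold(n)(math.min(n, _)))
}
case class Filter(child: Plan) extends Plan {
  val maxRows: Option[Long] = child.maxRows // can only drop rows
}
case class Distinct(child: Plan) extends Plan {
  val maxRows: Option[Long] = child.maxRows // de-duplication never adds rows
}
case class Union(left: Plan, right: Plan) extends Plan {
  // UNION ALL: at most left + right rows, known only when both are known.
  val maxRows: Option[Long] =
    for (l <- left.maxRows; r <- right.maxRows) yield l + r
}

object MaxRowsDemo extends App {
  val u = Union(Limit(5, Relation("a")), Limit(3, Relation("b")))
  assert(u.maxRows == Some(8L))
  assert(Filter(Relation("a")).maxRows.isEmpty)
  println(u.maxRows)
}
```

The for-comprehension over the two `Option`s is exactly the shape of the `Union` rule proposed in the comment.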
[GitHub] spark pull request: [SPARK-12507][Streaming][Document]Expose close...
Github user BenFradet commented on a diff in the pull request: https://github.com/apache/spark/pull/10453#discussion_r48559015 --- Diff: docs/configuration.md --- @@ -1600,6 +1600,33 @@ Apart from these, the following properties are also available, and may be useful How many batches the Spark Streaming UI and status APIs remember before garbage collecting. + + spark.streaming.driver.writeAheadLog.closeFileAfterWrite + false + +Whether to close the file after writing a write ahead log record in driver. Because S3 doesn't +support flushing of data, when using S3 for checkpointing, you should enable it to achieve read +after write consistency. + + + + spark.streaming.receiver.writeAheadLog.closeFileAfterWrite + false + +Whether to close the file after writing a write ahead log record in receivers. Because S3 --- End diff -- same thing here: `on the receivers` instead of `in receivers`
[GitHub] spark pull request: [SPARK-12507][Streaming][Document]Expose close...
Github user BenFradet commented on a diff in the pull request: https://github.com/apache/spark/pull/10453#discussion_r48558980 --- Diff: docs/configuration.md --- @@ -1600,6 +1600,33 @@ Apart from these, the following properties are also available, and may be useful How many batches the Spark Streaming UI and status APIs remember before garbage collecting. + + spark.streaming.driver.writeAheadLog.closeFileAfterWrite + false + +Whether to close the file after writing a write ahead log record in driver. Because S3 doesn't --- End diff -- I'd say `on the driver` instead of `in driver`.
[GitHub] spark pull request: [SPARK-12470] [SQL] Fix size reduction calcula...
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/10421#discussion_r48562209 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/GenerateUnsafeRowJoiner.scala --- @@ -171,7 +171,7 @@ object GenerateUnsafeRowJoiner extends CodeGenerator[(StructType, StructType), U |// row1: ${schema1.size} fields, $bitset1Words words in bitset |// row2: ${schema2.size}, $bitset2Words words in bitset |// output: ${schema1.size + schema2.size} fields, $outputBitsetWords words in bitset - |final int sizeInBytes = row1.getSizeInBytes() + row2.getSizeInBytes(); + |final int sizeInBytes = row1.getSizeInBytes() + row2.getSizeInBytes() - ($sizeReduction * 8); --- End diff -- It may be better to use number of bytes for `sizeReduction` (also update the comments).
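For context on the diff above: UnsafeRow null-tracking bitsets are sized in 64-bit words (one word per 64 fields, rounded up), so joining two rows can need fewer bitset words than the two inputs held separately, and the subtraction converts that saving from words to bytes. A self-contained sketch of the arithmetic (assuming the 64-bit word layout described in the diff's comments):

```scala
object BitsetWordsDemo extends App {
  // One 64-bit word per 64 fields, rounded up.
  def bitsetWords(numFields: Int): Int = (numFields + 63) / 64

  // 60 + 10 fields: 1 + 1 words separately, 70 fields -> 2 words joined.
  val wordsSeparate = bitsetWords(60) + bitsetWords(10)
  val wordsJoined = bitsetWords(60 + 10)
  val reductionBytes = (wordsSeparate - wordsJoined) * 8
  assert(reductionBytes == 0) // no saving in this case

  // 30 + 30 fields: the joined bitset needs one word instead of two.
  val saved = (bitsetWords(30) + bitsetWords(30) - bitsetWords(60)) * 8
  assert(saved == 8)
  println(saved)
}
```

This also illustrates why the reviewer suggests expressing `sizeReduction` directly in bytes: the `* 8` word-to-byte conversion is easy to miscount.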
[GitHub] spark pull request: [SPARK-12554][Core]Standalone scheduler hangs ...
Github user andrewor14 commented on the pull request: https://github.com/apache/spark/pull/10507#issuecomment-167857621 @JerryLead I commented on the JIRA on why I don't think it's an issue. Let's move the discussion there.
[GitHub] spark pull request: [SPARK-12362][SQL][WIP] Inline Hive Parser
Github user hvanhovell commented on the pull request: https://github.com/apache/spark/pull/10509#issuecomment-167836899 retest this please
[GitHub] spark pull request: [SPARK-12362][SQL][WIP] Inline Hive Parser
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10509#issuecomment-167846349 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-12362][SQL][WIP] Inline Hive Parser
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10509#issuecomment-167846350 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/48418/ Test FAILed.
[GitHub] spark pull request: [SPARK-12507][Streaming][Document]Expose close...
Github user BenFradet commented on a diff in the pull request: https://github.com/apache/spark/pull/10453#discussion_r48559086 --- Diff: docs/configuration.md --- @@ -1600,6 +1600,33 @@ Apart from these, the following properties are also available, and may be useful How many batches the Spark Streaming UI and status APIs remember before garbage collecting. + + spark.streaming.driver.writeAheadLog.closeFileAfterWrite + false + +Whether to close the file after writing a write ahead log record in driver. Because S3 doesn't +support flushing of data, when using S3 for checkpointing, you should enable it to achieve read +after write consistency. + + + + spark.streaming.receiver.writeAheadLog.closeFileAfterWrite + false + +Whether to close the file after writing a write ahead log record in receivers. Because S3 +doesn't support flushing of data, when using S3 for checkpointing, you should enable it to +achieve read after write consistency. + + + + spark.streaming.driver.writeAheadLog.allowBatching + false + +Whether to batch write ahead logs in driver to write. When using S3 for checkpointing, write --- End diff -- Here, I'd say `on the driver` instead of `in driver to write`.
[GitHub] spark pull request: [SPARK-12530][Build] Fix build break at Spark-...
Github user yhuai commented on the pull request: https://github.com/apache/spark/pull/10488#issuecomment-167849606 Let me merge this one first to fix the Spark master Maven snapshot. Then, can you create another JIRA to update the other places?
[GitHub] spark pull request: [SPARK-12530][Build] Fix build break at Spark-...
Github user kiszk commented on the pull request: https://github.com/apache/spark/pull/10488#issuecomment-167853136 I see. I will create another JIRA entry to update other usages.
[GitHub] spark pull request: [SPARK-12415] Do not use closure serializer to...
Github user andrewor14 commented on the pull request: https://github.com/apache/spark/pull/10368#issuecomment-167855990 What first two days of work are you referring to? The problem is we just can't use Kryo for serializing task results because there might be unregistered classes. Because of this constraint we can't use `spark.serializer` here since the user can specify Kryo there.
[GitHub] spark pull request: [SPARK-11638] [Mesos + Docker Bridge networkin...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/9608#issuecomment-167826650 **[Test build #48416 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48416/consoleFull)** for PR 9608 at commit [`b712b8d`](https://github.com/apache/spark/commit/b712b8d3f0bd11575533af6bb5931df096bce239).
[GitHub] spark pull request: [SPARK-12480][SQL] add Hash expression that ca...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/10435#discussion_r48554681 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/misc.scala --- @@ -176,3 +179,223 @@ case class Crc32(child: Expression) extends UnaryExpression with ImplicitCastInp }) } } + +/** + * A function that calculates hash value for a group of expressions. + * + * The hash value for an expression depends on its type: + * - null: 0 + * - boolean:0 for true, 1 for false. --- End diff -- Let's also add comments to explain the benefit of this function.
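The per-type contract quoted in the diff above can be sketched as a plain type dispatch. This is a hypothetical, standalone model only (not the Catalyst implementation, which uses a dedicated hash algorithm); it follows the mapping as literally quoted, including `0` for `true`:

```scala
object SimpleHashDemo extends App {
  // Toy dispatch mirroring the contract quoted in the diff:
  // null -> 0; boolean -> 0 for true, 1 for false; other types would follow.
  def hashValue(v: Any): Int = v match {
    case null       => 0
    case b: Boolean => if (b) 0 else 1
    case i: Int     => i
    case s: String  => s.hashCode // stand-in for a real string hash
  }

  // A group of expressions is typically combined by folding a seed
  // over each value, so equal groups hash equally.
  def hashGroup(values: Seq[Any], seed: Int = 42): Int =
    values.foldLeft(seed)((h, v) => 31 * h + hashValue(v))

  assert(hashValue(null) == 0)
  assert(hashValue(true) == 0)
  assert(hashGroup(Seq(1, null, false)) == hashGroup(Seq(1, null, false)))
}
```

The benefit the reviewer asks to document is exactly this: a deterministic, type-aware hash over a group of expressions, usable for partitioning and shuffles without serializing whole rows.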
[GitHub] spark pull request: [SPARK-12362][SQL][WIP] Inline Hive Parser
Github user hvanhovell commented on the pull request: https://github.com/apache/spark/pull/10509#issuecomment-167847029 retest this please
[GitHub] spark pull request: [SPARK-12362][SQL][WIP] Inline Hive Parser
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10509#issuecomment-167847052 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-11638] [Mesos + Docker Bridge networkin...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/9608#issuecomment-167847920 **[Test build #48416 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48416/consoleFull)** for PR 9608 at commit [`b712b8d`](https://github.com/apache/spark/commit/b712b8d3f0bd11575533af6bb5931df096bce239). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-11638] [Mesos + Docker Bridge networkin...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9608#issuecomment-167848079 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/48416/ Test PASSed.
[GitHub] spark pull request: [SPARK-11638] [Mesos + Docker Bridge networkin...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9608#issuecomment-167848076 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-12507][Streaming][Document]Expose close...
Github user BenFradet commented on a diff in the pull request: https://github.com/apache/spark/pull/10453#discussion_r48559209 --- Diff: docs/configuration.md --- @@ -1600,6 +1600,33 @@ Apart from these, the following properties are also available, and may be useful How many batches the Spark Streaming UI and status APIs remember before garbage collecting. + + spark.streaming.driver.writeAheadLog.closeFileAfterWrite + false + +Whether to close the file after writing a write ahead log record in driver. Because S3 doesn't +support flushing of data, when using S3 for checkpointing, you should enable it to achieve read +after write consistency. + + + + spark.streaming.receiver.writeAheadLog.closeFileAfterWrite + false + +Whether to close the file after writing a write ahead log record in receivers. Because S3 +doesn't support flushing of data, when using S3 for checkpointing, you should enable it to +achieve read after write consistency. + + + + spark.streaming.driver.writeAheadLog.allowBatching + false + +Whether to batch write ahead logs in driver to write. When using S3 for checkpointing, write +operations in driver usually take too long. Enable batching write ahead logs will improve --- End diff -- same: `on the` and `Enabling` instead of `Enable`
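The three properties under review in this doc change are plain boolean configs. For illustration only (values are examples for an S3-backed checkpoint setup, not recommendations), they would be set like:

```scala
import org.apache.spark.SparkConf

// Illustrative: enabling the write-ahead-log options discussed above
// when checkpointing to a store without flush support, such as S3.
val conf = new SparkConf()
  .set("spark.streaming.driver.writeAheadLog.closeFileAfterWrite", "true")
  .set("spark.streaming.receiver.writeAheadLog.closeFileAfterWrite", "true")
  .set("spark.streaming.driver.writeAheadLog.allowBatching", "true")
```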
[GitHub] spark pull request: [SPARK-12470] [SQL] Fix size reduction calcula...
Github user davies commented on the pull request: https://github.com/apache/spark/pull/10421#issuecomment-167855300 @rxin I had finished the refactoring a long time ago.
[GitHub] spark pull request: [SPARK-12560][SQL] SqlTestUtils.stripSparkFilt...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10510#issuecomment-167899182 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/48423/ Test PASSed.
[GitHub] spark pull request: [SPARK-12362][SQL][WIP] Inline Hive Parser
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10509#issuecomment-167900502 **[Test build #48430 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48430/consoleFull)** for PR 10509 at commit [`b8e76b2`](https://github.com/apache/spark/commit/b8e76b257063db79f05a83aa4a05578ce8807c03).
[GitHub] spark pull request: [SPARK-12362][SQL][WIP] Inline Hive Parser
Github user hvanhovell commented on the pull request: https://github.com/apache/spark/pull/10509#issuecomment-167901239 retest this please
[GitHub] spark pull request: [SPARK-12362][SQL][WIP] Inline Hive Parser
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10509#issuecomment-167901126 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-12362][SQL][WIP] Inline Hive Parser
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10509#issuecomment-167901129 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/48430/ Test FAILed.
[GitHub] spark pull request: [SPARK-12503] [SQL] Pushing Limit Through Unio...
Github user gatorsmile commented on the pull request: https://github.com/apache/spark/pull/10451#issuecomment-167902518 After rethinking the `Limit` push-down rules, we are unable to push Limit through any operator that could change the values or the number of rows. Thus, so far, the eligible candidates are `Project`, `Union All` and `Outer/LeftOuter/RightOuter Join`. Please correct me if my understanding is not right. Feel free to let me know if the code needs an update. Thank you!
[GitHub] spark pull request: [SPARK-12511][PySpark][Streaming]Make sure Tra...
Github user davies commented on the pull request: https://github.com/apache/spark/pull/10514#issuecomment-167902559 LGTM
[GitHub] spark pull request: [SPARK-12503] [SQL] Pushing Limit Through Unio...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/10451#discussion_r48581020 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala --- @@ -80,6 +81,33 @@ abstract class Optimizer extends RuleExecutor[LogicalPlan] { object DefaultOptimizer extends Optimizer /** + * Pushes down Limit for reducing the amount of returned data. + * + * 1. Adding Extra Limit beneath the operations, including Union All. + * 2. Project is pushed through Limit in the rule ColumnPruning + * + * Any operator that a Limit can be pushed past should override the maxRows function. + */ +object PushDownLimit extends Rule[LogicalPlan] { + + def apply(plan: LogicalPlan): LogicalPlan = plan transform { + +// Adding extra Limit below UNION ALL iff both left and right children are not Limit or +// do not have Limit descendants. This heuristic is valid assuming there does not exist +// any Limit push-down rule that is unable to infer the value of maxRows. +// Note, right now, Union means UNION ALL, which does not de-duplicate rows. So, it is +// safe to push down Limit through it. Once we add UNION DISTINCT, we will not be able to +// push down Limit. +case Limit(exp, Union(left, right)) + if left.maxRows.isEmpty || right.maxRows.isEmpty => --- End diff -- Yeah, you are right. : )
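The guard in the diff above (`left.maxRows.isEmpty || right.maxRows.isEmpty`) is what keeps the rule from firing forever on its own output. A minimal sketch of that rewrite, using hypothetical plan classes rather than Catalyst's:

```python
# Hypothetical plan nodes (not Catalyst): each exposes max_rows, which is
# None when the node's output size is unbounded, mirroring Option[Long].
from dataclasses import dataclass
from typing import Optional


@dataclass
class Relation:
    name: str

    @property
    def max_rows(self) -> Optional[int]:
        return None  # unbounded scan


@dataclass
class Limit:
    n: int
    child: object

    @property
    def max_rows(self) -> Optional[int]:
        return self.n


@dataclass
class Union:
    left: object
    right: object

    @property
    def max_rows(self) -> Optional[int]:
        if self.left.max_rows is None or self.right.max_rows is None:
            return None
        return self.left.max_rows + self.right.max_rows


def push_down_limit(plan):
    # Fire only when at least one Union child is not already bounded,
    # mirroring the `left.maxRows.isEmpty || right.maxRows.isEmpty` guard.
    if (isinstance(plan, Limit) and isinstance(plan.child, Union)
            and (plan.child.left.max_rows is None
                 or plan.child.right.max_rows is None)):
        u = plan.child
        return Limit(plan.n, Union(Limit(plan.n, u.left),
                                   Limit(plan.n, u.right)))
    return plan


p = push_down_limit(Limit(5, Union(Relation("a"), Relation("b"))))
assert isinstance(p.child.left, Limit) and p.child.left.n == 5
assert push_down_limit(p) is p  # guard prevents a second application
```

Once both children carry a `Limit`, their `max_rows` is defined, the guard fails, and the rule reaches a fixed point in one pass.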
[GitHub] spark pull request: [SPARK-12511][PySpark][Streaming]Make sure Tra...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10514#issuecomment-167905088 **[Test build #48429 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48429/consoleFull)** for PR 10514 at commit [`9be0d0a`](https://github.com/apache/spark/commit/9be0d0a01edbb6615871c84f6d8f4b608501a8f0). * This patch **fails PySpark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-12511][PySpark][Streaming]Make sure Tra...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10514#issuecomment-167905122 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-12511][PySpark][Streaming]Make sure Tra...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10514#issuecomment-167905123 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/48429/ Test FAILed.
[GitHub] spark pull request: [SPARK-12568] [SQL] Add BINARY to Encoders
GitHub user marmbrus opened a pull request: https://github.com/apache/spark/pull/10516 [SPARK-12568] [SQL] Add BINARY to Encoders You can merge this pull request into a Git repository by running: $ git pull https://github.com/marmbrus/spark datasetCleanup Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/10516.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #10516 commit a2c98795fbe217efc065be2ab0f1a5400d7653f6 Author: Michael Armbrust Date: 2015-12-24T05:51:39Z WIP
[GitHub] spark pull request: [SPARK-12481] [CORE] [STREAMING] [SQL] Remove ...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/10446#discussion_r48581372 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/WriterContainer.scala --- @@ -24,10 +24,9 @@ import scala.collection.JavaConverters._ import org.apache.hadoop.fs.Path import org.apache.hadoop.mapreduce._ import org.apache.hadoop.mapreduce.lib.output.{FileOutputCommitter => MapReduceFileOutputCommitter} +import org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl import org.apache.spark._ --- End diff -- nit: not your fault but a good opportunity to add a blank line here.
[GitHub] spark pull request: [SPARK-12481] [CORE] [STREAMING] [SQL] Remove ...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/10446#discussion_r48581350 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/SqlNewHadoopRDD.scala --- @@ -26,10 +26,10 @@ import org.apache.hadoop.conf.{Configurable, Configuration} import org.apache.hadoop.io.Writable import org.apache.hadoop.mapreduce._ import org.apache.hadoop.mapreduce.lib.input.{CombineFileSplit, FileSplit} +import org.apache.hadoop.mapreduce.task.{TaskAttemptContextImpl, JobContextImpl} --- End diff -- nit: order
[GitHub] spark pull request: [SPARK-12481] [CORE] [STREAMING] [SQL] Remove ...
Github user vanzin commented on the pull request: https://github.com/apache/spark/pull/10446#issuecomment-167907581 LGTM; there's at least one extra possible cleanup, but feel free to punt on that one.
[GitHub] spark pull request: [SPARK-12511][PySpark][Streaming]Make sure Tra...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10514#issuecomment-167912484 **[Test build #48438 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48438/consoleFull)** for PR 10514 at commit [`9be0d0a`](https://github.com/apache/spark/commit/9be0d0a01edbb6615871c84f6d8f4b608501a8f0).
[GitHub] spark pull request: [SPARK-12549][SQL] Take Option[Seq[DataType]] ...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/10504
[GitHub] spark pull request: [SPARK-10906][MLlib] More efficient SparseMatr...
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/8960#issuecomment-167914321 @rahulpalamuttam Sorry for the delay in reviewing! How would you feel about updating the implementation in Breeze, rather than in Spark? I expect you could use much of the code you've already written, and I'd be happy to help review your PR to Breeze if it's helpful.
[GitHub] spark pull request: [SPARK-12562][SQL] DataFrame.write.format(text...
Github user xguo27 commented on the pull request: https://github.com/apache/spark/pull/10515#issuecomment-167915743 @marmbrus Thanks Michael for your feedback! Looks like 'value' is there to give the single string column an arbitrary name. The current implementation strips schema information when creating TextRelation (after verifying that the schema is a single field of string type). That is fine during read, but fails during write. Would you mind taking another look at my updated change?
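The invariant being discussed can be shown with a small sketch (hypothetical helper, not the actual TextRelation code): a text data source accepts exactly one string-typed column, and the column name, such as the default "value", is arbitrary, so the reader can discard it while the writer must still be able to round-trip the schema.

```python
# Hypothetical schema check: a schema is a list of (name, type) pairs.
def validate_text_schema(schema):
    """Accept only a single string column; return its (arbitrary) name."""
    if len(schema) != 1 or schema[0][1] != "string":
        raise ValueError("Text data source supports only a single string column")
    return schema[0][0]

assert validate_text_schema([("value", "string")]) == "value"
assert validate_text_schema([("anything", "string")]) == "anything"

try:
    validate_text_schema([("a", "string"), ("b", "int")])
    rejected = False
except ValueError:
    rejected = True
assert rejected  # multi-column schemas are refused up front
```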
[GitHub] spark pull request: [SPARK-12480][SQL] add Hash expression that ca...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10435#issuecomment-167918200 **[Test build #48442 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48442/consoleFull)** for PR 10435 at commit [`8703b1a`](https://github.com/apache/spark/commit/8703b1a127235c49614d326334548f125b81383b).
[GitHub] spark pull request: [SPARK-12362][SQL][WIP] Inline Hive Parser
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/10509#issuecomment-16798 I'm going to merge this pull request given its size. @hvanhovell please submit follow-up PRs to address the TODOs. Thanks!
[GitHub] spark pull request: [SPARK-12511][PySpark][Streaming]Make sure Tra...
GitHub user zsxwing opened a pull request: https://github.com/apache/spark/pull/10514 [SPARK-12511][PySpark][Streaming]Make sure TransformFunctionSerializer is created only once Although SPARK-12511 is caused by a Py4j issue in which PythonProxyHandler.finalize blocks forever, we can bypass it. When checkpointing is enabled, `Streaming._ensure_initialized` is currently called twice and creates two `TransformFunctionSerializer`s. Because the first `TransformFunctionSerializer` is replaced and GCed, `PythonProxyHandler.finalize` is triggered. Actually, we only need one `TransformFunctionSerializer`. This PR adds a simple check to avoid creating `TransformFunctionSerializer` multiple times; then `PythonProxyHandler.finalize` won't be called, since the same `TransformFunctionSerializer` will always be used. You can merge this pull request into a Git repository by running: $ git pull https://github.com/zsxwing/spark SPARK-12511 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/10514.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #10514 commit 9be0d0a01edbb6615871c84f6d8f4b608501a8f0 Author: Shixiong Zhu Date: 2015-12-29T23:14:16Z Make sure TransformFunctionSerializer is created only once
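The fix described above is an idempotent-initialization guard. A minimal sketch with stub classes (not the actual `pyspark.streaming.context` code): initialize the serializer only when it does not already exist, so a second call on the checkpoint-recovery path reuses the first instance instead of replacing it and letting the old one be garbage-collected.

```python
# Stub standing in for TransformFunctionSerializer; counts constructions
# so we can observe that only one instance is ever created.
class TransformFunctionSerializerStub:
    instances = 0

    def __init__(self):
        TransformFunctionSerializerStub.instances += 1


class StreamingContextStub:
    _transformerSerializer = None

    @classmethod
    def _ensure_initialized(cls):
        # The guard: create the serializer only on the first call.
        if cls._transformerSerializer is None:
            cls._transformerSerializer = TransformFunctionSerializerStub()
        return cls._transformerSerializer


s1 = StreamingContextStub._ensure_initialized()
s2 = StreamingContextStub._ensure_initialized()  # e.g. checkpoint recovery
assert s1 is s2
assert TransformFunctionSerializerStub.instances == 1
```

Without the `is None` check, the second call would rebind `_transformerSerializer` to a fresh object, and the first instance would become garbage, which is exactly the replace-and-GC sequence the PR description says triggers `PythonProxyHandler.finalize`.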
[GitHub] spark pull request: [SPARK-12560][SQL] SqlTestUtils.stripSparkFilt...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10510#issuecomment-167899086 **[Test build #48423 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48423/consoleFull)** for PR 10510 at commit [`308294a`](https://github.com/apache/spark/commit/308294ac538a3215ce2d5f51297556586f0ade5c). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-5095][MESOS] Support capping cores and ...
Github user andrewor14 commented on the pull request: https://github.com/apache/spark/pull/4027#issuecomment-167899082 > Earlier, you had also suggested offering an option for the amount of memory per executor. Is that still valid in your proposal? What do you mean? You can already do that through `spark.executor.memory`, even before this patch. > At one point, you also suggested that the framework should also execute as many executors as needed to use all or nearly all the cores on each node. I would prefer that this is overridable by specifying the maximum number of executors to use per node. This makes it easier to use Spark on a cluster shared by multiple users or applications. I agree, though we should try to come up with a minimal set of configurations that conflict with each other least. I haven't decided exactly what those would look like but it could come in a later patch. > It's really unfortunate that this patch was closed without merging. Actually it will be re-opened shortly, just with a slightly different approach. I believe @tnachen is currently on vacation but once he comes back we'll move forward again. :)
[GitHub] spark pull request: [SPARK-12560][SQL] SqlTestUtils.stripSparkFilt...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10510#issuecomment-167899180 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-12511][PySpark][Streaming]Make sure Tra...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10514#issuecomment-167900273 **[Test build #48429 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48429/consoleFull)** for PR 10514 at commit [`9be0d0a`](https://github.com/apache/spark/commit/9be0d0a01edbb6615871c84f6d8f4b608501a8f0).
[GitHub] spark pull request: [SPARK-12503] [SQL] Pushing Limit Through Unio...
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/10451#discussion_r48578529 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala --- @@ -69,6 +73,33 @@ object DefaultOptimizer extends Optimizer { } /** + * Pushes down Limit for reducing the amount of returned data. + * + * 1. Adding Extra Limit beneath the operations, including Union All. + * 2. Project is pushed through Limit in the rule ColumnPruning + * + * Any operator that a Limit can be pushed past should override the maxRows function. + * + * Note: This rule has to be done when the logical plan is stable; + * Otherwise, it could impact the other rules. --- End diff -- I'm not sure what this means? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well.
[GitHub] spark pull request: [SPARK-12286] [SPARK-12290] [SPARK-12294] [SQL...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10511#issuecomment-167901091 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/48425/ Test FAILed.
[GitHub] spark pull request: [SPARK-12286] [SPARK-12290] [SPARK-12294] [SQL...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10511#issuecomment-167901090 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-12286] [SPARK-12290] [SPARK-12294] [SQL...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10511#issuecomment-167901009 **[Test build #48432 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48432/consoleFull)** for PR 10511 at commit [`513597c`](https://github.com/apache/spark/commit/513597c3172cec7e68bd15f9c543533248d1c3e3).
[GitHub] spark pull request: [SPARK-12286] [SPARK-12290] [SPARK-12294] [SQL...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10511#issuecomment-167901053 **[Test build #48425 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48425/consoleFull)** for PR 10511 at commit [`cba3934`](https://github.com/apache/spark/commit/cba393448c2d581bd62e31d3181a11e290a2a83d). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-12362][SQL][WIP] Inline Hive Parser
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10509#issuecomment-167901123 **[Test build #48430 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48430/consoleFull)** for PR 10509 at commit [`b8e76b2`](https://github.com/apache/spark/commit/b8e76b257063db79f05a83aa4a05578ce8807c03). * This patch **fails to build**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-12507][Streaming][Document]Expose close...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10453#issuecomment-167902755 **[Test build #48436 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48436/consoleFull)** for PR 10453 at commit [`bce7a29`](https://github.com/apache/spark/commit/bce7a29de2966024103258031eeecb369e6d45b4).
[GitHub] spark pull request: [SPARK-10359] Enumerate dependencies in a file...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10461#issuecomment-167906816 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-12481] [CORE] [STREAMING] [SQL] Remove ...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/10446#discussion_r48581268 --- Diff: core/src/main/scala/org/apache/spark/scheduler/EventLoggingListener.scala --- @@ -97,7 +97,7 @@ private[spark] class EventLoggingListener( * Creates the log file in the configured log directory. */ def start() { -if (!fileSystem.getFileStatus(new Path(logBaseDir)).isDir) { +if (!fileSystem.getFileStatus(new Path(logBaseDir)).isDirectory) { --- End diff -- Unrelated to this line: this class has the `hadoopFlushMethod` hack which probably can go away now, if you want to do more cleanup.
[GitHub] spark pull request: [SPARK-12503] [SQL] Pushing Limit Through Unio...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/10451#discussion_r48581249 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala --- @@ -91,6 +91,11 @@ abstract class LogicalPlan extends QueryPlan[LogicalPlan] with Logging { } /** + * Returns the limited number of rows to be returned. --- End diff -- Actually we will push down `Project` through `Limit` in `ColumnPruning`.
[GitHub] spark pull request: [SPARK-10359] Enumerate dependencies in a file...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10461#issuecomment-167906817 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/48428/ Test PASSed.
[GitHub] spark pull request: [SPARK-12490][Core]Limit the css style scope t...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10517#issuecomment-167910392 **[Test build #48439 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48439/consoleFull)** for PR 10517 at commit [`414b274`](https://github.com/apache/spark/commit/414b27416ae51c644bc1b8fb4d8226e945809d7b).
[GitHub] spark pull request: [SPARK-5479] [yarn] Handle --py-files correctl...
Github user zjffdu commented on the pull request: https://github.com/apache/spark/pull/6360#issuecomment-167912850 Thanks @vanzin, my fault: I specified the HDFS location using the hostname, while it is an IP address in core-site.xml (anyway, maybe we can improve this here)
[GitHub] spark pull request: [SPARK-12503] [SQL] Pushing Limit Through Unio...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/10451#discussion_r48582039 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala --- @@ -80,6 +81,33 @@ abstract class Optimizer extends RuleExecutor[LogicalPlan] { object DefaultOptimizer extends Optimizer /** + * Pushes down Limit for reducing the amount of returned data. + * + * 1. Adding Extra Limit beneath the operations, including Union All. + * 2. Project is pushed through Limit in the rule ColumnPruning + * + * Any operator that a Limit can be pushed past should override the maxRows function. + */ +object PushDownLimit extends Rule[LogicalPlan] { + + def apply(plan: LogicalPlan): LogicalPlan = plan transform { + +// Adding extra Limit below UNION ALL iff both left and right children are not Limit or +// do not have Limit descendants. This heuristic is valid assuming there does not exist +// any Limit push-down rule that is unable to infer the value of maxRows. +// Note, right now, Union means UNION ALL, which does not de-duplicate rows. So, it is +// safe to push down Limit through it. Once we add UNION DISTINCT, we will not be able to +// push down Limit. +case Limit(exp, Union(left, right)) + if left.maxRows.isEmpty || right.maxRows.isEmpty => --- End diff -- Yeah, that also makes sense. Will make the change after these three running test cases finish. : )
[GitHub] spark pull request: [SPARK-12286] [SPARK-12290] [SPARK-12294] [SQL...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10511#issuecomment-167914257 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/48432/ Test FAILed.
[GitHub] spark pull request: [SPARK-12286] [SPARK-12290] [SPARK-12294] [SQL...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10511#issuecomment-167914254 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-12286] [SPARK-12290] [SPARK-12294] [SQL...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10511#issuecomment-167914227 **[Test build #48432 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48432/consoleFull)** for PR 10511 at commit [`513597c`](https://github.com/apache/spark/commit/513597c3172cec7e68bd15f9c543533248d1c3e3). * This patch **fails PySpark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-7995][SPARK-6280][Core]Remove AkkaRpcEn...
Github user zsxwing commented on the pull request: https://github.com/apache/spark/pull/10459#issuecomment-167914770 CC @vanzin @andrewor14