[GitHub] spark pull request: [SPARK-12530][Build] Fix build break at Spark-...
Github user yhuai commented on the pull request: https://github.com/apache/spark/pull/10488#issuecomment-167834296 Can you update the pr? Once it passes jenkins, I will merge it. Thanks! --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-12362][SQL][WIP] Inline Hive Parser
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10509#issuecomment-167842481 **[Test build #48417 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48417/consoleFull)** for PR 10509 at commit [`cb60ba0`](https://github.com/apache/spark/commit/cb60ba045ff6663ed83c308b2423bdb87152a092).
[GitHub] spark pull request: [SPARK-12362][SQL][WIP] Inline Hive Parser
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10509#issuecomment-167847055 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/48414/ Test FAILed.
[GitHub] spark pull request: [SPARK-12507][Streaming][Document]Expose close...
Github user BenFradet commented on the pull request: https://github.com/apache/spark/pull/10453#issuecomment-167848381 I have a few comments on phrasing, but otherwise it LGTM.
[GitHub] spark pull request: [SPARK-12079][BUILD][SQL] Run Catalyst subproj...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/10077#issuecomment-167848248 Closing this for now; this is blocked on an investigation into custom log appenders in tests in order to fix the log interleaving problems, as well as an investigation into the build hang issue.
[GitHub] spark pull request: [SPARK-12507][Streaming][Document]Expose close...
Github user BenFradet commented on a diff in the pull request: https://github.com/apache/spark/pull/10453#discussion_r48559339 --- Diff: docs/streaming-programming-guide.md --- @@ -2029,6 +2029,11 @@ If the data is being received by the receivers faster than what can be processed you can limit the rate by setting the [configuration parameter](configuration.html#spark-streaming) `spark.streaming.receiver.maxRate`. +If using S3 for checkpointing, please remember to enable `spark.streaming.driver.writeAheadLog.closeFileAfterWrite` +and `spark.streaming.receiver.writeAheadLog.closeFileAfterWrite`. You can also enable +`spark.streaming.driver.writeAheadLog.allowBatching` to improve the performance of writing write +ahead logs in driver. See [Spark Streaming Configuration](configuration.html#spark-streaming) or more details. --- End diff -- `on the driver` and `for more details`
[GitHub] spark pull request: [SPARK-12079][BUILD][SQL] Run Catalyst subproj...
Github user JoshRosen closed the pull request at: https://github.com/apache/spark/pull/10077
[GitHub] spark pull request: [SPARK-12507][Streaming][Document]Expose close...
Github user BenFradet commented on a diff in the pull request: https://github.com/apache/spark/pull/10453#discussion_r48559275 --- Diff: docs/configuration.md --- @@ -1600,6 +1600,33 @@ Apart from these, the following properties are also available, and may be useful How many batches the Spark Streaming UI and status APIs remember before garbage collecting. + + spark.streaming.driver.writeAheadLog.closeFileAfterWrite + false + +Whether to close the file after writing a write ahead log record in driver. Because S3 doesn't +support flushing of data, when using S3 for checkpointing, you should enable it to achieve read +after write consistency. + + + + spark.streaming.receiver.writeAheadLog.closeFileAfterWrite + false + +Whether to close the file after writing a write ahead log record in receivers. Because S3 +doesn't support flushing of data, when using S3 for checkpointing, you should enable it to +achieve read after write consistency. + + + + spark.streaming.driver.writeAheadLog.allowBatching + false + +Whether to batch write ahead logs in driver to write. When using S3 for checkpointing, write +operations in driver usually take too long. Enable batching write ahead logs will improve +the performance of writing. --- End diff -- I'd say `will improve the performance of write operations`
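For reference, the write-ahead-log properties quoted in the diff above can be collected as plain key/value pairs, as one would pass them to `SparkConf.set`. This is a hypothetical sketch: the property names come from the quoted docs, but the value `"true"` (enabling them for S3-backed checkpointing) and the map itself are illustrative.

```scala
// Sketch (assumption: you want S3-safe checkpointing, so all three are "true").
// These are the property keys from the configuration.md diff under review.
val s3CheckpointProps: Map[String, String] = Map(
  "spark.streaming.driver.writeAheadLog.closeFileAfterWrite" -> "true",
  "spark.streaming.receiver.writeAheadLog.closeFileAfterWrite" -> "true",
  "spark.streaming.driver.writeAheadLog.allowBatching" -> "true"
)

// Each pair would be applied via SparkConf.set(key, value) before creating
// the StreamingContext.
s3CheckpointProps.foreach { case (k, v) => println(s"$k=$v") }
```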
[GitHub] spark pull request: [SPARK-12539][SQL][WIP] support writing bucket...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10498#issuecomment-167823882 **[Test build #48415 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48415/consoleFull)** for PR 10498 at commit [`a9dc997`](https://github.com/apache/spark/commit/a9dc99722bfea886c6381abbd2e1e9366fcf9064).
[GitHub] spark pull request: [SPARK-11638] [Mesos + Docker Bridge networkin...
Github user radekg commented on the pull request: https://github.com/apache/spark/pull/9608#issuecomment-167826085 I have 3 tests failing locally but I don't think these are related to my changes. `scalastyle` seems to be ok now. Failing tests: ``` - launch simple application with spark-submit *** FAILED *** Process returned with exit code 1. See the log4j logs for more detail. (SparkSubmitSuite.scala:583) warning: /Library/Java/JavaVirtualMachines/jdk1.8.0_66.jdk/Contents/Home/lib/ct.sym(META-INF/sym/rt.jar/java/io/Serializable.class): major version 52 is newer than 51, the highest major version supported by this compiler. It is recommended that the compiler be upgraded. warning: /Library/Java/JavaVirtualMachines/jdk1.8.0_66.jdk/Contents/Home/lib/ct.sym(META-INF/sym/rt.jar/java/lang/Object.class): major version 52 is newer than 51, the highest major version supported by this compiler. It is recommended that the compiler be upgraded. warning: /Library/Java/JavaVirtualMachines/jdk1.8.0_66.jdk/Contents/Home/lib/ct.sym(META-INF/sym/rt.jar/java/lang/String.class): major version 52 is newer than 51, the highest major version supported by this compiler. It is recommended that the compiler be upgraded. 
/Library/Java/JavaVirtualMachines/jdk1.8.0_66.jdk/Contents/Home/lib/ct.sym(META-INF/sym/rt.jar/java/io/Serializable.class): warning: Cannot find annotation method 'value()' in type 'Profile+Annotation': class file for jdk.Profile+Annotation not found /Library/Java/JavaVirtualMachines/jdk1.8.0_66.jdk/Contents/Home/lib/ct.sym(META-INF/sym/rt.jar/java/lang/Object.class): warning: Cannot find annotation method 'value()' in type 'Profile+Annotation' /Library/Java/JavaVirtualMachines/jdk1.8.0_66.jdk/Contents/Home/lib/ct.sym(META-INF/sym/rt.jar/java/lang/String.class): warning: Cannot find annotation method 'value()' in type 'Profile+Annotation' warning: /Library/Java/JavaVirtualMachines/jdk1.8.0_66.jdk/Contents/Home/lib/ct.sym(META-INF/sym/rt.jar/java/lang/Override.class): major version 52 is newer than 51, the highest major version supported by this compiler. It is recommended that the compiler be upgraded. warning: /Library/Java/JavaVirtualMachines/jdk1.8.0_66.jdk/Contents/Home/lib/ct.sym(META-INF/sym/rt.jar/java/lang/annotation/Annotation.class): major version 52 is newer than 51, the highest major version supported by this compiler. It is recommended that the compiler be upgraded. warning: /Library/Java/JavaVirtualMachines/jdk1.8.0_66.jdk/Contents/Home/lib/ct.sym(META-INF/sym/rt.jar/java/lang/annotation/Target.class): major version 52 is newer than 51, the highest major version supported by this compiler. It is recommended that the compiler be upgraded. warning: /Library/Java/JavaVirtualMachines/jdk1.8.0_66.jdk/Contents/Home/lib/ct.sym(META-INF/sym/rt.jar/java/lang/annotation/ElementType.class): major version 52 is newer than 51, the highest major version supported by this compiler. It is recommended that the compiler be upgraded. warning: /Library/Java/JavaVirtualMachines/jdk1.8.0_66.jdk/Contents/Home/lib/ct.sym(META-INF/sym/rt.jar/java/lang/annotation/Retention.class): major version 52 is newer than 51, the highest major version supported by this compiler. 
It is recommended that the compiler be upgraded. warning: /Library/Java/JavaVirtualMachines/jdk1.8.0_66.jdk/Contents/Home/lib/ct.sym(META-INF/sym/rt.jar/java/lang/annotation/RetentionPolicy.class): major version 52 is newer than 51, the highest major version supported by this compiler. It is recommended that the compiler be upgraded. /Library/Java/JavaVirtualMachines/jdk1.8.0_66.jdk/Contents/Home/lib/ct.sym(META-INF/sym/rt.jar/java/lang/Override.class): warning: Cannot find annotation method 'value()' in type 'Profile+Annotation' /Library/Java/JavaVirtualMachines/jdk1.8.0_66.jdk/Contents/Home/lib/ct.sym(META-INF/sym/rt.jar/java/lang/annotation/Annotation.class): warning: Cannot find annotation method 'value()' in type 'Profile+Annotation' /Library/Java/JavaVirtualMachines/jdk1.8.0_66.jdk/Contents/Home/lib/ct.sym(META-INF/sym/rt.jar/java/lang/annotation/Target.class): warning: Cannot find annotation method 'value()' in type 'Profile+Annotation' /Library/Java/JavaVirtualMachines/jdk1.8.0_66.jdk/Contents/Home/lib/ct.sym(META-INF/sym/rt.jar/java/lang/annotation/ElementType.class): warning: Cannot find annotation method 'value()' in type 'Profile+Annotation' /Library/Java/JavaVirtualMachines/jdk1.8.0_66.jdk/Contents/Home/lib/ct.sym(META-INF/sym/rt.jar/java/lang/annotation/Retention.class): warning: Cannot find annotation method 'value()' in type 'Profile+Annotation' /Library/Java/JavaVirtualMachines/jdk1.8.0_66.jdk/Contents/Home/lib/ct.sym(META-INF/sym/rt.jar/java/lang/annotation/RetentionPolicy.class): warning: Cannot find annotation method 'value()' in type
[GitHub] spark pull request: [SPARK-12530][Build] Fix build break at Spark-...
Github user kiszk commented on the pull request: https://github.com/apache/spark/pull/10488#issuecomment-167848836 @yhuai, do you mean that I should update all of the string concatenations in @ExpressionDescription to use multi-line string literals, rather than only the original one? If so, I will do this update.
[GitHub] spark pull request: [SPARK-12552][Core]Correctly count the driver ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10506#issuecomment-167820998 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-12552][Core]Correctly count the driver ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10506#issuecomment-167821001 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/48413/ Test PASSed.
[GitHub] spark pull request: [SPARK-12539][SQL][WIP] support writing bucket...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10498#issuecomment-167843318 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/48415/ Test PASSed.
[GitHub] spark pull request: [SPARK-12539][SQL][WIP] support writing bucket...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10498#issuecomment-167843317 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-12539][SQL][WIP] support writing bucket...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10498#issuecomment-167843065 **[Test build #48415 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48415/consoleFull)** for PR 10498 at commit [`a9dc997`](https://github.com/apache/spark/commit/a9dc99722bfea886c6381abbd2e1e9366fcf9064).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `case class BucketSpec(`
[GitHub] spark pull request: [SPARK-12362][SQL][WIP] Inline Hive Parser
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10509#issuecomment-167848994 **[Test build #48420 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48420/consoleFull)** for PR 10509 at commit [`feee2ba`](https://github.com/apache/spark/commit/feee2ba1fc4ecf649328604b5cc29e972d0f4ae9).
[GitHub] spark pull request: [SPARK-7874][MESOS] Don’t allocate more than...
Github user blbradley commented on the pull request: https://github.com/apache/spark/pull/9027#issuecomment-167848940 @dragos Where can you see that fine-grained mode is slated for removal? All I see is #9795.
[GitHub] spark pull request: [SPARK-12554][Core]Standalone scheduler hangs ...
Github user andrewor14 commented on the pull request: https://github.com/apache/spark/pull/10507#issuecomment-167856135 ok to test
[GitHub] spark pull request: [SPARK-12530][Build] Fix build break at Spark-...
Github user yhuai commented on the pull request: https://github.com/apache/spark/pull/10488#issuecomment-167834026 @kiszk Thank you for the investigation. Yeah, let's use multi-line string literals. If we have to have a line with more than 100 characters, let's use `// scalastyle:off line.size.limit` and `// scalastyle:on line.size.limit` to bypass just the line length requirement.
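The suggestion above can be sketched as follows: build a long usage string as a single triple-quoted (multi-line) literal with `stripMargin` instead of concatenating short literals, and wrap any over-long line in the scalastyle on/off markers. The usage text and the `extendedUsage` name are illustrative, not taken from the PR.

```scala
// Sketch of yhuai's suggestion. The scalastyle markers disable only the
// 100-character line-length check for the enclosed lines.
// scalastyle:off line.size.limit
val extendedUsage: String =
  """_FUNC_(expr) - Illustrative usage text that would otherwise exceed the 100-character limit if written as one concatenated literal.
    |Examples:
    |  > SELECT _FUNC_(1);
    |""".stripMargin
// scalastyle:on line.size.limit

// stripMargin removes everything up to and including the leading '|',
// so the resulting string has no indentation artifacts.
println(extendedUsage)
```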
[GitHub] spark pull request: [SPARK-12362][SQL][WIP] Inline Hive Parser
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10509#issuecomment-167847328 **[Test build #48419 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48419/consoleFull)** for PR 10509 at commit [`feee2ba`](https://github.com/apache/spark/commit/feee2ba1fc4ecf649328604b5cc29e972d0f4ae9).
[GitHub] spark pull request: [SPARK-12507][Streaming][Document]Expose close...
Github user zsxwing commented on a diff in the pull request: https://github.com/apache/spark/pull/10453#discussion_r48559444 --- Diff: docs/configuration.md --- @@ -1600,6 +1600,33 @@ Apart from these, the following properties are also available, and may be useful How many batches the Spark Streaming UI and status APIs remember before garbage collecting. + + spark.streaming.driver.writeAheadLog.closeFileAfterWrite + false + +Whether to close the file after writing a write ahead log record in driver. Because S3 doesn't +support flushing of data, when using S3 for checkpointing, you should enable it to achieve read +after write consistency. + + + + spark.streaming.receiver.writeAheadLog.closeFileAfterWrite + false + +Whether to close the file after writing a write ahead log record in receivers. Because S3 +doesn't support flushing of data, when using S3 for checkpointing, you should enable it to +achieve read after write consistency. + + + + spark.streaming.driver.writeAheadLog.allowBatching + false --- End diff -- for me: the default value is `true`. That's why I want to expose this one since the behavior is different from 1.5.0.
[GitHub] spark pull request: [SPARK-12362][SQL][WIP] Inline Hive Parser
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10509#issuecomment-167850051 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-12362][SQL][WIP] Inline Hive Parser
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10509#issuecomment-167850043 **[Test build #48420 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48420/consoleFull)** for PR 10509 at commit [`feee2ba`](https://github.com/apache/spark/commit/feee2ba1fc4ecf649328604b5cc29e972d0f4ae9).
* This patch **fails to build**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `public final class LZ4BlockInputStream extends FilterInputStream `
  * `class JavaWordBlacklist `
  * `class JavaDroppedWordsCounter `
  * `case class AssertNotNull(`
  * ` * Abstract class all optimizers should inherit of, contains the standard batches (extending`
  * `abstract class Optimizer extends RuleExecutor[LogicalPlan] `
[GitHub] spark pull request: [SPARK-12362][SQL][WIP] Inline Hive Parser
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10509#issuecomment-167850053 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/48420/ Test FAILed.
[GitHub] spark pull request: [SPARK-12530][Build] Fix build break at Spark-...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/10488
[GitHub] spark pull request: [SPARK-12548][Build] Add more exceptions to Gu...
Github user andrewor14 commented on the pull request: https://github.com/apache/spark/pull/10442#issuecomment-167855574 Also, I just noticed this is opened against the wrong branch. Please close this PR and re-open it against master if we do decide to continue work on this issue.
[GitHub] spark pull request: [SPARK-7889] [CORE] HistoryServer to refresh c...
Github user squito commented on a diff in the pull request: https://github.com/apache/spark/pull/6935#discussion_r48548650 --- Diff: core/src/test/scala/org/apache/spark/deploy/history/ApplicationCacheSuite.scala --- @@ -0,0 +1,476 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.spark.deploy.history + +import java.util.{Date, NoSuchElementException} + +import javax.servlet.Filter + +import scala.collection.mutable +import scala.collection.mutable.ListBuffer +import scala.language.postfixOps + +import com.codahale.metrics.Counter +import com.google.common.cache.LoadingCache +import com.google.common.util.concurrent.UncheckedExecutionException +import org.eclipse.jetty.servlet.ServletContextHandler +import org.mockito.Mockito._ +import org.scalatest.Matchers +import org.scalatest.mock.MockitoSugar + +import org.apache.spark.status.api.v1.{ApplicationAttemptInfo => AttemptInfo, ApplicationInfo} +import org.apache.spark.ui.SparkUI +import org.apache.spark.util.{Clock, ManualClock, Utils} +import org.apache.spark.{Logging, SparkFunSuite} + +class ApplicationCacheSuite extends SparkFunSuite with Logging with MockitoSugar with Matchers { + + /** + * subclass with access to the cache internals + * @param refreshInterval interval between refreshes in milliseconds. + * @param retainedApplications number of retained applications + */ + class TestApplicationCache( + operations: ApplicationCacheOperations = new StubCacheOperations(), + refreshInterval: Long, + retainedApplications: Int, + clock: Clock = new ManualClock(0)) + extends ApplicationCache(operations, refreshInterval, retainedApplications, clock) { + +def cache(): LoadingCache[CacheKey, CacheEntry] = appCache + } + + /** + * Stub cache operations. 
+ * The state is kept in a map of [[CacheKey]] to [[CacheEntry]], + * the `probeTime` field in the cache entry setting the timestamp of the entry + */ + class StubCacheOperations extends ApplicationCacheOperations with Logging { + +/** map to UI instances, including timestamps, which are used in update probes */ +val instances = mutable.HashMap.empty[CacheKey, CacheEntry] + +/** Map of attached spark UIs */ +val attached = mutable.HashMap.empty[CacheKey, SparkUI] + +var getAppUICount = 0L +var attachCount = 0L +var detachCount = 0L +var updateProbeCount = 0L + +/** + * Get the application UI + * @param appId application ID + * @param attemptId attempt ID + * @return If found, the Spark UI and any history information to be used in the cache + */ +override def getAppUI(appId: String, attemptId: Option[String]): Option[LoadedAppUI] = { + logDebug(s"getAppUI($appId, $attemptId)") + getAppUICount += 1 + instances.get(CacheKey(appId, attemptId)).map( e => +LoadedAppUI(e.ui, Some(new StubHistoryProviderUpdateState(e.probeTime +} + +override def attachSparkUI(appId: String, attemptId: Option[String], ui: SparkUI, +completed: Boolean): Unit = { + logDebug(s"attachSparkUI($appId, $attemptId, $ui)") + attachCount += 1 + attached += (CacheKey(appId, attemptId) -> ui) +} + +def putAndAttach(appId: String, attemptId: Option[String], completed: Boolean, started: Long, +ended: Long, timestamp: Long): SparkUI = { + val ui = putAppUI(appId, attemptId, completed, started, ended, timestamp) + attachSparkUI(appId, attemptId, ui, completed) + ui +} + +def putAppUI(appId: String, attemptId: Option[String], completed: Boolean, started: Long, +ended: Long, timestamp: Long): SparkUI = { + val ui = newUI(appId, attemptId, completed, started, ended) + putInstance(appId, attemptId, ui, completed,
[GitHub] spark pull request: [SPARK-12552][Core]Correctly count the driver ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10506#issuecomment-167820776 **[Test build #48413 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48413/consoleFull)** for PR 10506 at commit [`710f5de`](https://github.com/apache/spark/commit/710f5de578449c9f8156540bdc26b4b12d2567d5). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-12355][SQL] Implement unhandledFilter i...
Github user yhuai commented on the pull request: https://github.com/apache/spark/pull/10502#issuecomment-167835640 @HyukjinKwon Thank you for the PR! Can you post some benchmarking results (with your testing code)? It will be good to have these numbers to help others understand the benefit it can provide.
[GitHub] spark pull request: [SPARK-12362][SQL][WIP] Inline Hive Parser
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10509#issuecomment-167843425 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/48417/ Test FAILed.
[GitHub] spark pull request: [SPARK-12362][SQL][WIP] Inline Hive Parser
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10509#issuecomment-167843416 **[Test build #48417 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48417/consoleFull)** for PR 10509 at commit [`cb60ba0`](https://github.com/apache/spark/commit/cb60ba045ff6663ed83c308b2423bdb87152a092). * This patch **fails to build**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-12362][SQL][WIP] Inline Hive Parser
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10509#issuecomment-167843424 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-12362][SQL][WIP] Inline Hive Parser
Github user hvanhovell commented on the pull request: https://github.com/apache/spark/pull/10509#issuecomment-167843520 retest this please
[GitHub] spark pull request: [SPARK-12503] [SQL] Pushing Limit Through Unio...
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/10451#discussion_r48558035 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala --- @@ -153,6 +153,17 @@ object SetOperationPushDown extends Rule[LogicalPlan] with PredicateHelper { ) ) +// Adding extra Limit below UNION ALL iff both left and right childs are not Limit and no Limit +// was pushed down before. This heuristic is valid assuming there does not exist any Limit +// push-down rule that is unable to infer the value of maxRows. Any operator that a Limit can +// be pushed passed should override this function. +case Limit(exp, Union(left, right)) + if left.maxRows.isEmpty || right.maxRows.isEmpty => --- End diff -- Is there a reason to not check left and right separately?
[GitHub] spark pull request: [SPARK-12503] [SQL] Pushing Limit Through Unio...
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/10451#issuecomment-167845210 Thanks for working on this. I think it's getting pretty close. A few minor cleanups that might be nice: - I think we should consider pulling all the Limit rules into their own `LimitPushDown` rule. The reasoning here is twofold: we can clearly comment in one central place the requirements with respect to implementing maxRows. It will be easier to turn off if it is ever doing the wrong thing. - We should do a pass through and add `maxRows` to any other logical plans where it makes sense. Off the top of my head: - Filter = `child.maxRows` - Union = `for(leftMax <- left.maxRows; rightMax <- right.maxRows) yield Add(leftMax, rightMax)` - Distinct = `child.maxRows` - Aggregate - `child.maxRows`
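The `maxRows` suggestions above can be sketched on a toy plan algebra. This is a self-contained, hypothetical model (the node names only stand in for Catalyst's logical operators), showing how the bound is known only when both Union children are bounded:

```scala
// Toy logical-plan nodes; maxRows mirrors the semantics suggested above.
sealed trait Plan { def maxRows: Option[Long] }
case class Relation(name: String) extends Plan {
  val maxRows: Option[Long] = None // unbounded source
}
case class Limit(n: Long, child: Plan) extends Plan {
  // Bounded by n, or tighter if the child is already more constrained.
  val maxRows: Option[Long] = Some(child.maxRows.fold(n)(math.min(n, _)))
}
case class Filter(child: Plan) extends Plan {
  val maxRows: Option[Long] = child.maxRows // can only drop rows
}
case class Distinct(child: Plan) extends Plan {
  val maxRows: Option[Long] = child.maxRows // de-duplication never adds rows
}
case class Union(left: Plan, right: Plan) extends Plan {
  // UNION ALL: at most left + right rows, known only when both are known.
  val maxRows: Option[Long] =
    for (l <- left.maxRows; r <- right.maxRows) yield l + r
}

object MaxRowsDemo extends App {
  val u = Union(Limit(5, Relation("a")), Limit(3, Relation("b")))
  assert(u.maxRows == Some(8L))
  assert(Filter(Relation("a")).maxRows.isEmpty)
  println(u.maxRows)
}
```

The for-comprehension over the two `Option`s is exactly the shape of the `Union` rule proposed in the comment.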
[GitHub] spark pull request: [SPARK-12507][Streaming][Document]Expose close...
Github user BenFradet commented on a diff in the pull request: https://github.com/apache/spark/pull/10453#discussion_r48559015 --- Diff: docs/configuration.md --- @@ -1600,6 +1600,33 @@ Apart from these, the following properties are also available, and may be useful How many batches the Spark Streaming UI and status APIs remember before garbage collecting. + + spark.streaming.driver.writeAheadLog.closeFileAfterWrite + false + +Whether to close the file after writing a write ahead log record in driver. Because S3 doesn't +support flushing of data, when using S3 for checkpointing, you should enable it to achieve read +after write consistency. + + + + spark.streaming.receiver.writeAheadLog.closeFileAfterWrite + false + +Whether to close the file after writing a write ahead log record in receivers. Because S3 --- End diff -- same thing here: `on the receivers` instead of `in receivers`
[GitHub] spark pull request: [SPARK-12507][Streaming][Document]Expose close...
Github user BenFradet commented on a diff in the pull request: https://github.com/apache/spark/pull/10453#discussion_r48558980 --- Diff: docs/configuration.md --- @@ -1600,6 +1600,33 @@ Apart from these, the following properties are also available, and may be useful How many batches the Spark Streaming UI and status APIs remember before garbage collecting. + + spark.streaming.driver.writeAheadLog.closeFileAfterWrite + false + +Whether to close the file after writing a write ahead log record in driver. Because S3 doesn't --- End diff -- I'd say `on the driver` instead of `in driver`.
[GitHub] spark pull request: [SPARK-12470] [SQL] Fix size reduction calcula...
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/10421#discussion_r48562209 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/GenerateUnsafeRowJoiner.scala --- @@ -171,7 +171,7 @@ object GenerateUnsafeRowJoiner extends CodeGenerator[(StructType, StructType), U |// row1: ${schema1.size} fields, $bitset1Words words in bitset |// row2: ${schema2.size}, $bitset2Words words in bitset |// output: ${schema1.size + schema2.size} fields, $outputBitsetWords words in bitset - |final int sizeInBytes = row1.getSizeInBytes() + row2.getSizeInBytes(); + |final int sizeInBytes = row1.getSizeInBytes() + row2.getSizeInBytes() - ($sizeReduction * 8); --- End diff -- It may be better to use number of bytes for `sizeReduction` (also update the comments).
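For context on the diff above: UnsafeRow null-tracking bitsets are sized in 64-bit words (one word per 64 fields, rounded up), so joining two rows can need fewer bitset words than the two inputs held separately, and the subtraction converts that saving from words to bytes. A self-contained sketch of the arithmetic (assuming the 64-bit word layout described in the diff's comments):

```scala
object BitsetWordsDemo extends App {
  // One 64-bit word per 64 fields, rounded up.
  def bitsetWords(numFields: Int): Int = (numFields + 63) / 64

  // 60 + 10 fields: 1 + 1 words separately, 70 fields -> 2 words joined.
  val wordsSeparate = bitsetWords(60) + bitsetWords(10)
  val wordsJoined = bitsetWords(60 + 10)
  val reductionBytes = (wordsSeparate - wordsJoined) * 8
  assert(reductionBytes == 0) // no saving in this case

  // 30 + 30 fields: the joined bitset needs one word instead of two.
  val saved = (bitsetWords(30) + bitsetWords(30) - bitsetWords(60)) * 8
  assert(saved == 8)
  println(saved)
}
```

This also illustrates why the reviewer suggests expressing `sizeReduction` directly in bytes: the `* 8` word-to-byte conversion is easy to miscount.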
[GitHub] spark pull request: [SPARK-12554][Core]Standalone scheduler hangs ...
Github user andrewor14 commented on the pull request: https://github.com/apache/spark/pull/10507#issuecomment-167857621 @JerryLead I commented on the JIRA on why I don't think it's an issue. Let's move the discussion there.
[GitHub] spark pull request: [SPARK-12362][SQL][WIP] Inline Hive Parser
Github user hvanhovell commented on the pull request: https://github.com/apache/spark/pull/10509#issuecomment-167836899 retest this please
[GitHub] spark pull request: [SPARK-12362][SQL][WIP] Inline Hive Parser
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10509#issuecomment-167846349 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-12362][SQL][WIP] Inline Hive Parser
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10509#issuecomment-167846350 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/48418/ Test FAILed.
[GitHub] spark pull request: [SPARK-12507][Streaming][Document]Expose close...
Github user BenFradet commented on a diff in the pull request: https://github.com/apache/spark/pull/10453#discussion_r48559086 --- Diff: docs/configuration.md --- @@ -1600,6 +1600,33 @@ Apart from these, the following properties are also available, and may be useful How many batches the Spark Streaming UI and status APIs remember before garbage collecting. + + spark.streaming.driver.writeAheadLog.closeFileAfterWrite + false + +Whether to close the file after writing a write ahead log record in driver. Because S3 doesn't +support flushing of data, when using S3 for checkpointing, you should enable it to achieve read +after write consistency. + + + + spark.streaming.receiver.writeAheadLog.closeFileAfterWrite + false + +Whether to close the file after writing a write ahead log record in receivers. Because S3 +doesn't support flushing of data, when using S3 for checkpointing, you should enable it to +achieve read after write consistency. + + + + spark.streaming.driver.writeAheadLog.allowBatching + false + +Whether to batch write ahead logs in driver to write. When using S3 for checkpointing, write --- End diff -- Here, I'd say `on the driver` instead of `in driver to write`.
[GitHub] spark pull request: [SPARK-12530][Build] Fix build break at Spark-...
Github user yhuai commented on the pull request: https://github.com/apache/spark/pull/10488#issuecomment-167849606 Let me merge this one first to fix the Spark master Maven snapshot. Then, can you create another JIRA to update the other places?
[GitHub] spark pull request: [SPARK-12530][Build] Fix build break at Spark-...
Github user kiszk commented on the pull request: https://github.com/apache/spark/pull/10488#issuecomment-167853136 I see. I will create another JIRA entry to update other usages.
[GitHub] spark pull request: [SPARK-12415] Do not use closure serializer to...
Github user andrewor14 commented on the pull request: https://github.com/apache/spark/pull/10368#issuecomment-167855990 What first two days of work are you referring to? The problem is we just can't use Kryo for serializing task results because there might be unregistered classes. Because of this constraint we can't use `spark.serializer` here since the user can specify Kryo there.
[GitHub] spark pull request: [SPARK-11638] [Mesos + Docker Bridge networkin...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/9608#issuecomment-167826650 **[Test build #48416 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48416/consoleFull)** for PR 9608 at commit [`b712b8d`](https://github.com/apache/spark/commit/b712b8d3f0bd11575533af6bb5931df096bce239).
[GitHub] spark pull request: [SPARK-12480][SQL] add Hash expression that ca...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/10435#discussion_r48554681 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/misc.scala --- @@ -176,3 +179,223 @@ case class Crc32(child: Expression) extends UnaryExpression with ImplicitCastInp }) } } + +/** + * A function that calculates hash value for a group of expressions. + * + * The hash value for an expression depends on its type: + * - null: 0 + * - boolean:0 for true, 1 for false. --- End diff -- Let's also add comments to explain the benefit of this function.
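The per-type contract quoted in the diff above can be sketched as a plain type dispatch. This is a hypothetical, standalone model only (not the Catalyst implementation, which uses a dedicated hash algorithm); it follows the mapping as literally quoted, including `0` for `true`:

```scala
object SimpleHashDemo extends App {
  // Toy dispatch mirroring the contract quoted in the diff:
  // null -> 0; boolean -> 0 for true, 1 for false; other types would follow.
  def hashValue(v: Any): Int = v match {
    case null       => 0
    case b: Boolean => if (b) 0 else 1
    case i: Int     => i
    case s: String  => s.hashCode // stand-in for a real string hash
  }

  // A group of expressions is typically combined by folding a seed
  // over each value, so equal groups hash equally.
  def hashGroup(values: Seq[Any], seed: Int = 42): Int =
    values.foldLeft(seed)((h, v) => 31 * h + hashValue(v))

  assert(hashValue(null) == 0)
  assert(hashValue(true) == 0)
  assert(hashGroup(Seq(1, null, false)) == hashGroup(Seq(1, null, false)))
}
```

The benefit the reviewer asks to document is exactly this: a deterministic, type-aware hash over a group of expressions, usable for partitioning and shuffles without serializing whole rows.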
[GitHub] spark pull request: [SPARK-12362][SQL][WIP] Inline Hive Parser
Github user hvanhovell commented on the pull request: https://github.com/apache/spark/pull/10509#issuecomment-167847029 retest this please
[GitHub] spark pull request: [SPARK-12362][SQL][WIP] Inline Hive Parser
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10509#issuecomment-167847052 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-11638] [Mesos + Docker Bridge networkin...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/9608#issuecomment-167847920 **[Test build #48416 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48416/consoleFull)** for PR 9608 at commit [`b712b8d`](https://github.com/apache/spark/commit/b712b8d3f0bd11575533af6bb5931df096bce239). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-11638] [Mesos + Docker Bridge networkin...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9608#issuecomment-167848079 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/48416/ Test PASSed.
[GitHub] spark pull request: [SPARK-11638] [Mesos + Docker Bridge networkin...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9608#issuecomment-167848076 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-12507][Streaming][Document]Expose close...
Github user BenFradet commented on a diff in the pull request: https://github.com/apache/spark/pull/10453#discussion_r48559209 --- Diff: docs/configuration.md --- @@ -1600,6 +1600,33 @@ Apart from these, the following properties are also available, and may be useful How many batches the Spark Streaming UI and status APIs remember before garbage collecting. + + spark.streaming.driver.writeAheadLog.closeFileAfterWrite + false + +Whether to close the file after writing a write ahead log record in driver. Because S3 doesn't +support flushing of data, when using S3 for checkpointing, you should enable it to achieve read +after write consistency. + + + + spark.streaming.receiver.writeAheadLog.closeFileAfterWrite + false + +Whether to close the file after writing a write ahead log record in receivers. Because S3 +doesn't support flushing of data, when using S3 for checkpointing, you should enable it to +achieve read after write consistency. + + + + spark.streaming.driver.writeAheadLog.allowBatching + false + +Whether to batch write ahead logs in driver to write. When using S3 for checkpointing, write +operations in driver usually take too long. Enable batching write ahead logs will improve --- End diff -- same: `on the` and `Enabling` instead of `Enable`
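The three properties under review in this doc change are plain boolean configs. For illustration only (values are examples for an S3-backed checkpoint setup, not recommendations), they would be set like:

```scala
import org.apache.spark.SparkConf

// Illustrative: enabling the write-ahead-log options discussed above
// when checkpointing to a store without flush support, such as S3.
val conf = new SparkConf()
  .set("spark.streaming.driver.writeAheadLog.closeFileAfterWrite", "true")
  .set("spark.streaming.receiver.writeAheadLog.closeFileAfterWrite", "true")
  .set("spark.streaming.driver.writeAheadLog.allowBatching", "true")
```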
[GitHub] spark pull request: [SPARK-12470] [SQL] Fix size reduction calcula...
Github user davies commented on the pull request: https://github.com/apache/spark/pull/10421#issuecomment-167855300 @rxin I had finished the refactoring a long time ago.
[GitHub] spark pull request: [SPARK-12560][SQL] SqlTestUtils.stripSparkFilt...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10510#issuecomment-167899182 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/48423/ Test PASSed.
[GitHub] spark pull request: [SPARK-12362][SQL][WIP] Inline Hive Parser
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10509#issuecomment-167900502 **[Test build #48430 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48430/consoleFull)** for PR 10509 at commit [`b8e76b2`](https://github.com/apache/spark/commit/b8e76b257063db79f05a83aa4a05578ce8807c03).
[GitHub] spark pull request: [SPARK-12362][SQL][WIP] Inline Hive Parser
Github user hvanhovell commented on the pull request: https://github.com/apache/spark/pull/10509#issuecomment-167901239 retest this please
[GitHub] spark pull request: [SPARK-12362][SQL][WIP] Inline Hive Parser
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10509#issuecomment-167901126 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-12362][SQL][WIP] Inline Hive Parser
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10509#issuecomment-167901129 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/48430/ Test FAILed.
[GitHub] spark pull request: [SPARK-12503] [SQL] Pushing Limit Through Unio...
Github user gatorsmile commented on the pull request: https://github.com/apache/spark/pull/10451#issuecomment-167902518 After rethinking the `Limit` push-down rules, we are unable to push Limit through any operator that could change the values or the number of rows. Thus, so far, the eligible candidates are `Project`, `Union All` and `Outer/LeftOuter/RightOuter Join`. Please correct me if my understanding is not right. Feel free to let me know if the code needs an update. Thank you!
[GitHub] spark pull request: [SPARK-12511][PySpark][Streaming]Make sure Tra...
Github user davies commented on the pull request: https://github.com/apache/spark/pull/10514#issuecomment-167902559 LGTM
[GitHub] spark pull request: [SPARK-12503] [SQL] Pushing Limit Through Unio...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/10451#discussion_r48581020 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala --- @@ -80,6 +81,33 @@ abstract class Optimizer extends RuleExecutor[LogicalPlan] { object DefaultOptimizer extends Optimizer /** + * Pushes down Limit for reducing the amount of returned data. + * + * 1. Adding Extra Limit beneath the operations, including Union All. + * 2. Project is pushed through Limit in the rule ColumnPruning + * + * Any operator that a Limit can be pushed past should override the maxRows function. + */ +object PushDownLimit extends Rule[LogicalPlan] { + + def apply(plan: LogicalPlan): LogicalPlan = plan transform { + +// Adding extra Limit below UNION ALL iff both left and right children are not Limit or +// do not have Limit descendants. This heuristic is valid assuming there does not exist +// any Limit push-down rule that is unable to infer the value of maxRows. +// Note, right now, Union means UNION ALL, which does not de-duplicate rows. So, it is +// safe to push down Limit through it. Once we add UNION DISTINCT, we will not be able to +// push down Limit. +case Limit(exp, Union(left, right)) + if left.maxRows.isEmpty || right.maxRows.isEmpty => --- End diff -- Yeah, you are right. : )
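The guard in the diff above (`left.maxRows.isEmpty || right.maxRows.isEmpty`) is what keeps the rule from firing forever on its own output. A minimal sketch of that rewrite, using hypothetical plan classes rather than Catalyst's:

```python
# Hypothetical plan nodes (not Catalyst): each exposes max_rows, which is
# None when the node's output size is unbounded, mirroring Option[Long].
from dataclasses import dataclass
from typing import Optional


@dataclass
class Relation:
    name: str

    @property
    def max_rows(self) -> Optional[int]:
        return None  # unbounded scan


@dataclass
class Limit:
    n: int
    child: object

    @property
    def max_rows(self) -> Optional[int]:
        return self.n


@dataclass
class Union:
    left: object
    right: object

    @property
    def max_rows(self) -> Optional[int]:
        if self.left.max_rows is None or self.right.max_rows is None:
            return None
        return self.left.max_rows + self.right.max_rows


def push_down_limit(plan):
    # Fire only when at least one Union child is not already bounded,
    # mirroring the `left.maxRows.isEmpty || right.maxRows.isEmpty` guard.
    if (isinstance(plan, Limit) and isinstance(plan.child, Union)
            and (plan.child.left.max_rows is None
                 or plan.child.right.max_rows is None)):
        u = plan.child
        return Limit(plan.n, Union(Limit(plan.n, u.left),
                                   Limit(plan.n, u.right)))
    return plan


p = push_down_limit(Limit(5, Union(Relation("a"), Relation("b"))))
assert isinstance(p.child.left, Limit) and p.child.left.n == 5
assert push_down_limit(p) is p  # guard prevents a second application
```

Once both children carry a `Limit`, their `max_rows` is defined, the guard fails, and the rule reaches a fixed point in one pass.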
[GitHub] spark pull request: [SPARK-12511][PySpark][Streaming]Make sure Tra...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10514#issuecomment-167905088 **[Test build #48429 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48429/consoleFull)** for PR 10514 at commit [`9be0d0a`](https://github.com/apache/spark/commit/9be0d0a01edbb6615871c84f6d8f4b608501a8f0). * This patch **fails PySpark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-12511][PySpark][Streaming]Make sure Tra...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10514#issuecomment-167905122 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-12511][PySpark][Streaming]Make sure Tra...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10514#issuecomment-167905123 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/48429/ Test FAILed.
[GitHub] spark pull request: [SPARK-12568] [SQL] Add BINARY to Encoders
GitHub user marmbrus opened a pull request: https://github.com/apache/spark/pull/10516 [SPARK-12568] [SQL] Add BINARY to Encoders You can merge this pull request into a Git repository by running: $ git pull https://github.com/marmbrus/spark datasetCleanup Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/10516.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #10516 commit a2c98795fbe217efc065be2ab0f1a5400d7653f6 Author: Michael Armbrust Date: 2015-12-24T05:51:39Z WIP
[GitHub] spark pull request: [SPARK-12481] [CORE] [STREAMING] [SQL] Remove ...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/10446#discussion_r48581372 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/WriterContainer.scala --- @@ -24,10 +24,9 @@ import scala.collection.JavaConverters._ import org.apache.hadoop.fs.Path import org.apache.hadoop.mapreduce._ import org.apache.hadoop.mapreduce.lib.output.{FileOutputCommitter => MapReduceFileOutputCommitter} +import org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl import org.apache.spark._ --- End diff -- nit: not your fault but a good opportunity to add a blank line here.
[GitHub] spark pull request: [SPARK-12481] [CORE] [STREAMING] [SQL] Remove ...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/10446#discussion_r48581350 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/SqlNewHadoopRDD.scala --- @@ -26,10 +26,10 @@ import org.apache.hadoop.conf.{Configurable, Configuration} import org.apache.hadoop.io.Writable import org.apache.hadoop.mapreduce._ import org.apache.hadoop.mapreduce.lib.input.{CombineFileSplit, FileSplit} +import org.apache.hadoop.mapreduce.task.{TaskAttemptContextImpl, JobContextImpl} --- End diff -- nit: order
[GitHub] spark pull request: [SPARK-12481] [CORE] [STREAMING] [SQL] Remove ...
Github user vanzin commented on the pull request: https://github.com/apache/spark/pull/10446#issuecomment-167907581 LGTM; there's at least one extra possible cleanup, but feel free to punt on that one.
[GitHub] spark pull request: [SPARK-12511][PySpark][Streaming]Make sure Tra...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10514#issuecomment-167912484 **[Test build #48438 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48438/consoleFull)** for PR 10514 at commit [`9be0d0a`](https://github.com/apache/spark/commit/9be0d0a01edbb6615871c84f6d8f4b608501a8f0).
[GitHub] spark pull request: [SPARK-12549][SQL] Take Option[Seq[DataType]] ...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/10504
[GitHub] spark pull request: [SPARK-10906][MLlib] More efficient SparseMatr...
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/8960#issuecomment-167914321 @rahulpalamuttam Sorry for the delay in reviewing! How would you feel about updating the implementation in Breeze, rather than in Spark? I expect you could use much of the code you've already written, and I'd be happy to help review your PR to Breeze if it's helpful.
[GitHub] spark pull request: [SPARK-12562][SQL] DataFrame.write.format(text...
Github user xguo27 commented on the pull request: https://github.com/apache/spark/pull/10515#issuecomment-167915743 @marmbrus Thanks Michael for your feedback! Looks like 'value' is there to give the single string column an arbitrary name. The current implementation strips schema information when creating TextRelation (after verifying that the schema is a single field of string type). That is fine during read, but fails during write. Would you mind taking another look at my updated change?
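The invariant being discussed can be shown with a small sketch (hypothetical helper, not the actual TextRelation code): a text data source accepts exactly one string-typed column, and the column name, such as the default "value", is arbitrary, so the reader can discard it while the writer must still be able to round-trip the schema.

```python
# Hypothetical schema check: a schema is a list of (name, type) pairs.
def validate_text_schema(schema):
    """Accept only a single string column; return its (arbitrary) name."""
    if len(schema) != 1 or schema[0][1] != "string":
        raise ValueError("Text data source supports only a single string column")
    return schema[0][0]

assert validate_text_schema([("value", "string")]) == "value"
assert validate_text_schema([("anything", "string")]) == "anything"

try:
    validate_text_schema([("a", "string"), ("b", "int")])
    rejected = False
except ValueError:
    rejected = True
assert rejected  # multi-column schemas are refused up front
```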
[GitHub] spark pull request: [SPARK-12480][SQL] add Hash expression that ca...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10435#issuecomment-167918200 **[Test build #48442 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48442/consoleFull)** for PR 10435 at commit [`8703b1a`](https://github.com/apache/spark/commit/8703b1a127235c49614d326334548f125b81383b).
[GitHub] spark pull request: [SPARK-12362][SQL][WIP] Inline Hive Parser
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/10509#issuecomment-16798 I'm going to merge this pull request given its size. @hvanhovell please submit follow-up PRs to address the TODOs. Thanks!
[GitHub] spark pull request: [SPARK-12511][PySpark][Streaming]Make sure Tra...
GitHub user zsxwing opened a pull request: https://github.com/apache/spark/pull/10514 [SPARK-12511][PySpark][Streaming]Make sure TransformFunctionSerializer is created only once Although SPARK-12511 is caused by a Py4j issue in which PythonProxyHandler.finalize blocks forever, we can bypass it. When checkpointing is enabled, `Streaming._ensure_initialized` is currently called twice and creates two `TransformFunctionSerializer`s. Because the first `TransformFunctionSerializer` is replaced and GCed, `PythonProxyHandler.finalize` is triggered. Actually, we only need one `TransformFunctionSerializer`. This PR adds a simple check to avoid creating `TransformFunctionSerializer` multiple times; then `PythonProxyHandler.finalize` won't be called, since the same `TransformFunctionSerializer` will always be used. You can merge this pull request into a Git repository by running: $ git pull https://github.com/zsxwing/spark SPARK-12511 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/10514.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #10514 commit 9be0d0a01edbb6615871c84f6d8f4b608501a8f0 Author: Shixiong Zhu Date: 2015-12-29T23:14:16Z Make sure TransformFunctionSerializer is created only once
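The fix described above is an idempotent-initialization guard. A minimal sketch with stub classes (not the actual `pyspark.streaming.context` code): initialize the serializer only when it does not already exist, so a second call on the checkpoint-recovery path reuses the first instance instead of replacing it and letting the old one be garbage-collected.

```python
# Stub standing in for TransformFunctionSerializer; counts constructions
# so we can observe that only one instance is ever created.
class TransformFunctionSerializerStub:
    instances = 0

    def __init__(self):
        TransformFunctionSerializerStub.instances += 1


class StreamingContextStub:
    _transformerSerializer = None

    @classmethod
    def _ensure_initialized(cls):
        # The guard: create the serializer only on the first call.
        if cls._transformerSerializer is None:
            cls._transformerSerializer = TransformFunctionSerializerStub()
        return cls._transformerSerializer


s1 = StreamingContextStub._ensure_initialized()
s2 = StreamingContextStub._ensure_initialized()  # e.g. checkpoint recovery
assert s1 is s2
assert TransformFunctionSerializerStub.instances == 1
```

Without the `is None` check, the second call would rebind `_transformerSerializer` to a fresh object, and the first instance would become garbage, which is exactly the replace-and-GC sequence the PR description says triggers `PythonProxyHandler.finalize`.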
[GitHub] spark pull request: [SPARK-12560][SQL] SqlTestUtils.stripSparkFilt...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10510#issuecomment-167899086 **[Test build #48423 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48423/consoleFull)** for PR 10510 at commit [`308294a`](https://github.com/apache/spark/commit/308294ac538a3215ce2d5f51297556586f0ade5c). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-5095][MESOS] Support capping cores and ...
Github user andrewor14 commented on the pull request: https://github.com/apache/spark/pull/4027#issuecomment-167899082 > Earlier, you had also suggested offering an option for the amount of memory per executor. Is that still valid in your proposal? What do you mean? You can already do that through `spark.executor.memory`, even before this patch. > At one point, you also suggested that the framework should also execute as many executors as needed to use all or nearly all the cores on each node. I would prefer that this is overridable by specifying the maximum number of executors to use per node. This makes it easier to use Spark on a cluster shared by multiple users or applications. I agree, though we should try to come up with a minimal set of configurations that conflict with each other least. I haven't decided exactly what those would look like but it could come in a later patch. > It's really unfortunate that this patch was closed without merging. Actually it will be re-opened shortly, just with a slightly different approach. I believe @tnachen is currently on vacation but once he comes back we'll move forward again. :)
[GitHub] spark pull request: [SPARK-12560][SQL] SqlTestUtils.stripSparkFilt...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10510#issuecomment-167899180 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-12511][PySpark][Streaming]Make sure Tra...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10514#issuecomment-167900273 **[Test build #48429 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48429/consoleFull)** for PR 10514 at commit [`9be0d0a`](https://github.com/apache/spark/commit/9be0d0a01edbb6615871c84f6d8f4b608501a8f0).
[GitHub] spark pull request: [SPARK-12503] [SQL] Pushing Limit Through Unio...
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/10451#discussion_r48578529 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala --- @@ -69,6 +73,33 @@ object DefaultOptimizer extends Optimizer { } /** + * Pushes down Limit for reducing the amount of returned data. + * + * 1. Adding Extra Limit beneath the operations, including Union All. + * 2. Project is pushed through Limit in the rule ColumnPruning + * + * Any operator that a Limit can be pushed past should override the maxRows function. + * + * Note: This rule has to be done when the logical plan is stable; + * Otherwise, it could impact the other rules. --- End diff -- I'm not sure what this means? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well.
[GitHub] spark pull request: [SPARK-12286] [SPARK-12290] [SPARK-12294] [SQL...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10511#issuecomment-167901091 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/48425/ Test FAILed.
[GitHub] spark pull request: [SPARK-12286] [SPARK-12290] [SPARK-12294] [SQL...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10511#issuecomment-167901090 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-12286] [SPARK-12290] [SPARK-12294] [SQL...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10511#issuecomment-167901009 **[Test build #48432 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48432/consoleFull)** for PR 10511 at commit [`513597c`](https://github.com/apache/spark/commit/513597c3172cec7e68bd15f9c543533248d1c3e3).
[GitHub] spark pull request: [SPARK-12286] [SPARK-12290] [SPARK-12294] [SQL...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10511#issuecomment-167901053 **[Test build #48425 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48425/consoleFull)** for PR 10511 at commit [`cba3934`](https://github.com/apache/spark/commit/cba393448c2d581bd62e31d3181a11e290a2a83d). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-12362][SQL][WIP] Inline Hive Parser
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10509#issuecomment-167901123 **[Test build #48430 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48430/consoleFull)** for PR 10509 at commit [`b8e76b2`](https://github.com/apache/spark/commit/b8e76b257063db79f05a83aa4a05578ce8807c03). * This patch **fails to build**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-12507][Streaming][Document]Expose close...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10453#issuecomment-167902755 **[Test build #48436 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48436/consoleFull)** for PR 10453 at commit [`bce7a29`](https://github.com/apache/spark/commit/bce7a29de2966024103258031eeecb369e6d45b4).
[GitHub] spark pull request: [SPARK-10359] Enumerate dependencies in a file...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10461#issuecomment-167906816 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-12481] [CORE] [STREAMING] [SQL] Remove ...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/10446#discussion_r48581268 --- Diff: core/src/main/scala/org/apache/spark/scheduler/EventLoggingListener.scala --- @@ -97,7 +97,7 @@ private[spark] class EventLoggingListener( * Creates the log file in the configured log directory. */ def start() { -if (!fileSystem.getFileStatus(new Path(logBaseDir)).isDir) { +if (!fileSystem.getFileStatus(new Path(logBaseDir)).isDirectory) { --- End diff -- Unrelated to this line: this class has the `hadoopFlushMethod` hack which probably can go away now, if you want to do more cleanup.
[GitHub] spark pull request: [SPARK-12503] [SQL] Pushing Limit Through Unio...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/10451#discussion_r48581249 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala --- @@ -91,6 +91,11 @@ abstract class LogicalPlan extends QueryPlan[LogicalPlan] with Logging { } /** + * Returns the limited number of rows to be returned. --- End diff -- Actually we will push down `Project` through `Limit` in `ColumnPruning`.
[GitHub] spark pull request: [SPARK-10359] Enumerate dependencies in a file...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10461#issuecomment-167906817 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/48428/ Test PASSed.
[GitHub] spark pull request: [SPARK-12490][Core]Limit the css style scope t...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10517#issuecomment-167910392 **[Test build #48439 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48439/consoleFull)** for PR 10517 at commit [`414b274`](https://github.com/apache/spark/commit/414b27416ae51c644bc1b8fb4d8226e945809d7b).
[GitHub] spark pull request: [SPARK-5479] [yarn] Handle --py-files correctl...
Github user zjffdu commented on the pull request: https://github.com/apache/spark/pull/6360#issuecomment-167912850 Thanks @vanzin, my fault: I specified the HDFS location using the hostname, while it is an IP address in core-site.xml (anyway, maybe we can improve this here)
[GitHub] spark pull request: [SPARK-12503] [SQL] Pushing Limit Through Unio...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/10451#discussion_r48582039 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala --- @@ -80,6 +81,33 @@ abstract class Optimizer extends RuleExecutor[LogicalPlan] { object DefaultOptimizer extends Optimizer /** + * Pushes down Limit for reducing the amount of returned data. + * + * 1. Adding Extra Limit beneath the operations, including Union All. + * 2. Project is pushed through Limit in the rule ColumnPruning + * + * Any operator that a Limit can be pushed past should override the maxRows function. + */ +object PushDownLimit extends Rule[LogicalPlan] { + + def apply(plan: LogicalPlan): LogicalPlan = plan transform { + +// Adding extra Limit below UNION ALL iff both left and right children are not Limit or +// do not have Limit descendants. This heuristic is valid assuming there does not exist +// any Limit push-down rule that is unable to infer the value of maxRows. +// Note, right now, Union means UNION ALL, which does not de-duplicate rows. So, it is +// safe to push down Limit through it. Once we add UNION DISTINCT, we will not be able to +// push down Limit. +case Limit(exp, Union(left, right)) + if left.maxRows.isEmpty || right.maxRows.isEmpty => --- End diff -- Yeah, that also makes sense. Will make the change after these three running test cases finish. : )
[GitHub] spark pull request: [SPARK-12286] [SPARK-12290] [SPARK-12294] [SQL...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10511#issuecomment-167914257 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/48432/ Test FAILed.
[GitHub] spark pull request: [SPARK-12286] [SPARK-12290] [SPARK-12294] [SQL...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10511#issuecomment-167914254 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-12286] [SPARK-12290] [SPARK-12294] [SQL...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10511#issuecomment-167914227 **[Test build #48432 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48432/consoleFull)** for PR 10511 at commit [`513597c`](https://github.com/apache/spark/commit/513597c3172cec7e68bd15f9c543533248d1c3e3). * This patch **fails PySpark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-7995][SPARK-6280][Core]Remove AkkaRpcEn...
Github user zsxwing commented on the pull request: https://github.com/apache/spark/pull/10459#issuecomment-167914770 CC @vanzin @andrewor14