[GitHub] spark pull request: [SPARK-2546] Clone JobConf for each task (bran...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/2684#issuecomment-58142790 By the way, I checked and this patch cleanly cherry-picks into `branch-1.0`. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-3412] Replace Epydoc with Sphinx to gen...
GitHub user davies opened a pull request:

https://github.com/apache/spark/pull/2689

[SPARK-3412] Replace Epydoc with Sphinx to generate Python API docs

Retire Epydoc; use Sphinx to generate the API docs. Refine the Sphinx docs, and also convert some docstrings into Sphinx style. It looks like:

![api doc](https://cloud.githubusercontent.com/assets/40902/4538272/9e2d4f10-4dec-11e4-8d96-6e45a8fe51f9.png)

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/davies/spark docs

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/2689.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #2689

commit 240b3933ffa21c97d8bf53b91b6274f715877980
Author: Davies Liu davies@gmail.com
Date: 2014-10-06T22:00:11Z

    replace epydoc with sphinx doc

commit 746d0b67ba782faf660c24dc3ab11caefc9a0cc2
Author: Davies Liu davies@gmail.com
Date: 2014-10-06T22:46:24Z

    @param -> :param

commit 4bc1c3c794c2ecfb213d5fc0379c03c3615b5c89
Author: Davies Liu davies@gmail.com
Date: 2014-10-07T06:29:49Z

    refactor

commit d5b874a1dd0f49e1dee84746ef64ec08efeccaf9
Author: Davies Liu davies@gmail.com
Date: 2014-10-07T06:35:49Z

    Merge branch 'master' of github.com:apache/spark into docs

    Conflicts:
        python/pyspark/mllib/classification.py
        python/pyspark/mllib/regression.py
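One of the commits converts Epydoc `@param` tags into Sphinx (reStructuredText) field lists. As a hedged sketch of what that conversion produces (the `train` function below is hypothetical, not an actual PySpark API):

```python
def train(data, iterations=100, step=1.0):
    """Train a model on an RDD of LabeledPoint.

    Epydoc used tags such as ``@param data: ...``; the Sphinx style
    rewrites them as reStructuredText field lists, which Sphinx renders
    as a parameter table in the generated API docs:

    :param data: training data, as an RDD of LabeledPoint
    :param iterations: number of gradient descent iterations
    :param step: step size used in gradient descent
    :return: the trained model (here, just the inputs echoed back)
    """
    return (data, iterations, step)
```

Sphinx's `autodoc` extension picks these field lists up directly from `__doc__`, which is why the PR can retire Epydoc without a separate doc source.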
[GitHub] spark pull request: [SPARK-3762] clear reference of SparkEnv after...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/2624#issuecomment-58143439 @mateiz Aside from restoring the `getThreadLocal` method in order to preserve API compatibility, is this patch otherwise ready to merge?
[GitHub] spark pull request: [SPARK-3412] [PySpark] Replace Epydoc with Sph...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2689#issuecomment-58143632 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21367/consoleFull) for PR 2689 at commit [`d5b874a`](https://github.com/apache/spark/commit/d5b874a1dd0f49e1dee84746ef64ec08efeccaf9). * This patch merges cleanly.
[GitHub] spark pull request: fix the Building Spark url
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/2558#issuecomment-58143668 Hi @yangl, Do you mind closing this PR? Thanks!
[GitHub] spark pull request: [SPARK-3762] clear reference of SparkEnv after...
Github user davies commented on the pull request: https://github.com/apache/spark/pull/2624#issuecomment-58143704 @mateiz @JoshRosen I've put getThreadLocal() back and deprecated it.
[GitHub] spark pull request: fix the Building Spark url
Github user yangl commented on the pull request: https://github.com/apache/spark/pull/2558#issuecomment-58143905 Close it please, thanks!
[GitHub] spark pull request: fix the Building Spark url
Github user yangl closed the pull request at: https://github.com/apache/spark/pull/2558
[GitHub] spark pull request: [SPARK-3762] clear reference of SparkEnv after...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2624#issuecomment-58143977 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21368/consoleFull) for PR 2624 at commit [`a69f30c`](https://github.com/apache/spark/commit/a69f30cdb8e63d526ebee06162d8f1b9f2adb253). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-3825] Log more detail when unrolling a ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2688#issuecomment-58144285 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21366/
[GitHub] spark pull request: [SPARK-3825] Log more detail when unrolling a ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2688#issuecomment-58144279 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21366/consoleFull) for PR 2688 at commit [`5638c49`](https://github.com/apache/spark/commit/5638c49f1a441b338b8998294aacae27300cd522). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-3829] Make Spark logo image on the head...
GitHub user sarutak opened a pull request:

https://github.com/apache/spark/pull/2690

[SPARK-3829] Make Spark logo image on the header of HistoryPage a link to HistoryPage's page #1

There is a Spark logo on the header of HistoryPage. We can end up with many HistoryPage pages if we run 20+ applications, so I think it's useful for the logo to be a link to HistoryPage's page #1.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/sarutak/spark SPARK-3829

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/2690.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #2690

commit dd874805678629df0fdb489644a1330ff35c94a4
Author: Kousuke Saruta saru...@oss.nttdata.co.jp
Date: 2014-10-07T06:54:27Z

    Made the header Spark logo image a link to the History Server's top page.
[GitHub] spark pull request: [SPARK-3829] Make Spark logo image on the head...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2690#issuecomment-58145012 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21369/consoleFull) for PR 2690 at commit [`dd87480`](https://github.com/apache/spark/commit/dd874805678629df0fdb489644a1330ff35c94a4). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-3270] Spark API for Application Extensi...
GitHub user mmalohlava opened a pull request:

https://github.com/apache/spark/pull/2691

[SPARK-3270] Spark API for Application Extensions

SPARK-3270: Initial proposal of application extensions. The change set introduces:
* a Spark extension API to implement
* a hook into Executor to handle the extension lifecycle
* a method to specify extensions via SparkConf
* a 'spark.extensions' configuration variable to pass the extension list to the Spark context
* a test verifying that an extension is correctly started inside the executor lifecycle

For more details please follow SPARK-3270 or the design document: https://docs.google.com/document/d/1dHF9zi7GzFbYnbV2PwaOQ2eLPoTeiN9IogUe4PAOtrQ/edit?usp=sharing

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/0xdata/perrier core_ext

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/2691.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #2691

commit 255357e7f1451b592bdd7374b5007aa3ce63690b
Author: mmalohlava michal.malohl...@gmail.com
Date: 2014-10-03T01:53:02Z

    SPARK-3270: Initial proposal of application extensions. The commit introduces:
    - a Spark extension API to implement
    - a hook into Executor to handle the extension lifecycle
    - a method to specify extensions via SparkConf
    - a 'spark.extensions' configuration variable to pass the extension list to the Spark context
    For more details please follow SPARK-3270 or the design document: https://docs.google.com/document/d/1dHF9zi7GzFbYnbV2PwaOQ2eLPoTeiN9IogUe4PAOtrQ/edit?usp=sharing

commit 532d352936b47c9b38635976aa33e9010fd6e81a
Author: mmalohlava michal.malohl...@gmail.com
Date: 2014-10-07T00:02:06Z

    SPARK-3270: test suite for application extension. The basic test suite verifying that a given extension is started on all executors.
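The proposal registers extensions by class name through configuration and has the executor drive their lifecycle. A minimal Python model of that pattern (all names here are illustrative; the actual proposal is a Scala API described in the linked design document): resolve a comma-separated config value to registered classes and call start/stop hooks around the host's lifetime.

```python
# Registry mapping extension names to classes, a stand-in for classloading.
REGISTRY = {}

def register(cls):
    """Decorator that makes an extension class discoverable by name."""
    REGISTRY[cls.__name__] = cls
    return cls

class ExtensionHost:
    """Toy stand-in for an executor that drives extension lifecycles."""

    def __init__(self, conf):
        # Comma-separated list, analogous to a 'spark.extensions' setting.
        names = conf.get("extensions", "")
        self.extensions = [REGISTRY[n]() for n in names.split(",") if n]
        self.events = []

    def run(self):
        # Start every extension before doing work, stop them afterwards.
        for ext in self.extensions:
            self.events.append(ext.start())
        for ext in self.extensions:
            self.events.append(ext.stop())

@register
class MetricsExtension:
    def start(self):
        return "MetricsExtension:start"

    def stop(self):
        return "MetricsExtension:stop"
```

This is the lifecycle-hook shape the review comments below debate against the lighter alternative of a static object initialized on first use.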
[GitHub] spark pull request: [SPARK-3270] Spark API for Application Extensi...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2691#issuecomment-58146366 Can one of the admins verify this patch?
[GitHub] spark pull request: [SPARK-3270] Spark API for Application Extensi...
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/2691#issuecomment-58146963 This seems quite heavyweight compared to Patrick's suggestion of just using a static object. Why the need for custom logic to load classes? (which even opens up security questions)
[GitHub] spark pull request: [SPARK-3412] [PySpark] Replace Epydoc with Sph...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2689#issuecomment-58148670 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21367/
[GitHub] spark pull request: [SPARK-3412] [PySpark] Replace Epydoc with Sph...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2689#issuecomment-58148662 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21367/consoleFull) for PR 2689 at commit [`d5b874a`](https://github.com/apache/spark/commit/d5b874a1dd0f49e1dee84746ef64ec08efeccaf9). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-3762] clear reference of SparkEnv after...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2624#issuecomment-58149038 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21368/consoleFull) for PR 2624 at commit [`a69f30c`](https://github.com/apache/spark/commit/a69f30cdb8e63d526ebee06162d8f1b9f2adb253). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-3762] clear reference of SparkEnv after...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2624#issuecomment-58149042 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21368/
[GitHub] spark pull request: [SPARK-3829] Make Spark logo image on the head...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2690#issuecomment-58150433 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21369/
[GitHub] spark pull request: [SPARK-3829] Make Spark logo image on the head...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2690#issuecomment-58150425 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21369/consoleFull) for PR 2690 at commit [`dd87480`](https://github.com/apache/spark/commit/dd874805678629df0fdb489644a1330ff35c94a4). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-3831] Filter rule Improvement and bool ...
GitHub user sarutak opened a pull request:

https://github.com/apache/spark/pull/2692

[SPARK-3831] Filter rule improvement and bool expression optimization

If we write a filter which is always FALSE, like

    SELECT * FROM person WHERE FALSE;

200 tasks will run. I think 1 task is enough. Also, the current optimizer cannot optimize the case where NOT is duplicated, like

    SELECT * FROM person WHERE NOT (NOT (age 30));

A filter like the one above should be simplified.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/sarutak/spark SPARK-3831

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/2692.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #2692

commit 8ea872b0131f75ae0797add9bda6dbc79d92736a
Author: Kousuke Saruta saru...@oss.nttdata.co.jp
Date: 2014-10-07T12:34:06Z

    Fixed the number of tasks when the data of LocalRelation is empty. Added optimization rule related to bool expression.
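The two optimizations described, collapsing an always-FALSE filter and eliminating a double negation, can be sketched outside Catalyst. A hedged Python model of the rewrite logic (illustrative only, not Spark's optimizer; the expression encoding and `plan_tasks` helper are invented for this sketch):

```python
def simplify(expr):
    """Recursively simplify a tiny boolean expression tree.

    Expressions are ("not", e) tuples, the literals True/False, or
    opaque leaf predicates such as the string "age > 30".
    """
    if isinstance(expr, tuple) and expr[0] == "not":
        inner = simplify(expr[1])
        if isinstance(inner, tuple) and inner[0] == "not":
            return inner[1]           # NOT(NOT(x)) -> x
        if inner is True:
            return False              # NOT(TRUE)  -> FALSE
        if inner is False:
            return True               # NOT(FALSE) -> TRUE
        return ("not", inner)
    return expr

def plan_tasks(filter_expr, default_tasks=200):
    """An always-FALSE filter needs no real scan: one empty task suffices."""
    return 1 if simplify(filter_expr) is False else default_tasks
```

The first rule is what lets `WHERE FALSE` collapse to a single task over an empty LocalRelation; the second is the duplicated-NOT simplification.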
[GitHub] spark pull request: [SPARK-3831] [SQL] Filter rule Improvement and...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2692#issuecomment-58178035 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21370/consoleFull) for PR 2692 at commit [`8ea872b`](https://github.com/apache/spark/commit/8ea872b0131f75ae0797add9bda6dbc79d92736a). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-3831] [SQL] Filter rule Improvement and...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2692#issuecomment-58179115 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21370/consoleFull) for PR 2692 at commit [`8ea872b`](https://github.com/apache/spark/commit/8ea872b0131f75ae0797add9bda6dbc79d92736a). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-3831] [SQL] Filter rule Improvement and...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2692#issuecomment-58179119 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21370/
[GitHub] spark pull request: [SPARK-3119] Re-implementation of TorrentBroad...
Github user dbtsai commented on the pull request: https://github.com/apache/spark/pull/2030#issuecomment-58183559

We had a build against the Spark master on Oct 2, and when we ran our application with around 600GB of data, we got the following exception. Does this PR fix this issue, which was also seen by @JoshRosen?

Job aborted due to stage failure: Task 0 in stage 6.0 failed 4 times, most recent failure: Lost task 0.3 in stage 6.0 (TID 8312, ams03-002.ff): java.io.IOException: PARSING_ERROR(2)
    org.xerial.snappy.SnappyNative.throw_error(SnappyNative.java:84)
    org.xerial.snappy.SnappyNative.uncompressedLength(Native Method)
    org.xerial.snappy.Snappy.uncompressedLength(Snappy.java:594)
    org.xerial.snappy.SnappyInputStream.readFully(SnappyInputStream.java:125)
    org.xerial.snappy.SnappyInputStream.readHeader(SnappyInputStream.java:88)
    org.xerial.snappy.SnappyInputStream.<init>(SnappyInputStream.java:58)
    org.apache.spark.io.SnappyCompressionCodec.compressedInputStream(CompressionCodec.scala:128)
    org.apache.spark.storage.BlockManager.wrapForCompression(BlockManager.scala:1004)
    org.apache.spark.storage.ShuffleBlockFetcherIterator$$anon$1$$anonfun$onBlockFetchSuccess$1.apply(ShuffleBlockFetcherIterator.scala:116)
    org.apache.spark.storage.ShuffleBlockFetcherIterator$$anon$1$$anonfun$onBlockFetchSuccess$1.apply(ShuffleBlockFetcherIterator.scala:115)
    org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:243)
    org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:52)
    scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
    org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:30)
    org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
    org.apache.spark.Aggregator.combineCombinersByKey(Aggregator.scala:89)
    org.apache.spark.shuffle.hash.HashShuffleReader.read(HashShuffleReader.scala:44)
    org.apache.spark.rdd.ShuffledRDD.compute(ShuffledRDD.scala:92)
    org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
    org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
    org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
    org.apache.spark.scheduler.Task.run(Task.scala:56)
    org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:182)
    java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    java.lang.Thread.run(Thread.java:744)
Driver stacktrace:
[GitHub] spark pull request: [SPARK-3831] [SQL] Filter rule Improvement and...
Github user yhuai commented on the pull request: https://github.com/apache/spark/pull/2692#issuecomment-58186913

@sarutak LGTM. Can you take a look at the failing test? The log is

```
[info] - NOT (i 88) *** FAILED ***
[info]   2 did not equal 10 Wrong number of read batches (PartitionBatchPruningSuite.scala:91)
```

It seems we need to update the test suite, since with your change we can handle this predicate when doing batch pruning for cached tables. Also, it would be good to add another case involving `NOT` to the unsupported predicates, if possible.
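The failing assertion counts how many in-memory batches were actually read: partition batch pruning keeps min/max statistics per batch and skips batches whose range cannot satisfy the predicate, so a newly-optimizable predicate changes the expected count. A hedged Python sketch of the idea (not Spark's implementation; batch layout and stats are invented for illustration):

```python
def prune_batches(batches, low, high):
    """Keep only batches whose [min, max] range can contain a value
    satisfying low <= i <= high.

    Each batch is a list of ints; min/max stats are computed up front,
    the way a columnar cache stores per-batch statistics."""
    stats = [(min(b), max(b)) for b in batches]
    return [b for b, (lo, hi) in zip(batches, stats)
            if hi >= low and lo <= high]
```

With ten batches of ten values each covering 1..100, a predicate over 89..100 only needs to read the last two batches; the suite asserts exactly this kind of read count.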
[GitHub] spark pull request: code style format
Github user shijinkui commented on the pull request: https://github.com/apache/spark/pull/2643#issuecomment-58187162 In IntelliJ IDEA there are too many yellow inspection warnings to fix; after this change, the code looks better.
[GitHub] spark pull request: [SPARK-3809][SQL] Fixes test suites in hive-th...
Github user scwf commented on the pull request: https://github.com/apache/spark/pull/2675#issuecomment-58188701

Hi @liancheng, I think I've found the root cause here. In TestHive.scala we reset the log4j level:

```
// HACK: Hive is too noisy by default.
org.apache.log4j.LogManager.getCurrentLoggers.foreach { log =>
  log.asInstanceOf[org.apache.log4j.Logger].setLevel(org.apache.log4j.Level.WARN)
}
```

So here the level is WARN, and the process will not log the "ThriftBinaryCLIService listening on" info message, which leads to the timeout exception and the test failure. Maybe we should reset the log4j level here to fix this :)
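The same failure mode is easy to reproduce with Python's standard `logging` module: once a logger's level is raised to WARNING, any code that waits for an INFO message, as the test waits for "ThriftBinaryCLIService listening on", will never see it. A minimal illustration (logger and handler names are invented for the demo):

```python
import logging

def capture_info_message(level):
    """Emit one INFO record and report which messages a handler saw."""
    seen = []

    class Collector(logging.Handler):
        def emit(self, record):
            seen.append(record.getMessage())

    logger = logging.getLogger("thrift.demo")
    logger.setLevel(level)
    handler = Collector()
    logger.addHandler(handler)
    # At WARNING level this record is filtered out before any handler runs.
    logger.info("ThriftBinaryCLIService listening on ...")
    logger.removeHandler(handler)
    return seen
```

At `logging.INFO` the collector sees the message; at `logging.WARNING` it sees nothing, which is exactly why the test times out waiting for the log line.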
[GitHub] spark pull request: [SPARK-3831] [SQL] Filter rule Improvement and...
Github user sarutak commented on the pull request: https://github.com/apache/spark/pull/2692#issuecomment-58190309 @yhuai Thanks for picking this PR up and for your comment! I'll try that soon.
[GitHub] spark pull request: [SPARK-3832][MLlib] Upgrade Breeze dependency ...
GitHub user dbtsai opened a pull request: https://github.com/apache/spark/pull/2693 [SPARK-3832][MLlib] Upgrade Breeze dependency to 0.10 In Breeze 0.10, the L1regParam can be configured through anonymous function in OWLQN, and each component can be penalized differently. This is required for GLMNET in MLlib with L1/L2 regularization. https://github.com/scalanlp/breeze/commit/2570911026aa05aa1908ccf7370bc19cd8808a4c You can merge this pull request into a Git repository by running: $ git pull https://github.com/dbtsai/spark breeze0.10 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/2693.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2693 commit 7a0c45cda7d388152774722a2f6728294cc81b4e Author: DB Tsai dbt...@dbtsai.com Date: 2014-10-07T14:20:41Z In Breeze 0.10, the L1regParam can be configured through anonymous function in OWLQN, and each component can be penalized differently. This is required for GLMNET in MLlib with L1/L2 regularization. https://github.com/scalanlp/breeze/commit/2570911026aa05aa1908ccf7370bc19cd8808a4c
[GitHub] spark pull request: [SPARK-3832][MLlib] Upgrade Breeze dependency ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2693#issuecomment-58192163 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21371/consoleFull) for PR 2693 at commit [`7a0c45c`](https://github.com/apache/spark/commit/7a0c45cda7d388152774722a2f6728294cc81b4e). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-3627] - [yarn] - fix exit code and fina...
Github user tgravescs commented on the pull request: https://github.com/apache/spark/pull/2577#issuecomment-58197006 Thanks @andrewor14. I've merged this into 1.2
[GitHub] spark pull request: [SPARK-3627] - [yarn] - fix exit code and fina...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/2577
[GitHub] spark pull request: [SPARK-3119] Re-implementation of TorrentBroad...
Github user davies commented on the pull request: https://github.com/apache/spark/pull/2030#issuecomment-58201237 It could be fixed by https://github.com/apache/spark/pull/2624 It's strange that I cannot see this comment on PR #2030. On Tue, Oct 7, 2014 at 6:28 AM, DB Tsai notificati...@github.com wrote: We had a build against the Spark master on Oct 2, and when we ran our application with around 600GB of data, we got the following exception. Does this PR fix the issue seen by @JoshRosen https://github.com/JoshRosen ?
```
Job aborted due to stage failure: Task 0 in stage 6.0 failed 4 times, most recent failure: Lost task 0.3 in stage 6.0 (TID 8312, ams03-002.ff): java.io.IOException: PARSING_ERROR(2)
	org.xerial.snappy.SnappyNative.throw_error(SnappyNative.java:84)
	org.xerial.snappy.SnappyNative.uncompressedLength(Native Method)
	org.xerial.snappy.Snappy.uncompressedLength(Snappy.java:594)
	org.xerial.snappy.SnappyInputStream.readFully(SnappyInputStream.java:125)
	org.xerial.snappy.SnappyInputStream.readHeader(SnappyInputStream.java:88)
	org.xerial.snappy.SnappyInputStream.<init>(SnappyInputStream.java:58)
	org.apache.spark.io.SnappyCompressionCodec.compressedInputStream(CompressionCodec.scala:128)
	org.apache.spark.storage.BlockManager.wrapForCompression(BlockManager.scala:1004)
	org.apache.spark.storage.ShuffleBlockFetcherIterator$$anon$1$$anonfun$onBlockFetchSuccess$1.apply(ShuffleBlockFetcherIterator.scala:116)
	org.apache.spark.storage.ShuffleBlockFetcherIterator$$anon$1$$anonfun$onBlockFetchSuccess$1.apply(ShuffleBlockFetcherIterator.scala:115)
	org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:243)
	org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:52)
	scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
	org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:30)
	org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
	org.apache.spark.Aggregator.combineCombinersByKey(Aggregator.scala:89)
	org.apache.spark.shuffle.hash.HashShuffleReader.read(HashShuffleReader.scala:44)
	org.apache.spark.rdd.ShuffledRDD.compute(ShuffledRDD.scala:92)
	org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
	org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
	org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
	org.apache.spark.scheduler.Task.run(Task.scala:56)
	org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:182)
	java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	java.lang.Thread.run(Thread.java:744)
Driver stacktrace:
```
-- Reply to this email directly or view it on GitHub https://github.com/apache/spark/pull/2030#issuecomment-58183559. -- - Davies
[GitHub] spark pull request: [SPARK-3832][MLlib] Upgrade Breeze dependency ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2693#issuecomment-58202720 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21371/consoleFull) for PR 2693 at commit [`7a0c45c`](https://github.com/apache/spark/commit/7a0c45cda7d388152774722a2f6728294cc81b4e). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-3832][MLlib] Upgrade Breeze dependency ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2693#issuecomment-58202731 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21371/Test PASSed.
[GitHub] spark pull request: [SPARK-3831] [SQL] Filter rule Improvement and...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2692#issuecomment-58203345 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21372/consoleFull) for PR 2692 at commit [`a11b9f3`](https://github.com/apache/spark/commit/a11b9f31751f23ba306c2549108a3c6ab47191fe). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-3786] [PySpark] speedup tests
Github user jameszhouyi commented on the pull request: https://github.com/apache/spark/pull/2646#issuecomment-58205767 Hi @davies @JoshRosen Found the errors below after adding 'time' in run-tests:
```
Running PySpark tests. Output is in python/unit-tests.log.
Testing with Python version: Python 2.6.6
Run core tests ...
Running test: pyspark/rdd.py
./python/run-tests: line 37: time: command not found
./python/run-tests: line 37: time: command not found
```
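For context, `time` is a bash reserved word rather than an external command, so a line like `time python some_test.py` fails with "time: command not found" when it is executed by a shell (or in a position) where the keyword is not recognized. A hedged sketch of a portable fallback using `date(1)` follows; the `run_timed` helper is hypothetical, not the actual run-tests fix:

```shell
#!/usr/bin/env bash
# Hypothetical helper: measure wall-clock seconds with date(1) instead of
# relying on the bash-only 'time' keyword, and preserve the command's status.
run_timed() {
    local start end status
    start=$(date +%s)
    "$@"
    status=$?
    end=$(date +%s)
    echo "elapsed: $((end - start))s" >&2
    return "$status"
}

run_timed sleep 1
```

This works under any POSIX shell, whereas the `time` keyword does not.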
[GitHub] spark pull request: [SPARK-3802][BUILD] Scala version is wrong in ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2661#issuecomment-58209751 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21373/Test FAILed.
[GitHub] spark pull request: [SPARK-3787] Assembly jar name is wrong when w...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2647#issuecomment-58209752 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21374/Test FAILed.
[GitHub] spark pull request: [SPARK-3677] [BUILD] [YARN] pom.xml and SparkB...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2520#issuecomment-58209766 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21376/Test FAILed.
[GitHub] spark pull request: [SPARK-3758] [Windows] Wrong EOL character in ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2612#issuecomment-58209763 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21375/Test FAILed.
[GitHub] spark pull request: [SPARK-3758] [Windows] Wrong EOL character in ...
Github user sarutak commented on the pull request: https://github.com/apache/spark/pull/2612#issuecomment-58210133 retest this please.
[GitHub] spark pull request: [SPARK-3677] [BUILD] [YARN] pom.xml and SparkB...
Github user sarutak commented on the pull request: https://github.com/apache/spark/pull/2520#issuecomment-58210179 retest this please.
[GitHub] spark pull request: [SPARK-3802][BUILD] Scala version is wrong in ...
Github user sarutak commented on the pull request: https://github.com/apache/spark/pull/2661#issuecomment-58210228 retest this please.
[GitHub] spark pull request: [SPARK-3787] Assembly jar name is wrong when w...
Github user sarutak commented on the pull request: https://github.com/apache/spark/pull/2647#issuecomment-58210261 retest this please.
[GitHub] spark pull request: [SPARK-3831] [SQL] Filter rule Improvement and...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/2692#discussion_r18529676 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/columnar/PartitionBatchPruningSuite.scala ---
```
@@ -67,10 +67,11 @@ class PartitionBatchPruningSuite extends FunSuite with BeforeAndAfterAll with Be
     checkBatchPruning("i > 8 AND i <= 21", 9 to 21, 2, 3)
     checkBatchPruning("i < 2 OR i > 99", Seq(1, 100), 2, 2)
     checkBatchPruning("i < 2 OR (i > 78 AND i < 92)", Seq(1) ++ (79 to 91), 3, 4)
+    checkBatchPruning("NOT (i < 88)", 88 to 100, 1, 2)

     // With unsupported predicate
     checkBatchPruning("i < 12 AND i IS NOT NULL", 1 to 11, 1, 2)
-    checkBatchPruning("NOT (i < 88)", 88 to 100, 5, 10)
+    checkBatchPruning("NOT (i in (1))", 2 to 100, 5, 10)
```
--- End diff -- How about
```
checkBatchPruning(s"NOT (i in (${(1 to 30).mkString(",")}))", 31 to 100, 5, 10)
```
For this case, we will read 4 partitions including 7 batches when we can support it.
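For reference, the interpolated predicate in the suggestion expands to a plain SQL string (a quick Scala sketch, values elided in the comment):

```scala
val predicate = s"NOT (i in (${(1 to 30).mkString(",")}))"
// predicate is "NOT (i in (1,2,3,...,30))" with all thirty values listed
```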
[GitHub] spark pull request: [SPARK-3758] [Windows] Wrong EOL character in ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2612#issuecomment-58210908 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21379/consoleFull) for PR 2612 at commit [`b0585da`](https://github.com/apache/spark/commit/b0585da796aeb91957956f61d97fa98953d1c5e5). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-3787] Assembly jar name is wrong when w...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2647#issuecomment-58210898 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21377/consoleFull) for PR 2647 at commit [`ad1f96e`](https://github.com/apache/spark/commit/ad1f96ea36f7a4750d6fdaf3ab91239a20a7e6a1). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-3831] [SQL] Filter rule Improvement and...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2692#issuecomment-5827 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21372/Test PASSed.
[GitHub] spark pull request: [SPARK-3831] [SQL] Filter rule Improvement and...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2692#issuecomment-5823 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21372/consoleFull) for PR 2692 at commit [`a11b9f3`](https://github.com/apache/spark/commit/a11b9f31751f23ba306c2549108a3c6ab47191fe). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-3758] [Windows] Wrong EOL character in ...
Github user nchammas commented on a diff in the pull request: https://github.com/apache/spark/pull/2612#discussion_r18530534 --- Diff: dev/lint-windows-cmd ---
```
@@ -0,0 +1,40 @@
+#!/usr/bin/env bash
+
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements. See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License. You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+SCRIPT_DIR="$( cd "$( dirname "$0" )" && pwd )"
+SPARK_ROOT_DIR="$(dirname "$SCRIPT_DIR")"
+TARGET_DIR="$SPARK_ROOT_DIR/bin"
+HAS_ERROR=0
+
+# check whether all of lines ends with CRLF.
+for file in "$TARGET_DIR"/*.cmd ; do
+  grep "^.*"$'\r'"$" "$file" > /dev/null
+  if [ $? -ne 0 ]; then
+    HAS_ERROR=1
+    echo "$file has line(s) not ends with CRLF."
+  fi
+done
+
+if [ $HAS_ERROR -eq 0 ];then
+  echo -e "Windows batch file style checks passed."
+else
+  echo -e "Windows batch file style  checks failed."
```
--- End diff -- Looks like there's an extra space here between `style` and `checks`.
[GitHub] spark pull request: [SPARK-3758] [Windows] Wrong EOL character in ...
Github user nchammas commented on a diff in the pull request: https://github.com/apache/spark/pull/2612#discussion_r18530666 --- Diff: dev/lint-windows-cmd ---
```
+# check whether all of lines ends with CRLF.
+for file in "$TARGET_DIR"/*.cmd ; do
+  grep "^.*"$'\r'"$" "$file" > /dev/null
+  if [ $? -ne 0 ]; then
```
--- End diff -- In Bash it's a good practice to always quote tested variables. So `"$?" -ne "0"`.
[GitHub] spark pull request: [SPARK-3758] [Windows] Wrong EOL character in ...
Github user nchammas commented on a diff in the pull request: https://github.com/apache/spark/pull/2612#discussion_r18530699 --- Diff: dev/lint-windows-cmd ---
```
+if [ $HAS_ERROR -eq 0 ];then
+  echo -e "Windows batch file style checks passed."
+else
+  echo -e "Windows batch file style  checks failed."
```
--- End diff -- Same here. I suggest quoting both terms in the comparison.
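The quoting advice can be seen in a tiny self-contained sketch (the variable names here are illustrative, not from the PR): with an empty or unset variable, the unquoted form collapses to a malformed test, while the quoted form stays well-formed.

```shell
#!/usr/bin/env bash
var=""

# Unquoted: [ $var = "x" ] expands to [ = "x" ], a malformed test that
# reports an error instead of a clean true/false.
if [ $var = "x" ] 2>/dev/null; then
    echo "unquoted: matched"
else
    echo "unquoted: error or no match"
fi

# Quoted: [ "$var" = "x" ] always has three well-formed words.
if [ "$var" = "x" ]; then
    echo "quoted: matched"
else
    echo "quoted: no match"
fi
```

The same reasoning applies to `$?` and `$HAS_ERROR` in the script under review: quoting costs nothing and avoids surprises if a value is ever empty or contains whitespace.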
[GitHub] spark pull request: [SPARK-3831] [SQL] Filter rule Improvement and...
Github user sarutak commented on the pull request: https://github.com/apache/spark/pull/2692#issuecomment-58211878 @yhuai Thanks, it makes sense.
[GitHub] spark pull request: [SPARK-3677] [BUILD] [YARN] pom.xml and SparkB...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2520#issuecomment-58210929 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21380/consoleFull) for PR 2520 at commit [`de91bbd`](https://github.com/apache/spark/commit/de91bbd37d0986abc8d154efde2418e07b685eb0). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-3802][BUILD] Scala version is wrong in ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2661#issuecomment-58212070 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21378/Test FAILed.
[GitHub] spark pull request: [SPARK-3831] [SQL] Filter rule Improvement and...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2692#issuecomment-58212418 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21381/consoleFull) for PR 2692 at commit [`23c750c`](https://github.com/apache/spark/commit/23c750cd5eb883171737d1a622fd30954315232a). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-3758] [Windows] Wrong EOL character in ...
Github user sarutak commented on the pull request: https://github.com/apache/spark/pull/2612#issuecomment-58212737 Thanks @nchammas, done.
[GitHub] spark pull request: [SPARK-3802][BUILD] Scala version is wrong in ...
Github user sarutak commented on the pull request: https://github.com/apache/spark/pull/2661#issuecomment-58212936 retest this please.
[GitHub] spark pull request: [SPARK-3758] [Windows] Wrong EOL character in ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2612#issuecomment-58213209 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21382/consoleFull) for PR 2612 at commit [`cfaa176`](https://github.com/apache/spark/commit/cfaa176a299b4c7b3f02e7dc8bf35627997021c5). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-3802][BUILD] Scala version is wrong in ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2661#issuecomment-58214144 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21383/consoleFull) for PR 2661 at commit [`8b64bb7`](https://github.com/apache/spark/commit/8b64bb7feb0ddea9f573cabfd96150bce673aa31). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-3119] Re-implementation of TorrentBroad...
Github user dbtsai commented on the pull request: https://github.com/apache/spark/pull/2030#issuecomment-58214186 I thought it was a closed issue, so I moved my comment to JIRA. I ran into this issue in spark-shell, not in a standalone application; does SPARK-3762 apply in this situation? Thanks. Sent from my Google Nexus 5 On Oct 7, 2014 5:17 PM, Davies Liu notificati...@github.com wrote: It could be fixed by https://github.com/apache/spark/pull/2624 It's strange that I can not see this comment on PR #2030. On Tue, Oct 7, 2014 at 6:28 AM, DB Tsai notificati...@github.com wrote: We had a build against the Spark master on Oct 2, and when we ran our application with data around 600GB, we got the following exception. Does this PR fix this issue, which is seen by @JoshRosen https://github.com/JoshRosen

Job aborted due to stage failure: Task 0 in stage 6.0 failed 4 times, most recent failure: Lost task 0.3 in stage 6.0 (TID 8312, ams03-002.ff): java.io.IOException: PARSING_ERROR(2)
org.xerial.snappy.SnappyNative.throw_error(SnappyNative.java:84)
org.xerial.snappy.SnappyNative.uncompressedLength(Native Method)
org.xerial.snappy.Snappy.uncompressedLength(Snappy.java:594)
org.xerial.snappy.SnappyInputStream.readFully(SnappyInputStream.java:125)
org.xerial.snappy.SnappyInputStream.readHeader(SnappyInputStream.java:88)
org.xerial.snappy.SnappyInputStream.init(SnappyInputStream.java:58)
org.apache.spark.io.SnappyCompressionCodec.compressedInputStream(CompressionCodec.scala:128)
org.apache.spark.storage.BlockManager.wrapForCompression(BlockManager.scala:1004)
org.apache.spark.storage.ShuffleBlockFetcherIterator$$anon$1$$anonfun$onBlockFetchSuccess$1.apply(ShuffleBlockFetcherIterator.scala:116)
org.apache.spark.storage.ShuffleBlockFetcherIterator$$anon$1$$anonfun$onBlockFetchSuccess$1.apply(ShuffleBlockFetcherIterator.scala:115)
org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:243)
org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:52)
scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:30)
org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
org.apache.spark.Aggregator.combineCombinersByKey(Aggregator.scala:89)
org.apache.spark.shuffle.hash.HashShuffleReader.read(HashShuffleReader.scala:44)
org.apache.spark.rdd.ShuffledRDD.compute(ShuffledRDD.scala:92)
org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
org.apache.spark.scheduler.Task.run(Task.scala:56)
org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:182)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
java.lang.Thread.run(Thread.java:744)
Driver stacktrace:

-- Reply to this email directly or view it on GitHub https://github.com/apache/spark/pull/2030#issuecomment-58183559. -- - Davies — Reply to this email directly or view it on GitHub https://github.com/apache/spark/pull/2030#issuecomment-58201237.
[GitHub] spark pull request: [SPARK-3786] [PySpark] speedup tests
Github user davies commented on the pull request: https://github.com/apache/spark/pull/2646#issuecomment-58214253 What shell are you running it in?
[GitHub] spark pull request: [SPARK-3812] [BUILD] Adapt maven build to publ...
Github user vanzin commented on the pull request: https://github.com/apache/spark/pull/2673#issuecomment-58216712 Thanks for the explanation. The issue with the Scala versions makes sense. What threw me off was the Hadoop example: I've always seen people say that the Spark API is independent of the Hadoop version, and that users should explicitly specify the Hadoop version they want in their projects (and have a matching Spark deployment). So explaining this PR in terms of publishing different Hadoop versions sounds a little at odds with that.
[GitHub] spark pull request: [SPARK-3710] Fix Yarn integration tests on Had...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2682#issuecomment-58216947 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21384/consoleFull) for PR 2682 at commit [`701d4fb`](https://github.com/apache/spark/commit/701d4fb9fbeb52856ab4611b00f2ecfb35cc9e88). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-2750] support https in spark web ui
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/1980#discussion_r18532610 --- Diff: core/src/main/scala/org/apache/spark/ui/JettyUtils.scala --- @@ -205,10 +231,74 @@ private[spark] object JettyUtils extends Logging { ServerInfo(server, boundPort, collection) } + // to generate a new url string scheme://server:port+path --- End diff -- Hi @scwf, The reason I asked for a comment is that it seems like the method is doing a little more than just that. For example, L238 seems to be doing some sort of parsing of the `server` string, so it's more than just concatenating the different arguments into a URL. It would be nice if the comment explained exactly what the relationship between the input and the output is. A unit test wouldn't hurt either.
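As an aside for readers, the input/output contract the reviewer wants documented — build `scheme://server:port/path`, where `server` may itself carry a port that must be parsed out — can be pinned down with a small, unit-testable helper. A minimal Python sketch (hypothetical names and behavior; the real method lives in Scala's `JettyUtils` and may handle more cases):

```python
def attach_url(scheme: str, server: str, port: int, path: str) -> str:
    """Build "scheme://host:port/path".

    `server` may be a bare hostname or "host:oldPort"; any port already
    embedded in it is dropped and replaced by `port`.
    """
    host = server.split(":")[0]                # parse out an existing port
    sep = "" if path.startswith("/") else "/"  # normalize the path join
    return f"{scheme}://{host}:{port}{sep}{path}"
```

A unit test then documents the relationship directly, e.g. `attach_url("https", "node1:8080", 8443, "/jobs")` yields `https://node1:8443/jobs`.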
[GitHub] spark pull request: [SPARK-2199] [mllib] topic modeling
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1269#issuecomment-58218797 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21385/consoleFull) for PR 1269 at commit [`cb951cc`](https://github.com/apache/spark/commit/cb951cc3693bec9e1694efd25db0a599869899b5). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-2199] [mllib] topic modeling
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1269#issuecomment-58218990 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21385/consoleFull) for PR 1269 at commit [`cb951cc`](https://github.com/apache/spark/commit/cb951cc3693bec9e1694efd25db0a599869899b5).
* This patch **fails Scala style tests**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `class Document(val tokens: SparseVector[Int], val alphabetSize: Int) extends Serializable`
  * `class DocumentParameters(val document: Document,`
  * `class GlobalParameters(val phi : Array[Array[Float]], val alphabetSize : Int)`
  * `class PLSA(@transient protected val sc: SparkContext,`
  * `class RobustDocumentParameters(document: Document,`
  * `class RobustGlobalParameters(phi : Array[Array[Float]],`
  * `class RobustPLSA(@transient protected val sc: SparkContext,`
  * `trait DocumentOverTopicDistributionRegularizer extends Serializable with MatrixInPlaceModification`
  * `class SymmetricDirichletDocumentOverTopicDistributionRegularizer(protected val alpha: Float)`
  * `class SymmetricDirichletTopicRegularizer(protected val alpha: Float) extends TopicsRegularizer`
  * `trait TopicsRegularizer extends MatrixInPlaceModification`
  * `class UniformDocumentOverTopicRegularizer extends DocumentOverTopicDistributionRegularizer`
  * `class UniformTopicRegularizer extends TopicsRegularizer`
[GitHub] spark pull request: [SPARK-2199] [mllib] topic modeling
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/1269#issuecomment-58218992 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21385/ Test FAILed.
[GitHub] spark pull request: [SPARK-2199] [mllib] topic modeling
Github user akopich commented on the pull request: https://github.com/apache/spark/pull/1269#issuecomment-58220390 Unfortunately, our cluster is unavailable due to some technical issues. Probably, the problem you report is related to the fact that `backgound : Array[Float]` in the line ``` val newParameters = parameters.map(parameter => parameter.getNewTheta(topicsBC, background, eps, gamma)).cache() ``` is serialized with the task. But it's clear that the `backgound` variable should be approximately 0.5 MB, and I still have no idea why the task grows to several MB.
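The 0.5 MB estimate above is easy to sanity-check outside Spark: a float array with ~128K 32-bit entries serializes to roughly half a megabyte, so capturing it in each task adds about that much — and no more — to the payload. A rough, hedged illustration in plain Python (using `pickle` as a stand-in for Spark's task serializer; all names here are invented):

```python
import pickle
from array import array

# Stand-in for a background distribution of ~128K 32-bit floats:
# 4 bytes each, so roughly 0.5 MB of raw data.
background = array("f", [0.0] * 131072)
small_params = {"eps": 0.01, "gamma": 0.1}  # small per-task parameters

without_bg = len(pickle.dumps(small_params))
with_bg = len(pickle.dumps((small_params, background)))

# The difference is close to the raw array size (~0.5 MB), which is why a
# captured array of this size alone cannot explain a multi-MB task.
print(with_bg - without_bg)
```

If the serialized task is several MB, something beyond this array is being dragged into the closure.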
[GitHub] spark pull request: [SPARK-3831] [SQL] Filter rule Improvement and...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2692#issuecomment-58220515 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21381/consoleFull) for PR 2692 at commit [`23c750c`](https://github.com/apache/spark/commit/23c750cd5eb883171737d1a622fd30954315232a).
* This patch **passes all tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-3831] [SQL] Filter rule Improvement and...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2692#issuecomment-58220527 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21381/ Test PASSed.
[GitHub] spark pull request: [SPARK-3831] [SQL] Filter rule Improvement and...
Github user yhuai commented on the pull request: https://github.com/apache/spark/pull/2692#issuecomment-58221944 LGTM cc @marmbrus.
[GitHub] spark pull request: [SPARK-3832][MLlib] Upgrade Breeze dependency ...
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/2693#issuecomment-58221886 @dbtsai Could you check whether there is any dependency change in breeze-0.10 and the number of files in the breeze-0.10 jar? Is it compatible with both Scala 2.10 and 2.11? Thanks!
[GitHub] spark pull request: [SPARK-3677] [BUILD] [YARN] pom.xml and SparkB...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2520#issuecomment-58222074 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21380/ Test PASSed.
[GitHub] spark pull request: [SPARK-3787] Assembly jar name is wrong when w...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2647#issuecomment-58222518 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21377/consoleFull) for PR 2647 at commit [`ad1f96e`](https://github.com/apache/spark/commit/ad1f96ea36f7a4750d6fdaf3ab91239a20a7e6a1).
* This patch **passes all tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-3787] Assembly jar name is wrong when w...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2647#issuecomment-58222529 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21377/ Test PASSed.
[GitHub] spark pull request: [SPARK-3758] [Windows] Wrong EOL character in ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2612#issuecomment-58222629 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21379/ Test PASSed.
[GitHub] spark pull request: [SPARK-3758] [Windows] Wrong EOL character in ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2612#issuecomment-58222616 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21379/consoleFull) for PR 2612 at commit [`b0585da`](https://github.com/apache/spark/commit/b0585da796aeb91957956f61d97fa98953d1c5e5).
* This patch **passes all tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-2261] Make event logger use a single fi...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/1222#discussion_r18534397 --- Diff: core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala --- @@ -142,48 +151,56 @@ private[history] class FsHistoryProvider(conf: SparkConf) extends ApplicationHis * Tries to reuse as much of the data already in memory as possible, by not reading * applications that haven't been updated since last time the logs were checked. */ - private def checkForLogs() = { + private[history] def checkForLogs() = { lastLogCheckTimeMs = getMonotonicTimeMs() logDebug("Checking for logs. Time is now %d.".format(lastLogCheckTimeMs)) -try { - val logStatus = fs.listStatus(new Path(resolvedLogDir)) - val logDirs = if (logStatus != null) logStatus.filter(_.isDir).toSeq else Seq[FileStatus]() - // Load all new logs from the log directory. Only directories that have a modification time - // later than the last known log directory will be loaded. +def getModificationTime(fsEntry: FileStatus) = { --- End diff -- Are we adding return types for every method now? I thought we were only doing this for public ones.
[GitHub] spark pull request: [SPARK-2261] Make event logger use a single fi...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/1222#discussion_r18534494 --- Diff: core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala --- @@ -142,48 +151,56 @@ private[history] class FsHistoryProvider(conf: SparkConf) extends ApplicationHis * Tries to reuse as much of the data already in memory as possible, by not reading * applications that haven't been updated since last time the logs were checked. */ - private def checkForLogs() = { + private[history] def checkForLogs() = { lastLogCheckTimeMs = getMonotonicTimeMs() logDebug("Checking for logs. Time is now %d.".format(lastLogCheckTimeMs)) -try { - val logStatus = fs.listStatus(new Path(resolvedLogDir)) - val logDirs = if (logStatus != null) logStatus.filter(_.isDir).toSeq else Seq[FileStatus]() - // Load all new logs from the log directory. Only directories that have a modification time - // later than the last known log directory will be loaded. +def getModificationTime(fsEntry: FileStatus) = { + if (fsEntry.isDir) { +fs.listStatus(fsEntry.getPath).map(_.getModificationTime()).max + } else { +fsEntry.getModificationTime() + } +} + +try { var newLastModifiedTime = lastModifiedTime - val logInfos = logDirs -.filter { dir => - if (fs.isFile(new Path(dir.getPath(), EventLoggingListener.APPLICATION_COMPLETE))) { -val modTime = getModificationTime(dir) + val logInfos = fs.listStatus(new Path(logDir)) +.filter { entry => --- End diff -- That makes the alignment of flatMap / sortBy really weird. Do you have an example of what you have in mind so I can follow it? Others I have found follow this style.
[GitHub] spark pull request: [SPARK-2199] [mllib] topic modeling
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/1269#issuecomment-58222849 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21386/ Test FAILed.
[GitHub] spark pull request: [SPARK-3677] [BUILD] [YARN] pom.xml and SparkB...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2520#issuecomment-58222059 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21380/consoleFull) for PR 2520 at commit [`de91bbd`](https://github.com/apache/spark/commit/de91bbd37d0986abc8d154efde2418e07b685eb0).
* This patch **passes all tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-2261] Make event logger use a single fi...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/1222#discussion_r18534683 --- Diff: core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala --- @@ -142,48 +151,56 @@ private[history] class FsHistoryProvider(conf: SparkConf) extends ApplicationHis * Tries to reuse as much of the data already in memory as possible, by not reading * applications that haven't been updated since last time the logs were checked. */ - private def checkForLogs() = { + private[history] def checkForLogs() = { lastLogCheckTimeMs = getMonotonicTimeMs() logDebug("Checking for logs. Time is now %d.".format(lastLogCheckTimeMs)) -try { - val logStatus = fs.listStatus(new Path(resolvedLogDir)) - val logDirs = if (logStatus != null) logStatus.filter(_.isDir).toSeq else Seq[FileStatus]() - // Load all new logs from the log directory. Only directories that have a modification time - // later than the last known log directory will be loaded. +def getModificationTime(fsEntry: FileStatus) = { + if (fsEntry.isDir) { +fs.listStatus(fsEntry.getPath).map(_.getModificationTime()).max + } else { +fsEntry.getModificationTime() + } +} + +try { var newLastModifiedTime = lastModifiedTime - val logInfos = logDirs -.filter { dir => - if (fs.isFile(new Path(dir.getPath(), EventLoggingListener.APPLICATION_COMPLETE))) { -val modTime = getModificationTime(dir) + val logInfos = fs.listStatus(new Path(logDir)) +.filter { entry => + val isLogEntry = +if (entry.isDir()) { + fs.exists(new Path(entry.getPath(), APPLICATION_COMPLETE)) +} else { + !entry.getPath().getName().endsWith(EventLoggingListener.IN_PROGRESS) +} + + if (isLogEntry) { +val modTime = getModificationTime(entry) newLastModifiedTime = math.max(newLastModifiedTime, modTime) -modTime > lastModifiedTime +modTime >= lastModifiedTime } else { false } } -.flatMap { dir => +.flatMap { entry => try { -val (replayBus, appListener) = createReplayBus(dir) -replayBus.replay() +val appListener = replay(entry, new ReplayListenerBus()) Some(new FsApplicationHistoryInfo( - dir.getPath().getName(), - appListener.appId.getOrElse(dir.getPath().getName()), + entry.getPath().getName(), + appListener.appId.getOrElse(entry.getPath().getName()), appListener.appName.getOrElse(NOT_STARTED), appListener.startTime.getOrElse(-1L), appListener.endTime.getOrElse(-1L), - getModificationTime(dir), + getModificationTime(entry), appListener.sparkUser.getOrElse(NOT_STARTED))) } catch { case e: Exception => - logInfo(s"Failed to load application log data from $dir.", e) + logInfo(s"Failed to load application log data from $entry.", e) None } } .sortBy { info => -info.endTime } - lastModifiedTime = newLastModifiedTime --- End diff -- Oops.
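The filter in the new hunk reduces to a small predicate: a directory counts as a log entry only if it contains the completion marker, while a plain file counts unless it still carries the in-progress suffix. A hedged Python sketch of that decision logic (constant values assumed from the diff; the real code is Scala and uses Hadoop's FileSystem API):

```python
import os

APPLICATION_COMPLETE = "APPLICATION_COMPLETE"  # marker file in legacy log dirs
IN_PROGRESS = ".inprogress"                    # suffix of unfinished log files

def is_log_entry(path: str) -> bool:
    """Mirror the filter: keep finished legacy directories and finished files."""
    if os.path.isdir(path):
        return os.path.exists(os.path.join(path, APPLICATION_COMPLETE))
    return not path.endswith(IN_PROGRESS)
```

Expressing the branch as a standalone predicate like this also makes the behavior the reviewers are debating easy to cover with a unit test.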
[GitHub] spark pull request: [SPARK-3133] embed small object in broadcast t...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/2681#discussion_r18534899 --- Diff: core/src/main/scala/org/apache/spark/broadcast/TorrentBroadcast.scala --- @@ -161,6 +178,10 @@ private[spark] class TorrentBroadcast[T: ClassTag]( _value = x.asInstanceOf[T] case None => + if (numBlocks == 0) { --- End diff -- when will this ever happen?
[GitHub] spark pull request: [SPARK-3133] embed small object in broadcast t...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/2681#discussion_r18534919 --- Diff: core/src/test/scala/org/apache/spark/broadcast/BroadcastSuite.scala --- @@ -257,7 +257,7 @@ class BroadcastSuite extends FunSuite with LocalSparkContext { new SparkContext("local", "test", broadcastConf) } val blockManagerMaster = sc.env.blockManager.master -val list = List[Int](1, 2, 3, 4) +val list = (1 to 4096).toList --- End diff -- can u make sure we have unit tests for both cases? i.e. small broadcast and large ones.
[GitHub] spark pull request: [SPARK-2261] Make event logger use a single fi...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/1222#discussion_r18535127 --- Diff: core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala --- @@ -214,29 +231,64 @@ private[history] class FsHistoryProvider(conf: SparkConf) extends ApplicationHis
      }
    }

-   private def createReplayBus(logDir: FileStatus): (ReplayListenerBus, ApplicationEventListener) = {
-     val path = logDir.getPath()
-     val elogInfo = EventLoggingListener.parseLoggingInfo(path, fs)
-     val replayBus = new ReplayListenerBus(elogInfo.logPaths, fs, elogInfo.compressionCodec)
-     val appListener = new ApplicationEventListener
-     replayBus.addListener(appListener)
-     (replayBus, appListener)
+   private def replay(logPath: FileStatus, bus: ReplayListenerBus): ApplicationEventListener = {
+     val (logInput, sparkVersion) =
+       if (logPath.isDir()) {
+         openOldLog(logPath.getPath())
+       } else {
+         EventLoggingListener.openEventLog(logPath.getPath(), fs)
+       }
+     try {
+       val appListener = new ApplicationEventListener
+       bus.addListener(appListener)
+       bus.replay(logInput, sparkVersion)
+       appListener
+     } finally {
+       logInput.close()
+     }
    }

-   /** Return when this directory was last modified. */
-   private def getModificationTime(dir: FileStatus): Long = {
-     try {
-       val logFiles = fs.listStatus(dir.getPath)
-       if (logFiles != null && !logFiles.isEmpty) {
-         logFiles.map(_.getModificationTime).max
-       } else {
-         dir.getModificationTime
+   /**
+    * Load the app log information from a Spark 1.0.0 log directory, for backwards compatibility.
+    * This assumes that the log directory contains a single event log file, which is the case for
+    * directories generated by the code in that release.
+    */
+   private[history] def openOldLog(dir: Path): (InputStream, String) = {
--- End diff --

Why? Neither EventLoggingListener nor any of its callers needs to deal with legacy event logs.
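The new `replay` helper in the diff above attaches a listener, replays the stream through the bus, and closes the input in a `finally` block so the stream is released even when replay fails mid-file. A minimal, self-contained sketch of that pattern (the `SimpleBus` and `replayLog` names are illustrative, not Spark's actual API):

```scala
import java.io.{ByteArrayInputStream, InputStream}
import scala.io.Source

// A toy listener bus: each registered listener receives every replayed line.
class SimpleBus {
  private var listeners = List.empty[String => Unit]
  def addListener(l: String => Unit): Unit = listeners ::= l
  def replay(in: InputStream): Unit =
    Source.fromInputStream(in).getLines().foreach(line => listeners.foreach(_(line)))
}

// Mirror of the reviewed pattern: attach a listener, replay, and close the
// stream in a finally block so it is released even if replay() throws.
def replayLog(in: InputStream, bus: SimpleBus): List[String] = {
  val seen = scala.collection.mutable.ListBuffer.empty[String]
  bus.addListener(line => seen += line)
  try {
    bus.replay(in)
  } finally {
    in.close()
  }
  seen.toList
}

val events = replayLog(new ByteArrayInputStream("start\nend".getBytes("UTF-8")), new SimpleBus)
```

Returning the collected listener from `replay` (rather than the bus) matches the shape of the reviewed code, where the caller only needs the `ApplicationEventListener`'s accumulated state.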
[GitHub] spark pull request: [SPARK-3758] [Windows] Wrong EOL character in ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2612#issuecomment-58224711 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21382/consoleFull) for PR 2612 at commit [`cfaa176`](https://github.com/apache/spark/commit/cfaa176a299b4c7b3f02e7dc8bf35627997021c5).
* This patch **passes all tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-3758] [Windows] Wrong EOL character in ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2612#issuecomment-58224721 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21382/
[GitHub] spark pull request: [SPARK-3788] [yarn] Fix compareFs to do the ri...
Github user tgravescs commented on the pull request: https://github.com/apache/spark/pull/2650#issuecomment-58224804 Jenkins, test this please
[GitHub] spark pull request: [SPARK-2261] Make event logger use a single fi...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/1222#discussion_r18535195 --- Diff: core/src/main/scala/org/apache/spark/deploy/master/Master.scala --- @@ -688,41 +691,34 @@ private[spark] class Master(
    def rebuildSparkUI(app: ApplicationInfo): Boolean = {
      val appName = app.desc.name
      val notFoundBasePath = HistoryServer.UI_PATH_PREFIX + "/not-found"
-     val eventLogDir = app.desc.eventLogDir.getOrElse {
-       // Event logging is not enabled for this application
-       app.desc.appUiUrl = notFoundBasePath
-       return false
-     }
-
-     val appEventLogDir = EventLoggingListener.getLogDirPath(eventLogDir, app.id)
-     val fileSystem = Utils.getHadoopFileSystem(appEventLogDir,
-       SparkHadoopUtil.get.newConfiguration(conf))
-     val eventLogInfo = EventLoggingListener.parseLoggingInfo(appEventLogDir, fileSystem)
-     val eventLogPaths = eventLogInfo.logPaths
-     val compressionCodec = eventLogInfo.compressionCodec
-
-     if (eventLogPaths.isEmpty) {
-       // Event logging is enabled for this application, but no event logs are found
-       val title = s"Application history not found (${app.id})"
-       var msg = s"No event logs found for application $appName in $appEventLogDir."
-       logWarning(msg)
-       msg += " Did you specify the correct logging directory?"
-       msg = URLEncoder.encode(msg, "UTF-8")
-       app.desc.appUiUrl = notFoundBasePath + s"?msg=$msg&title=$title"
-       return false
-     }
+     val eventLogFile = app.desc.eventLogFile.getOrElse { return false }
      try {
-       val replayBus = new ReplayListenerBus(eventLogPaths, fileSystem, compressionCodec)
-       val ui = new SparkUI(new SparkConf, replayBus, appName + " (completed)",
-         HistoryServer.UI_PATH_PREFIX + s"/${app.id}")
-       replayBus.replay()
+       val fs = Utils.getHadoopFileSystem(eventLogFile, hadoopConf)
+       val (logInput, sparkVersion) = EventLoggingListener.openEventLog(new Path(eventLogFile), fs)
+       val replayBus = new ReplayListenerBus()
--- End diff --

No, because I changed that signature. The event stream is now passed in the `replay()` method.
[GitHub] spark pull request: [SPARK-3788] [yarn] Fix compareFs to do the ri...
Github user tgravescs commented on the pull request: https://github.com/apache/spark/pull/2649#issuecomment-58225142 Ah, so you really mean when using ViewFs. You can use federation without ViewFs.
[GitHub] spark pull request: [SPARK-3812] [BUILD] Adapt maven build to publ...
Github user ueshin commented on the pull request: https://github.com/apache/spark/pull/2673#issuecomment-58225042 Hi @pwendell, I had a similar issue related to artifacts in Maven Central and Hadoop versions. Could you take a look at [SPARK-3764](https://issues.apache.org/jira/browse/SPARK-3764) and #2638 please?
[GitHub] spark pull request: [SPARK-2261] Make event logger use a single fi...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/1222#discussion_r18535305 --- Diff: core/src/main/scala/org/apache/spark/deploy/master/Master.scala --- @@ -688,41 +691,34 @@ private[spark] class Master(
    def rebuildSparkUI(app: ApplicationInfo): Boolean = {
      val appName = app.desc.name
      val notFoundBasePath = HistoryServer.UI_PATH_PREFIX + "/not-found"
-     val eventLogDir = app.desc.eventLogDir.getOrElse {
-       // Event logging is not enabled for this application
-       app.desc.appUiUrl = notFoundBasePath
-       return false
-     }
-
-     val appEventLogDir = EventLoggingListener.getLogDirPath(eventLogDir, app.id)
-     val fileSystem = Utils.getHadoopFileSystem(appEventLogDir,
-       SparkHadoopUtil.get.newConfiguration(conf))
-     val eventLogInfo = EventLoggingListener.parseLoggingInfo(appEventLogDir, fileSystem)
-     val eventLogPaths = eventLogInfo.logPaths
-     val compressionCodec = eventLogInfo.compressionCodec
-
-     if (eventLogPaths.isEmpty) {
-       // Event logging is enabled for this application, but no event logs are found
-       val title = s"Application history not found (${app.id})"
-       var msg = s"No event logs found for application $appName in $appEventLogDir."
-       logWarning(msg)
-       msg += " Did you specify the correct logging directory?"
-       msg = URLEncoder.encode(msg, "UTF-8")
-       app.desc.appUiUrl = notFoundBasePath + s"?msg=$msg&title=$title"
-       return false
-     }
+     val eventLogFile = app.desc.eventLogFile.getOrElse { return false }
      try {
-       val replayBus = new ReplayListenerBus(eventLogPaths, fileSystem, compressionCodec)
-       val ui = new SparkUI(new SparkConf, replayBus, appName + " (completed)",
-         HistoryServer.UI_PATH_PREFIX + s"/${app.id}")
-       replayBus.replay()
+       val fs = Utils.getHadoopFileSystem(eventLogFile, hadoopConf)
+       val (logInput, sparkVersion) = EventLoggingListener.openEventLog(new Path(eventLogFile), fs)
+       val replayBus = new ReplayListenerBus()
+       val ui = new SparkUI(new SparkConf, replayBus, appName + " (completed)", "/history/" + app.id)
+       try {
+         replayBus.replay(logInput, sparkVersion)
+       } finally {
+         logInput.close()
+       }
+
        appIdToUI(app.id) = ui
        webUi.attachSparkUI(ui)
        // Application UI is successfully rebuilt, so link the Master UI to it
-       app.desc.appUiUrl = ui.getBasePath
+       app.desc.appUiUrl = ui.basePath
        true
      } catch {
+       case fnf: FileNotFoundException =>
+         // Event logging is enabled for this application, but no event logs are found
+         val title = s"Application history not found (${app.id})"
+         var msg = s"No event logs found for application $appName in $eventLogFile."
+         logWarning(msg)
+         msg += " Did you specify the correct logging directory?"
+         msg = URLEncoder.encode(msg, "UTF-8")
+         app.desc.appUiUrl = notFoundBasePath + s"?msg=$msg&title=$title"
+         false
--- End diff --

I disagree. `if (file exists)` checks are racy, and entail more RPCs to the NN. And we're really interested in handling that particular exception, so I don't see any advantage in the explicit check.
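The point made here, that a pre-flight existence check can race with the subsequent open (and costs an extra round trip to the NameNode), whereas catching the exception handles the missing-file case atomically, can be sketched with plain `java.io`. This is only an illustration of the idiom, not Spark's actual code:

```scala
import java.io.{FileInputStream, FileNotFoundException}

// Open the log directly and translate a missing file into an error message,
// instead of the racy exists()-then-open sequence: between exists() and
// open(), the file could be deleted out from under us.
def openOrMessage(path: String): Either[String, FileInputStream] =
  try {
    Right(new FileInputStream(path))
  } catch {
    case _: FileNotFoundException => Left(s"No event logs found at $path.")
  }

val result = openOrMessage("/definitely/not/a/real/eventlog")
```

Against a remote file system like HDFS the same shape avoids one RPC per lookup on top of eliminating the time-of-check/time-of-use window.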
[GitHub] spark pull request: [SPARK-2261] Make event logger use a single fi...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/1222#discussion_r18535329 --- Diff: core/src/main/scala/org/apache/spark/scheduler/EventLoggingListener.scala --- @@ -58,43 +61,79 @@ private[spark] class EventLoggingListener(
    private val shouldOverwrite = sparkConf.getBoolean("spark.eventLog.overwrite", false)
    private val testing = sparkConf.getBoolean("spark.eventLog.testing", false)
    private val outputBufferSize = sparkConf.getInt("spark.eventLog.buffer.kb", 100) * 1024
-   val logDir = EventLoggingListener.getLogDirPath(logBaseDir, appId)
-   val logDirName: String = logDir.split("/").last
-   protected val logger = new FileLogger(logDir, sparkConf, hadoopConf, outputBufferSize,
-     shouldCompress, shouldOverwrite, Some(LOG_FILE_PERMISSIONS))
+   private val fileSystem = Utils.getHadoopFileSystem(new URI(logBaseDir), hadoopConf)
+
+   // Only defined if the file system scheme is not local
+   private var hadoopDataStream: Option[FSDataOutputStream] = None
+
+   // The Hadoop APIs have changed over time, so we use reflection to figure out
+   // the correct method to use to flush a hadoop data stream. See SPARK-1518
+   // for details.
+   private val hadoopFlushMethod = {
--- End diff --

This is how the code was before. I'm just moving it.
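The `hadoopFlushMethod` block being moved here uses reflection to pick whichever flush method the running Hadoop version exposes (SPARK-1518 covers the background: the method name changed across Hadoop releases). A self-contained sketch of that lookup trick, probing a plain JDK stream rather than a Hadoop `FSDataOutputStream`, with the candidate names purely illustrative:

```scala
import java.lang.reflect.Method
import scala.util.Try

// Return the first candidate method name that actually exists on the class.
// getMethod throws NoSuchMethodException for absent names, which Try turns
// into None, so the search simply falls through to the next candidate.
def findMethod(cls: Class[_], candidates: String*): Option[Method] =
  candidates.flatMap(name => Try(cls.getMethod(name)).toOption).headOption

// A Hadoop stream would be probed for the newer name first, then the older
// fallback; a JDK ByteArrayOutputStream has neither "hflush" nor "sync",
// so the search falls through to plain "flush".
val flushLike = findMethod(classOf[java.io.ByteArrayOutputStream], "hflush", "flush")
```

Resolving the `Method` once at construction time, as the listener does, keeps the reflection cost out of the per-event flush path.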
[GitHub] spark pull request: [SPARK-3802][BUILD] Scala version is wrong in ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2661#issuecomment-58225743 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21383/consoleFull) for PR 2661 at commit [`8b64bb7`](https://github.com/apache/spark/commit/8b64bb7feb0ddea9f573cabfd96150bce673aa31).
* This patch **passes all tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-3802][BUILD] Scala version is wrong in ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2661#issuecomment-58225756 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21383/