[GitHub] spark issue #16927: [SPARK-19571][R] Fix SparkR test break on Windows via Ap...

2017-02-24 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/16927 I'm worrying about this now: have my attempts to fix the messages gone horribly wrong. Admittedly, it was sitting in a Budapest airport with a post-ApacheCon hangover, but @afs was giving

[GitHub] spark pull request #14731: [SPARK-17159] [streaming]: optimise check for new...

2017-02-24 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/14731#discussion_r102977480 --- Diff: docs/streaming-programming-guide.md --- @@ -630,35 +630,106 @@ which creates a DStream from text data received over a TCP socket

[GitHub] spark issue #14731: [SPARK-17159] [streaming]: optimise check for new files ...

2017-02-24 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/14731 @uncleGen: reviewed this, tweaked the docs slightly but otherwise, there's nothing left to do that I can see --- If your project is set up for it, you can reply to this email and have your

[GitHub] spark issue #16927: [SPARK-19571][R] Fix SparkR test break on Windows via Ap...

2017-02-24 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/16927 I could add the 2.6.5 binaries if you want, though the 2.6.4 ones should be compatible. I think I just lifted the 2.6.x artifacts out of an HDP build; it's only the 2.7.x ones where I

[GitHub] spark issue #12004: [SPARK-7481] [build] Add spark-cloud module to pull in o...

2017-02-24 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/12004 (apologies for not replying; rebuilding a deceased laptop) My main concern is to have the ability to make spark releases which include the object store client libraries and a set

[GitHub] spark issue #12004: [SPARK-7481] [build] Add spark-cloud module to pull in o...

2017-02-08 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/12004 Still waiting for reviews on this. Anyone? Ideally before my forthcoming Spark Summit talk...

[GitHub] spark issue #16815: [SPARK-19407][SS] defaultFS is used FileSystem.get inste...

2017-02-06 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/16815 LGTM, though checkpointing to S3 has its own separate issues related to rename performance and listing inconsistency. While this fix lets people request different filesystems for the data

[GitHub] spark issue #12004: [SPARK-7481] [build] Add spark-cloud module to pull in o...

2017-01-30 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/12004 @nchammas sorry, should be clearer: "you must never use an aws-sdk version other than the one hadoop-aws was built with, else things will break". if you pull in hadoop-aws, th

[GitHub] spark pull request #12004: [SPARK-7481] [build] Add spark-cloud module to pu...

2017-01-30 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/12004#discussion_r98449488 --- Diff: pom.xml --- @@ -2586,6 +2591,100 @@ 3.4.6 2.6.0

[GitHub] spark pull request #12004: [SPARK-7481] [build] Add spark-cloud module to pu...

2017-01-30 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/12004#discussion_r98448893 --- Diff: cloud/src/test/scala/org/apache/spark/cloud/AzureInstantiationSuite.scala --- @@ -0,0 +1,29 @@ +/* --- End diff

[GitHub] spark issue #12004: [SPARK-7481] [build] Add spark-cloud module to pull in o...

2017-01-30 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/12004 @nchammas the AWS SDK you get will be in sync with hadoop-aws; you have to keep them in sync. what is more brittle is the transients: httpclient, joda time, jackson, etc, which

[GitHub] spark issue #9168: [SPARK-11182] HDFS Delegation Token will be expired when ...

2017-01-23 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/9168 looking at the HDFS patch, it's in branch-2.9. We could backport to branch-2.8, though it's too late to get into the 2.8.0 RC

[GitHub] spark issue #14731: [SPARK-17159] [streaming]: optimise check for new files ...

2017-01-23 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/14731 @uncleGen I've updated it. Note that [HADOOP-13946](https://issues.apache.org/jira/browse/HADOOP-13946) tracks the changes in the Hadoop docs, which writes down what HDFS actually does

[GitHub] spark pull request #14731: [SPARK-17159] [streaming]: optimise check for new...

2017-01-23 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/14731#discussion_r97367042 --- Diff: streaming/src/main/scala/org/apache/spark/streaming/dstream/FileInputDStream.scala --- @@ -196,29 +191,29 @@ class FileInputDStream[K, V, F

[GitHub] spark pull request #14731: [SPARK-17159] [streaming]: optimise check for new...

2017-01-23 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/14731#discussion_r97363419 --- Diff: streaming/src/test/scala/org/apache/spark/streaming/InputStreamsSuite.scala --- @@ -235,18 +236,97 @@ class InputStreamsSuite extends

[GitHub] spark issue #14731: [SPARK-17159] [streaming]: optimise check for new files ...

2017-01-23 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/14731 let me do a quick review & update

[GitHub] spark issue #12004: [SPARK-7481] [build] Add spark-cloud module to pull in o...

2017-01-20 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/12004 latest patch: has updated the dependency settings. As noted, works for Hadoop versions from 2.7 to 3.0.2-alpha & the HADOOP-13345 branch, at least if you build the last

[GitHub] spark issue #12004: [SPARK-7481] [build] Add spark-cloud module to pull in o...

2017-01-20 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/12004 Here's why this matters, and why a simple "isn't this just a matter of dropping in the JARs" isn't the solution: *getting the right jars together with the right spa

[GitHub] spark issue #12004: [SPARK-7481] [build] Add spark-cloud module to pull in o...

2017-01-12 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/12004 this patch is ready for review. Anyone?

[GitHub] spark pull request #14731: [SPARK-17159] [streaming]: optimise check for new...

2017-01-08 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/14731#discussion_r95079472 --- Diff: docs/streaming-programming-guide.md --- @@ -630,35 +630,106 @@ which creates a DStream from text data received over a TCP socket

[GitHub] spark issue #14731: [SPARK-17159] [streaming]: optimise check for new files ...

2017-01-04 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/14731 Sean, I think I've managed to delete the lines where you were asking about globs > Am I right that the net change here is not an optimization but an expansion of the behav

[GitHub] spark pull request #14731: [SPARK-17159] [streaming]: optimise check for new...

2017-01-03 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/14731#discussion_r94410195 --- Diff: docs/streaming-programming-guide.md --- @@ -644,17 +644,90 @@ methods for creating DStreams from files as input sources

[GitHub] spark pull request #14731: [SPARK-17159] [streaming]: optimise check for new...

2017-01-03 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/14731#discussion_r94407382 --- Diff: docs/streaming-programming-guide.md --- @@ -644,17 +644,90 @@ methods for creating DStreams from files as input sources

[GitHub] spark pull request #14731: [SPARK-17159] [streaming]: optimise check for new...

2017-01-03 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/14731#discussion_r94407115 --- Diff: docs/streaming-programming-guide.md --- @@ -644,17 +644,90 @@ methods for creating DStreams from files as input sources

[GitHub] spark issue #13579: [SPARK-15844] [core] HistoryServer doesn't come up if sp...

2016-12-12 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/13579 thanks!

[GitHub] spark issue #13579: [SPARK-15844] [core] HistoryServer doesn't come up if sp...

2016-12-09 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/13579 Test failure is pretty unlikely to be related. Looks more like a timing or timeout problem. ``` org.apache.spark.rdd.AsyncRDDActionsSuite.async failure handling Failing

[GitHub] spark issue #13579: [SPARK-15844] [core] HistoryServer doesn't come up if sp...

2016-12-08 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/13579 done

[GitHub] spark issue #13579: [SPARK-15844] [core] HistoryServer doesn't come up if sp...

2016-12-08 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/13579 stylecheck; unexpected, as I thought I'd run them in the `mvn install` of the module. ``` [error] /home/jenkins/workspace/SparkPullRequestBuilder/core/src/test/scala/org/apache/spark

[GitHub] spark pull request #13579: [SPARK-15844] [core] HistoryServer doesn't come u...

2016-12-08 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/13579#discussion_r91548419 --- Diff: core/src/main/scala/org/apache/spark/deploy/history/HistoryServer.scala --- @@ -289,6 +289,30 @@ object HistoryServer extends Logging

[GitHub] spark pull request #13579: [SPARK-15844] [core] HistoryServer doesn't come u...

2016-12-08 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/13579#discussion_r91546615 --- Diff: core/src/test/scala/org/apache/spark/deploy/history/HistoryServerSuite.scala --- @@ -349,6 +349,17 @@ class HistoryServerSuite extends

[GitHub] spark pull request #13579: [SPARK-15844] [core] HistoryServer doesn't come u...

2016-12-08 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/13579#discussion_r91546575 --- Diff: core/src/test/scala/org/apache/spark/deploy/history/HistoryServerSuite.scala --- @@ -349,6 +349,17 @@ class HistoryServerSuite extends

[GitHub] spark issue #13579: [SPARK-15844] [core] HistoryServer doesn't come up if sp...

2016-12-07 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/13579 yeah, I've just got so many other distractions. Let me do it again while tests run in different windows

[GitHub] spark issue #12004: [SPARK-7481] [build] Add spark-cloud module to pull in o...

2016-12-03 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/12004 Test failure due to new artifacts ``` +++ b/dev/pr-deps/spark-deps-hadoop-2.7 @@ -16,8 +16,6 @@ arpack_combined_all-0.1.jar avro-1.7.7.jar avro-ipc-1.7.7.jar avro

[GitHub] spark issue #12004: [SPARK-7481] [build] Add spark-cloud module to pull in o...

2016-12-02 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/12004 the latest patch moves to the suggested name `spark-hadoop-cloud`; the external test repo is in sync. Those tests are all working happily against s3 ireland, Azure and rackspace swift

[GitHub] spark issue #16089: [SPARK-18658][SQL] Write text records directly to a File...

2016-12-01 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/16089 I ask about committers as I'm staring at the V1 and V2 committer APIs right now related to S3 destinations; not directly related to this though.

[GitHub] spark pull request #16089: [SPARK-18658][SQL] Write text records directly to...

2016-12-01 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/16089#discussion_r90502669 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonGenerator.scala --- @@ -194,4 +194,8 @@ private[sql] class

[GitHub] spark issue #16089: [SPARK-18658][SQL] Write text records directly to a File...

2016-12-01 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/16089 AFAIK, the big thing the FileOutputFormat really adds is not the compression, but the output committer and the stuff to go with that (working directories, paths, etc etc). If you aren't going

[GitHub] spark pull request #16089: [SPARK-18658][SQL] Write text records directly to...

2016-12-01 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/16089#discussion_r90499765 --- Diff: common/unsafe/src/main/java/org/apache/spark/unsafe/types/UTF8String.java --- @@ -147,6 +147,17 @@ public void writeTo(ByteBuffer buffer

[GitHub] spark pull request #16089: [SPARK-18658][SQL] Write text records directly to...

2016-12-01 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/16089#discussion_r90497882 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonGenerator.scala --- @@ -194,4 +194,8 @@ private[sql] class

[GitHub] spark issue #12004: [SPARK-7481] [build] Add spark-cloud module to pull in o...

2016-12-01 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/12004 The latest patch 1. keeps the cloud package separate from hadoop-2.7. This is important to avoid outstanding problems related to org.json licensed artifacts in the aws SDK JARs

[GitHub] spark pull request #14038: [SPARK-16317][SQL] Add a new interface to filter ...

2016-11-28 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/14038#discussion_r89839965 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/FileSourceStrategySuite.scala --- @@ -441,6 +441,44 @@ class

[GitHub] spark pull request #15594: [SPARK-18061][SQL][Security] Spark Thriftserver n...

2016-11-28 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/15594#discussion_r89787175 --- Diff: sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkSQLCLIService.scala --- @@ -57,7 +59,24 @@ private[hive

[GitHub] spark pull request #15991: [SPARK-17843][WEB UI] Indicate event logs pending...

2016-11-28 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/15991#discussion_r89786503 --- Diff: core/src/main/scala/org/apache/spark/deploy/history/HistoryPage.scala --- @@ -33,21 +33,40 @@ private[history] class HistoryPage(parent

[GitHub] spark issue #15648: [SPARK-18119][SPARK-CORE] Namenode safemode check is onl...

2016-11-24 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/15648 LGTM, as the javadocs say *If true check only for Active NNs status, else check first NN's status*. But I don't know enough about HDFS HA to be It'll check the first NN

[GitHub] spark pull request #12004: [SPARK-7481] [build] Add spark-cloud module to pu...

2016-11-23 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/12004#discussion_r89402090 --- Diff: pom.xml --- @@ -2558,6 +2660,26 @@ +

[GitHub] spark pull request #12004: [SPARK-7481] [build] Add spark-cloud module to pu...

2016-11-23 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/12004#discussion_r89352877 --- Diff: docs/cloud-integration.md --- @@ -0,0 +1,953 @@ +--- +layout: global +displayTitle: Integration with Cloud Infrastructures

[GitHub] spark pull request #12004: [SPARK-7481] [build] Add spark-cloud module to pu...

2016-11-23 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/12004#discussion_r89346124 --- Diff: cloud/src/test/scala/org/apache/spark/cloud/AzureInstantiationSuite.scala --- @@ -0,0 +1,29 @@ +/* --- End diff

[GitHub] spark pull request #12004: [SPARK-7481] [build] Add spark-cloud module to pu...

2016-11-23 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/12004#discussion_r89340962 --- Diff: docs/cloud-integration.md --- @@ -0,0 +1,953 @@ +--- +layout: global +displayTitle: Integration with Cloud Infrastructures

[GitHub] spark pull request #12004: [SPARK-7481] [build] Add spark-cloud module to pu...

2016-11-23 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/12004#discussion_r89340299 --- Diff: docs/cloud-integration.md --- @@ -0,0 +1,953 @@ +--- +layout: global +displayTitle: Integration with Cloud Infrastructures

[GitHub] spark pull request #12004: [SPARK-7481] [build] Add spark-cloud module to pu...

2016-11-23 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/12004#discussion_r89340198 --- Diff: docs/storage-openstack-swift.md --- @@ -19,41 +19,32 @@ Although not mandatory, it is recommended to configure the proxy server of Swift

[GitHub] spark pull request #12004: [SPARK-7481] [build] Add spark-cloud module to pu...

2016-11-23 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/12004#discussion_r89315595 --- Diff: pom.xml --- @@ -2558,6 +2660,26 @@ +

[GitHub] spark issue #15984: [SPARK-18551] [Web UI] [Core] [WIP] Add functionality to...

2016-11-23 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/15984 Like you note, tests will be good here. Don't forget the corner cases: unknown app, duplicate POSTs, known app but unknown attempt. I'm also curious about what the policy would

[GitHub] spark pull request #15984: [SPARK-18551] [Web UI] [Core] [WIP] Add functiona...

2016-11-23 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/15984#discussion_r89312793 --- Diff: core/src/main/scala/org/apache/spark/deploy/history/HistoryServer.scala --- @@ -121,6 +123,12 @@ class HistoryServer( def initialize

[GitHub] spark pull request #15984: [SPARK-18551] [Web UI] [Core] [WIP] Add functiona...

2016-11-23 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/15984#discussion_r89312600 --- Diff: core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala --- @@ -535,6 +535,26 @@ private[history] class

[GitHub] spark pull request #15991: [SPARK-17843][WEB UI] Indicate event logs pending...

2016-11-23 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/15991#discussion_r89311812 --- Diff: core/src/main/scala/org/apache/spark/deploy/history/HistoryPage.scala --- @@ -33,21 +33,40 @@ private[history] class HistoryPage(parent

[GitHub] spark pull request #12004: [SPARK-7481] [build] Add spark-cloud module to pu...

2016-11-22 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/12004#discussion_r89202156 --- Diff: docs/cloud-integration.md --- @@ -0,0 +1,953 @@ +--- +layout: global +displayTitle: Integration with Cloud Infrastructures

[GitHub] spark pull request #12004: [SPARK-7481] [build] Add spark-cloud module to pu...

2016-11-22 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/12004#discussion_r89176373 --- Diff: pom.xml --- @@ -2558,6 +2660,26 @@ +

[GitHub] spark issue #12004: [SPARK-7481] [build] Add spark-cloud module to pull in o...

2016-11-21 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/12004 This is the patch stripped down to the packaging and some tests to load the direct and indirect dependencies, so verifying that the classpath is valid within the module itself. It also

[GitHub] spark issue #14038: [SPARK-16317][SQL] Add a new interface to filter files i...

2016-11-18 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/14038 @maropu if you create a PR for your work I'll comment on it

[GitHub] spark issue #15869: [YARN][DOC] Update Yarn configuration doc

2016-11-16 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/15869 The plugin point is more generic than ATS integration; it lets you stick anything in to come up in the driver. Weakness: it's actually yarn specific; I could imagine uses in standalone too

[GitHub] spark issue #12004: [SPARK-7481] [build] Add spark-cloud module to pull in o...

2016-11-08 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/12004 2.6 vs 2.7 vs later releases —a moving target, with AWS versions and other issues to worry about. [HADOOP-13687](https://issues.apache.org/jira/browse/HADOOP-13687) is going to add

[GitHub] spark issue #12004: [SPARK-7481] [build] Add spark-cloud module to pull in o...

2016-11-08 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/12004 I had something tangible: the integration tests. It's clear those aren't wanted. Now I'm proposing something more minimal, yet still tangible for anyone trying to build spark

[GitHub] spark issue #12004: [SPARK-7481] [build] Add spark-cloud module to pull in o...

2016-11-07 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/12004 sean: there's two things: tests and packaging. 1. The packaging has to go in as probably the only way to get whatever spark is built with to be consistent. That includes excluding

[GitHub] spark issue #12004: [SPARK-7481] [build] Add spark-cloud module to pull in o...

2016-11-07 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/12004 Has anyone had a chance to review this? Is there more clarification needed, or some specific aspect of the patch which needs changing? Without this it is near-impossible to have

[GitHub] spark issue #14646: [SPARK-17058] [build] Add maven snapshots-and-staging pr...

2016-11-02 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/14646 thanks

[GitHub] spark issue #15594: [SPARK-18061][SQL][Security] Spark Thriftserver needs to...

2016-11-02 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/15594 I'm not a Spark committer so can't review it well enough to get in; I was just watching it out of concern for the word "kerberos". How about you ask on the spark developer li

[GitHub] spark issue #14646: [SPARK-17058] [build] Add maven snapshots-and-staging pr...

2016-11-02 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/14646 Has anyone had a chance to review this? It's nicely self-contained, makes it easier to use Spark as regression testing for ASF prerelease binaries of any dependent project.

[GitHub] spark pull request #15556: [SPARK-18010][Core] Reduce work performed for bui...

2016-10-20 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/15556#discussion_r84267482 --- Diff: core/src/main/scala/org/apache/spark/scheduler/ReplayListenerBus.scala --- @@ -43,38 +43,56 @@ private[spark] class ReplayListenerBus

[GitHub] spark issue #15490: [SPARK-10541] [Web UI] Allow ApplicationHistoryProviders...

2016-10-18 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/15490 Oh, I see, new UI had meant I'd left the comment partially incomplete. Sorry. Just the one: printing out the actual log dir location. That makes it much easier to identify a configuration

[GitHub] spark pull request #15490: [SPARK-10541] [Web UI] Allow ApplicationHistoryPr...

2016-10-18 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/15490#discussion_r83530966 --- Diff: core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala --- @@ -262,6 +263,17 @@ private[history] class

[GitHub] spark issue #15490: [SPARK-10541] [Web UI] Allow ApplicationHistoryProviders...

2016-10-18 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/15490 @ajbozarth I'm not a spark committer, I'm not capable of getting stuff in. I did add one comment to some of the code, otherwise nothing I have issues with. LGTM

[GitHub] spark issue #12004: [SPARK-7481] [build] Add spark-cloud module to pull in o...

2016-10-17 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/12004 that's it warning that the manifest has changed. Which it has: there's now hadoop-azure, hadoop-openstack and hadoop-aws JARs on the CP, along with dependencies (amazon-aws SDK, microsoft

[GitHub] spark issue #14731: [SPARK-17159] [streaming]: optimise check for new files ...

2016-10-14 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/14731 @srowen have you got any comments on the last patch?

[GitHub] spark issue #13579: [SPARK-15844] [core] HistoryServer doesn't come up if sp...

2016-10-10 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/13579 I see: you want the HS to set it? Yeah, that would work. I'll change this patch accordingly

[GitHub] spark issue #13579: [SPARK-15844] [core] HistoryServer doesn't come up if sp...

2016-10-10 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/13579 The main problem here is in a cluster where auth is turned on globally, the HS gets really confused: it's enabled but doesn't have any secrets. This patch sets things up so that even

[GitHub] spark issue #15374: [SPARK-17800] Introduce InterfaceStability annotation

2016-10-10 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/15374 Interesting to compare this with Hadoop's annotation, where I have mixed opinions. A key advantage Apache spark has is that Scala language lets you really scope out things

[GitHub] spark pull request #12004: [SPARK-7481] [build] Add spark-cloud module to pu...

2016-10-08 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/12004#discussion_r82502588 --- Diff: cloud/src/main/scala/org/apache/spark/cloud/s3/S3AConstants.scala --- @@ -0,0 +1,75 @@ +/* + * Licensed to the Apache Software

[GitHub] spark issue #12004: [SPARK-7481] [build] Add spark-cloud module to pull in o...

2016-10-07 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/12004 # Packaging: 1. this addresses the problem that it's not always immediately obvious to people what they have to do to get, say s3a working. Do you know precisely which version

[GitHub] spark pull request #15377: [SPARK-17802] Improved caller context logging.

2016-10-07 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/15377#discussion_r82441129 --- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala --- @@ -2474,25 +2474,36 @@ private[spark] class CallerContext( val context

[GitHub] spark issue #15137: [SPARK-17512][Core] Avoid formatting to python path for ...

2016-10-03 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/15137 I see this in master; but the JIRA associated with the PR is still opened & unversioned. Which version did it make it into?

[GitHub] spark issue #14644: [SPARK-14082][MESOS] Enable GPU support with Mesos

2016-09-29 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/14644 ...so if you ask for 1 GPU you may only get 0?

[GitHub] spark issue #14644: [SPARK-14082][MESOS] Enable GPU support with Mesos

2016-09-29 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/14644 1. Any plans to add documentation? 2. What happens if you ask for (any, more) GPUs than there are? 3. If it fails, that could be a good test: ask for a very large number and expect

[GitHub] spark issue #15115: [SPARK-17558] Bump Hadoop 2.7 version from 2.7.2 to 2.7....

2016-09-20 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/15115 Thanks for clarifying; sometimes I feel that my patches get under-reviewed, which holds for Hadoop too, where some have been outstanding for so long they're approaching school age

[GitHub] spark issue #15115: [SPARK-17558] Bump Hadoop 2.7 version from 2.7.2 to 2.7....

2016-09-19 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/15115 This seems a duplicate of the #14827 patch I filed 3 weeks earlier. Is there some aspect of the PR submission process that I'm missing out on? I would like to get my patches

[GitHub] spark pull request #14731: [SPARK-17159] [streaming]: optimise check for new...

2016-09-16 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/14731#discussion_r79209668 --- Diff: docs/streaming-programming-guide.md --- @@ -644,13 +644,44 @@ methods for creating DStreams from files as input sources

[GitHub] spark pull request #14731: [SPARK-17159] [streaming]: optimise check for new...

2016-09-15 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/14731#discussion_r78932943 --- Diff: docs/streaming-programming-guide.md --- @@ -644,13 +644,44 @@ methods for creating DStreams from files as input sources

[GitHub] spark pull request #14731: [SPARK-17159] [streaming]: optimise check for new...

2016-09-15 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/14731#discussion_r78932779 --- Diff: docs/streaming-programming-guide.md --- @@ -644,13 +644,44 @@ methods for creating DStreams from files as input sources

[GitHub] spark issue #14827: [SPARK-17259] [build] [WiP] Hadoop 2.7 profile to depend...

2016-09-15 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/14827 I don't know what the default Hadoop version should be; that's the kind of thing to discuss on mailing lists. Personally, I'd rush to make 2.6 the bare minimum version; nobody should

[GitHub] spark issue #14601: [SPARK-13979][Core] Killed executor is re spawned withou...

2016-09-13 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/14601 Could an automated test be done here? Propagation can be tested with a function run on the executor (such as a map) which fails if the required properties are missing 1. (set

[GitHub] spark pull request #14601: [SPARK-13979][Core] Killed executor is re spawned...

2016-09-13 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/14601#discussion_r78521683 --- Diff: core/src/main/scala/org/apache/spark/deploy/SparkHadoopUtil.scala --- @@ -102,11 +102,20 @@ class SparkHadoopUtil extends Logging

[GitHub] spark pull request #14601: [SPARK-13979][Core] Killed executor is re spawned...

2016-09-13 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/14601#discussion_r78520985 --- Diff: core/src/main/scala/org/apache/spark/deploy/SparkHadoopUtil.scala --- @@ -102,11 +102,20 @@ class SparkHadoopUtil extends Logging

[GitHub] spark issue #14827: [SPARK-17259] [build] [WiP] Hadoop 2.7 profile to depend...

2016-09-02 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/14827 Sean, the reason for a 2.7 profile is more significant with SPARK-7481 and cloud support, as it can explicitly pull in hadoop-azure (2.7+ only) and hadoop-aws (2.6+ only).

[GitHub] spark pull request #9571: [SPARK-11373] [CORE] Add metrics to the History Se...

2016-09-01 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/9571#discussion_r77193230 --- Diff: core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala --- @@ -664,6 +707,116 @@ private[history] class

[GitHub] spark issue #14731: [SPARK-17159] [streaming]: optimise check for new files ...

2016-09-01 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/14731 The latest patch pulls out the shortcutting of the globStatus call if there are no wildcard chars in the path; closer to the original patch

[GitHub] spark pull request #9571: [SPARK-11373] [CORE] Add metrics to the History Se...

2016-08-31 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/9571#discussion_r76972524 --- Diff: core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala --- @@ -664,6 +707,116 @@ private[history] class

[GitHub] spark pull request #9571: [SPARK-11373] [CORE] Add metrics to the History Se...

2016-08-31 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/9571#discussion_r76952864 --- Diff: core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala --- @@ -664,6 +707,116 @@ private[history] class

[GitHub] spark issue #14659: [SPARK-16757] Set up Spark caller context to HDFS

2016-08-31 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/14659 This context is just something passed over IPC to provide a general string for the audit logs, the main actual access of it is in the HDFS audit log ``` HdfsAuditLogger

[GitHub] spark pull request #14659: [SPARK-16757] Set up Spark caller context to HDFS

2016-08-31 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/14659#discussion_r76952473 --- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala --- @@ -2418,6 +2418,18 @@ private[spark] object Utils extends Logging

[GitHub] spark pull request #14659: [SPARK-16757] Set up Spark caller context to HDFS

2016-08-31 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/14659#discussion_r76951863 --- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala --- @@ -2418,6 +2418,18 @@ private[spark] object Utils extends Logging

[GitHub] spark pull request #9571: [SPARK-11373] [CORE] Add metrics to the History Se...

2016-08-30 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/9571#discussion_r76773046 --- Diff: core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala --- @@ -667,6 +700,90 @@ private[history] class FsHistoryProvider
