[GitHub] spark issue #12004: [SPARK-7481] [build] Add spark-cloud module to pull in o...

2017-02-24 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/12004 (apologies for not replying; rebuilding a deceased laptop) My main concern is to have the ability to make spark releases which include the object store client libraries and a set of

[GitHub] spark issue #16927: [SPARK-19571][R] Fix SparkR test break on Windows via Ap...

2017-02-24 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/16927 I could add the 2.6.5 binaries if you want, though the 2.6.4 ones should be compatible. I think I just lifted the 2.6.x artifacts out of an HDP build; it's only the 2.7.x ones where I

[GitHub] spark issue #14731: [SPARK-17159] [streaming]: optimise check for new files ...

2017-02-24 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/14731 @uncleGen: reviewed this, tweaked the docs slightly but otherwise, there's nothing left to do that I can see --- If your project is set up for it, you can reply to this email and have

[GitHub] spark pull request #14731: [SPARK-17159] [streaming]: optimise check for new...

2017-02-24 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/14731#discussion_r102977480 --- Diff: docs/streaming-programming-guide.md --- @@ -630,35 +630,106 @@ which creates a DStream from text data received over a TCP socket

[GitHub] spark issue #16927: [SPARK-19571][R] Fix SparkR test break on Windows via Ap...

2017-02-24 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/16927 I'm worrying about this now: have my attempts to fix the messages gone horribly wrong? Admittedly, it was sitting in a Budapest airport with a post-ApacheCon hangover, but @afs was g

[GitHub] spark issue #14601: [SPARK-13979][Core] Killed executor is re spawned withou...

2017-02-25 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/14601 spark.hadoop.fs.* would work. The (not yet shipped in ASF code) Azure Data Lake FS has, for reasons I don't know and have only just noticed, adopted "dfs.adl" as their

[GitHub] spark issue #16990: [SPARK-19660][CORE][SQL] Replace the configuration prope...

2017-02-25 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/16990 LGTM, though you'd have to go do the full coverage to verify that there's not a typo in any of the strings. This is why although Spark has adopted the more readable inline strings

[GitHub] spark issue #14601: [SPARK-13979][Core] Killed executor is re spawned withou...

2017-02-25 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/14601 1. It's good to have some tests 2. I note that `appendS3AndSparkHadoopConfigurations()` has a weakness in how it propagates env vars: no propagation of the session enviro

[GitHub] spark pull request #16990: [SPARK-19660][CORE][SQL] Replace the configuratio...

2017-02-27 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/16990#discussion_r103183030 --- Diff: sql/hive/src/test/resources/ql/src/test/queries/clientpositive/smb_mapjoin_25.q --- @@ -19,7 +19,7 @@ select * from (select a.key from

[GitHub] spark pull request #14731: [SPARK-17159] [streaming]: optimise check for new...

2017-02-27 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/14731#discussion_r103183646 --- Diff: streaming/src/main/scala/org/apache/spark/streaming/dstream/FileInputDStream.scala --- @@ -140,7 +137,7 @@ class FileInputDStream[K, V, F

[GitHub] spark pull request #14731: [SPARK-17159] [streaming]: optimise check for new...

2017-02-27 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/14731#discussion_r103184528 --- Diff: docs/streaming-programming-guide.md --- @@ -615,35 +615,114 @@ which creates a DStream from text data received over a TCP socket

[GitHub] spark pull request #16990: [SPARK-19660][CORE][SQL] Replace the configuratio...

2017-02-27 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/16990#discussion_r103185158 --- Diff: sql/hive/src/test/resources/ql/src/test/queries/clientpositive/smb_mapjoin_25.q --- @@ -19,7 +19,7 @@ select * from (select a.key from

[GitHub] spark issue #17080: [SPARK-19739][CORE] propagate S3 session token to cluser

2017-02-27 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/17080 LGTM. Verified option name in `org.apache.hadoop.fs.s3a.Constants` file; env var name in `com.amazonaws.SDKGlobalConfiguration`

[GitHub] spark pull request #14731: [SPARK-17159] [streaming]: optimise check for new...

2017-01-03 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/14731#discussion_r94407115 --- Diff: docs/streaming-programming-guide.md --- @@ -644,17 +644,90 @@ methods for creating DStreams from files as input sources

[GitHub] spark pull request #14731: [SPARK-17159] [streaming]: optimise check for new...

2017-01-03 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/14731#discussion_r94407382 --- Diff: docs/streaming-programming-guide.md --- @@ -644,17 +644,90 @@ methods for creating DStreams from files as input sources

[GitHub] spark pull request #14731: [SPARK-17159] [streaming]: optimise check for new...

2017-01-03 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/14731#discussion_r94410195 --- Diff: docs/streaming-programming-guide.md --- @@ -644,17 +644,90 @@ methods for creating DStreams from files as input sources

[GitHub] spark issue #14731: [SPARK-17159] [streaming]: optimise check for new files ...

2017-01-04 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/14731 Sean, I think I've managed to delete the lines where you were asking about globs > Am I right that the net change here is not an optimization but an expansion of the beh

[GitHub] spark pull request #14731: [SPARK-17159] [streaming]: optimise check for new...

2017-01-08 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/14731#discussion_r95079472 --- Diff: docs/streaming-programming-guide.md --- @@ -630,35 +630,106 @@ which creates a DStream from text data received over a TCP socket

[GitHub] spark issue #12004: [SPARK-7481] [build] Add spark-cloud module to pull in o...

2017-01-12 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/12004 this patch is ready for review. Anyone?

[GitHub] spark pull request #15377: [SPARK-17802] Improved caller context logging.

2016-10-07 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/15377#discussion_r82441129 --- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala --- @@ -2474,25 +2474,36 @@ private[spark] class CallerContext( val context

[GitHub] spark issue #12004: [SPARK-7481] [build] Add spark-cloud module to pull in o...

2016-10-07 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/12004 # Packaging: 1. this addresses the problem that it's not always immediately obvious to people what they have to do to get, say s3a working. Do you know precisely which versi

[GitHub] spark pull request #12004: [SPARK-7481] [build] Add spark-cloud module to pu...

2016-10-08 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/12004#discussion_r82502588 --- Diff: cloud/src/main/scala/org/apache/spark/cloud/s3/S3AConstants.scala --- @@ -0,0 +1,75 @@ +/* + * Licensed to the Apache Software

[GitHub] spark issue #15374: [SPARK-17800] Introduce InterfaceStability annotation

2016-10-10 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/15374 Interesting to compare this with Hadoop's annotation, where I have mixed opinions. A key advantage Apache spark has is that Scala language lets you really scope out thin

[GitHub] spark issue #13579: [SPARK-15844] [core] HistoryServer doesn't come up if sp...

2016-10-10 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/13579 The main problem here is in a cluster where auth is turned on globally, the HS gets really confused: it's enabled but doesn't have any secrets. This patch sets things up so that

[GitHub] spark issue #13579: [SPARK-15844] [core] HistoryServer doesn't come up if sp...

2016-10-10 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/13579 I see: you want the HS to set it? Yeah, that would work. I'll change this patch accordingly

[GitHub] spark issue #14731: [SPARK-17159] [streaming]: optimise check for new files ...

2016-10-14 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/14731 @srowen have you got any comments on the last patch?

[GitHub] spark issue #12004: [SPARK-7481] [build] Add spark-cloud module to pull in o...

2016-10-17 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/12004 that's it warning that the manifest has changed. Which it has: there's now hadoop-azure, hadoop-openstack and hadoop-aws JARs on the CP, along with dependencies (amazon-aws SDK,

[GitHub] spark issue #15490: [SPARK-10541] [Web UI] Allow ApplicationHistoryProviders...

2016-10-18 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/15490 @ajbozarth I'm not a spark committer, I'm not capable of getting stuff in. I did add one comment to some of the code, otherwise nothing I have issues with. LGTM

[GitHub] spark pull request #15490: [SPARK-10541] [Web UI] Allow ApplicationHistoryPr...

2016-10-18 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/15490#discussion_r83530966 --- Diff: core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala --- @@ -262,6 +263,17 @@ private[history] class

[GitHub] spark issue #15490: [SPARK-10541] [Web UI] Allow ApplicationHistoryProviders...

2016-10-18 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/15490 Oh, I see, new UI had meant I'd left the comment partially incomplete. Sorry. Just the one: printing out the actual log dir location. That makes it much easier to identify a configur

[GitHub] spark pull request #15556: [SPARK-18010][Core] Reduce work performed for bui...

2016-10-20 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/15556#discussion_r84267482 --- Diff: core/src/main/scala/org/apache/spark/scheduler/ReplayListenerBus.scala --- @@ -43,38 +43,56 @@ private[spark] class ReplayListenerBus

[GitHub] spark issue #12004: [SPARK-7481] [build] Add spark-cloud module to pull in o...

2017-03-19 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/12004 I haven't forgotten this; I've just been trying to make the module POM-only, while adding support for Hadoop 2.6 builds, which is causing some issues downstream. Specifically, my

[GitHub] spark pull request #17342: [SPARK-12868][SQL] Allow adding jars from hdfs

2017-03-20 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/17342#discussion_r107001274 --- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala --- @@ -2767,3 +2767,24 @@ private[spark] class CircularBuffer(sizeInBytes: Int

[GitHub] spark pull request #17364: [SPARK-20038] [core]: move the currentWriter=null...

2017-03-20 Thread steveloughran
GitHub user steveloughran opened a pull request: https://github.com/apache/spark/pull/17364 [SPARK-20038] [core]: move the currentWriter=null assignments into finally {} … ## What changes were proposed in this pull request? have the

[GitHub] spark issue #17364: [SPARK-20038] [SQL]: FileFormatWriter.ExecuteWriteTask.r...

2017-03-20 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/17364 Note that as [the exception handler](https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileFormatWriter.scala#L244) tries to close

[GitHub] spark issue #14731: [SPARK-17159] [streaming]: optimise check for new files ...

2017-03-20 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/14731 Any more comments?

[GitHub] spark issue #12004: [SPARK-7481] [build] Add spark-cloud module to pull in o...

2017-03-20 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/12004 The latest patch embraces the fact that 2.6 is the base hadoop version so the `hadoop-aws` JAR is always pulled in, dependencies set up. One thing to bear in mind here that the [Phase I

[GitHub] spark issue #17364: [SPARK-20038] [SQL]: FileFormatWriter.ExecuteWriteTask.r...

2017-03-21 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/17364 I haven't reviewed that bit of code: make it a separate JIRA and assign to me. This one I came across in the HADOOP-2.8.0 RC3 testing; the underlying fix there is going in, but the spark

[GitHub] spark issue #17364: [SPARK-20038] [SQL]: FileFormatWriter.ExecuteWriteTask.r...

2017-03-21 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/17364 Created [SPARK-20045](https://issues.apache.org/jira/browse/SPARK-20045). I think there's room to improve resilience in the abort code, primarily to ensure that the underlying failure

[GitHub] spark pull request #14731: [SPARK-17159] [streaming]: optimise check for new...

2017-03-21 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/14731#discussion_r107152263 --- Diff: streaming/src/test/scala/org/apache/spark/streaming/TestSuiteBase.scala --- @@ -557,4 +557,16 @@ trait TestSuiteBase extends SparkFunSuite

[GitHub] spark pull request #14731: [SPARK-17159] [streaming]: optimise check for new...

2017-03-21 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/14731#discussion_r107152771 --- Diff: streaming/src/test/scala/org/apache/spark/streaming/InputStreamsSuite.scala --- @@ -27,7 +27,8 @@ import scala.collection.JavaConverters

[GitHub] spark pull request #14731: [SPARK-17159] [streaming]: optimise check for new...

2017-03-21 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/14731#discussion_r107194624 --- Diff: streaming/src/test/scala/org/apache/spark/streaming/InputStreamsSuite.scala --- @@ -27,7 +27,8 @@ import scala.collection.JavaConverters

[GitHub] spark issue #17364: [SPARK-20038] [SQL]: FileFormatWriter.ExecuteWriteTask.r...

2017-03-21 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/17364 looking some more, yes, as `tryWithSafeFinallyAndFailureCallbacks` wraps task commit, it guarantees that the original cause doesn't get lost. The abortJob code isn't so well gu

[GitHub] spark pull request #17342: [SPARK-12868][SQL] Allow adding jars from hdfs

2017-03-22 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/17342#discussion_r107381697 --- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala --- @@ -2767,3 +2767,24 @@ private[spark] class CircularBuffer(sizeInBytes: Int

[GitHub] spark issue #17364: [SPARK-20038] [SQL]: FileFormatWriter.ExecuteWriteTask.r...

2017-03-29 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/17364 I don't have the time/plans to do the test here, as it's a fairly complex piece of test setup for what a review should show isn't doing anything other than guarantee

[GitHub] spark issue #14731: [SPARK-17159] [streaming]: optimise check for new files ...

2017-03-29 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/14731 Is there anything else I need to do here?

[GitHub] spark issue #12004: [SPARK-7481] [build] Add spark-cloud module to pull in o...

2017-03-29 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/12004 Any comments on the latest patch? Anyone?

[GitHub] spark issue #17080: [SPARK-19739][CORE] propagate S3 session token to cluser

2017-02-28 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/17080 I agree. I was just checking the files to make sure the strings were consistent/correct, rather than trusting the documentation

[GitHub] spark issue #9571: [SPARK-11373] [CORE] Add metrics to the History Server an...

2017-03-01 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/9571 Style police. FWIW I think the lines that failed were already >100 chars, it was just they got indented slightly more. ``` Scalastyle checks failed at following occurrences: [er

[GitHub] spark issue #17120: [SPARK-19715][Structured Streaming] Option to Strip Path...

2017-03-01 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/17120 -1, non binding I understand the rationale for this, to aid migration from s3/s3n to s3a, but given the need is schema independence, you should be using the full path name from

[GitHub] spark issue #17080: [SPARK-19739][CORE] propagate S3 session token to cluser

2017-03-02 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/17080 @srowen don't worry, been tracking this: I filed the JIRA. Core code is good (i.e. property/env var names). One thing to bear in mind, the existing code propagates the env vars even

[GitHub] spark issue #17080: [SPARK-19739][CORE] propagate S3 session token to cluser

2017-03-03 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/17080 thanks. One thing I realised last night is that logging the session token, even at debug level, would have been a security risk. So it's very good that the log statement got cut, even a

[GitHub] spark issue #17120: [SPARK-19715][Structured Streaming] Option to Strip Path...

2017-03-03 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/17120 I know that's the *current* use case, but I'm thinking about future confusion, especially as the use case you espoused, "move from s3n to s3a within the same window" isn

[GitHub] spark pull request #17120: [SPARK-19715][Structured Streaming] Option to Str...

2017-03-04 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/17120#discussion_r104286483 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/streaming/FileStreamSourceSuite.scala --- @@ -1253,8 +1253,26 @@ class FileStreamSourceSuite

[GitHub] spark issue #12004: [SPARK-7481] [build] Add spark-cloud module to pull in o...

2017-03-04 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/12004 comments?

[GitHub] spark issue #17163: [SPARK-16617][BUILD][CORE] Upgrade to Avro 1.8.x

2017-03-05 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/17163 It's invariably the transient stuff isn't it? Mvnrepo on Avro 1.8.1 logs [jackson as a compile time dependency](http://mvnrepository.com/artifact/org.apache.avro/avro/1.8.1

[GitHub] spark issue #17163: [SPARK-16617][BUILD][CORE] Upgrade to Avro 1.8.x

2017-03-05 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/17163 Looking @ hadoop source, there's not much in hadoop common in terms of use of `import org.apache.avro`, but the `avro.Utf8` surfaces, and someone has tagged `fs.Path` as `@Stringable`, which

[GitHub] spark issue #17163: [SPARK-16617][BUILD][CORE] Upgrade to Avro 1.8.x

2017-03-06 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/17163 Check with @busbey about binary compatibility with older generated/compiled classes; that's the recurrent problem with protobuf

[GitHub] spark issue #14731: [SPARK-17159] [streaming]: optimise check for new files ...

2017-03-10 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/14731 The Hadoop FS Spec has now been updated to declare exactly what HDFS does w.r.t timestamps, and warn that what other filesystems and object stores do are implementation and installation

[GitHub] spark issue #17163: [SPARK-16617][BUILD][CORE] Upgrade to Avro 1.8.x

2017-03-14 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/17163 oh, this sucks. Find anyone who experienced "The great protobuf update of 2012" and ask them if they want to do it again. Looking at the issues, AVRO-997 catches out "w

[GitHub] spark issue #17163: [SPARK-16617][BUILD][CORE] Upgrade to Avro 1.8.x

2017-03-14 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/17163 FWIW, if there is something related to serialization that people should be pushing for in Hadoop 3, it is making all the little types serializable, such as `Path`, `FileStatus` and the like

[GitHub] spark issue #15869: [YARN][DOC] Update Yarn configuration doc

2016-11-16 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/15869 The plugin point is more generic than ATS integration; it lets you stick anything in to come up in the driver. Weakness: it's actually yarn specific; I could imagine uses in standalon

[GitHub] spark issue #14038: [SPARK-16317][SQL] Add a new interface to filter files i...

2016-11-18 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/14038 @maropu if you create a PR for your work I'll comment on it

[GitHub] spark issue #12004: [SPARK-7481] [build] Add spark-cloud module to pull in o...

2016-11-21 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/12004 This is the patch stripped down to the packaging and some tests to load the direct and indirect dependencies, so verifying that the classpath is valid within the module itself. It also

[GitHub] spark pull request #12004: [SPARK-7481] [build] Add spark-cloud module to pu...

2016-11-22 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/12004#discussion_r89176373 --- Diff: pom.xml --- @@ -2558,6 +2660,26 @@ +

[GitHub] spark pull request #12004: [SPARK-7481] [build] Add spark-cloud module to pu...

2016-11-22 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/12004#discussion_r89202156 --- Diff: docs/cloud-integration.md --- @@ -0,0 +1,953 @@ +--- +layout: global +displayTitle: Integration with Cloud Infrastructures

[GitHub] spark pull request #15991: [SPARK-17843][WEB UI] Indicate event logs pending...

2016-11-23 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/15991#discussion_r89311812 --- Diff: core/src/main/scala/org/apache/spark/deploy/history/HistoryPage.scala --- @@ -33,21 +33,40 @@ private[history] class HistoryPage(parent

[GitHub] spark pull request #15984: [SPARK-18551] [Web UI] [Core] [WIP] Add functiona...

2016-11-23 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/15984#discussion_r89312600 --- Diff: core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala --- @@ -535,6 +535,26 @@ private[history] class

[GitHub] spark pull request #15984: [SPARK-18551] [Web UI] [Core] [WIP] Add functiona...

2016-11-23 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/15984#discussion_r89312793 --- Diff: core/src/main/scala/org/apache/spark/deploy/history/HistoryServer.scala --- @@ -121,6 +123,12 @@ class HistoryServer( def initialize

[GitHub] spark issue #15984: [SPARK-18551] [Web UI] [Core] [WIP] Add functionality to...

2016-11-23 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/15984 Like you note, tests will be good here. Don't forget the corner cases: unknown app, duplicate POSTs, known app but unknown attempt. I'm also curious about what the policy would be i

[GitHub] spark pull request #12004: [SPARK-7481] [build] Add spark-cloud module to pu...

2016-11-23 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/12004#discussion_r89315595 --- Diff: pom.xml --- @@ -2558,6 +2660,26 @@ +

[GitHub] spark pull request #12004: [SPARK-7481] [build] Add spark-cloud module to pu...

2016-11-23 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/12004#discussion_r89340198 --- Diff: docs/storage-openstack-swift.md --- @@ -19,41 +19,32 @@ Although not mandatory, it is recommended to configure the proxy server of Swift

[GitHub] spark pull request #12004: [SPARK-7481] [build] Add spark-cloud module to pu...

2016-11-23 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/12004#discussion_r89340299 --- Diff: docs/cloud-integration.md --- @@ -0,0 +1,953 @@ +--- +layout: global +displayTitle: Integration with Cloud Infrastructures

[GitHub] spark pull request #12004: [SPARK-7481] [build] Add spark-cloud module to pu...

2016-11-23 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/12004#discussion_r89340962 --- Diff: docs/cloud-integration.md --- @@ -0,0 +1,953 @@ +--- +layout: global +displayTitle: Integration with Cloud Infrastructures

[GitHub] spark pull request #12004: [SPARK-7481] [build] Add spark-cloud module to pu...

2016-11-23 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/12004#discussion_r89346124 --- Diff: cloud/src/test/scala/org/apache/spark/cloud/AzureInstantiationSuite.scala --- @@ -0,0 +1,29 @@ +/* --- End diff -- In the

[GitHub] spark pull request #12004: [SPARK-7481] [build] Add spark-cloud module to pu...

2016-11-23 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/12004#discussion_r89352877 --- Diff: docs/cloud-integration.md --- @@ -0,0 +1,953 @@ +--- +layout: global +displayTitle: Integration with Cloud Infrastructures

[GitHub] spark pull request #12004: [SPARK-7481] [build] Add spark-cloud module to pu...

2016-11-23 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/12004#discussion_r89402090 --- Diff: pom.xml --- @@ -2558,6 +2660,26 @@ +

[GitHub] spark issue #15648: [SPARK-18119][SPARK-CORE] Namenode safemode check is onl...

2016-11-24 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/15648 LGTM, as the javadocs say *If true check only for Active NNs status, else check first NN's status*. But I don't know enough about HDFS HA to be It'll check the first

[GitHub] spark pull request #15991: [SPARK-17843][WEB UI] Indicate event logs pending...

2016-11-28 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/15991#discussion_r89786503 --- Diff: core/src/main/scala/org/apache/spark/deploy/history/HistoryPage.scala --- @@ -33,21 +33,40 @@ private[history] class HistoryPage(parent

[GitHub] spark pull request #15594: [SPARK-18061][SQL][Security] Spark Thriftserver n...

2016-11-28 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/15594#discussion_r89787175 --- Diff: sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkSQLCLIService.scala --- @@ -57,7 +59,24 @@ private[hive

[GitHub] spark pull request #14038: [SPARK-16317][SQL] Add a new interface to filter ...

2016-11-28 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/14038#discussion_r89839965 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/FileSourceStrategySuite.scala --- @@ -441,6 +441,44 @@ class

[GitHub] spark issue #14646: [SPARK-17058] [build] Add maven snapshots-and-staging pr...

2016-11-02 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/14646 Has anyone had a chance to review this? It's nicely self-contained, makes it easier to use Spark as regression testing for ASF prerelease binaries of any dependent project.

[GitHub] spark issue #15594: [SPARK-18061][SQL][Security] Spark Thriftserver needs to...

2016-11-02 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/15594 I'm not a Spark committer, so can't review it well enough to get it in; I was just watching it out of concern for the word "kerberos". How about you ask on the spark de

[GitHub] spark issue #14646: [SPARK-17058] [build] Add maven snapshots-and-staging pr...

2016-11-02 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/14646 thanks

[GitHub] spark issue #12004: [SPARK-7481] [build] Add spark-cloud module to pull in o...

2016-11-07 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/12004 Has anyone had a chance to review this? Is there more clarification needed, or some specific aspect of the patch which needs changing? Without this it is near-impossible to have a

[GitHub] spark issue #12004: [SPARK-7481] [build] Add spark-cloud module to pull in o...

2016-11-07 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/12004 sean: there's two things: tests and packaging. 1. The packaging has to go in as probably the only way to get whatever spark is built with to be consistent. That includes excl

[GitHub] spark issue #12004: [SPARK-7481] [build] Add spark-cloud module to pull in o...

2016-11-08 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/12004 I had something tangible: the integration tests. It's clear those aren't wanted. Now I'm proposing something more minimal, yet still tangible for anyone trying to build spa

[GitHub] spark issue #12004: [SPARK-7481] [build] Add spark-cloud module to pull in o...

2016-11-08 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/12004 2.6 vs 2.7 vs later releases —a moving target, with AWS versions and other issues to worry about. [HADOOP-13687](https://issues.apache.org/jira/browse/HADOOP-13687) is going to add a

[GitHub] spark issue #12004: [SPARK-7481] [build] Add spark-cloud module to pull in o...

2017-01-20 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/12004 Here's why this matters, and why a simple "isn't this just a matter of dropping in the JARs" isn't the solution: *getting the right jars together wit

[GitHub] spark issue #12004: [SPARK-7481] [build] Add spark-cloud module to pull in o...

2017-01-20 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/12004 latest patch: has updated the dependency settings. As noted, works for Hadoop versions from 2.7 to 3.0.2-alpha & the HADOOP-13345 branch, at least if you build the last two wi

[GitHub] spark issue #14731: [SPARK-17159] [streaming]: optimise check for new files ...

2017-01-23 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/14731 let me do a quick review & update

[GitHub] spark pull request #14731: [SPARK-17159] [streaming]: optimise check for new...

2017-01-23 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/14731#discussion_r97363419 --- Diff: streaming/src/test/scala/org/apache/spark/streaming/InputStreamsSuite.scala --- @@ -235,18 +236,97 @@ class InputStreamsSuite extends

[GitHub] spark pull request #14731: [SPARK-17159] [streaming]: optimise check for new...

2017-01-23 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/14731#discussion_r97367042 --- Diff: streaming/src/main/scala/org/apache/spark/streaming/dstream/FileInputDStream.scala --- @@ -196,29 +191,29 @@ class FileInputDStream[K, V, F

[GitHub] spark issue #14731: [SPARK-17159] [streaming]: optimise check for new files ...

2017-01-23 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/14731 @uncleGen I've updated it. Note that [HADOOP-13946](https://issues.apache.org/jira/browse/HADOOP-13946) tracks the changes in the Hadoop docs, which writes down what HDFS actually does,

[GitHub] spark issue #9168: [SPARK-11182] HDFS Delegation Token will be expired when ...

2017-01-23 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/9168 looking at the HDFS patch, it's in branch-2.9. We could backport to branch-2.8, though it's too late to get into the 2.8.0 RC

[GitHub] spark issue #12004: [SPARK-7481] [build] Add spark-cloud module to pull in o...

2017-01-30 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/12004 @nchammas the AWS SDK you get will be in sync with hadoop-aws; you have to keep them in sync. what is more brittle is the transients: httpclient, joda time, jackson, etc, which is

[GitHub] spark pull request #12004: [SPARK-7481] [build] Add spark-cloud module to pu...

2017-01-30 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/12004#discussion_r98448893 --- Diff: cloud/src/test/scala/org/apache/spark/cloud/AzureInstantiationSuite.scala --- @@ -0,0 +1,29 @@ +/* --- End diff -- They

[GitHub] spark pull request #12004: [SPARK-7481] [build] Add spark-cloud module to pu...

2017-01-30 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/12004#discussion_r98449488 --- Diff: pom.xml --- @@ -2586,6 +2591,100 @@ 3.4.6 2.6.0

[GitHub] spark issue #12004: [SPARK-7481] [build] Add spark-cloud module to pull in o...

2017-01-30 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/12004 @nchammas sorry, should be clearer: "you must never use an aws-sdk version other than the one hadoop-aws was built with, else things will break". if you pull in hadoop-aws, that h

[GitHub] spark issue #16815: [SPARK-19407][SS] defaultFS is used FileSystem.get inste...

2017-02-06 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/16815 LGTM, though checkpointing to S3 has its own separate issues related to rename performance and listing inconsistency. While this fix lets people request different filesystems for the data
