[GitHub] spark pull request: [SPARK-4026][Streaming] Write ahead log manage...

2014-10-21 Thread jerryshao
Github user jerryshao commented on a diff in the pull request: https://github.com/apache/spark/pull/2882#discussion_r19190251 --- Diff: streaming/src/main/scala/org/apache/spark/streaming/util/FileSegment.scala --- @@ -0,0 +1,19 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [SPARK-4026][Streaming] Write ahead log manage...

2014-10-21 Thread jerryshao
Github user jerryshao commented on a diff in the pull request: https://github.com/apache/spark/pull/2882#discussion_r19190926 --- Diff: streaming/src/main/scala/org/apache/spark/streaming/util/FileSegment.scala --- @@ -0,0 +1,19 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [SPARK-4026][Streaming] Write ahead log manage...

2014-10-21 Thread jerryshao
Github user jerryshao commented on a diff in the pull request: https://github.com/apache/spark/pull/2882#discussion_r19191068 --- Diff: streaming/src/main/scala/org/apache/spark/streaming/util/FileSegment.scala --- @@ -0,0 +1,19 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [SPARK-4028][Streaming] ReceivedBlockHandler i...

2014-10-27 Thread jerryshao
Github user jerryshao commented on the pull request: https://github.com/apache/spark/pull/2940#issuecomment-60702936 Hi TD, are you going to expose some store() API in `Receiver` which will directly use `WriteAheadLogBasedBlockHandler` to store block? Seems now these two

[GitHub] spark pull request: [SPARK-3954][Streaming] promote the speed of c...

2014-10-27 Thread jerryshao
Github user jerryshao commented on the pull request: https://github.com/apache/spark/pull/2811#issuecomment-60713367 Maybe they are quite busy, let me ping @tdas . --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your

[GitHub] spark pull request: [SPARK-4062][Streaming]Add ReliableKafkaReceiv...

2014-10-28 Thread jerryshao
GitHub user jerryshao opened a pull request: https://github.com/apache/spark/pull/2991 [SPARK-4062][Streaming]Add ReliableKafkaReceiver in Spark Streaming Kafka connector Add ReliableKafkaReceiver in Kafka connector to prevent data loss if WAL in Spark Streaming is enabled

[GitHub] spark pull request: [SPARK-4062][Streaming]Add ReliableKafkaReceiv...

2014-10-28 Thread jerryshao
Github user jerryshao commented on the pull request: https://github.com/apache/spark/pull/2991#issuecomment-60875453 Hi @tdas , would you mind taking a look at this? Thanks a lot.

[GitHub] spark pull request: [SPARK-4122][STREAMING] Add a library that can...

2014-10-29 Thread jerryshao
Github user jerryshao commented on a diff in the pull request: https://github.com/apache/spark/pull/2994#discussion_r19523356 --- Diff: external/kafka/src/main/scala/org/apache/spark/streaming/kafka/KafkaOutputWriter.scala --- @@ -0,0 +1,126 @@ +/* + * Licensed to the

[GitHub] spark pull request: [SPARK-4122][STREAMING] Add a library that can...

2014-10-29 Thread jerryshao
Github user jerryshao commented on a diff in the pull request: https://github.com/apache/spark/pull/2994#discussion_r19523702 --- Diff: external/kafka/src/main/scala/org/apache/spark/streaming/kafka/KafkaOutputWriter.scala --- @@ -0,0 +1,126 @@ +/* + * Licensed to the

[GitHub] spark pull request: [SPARK-4122][STREAMING] Add a library that can...

2014-10-29 Thread jerryshao
Github user jerryshao commented on a diff in the pull request: https://github.com/apache/spark/pull/2994#discussion_r19523728 --- Diff: external/kafka/src/main/scala/org/apache/spark/streaming/kafka/KafkaOutputWriter.scala --- @@ -0,0 +1,126 @@ +/* + * Licensed to the

[GitHub] spark pull request: [SPARK-4122][STREAMING] Add a library that can...

2014-10-29 Thread jerryshao
Github user jerryshao commented on the pull request: https://github.com/apache/spark/pull/2994#issuecomment-61033489 Hi Hari, can you assure that `Producer` is thread-safe? Besides I have one concern, the overhead of creating `Producer` compared to the time cost for writing

[GitHub] spark pull request: [SPARK-4062][Streaming]Add ReliableKafkaReceiv...

2014-10-30 Thread jerryshao
Github user jerryshao commented on the pull request: https://github.com/apache/spark/pull/2991#issuecomment-61196953 OK, will do :).

[GitHub] spark pull request: [SPARK-4062][Streaming]Add ReliableKafkaReceiv...

2014-11-05 Thread jerryshao
Github user jerryshao commented on the pull request: https://github.com/apache/spark/pull/2991#issuecomment-61930141 Hi @tdas , would you mind reviewing this code? Thanks a lot.

[GitHub] spark pull request: [SPARK-2304] tera sort example program for shu...

2014-09-11 Thread jerryshao
Github user jerryshao commented on the pull request: https://github.com/apache/spark/pull/1242#issuecomment-55352215 Hi @rxin , sorry to bring this up again. Are you planning to merge this terasort example into Spark? I think this would be a good standard to test the performance of

[GitHub] spark pull request: Fix Kafka unit test hard coded Zookeeper port ...

2014-09-21 Thread jerryshao
GitHub user jerryshao opened a pull request: https://github.com/apache/spark/pull/2483 Fix Kafka unit test hard coded Zookeeper port issue Details can be seen in [SPARK-3615](https://issues.apache.org/jira/browse/SPARK-3615). You can merge this pull request into a Git repository

[GitHub] spark pull request: [SPARK-3615][Streaming]Fix Kafka unit test har...

2014-09-21 Thread jerryshao
Github user jerryshao commented on the pull request: https://github.com/apache/spark/pull/2483#issuecomment-56324080 @pwendell @tdas , mind taking a look at this? Thanks a lot.

[GitHub] spark pull request: [SPARK-3615][Streaming]Fix Kafka unit test har...

2014-09-23 Thread jerryshao
Github user jerryshao commented on a diff in the pull request: https://github.com/apache/spark/pull/2483#discussion_r17950810 --- Diff: external/kafka/src/test/scala/org/apache/spark/streaming/kafka/KafkaStreamSuite.scala --- @@ -59,16 +58,35 @@ class KafkaStreamSuite extends

[GitHub] spark pull request: [SPARK-3032][Shuffle] Fix key comparison integ...

2014-09-23 Thread jerryshao
GitHub user jerryshao opened a pull request: https://github.com/apache/spark/pull/2514 [SPARK-3032][Shuffle] Fix key comparison integer overflow introduced sorting exception Previous key comparison in `ExternalSorter` will get wrong sorting result or exception when key comparison
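
The summary above is cut off, so the exact change made to `ExternalSorter` is not visible here. As a general illustration of how an overflowing integer comparison can produce a wrong ordering, here is a minimal sketch. It assumes keys are ordered by comparing two `Int` hash codes, and the object and method names are hypothetical, not taken from the pull request.

```scala
object KeyComparisonSketch {
  // Subtraction-based comparison (hypothetical): the difference overflows when
  // the operands are far apart, so the sign of the result can be wrong.
  def subtractCompare(h1: Int, h2: Int): Int = h1 - h2

  // Overflow-safe comparison: never computes a difference, returns -1, 0 or 1.
  def safeCompare(h1: Int, h2: Int): Int =
    if (h1 < h2) -1 else if (h1 == h2) 0 else 1

  def main(args: Array[String]): Unit = {
    val small = Int.MinValue
    val large = 1
    // Int.MinValue - 1 wraps around to Int.MaxValue, wrongly claiming small > large.
    println(subtractCompare(small, large)) // prints 2147483647
    println(safeCompare(small, large))     // prints -1
  }
}
```

A comparator whose sign is wrong for some pairs is no longer a consistent ordering, which is the kind of defect that can lead a sort to return misordered results or throw an exception.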

[GitHub] spark pull request: [SPARK-3032][Shuffle] Fix key comparison integ...

2014-09-23 Thread jerryshao
Github user jerryshao commented on a diff in the pull request: https://github.com/apache/spark/pull/2514#discussion_r17954265 --- Diff: core/src/main/scala/org/apache/spark/util/collection/ExternalSorter.scala --- @@ -152,7 +152,7 @@ private[spark] class ExternalSorter[K, V, C

[GitHub] spark pull request: [SPARK-3032][Shuffle] Fix key comparison integ...

2014-09-23 Thread jerryshao
Github user jerryshao commented on a diff in the pull request: https://github.com/apache/spark/pull/2514#discussion_r17954666 --- Diff: core/src/main/scala/org/apache/spark/util/collection/ExternalSorter.scala --- @@ -152,7 +152,7 @@ private[spark] class ExternalSorter[K, V, C

[GitHub] spark pull request: [SPARK-3032][Shuffle] Fix key comparison integ...

2014-09-23 Thread jerryshao
Github user jerryshao commented on a diff in the pull request: https://github.com/apache/spark/pull/2514#discussion_r17954749 --- Diff: core/src/test/scala/org/apache/spark/util/collection/ExternalSorterSuite.scala --- @@ -707,4 +707,53 @@ class ExternalSorterSuite extends

[GitHub] spark pull request: [SPARK-3032][Shuffle] Fix key comparison integ...

2014-09-24 Thread jerryshao
Github user jerryshao commented on a diff in the pull request: https://github.com/apache/spark/pull/2514#discussion_r18012867 --- Diff: core/src/test/scala/org/apache/spark/util/collection/ExternalSorterSuite.scala --- @@ -707,4 +707,53 @@ class ExternalSorterSuite extends

[GitHub] spark pull request: Spark parquet improvements

2014-03-27 Thread jerryshao
Github user jerryshao commented on a diff in the pull request: https://github.com/apache/spark/pull/195#discussion_r11053203 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/parquet/ParquetTableOperations.scala --- @@ -40,7 +40,7 @@ import java.util.Date * Parquet

[GitHub] spark pull request: Spark parquet improvements

2014-03-27 Thread jerryshao
Github user jerryshao commented on a diff in the pull request: https://github.com/apache/spark/pull/195#discussion_r11053408 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/parquet/ParquetTableOperations.scala --- @@ -40,7 +40,7 @@ import java.util.Date * Parquet

[GitHub] spark pull request: [SPARK-1354][SQL] Add tableName as a qualifier...

2014-03-30 Thread jerryshao
GitHub user jerryshao opened a pull request: https://github.com/apache/spark/pull/272 [SPARK-1354][SQL] Add tableName as a qualifier for SimpleCatelogy Fix unresolved attributes when querying with the table name as a qualifier in SQLContext with SimpleCatalog; for details please see [SPARK

[GitHub] spark pull request: [SPARK-1354][SQL] Add tableName as a qualifier...

2014-03-30 Thread jerryshao
Github user jerryshao commented on the pull request: https://github.com/apache/spark/pull/272#issuecomment-39046591 Thanks @marmbrus and @pwendell .

[GitHub] spark pull request: [SQL] SPARK-1367 Fix NPE when joining parquet ...

2014-04-01 Thread jerryshao
Github user jerryshao commented on the pull request: https://github.com/apache/spark/pull/292#issuecomment-39279300 Hey Michael, it looks cool. One simple question: why are the equivalent parameters of `HiveTableScan` not specified as `transient`?

[GitHub] spark pull request: [Streaming] SPARK-1510: Add Spark Streaming me...

2014-04-16 Thread jerryshao
GitHub user jerryshao opened a pull request: https://github.com/apache/spark/pull/424 [Streaming] SPARK-1510: Add Spark Streaming metrics source for metrics system Add Streaming source in Spark Streaming for metrics system. Details can be seen in [SPARK-1510](https

[GitHub] spark pull request: [Streaming] SPARK-1510: Add Spark Streaming me...

2014-04-16 Thread jerryshao
Github user jerryshao commented on the pull request: https://github.com/apache/spark/pull/424#issuecomment-40575918 @tdas and @pwendell, please help to review, thanks.

[GitHub] spark pull request: [SPARK-2044] Pluggable interface for shuffles

2014-06-07 Thread jerryshao
Github user jerryshao commented on the pull request: https://github.com/apache/spark/pull/1009#issuecomment-45430013 Hi Matei, since we already have `ShuffleWriter` and `ShuffleManager`, do we still need to keep `ShuffleBlockManager`, I think two functionalities of

[GitHub] spark pull request: [SPARK-2080] Yarn: report HS URL in client mod...

2014-06-11 Thread jerryshao
Github user jerryshao commented on a diff in the pull request: https://github.com/apache/spark/pull/1002#discussion_r13683413 --- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala --- @@ -297,7 +297,7 @@ class SparkContext(config: SparkConf) extends Logging

[GitHub] spark pull request: SPARK-2099. Report progress while task is runn...

2014-06-11 Thread jerryshao
Github user jerryshao commented on the pull request: https://github.com/apache/spark/pull/1056#issuecomment-45820580 Hi @sryza , two quick questions: 1. Will this add additional overhead to Spark run-time, especially for Spark Streaming jobs in which batchDuration is quite

[GitHub] spark pull request: [SPARK-2124] Move aggregation into shuffle imp...

2014-06-12 Thread jerryshao
GitHub user jerryshao opened a pull request: https://github.com/apache/spark/pull/1064 [SPARK-2124] Move aggregation into shuffle implementations This PR is a sub-task of SPARK-2044 to move the execution of aggregation into shuffle implementations.

[GitHub] spark pull request: SPARK-2127: Use application specific folders t...

2014-06-12 Thread jerryshao
Github user jerryshao commented on a diff in the pull request: https://github.com/apache/spark/pull/1067#discussion_r13700626 --- Diff: core/src/main/scala/org/apache/spark/metrics/sink/CsvSink.scala --- @@ -53,11 +53,14 @@ private[spark] class CsvSink(val property: Properties

[GitHub] spark pull request: SPARK-2127: Use application specific folders t...

2014-06-12 Thread jerryshao
Github user jerryshao commented on a diff in the pull request: https://github.com/apache/spark/pull/1067#discussion_r13703371 --- Diff: core/src/main/scala/org/apache/spark/metrics/sink/CsvSink.scala --- @@ -53,11 +53,14 @@ private[spark] class CsvSink(val property: Properties

[GitHub] spark pull request: [SPARK-983] Support external sorting

2014-06-16 Thread jerryshao
Github user jerryshao commented on the pull request: https://github.com/apache/spark/pull/1090#issuecomment-46255789 This is the [PR](https://github.com/apache/spark/pull/931) which uses `ExternalAppendOnlyMap` to do external sort. I think it would be nice to use

[GitHub] spark pull request: [SPARK-2124] Move aggregation into shuffle imp...

2014-06-18 Thread jerryshao
Github user jerryshao commented on the pull request: https://github.com/apache/spark/pull/1064#issuecomment-46511981 Hi @mateiz, would you mind taking a look at this PR?

[GitHub] spark pull request: Fix JIRA-983 and support exteranl sort for sor...

2014-06-18 Thread jerryshao
Github user jerryshao commented on a diff in the pull request: https://github.com/apache/spark/pull/931#discussion_r13948953 --- Diff: core/src/main/scala/org/apache/spark/rdd/OrderedRDDFunctions.scala --- @@ -41,30 +45,92 @@ import org.apache.spark.{Logging, RangePartitioner

[GitHub] spark pull request: [SPARK-2124] Move aggregation into shuffle imp...

2014-06-22 Thread jerryshao
Github user jerryshao commented on the pull request: https://github.com/apache/spark/pull/1064#issuecomment-46798422 Hi Matei, thanks for your review, I will update the code soon.

[GitHub] spark pull request: [SPARK-2124] Move aggregation into shuffle imp...

2014-06-23 Thread jerryshao
Github user jerryshao commented on the pull request: https://github.com/apache/spark/pull/1064#issuecomment-46818315 Hi Matei, I just updated the code according to your comments. For OrderedRDDFunctions, I only set `KeyOrding` into the shuffle, but not move the code path, so what

[GitHub] spark pull request: [SPARK-2125] Add sort flag and move sort into ...

2014-06-25 Thread jerryshao
GitHub user jerryshao opened a pull request: https://github.com/apache/spark/pull/1210 [SPARK-2125] Add sort flag and move sort into shuffle implementations This patch adds a sort flag into ShuffleDependency and moves sort into hash shuffle implementation. Moving sort into

[GitHub] spark pull request: [SPARK-2125] Add sort flag and move sort into ...

2014-06-25 Thread jerryshao
Github user jerryshao commented on a diff in the pull request: https://github.com/apache/spark/pull/1210#discussion_r14174925 --- Diff: core/src/main/scala/org/apache/spark/shuffle/hash/HashShuffleReader.scala --- @@ -49,6 +49,17 @@ class HashShuffleReader[K, C]( } else

[GitHub] spark pull request: [SPARK-2104] Fix task serializing issues when ...

2014-06-27 Thread jerryshao
GitHub user jerryshao opened a pull request: https://github.com/apache/spark/pull/1245 [SPARK-2104] Fix task serializing issues when sort with Java non serializable class Details can be seen in [SPARK-2104](https://issues.apache.org/jira/browse/SPARK-2104). This work is based on

[GitHub] spark pull request: [SPARK-2104] Fix task serializing issues when ...

2014-06-29 Thread jerryshao
Github user jerryshao commented on a diff in the pull request: https://github.com/apache/spark/pull/1245#discussion_r14333421 --- Diff: core/src/main/scala/org/apache/spark/Partitioner.scala --- @@ -96,15 +98,15 @@ class HashPartitioner(partitions: Int) extends Partitioner

[GitHub] spark pull request: FIX: ShuffledDStream run tasks only when dstre...

2014-07-02 Thread jerryshao
Github user jerryshao commented on a diff in the pull request: https://github.com/apache/spark/pull/1291#discussion_r14498076 --- Diff: streaming/src/main/scala/org/apache/spark/streaming/dstream/ShuffledDStream.scala --- @@ -39,8 +39,10 @@ class ShuffledDStream[K: ClassTag, V

[GitHub] spark pull request: [SPARK-2364][STREAMING] ShuffledDStream run ta...

2014-07-04 Thread jerryshao
Github user jerryshao commented on the pull request: https://github.com/apache/spark/pull/1291#issuecomment-48020036 It makes sense for the problem you addressed, but modifying `ShuffledDStream` seems like just a workaround, not a good solution, since you may also face this

[GitHub] spark pull request: [SPARK-2125] Add sort flag and move sort into ...

2014-07-07 Thread jerryshao
Github user jerryshao commented on the pull request: https://github.com/apache/spark/pull/1210#issuecomment-48174580 Hi @mateiz, mind taking a look at this PR, thanks a lot.

[GitHub] spark pull request: [SPARK-2402] Update the initial position when ...

2014-07-08 Thread jerryshao
GitHub user jerryshao opened a pull request: https://github.com/apache/spark/pull/1327 [SPARK-2402] Update the initial position when reuse DiskBlockObjectWriter Minor fix, `initialPosition` can not be updated after `close()` and re`open()`, which will lead to error when reusing

[GitHub] spark pull request: [SPARK-2402] Update the initial position when ...

2014-07-08 Thread jerryshao
Github user jerryshao commented on the pull request: https://github.com/apache/spark/pull/1327#issuecomment-48329176 It is all right if this class does not support reopening, but there seems to be no obvious guard preventing users from reusing this object, so at least this modification will not

[GitHub] spark pull request: [SPARK-2125] Add sort flag and move sort into ...

2014-07-08 Thread jerryshao
Github user jerryshao commented on the pull request: https://github.com/apache/spark/pull/1210#issuecomment-48433693 I'm not sure whether that is what you want, so I hope you can review it and give me some comments. Thanks a lot.

[GitHub] spark pull request: [SPARK-2402] Update the initial position when ...

2014-07-09 Thread jerryshao
Github user jerryshao commented on the pull request: https://github.com/apache/spark/pull/1327#issuecomment-48553246 Ok, sorry for my unthoughtful PR.

[GitHub] spark pull request: [SPARK-2402] Update the initial position when ...

2014-07-09 Thread jerryshao
Github user jerryshao closed the pull request at: https://github.com/apache/spark/pull/1327

[GitHub] spark pull request: [SPARK-2402] Update the initial position when ...

2014-07-09 Thread jerryshao
Github user jerryshao commented on the pull request: https://github.com/apache/spark/pull/1327#issuecomment-48565390 IMHO, if we leave reuse of this object aside, I don't think this change will bring specific effect to the current code path. If we really want to defend the reu

[GitHub] spark pull request: [SPARK-2364][STREAMING] ShuffledDStream run ta...

2014-07-09 Thread jerryshao
Github user jerryshao commented on a diff in the pull request: https://github.com/apache/spark/pull/1291#discussion_r14749847 --- Diff: streaming/src/main/scala/org/apache/spark/streaming/dstream/DStream.scala --- @@ -288,23 +288,33 @@ abstract class DStream[T: ClassTag

[GitHub] spark pull request: [SPARK-2364][STREAMING] ShuffledDStream run ta...

2014-07-09 Thread jerryshao
Github user jerryshao commented on a diff in the pull request: https://github.com/apache/spark/pull/1291#discussion_r14749857 --- Diff: streaming/src/main/scala/org/apache/spark/streaming/dstream/DStream.scala --- @@ -288,23 +288,33 @@ abstract class DStream[T: ClassTag

[GitHub] spark pull request: [SPARK-2364][STREAMING] ShuffledDStream run ta...

2014-07-09 Thread jerryshao
Github user jerryshao commented on a diff in the pull request: https://github.com/apache/spark/pull/1291#discussion_r14749902 --- Diff: streaming/src/main/scala/org/apache/spark/streaming/dstream/DStream.scala --- @@ -288,23 +288,33 @@ abstract class DStream[T: ClassTag

[GitHub] spark pull request: [SPARK-2364][STREAMING] ShuffledDStream run ta...

2014-07-09 Thread jerryshao
Github user jerryshao commented on the pull request: https://github.com/apache/spark/pull/1291#issuecomment-48566771 Like @tdas said, it's a nice-to-have feature, modification may narrow the flexibility of current implementation, so TD, what's your opinion?

[GitHub] spark pull request: [SPARK-2364][STREAMING] ShuffledDStream run ta...

2014-07-09 Thread jerryshao
Github user jerryshao commented on a diff in the pull request: https://github.com/apache/spark/pull/1291#discussion_r14750153 --- Diff: streaming/src/main/scala/org/apache/spark/streaming/dstream/DStream.scala --- @@ -288,23 +288,33 @@ abstract class DStream[T: ClassTag

[GitHub] spark pull request: [SPARK-1022] Kafka unit test that actually sen...

2014-08-03 Thread jerryshao
Github user jerryshao commented on the pull request: https://github.com/apache/spark/pull/557#issuecomment-50983733 Yeah, I will submit a new PR for this ASAP :).

[GitHub] spark pull request: [SPARK-1022][Streaming] Add Kafka real unit te...

2014-08-03 Thread jerryshao
GitHub user jerryshao opened a pull request: https://github.com/apache/spark/pull/1751 [SPARK-1022][Streaming] Add Kafka real unit test This PR is an updated version of (https://github.com/apache/spark/pull/557) to actually test sending and receiving data through Kafka, and fix

[GitHub] spark pull request: [SPARK-1022][Streaming] Add Kafka real unit te...

2014-08-04 Thread jerryshao
Github user jerryshao commented on the pull request: https://github.com/apache/spark/pull/1751#issuecomment-51137436 Hi TD, thanks for your review, I will update the code according to your comments.

[GitHub] spark pull request: [SPARK-1022][Streaming] Add Kafka real unit te...

2014-08-05 Thread jerryshao
Github user jerryshao commented on the pull request: https://github.com/apache/spark/pull/1751#issuecomment-51287071 Hi @tdas and @srowen , sorry for the incomplete test and compile, I only compiled and tested under SBT and it looked fine, so I missed the Maven part. Thanks a lot for

[GitHub] spark pull request: [SPARK-2492][Streaming] kafkaReceiver minor ch...

2014-08-14 Thread jerryshao
Github user jerryshao commented on the pull request: https://github.com/apache/spark/pull/1420#issuecomment-52269406 Hi @tdas , would you mind taking a look at this? thanks a lot.

[GitHub] spark pull request: [SPARK-3054][STREAMING] Add unit tests for Spa...

2014-08-14 Thread jerryshao
Github user jerryshao commented on a diff in the pull request: https://github.com/apache/spark/pull/1958#discussion_r16280387 --- Diff: external/flume-sink/src/test/scala/org/apache/spark/streaming/flume/sink/SparkSinkSuite.scala --- @@ -0,0 +1,208 @@ +package

[GitHub] spark pull request: [SPARK-3146][Streaming] Improve the flexibilit...

2014-08-19 Thread jerryshao
GitHub user jerryshao opened a pull request: https://github.com/apache/spark/pull/2053 [SPARK-3146][Streaming] Improve the flexibility of Spark Streaming Kafka API Improve the flexibility of Spark Streaming Kafka API to offer user the ability to pre-process message before stored

[GitHub] spark pull request: [SPARK-3146][Streaming] Improve the flexibilit...

2014-08-20 Thread jerryshao
Github user jerryshao commented on the pull request: https://github.com/apache/spark/pull/2053#issuecomment-52741181 Seems some temporary files messed up the test environment...

[GitHub] spark pull request: [SPARK-3146][Streaming] Improve the flexibilit...

2014-08-20 Thread jerryshao
Github user jerryshao commented on the pull request: https://github.com/apache/spark/pull/2053#issuecomment-52746388 Jenkins, retest this please.

[GitHub] spark pull request: [SPARK-3146][Streaming] Improve the flexibilit...

2014-08-20 Thread jerryshao
Github user jerryshao commented on the pull request: https://github.com/apache/spark/pull/2053#issuecomment-52749189 Hey @tdas , mind taking a look at this? Looks like checkpoint file messed up the environment.

[GitHub] spark pull request: [SPARK-3146][Streaming] Improve the flexibilit...

2014-08-20 Thread jerryshao
Github user jerryshao commented on the pull request: https://github.com/apache/spark/pull/2053#issuecomment-52868048 Jenkins, retest this please.

[GitHub] spark pull request: [SPARK-3146][Streaming] Improve the flexibilit...

2014-08-20 Thread jerryshao
Github user jerryshao commented on the pull request: https://github.com/apache/spark/pull/2053#issuecomment-52884496 Jenkins, retest this please.

[GitHub] spark pull request #16810: [SPARK-19464][CORE][YARN][test-hadoop2.6] Remove ...

2017-02-10 Thread jerryshao
Github user jerryshao commented on a diff in the pull request: https://github.com/apache/spark/pull/16810#discussion_r100488764 --- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala --- @@ -253,53 +233,24 @@ private[spark] class Client

[GitHub] spark pull request #16810: [SPARK-19464][CORE][YARN][test-hadoop2.6] Remove ...

2017-02-10 Thread jerryshao
Github user jerryshao commented on a diff in the pull request: https://github.com/apache/spark/pull/16810#discussion_r100490035 --- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala --- @@ -253,53 +233,24 @@ private[spark] class Client

[GitHub] spark pull request #16884: Fix compile issue for Spark on Yarn when building...

2017-02-10 Thread jerryshao
GitHub user jerryshao opened a pull request: https://github.com/apache/spark/pull/16884 Fix compile issue for Spark on Yarn when building against Hadoop 2.6.0~2.6.3 ## What changes were proposed in this pull request? Due to the newly added API in Hadoop 2.6.4+, Spark builds

[GitHub] spark issue #16884: [SPARK-19545][YARN]Fix compile issue for Spark on Yarn w...

2017-02-10 Thread jerryshao
Github user jerryshao commented on the issue: https://github.com/apache/spark/pull/16884 @srowen @mridulm , please review, thanks a lot.

[GitHub] spark pull request #16923: [SPARK-19038][Hive][YARN] Correctly figure out ke...

2017-02-14 Thread jerryshao
GitHub user jerryshao opened a pull request: https://github.com/apache/spark/pull/16923 [SPARK-19038][Hive][YARN] Correctly figure out keytab file name in yarn client mode Change-Id: I06170769f83fe530361a2737427b46d657f40d75 ## What changes were proposed in this pull

[GitHub] spark issue #16923: [SPARK-19038][Hive][YARN] Correctly figure out keytab fi...

2017-02-14 Thread jerryshao
Github user jerryshao commented on the issue: https://github.com/apache/spark/pull/16923 @tgravescs @mridulm would you please help to review, thanks a lot.

[GitHub] spark pull request #16923: [SPARK-19038][Hive][YARN] Correctly figure out ke...

2017-02-15 Thread jerryshao
Github user jerryshao commented on a diff in the pull request: https://github.com/apache/spark/pull/16923#discussion_r101417534 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala --- @@ -106,21 +106,31 @@ private[hive] class HiveClientImpl

[GitHub] spark pull request #16923: [SPARK-19038][Hive][YARN] Correctly figure out ke...

2017-02-15 Thread jerryshao
Github user jerryshao commented on a diff in the pull request: https://github.com/apache/spark/pull/16923#discussion_r101417697 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala --- @@ -106,21 +106,31 @@ private[hive] class HiveClientImpl

[GitHub] spark pull request #16955: [SPARK-19626]update cred using spark.yarn.credent...

2017-02-16 Thread jerryshao
Github user jerryshao commented on a diff in the pull request: https://github.com/apache/spark/pull/16955#discussion_r101481467 --- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/security/CredentialUpdater.scala --- @@ -55,14 +55,10 @@ private[spark

[GitHub] spark issue #16955: [SPARK-19626]update cred using spark.yarn.credentials.up...

2017-02-16 Thread jerryshao
Github user jerryshao commented on the issue: https://github.com/apache/spark/pull/16955 Agreed with @srowen , please describe the problem and fix both here in PR and in JIRA, also changing the title to be more meaningful. It would be better for others without the context to

[GitHub] spark issue #16923: [SPARK-19038][Hive][YARN] Correctly figure out keytab fi...

2017-02-16 Thread jerryshao
Github user jerryshao commented on the issue: https://github.com/apache/spark/pull/16923 @vanzin , would you mind helping to review this PR, thanks a lot. IIUC the issue was introduced in #11510 .

[GitHub] spark pull request #16923: [SPARK-19038][Hive][YARN] Correctly figure out ke...

2017-02-16 Thread jerryshao
Github user jerryshao commented on a diff in the pull request: https://github.com/apache/spark/pull/16923#discussion_r101680479 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala --- @@ -106,21 +106,33 @@ private[hive] class HiveClientImpl

[GitHub] spark pull request #16923: [SPARK-19038][Hive][YARN] Correctly figure out ke...

2017-02-16 Thread jerryshao
Github user jerryshao commented on a diff in the pull request: https://github.com/apache/spark/pull/16923#discussion_r101680633 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala --- @@ -106,21 +106,33 @@ private[hive] class HiveClientImpl

[GitHub] spark pull request #16923: [SPARK-19038][Hive][YARN] Correctly figure out ke...

2017-02-21 Thread jerryshao
Github user jerryshao commented on a diff in the pull request: https://github.com/apache/spark/pull/16923#discussion_r102357394 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala --- @@ -106,21 +106,27 @@ private[hive] class HiveClientImpl

[GitHub] spark pull request #16923: [SPARK-19038][Hive][YARN] Correctly figure out ke...

2017-02-21 Thread jerryshao
Github user jerryshao commented on a diff in the pull request: https://github.com/apache/spark/pull/16923#discussion_r102359272 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala --- @@ -106,21 +106,27 @@ private[hive] class HiveClientImpl

[GitHub] spark pull request #16923: [SPARK-19038][Hive][YARN] Correctly figure out ke...

2017-02-21 Thread jerryshao
Github user jerryshao commented on a diff in the pull request: https://github.com/apache/spark/pull/16923#discussion_r102362260 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala --- @@ -106,21 +106,27 @@ private[hive] class HiveClientImpl

[GitHub] spark pull request #17038: [SPARK-19707][Core] Improve the invalid path chec...

2017-02-23 Thread jerryshao
GitHub user jerryshao opened a pull request: https://github.com/apache/spark/pull/17038 [SPARK-19707][Core] Improve the invalid path check for sc.addJar ## What changes were proposed in this pull request? Currently in Spark there're two issues when we add jars with in

[GitHub] spark issue #17038: [SPARK-19707][Core] Improve the invalid path check for s...

2017-02-23 Thread jerryshao
Github user jerryshao commented on the issue: https://github.com/apache/spark/pull/17038 Jenkins, retest this please.

[GitHub] spark issue #17038: [SPARK-19707][Core] Improve the invalid path check for s...

2017-02-23 Thread jerryshao
Github user jerryshao commented on the issue: https://github.com/apache/spark/pull/17038 Jenkins, retest this please.

[GitHub] spark pull request #17083: [SPARK-19750][UI][branch-2.1] Fix redirect issue ...

2017-02-27 Thread jerryshao
GitHub user jerryshao opened a pull request: https://github.com/apache/spark/pull/17083 [SPARK-19750][UI][branch-2.1] Fix redirect issue from http to https ## What changes were proposed in this pull request? If spark ui port (4040) is not set, it will choose port number 0

[GitHub] spark issue #17083: [SPARK-19750][UI][branch-2.1] Fix redirect issue from ht...

2017-02-27 Thread jerryshao
Github user jerryshao commented on the issue: https://github.com/apache/spark/pull/17083 Due to the change in https://github.com/apache/spark/pull/16625, the issue is obsolete. So it affects Spark 2.1 and 2.0.

[GitHub] spark issue #17083: [SPARK-19750][UI][branch-2.1] Fix redirect issue from ht...

2017-02-27 Thread jerryshao
Github user jerryshao commented on the issue: https://github.com/apache/spark/pull/17083 Not sure why Jenkins test cannot be started automatically.

[GitHub] spark pull request #16382: [SPARK-18975][Core] Add an API to remove SparkLis...

2016-12-21 Thread jerryshao
GitHub user jerryshao opened a pull request: https://github.com/apache/spark/pull/16382 [SPARK-18975][Core] Add an API to remove SparkListener ## What changes were proposed in this pull request? In current Spark we could add customized SparkListener through `SparkContext

[GitHub] spark issue #16382: [SPARK-18975][Core] Add an API to remove SparkListener

2016-12-22 Thread jerryshao
Github user jerryshao commented on the issue: https://github.com/apache/spark/pull/16382 Jenkins, retest this please.

[GitHub] spark pull request #16432: [SPARK-19021][YARN] Generailize HDFSCredentialPro...

2016-12-28 Thread jerryshao
GitHub user jerryshao opened a pull request: https://github.com/apache/spark/pull/16432 [SPARK-19021][YARN] Generailize HDFSCredentialProvider to support non HDFS security filesystems Change-Id: I85d7963c4980cf9660f495f377ba27227da4b1b1 Currently Spark can only get token

[GitHub] spark issue #16432: [SPARK-19021][YARN] Generailize HDFSCredentialProvider t...

2016-12-29 Thread jerryshao
Github user jerryshao commented on the issue: https://github.com/apache/spark/pull/16432 Jenkins, retest this please.

[GitHub] spark issue #16444: [Minor][Doc] Minor doc change for YARN credential provid...

2017-01-01 Thread jerryshao
Github user jerryshao commented on the issue: https://github.com/apache/spark/pull/16444 LGTM, thanks for the fix @viirya .

[GitHub] spark pull request #16470: [SPARK-19033][Core] Add admin acls for history se...

2017-01-04 Thread jerryshao
GitHub user jerryshao opened a pull request: https://github.com/apache/spark/pull/16470 [SPARK-19033][Core] Add admin acls for history server ## What changes were proposed in this pull request? Current HistoryServer's ACLs is derived from application event-log, which

[GitHub] spark issue #16432: [SPARK-19021][YARN] Generailize HDFSCredentialProvider t...

2017-01-04 Thread jerryshao
Github user jerryshao commented on the issue: https://github.com/apache/spark/pull/16432 Thanks @tgravescs for your comments. Yes actually it should be Hadoop compatible file systems, or in other words should be Hadoop `FileSystem` API supported FS. I tested with kerberized HDFS and

[GitHub] spark issue #16470: [SPARK-19033][Core] Add admin acls for history server

2017-01-04 Thread jerryshao
Github user jerryshao commented on the issue: https://github.com/apache/spark/pull/16470 Thanks @tgravescs for your comments, I've updated the code to add more tests accordingly. Please review.
