[GitHub] storm issue #1832: STORM-2250: Kafka Spout Refactoring to Increase Modularit...

2017-02-13 Thread srdo
Github user srdo commented on the issue: https://github.com/apache/storm/pull/1832 @HeartSaVioR Fixed conflict --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes

[GitHub] storm pull request #1832: STORM-2250: Kafka Spout Refactoring to Increase Mo...

2017-02-13 Thread srdo
Github user srdo commented on a diff in the pull request: https://github.com/apache/storm/pull/1832#discussion_r100968547 --- Diff: storm-core/src/jvm/org/apache/storm/utils/Time.java --- @@ -88,59 +92,97 @@ public static boolean isSimulating() { public static

[GitHub] storm issue #1919: STORM-2340: fix AutoCommitMode issue in KafkaSpout

2017-02-13 Thread srdo
Github user srdo commented on the issue: https://github.com/apache/storm/pull/1919 @HeartSaVioR Right, acking a batch at a time is better. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have

[GitHub] storm issue #1832: STORM-2250: Kafka Spout Refactoring to Increase Modularit...

2017-02-13 Thread HeartSaVioR
Github user HeartSaVioR commented on the issue: https://github.com/apache/storm/pull/1832 @srdo Could you rebase this? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled

[GitHub] storm pull request #1914: STORM-2334 - Join Bolt implementation with unit te...

2017-02-13 Thread roshannaik
Github user roshannaik commented on a diff in the pull request: https://github.com/apache/storm/pull/1914#discussion_r100959503 --- Diff: docs/Joins.md --- @@ -0,0 +1,126 @@ +--- +title: Joining Streams in Storm Core +layout: documentation +documentation: true

[GitHub] storm pull request #1914: STORM-2334 - Join Bolt implementation with unit te...

2017-02-13 Thread roshannaik
Github user roshannaik commented on a diff in the pull request: https://github.com/apache/storm/pull/1914#discussion_r100959498 --- Diff: docs/Joins.md --- @@ -0,0 +1,126 @@ +--- +title: Joining Streams in Storm Core +layout: documentation +documentation: true

[GitHub] storm pull request #1914: STORM-2334 - Join Bolt implementation with unit te...

2017-02-13 Thread roshannaik
Github user roshannaik commented on a diff in the pull request: https://github.com/apache/storm/pull/1914#discussion_r100959018 --- Diff: docs/Joins.md --- @@ -0,0 +1,126 @@ +--- +title: Joining Streams in Storm Core +layout: documentation +documentation: true

[GitHub] storm pull request #1914: STORM-2334 - Join Bolt implementation with unit te...

2017-02-13 Thread arunmahadevan
Github user arunmahadevan commented on a diff in the pull request: https://github.com/apache/storm/pull/1914#discussion_r100952160 --- Diff: docs/Joins.md --- @@ -0,0 +1,126 @@ +--- +title: Joining Streams in Storm Core +layout: documentation +documentation: true

[GitHub] storm pull request #1914: STORM-2334 - Join Bolt implementation with unit te...

2017-02-13 Thread arunmahadevan
Github user arunmahadevan commented on a diff in the pull request: https://github.com/apache/storm/pull/1914#discussion_r100952035 --- Diff: docs/Joins.md --- @@ -0,0 +1,126 @@ +--- +title: Joining Streams in Storm Core +layout: documentation +documentation: true

[GitHub] storm pull request #1914: STORM-2334 - Join Bolt implementation with unit te...

2017-02-13 Thread arunmahadevan
Github user arunmahadevan commented on a diff in the pull request: https://github.com/apache/storm/pull/1914#discussion_r100951653 --- Diff: docs/Joins.md --- @@ -0,0 +1,126 @@ +--- +title: Joining Streams in Storm Core +layout: documentation +documentation: true

[GitHub] storm issue #1934: STORM-2333: CGroup memory and CPU metrics

2017-02-13 Thread jerrypeng
Github user jerrypeng commented on the issue: https://github.com/apache/storm/pull/1934 @revans2 this is a really cool feature and thanks for adding it. My only question is that I see you have implemented some parsers/getters for the cgroup metrics. There is already a set of them

[GitHub] storm pull request #1934: STORM-2333: CGroup memory and CPU metrics

2017-02-13 Thread jerrypeng
Github user jerrypeng commented on a diff in the pull request: https://github.com/apache/storm/pull/1934#discussion_r100951683 --- Diff: storm-core/src/jvm/org/apache/storm/metric/cgroup/CGroupCpuGuarantee.java --- @@ -0,0 +1,48 @@ +/** + * Licensed to the Apache Software

[GitHub] storm pull request #1934: STORM-2333: CGroup memory and CPU metrics

2017-02-13 Thread jerrypeng
Github user jerrypeng commented on a diff in the pull request: https://github.com/apache/storm/pull/1934#discussion_r100951524 --- Diff: storm-core/src/jvm/org/apache/storm/metric/cgroup/CGroupMemoryLimit.java --- @@ -0,0 +1,35 @@ +/** + * Licensed to the Apache Software

[GitHub] storm pull request #1919: STORM-2340: fix AutoCommitMode issue in KafkaSpout

2017-02-13 Thread asfgit
Github user asfgit closed the pull request at: https://github.com/apache/storm/pull/1919 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is

[GitHub] storm issue #1932: [STORM-2194] Stop ignoring socket timeout error from exec...

2017-02-13 Thread jerrypeng
Github user jerrypeng commented on the issue: https://github.com/apache/storm/pull/1932 +1 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the

Re: [DISCUSS] Release Storm 1.1.0

2017-02-13 Thread Harsha Chintalapani
STORM-2340 is more of a feature . Auto-commit mode in storm-kafka used rarely and most users run the kafka spout with ackers and get at-least once guarantee. If its going to longer to address the PR reviews I am +1 on moving this out of Storm 1.1.0. We already quite a few patches

Re: [Discuss] Storm hdfs spout improvements

2017-02-13 Thread Sachin Pasalkar
I have created JIRA for this https://issues.apache.org/jira/browse/STORM-2358. For point 1: Its specific use case just to support why it needs to be public For point 2: We are limiting code to be very specific to these 2 implementations we should have generic implementation. I see there is

[GitHub] storm issue #1919: STORM-2340: fix AutoCommitMode issue in KafkaSpout

2017-02-13 Thread HeartSaVioR
Github user HeartSaVioR commented on the issue: https://github.com/apache/storm/pull/1919 @XuMingmin OK agree that it's beyond the PR's scope. We still need to sort out case 1 to 4, but this PR itself looks great to me. +1 --- If your project is set up for it, you can reply

[GitHub] storm pull request #1832: STORM-2250: Kafka Spout Refactoring to Increase Mo...

2017-02-13 Thread hmcl
Github user hmcl commented on a diff in the pull request: https://github.com/apache/storm/pull/1832#discussion_r100937812 --- Diff: storm-core/src/jvm/org/apache/storm/utils/Time.java --- @@ -162,9 +204,13 @@ public static long deltaMs(long timeInMilliseconds) { }

[GitHub] storm pull request #1832: STORM-2250: Kafka Spout Refactoring to Increase Mo...

2017-02-13 Thread hmcl
Github user hmcl commented on a diff in the pull request: https://github.com/apache/storm/pull/1832#discussion_r100937801 --- Diff: storm-core/src/jvm/org/apache/storm/utils/Time.java --- @@ -162,9 +204,13 @@ public static long deltaMs(long timeInMilliseconds) { }

[GitHub] storm pull request #1832: STORM-2250: Kafka Spout Refactoring to Increase Mo...

2017-02-13 Thread hmcl
Github user hmcl commented on a diff in the pull request: https://github.com/apache/storm/pull/1832#discussion_r100938344 --- Diff: storm-core/src/jvm/org/apache/storm/utils/Time.java --- @@ -88,59 +92,97 @@ public static boolean isSimulating() { public static

[GitHub] storm pull request #1832: STORM-2250: Kafka Spout Refactoring to Increase Mo...

2017-02-13 Thread hmcl
Github user hmcl commented on a diff in the pull request: https://github.com/apache/storm/pull/1832#discussion_r100935401 --- Diff: storm-core/src/jvm/org/apache/storm/utils/Time.java --- @@ -42,10 +46,10 @@ public SimulatedTime() { public SimulatedTime(Number ms) {

[GitHub] storm issue #1919: STORM-2340: fix AutoCommitMode issue in KafkaSpout

2017-02-13 Thread XuMingmin
Github user XuMingmin commented on the issue: https://github.com/apache/storm/pull/1919 @HeartSaVioR instinctively I think your code can meet at-most-once. Records that are pulled and not yet emitted are discarded if any crash. Still, I'd want to limit the scope of this

[GitHub] storm issue #1919: STORM-2340: fix AutoCommitMode issue in KafkaSpout

2017-02-13 Thread HeartSaVioR
Github user HeartSaVioR commented on the issue: https://github.com/apache/storm/pull/1919 What I'm referring is the blog doc: https://www.confluent.io/blog/tutorial-getting-started-with-the-new-apache-kafka-0.9-consumer-client/ Referring to the doc, "at most once" can be

[GitHub] storm issue #1919: STORM-2340: fix AutoCommitMode issue in KafkaSpout

2017-02-13 Thread XuMingmin
Github user XuMingmin commented on the issue: https://github.com/apache/storm/pull/1919 Do we really need a `commitSync` call for every `emit`? That would be lots of calls. The consumer reads data in batch. The pseudo-at-most-once, either case1 or case 3, is good option, as a

[GitHub] storm issue #1919: STORM-2340: fix AutoCommitMode issue in KafkaSpout

2017-02-13 Thread HeartSaVioR
Github user HeartSaVioR commented on the issue: https://github.com/apache/storm/pull/1919 @srdo I think it's up to the Spout implementation and we don't need to consider ordering if we don't rely on ack() method. If the spout is aware of the count of ackers (0 or more)

Re: [Discuss] Storm hdfs spout improvements

2017-02-13 Thread Roshan Naik
On 2/13/17, 12:14 PM, "Sachin Pasalkar" wrote: >I have attached updated source code of HDFSSpout for more reference. I >have updated respective classes (not attached) Don¹t see any attachment. Answers are below. Better to do this discussion on a JIRA. On

Re: Sending periodic statistics to Spout from Bolts

2017-02-13 Thread Anis Nasir
Dear Bobby, Thank you for the feedback. I will start looking at the source code now. I would prefer the downstream operators to take care of these parameters locally and only send a message to the upstream operator to *increase load* or *decrease load.* Given this feature, upstream operator

[GitHub] storm issue #1914: STORM-2334 - Join Bolt implementation with unit tests

2017-02-13 Thread roshannaik
Github user roshannaik commented on the issue: https://github.com/apache/storm/pull/1914 @arunmahadevan i have update the PR with - Updated docs - Added sample topo - Added support to qualify keys with stream names for disambiguation - Added more UT Have

Re: Sending periodic statistics to Spout from Bolts

2017-02-13 Thread Bobby Evans
First off if you don't know clojure you are in luck. On the master branch all of the core code except for the UI, shell submission and a few classes needed to support them are in java.  There are still several tests that also need to move over to java but it should not be too big of an issue

[GitHub] storm issue #1919: STORM-2340: fix AutoCommitMode issue in KafkaSpout

2017-02-13 Thread srdo
Github user srdo commented on the issue: https://github.com/apache/storm/pull/1919 @HeartSaVioR That's pretty much my understanding of this as well. I'm not sure what order the ack/emit happens in when ackers=0, but if it's ack followed by emit, then it should be enough to put a

Re: Sending periodic statistics to Spout from Bolts

2017-02-13 Thread Anis Nasir
Dear Bobby, In this case, how can we enable such configuration? I am not very familiar with clojure. However, I would like the downstream operators to report various parameters on-need basis to the upstream operators, like service time, queue length, CPU utilization, memory utilization, idle

[GitHub] storm issue #1919: STORM-2340: fix AutoCommitMode issue in KafkaSpout

2017-02-13 Thread HeartSaVioR
Github user HeartSaVioR commented on the issue: https://github.com/apache/storm/pull/1919 @srdo @XuMingmin If my understanding is right, at-most-once can be guaranteed with this step: 1. pull the data from datasource 2. send ack to the datasource 3. emit the data to

Re: [VOTE] Release Apache Storm 1.0.3 (RC2)

2017-02-13 Thread Kishorkumar Patil
Thanks Taylor, I was little late I guess. :) -Kishor On Monday, February 13, 2017 3:52 PM, P. Taylor Goetz wrote: They’ve been moved over to release now that the vote has passed: https://dist.apache.org/repos/dist/release/storm/apache-storm-1.0.3/

Re: [VOTE] Release Apache Storm 1.0.3 (RC2)

2017-02-13 Thread P. Taylor Goetz
They’ve been moved over to release now that the vote has passed: https://dist.apache.org/repos/dist/release/storm/apache-storm-1.0.3/ -Taylor > On Feb 13, 2017, at 3:41 PM, Kishorkumar Patil

Re: [VOTE] Release Apache Storm 1.0.3 (RC2)

2017-02-13 Thread Kishorkumar Patil
The files are missing from dist. I am getting 404. -Kishor On Monday, February 13, 2017 2:36 PM, P. Taylor Goetz wrote: +1 Performed my typical checks and verified topology visualization fix and local shutdown issue. -Taylor > On Feb 7, 2017, at 3:51 PM, P.

Re: [Discuss] Storm hdfs spout improvements

2017-02-13 Thread Sachin Pasalkar
I have attached updated source code of HDFSSpout for more reference. I have updated respective classes (not attached) On 13/02/17, 10:02 PM, "Sachin Pasalkar" wrote: >Hi, > >I was looking at storm hdfs spout code in 1.x branch, I found below >improvements can be

[GitHub] storm issue #1919: STORM-2340: fix AutoCommitMode issue in KafkaSpout

2017-02-13 Thread XuMingmin
Github user XuMingmin commented on the issue: https://github.com/apache/storm/pull/1919 @srdo agree, that's why I remove the word *at-most-once* in this conversation to make it clear. I haven't followed the latest Storm version for some time, I remember no later than 0.9.*

[GitHub] storm pull request #1928: Create ZippedTextFileReader.java

2017-02-13 Thread pasalkarsachin1
Github user pasalkarsachin1 commented on a diff in the pull request: https://github.com/apache/storm/pull/1928#discussion_r100880730 --- Diff: external/storm-hdfs/src/main/java/org/apache/storm/hdfs/spout/ZippedTextFileReader.java --- @@ -0,0 +1,241 @@ +/** + * Licensed

[RESULT] [VOTE] Release Apache Storm 1.0.3 (RC2)

2017-02-13 Thread P. Taylor Goetz
This vote is now closed and passes with 4 +1 votes and no +0/-1 votes. Vote tally (* indicates a binding vote): +1: Jungtaek Lim* Bobby Evans* Xin Wang* P. Taylor Goetz* I will release the staged artifacts and announce the release after a 24 hour waiting period for mirror catch up. -Taylor

[GitHub] storm issue #1919: STORM-2340: fix AutoCommitMode issue in KafkaSpout

2017-02-13 Thread srdo
Github user srdo commented on the issue: https://github.com/apache/storm/pull/1919 Yes, that's what I mean. It's not really at-most-once. For real at-most-once, the spout should call `KafkaConsumer.commitSync` when it receives an ack, not periodically (and then you'd need to disable

Re: [VOTE] Release Apache Storm 1.0.3 (RC2)

2017-02-13 Thread P. Taylor Goetz
+1 Performed my typical checks and verified topology visualization fix and local shutdown issue. -Taylor > On Feb 7, 2017, at 3:51 PM, P. Taylor Goetz wrote: > > This is a call to vote on releasing Apache Storm 1.0.3 (rc2) > > Full list of changes in this release: > >

[GitHub] storm issue #1919: STORM-2340: fix AutoCommitMode issue in KafkaSpout

2017-02-13 Thread XuMingmin
Github user XuMingmin commented on the issue: https://github.com/apache/storm/pull/1919 @srdo if `kafkaConsumer.close()` is not guaranteed to call, strictly, there's no at-most-once guarantee for either case 3 or case 1 so far. And case 1 equals case 3 after this PR. --- If your

[GitHub] storm issue #1919: STORM-2340: fix AutoCommitMode issue in KafkaSpout

2017-02-13 Thread srdo
Github user srdo commented on the issue: https://github.com/apache/storm/pull/1919 @XuMingmin You shouldn't assume that `KafkaConsumer.close` is called when crashing. If the executor running the spout crashes (or throws an uncaught Exception), `close` is not called as far as I know.

[GitHub] storm issue #1919: STORM-2340: fix AutoCommitMode issue in KafkaSpout

2017-02-13 Thread XuMingmin
Github user XuMingmin commented on the issue: https://github.com/apache/storm/pull/1919 I would prefer to keep case 1 for at-most-once, when user want to avoid manage the offset additional in `KafkaSpout`. IMO, there's no difference for guarantee between case 3 and case 1.

[GitHub] storm issue #1932: [STORM-2194] Stop ignoring socket timeout error from exec...

2017-02-13 Thread kishorvpatil
Github user kishorvpatil commented on the issue: https://github.com/apache/storm/pull/1932 LGTM. +1. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or

[Discuss] Storm hdfs spout improvements

2017-02-13 Thread Sachin Pasalkar
Hi, I was looking at storm hdfs spout code in 1.x branch, I found below improvements can be made in below code. 1. Make org.apache.storm.hdfs.spout.AbstractFileReader as public so that it can be used in generics. 2. org.apache.storm.hdfs.spout.HdfsSpout requires readerType as String. It

[GitHub] storm pull request #1934: STORM-2333: CGroup memory and CPU metrics

2017-02-13 Thread tibkiss
Github user tibkiss commented on a diff in the pull request: https://github.com/apache/storm/pull/1934#discussion_r100821485 --- Diff: storm-core/src/jvm/org/apache/storm/metric/cgroup/CGroupCpu.java --- @@ -0,0 +1,68 @@ +/** + * Licensed to the Apache Software Foundation

Re: Sending periodic statistics to Spout from Bolts

2017-02-13 Thread Bobby Evans
Yes makes perfect since. - Bobby On Friday, February 10, 2017, 4:36:22 PM CST, Anis Nasir wrote:Dear Bobby, Thank you very much for your reply. In real deployments, it is often the case that executors are heterogenous and execution time per tuple is non-uniform (as