[GitHub] storm issue #1690: STORM-2100 Fix Trident SQL join tests to not rely on orde...

2016-09-20 Thread satishd
Github user satishd commented on the issue:

https://github.com/apache/storm/pull/1690
  
Thanks @HeartSaVioR




[GitHub] storm pull request #1690: STORM-2100 Fix Trident SQL join tests to not rely ...

2016-09-20 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/storm/pull/1690




[GitHub] storm pull request #1691: STORM-2090: Add integration test for storm windowi...

2016-09-20 Thread harshach
Github user harshach commented on a diff in the pull request:

https://github.com/apache/storm/pull/1691#discussion_r79754021
  
--- Diff: integration-test/src/main/java/org/apache/storm/st/topology/window/TumblingTimeCorrectness.java ---
@@ -0,0 +1,164 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.storm.st.topology.window;
+
+import com.google.common.base.Function;
+import com.google.common.collect.Collections2;
+import com.google.common.collect.Lists;
+import org.apache.storm.generated.StormTopology;
+import org.apache.storm.spout.SpoutOutputCollector;
+import org.apache.storm.st.topology.TestableTopology;
+import org.apache.storm.st.topology.window.data.TimeData;
+import org.apache.storm.st.utils.TimeUtil;
+import org.apache.storm.task.OutputCollector;
+import org.apache.storm.task.TopologyContext;
+import org.apache.storm.topology.OutputFieldsDeclarer;
+import org.apache.storm.topology.TopologyBuilder;
+import org.apache.storm.topology.base.BaseRichSpout;
+import org.apache.storm.topology.base.BaseWindowedBolt;
+import org.apache.storm.tuple.Fields;
+import org.apache.storm.tuple.Tuple;
+import org.apache.storm.tuple.Values;
+import org.apache.storm.windowing.TupleWindow;
+import org.apache.storm.st.utils.StringDecorator;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import javax.annotation.Nullable;
+import java.util.*;
--- End diff --

can you avoid wildcard imports
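For instance, a sketch of the requested change, assuming the class only needs these java.util types (the exact list would come from the compiler):

    // Replace the wildcard with the specific java.util types the class uses.
    import java.util.ArrayList;
    import java.util.Collection;
    import java.util.List;
    import java.util.Map;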




[GitHub] storm pull request #1691: STORM-2090: Add integration test for storm windowi...

2016-09-20 Thread harshach
Github user harshach commented on a diff in the pull request:

https://github.com/apache/storm/pull/1691#discussion_r79754113
  
--- Diff: integration-test/src/test/resources/storm-conf/storm.yaml ---
@@ -0,0 +1,17 @@
+
+storm.zookeeper.servers:
+- "node1"
+
+nimbus.seeds: ["node1"]
+
+# netty transport
+storm.messaging.transport: "org.apache.storm.messaging.netty.Context"
+storm.messaging.netty.buffer_size: 16384
+storm.messaging.netty.max_retries: 10
+storm.messaging.netty.min_wait_ms: 1000
+storm.messaging.netty.max_wait_ms: 5000
+
+drpc.servers:
+  - "node1"
+
+supervisor.slots.ports: [6700, 6701, 6702, 6703, 6704, 6705, 6706, 6707, 6708, 6709]
--- End diff --

any reason to have these many worker slots?




Re: [DISCUSS] ElasticSearch 2.x support

2016-09-20 Thread Jungtaek Lim
+1 on arranging module names for storm-kafka and storm-kafka-client (with
versions).

Actually there's another trick behind storm-kafka-client:
storm-kafka-client 1.0.x supports Kafka 0.9.x while storm-kafka-client 1.1.0
(SNAPSHOT) supports Kafka 0.10.x, which can cause confusion. There was also a
question on the mailing list about this.

The open question is how we name modules based on the supported version (the
naming convention).




Re: [DISCUSS] ElasticSearch 2.x support

2016-09-20 Thread P. Taylor Goetz
Now might be the time to consider making the version-sensitive modules (Kafka,
ES) Maven multi-modules.

I've never really liked "storm-kafka" and "storm-kafka-client" as
differentiators of Kafka 0.9.x and 0.10.x. I'd tend toward something like:

storm-kafka/
  /0.9.x
  /0.10.x

I would think the same thing would apply to ES.

I'm +1 for supporting both versions, with careful consideration of how and
when to deprecate or EOL support for a version line.

-Taylor



[GitHub] storm issue #1337: STORM-1475: Add storm-elasticsearch2 module

2016-09-20 Thread HeartSaVioR
Github user HeartSaVioR commented on the issue:

https://github.com/apache/storm/pull/1337
  
@dossett 
ES2 doesn't shade its Guava dependency? I guess ES1 does.
I'm not sure we're OK to upgrade the Guava version only for this, but there's
another PR which gets rid of Guava from storm-core, so it might be OK.




Re: [DISCUSS] Provision for dead-letter topic in storm

2016-09-20 Thread Aaron Niskodé-Dossett
I like the idea, especially if it can be implemented as generically as
possible. Ideally we could "dead letter" both the original tuple and the
tuple that itself failed, since intervening transformations could have changed
the original tuple. I realize that adds a lot of complexity to your idea and
may not be feasible.
On Tue, Sep 20, 2016 at 1:15 AM S G  wrote:

> Hi,
>
> I want to gather some thoughts on a suggestion to provide a dead-letter
> functionality common to all spouts/bolts.
>
> Currently, if any spout / bolt reports a failure, the tuple is retried by the
> spout.
> For a single bolt failure in a large DAG, this retry logic can cause
> several perfectly successful components to replay, and yet the Tuple could
> fail at exactly the same bolt on retry.
>
> This is fine usually (if the failure was temporary, say due to a network
> glitch) but sometimes, the message is bad enough such that it should not be
> retried but at the same time important enough that its failure should not
> be ignored.
>
> Example: ElasticSearch-bolt receiving bytes from Kafka-Spout.
>
> Most of the time, it is able to deserialize the bytes correctly, but
> sometimes a badly formatted message fails to deserialize. For such cases,
> neither the Kafka-Spout should retry nor the ES-bolt should report a success. It
> should however be reported to the user somehow that a badly serialized
> message entered the stream.
>
> For cases like temporary network glitch, the Tuple should be retried.
>
> So the proposal is to implement a dead-letter topic as:
>
> 1) Add a new method *failWithoutRetry(Tuple, Exception)* in the collector.
> Bolts will begin using it once it's available for use.
>
> 2) Provide the ability to *configure a dead-letter data-store in the
> spout* for
> failed messages reported by #1 above.
>
>
> The configurable data-store should support Kafka, Solr, and Redis to
> begin with (plus the option to implement one's own by dropping a jar file
> on the classpath).
>
> Such a feature should benefit all the spouts as:
>
> 1) Topologies will not block replaying the same doomed-to-fail tuples.
> 2) Users can set alerts on dead-letters and easily find actual problems
> in their topologies, rather than analyzing all failed tuples only to find
> that they failed because of a temporary network glitch.
> 3) Since the entire Tuple is put into the dead-letter, all the data is
> available for retrying after fixing the topology code.
>
> Please share your thoughts if you think it can benefit storm in a generic
> way.
>
> Thx,
> SG
>
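
A rough sketch of what the proposed failWithoutRetry hook and pluggable
dead-letter store could look like; every name and signature here is
hypothetical, since the above is only a proposal:

    import java.io.Serializable;
    import org.apache.storm.tuple.Tuple;

    // Hypothetical destination for tuples that should be neither retried nor
    // silently acked, e.g. a Kafka topic, Solr collection, or Redis list.
    interface DeadLetterStore extends Serializable {
        void store(Tuple failed, Exception cause);
    }

    // Hypothetical collector extension sketching failWithoutRetry(Tuple, Exception):
    // record the tuple first, then mark it done so the spout never replays it.
    final class DeadLetterCollector {
        private final DeadLetterStore store;

        DeadLetterCollector(DeadLetterStore store) {
            this.store = store;
        }

        void failWithoutRetry(Tuple input, Exception cause) {
            store.store(input, cause); // preserve the full tuple for later replay
            // then ack upstream so the spout treats the tuple as complete
        }
    }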


Re: [DISCUSS] ElasticSearch 2.x support

2016-09-20 Thread Aaron Niskodé-Dossett
Thanks everyone. Could one or more committers +1 the PR that would support
both versions?

https://github.com/apache/storm/pull/1337
On Mon, Sep 19, 2016 at 10:52 AM Satish Duggana wrote:

> Agree with Bobby, +1 for supporting both versions till EOL, and find out
> how many users are really using 1.7.x.
>
> ~Satish.
>
>
> On Mon, Sep 19, 2016 at 7:50 PM, Bobby Evans wrote:
>
> > I am +1 for two modules until EOL. Jan 2017. - Bobby
> >
> > On Saturday, September 17, 2016 4:19 AM, Jungtaek Lim <
> > kabh...@gmail.com> wrote:
> >
> >
> > According to the link, the last version line of ES1 (1.7.x) will be
> > live to Jan 2017. We need to keep both versions at least until that
> > EOL.
> > I wouldn't mind having two modules, and also wouldn't mind having
> > duplicate code, but it would be better if we can extract the common
> > parts to a parent module.
> >
> > Thanks,
> > Jungtaek Lim (HeartSaVioR)
> >
> > On Fri, Sep 16, 2016 at 10:03 PM, Aaron Niskodé-Dossett wrote:
> >
> > > ES1 is close to end of life in terms of commercial support from
> > > Elastic, but not quite there (https://www.elastic.co/support/eol).
> > > Unfortunately the ES1 and ES2 clients won't interoperate with their
> > > opposite versions.
> > >
> > > Given that, I take it you would support having ES1 and ES2 bolts
> > > packaged separately? This would be somewhat like how we have
> > > storm-kafka and storm-kafka-client for different Kafka versions.
> > >
> > > Thanks! -Aaron
> > >
> > > On Thu, Sep 15, 2016 at 9:12 AM Bobby Evans wrote:
> > >
> > > > I personally don't use ES as part of my storm work, so I don't
> > > > necessarily feel qualified to answer this. In general though I
> > > > really do like to see storm come with batteries included. If ES1
> > > > is not end of life, and there is a community of people who want to
> > > > continue using it/supporting it, I would say let's continue to do
> > > > so. If that is not true, or if ES offers a backwards-compatible
> > > > client, that could sway things for me to say let's just go forward
> > > > with ES2. - Bobby
> > > >
> > > > On Wednesday, September 14, 2016 2:47 PM, Aaron Niskodé-Dossett <
> > > > doss...@gmail.com> wrote:
> > > >
> > > >
> > > > Hi all,
> > > >
> > > > I started a discussion about this a while ago, but didn't take it
> > > > to a conclusion (my $realjob, etc., etc.).
> > > >
> > > > There are multiple PRs open to provide an Elasticsearch 2.x bolt
> > > > to the Storm project. There are two different approaches:
> > > >
> > > > 1. Add side-by-side support for 2.x. Example:
> > > > https://github.com/apache/storm/pull/1337 (*FULL DISCLOSURE*: this
> > > > is my own PR). [I also have some functionality enhancements in
> > > > this PR, but that's not relevant to this discussion.]
> > > >
> > > > 2. Upgrade the existing bolt. Example:
> > > > https://github.com/apache/storm/pull/1396
> > > >
> > > > The drawback to approach 1 is that it duplicates a lot of code.
> > > > The drawback to approach 2 is that it drops support for ES 1.x.
> > > >
> > > > ES 2.x has been out for a while, and if we are serious about
> > > > supporting it, we need to have a way to write to ES 2.x.
> > > >
> > > > I believe approach number 1 is ideal (again, it's my own PR),
> > > > possibly with deprecation of the existing 1.x bolt.
> > > >
> > > > I'd love to hear thoughts from others!


[GitHub] storm pull request #1697: STORM-2018: Supervisor V2

2016-09-20 Thread revans2
Github user revans2 commented on a diff in the pull request:

https://github.com/apache/storm/pull/1697#discussion_r79723534
  
--- Diff: pom.xml ---
@@ -1067,8 +1067,8 @@
                 <groupId>org.apache.maven.plugins</groupId>
                 <artifactId>maven-compiler-plugin</artifactId>
                 <configuration>
-                    <source>1.7</source>
-                    <target>1.7</target>
+                    <source>1.8</source>
+                    <target>1.8</target>
--- End diff --

Oops forgot to revert this, will go back to 1.7




[GitHub] storm pull request #1697: STORM-2018: Supervisor V2

2016-09-20 Thread revans2
GitHub user revans2 opened a pull request:

https://github.com/apache/storm/pull/1697

STORM-2018: Supervisor V2

Still need to do some more manual testing but the unit tests passed for me.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/revans2/incubator-storm STORM-2018-1.x

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/storm/pull/1697.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1697


commit 318ab5f300e0820ab862cbeca04b7cb67699b938
Author: Robert (Bobby) Evans 
Date:   2016-09-20T19:59:07Z

STORM-2018: Supervisor V2






[GitHub] storm pull request #1679: STORM-2087: storm-kafka-client - tuples not always...

2016-09-20 Thread srdo
Github user srdo commented on a diff in the pull request:

https://github.com/apache/storm/pull/1679#discussion_r79669257
  
--- Diff: external/storm-kafka-client/src/main/java/org/apache/storm/kafka/spout/KafkaSpout.java ---
@@ -266,26 +266,32 @@ private void doSeekRetriableTopicPartitions() {
             if (offsetAndMeta != null) {
                 kafkaConsumer.seek(rtp, offsetAndMeta.offset() + 1);  // seek to the next offset that is ready to commit in next commit cycle
             } else {
-                kafkaConsumer.seekToEnd(toArrayList(rtp));    // Seek to last committed offset
+                kafkaConsumer.seek(rtp, acked.get(rtp).committedOffset + 1);    // Seek to last committed offset
--- End diff --

Another PR is fine, but I don't think there needs to be a big change. If
people are running with auto-commit mode on, they've already kind of decided
that they don't care about at-least-once delivery (similar to disabling tuple
acking), so I think simply making the spout ignore failed tuples should be
fine. Alternatively, auto-commit mode could be removed from KafkaSpoutConfig
and just be an optimization the spout enables if tuple acking is disabled. I
don't think it makes sense to run with tuple acking on and auto commit on, nor
does it really make sense to disable both tuple acking and auto commit. @hmcl
do you have an opinion on this?
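
A rough sketch of that alternative, deriving auto commit from the topology's
acking configuration instead of a separate KafkaSpoutConfig flag; the helper
and its names are hypothetical, only the two config keys are real:

    import java.util.Map;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerConfig;
    import org.apache.storm.Config;

    // Hypothetical helper: enable Kafka auto commit only when the topology has
    // disabled tuple acking (TOPOLOGY_ACKER_EXECUTORS == 0), so the two
    // settings can never contradict each other.
    final class AutoCommitFromAcking {
        static void configure(Properties consumerProps, Map<String, Object> topoConf) {
            Object ackers = topoConf.get(Config.TOPOLOGY_ACKER_EXECUTORS);
            boolean ackingDisabled = ackers != null && ((Number) ackers).intValue() == 0;
            consumerProps.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG,
                    Boolean.toString(ackingDisabled));
        }
    }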




[GitHub] storm issue #1679: STORM-2087: storm-kafka-client - tuples not always being ...

2016-09-20 Thread jfenc91
Github user jfenc91 commented on the issue:

https://github.com/apache/storm/pull/1679
  
That's fine @srdo. Thanks again for the reviews. I also created STORM-2106
to keep track of the consequences of this change.




[GitHub] storm pull request #1679: STORM-2087: storm-kafka-client - tuples not always...

2016-09-20 Thread jfenc91
Github user jfenc91 commented on a diff in the pull request:

https://github.com/apache/storm/pull/1679#discussion_r79659246
  
--- Diff: external/storm-kafka-client/src/main/java/org/apache/storm/kafka/spout/KafkaSpout.java ---
@@ -266,26 +266,32 @@ private void doSeekRetriableTopicPartitions() {
             if (offsetAndMeta != null) {
                 kafkaConsumer.seek(rtp, offsetAndMeta.offset() + 1);  // seek to the next offset that is ready to commit in next commit cycle
             } else {
-                kafkaConsumer.seekToEnd(toArrayList(rtp));    // Seek to last committed offset
+                kafkaConsumer.seek(rtp, acked.get(rtp).committedOffset + 1);    // Seek to last committed offset
--- End diff --

@srdo Yes, in the case where the failed tuple is not the first message in
the batch since the last committed offset, things worked fine as they were
previously.

I agree. Honestly, the idea of auto-commit mode seems to go against the
philosophy of storm's processing guarantees. Putting the offsets that need to
be retried in memory isn't enough in the case of restarts. I propose we
address auto-commit mode in a separate PR. Sound alright?





[GitHub] storm pull request #1679: STORM-2087: storm-kafka-client - tuples not always...

2016-09-20 Thread jfenc91
Github user jfenc91 commented on a diff in the pull request:

https://github.com/apache/storm/pull/1679#discussion_r79662312
  
--- Diff: external/storm-kafka-client/src/main/java/org/apache/storm/kafka/spout/KafkaSpout.java ---
@@ -479,16 +487,17 @@ public OffsetAndMetadata findNextCommitOffset() {
             KafkaSpoutMessageId nextCommitMsg = null; // this is a convenience variable to make it faster to create OffsetAndMetadata

             for (KafkaSpoutMessageId currAckedMsg : ackedMsgs) {  // complexity is that of a linear scan on a TreeMap
-                if ((currOffset = currAckedMsg.offset()) == initialFetchOffset || currOffset == nextCommitOffset + 1) {    // found the next offset to commit
+                if ((currOffset = currAckedMsg.offset()) == nextCommitOffset + 1) {    // found the next offset to commit
                     found = true;
                     nextCommitMsg = currAckedMsg;
                     nextCommitOffset = currOffset;
                 } else if (currAckedMsg.offset() > nextCommitOffset + 1) {    // offset found is not continuous to the offsets listed to go in the next commit, so stop search
                     LOG.debug("topic-partition [{}] has non-continuous offset [{}]. It will be processed in a subsequent batch.", tp, currOffset);
                     break;
                 } else {
-                    LOG.debug("topic-partition [{}] has unexpected offset [{}].", tp, currOffset);
-                    break;
+                    // Received a redundant ack. Ignore and continue processing.
--- End diff --

I saw this a couple of times before I figured it out. I have not taken the
time to reproduce this in a toy/test case, but given the error message this is
clearly a storm or storm-kafka-client issue. I got to this state in about 30
minutes of running a topology processing 800k-300k tuples a minute with about
10s latency. The input to the topology was on the order of 2k-10k tuples per
minute, with a bolt that separated each input into multiple tuples. At startup
there was a high rate of failures after the separation (I was making requests
against an unwarmed ELB). I would guess that that is enough to reproduce it
with random data/failures.




[GitHub] storm pull request #1679: STORM-2087: storm-kafka-client - tuples not always...

2016-09-20 Thread jfenc91
Github user jfenc91 commented on a diff in the pull request:

https://github.com/apache/storm/pull/1679#discussion_r79657089
  
--- Diff: external/storm-kafka-client/src/main/java/org/apache/storm/kafka/spout/KafkaSpout.java ---
@@ -451,11 +454,11 @@ public int compare(KafkaSpoutMessageId m1, KafkaSpoutMessageId m2) {
         /**
          * This class is not thread safe
          */
-        private class OffsetEntry {
+        class OffsetEntry {
             private final TopicPartition tp;
             private final long initialFetchOffset;  /* First offset to be fetched. It is either set to the beginning, end, or to the first uncommitted offset.
                                                      * Initial value depends on offset strategy. See KafkaSpoutConsumerRebalanceListener */
-            private long committedOffset; // last offset committed to Kafka. Initially it is set to fetchOffset - 1
+            long committedOffset; // last offset committed to Kafka. Initially it is set to fetchOffset - 1
--- End diff --

Unless there is a way I don't know about, using reflection is hard to
follow and difficult to refactor, making it somewhat fragile. I am adding a
protected get method and changing this back to private to hopefully address
your concerns.
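
Something like this, as a sketch of the accessor being described (the getter
name is a guess):

    // Inside OffsetEntry: keep the field private and expose a narrow accessor
    // for tests instead of widening the field's visibility.
    private long committedOffset; // last offset committed to Kafka

    long getCommittedOffset() {   // package-private (or protected), for test code
        return committedOffset;
    }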




[GitHub] storm pull request #1667: STORM-2076: Add new atom to prevent sync-processes...

2016-09-20 Thread srdo
Github user srdo closed the pull request at:

https://github.com/apache/storm/pull/1667




[GitHub] storm issue #1667: STORM-2076: Add new atom to prevent sync-processes from d...

2016-09-20 Thread srdo
Github user srdo commented on the issue:

https://github.com/apache/storm/pull/1667
  
Closing this since SupervisorV2 was merged




[GitHub] storm issue #1679: STORM-2087: storm-kafka-client - tuples not always being ...

2016-09-20 Thread srdo
Github user srdo commented on the issue:

https://github.com/apache/storm/pull/1679
  
@jfenc91 I borrowed a few classes from this PR for another change to the 
spout here https://github.com/apache/storm/pull/1696. I hope you don't mind :)




[GitHub] storm pull request #1696: STORM-2104: More graceful handling of acked/failed...

2016-09-20 Thread srdo
GitHub user srdo opened a pull request:

https://github.com/apache/storm/pull/1696

STORM-2104: More graceful handling of acked/failed tuples after partition reassignment in new Kafka spout

See https://issues.apache.org/jira/browse/STORM-2104

In order to test this change I added a factory for KafkaConsumers. Please 
let me know if there's a nicer way to mock it.
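
A sketch of the kind of factory seam being described, with hypothetical names;
a test can hand back a MockConsumer instead of a real one, and the factory
(unlike KafkaConsumer itself) can be serialized with the spout:

    import java.io.Serializable;
    import java.util.Map;
    import org.apache.kafka.clients.consumer.Consumer;
    import org.apache.kafka.clients.consumer.KafkaConsumer;

    // Hypothetical test seam: the spout asks this factory for its consumer,
    // so tests can substitute a mock implementation.
    interface KafkaConsumerFactory<K, V> extends Serializable {
        Consumer<K, V> createConsumer(Map<String, Object> consumerProps);
    }

    final class DefaultKafkaConsumerFactory<K, V> implements KafkaConsumerFactory<K, V> {
        @Override
        public Consumer<K, V> createConsumer(Map<String, Object> consumerProps) {
            return new KafkaConsumer<>(consumerProps);
        }
    }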

In addition to fixing the described issue, I changed a few types on 
KafkaSpoutConfig. If the user specifies a non-serializable Deserializer in 
either setter in KafkaSpoutConfig.Builder, the topology can't start because 
Nimbus can't serialize KafkaSpoutConfig.

I borrowed a few classes from https://github.com/apache/storm/pull/1679. I 
hope that's okay with you @jfenc91. 


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/srdo/storm STORM-2104

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/storm/pull/1696.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1696


commit d319b0a68ac6537d6e9561da8e90b693c2b9f990
Author: Stig Rohde Døssing 
Date:   2016-09-20T11:53:55Z

STORM-2104: More graceful handling of acked/failed tuples after partition 
reassignment in new Kafka spout






[GitHub] storm pull request #1695: STORM-2101: fixes npe in compute-executors in nimb...

2016-09-20 Thread abellina
GitHub user abellina opened a pull request:

https://github.com/apache/storm/pull/1695

STORM-2101: fixes npe in compute-executors in nimbus



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/abellina/storm STORM-2101_compute_executors_npe_in_nimbus_1x

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/storm/pull/1695.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1695








[GitHub] storm pull request #1694: STORM-2101: fixes npe in compute-executors in nimb...

2016-09-20 Thread abellina
GitHub user abellina opened a pull request:

https://github.com/apache/storm/pull/1694

STORM-2101: fixes npe in compute-executors in nimbus

We observed that compute-executors can sporadically throw an NPE while a
topology is shutting down.

joinMaps (join-maps) will NPE if one of the maps is nil. The only map that
can be nil in this case is component->executors.
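
A minimal Java illustration of the guard; the real join-maps is Clojure code
in nimbus, so this is just a sketch of the idea with illustrative names:

    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    // Illustrative nil-safe join: collect the values for each key across the
    // given maps, skipping a null map (e.g. component->executors during
    // shutdown) instead of dereferencing it and throwing an NPE.
    final class JoinMaps {
        @SafeVarargs
        static <K, V> Map<K, List<V>> joinMaps(Map<K, V>... maps) {
            Map<K, List<V>> joined = new HashMap<>();
            for (Map<K, V> m : maps) {
                if (m == null) {
                    continue;
                }
                for (Map.Entry<K, V> e : m.entrySet()) {
                    joined.computeIfAbsent(e.getKey(), k -> new ArrayList<>()).add(e.getValue());
                }
            }
            return joined;
        }
    }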

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/abellina/storm STORM-2101_compute_executors_npe_in_nimbus

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/storm/pull/1694.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1694


commit b0944100e339f6d28c6c7bb936bef8ebb1948d89
Author: Alessandro Bellina 
Date:   2016-09-19T18:33:37Z

STORM-2101: fixes npe in compute-executors in nimbus






[GitHub] storm pull request #1679: STORM-2087: storm-kafka-client - tuples not always...

2016-09-20 Thread hmcl
Github user hmcl commented on a diff in the pull request:

https://github.com/apache/storm/pull/1679#discussion_r79575563
  
--- Diff: external/storm-kafka-client/src/main/java/org/apache/storm/kafka/spout/KafkaSpout.java ---
@@ -277,7 +277,10 @@ private void emit() {
             waitingToEmit.remove();
         }

-        // emits one tuple per record
+        /**
+         * Emits one tuple per record
+         * @return True if tuple was emitted else false
--- End diff --

This method returns void. What does the @return True comment mean?




[GitHub] storm pull request #1679: STORM-2087: storm-kafka-client - tuples not always...

2016-09-20 Thread hmcl
Github user hmcl commented on a diff in the pull request:

https://github.com/apache/storm/pull/1679#discussion_r79575910
  
--- Diff: external/storm-kafka-client/src/main/java/org/apache/storm/kafka/spout/KafkaSpout.java ---
@@ -451,11 +454,11 @@ public int compare(KafkaSpoutMessageId m1, KafkaSpoutMessageId m2) {
         /**
          * This class is not thread safe
          */
-        private class OffsetEntry {
+        class OffsetEntry {
             private final TopicPartition tp;
             private final long initialFetchOffset;  /* First offset to be fetched. It is either set to the beginning, end, or to the first uncommitted offset.
                                                      * Initial value depends on offset strategy. See KafkaSpoutConsumerRebalanceListener */
-            private long committedOffset; // last offset committed to Kafka. Initially it is set to fetchOffset - 1
+            long committedOffset; // last offset committed to Kafka. Initially it is set to fetchOffset - 1
--- End diff --

It violates encapsulation to make this field non-private.

Is this package-protected for test purposes? If so, wouldn't it be better
to use the testing tool's reflection-based methods to test for internal state,
rather than exposing the state to the outside?




[GitHub] storm pull request #1679: STORM-2087: storm-kafka-client - tuples not always...

2016-09-20 Thread hmcl
Github user hmcl commented on a diff in the pull request:

https://github.com/apache/storm/pull/1679#discussion_r79574175
  
--- Diff: external/storm-kafka-client/src/main/java/org/apache/storm/kafka/spout/KafkaSpout.java ---
@@ -266,26 +266,32 @@ private void doSeekRetriableTopicPartitions() {
             if (offsetAndMeta != null) {
                 kafkaConsumer.seek(rtp, offsetAndMeta.offset() + 1);  // seek to the next offset that is ready to commit in next commit cycle
             } else {
-                kafkaConsumer.seekToEnd(toArrayList(rtp));    // Seek to last committed offset
+                kafkaConsumer.seek(rtp, acked.get(rtp).committedOffset + 1);    // Seek to last committed offset
--- End diff --

@jfenc91 Why is seekToEnd not correct? 




[GitHub] storm pull request #1679: STORM-2087: storm-kafka-client - tuples not always...

2016-09-20 Thread hmcl
Github user hmcl commented on a diff in the pull request:

https://github.com/apache/storm/pull/1679#discussion_r79576337
  
--- Diff: external/storm-kafka-client/src/main/java/org/apache/storm/kafka/spout/KafkaSpout.java ---
@@ -479,16 +482,17 @@ public OffsetAndMetadata findNextCommitOffset() {
             KafkaSpoutMessageId nextCommitMsg = null; // this is a convenience variable to make it faster to create OffsetAndMetadata

             for (KafkaSpoutMessageId currAckedMsg : ackedMsgs) {  // complexity is that of a linear scan on a TreeMap
-                if ((currOffset = currAckedMsg.offset()) == initialFetchOffset || currOffset == nextCommitOffset + 1) {    // found the next offset to commit
+                if ((currOffset = currAckedMsg.offset()) == nextCommitOffset + 1) {    // found the next offset to commit
--- End diff --

With this change, is the code going to work for a topic that does not yet
have any records and/or commits in it?

I recall that testing the condition using initialFetchOffset was necessary.
Why isn't it necessary?




[GitHub] storm pull request #1679: STORM-2087: storm-kafka-client - tuples not always...

2016-09-20 Thread hmcl
Github user hmcl commented on a diff in the pull request:

https://github.com/apache/storm/pull/1679#discussion_r79576762
  
--- Diff: external/storm-kafka-client/src/main/java/org/apache/storm/kafka/spout/KafkaSpout.java ---
@@ -479,16 +487,17 @@ public OffsetAndMetadata findNextCommitOffset() {
             KafkaSpoutMessageId nextCommitMsg = null; // this is a convenience variable to make it faster to create OffsetAndMetadata

             for (KafkaSpoutMessageId currAckedMsg : ackedMsgs) {  // complexity is that of a linear scan on a TreeMap
-                if ((currOffset = currAckedMsg.offset()) == initialFetchOffset || currOffset == nextCommitOffset + 1) {    // found the next offset to commit
+                if ((currOffset = currAckedMsg.offset()) == nextCommitOffset + 1) {    // found the next offset to commit
                     found = true;
                     nextCommitMsg = currAckedMsg;
                     nextCommitOffset = currOffset;
                 } else if (currAckedMsg.offset() > nextCommitOffset + 1) {    // offset found is not continuous to the offsets listed to go in the next commit, so stop search
                     LOG.debug("topic-partition [{}] has non-continuous offset [{}]. It will be processed in a subsequent batch.", tp, currOffset);
                     break;
                 } else {
-                    LOG.debug("topic-partition [{}] has unexpected offset [{}].", tp, currOffset);
-                    break;
+                    // Received a redundant ack. Ignore and continue processing.
--- End diff --

Do you have a test case where we can reproduce this consistently? This code
is already running in a large production environment, and according to the
feedback I received, there are no issues of this nature.

I am also a bit confused about what you mean by multiple acks. I am pretty
sure storm guarantees that a tuple is either acked (once), failed (once), or
times out (which is equivalent to failing).




[GitHub] storm pull request #1605: STORM-2014: Put logic around dropping messages int...

2016-09-20 Thread hmcl
Github user hmcl commented on a diff in the pull request:

https://github.com/apache/storm/pull/1605#discussion_r79571602
  
--- Diff: external/storm-kafka-client/src/main/java/org/apache/storm/kafka/spout/KafkaSpout.java ---
@@ -330,11 +328,9 @@ public void ack(Object messageId) {
     public void fail(Object messageId) {
         final KafkaSpoutMessageId msgId = (KafkaSpoutMessageId) messageId;
         emitted.remove(msgId);
-        if (msgId.numFails() < maxRetries) {
-            msgId.incrementNumFails();
-            retryService.schedule(msgId);
-        } else { // limit to max number of retries
-            LOG.debug("Reached maximum number of retries. Message [{}] being marked as acked.", msgId);
+        msgId.incrementNumFails();
+        if (!retryService.schedule(msgId)) {
+            LOG.debug("Retry service indicated message should not be retried. Message [{}] being marked as acked.", msgId);
--- End diff --

I would leave the message as it was before: "Reached maximum number of
retries. Message [{}] being marked as acked."

Mentioning the retry service here adds overhead and is irrelevant.




[GitHub] storm pull request #1605: STORM-2014: Put logic around dropping messages int...

2016-09-20 Thread hmcl
Github user hmcl commented on a diff in the pull request:

https://github.com/apache/storm/pull/1605#discussion_r79572783
  
--- Diff: external/storm-kafka-client/src/main/java/org/apache/storm/kafka/spout/KafkaSpoutRetryService.java ---
@@ -29,14 +29,18 @@
  */
 public interface KafkaSpoutRetryService extends Serializable {
     /**
-     * Schedules this {@link KafkaSpoutMessageId} if not yet scheduled, or updates retry time if it has already been scheduled.
+     * Schedules this {@link KafkaSpoutMessageId} if not yet scheduled, or
+     * updates retry time if it has already been scheduled. May also indicate
--- End diff --

It may also indicate...




[GitHub] storm pull request #1605: STORM-2014: Put logic around dropping messages int...

2016-09-20 Thread hmcl
Github user hmcl commented on a diff in the pull request:

https://github.com/apache/storm/pull/1605#discussion_r79570505
  
--- Diff: external/storm-kafka-client/src/main/java/org/apache/storm/kafka/spout/KafkaSpoutRetryExponentialBackoff.java ---
@@ -144,7 +145,12 @@ public String toString() {
     /**
      * The time stamp of the next retry is scheduled according to the exponential backoff formula (geometric progression):
      * nextRetry = failCount == 1 ? currentTime + initialDelay : currentTime + delayPeriod^(failCount-1) where failCount = 1, 2, 3, ...
-     * nextRetry = Min(nextRetry, currentTime + maxDelay)
+     * nextRetry = Min(nextRetry, currentTime + maxDelay).
+     *
+     * While retrying a record, no new records are committed until the previous polled records have been acked. This guarantees at once delivery of

This comment should probably not go here, but rather somewhere in the Spout
code (the retry service knows nothing about exactly-once or at-most-once
semantics).
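
For reference, a minimal sketch of the backoff schedule quoted in the diff
above, with illustrative names and millisecond delays:

    // Minimal sketch of the exponential backoff formula from the javadoc.
    final class BackoffSketch {
        static long nextRetryMs(long nowMs, int failCount,
                                long initialDelayMs, long delayPeriodMs, long maxDelayMs) {
            long next = (failCount == 1)
                    ? nowMs + initialDelayMs
                    : nowMs + (long) Math.pow(delayPeriodMs, failCount - 1);
            return Math.min(next, nowMs + maxDelayMs); // cap at maxDelay
        }
    }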

