[GitHub] storm issue #2170: [STORM-2564] We should provide a template for storm-clust...

2017-06-28 Thread liu-zhaokun
Github user liu-zhaokun commented on the issue:

https://github.com/apache/storm/pull/2170
  
@HeartSaVioR 
It says "Note that if this is set to something with a secret (as when using 
digest authentication) then it should only be set in the 
storm-cluster-auth.yaml file." in Config.java,line 1082. So I think we should 
provide  
a  template.




[GitHub] storm issue #2165: STORM-2558: Port storm.sh to Powershell and remove outdat...

2017-06-28 Thread HeartSaVioR
Github user HeartSaVioR commented on the issue:

https://github.com/apache/storm/pull/2165
  
Oops, I left a review comment, so please treat this as revoking my +1.




[GitHub] storm pull request #2165: STORM-2558: Port storm.sh to Powershell and remove...

2017-06-28 Thread HeartSaVioR
Github user HeartSaVioR commented on a diff in the pull request:

https://github.com/apache/storm/pull/2165#discussion_r124716042
  
--- Diff: bin/storm.py ---
@@ -94,7 +94,7 @@ def init_storm_env():
 CONFFILE = ""
 JAR_JVM_OPTS = shlex.split(os.getenv('STORM_JAR_JVM_OPTS', ''))
 JAVA_HOME = os.getenv('JAVA_HOME', None)
-JAVA_CMD = 'java' if not JAVA_HOME else os.path.join(JAVA_HOME, 'bin', 'java')
+JAVA_CMD = 'java' if not JAVA_HOME else os.path.join(JAVA_HOME, 'bin', 'java' if not is_windows() else "java.exe")
--- End diff --

This one-liner becomes a bit complicated now. Could we expand this to 
multiple lines? 
And shouldn't the difference between non-Windows and Windows be 'java' vs 
'java.exe'? I mean `java vs java.exe` or `/bin/java vs 
\bin\java.exe` might be correct, whereas this line doesn't do that. 
Please let me know if I'm missing something here.




[GitHub] storm issue #2165: STORM-2558: Port storm.sh to Powershell and remove outdat...

2017-06-28 Thread HeartSaVioR
Github user HeartSaVioR commented on the issue:

https://github.com/apache/storm/pull/2165
  
OK, then I'm also +1, given that the patch is confirmed to be manually 
tested.




[GitHub] storm pull request #2178: STORM-2599: Fix BasicContainer wildcard classpath ...

2017-06-28 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/storm/pull/2178




[GitHub] storm issue #2178: STORM-2599: Fix BasicContainer wildcard classpath on Wind...

2017-06-28 Thread HeartSaVioR
Github user HeartSaVioR commented on the issue:

https://github.com/apache/storm/pull/2178
  
Just confirmed that this issue also affects the 1.x branch.




[GitHub] storm pull request #2166: [STORM-2559] There are three configurations in def...

2017-06-28 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/storm/pull/2166




[GitHub] storm issue #2142: MINOR: Fix pacemaker_state_factory.clj not compile proble...

2017-06-28 Thread HeartSaVioR
Github user HeartSaVioR commented on the issue:

https://github.com/apache/storm/pull/2142
  
@chengxinglin Kind reminder.
@revans2 If @chengxinglin doesn't respond and you think this is a 
blocker, could you craft a pull request after waiting a bit?




[GitHub] storm issue #2149: STORM-2503: Fix lgtm.com alerts on equality and compariso...

2017-06-28 Thread HeartSaVioR
Github user HeartSaVioR commented on the issue:

https://github.com/apache/storm/pull/2149
  
@adityasharad 
I guess @revans2 put up a pull request to your repo. Could you please 
merge it and update this PR? Thanks in advance!




[GitHub] storm issue #2179: STORM-2601: add the timeout parameter to the method of ge...

2017-06-28 Thread HeartSaVioR
Github user HeartSaVioR commented on the issue:

https://github.com/apache/storm/pull/2179
  
+1
@MichealShin Could you change the commit title to include STORM-2601 so 
that we can track it easily? Thanks in advance!




[GitHub] storm issue #2178: STORM-2599: Fix BasicContainer wildcard classpath on Wind...

2017-06-28 Thread HeartSaVioR
Github user HeartSaVioR commented on the issue:

https://github.com/apache/storm/pull/2178
  
@srdo 
+1
I guess the 1.x branch is also affected, right?




DRPC problem

2017-06-28 Thread sam mohel
I submitted my topology in local mode without any problem, but in production
mode it didn't work: as you can see in the UI, all columns show zero values
except the execute columns.
After some time I got a DRPCExecutionException (msg: request timed out) in
the terminal.
My configurations are:
Machine A and Machine B
storm.yaml on Machine A is:
storm.zookeeper.servers:
  - "192.168.x.x"

nimbus.host: "192.168.x.x"

supervisor.childopts: "-Xmx4g"
worker.childopts: "-Xmx4g"

storm.yaml on Machine B is:
storm.zookeeper.servers:
  - "192.168.x.x"

nimbus.host: "192.168.x.x"

supervisor.childopts: "-Xmx4g"
worker.childopts: "-Xmx4g"

I set DRPC in the code:

// distributed mode
Config conf = createTopologyConfiguration(prop, true);
List<String> drpcServers = new ArrayList<>();
drpcServers.add("192.168.x.x");
conf.put(Config.DRPC_SERVERS, drpcServers);
conf.put(Config.DRPC_PORT, 3772);
LocalDRPC drpc = null;
StormSubmitter.submitTopology(args[0], conf, buildTopology(drpc));
client = new DRPCClient("192.168.x.x", 3772);

I used the same IP address for storm.zookeeper.servers, nimbus.host,
drpcServers, and DRPCClient. Is that wrong?
I ran nimbus, drpc, and ui on Machine A, and the supervisor on Machine B.
I appreciate any help. Thanks


[GitHub] storm issue #2170: [STORM-2564] We should provide a template for storm-clust...

2017-06-28 Thread HeartSaVioR
Github user HeartSaVioR commented on the issue:

https://github.com/apache/storm/pull/2170
  
If this is needed, why not add a storm.yaml.example and comment it out? 
I don't think users want to copy a line from a template file and paste it into 
their storm.yaml. 




[GitHub] storm issue #2173: STORM-2597: Don't parse passed in class paths

2017-06-28 Thread HeartSaVioR
Github user HeartSaVioR commented on the issue:

https://github.com/apache/storm/pull/2173
  
@revans2 @harshach 
I've just played with get_wildcard_dir() and found another issue. Please 
see below:

```
>>> import os
>>> def get_wildcard_dir(path):
...     if os.path.isdir(path):
...         ret = [(os.path.join(path, "*"))]
...     elif os.path.exists(path):
...         ret = [path]
...     return ret
...
>>> get_wildcard_dir("~/WorkArea")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 6, in get_wildcard_dir
UnboundLocalError: local variable 'ret' referenced before assignment
>>> get_wildcard_dir("/Users/jlim/WorkArea")
['/Users/jlim/WorkArea/*']
>>> get_wildcard_dir("/Users/jlim/WorkArea/*")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 6, in get_wildcard_dir
UnboundLocalError: local variable 'ret' referenced before assignment
```

So `get_wildcard_dir()` just fails when a part of the classpath is neither a 
dir nor an existing file. 
IIRC we don't document the format of `STORM_EXT_CLASSPATH` and 
`STORM_EXT_CLASSPATH_DAEMON`, so what I'd expect is the format of a Java class 
path pattern.

I'm +1 for the change. This is a bug, hence it actually needs to be ported back 
to the 1.x version line, but it might break backward compatibility, so I don't 
have a strong opinion about the back-port.

Maybe we need to fix `get_wildcard_dir()` as well. Which value do we want to 
return when the path is neither a dir nor an existing file? The same return 
value as for an existing file? `[]`, to effectively drop the input from the 
classpath?
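
As a minimal sketch, assuming we pick the `[]` option above (a path that is neither a directory nor an existing file contributes nothing to the classpath), the fix could look like:

```
import os

def get_wildcard_dir(path):
    # Directories contribute "<dir>/*", existing files contribute
    # themselves, and anything else (a glob, an unexpanded "~", or a
    # missing path) is dropped instead of raising UnboundLocalError.
    if os.path.isdir(path):
        return [os.path.join(path, "*")]
    if os.path.exists(path):
        return [path]
    return []
```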




[GitHub] storm issue #2180: [STORM-2602] storm.zookeeper.topology.auth.payload doesn'...

2017-06-28 Thread liu-zhaokun
Github user liu-zhaokun commented on the issue:

https://github.com/apache/storm/pull/2180
  
@HeartSaVioR 
I want to use my own payload by setting the configuration named 
"STORM_ZOOKEEPER_TOPOLOGY_AUTH_PAYLOAD", but it doesn't work. The payload of any 
topology is always a UUID generated by the method named 
generateZookeeperDigestSecretPayload(). It isn't difficult to conclude that the 
logic of "prepareZookeeperAuthentication" is defective.




[GitHub] storm issue #2180: [STORM-2602] storm.zookeeper.topology.auth.payload doesn'...

2017-06-28 Thread HeartSaVioR
Github user HeartSaVioR commented on the issue:

https://github.com/apache/storm/pull/2180
  
Sorry, I don't know the details of this part and I'm not sure this is a bug. Have 
you faced a specific issue regarding this bug?
cc. @revans2 I guess you might know the details, though the code block 
is 3 years old.




[GitHub] storm issue #2166: [STORM-2559] There are three configurations in defaults.y...

2017-06-28 Thread liu-zhaokun
Github user liu-zhaokun commented on the issue:

https://github.com/apache/storm/pull/2166
  
@HeartSaVioR 
Please help me merge it.




[GitHub] storm issue #2166: [STORM-2559] There are three configurations in defaults.y...

2017-06-28 Thread HeartSaVioR
Github user HeartSaVioR commented on the issue:

https://github.com/apache/storm/pull/2166
  
Sorry I already reviewed this but forgot to comment. +1




[GitHub] storm issue #2172: STORM-2369 [storm-redis] Use binary type for State manage...

2017-06-28 Thread HeartSaVioR
Github user HeartSaVioR commented on the issue:

https://github.com/apache/storm/pull/2172
  
Filed an issue for versioning state: 
https://issues.apache.org/jira/browse/STORM-2605




difference between local and production mode

2017-06-28 Thread sam mohel
I'm confused by something I'm facing now. I submitted my topology in local
mode and it worked well, but in production it didn't, because of the garbage
collector, which means it needs more RAM!! How can the topology work well
locally on one machine, yet need more RAM in production when the work is
distributed among two machines?

Am I misunderstanding something? Thanks for any help


[GitHub] storm issue #1674: STORM-2083: Blacklist scheduler

2017-06-28 Thread HeartSaVioR
Github user HeartSaVioR commented on the issue:

https://github.com/apache/storm/pull/1674
  
@nilday Sure, please take your time. Thanks for keeping up your interest in 
contributing. :)




[GitHub] storm issue #1674: STORM-2083: Blacklist scheduler

2017-06-28 Thread nilday
Github user nilday commented on the issue:

https://github.com/apache/storm/pull/1674
  
@HeartSaVioR I'll rebase it as soon as I can. As storm-core has been 
restructured, it may take a while.




[GitHub] storm issue #1674: STORM-2083: Blacklist scheduler

2017-06-28 Thread HeartSaVioR
Github user HeartSaVioR commented on the issue:

https://github.com/apache/storm/pull/1674
  
This feature looks like a good thing to adopt. If @nilday is still willing to 
contribute this to the Storm codebase, I'd like to spend some time reviewing it 
as well.
@nilday Could you rebase once more, if you really don't mind? Please note 
that we split storm-core into several modules, and all of your changes need to be 
placed in the `storm-server` module.




Re: Lag issues using Storm 1.1.1 latest build with StormKafkaClient 1.1.1 vs old StormKafka spouts

2017-06-28 Thread Jungtaek Lim
Hi Alexandre,

I don't know much about storm-kafka-client, but at a glance, I can't find any
misuse of HashMap in KafkaSpout, so I'd rather suspect that the OffsetManager
is really huge. If you are willing to dig more into the KafkaSpout OOME issue,
you can get more information from KafkaSpout for tracking by changing the log
level to DEBUG or even TRACE.

Thanks,
Jungtaek Lim (HeartSaVioR)

On Thu, Jun 29, 2017 at 4:58 AM, Stig Døssing wrote:

> Hi Alexandre,
>
> About issue 1:
> This issue is not by design. It is a side effect of the spout internally
> using the KafkaConsumer's subscribe API instead of the assign API. Support
> for using the assign API was added a while back, but has a bug that is
> preventing the spout from starting when configured to use that API. We are
> working on fixing the issues with that implementation in these PRs
> https://github.com/apache/storm/pull/2150 and
> https://github.com/apache/storm/pull/2174. I think it is very likely that we
> will remove support for the subscribe API at some point as well, making the
> assign API the default,
> since several users have had issues with the subscribe API's behavior.
>
> Once the assign API support is fixed, you can switch to using it via this
> KafkaSpoutConfig Builder constructor
> https://github.com/apache/storm/blob/master/external/storm-kafka-client/src/main/java/org/apache/storm/kafka/spout/KafkaSpoutConfig.java#L136
> and this Subscription implementation
> https://github.com/srdo/storm/blob/f524868baa5929f29b69258f0da2948d0f6e152b/external/storm-kafka-client/src/main/java/org/apache/storm/kafka/spout/ManualPartitionSubscription.java
>
> If you'd like to try out the code from the PR branch, you can check it out
> with some of the steps described here
>
> https://help.github.com/articles/checking-out-pull-requests-locally/#modifying-an-inactive-pull-request-locally
> .
>
> Note that the PRs for fixing the manual partitioning option are against the
> master branch right now, which is targeting Storm version 2.0.0, but I
> believe you may be able to use the 2.0.0 storm-kafka-client jar with a
> 1.1.0 cluster.
>
> Switching to the assign API should remove the instability as the
> spouts start.
>
> About issue 2:
> The spout lag shouldn't have an effect on memory use. I'm wondering if your
> spout instances are not progressing at all, which might explain the lag?
> You should be able to check this using the kafka-consumer-groups.sh script
> in the Kafka /bin directory. Once you've started the spout, you can use the
> script to inspect which offsets the consumer group have committed. Try
> checking if the offsets are moving once the spouts have started running.
>
> I can't spot any suspicious use of HashMap in the 1.x KafkaSpout. Your
> attachment didn't make it through, could you post it somewhere?
>
> About issue 3:
> The CommitException you are getting is likely because we use the subscribe
> API. When using the subscribe API Kafka is in control of partition
> assignment, and will reassign partitions if a consumer doesn't call poll on
> the consumer often enough. The default is that the consumer must be polled
> at least every 5 minutes. See max.poll.interval.ms in the Kafka consumer
> configuration. The assign API doesn't require this, and won't shut down
> spouts because they are too slow.
>
> There's likely still another underlying issue, because it seems strange
> that your spout instances are not polling at least once per 5 minutes, at
> least if you didn't set a high max.poll.records. It's almost certainly
> related to issue 2. Do you have acking enabled, and what is your
> topology.message.timeout.secs?
>
> I'm not sure I understand why you would want to write your own spout from
> scratch, rather than contributing fixes to the storm-kafka-client spout. It
> seems likely to be more effort than fixing the problems with
> storm-kafka-client.
>
> 2017-06-28 14:25 GMT+02:00 Alexandre Vermeerbergen <
> avermeerber...@gmail.com
> >:
>
> > More about this thread: we noticed that with StormKafkaClient 1.1.x
> > latest, we get OutOfMemoryError after ~2hours of running our simple test
> > topology.
> >
> > We reproduce it everytime, so we decided to generate a heap dump before
> > the OutOfMemoryError, and viewed the result using EclipseMAT.
> >
> > The results tends to show that there's a memory leak in KafkaSpoutClient:
> > =
> >
> > One instance of *"org.apache.storm.kafka.spout.KafkaSpout"* loaded by
> *"sun.misc.Launcher$AppClassLoader
> > @ 0x80023d98"* occupies *1,662,509,664 (93.87%)* bytes. The memory is
> > accumulated in one instance of *"java.util.HashMap$Node[]"* loaded by
> *" > class loader>"*.
> >
> > *Keywords*
> > sun.misc.Launcher$AppClassLoader @ 0x80023d98
> > java.util.HashMap$Node[]
> > org.apache.storm.kafka.spout.KafkaSpout
> >
> > 

Re: [DISCUSS] Storm 2.0 Roadmap

2017-06-28 Thread Alexandre Vermeerbergen
Hi Hugo,

Thanks for your concerns about our troubles with the new storm-kafka-client.

Our "bench" is based on our live production data of our cloud supervision
system, collecting at least 1 million metrics/min in our Kafka broker
cluster (currently based on Kafka 0.10.1.0, with "compatibility flag
active").

More details are available in the thread in the same dev list entitled "Lag
issues using Storm 1.1.1 latest build with StormKafkaClient 1.1.1 vs old
StormKafka spouts".

The latest post on this thread from Stig Døssing is giving me back some
hope to see some progress in understanding our issues.

My point about writing our own Spout comes from our past experience: we've
been using Kafka for a very long time in our supervision application. Way
before we decided to use Storm, we had our own Java daemons consuming the
same topics as today, doing some evaluation and writing them into an
in-memory store for later consumption by our web services - see this as
a "poor man's streaming system" ;-)  In this legacy code, the part in charge
of consuming data from Kafka wasn't the most complex part we had: a small
pool of threads using the old Kafka consumer API... so maybe I'm wrong, but
for *this* purpose I do not consider writing a Spout that consumes a few
topics to be a big effort. But of course, if we do that, then we'll miss
the fancy integration in StormUI, Flux, and the ability to subscribe to
multiple topics based on a wildcard expression.

So we're going to carefully dig into Stig's answer, and probably provide
more details on our bench before jumping into our home-brewed Kafka
spout... then I'll have to make a decision based on how much progress we
have vs. the time remaining before our next delivery gate.

Hope this clarifies my position with regard to storm-kafka-client.

Best regards,
Alexandre






2017-06-28 22:49 GMT+02:00 Hugo Da Cruz Louro :

> Hi Alexandre,
>
> In my benchmarks the storm-kafka-client spout improves throughput by 70%
> and latency by 40% vs the storm-kafka implementation. I am surprised by
> your findings substantiating the opposite. Can you share your benchmark
> where you compare the performances of both implementations?
>
> As for writing your own version of the spout: why not contribute to
> this one instead? Do you think the implementation is that poor? If so, why
> do you think it is that poor? Do you expect your first version to be much
> better than a version that is already in production in several customers,
> and seems to be working fairly well?
>
> All the bugs found so far have been addressed. Since it’s a new feature
> there may be a few bugs - it is expected. However, I don’t think that it is
> as bad as you make it sound as there are several people using it in
> production for extended periods of time.
>
> Cheers
>
> > On Jun 28, 2017, at 12:04 PM, Alexandre Vermeerbergen <
> avermeerber...@gmail.com> wrote:
> >
> > Hello,
> >
> > If that matters, our current experience with StormKafkaClient is
> > disappointing (see my recent posts "Lag issues using Storm 1.1.1 latest
> > build with StormKafkaClient 1.1.1 vs old StormKafka spouts" in this
> > mailing list).
> >
> > Our current experience is that the old StormKafka spout always beats the
> > new one in terms of performance & stability.
> >
> > Therefore, I am surprised when I see talks about deprecation of the old
> > StormKafka spout when the new one, which just became "Generally Available"
> > with Storm 1.1.0, is not stable, and it's no better when we try it from
> > current 1.1.x builds to take into account recently closed JIRAs.
> >
> > We're even considering writing our own Kafka spout with the Kafka 0.10.x
> > API to overcome the incompatibility of the old StormKafka spout with
> > Kafka 0.10 libraries.
> >
> > Thus, for people who are comfortable with the old Kafka spout, I'd like to
> > give a -1 (non-binding) to the proposal of withdrawal of the old
> > StormKafka spout until the new one converges.
> >
> > Best regards,
> > Alexandre Vermeerbergen
> >
> >
> > 2017-06-28 19:40 GMT+02:00 P. Taylor Goetz :
> >
> >>
> >>> On Jun 28, 2017, at 1:16 PM, Hugo Da Cruz Louro <
> hlo...@hortonworks.com>
> >> wrote:
> >>>
> >>> I still need to go over the entire discussion thread in more detail,
> but
> >> one thing I would like to bring up right away is the proposal to
> DEPRECATE,
> >> and eventually remove, the KafkaSpout with the old Kafka Consumer APIs.
> The
> >> storm-kafka-client KafkaSpout is getting stabilized, and I think we are
> all
> >> in agreement that the storm-kafka KafkaSpout has presented continuous
> >> maintainability problems with some fixes that got in not being backwards
> >> compatible.
> >>
> >> I’m fine with deprecating the old KafkaSpout, but I feel the decision to
> >> actually remove it needs to take into account the user community. The
> main
> >> sticking point here is compatibility with earlier versions of Kafka.
> Like
> >> with JDK versions, 

Re: [DISCUSS] Storm 2.0 Roadmap

2017-06-28 Thread Hugo Da Cruz Louro
Hi Alexandre,

In my benchmarks the storm-kafka-client spout improves throughput by 70% and 
latency by 40% vs the storm-kafka implementation. I am surprised by your 
findings substantiating the opposite. Can you share your benchmark where you 
compare the performances of both implementations?

As for writing your own version of the spout: why not contribute to this 
one instead? Do you think the implementation is that poor? If so, why do you 
think it is that poor? Do you expect your first version to be much better than 
a version that is already in production in several customers, and seems to be 
working fairly well?

All the bugs found so far have been addressed. Since it’s a new feature there 
may be a few bugs - it is expected. However, I don’t think that it is as bad as 
you make it sound as there are several people using it in production for 
extended periods of time.

Cheers

> On Jun 28, 2017, at 12:04 PM, Alexandre Vermeerbergen 
>  wrote:
> 
> Hello,
> 
> If that matters, our current experience with StormKafkaClient is
> disappointing (see my recent posts "Lag issues using Storm 1.1.1 latest
> build with StormKafkaClient 1.1.1 vs old StormKafka spouts" in this mailing
> list).
> 
> Our current experience is that the old StormKafka spout always beats the
> new one in terms of performance & stability.
> 
> Therefore, I am surprised when I see talks about deprecation of the old
> StormKafka spout when the new one, which just became "Generally Available"
> with Storm 1.1.0, is not stable, and it's no better when we try it from
> current 1.1.x builds to take into account recently closed JIRAs.
> 
> We're even considering writing our own Kafka spout with the Kafka 0.10.x API
> to overcome the incompatibility of the old StormKafka spout with Kafka 0.10
> libraries.
> 
> Thus, for people who are comfortable with the old Kafka spout, I'd like to
> give a -1 (non-binding) to the proposal of withdrawal of the old StormKafka
> spout until the new one converges.
> 
> Best regards,
> Alexandre Vermeerbergen
> 
> 
> 2017-06-28 19:40 GMT+02:00 P. Taylor Goetz :
> 
>> 
>>> On Jun 28, 2017, at 1:16 PM, Hugo Da Cruz Louro 
>> wrote:
>>> 
>>> I still need to go over the entire discussion thread in more detail, but
>> one thing I would like to bring up right away is the proposal to DEPRECATE,
>> and eventually remove, the KafkaSpout with the old Kafka Consumer APIs. The
>> storm-kafka-client KafkaSpout is getting stabilized, and I think we are all
>> in agreement that the storm-kafka KafkaSpout has presented continuous
>> maintainability problems with some fixes that got in not being backwards
>> compatible.
>> 
>> I’m fine with deprecating the old KafkaSpout, but I feel the decision to
>> actually remove it needs to take into account the user community. The main
>> sticking point here is compatibility with earlier versions of Kafka. Like
>> with JDK versions, there are many valid reasons why users may not be in a
>> position to upgrade to a newer version of Kafka. Outright removal could
>> leave some users in the lurch.
>> 
>> Ideally, we could just poll the user community to get an idea of how much
>> of the user base depends on the old KafkaSpout and use the results to guide
>> our decision. Unfortunately, at least in my past experience, polling the
>> user@ list doesn’t elicit much of a response and the results don’t
>> provide an accurate view of the user community.
>> 
>> 
>>> 
>>> I am pretty confident how things are looking at this point for the
>> KafkaSpout. The Trident Kafka Spout is likely in between alpha and beta,
>> and that should be taken into account. I just recently submitted a PR
>> <https://github.com/apache/storm/pull/2174> with some improvements to the
>> Trident Kafka Spout (including the refactoring done to support manual
>> partition assignment), and there are some customers using it in
>> pre-production. However, it definitely would benefit from some more testing.
>>> 
>>> Thanks,
>>> Hugo
>> 
>> -Taylor
>> 
>> 
>>> 
>>> On Jun 28, 2017, at 7:48 AM, Bobby Evans <ev...@yahoo-inc.com.INVALID> wrote:
>>> 
>>> +1.
>>> If the 1.1 and 1.2 lines start to become difficult to maintain we can
>> look at putting them in maintenance mode too once we have a 2.x release.
>>> I am a little nervous about merging a new feature into 1.x branch
>> without first going to master, but I hope that it will not be too much work
>> to port it to master, and I trust the devs on that branch to do the right
>> thing.
>>> On a related note we have not done much with feature branches before so
>> I am not sure what we want to do about merging in the new metrics API
>> branch to 1.x.  I know for me I have not had time to keep up with the
>> development work going on there.  I would at least like to have a pull
>> request put up for review before we merge it in.  This would fit with our
>> current bylaws that do not 

Drpc configuration

2017-06-28 Thread sam mohel
I need to know the configuration for DRPC, as I've read more about it and
every time I read different settings.
Now I have 2 machines, Machine A and B. On Machine A I will run nimbus,
drpc, and ui, but on B I will run the supervisor.
In the code I set drpc.servers to the IP address of A, and the DRPC client
has the same IP too. nimbus.host has the same IP, and zookeeper.servers too.
In /etc/hosts I defined A's IP address with its name, and B's IP address
with its name, and this was copied to /etc/hosts on B too. I didn't receive
results after this configuration. Is something wrong here?


Re: [DISCUSS] Storm 2.0 Roadmap

2017-06-28 Thread P. Taylor Goetz

> On Jun 28, 2017, at 4:01 PM, Jungtaek Lim  wrote:
> 
> If my memory is right, when releasing 1.1.0 we postponed resolving some
> critical issues for storm-kafka-client, and it seems like they still haven't
> been sorted out. I even think these issues can effectively be blockers for
> 1.1.1 if we want to treat this module as stable.
> 
> So in fact, and ideally, it should've been marked as experimental or beta
> until it has no more critical issues, and I feel it's still in the beta phase.


+1

IIRC, I think we had one issue opened during the voting phase when we released 
1.1.0, and it has since snowballed somewhat.

For a 1.1.1 release I think we need to make sure it is solid, and if not, 
retroactively give it a beta/experimental/etc. label in the release 
announcement.

-Taylor

Re: [DISCUSS] Storm 2.0 Roadmap

2017-06-28 Thread Jungtaek Lim
-1 to deprecate the old storm-kafka.

I don't feel storm-kafka-client is stable, given that it has some critical
issues, and the module doesn't have enough volunteer committers to be
stabilized as fast as it should be.
If my memory is right, when releasing 1.1.0 we postponed resolving some
critical issues for storm-kafka-client, and it seems like they still haven't
been sorted out. I even think these issues can effectively be blockers for
1.1.1 if we want to treat this module as stable.

So in fact, and ideally, it should've been marked as experimental or beta
until it has no more critical issues, and I feel it's still in the beta phase.

- Jungtaek Lim (HeartSaVioR)
On Thu, 29 Jun 2017 at 4:04 AM Alexandre Vermeerbergen <
avermeerber...@gmail.com> wrote:

> Hello,
>
> If that matters, our current experience with StormKafkaClient is
> disappointing (see my recent posts "Lag issues using Storm 1.1.1 latest
> build with StormKafkaClient 1.1.1 vs old StormKafka spouts" in this mailing
> list).
>
> Our current experience is that the old StormKafka spout always beats the
> new one in terms of performance & stability.
>
> Therefore, I am surprised when I see talks about deprecation of the old
> StormKafka spout when the new one, which just became "Generally Available"
> with Storm 1.1.0, is not stable, and it's no better when we try it from
> current 1.1.x builds to take into account recently closed JIRAs.
>
> We're even considering writing our own Kafka spout with the Kafka 0.10.x API
> to overcome the incompatibility of the old StormKafka spout with Kafka 0.10
> libraries.
>
> Thus, for people who are comfortable with the old Kafka spout, I'd like to
> give a -1 (non-binding) to the proposal of withdrawal of the old StormKafka
> spout until the new one converges.
>
> Best regards,
> Alexandre Vermeerbergen
>
>
> 2017-06-28 19:40 GMT+02:00 P. Taylor Goetz :
>
> >
> > > On Jun 28, 2017, at 1:16 PM, Hugo Da Cruz Louro <
> hlo...@hortonworks.com>
> > wrote:
> > >
> > > I still need to go over the entire discussion thread in more detail,
> but
> > one thing I would like to bring up right away is the proposal to
> DEPRECATE,
> > and eventually remove, the KafkaSpout with the old Kafka Consumer APIs.
> The
> > storm-kafka-client KafkaSpout is getting stabilized, and I think we are
> all
> > in agreement that the storm-kafka KafkaSpout has presented continuous
> > maintainability problems with some fixes that got in not being backwards
> > compatible.
> >
> > I’m fine with deprecating the old KafkaSpout, but I feel the decision to
> > actually remove it needs to take into account the user community. The
> main
> > sticking point here is compatibility with earlier versions of Kafka. Like
> > with JDK versions, there are many valid reasons why users may not be in
> a
> > position to upgrade to a newer version of Kafka. Outright removal could
> > leave some users in the lurch.
> >
> > Ideally, we could just poll the user community to get an idea of how much
> > of the user base depends on the old KafkaSpout and use the results to
> guide
> > our decision. Unfortunately, at least in my past experience, polling the
> > user@ list doesn’t elicit much of a response and the results don’t
> > provide an accurate view of the user community.
> >
> >
> > >
> > > I am pretty confident how things are looking at this point for the
> > KafkaSpout. The Trident Kafka Spout is likely in between alpha and beta,
> > and that should be taken into account. I just recently submitted a PR
> > <https://github.com/apache/storm/pull/2174> with some improvements to the
> > Trident Kafka Spout (including the refactoring done to support manual
> > partition assignment), and there are some customers using it in
> > pre-production. However, it definitely would benefit from some more
> testing.
> > >
> > > Thanks,
> > > Hugo
> >
> > -Taylor
> >
> >
> > >
> > > On Jun 28, 2017, at 7:48 AM, Bobby Evans <ev...@yahoo-inc.com.INVALID> wrote:
> > >
> > > +1.
> > > If the 1.1 and 1.2 lines start to become difficult to maintain we can
> > look at putting them in maintenance mode too once we have a 2.x release.
> > > I am a little nervous about merging a new feature into 1.x branch
> > without first going to master, but I hope that it will not be too much
> work
> > to port it to master, and I trust the devs on that branch to do the right
> > thing.
> > > On a related note we have not done much with feature branches before so
> > I am not sure what we want to do about merging in the new metrics API
> > branch to 1.x.  I know for me I have not had time to keep up with the
> > development work going on there.  I would at least like to have a pull
> > request put up for review before we merge it in.  This would fit with our
> > current bylaws that do not mention feature branches.  If all of the
> changes
> > have already followed the review process then technically I think it is
> OK
> > to just merge it in, but I still would 

Re: Lag issues using Storm 1.1.1 latest build with StormKafkaClient 1.1.1 vs old StormKafka spouts

2017-06-28 Thread Stig Døssing
Hi Alexandre,

About issue 1:
This issue is not by design. It is a side effect of the spout internally
using the KafkaConsumer's subscribe API instead of the assign API. Support
for using the assign API was added a while back, but has a bug that is
preventing the spout from starting when configured to use that API. We are
working on fixing the issues with that implementation in these PRs
https://github.com/apache/storm/pull/2150 and
https://github.com/apache/storm/pull/2174. I think it is very likely that we will remove support for
the subscribe API at some point as well, making the assign API the default,
since several users have had issues with the subscribe API's behavior.

Once the assign API support is fixed, you can switch to using it via this
KafkaSpoutConfig Builder constructor
https://github.com/apache/storm/blob/master/external/storm-kafka-client/src/main/java/org/apache/storm/kafka/spout/KafkaSpoutConfig.java#L136
and this Subscription implementation
https://github.com/srdo/storm/blob/f524868baa5929f29b69258f0da2948d0f6e152b/external/storm-kafka-client/src/main/java/org/apache/storm/kafka/spout/ManualPartitionSubscription.java

If you'd like to try out the code from the PR branch, you can check it out
with some of the steps described here
https://help.github.com/articles/checking-out-pull-requests-locally/#modifying-an-inactive-pull-request-locally
.

Note that the PRs for fixing the manual partitioning option are against the
master branch right now, which is targeting Storm version 2.0.0, but I
believe you may be able to use the 2.0.0 storm-kafka-client jar with a
1.1.0 cluster.

Switching to the assign API should remove the instability as the
spouts start.

About issue 2:
The spout lag shouldn't have an effect on memory use. I'm wondering if your
spout instances are not progressing at all, which might explain the lag?
You should be able to check this using the kafka-consumer-groups.sh script
in the Kafka /bin directory. Once you've started the spout, you can use the
script to inspect which offsets the consumer group have committed. Try
checking if the offsets are moving once the spouts have started running.

I can't spot any suspicious use of HashMap in the 1.x KafkaSpout. Your
attachment didn't make it through, could you post it somewhere?

About issue 3:
The CommitException you are getting is likely because we use the subscribe
API. When using the subscribe API Kafka is in control of partition
assignment, and will reassign partitions if a consumer doesn't call poll on
the consumer often enough. The default is that the consumer must be polled
at least every 5 minutes. See max.poll.interval.ms in the Kafka consumer
configuration. The assign API doesn't require this, and won't shut down
spouts because they are too slow.

There's likely still another underlying issue, because it seems strange
that your spout instances are not polling at least once per 5 minutes, at
least if you didn't set a high max.poll.records. It's almost certainly
related to issue 2. Do you have acking enabled, and what is your
topology.message.timeout.secs?

I'm not sure I understand why you would want to write your own spout from
scratch, rather than contributing fixes to the storm-kafka-client spout. It
seems likely to be more effort than fixing the problems with
storm-kafka-client.

2017-06-28 14:25 GMT+02:00 Alexandre Vermeerbergen :

> More about this thread: we noticed that with StormKafkaClient 1.1.x
> latest, we get OutOfMemoryError after ~2hours of running our simple test
> topology.
>
> We reproduce it everytime, so we decided to generate a heap dump before
> the OutOfMemoryError, and viewed the result using EclipseMAT.
>
> The results tends to show that there's a memory leak in KafkaSpoutClient:
> =
>
> One instance of *"org.apache.storm.kafka.spout.KafkaSpout"* loaded by 
> *"sun.misc.Launcher$AppClassLoader
> @ 0x80023d98"* occupies *1,662,509,664 (93.87%)* bytes. The memory is
> accumulated in one instance of *"java.util.HashMap$Node[]"* loaded by 
> *" class loader>"*.
>
> *Keywords*
> sun.misc.Launcher$AppClassLoader @ 0x80023d98
> java.util.HashMap$Node[]
> org.apache.storm.kafka.spout.KafkaSpout
>
> =
>
> See attached screenshots of EclipseMAT session showing graphical
> representation of memory usage
>
> FYI we tried to follow instructions from
> https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.5.5/bk_storm-component-guide/content/storm-kafkaspout-perf.html
> to avoid the use of too much memory,
> but still after 2 hours the memory fills up and the process hosting our
> spout is killed by Supervisor...
>
> Any clue of what we may have missed?
>
> Best regards,
> Alexandre Vermeerbergen
>
>
>
> 2017-06-28 9:17 GMT+02:00 Alexandre Vermeerbergen <
> avermeerber...@gmail.com>:
>
>> Oops, sent my last mail too 

Re: [DISCUSS] Storm 2.0 Roadmap

2017-06-28 Thread P. Taylor Goetz
Hi Alexandre,

Thanks for your input.

I think we’re very much on the same page. The new Kafka spout needs to be on par 
with the old one in terms of performance and stability before we even think 
about deprecation, let alone removal. I’ve heard a lot of complaints both 
publicly and privately about issues with the new spout. In retrospect I think 
we released it prematurely and would have been better off releasing it with 
some sort of “beta” or “preview” caveat until it further stabilized. Hindsight 
is 20/20. I’m hoping the fixes coming in the 1.1.1 release will ease some of 
the pain.

When I say I’m okay with deprecating the old spout, I’m saying that I’m okay 
with signaling to users that eventually in the (potentially very distant) 
future the old spout may be dropped. There are components of Storm that were 
deprecated 5 years ago that are still around, and I’m okay with that. IMHO it’s 
way too early to be discussing removal.

-Taylor

> On Jun 28, 2017, at 3:04 PM, Alexandre Vermeerbergen 
>  wrote:
> 
> Hello,
> 
> If that matters, our current experience with StormKafkaClient is
> disappointing (see my recent posts "Lag issues using Storm 1.1.1 latest
> build with StormKafkaClient 1.1.1 vs old StormKafka spouts" in this mailing
> list).
> 
> Our current experience is that the old StormKafka spout always beats the
> new one in terms of performance & stability.
> 
> Therefore, I am surprised when I see talks about deprecation of the old
> StormKafka spout when the new one, which just became "Generally Available"
> with Storm 1.1.0, is not stable, and it's no better when we try it from
> current 1.1.x builds to take into account recently closed JIRAs.
> 
> We're even considering writing our own Kafka spout with the Kafka 0.10.x API
> to overcome the incompatibility of the old StormKafka spout with Kafka 0.10
> libraries.
> 
> Thus, for people who are comfortable with the old Kafka spout, I'd like to
> give a -1 (non-binding) to the proposal of withdrawal of the old StormKafka
> spout until the new one converges.
> 
> Best regards,
> Alexandre Vermeerbergen
> 
> 
> 2017-06-28 19:40 GMT+02:00 P. Taylor Goetz :
> 
>> 
>>> On Jun 28, 2017, at 1:16 PM, Hugo Da Cruz Louro 
>> wrote:
>>> 
>>> I still need to go over the entire discussion thread in more detail, but
>> one thing I would like to bring up right away is the proposal to DEPRECATE,
>> and eventually remove, the KafkaSpout with the old Kafka Consumer APIs. The
>> storm-kafka-client KafkaSpout is getting stabilized, and I think we are all
>> in agreement that the storm-kafka KafkaSpout has presented continuous
>> maintainability problems with some fixes that got in not being backwards
>> compatible.
>> 
>> I’m fine with deprecating the old KafkaSpout, but I feel the decision to
>> actually remove it needs to take into account the user community. The main
>> sticking point here is compatibility with earlier versions of Kafka. Like
>> with JDK versions, there are many valid reasons why users may not be in a
>> position to upgrade to a newer version of Kafka. Outright removal could
>> leave some users in the lurch.
>> 
>> Ideally, we could just poll the user community to get an idea of how much
>> of the user base depends on the old KafkaSpout and use the results to guide
>> our decision. Unfortunately, at least in my past experience, polling the
>> user@ list doesn’t elicit much of a response and the results don’t
>> provide an accurate view of the user community.
>> 
>> 
>>> 
>>> I am pretty confident how things are looking at this point for the
>> KafkaSpout. The Trident Kafka Spout is likely in between alpha and beta,
>> and that should be taken into account. I just recently submitted a PR
>> <https://github.com/apache/storm/pull/2174> with some improvements to the
>> Trident Kafka Spout (including the refactoring done to support manual
>> partition assignment), and there are some customers using it in
>> pre-production. However, it definitely would benefit from some more testing.
>>> 
>>> Thanks,
>>> Hugo
>> 
>> -Taylor
>> 
>> 
>>> 
>>> On Jun 28, 2017, at 7:48 AM, Bobby Evans <ev...@yahoo-inc.com.INVALID> wrote:
>>> 
>>> +1.
>>> If the 1.1 and 1.2 lines start to become difficult to maintain we can
>> look at putting them in maintenance mode too once we have a 2.x release.
>>> I am a little nervous about merging a new feature into 1.x branch
>> without first going to master, but I hope that it will not be too much work
>> to port it to master, and I trust the devs on that branch to do the right
>> thing.
>>> On a related note we have not done much with feature branches before so
>> I am not sure what we want to do about merging in the new metrics API
>> branch to 1.x.  I know for me I have not had time to keep up with the
>> development work going on there.  I would at least like to have a pull
>> request put up for review before we merge it in.  This 

Re: [DISCUSS] Storm 2.0 Roadmap

2017-06-28 Thread Alexandre Vermeerbergen
Hello,

If that matters, our current experience with StormKafkaClient is
disappointing (see my recent posts "Lag issues using Storm 1.1.1 latest
build with StormKafkaClient 1.1.1 vs old StormKafka spouts" in this mailing
list).

Our current experience is that the old StormKafka spout always beats the
new one in terms of performance & stability.

Therefore, I am surprised when I see talks about deprecation of the old
StormKafka spout when the new one, which just became "Generally Available"
with Storm 1.1.0, is not stable, and it's no better when we try it from
current 1.1.x builds to take into account recently closed JIRAs.

We're even considering writing our own Kafka spout with the Kafka 0.10.x API
to overcome the incompatibility of the old StormKafka spout with Kafka 0.10
libraries.

Thus, for people who are comfortable with the old Kafka spout, I'd like to
give a -1 (non-binding) to the proposal of withdrawal of the old StormKafka
spout until the new one converges.

Best regards,
Alexandre Vermeerbergen


2017-06-28 19:40 GMT+02:00 P. Taylor Goetz :

>
> > On Jun 28, 2017, at 1:16 PM, Hugo Da Cruz Louro 
> wrote:
> >
> > I still need to go over the entire discussion thread in more detail, but
> one thing I would like to bring up right away is the proposal to DEPRECATE,
> and eventually remove, the KafkaSpout with the old Kafka Consumer APIs. The
> storm-kafka-client KafkaSpout is getting stabilized, and I think we are all
> in agreement that the storm-kafka KafkaSpout has presented continuous
> maintainability problems with some fixes that got in not being backwards
> compatible.
>
> I’m fine with deprecating the old KafkaSpout, but I feel the decision to
> actually remove it needs to take into account the user community. The main
> sticking point here is compatibility with earlier versions of Kafka. Like
> with JDK versions, there are many valid reasons why users may not be in a
> position to upgrade to a newer version of Kafka. Outright removal could
> leave some users in the lurch.
>
> Ideally, we could just poll the user community to get an idea of how much
> of the user base depends on the old KafkaSpout and use the results to guide
> our decision. Unfortunately, at least in my past experience, polling the
> user@ list doesn’t elicit much of a response and the results don’t
> provide an accurate view of the user community.
>
>
> >
> > I am pretty confident how things are looking at this point for the
> KafkaSpout. The Trident Kafka Spout is likely in between alpha and beta,
> and that should be taken into account. I just recently submitted a PR
> <https://github.com/apache/storm/pull/2174> with some improvements to the
> Trident Kafka Spout (including the refactoring done to support manual
> partition assignment), and there are some customers using it in
> pre-production. However, it definitely would benefit from some more testing.
> >
> > Thanks,
> > Hugo
>
> -Taylor
>
>
> >
> > On Jun 28, 2017, at 7:48 AM, Bobby Evans <ev...@yahoo-inc.com.INVALID> wrote:
> >
> > +1.
> > If the 1.1 and 1.2 lines start to become difficult to maintain we can
> look at putting them in maintenance mode too once we have a 2.x release.
> > I am a little nervous about merging a new feature into 1.x branch
> without first going to master, but I hope that it will not be too much work
> to port it to master, and I trust the devs on that branch to do the right
> thing.
> > On a related note we have not done much with feature branches before so
> I am not sure what we want to do about merging in the new metrics API
> branch to 1.x.  I know for me I have not had time to keep up with the
> development work going on there.  I would at least like to have a pull
> request put up for review before we merge it in.  This would fit with our
> current bylaws that do not mention feature branches.  If all of the changes
> have already followed the review process then technically I think it is OK
> to just merge it in, but I still would like to take some time to look at
> the changes, and especially the new APIs.
> >
> > - Bobby
> >
> >
> > On Wednesday, June 28, 2017, 1:53:34 AM CDT, Jungtaek Lim <
> kabh...@gmail.com> wrote:
> >
> > That's great news that metrics work is ready!
> >
> > I'm +1 to Taylor's proposal, but in order to respect semantic
> versioning, I
> > propose some modifications from Taylor's proposal:
> >
> > - create 1.1.x-branch with target version 1.1.1-SNAPSHOT and port back
> only
> > bug fixes to the 1.1.x-branch
> > - change the target version of 1.x-branch to 1.2.0-SNAPSHOT
> >
> > If we also agree above, I would like to volunteer the back-port work.
> >
> > Thanks,
> > Jungtaek Lim (HeartSaVioR)
> >
> > On Wed, Jun 28, 2017 at 10:09 AM, Harsha wrote:
> >
> > +1 for above stated approach on releasing 1.2.0 with metrics
> > -Harsha
> >
> > On Tue, Jun 27, 2017, at 12:17 PM, P. Taylor 

[GitHub] storm issue #2155: STORM-2548: Simplify KafkaSpoutConfig to avoid duplicatin...

2017-06-28 Thread hmcl
Github user hmcl commented on the issue:

https://github.com/apache/storm/pull/2155
  
Yeah... I also think that's the ideal way to do it. Squash at the end and 
have a new commit addressing each batch of code review comments.




[GitHub] storm issue #2155: STORM-2548: Simplify KafkaSpoutConfig to avoid duplicatin...

2017-06-28 Thread srdo
Github user srdo commented on the issue:

https://github.com/apache/storm/pull/2155
  
I'll squash the commits soon, I just wanted people to have a chance to 
review the changes without having to read the entire diff again




[GitHub] storm issue #2155: STORM-2548: Simplify KafkaSpoutConfig to avoid duplicatin...

2017-06-28 Thread srdo
Github user srdo commented on the issue:

https://github.com/apache/storm/pull/2155
  
@hmcl I'm not sure how we can do that. The fields in Builder are not 
static, so if we move the default definitions there, we'd have to create a 
builder and fish out the default values in the tests. The defaults also 
wouldn't be visible on the KafkaSpoutConfig class, which would probably be a 
minus. I don't see a way to have the defaults solely in the KafkaSpoutConfig 
class with no references to them from Builder, since the Builder fields have to 
get the default values if nothing else is set.

We could move the FirstPollOffsetStrategy default into a static field in 
KafkaSpoutConfig like the rest for consistency at least?




[GitHub] storm issue #2155: STORM-2548: Simplify KafkaSpoutConfig to avoid duplicatin...

2017-06-28 Thread hmcl
Github user hmcl commented on the issue:

https://github.com/apache/storm/pull/2155
  
@srdo thanks for addressing the code comments. It LGTM, but I forgot to 
publish the following comment - sorry for the extra overhead. Do you want to 
address it in this patch as well?

Most of the defaults in 
[here](https://github.com/srdo/storm/blob/73adaa983cc421c7974e95011030c54f75e7698a/external/storm-kafka-client/src/main/java/org/apache/storm/kafka/spout/KafkaSpoutConfig.java#L38-#L51)
 are also in 
[here](https://github.com/srdo/storm/blob/73adaa983cc421c7974e95011030c54f75e7698a/external/storm-kafka-client/src/main/java/org/apache/storm/kafka/spout/KafkaSpoutConfig.java#L135-#L140).
 I think it would be better to have the defaults only in one place, ideally in 
the top level class. However, if it makes the code simpler to leave the 
defaults only in the Builder class, so be it. Nevertheless, we should avoid 
having them in both places.

I also would like to ask if you can squash all the commits once we are done 
addressing everything, such that we can keep the git log clean. Thanks a lot.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] storm issue #2155: STORM-2548: Simplify KafkaSpoutConfig to avoid duplicatin...

2017-06-28 Thread srdo
Github user srdo commented on the issue:

https://github.com/apache/storm/pull/2155
  
@hmcl Addressed your comments, and updated the README to reflect the API 
changes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


Re: [DISCUSS] Storm 2.0 Roadmap

2017-06-28 Thread P. Taylor Goetz

> On Jun 28, 2017, at 1:16 PM, Hugo Da Cruz Louro wrote:
> 
> I still need to go over the entire discussion thread in more detail, but one 
> thing I would like to bring up right away is the proposal to DEPRECATE, and 
> eventually remove, the KafkaSpout with the old Kafka Consumer APIs. The 
> storm-kafka-client KafkaSpout is getting stabilized, and I think we are all 
> in agreement that the storm-kafka KafkaSpout has presented continuous 
> maintainability problems with some fixes that got in not being backwards 
> compatible.

I’m fine with deprecating the old KafkaSpout, but I feel the decision to 
actually remove it needs to take into account the user community. The main 
sticking point here is compatibility with earlier versions of Kafka. Like with 
JDK versions, there are many valid reasons why users may not be in a position 
to upgrade to a newer version of Kafka. Outright removal could leave some users 
in the lurch.

Ideally, we could just poll the user community to get an idea of how much of 
the user base depends on the old KafkaSpout and use the results to guide our 
decision. Unfortunately, at least in my past experience, polling the user@ list 
doesn’t elicit much of a response and the results don’t provide an accurate 
view of the user community. 


> 
> I am pretty confident about how things are looking at this point for the 
> KafkaSpout. The Trident Kafka Spout is likely in between alpha and beta, and 
> that should be taken into account. I just recently submitted a 
> PR with some improvements to the 
> Trident Kafka Spout (including the refactoring done to support manual 
> partition assignment), and there are some customers using it in 
> pre-production. However, it definitely would benefit from some more testing.
> 
> Thanks,
> Hugo

-Taylor


> 
> On Jun 28, 2017, at 7:48 AM, Bobby Evans wrote:
> 
> +1.
> If the 1.1 and 1.2 lines start to become difficult to maintain we can look at 
> putting them in maintenance mode too once we have a 2.x release.
> I am a little nervous about merging a new feature into 1.x branch without 
> first going to master, but I hope that it will not be too much work to port 
> it to master, and I trust the devs on that branch to do the right thing.
> On a related note we have not done much with feature branches before so I am 
> not sure what we want to do about merging in the new metrics API branch to 
> 1.x.  I know for me I have not had time to keep up with the development work 
> going on there.  I would at least like to have a pull request put up for 
> review before we merge it in.  This would fit with our current bylaws that do 
> not mention feature branches.  If all of the changes have already followed 
> the review process then technically I think it is OK to just merge it in, but 
> I still would like to take some time to look at the changes, and especially 
> the new APIs.
> 
> - Bobby
> 
> 
> On Wednesday, June 28, 2017, 1:53:34 AM CDT, Jungtaek Lim wrote:
> 
> That's great news that metrics work is ready!
> 
> I'm +1 to Taylor's proposal, but in order to respect semantic versioning, I
> propose some modifications from Taylor's proposal:
> 
> - create 1.1.x-branch with target version 1.1.1-SNAPSHOT and port back only
> bug fixes to the 1.1.x-branch
> - change the target version of 1.x-branch to 1.2.0-SNAPSHOT
> 
> If we also agree above, I would like to volunteer the back-port work.
> 
> Thanks,
> Jungtaek Lim (HeartSaVioR)
> 
> On Wed, Jun 28, 2017 at 10:09 AM, Harsha wrote:
> 
> +1 for above stated approach on releasing 1.2.0 with metrics
> -Harsha
> 
> On Tue, Jun 27, 2017, at 12:17 PM, P. Taylor Goetz wrote:
> The work on metrics is ready for a pull request to 1.x-branch from the
> feature branch. I’ve held off because we haven’t reached consensus on a
> path forward with the 1.x release lines.
> 
> I’d like to propose the following for the 1.x line:
> 
> 1. Create a branch for 1.2 so we have a branch to review the metrics
> stuff.
> 2. Release 1.1.1
> 3. Review/merge metrics work. Port metrics to master.
> 4. Release 1.2.0
> 5. Put the entire 1.x line into maintenance mode. Drop support for 1.0.x.
> (we would only support 1.2.x and 1.1.x which are very closely aligned).
> 
> Dropping support for 1.0.x line would eliminate the need to maintain one
> of the fairly heavily diverged branches. The 1.2.x and 1.1.x would be
> very closely aligned. I just up merged metrics_v2 against 1.x-branch
> after a while, and there were no conflicts.
> 
> That would give us a little more bandwidth to focus on 2.0 and needed bug
> fixes to the 1.x line like some of the issues raised with
> storm-kafka-client. We could even start releasing alpha/beta versions of
> 2.0 in parallel to the steps above.
> 
> Any thoughts on that approach?
> 
> 

[GitHub] storm pull request #2155: STORM-2548: Simplify KafkaSpoutConfig to avoid dup...

2017-06-28 Thread srdo
Github user srdo commented on a diff in the pull request:

https://github.com/apache/storm/pull/2155#discussion_r124604957
  
--- Diff: external/storm-kafka-client/src/main/java/org/apache/storm/kafka/spout/KafkaSpoutConfig.java ---
@@ -116,217 +141,57 @@
         private boolean emitNullTuples = false;
 
         public Builder(String bootstrapServers, String ... topics) {
-            this(bootstrapServers, (SerializableDeserializer<K>) null, (SerializableDeserializer<V>) null, new NamedSubscription(topics));
-        }
-
-        public Builder(String bootstrapServers, SerializableDeserializer<K> keyDes, SerializableDeserializer<V> valDes, String ... topics) {
-            this(bootstrapServers, keyDes, valDes, new NamedSubscription(topics));
-        }
-
-        public Builder(String bootstrapServers, SerializableDeserializer<K> keyDes,
-                SerializableDeserializer<V> valDes, Collection<String> topics) {
-            this(bootstrapServers, keyDes, valDes, new NamedSubscription(topics));
-        }
-
-        public Builder(String bootstrapServers, SerializableDeserializer<K> keyDes,
-                SerializableDeserializer<V> valDes, Pattern topics) {
-            this(bootstrapServers, keyDes, valDes, new PatternSubscription(topics));
+            this(bootstrapServers, new NamedSubscription(topics));
         }
 
-        public Builder(String bootstrapServers, SerializableDeserializer<K> keyDes,
-                SerializableDeserializer<V> valDes, Subscription subscription) {
-            this(bootstrapServers, keyDes, null, valDes, null, subscription);
+        public Builder(String bootstrapServers, Collection<String> topics) {
+            this(bootstrapServers, new NamedSubscription(topics));
         }
 
-        public Builder(String bootstrapServers, Class<? extends Deserializer<K>> keyDes,
-                Class<? extends Deserializer<V>> valDes, String ... topics) {
-            this(bootstrapServers, keyDes, valDes, new NamedSubscription(topics));
+        public Builder(String bootstrapServers, Pattern topics) {
+            this(bootstrapServers, new PatternSubscription(topics));
        }
 
-        public Builder(String bootstrapServers, Class<? extends Deserializer<K>> keyDes,
-                Class<? extends Deserializer<V>> valDes, Collection<String> topics) {
-            this(bootstrapServers, keyDes, valDes, new NamedSubscription(topics));
-        }
-
-        public Builder(String bootstrapServers, Class<? extends Deserializer<K>> keyDes,
-                Class<? extends Deserializer<V>> valDes, Pattern topics) {
-            this(bootstrapServers, keyDes, valDes, new PatternSubscription(topics));
-        }
-
-        public Builder(String bootstrapServers, Class<? extends Deserializer<K>> keyDes,
-                Class<? extends Deserializer<V>> valDes, Subscription subscription) {
-            this(bootstrapServers, null, keyDes, null, valDes, subscription);
-        }
-
-        private Builder(String bootstrapServers, SerializableDeserializer<K> keyDes,
-                Class<? extends Deserializer<K>> keyDesClazz,
-                SerializableDeserializer<V> valDes, Class<? extends Deserializer<V>> valDesClazz, Subscription subscription) {
+        /**
+         * Create a KafkaSpoutConfig builder.
+         * @param bootstrapServers The bootstrap servers the consumer will use
+         * @param subscription The subscription defining which topics and partitions each spout instance will read.
+         */
+        public Builder(String bootstrapServers, Subscription subscription) {
             kafkaProps = new HashMap<>();
             if (bootstrapServers == null || bootstrapServers.isEmpty()) {
                 throw new IllegalArgumentException("bootstrap servers cannot be null");
             }
             kafkaProps.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapServers);
-            this.keyDes = keyDes;
-            this.keyDesClazz = keyDesClazz;
-            this.valueDes = valDes;
-            this.valueDesClazz = valDesClazz;
             this.subscription = subscription;
-            this.translator = new DefaultRecordTranslator<>();
-        }
-
-        private Builder(Builder<?, ?> builder, SerializableDeserializer<K> keyDes, Class<? extends Deserializer<K>> keyDesClazz,
-                SerializableDeserializer<V> valueDes, Class<? extends Deserializer<V>> valueDesClazz) {
-            this.kafkaProps = new HashMap<>(builder.kafkaProps);
-            this.subscription = builder.subscription;
-            this.pollTimeoutMs = builder.pollTimeoutMs;
-            this.offsetCommitPeriodMs = builder.offsetCommitPeriodMs;
-            this.firstPollOffsetStrategy = builder.firstPollOffsetStrategy;
-            this.maxUncommittedOffsets = builder.maxUncommittedOffsets;
-            //this could result in a lot of class case exceptions at runtime,
-            // but because some translators will 

Re: [DISCUSS] Storm 2.0 Roadmap

2017-06-28 Thread P. Taylor Goetz

> On Jun 28, 2017, at 10:48 AM, Bobby Evans  wrote:
> 
> +1.
> If the 1.1 and 1.2 lines start to become difficult to maintain we can look at 
> putting them in maintenance mode too once we have a 2.x release.
> I am a little nervous about merging a new feature into 1.x branch without 
> first going to master, but I hope that it will not be too much work to port 
> it to master, and I trust the devs on that branch to do the right thing.

Porting to master should be easy. It was designed from the start to minimize 
impact to Clojure code to ease porting to master. I wanted to wait to port to 
master until after it’s reviewed and merged so any changes that come from 
review don’t have to be separately ported.


> On a related note we have not done much with feature branches before so I am 
> not sure what we want to do about merging in the new metrics API branch to 
> 1.x.  I know for me I have not had time to keep up with the development work 
> going on there.  I would at least like to have a pull request put up for 
> review before we merge it in.  This would fit with our current bylaws that do 
> not mention feature branches.  If all of the changes have already followed 
> the review process then technically I think it is OK to just merge it in, but 
> I still would like to take some time to look at the changes, and especially 
> the new APIs.

The plan for any of the feature branches has always been the same: relax the 
commit rules for a feature branch, then when ready, issue a pull request from 
the feature branch to the target branch for thorough review with all the commit 
rules in effect. We could certainly codify that in our bylaws, but I think we 
should try the process out a few times before setting things in stone. If I 
remember correctly this is the first time a feature branch has reached the 
point of being ready to be reviewed/merged to a target branch.


> - Bobby

-Taylor

> 
> 
> On Wednesday, June 28, 2017, 1:53:34 AM CDT, Jungtaek Lim  
> wrote:
> 
> That's great news that metrics work is ready!
> 
> I'm +1 to Taylor's proposal, but in order to respect semantic versioning, I
> propose some modifications from Taylor's proposal:
> 
> - create 1.1.x-branch with target version 1.1.1-SNAPSHOT and port back only
> bug fixes to the 1.1.x-branch
> - change the target version of 1.x-branch to 1.2.0-SNAPSHOT
> 
> If we also agree above, I would like to volunteer the back-port work.
> 
> Thanks,
> Jungtaek Lim (HeartSaVioR)
> 
> On Wed, Jun 28, 2017 at 10:09 AM, Harsha wrote:
> 
>> +1 for above stated approach on releasing 1.2.0 with metrics
>> -Harsha
>> 
>> On Tue, Jun 27, 2017, at 12:17 PM, P. Taylor Goetz wrote:
>>> The work on metrics is ready for a pull request to 1.x-branch from the
>>> feature branch. I’ve held off because we haven’t reached consensus on a
>>> path forward with the 1.x release lines.
>>> 
>>> I’d like to propose the following for the 1.x line:
>>> 
>>> 1. Create a branch for 1.2 so we have a branch to review the metrics
>>> stuff.
>>> 2. Release 1.1.1
>>> 3. Review/merge metrics work. Port metrics to master.
>>> 4. Release 1.2.0
>>> 5. Put the entire 1.x line into maintenance mode. Drop support for 1.0.x.
>>> (we would only support 1.2.x and 1.1.x which are very closely aligned).
>>> 
>>> Dropping support for 1.0.x line would eliminate the need to maintain one
>>> of the fairly heavily diverged branches. The 1.2.x and 1.1.x would be
>>> very closely aligned. I just up merged metrics_v2 against 1.x-branch
>>> after a while, and there were no conflicts.
>>> 
>>> That would give us a little more bandwidth to focus on 2.0 and needed bug
>>> fixes to the 1.x line like some of the issues raised with
>>> storm-kafka-client. We could even start releasing alpha/beta versions of
>>> 2.0 in parallel to the steps above.
>>> 
>>> Any thoughts on that approach?
>>> 
>>> -Taylor
>>> 
>>> 
 On Jun 24, 2017, at 1:21 AM, Jungtaek Lim  wrote:
 
 Yes I prefer option 1, but it might depend on the progress of metrics
>> V2.
 If it can be done within predictable near future I'm OK to pick option
>> 2,
 but if not, we may be better to focus releasing 2.0.0 and make it
>> really
 happen.
 
 Whichever we go, I feel it's time to track remaining work on Storm
>> 2.0.0. I
 found some bugs on master branch so filed issues, and we've remaining
>> port
 work (UI and logviewer). We've some other improvements target for
>> 2.0.0:
 worker redesign, beam integration, and so on, and we don't track its
 progress at all. I don't think we should wait for features which
>> progress
 is not transparent (in other words we don't know when it will be
>> finished).
 
 - Jungtaek Lim (HeartSaVioR)
 
 On Sat, Jun 24, 2017 at 5:19 AM, P. Taylor Goetz wrote:
 
> Bobby/Jungtaek,
> 
> Are you saying you want to forego the 1.2 “metrics_v2” release and

[GitHub] storm pull request #2155: STORM-2548: Simplify KafkaSpoutConfig to avoid dup...

2017-06-28 Thread srdo
Github user srdo commented on a diff in the pull request:

https://github.com/apache/storm/pull/2155#discussion_r124601205
  
--- Diff: 
external/storm-kafka-client/src/main/java/org/apache/storm/kafka/spout/KafkaSpoutConfig.java
 ---
@@ -79,16 +78,46 @@
 UNCOMMITTED_EARLIEST,
 UNCOMMITTED_LATEST }
 
+/**
+ * Convenience method to get a Builder for String key/value spouts. 
Sets the key/value Deserializers to the StringDeserializer class.
--- End diff --

Sure, will shorten this


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


Re: [DISCUSS] Storm 2.0 Roadmap

2017-06-28 Thread Hugo Da Cruz Louro
I still need to go over the entire discussion thread in more detail, but one 
thing I would like to bring up right away is the proposal to DEPRECATE, and 
eventually remove, the KafkaSpout with the old Kafka Consumer APIs. The 
storm-kafka-client KafkaSpout is getting stabilized, and I think we are all in 
agreement that the storm-kafka KafkaSpout has presented continuous 
maintainability problems with some fixes that got in not being backwards 
compatible.

I am pretty confident about how things are looking at this point for the KafkaSpout. 
The Trident Kafka Spout is likely in between alpha and beta, and that should be 
taken into account. I just recently submitted a 
PR with some improvements to the 
Trident Kafka Spout (including the refactoring done to support manual partition 
assignment), and there are some customers using it in pre-production. However, 
it definitely would benefit from some more testing.

Thanks,
Hugo

On Jun 28, 2017, at 7:48 AM, Bobby Evans wrote:

+1.
If the 1.1 and 1.2 lines start to become difficult to maintain we can look at 
putting them in maintenance mode too once we have a 2.x release.
I am a little nervous about merging a new feature into 1.x branch without first 
going to master, but I hope that it will not be too much work to port it to 
master, and I trust the devs on that branch to do the right thing.
On a related note we have not done much with feature branches before so I am 
not sure what we want to do about merging in the new metrics API branch to 1.x. 
 I know for me I have not had time to keep up with the development work going 
on there.  I would at least like to have a pull request put up for review 
before we merge it in.  This would fit with our current bylaws that do not 
mention feature branches.  If all of the changes have already followed the 
review process then technically I think it is OK to just merge it in, but I 
still would like to take some time to look at the changes, and especially the 
new APIs.

- Bobby


On Wednesday, June 28, 2017, 1:53:34 AM CDT, Jungtaek Lim wrote:

That's great news that metrics work is ready!

I'm +1 to Taylor's proposal, but in order to respect semantic versioning, I
propose some modifications from Taylor's proposal:

- create 1.1.x-branch with target version 1.1.1-SNAPSHOT and port back only
bug fixes to the 1.1.x-branch
- change the target version of 1.x-branch to 1.2.0-SNAPSHOT

If we also agree above, I would like to volunteer the back-port work.

Thanks,
Jungtaek Lim (HeartSaVioR)

On Wed, Jun 28, 2017 at 10:09 AM, Harsha wrote:

+1 for above stated approach on releasing 1.2.0 with metrics
-Harsha

On Tue, Jun 27, 2017, at 12:17 PM, P. Taylor Goetz wrote:
The work on metrics is ready for a pull request to 1.x-branch from the
feature branch. I’ve held off because we haven’t reached consensus on a
path forward with the 1.x release lines.

I’d like to propose the following for the 1.x line:

1. Create a branch for 1.2 so we have a branch to review the metrics
stuff.
2. Release 1.1.1
3. Review/merge metrics work. Port metrics to master.
4. Release 1.2.0
5. Put the entire 1.x line into maintenance mode. Drop support for 1.0.x.
(we would only support 1.2.x and 1.1.x which are very closely aligned).

Dropping support for 1.0.x line would eliminate the need to maintain one
of the fairly heavily diverged branches. The 1.2.x and 1.1.x would be
very closely aligned. I just up merged metrics_v2 against 1.x-branch
after a while, and there were no conflicts.

That would give us a little more bandwidth to focus on 2.0 and needed bug
fixes to the 1.x line like some of the issues raised with
storm-kafka-client. We could even start releasing alpha/beta versions of
2.0 in parallel to the steps above.

Any thoughts on that approach?

-Taylor


On Jun 24, 2017, at 1:21 AM, Jungtaek Lim wrote:

Yes I prefer option 1, but it might depend on the progress of metrics
V2.
If it can be done within predictable near future I'm OK to pick option
2,
but if not, we may be better to focus releasing 2.0.0 and make it
really
happen.

Whichever we go, I feel it's time to track remaining work on Storm
2.0.0. I
found some bugs on master branch so filed issues, and we've remaining
port
work (UI and logviewer). We've some other improvements target for
2.0.0:
worker redesign, beam integration, and so on, and we don't track its
progress at all. I don't think we should wait for features which
progress
is not transparent (in other words we don't know when it will be
finished).

- Jungtaek Lim (HeartSaVioR)

On Sat, Jun 24, 2017 at 5:19 AM, P. Taylor Goetz wrote:

Bobby/Jungtaek,

Are you saying you want to forego the 1.2 “metrics_v2” release and
include
it only in 2.0? (I ask because that 

[GitHub] storm pull request #2155: STORM-2548: Simplify KafkaSpoutConfig to avoid dup...

2017-06-28 Thread srdo
Github user srdo commented on a diff in the pull request:

https://github.com/apache/storm/pull/2155#discussion_r124600901
  
--- Diff: external/storm-kafka-client/src/main/java/org/apache/storm/kafka/spout/KafkaSpoutConfig.java ---
(same KafkaSpoutConfig.java hunk as quoted in full above)

Re: What is the difference between UNCOMMITTED_EARLIEST and UNCOMMITTED_LATEST ?

2017-06-28 Thread Hugo Da Cruz Louro
Your KafkaConsumer instance (which boils down to your KafkaSpout) can be in one 
of two states:

1 - Has committed to Kafka
Here, EARLIEST fetches from the first offset. LATEST fetches from the last 
offset. UNCOMMITTED_EARLIEST and UNCOMMITTED_LATEST will fetch from the last 
committed offset - the _earliest and _latest suffixes are “kinda” disregarded.

2 - Has never committed to Kafka
EARLIEST is the same as UNCOMMITTED_EARLIEST
LATEST is the same as UNCOMMITTED_LATEST

What this means is that since this KafkaSpout (Kafka consumer instance) has 
never committed to Kafka, the question is which offset the first poll starts 
from. We give the option to start from the beginning (EARLIEST) or from the 
end (LATEST).

Best,
Hugo
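
To make the two states concrete, a minimal sketch of picking a strategy on the spout config, assuming the storm-kafka-client builder(...) factory and its setFirstPollOffsetStrategy setter (broker and topic names are placeholders):

    // Resume from the last committed offset if one exists; otherwise start from the beginning.
    KafkaSpoutConfig<String, String> config =
            KafkaSpoutConfig.builder("broker1:9092", "my-topic")
                    .setFirstPollOffsetStrategy(
                            KafkaSpoutConfig.FirstPollOffsetStrategy.UNCOMMITTED_EARLIEST)
                    .build();
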


> On Jun 28, 2017, at 9:47 AM, Stig Døssing  wrote:
> 
> No, the description is accurate.
> 
> EARLIEST and LATEST are for unconditionally starting at the beginning or
> end of the subscribed partitions. So if you configure a spout to use either
> of these, it will start at the earliest or latest offset on each partition
> every time you start it. Example: Say the spout is set to EARLIEST and
> we've just deployed it for the first time. The spout seeks to the earliest
> offset (let's say 0) and emits offsets 0-100, and commits them (marking
> them as "done" in Kafka for the spout's consumer group, this happens when
> you ack the tuples emitted by the spout). The spout then crashes for some
> reason, or you redeploy the topology. The spout will pick up at offset 0
> when it restarts, because it is configured to always start at the beginning
> of the partition.
> 
> UNCOMMITTED_EARLIEST and UNCOMMITTED_LATEST act exactly like the other two
> modes if the spout hasn't committed anything. If the spout has committed
> some offsets, the restarted spout will pick up where it left off, at the
> last committed offset. Example: Say the spout is set to
> UNCOMMITTED_EARLIEST and we've just deployed it for the first time. The
> spout seeks to the earliest offset because it hasn't previously committed
> anything, so it starts at offset 0 and emits offsets 0-100 and commits them
> once the tuples are acked. The spout crashes. The restarted spout will pick
> up at offset 100, because that was the last committed offset before it
> crashed.
> 
> I hope this helps.
> 
> 2017-06-28 8:40 GMT+02:00 Zhechao Ma :
> 
>> The storm-kafka-client document explains these two values just almost the
>> same except the last word.
>> 
>> https://github.com/apache/storm/blob/master/docs/storm-kafka-client.md
>> 
>>   - UNCOMMITTED_EARLIEST (DEFAULT) means that the kafka spout polls
>>   records from the last committed offset, if any. If no offset has been
>>   committed, it behaves as EARLIEST.
>>   - UNCOMMITTED_LATEST means that the kafka spout polls records from the
>>   last committed offset, if any. If no offset has been committed, it
>> behaves
>>   as LATEST.
>> 
>> Or is that a mistake?
>> 
>> --
>> Thanks
>> Zhechao Ma
>> 



[GitHub] storm pull request #2155: STORM-2548: Simplify KafkaSpoutConfig to avoid dup...

2017-06-28 Thread hmcl
Github user hmcl commented on a diff in the pull request:

https://github.com/apache/storm/pull/2155#discussion_r124580202
  
--- Diff: external/storm-kafka-client/src/main/java/org/apache/storm/kafka/spout/KafkaSpoutConfig.java ---
(same KafkaSpoutConfig.java hunk as quoted in full above, ending at the new builder javadoc:)
+/**
+ * Create a KafkaSpoutConfig builder.
--- End diff --

Create a KafkaSpoutConfig builder with default property values. Properties 
can be overridden with the respective builder set methods.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] storm pull request #2155: STORM-2548: Simplify KafkaSpoutConfig to avoid dup...

2017-06-28 Thread hmcl
Github user hmcl commented on a diff in the pull request:

https://github.com/apache/storm/pull/2155#discussion_r124586447
  
--- Diff: external/storm-kafka-client/src/main/java/org/apache/storm/kafka/spout/KafkaSpoutConfig.java ---
(same KafkaSpoutConfig.java hunk as quoted in full above)

[GitHub] storm pull request #2155: STORM-2548: Simplify KafkaSpoutConfig to avoid dup...

2017-06-28 Thread hmcl
Github user hmcl commented on a diff in the pull request:

https://github.com/apache/storm/pull/2155#discussion_r124586375
  
--- Diff: external/storm-kafka-client/src/main/java/org/apache/storm/kafka/spout/KafkaSpoutConfig.java ---
(same KafkaSpoutConfig.java hunk as quoted in full above)

[GitHub] storm pull request #2155: STORM-2548: Simplify KafkaSpoutConfig to avoid dup...

2017-06-28 Thread hmcl
Github user hmcl commented on a diff in the pull request:

https://github.com/apache/storm/pull/2155#discussion_r124580583
  
--- Diff: external/storm-kafka-client/src/main/java/org/apache/storm/kafka/spout/KafkaSpoutConfig.java ---
(same KafkaSpoutConfig.java hunk as quoted in full above)

[GitHub] storm pull request #2155: STORM-2548: Simplify KafkaSpoutConfig to avoid dup...

2017-06-28 Thread hmcl
Github user hmcl commented on a diff in the pull request:

https://github.com/apache/storm/pull/2155#discussion_r124572749
  
--- Diff: 
external/storm-kafka-client/src/main/java/org/apache/storm/kafka/spout/KafkaSpoutConfig.java
 ---
@@ -79,16 +78,46 @@
 UNCOMMITTED_EARLIEST,
 UNCOMMITTED_LATEST }
 
+/**
+ * Convenience method to get a Builder for String key/value spouts. 
Sets the key/value Deserializers to the StringDeserializer class.
--- End diff --

NIT: I would suggest a more succinct javadoc, e.g. "Factory method that 
creates a Builder with key/val String deserializers." In my opinion the 
text mentioning how to use builders for other ser/des should go in the README.

Same comment would apply to other Builder factory methods.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] storm pull request #2155: STORM-2548: Simplify KafkaSpoutConfig to avoid dup...

2017-06-28 Thread hmcl
Github user hmcl commented on a diff in the pull request:

https://github.com/apache/storm/pull/2155#discussion_r124594357
  
--- Diff: external/storm-kafka-client/src/main/java/org/apache/storm/kafka/spout/KafkaSpoutConfig.java ---
(same KafkaSpoutConfig.java hunk as quoted in full above)

Re: What is the difference between UNCOMMITTED_EARLIEST and UNCOMMITTED_LATEST ?

2017-06-28 Thread Stig Døssing
No, the description is accurate.

EARLIEST and LATEST are for unconditionally starting at the beginning or
end of the subscribed partitions. So if you configure a spout to use either
of these, it will start at the earliest or latest offset on each partition
every time you start it. Example: Say the spout is set to EARLIEST and
we've just deployed it for the first time. The spout seeks to the earliest
offset (let's say 0) and emits offsets 0-100, and commits them (marking
them as "done" in Kafka for the spout's consumer group, this happens when
you ack the tuples emitted by the spout). The spout then crashes for some
reason, or you redeploy the topology. The spout will pick up at offset 0
when it restarts, because it is configured to always start at the beginning
of the partition.

UNCOMMITTED_EARLIEST and UNCOMMITTED_LATEST act exactly like the other two
modes if the spout hasn't committed anything. If the spout has committed
some offsets, the restarted spout will pick up where it left off, at the
last committed offset. Example: Say the spout is set to
UNCOMMITTED_EARLIEST and we've just deployed it for the first time. The
spout seeks to the earliest offset because it hasn't previously committed
anything, so it starts at offset 0 and emits offsets 0-100 and commits them
once the tuples are acked. The spout crashes. The restarted spout will pick
up at offset 100, because that was the last committed offset before it
crashed.

I hope this helps.
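
Distilled into code, the four modes reduce to the following illustrative sketch of how a starting offset would be chosen on (re)start. This is not the actual spout implementation, just a restatement of the rules above:

    import java.util.OptionalLong;

    final class StartOffsetChooser {
        enum FirstPollOffsetStrategy { EARLIEST, LATEST, UNCOMMITTED_EARLIEST, UNCOMMITTED_LATEST }

        // committed is empty when the spout's consumer group has never committed on this partition.
        static long startingOffset(FirstPollOffsetStrategy strategy, OptionalLong committed,
                                   long earliest, long latest) {
            switch (strategy) {
                case EARLIEST:             return earliest;                   // always rewind to the start
                case LATEST:               return latest;                     // always skip to the end
                case UNCOMMITTED_EARLIEST: return committed.orElse(earliest); // resume, else rewind
                case UNCOMMITTED_LATEST:   return committed.orElse(latest);   // resume, else skip
                default: throw new IllegalArgumentException("Unknown strategy: " + strategy);
            }
        }
    }
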

2017-06-28 8:40 GMT+02:00 Zhechao Ma:

> The storm-kafka-client document explains these two values just almost the
> same except the last word.
>
> https://github.com/apache/storm/blob/master/docs/storm-kafka-client.md
>
>- UNCOMMITTED_EARLIEST (DEFAULT) means that the kafka spout polls
>records from the last committed offset, if any. If no offset has been
>committed, it behaves as EARLIEST.
>- UNCOMMITTED_LATEST means that the kafka spout polls records from the
>last committed offset, if any. If no offset has been committed, it
> behaves
>as LATEST.
>
> Or is that a mistake?
>
> --
> Thanks
> Zhechao Ma
>


Re: [DISCUSS] Storm 2.0 Roadmap

2017-06-28 Thread Bobby Evans
+1.
If the 1.1 and 1.2 lines start to become difficult to maintain we can look at 
putting them in maintenance mode too once we have a 2.x release.
I am a little nervous about merging a new feature into 1.x branch without first 
going to master, but I hope that it will not be too much work to port it to 
master, and I trust the devs on that branch to do the right thing.
On a related note we have not done much with feature branches before so I am 
not sure what we want to do about merging in the new metrics API branch to 1.x. 
 I know for me I have not had time to keep up with the development work going 
on there.  I would at least like to have a pull request put up for review 
before we merge it in.  This would fit with our current bylaws that do not 
mention feature branches.  If all of the changes have already followed the 
review process then technically I think it is OK to just merge it in, but I 
still would like to take some time to look at the changes, and especially the 
new APIs.

- Bobby


On Wednesday, June 28, 2017, 1:53:34 AM CDT, Jungtaek Lim  
wrote:

That's great news that metrics work is ready!

I'm +1 to Taylor's proposal, but in order to respect semantic versioning, I
propose some modifications from Taylor's proposal:

- create 1.1.x-branch with target version 1.1.1-SNAPSHOT and port back only
bug fixes to the 1.1.x-branch
- change the target version of 1.x-branch to 1.2.0-SNAPSHOT

If we also agree above, I would like to volunteer the back-port work.

Thanks,
Jungtaek Lim (HeartSaVioR)

On Wed, Jun 28, 2017 at 10:09 AM, Harsha wrote:

> +1 for above stated approach on releasing 1.2.0 with metrics
> -Harsha
>
> On Tue, Jun 27, 2017, at 12:17 PM, P. Taylor Goetz wrote:
> > The work on metrics is ready for a pull request to 1.x-branch from the
> > feature branch. I’ve held off because we haven’t reached consensus on a
> > path forward with the 1.x release lines.
> >
> > I’d like to propose the following for the 1.x line:
> >
> > 1. Create a branch for 1.2 so we have a branch to review the metrics
> > stuff.
> > 2. Release 1.1.1
> > 3. Review/merge metrics work. Port metrics to master.
> > 4. Release 1.2.0
> > 5. Put the entire 1.x line into maintenance mode. Drop support for 1.0.x.
> > (we would only support 1.2.x and 1.1.x which are very closely aligned).
> >
> > Dropping support for 1.0.x line would eliminate the need to maintain one
> > of the fairly heavily diverged branches. The 1.2.x and 1.1.x would be
> > very closely aligned. I just up merged metrics_v2 against 1.x-branch
> > after a while, and there were no conflicts.
> >
> > That would give us a little more bandwidth to focus on 2.0 and needed bug
> > fixes to the 1.x line like some of the issues raised with
> > storm-kafka-client. We could even start releasing alpha/beta versions of
> > 2.0 in parallel to the steps above.
> >
> > Any thoughts on that approach?
> >
> > -Taylor
> >
> >
> > > On Jun 24, 2017, at 1:21 AM, Jungtaek Lim  wrote:
> > >
> > > Yes I prefer option 1, but it might depend on the progress of metrics
> V2.
> > > If it can be done within predictable near future I'm OK to pick option
> 2,
> > > but if not, we may be better to focus releasing 2.0.0 and make it
> really
> > > happen.
> > >
> > > Whichever we go, I feel it's time to track remaining work on Storm
> 2.0.0. I
> > > found some bugs on master branch so filed issues, and we've remaining
> port
> > > work (UI and logviewer). We've some other improvements target for
> 2.0.0:
> > > worker redesign, beam integration, and so on, and we don't track its
> > > progress at all. I don't think we should wait for features which
> progress
> > > is not transparent (in other words we don't know when it will be
> finished).
> > >
> > > - Jungtaek Lim (HeartSaVioR)
> > >
> > > On Sat, Jun 24, 2017 at 5:19 AM, P. Taylor Goetz wrote:
> > >
> > >> Bobby/Jungtaek,
> > >>
> > >> Are you saying you want to forego the 1.2 “metrics_v2” release and include
> > >> it only in 2.0? (I ask because that work is already based on 1.x-branch,
> > >> and forward-porting it to master is relatively simple.) I'd kind of like
> > >> that work to go out soon.
> > >>
> > >> If we go with option 1, I would want to see a 2.0 release (even if
> it’s a
> > >> “beta” or “preview) before putting the 1.x line into maintenance mode.
> > >>
> > >> -Taylor
> > >>
> > >>> On Jun 23, 2017, at 9:51 AM, Bobby Evans wrote:
> > >>>
> > >>> I see 2 ways to address this.
> > >>> 1) We put the 1.x line into maintenance mode like with 0.10. We don't
> > >>> backport anything except bug fixes.
> > >>> 2) We backport a lot of the backwards compatible changes from 2.x to 1.x.
> > >>> My personal preference is 1. It makes it clear the direction we want to
> > >>> go in. The biggest issue is that we probably would want to do a 2.x
> > >>> release sooner rather than later. Even if we don't get all of the
> 

[GitHub] storm pull request #:

2017-06-28 Thread markthegrea
Github user markthegrea commented on the pull request:


https://github.com/apache/storm/commit/ca17c4ff10231a5d93deb3d4ac934140ccec674d#commitcomment-22810673
  
How is this enabled?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] storm issue #1970: STORM-2383 Support HBase as state backend (1.x)

2017-06-28 Thread HeartSaVioR
Github user HeartSaVioR commented on the issue:

https://github.com/apache/storm/pull/1970
  
I updated this PR to apply the last change of STORM-2369. I'll simply 
rebase again after STORM-2369 is merged. And will also craft the PR for master 
branch.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


Re: Lag issues using Storm 1.1.1 latest build with StormKafkaClient 1.1.1 vs old StormKafka spouts

2017-06-28 Thread Alexandre Vermeerbergen
More about this thread: we noticed that with StormKafkaClient 1.1.x latest,
we get OutOfMemoryError after ~2hours of running our simple test topology.

We reproduce it every time, so we decided to generate a heap dump before the
OutOfMemoryError, and viewed the result using EclipseMAT.

The results tend to show that there's a memory leak in the storm-kafka-client KafkaSpout:
=

One instance of *"org.apache.storm.kafka.spout.KafkaSpout"* loaded by
*"sun.misc.Launcher$AppClassLoader
@ 0x80023d98"* occupies *1,662,509,664 (93.87%)* bytes. The memory is
accumulated in one instance of *"java.util.HashMap$Node[]"* loaded by *"<system class loader>"*.

*Keywords*
sun.misc.Launcher$AppClassLoader @ 0x80023d98
java.util.HashMap$Node[]
org.apache.storm.kafka.spout.KafkaSpout

=

See the attached screenshots of the Eclipse MAT session showing a graphical
representation of memory usage.

FYI we tried to follow the instructions from
https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.5.5/bk_storm-component-guide/content/storm-kafkaspout-perf.html
to avoid using too much memory, but still after 2 hours the memory
fills up and the process hosting our spout is killed by the Supervisor...
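
For reference, the kind of spout tuning that document describes looks
roughly like the sketch below with the 1.1.x storm-kafka-client builder;
the broker, topic, and numbers are illustrative, and setter availability
may vary slightly between 1.1.x releases:

    import org.apache.storm.kafka.spout.KafkaSpout;
    import org.apache.storm.kafka.spout.KafkaSpoutConfig;

    public class TunedSpout {
        public static KafkaSpout<String, String> build() {
            KafkaSpoutConfig<String, String> spoutConf =
                    KafkaSpoutConfig.builder("broker1:9092", "events")
                            // Commit acked offsets more often, so fewer
                            // acked tuples are retained between commits.
                            .setOffsetCommitPeriodMs(10_000)
                            // Bound the polled-but-uncommitted records the
                            // spout keeps in memory; this is the main knob
                            // against unbounded heap growth.
                            .setMaxUncommittedOffsets(1_000)
                            .build();
            return new KafkaSpout<>(spoutConf);
        }
    }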

Any clue of what we may have missed?

Best regards,
Alexandre Vermeerbergen



2017-06-28 9:17 GMT+02:00 Alexandre Vermeerbergen:

> Oops, sent my last mail too fast, let me continue it:
>
> Hello,
>
> Coming back to my original post in this list, we have 3 issues with the
> latest 1.1.x StormKafkaClient spout in our setup:
>
> Issue#1:
>  Initial lag (which we didn't have using the classic Storm Kafka spout)
>For this issue, my understanding of Kristopher's answer is that this is
> "by design" of the StormKafkaClient spout, whose instances progressively
> join the Kafka consumer group, which causes consumer rebalancing. This
> rebalancing is "slow", which means that until all spout instances are
> started, the topology starts with an "initial Kafka lag"
>=> Is my understanding correct?
>=> Why don't we have such behavior with the old Storm Kafka spout?
>=> Is this annoying initial lag tracked by a JIRA?
>
> Issue#2:
> The Kafka lag increases constantly, and this leads to the overload
> of the Storm worker running the Kafka spout. In the end, the worker crashes
> and is automatically restarted by Storm.
> => This is unlike what we observe with the old Storm Kafka spout
> => What is the recommended way to analyze this issue?
>
> Issue#3:
>   With the new Kafka spout, we have faced this exception many times:
>
>  org.apache.kafka.clients.consumer.CommitFailedException: Commit cannot
> be completed since the group has already rebalanced and assigned the
> partitions to another member. This means that the time between subsequent
> calls to poll() was longer than the configured max.poll.interval.ms,
> which typically implies that the poll loop is spending too much time
> message processing. You can address this either by increasing the session
> timeout or by reducing the maximum size of batches returned in poll() with
> max.poll.records.
> at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.sendOffsetCommitRequest(ConsumerCoordinator.java:702)
> at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.commitOffsetsSync(ConsumerCoordinator.java:581)
> at org.apache.kafka.clients.consumer.KafkaConsumer.commitSync(KafkaConsumer.java:1124)
> at org.apache.storm.kafka.spout.KafkaSpout.commitOffsetsForAckedTuples(KafkaSpout.java:384)
> at org.apache.storm.kafka.spout.KafkaSpout.nextTuple(KafkaSpout.java:220)
> at org.apache.storm.daemon.executor$fn__10780$fn__10795$fn__10826.invoke(executor.clj:646)
> at org.apache.storm.util$async_loop$fn__555.invoke(util.clj:484)
> at clojure.lang.AFn.run(AFn.java:22)
> at java.lang.Thread.run(Thread.java:748)
>
>
>   => Are we the only ones experiencing such issues with Storm 1.1.0/1.1.x
> latest?
>
> Note: We are considering writing our own Kafka Spout, as we're time-bound
> to move to Kafka 0.10.x consumers & producers (to prepare our next step
> with Kafka security, which isn't available with Kafka 0.9.x). We will miss
> the integration of Kafka lag in Storm UI, but currently we do not understand
> how to solve the regressions we observe with the latest Storm Kafka client
> spout.
>
> Are there other Storm developers/users who jumped into this alternative?
>
> Best regards,
>
> Alexandre Vermeerbergen
>
>
>
>
> 2017-06-28 9:09 GMT+02:00 Alexandre Vermeerbergen <
> avermeerber...@gmail.com>:
>
>> Hello,
>>
>> Coming back to my original post in this list, we have two issues with
>> the latest 1.1.x StormKafkaClient spout in our setup:
>>
>> Issue#1:
>>  Initial lag (which we didn't have using the classic Storm Kafka spout)
>>For this issue, my understanding of Kristopher's answer is that this
>> is "by design" of the StormKafkaClient spout, which 

[GitHub] storm issue #1950: STORM-2369 [storm-redis] Use binary type for State manage...

2017-06-28 Thread HeartSaVioR
Github user HeartSaVioR commented on the issue:

https://github.com/apache/storm/pull/1950
  
@arunmahadevan Ah, please review #2172 as well when you revisit this. 
Thanks in advance!




[GitHub] storm issue #2172: STORM-2369 [storm-redis] Use binary type for State manage...

2017-06-28 Thread HeartSaVioR
Github user HeartSaVioR commented on the issue:

https://github.com/apache/storm/pull/2172
  
Just updated the PR to make it equivalent to #1950.




Re: java.io.IOException: Connection reset by peer in DRPC

2017-06-28 Thread sam mohel
Is there any help, please?

On Wednesday, June 28, 2017, sam mohel  wrote:
> I submitted two topologies in production mode. The first one has a data set
> of size 215 MB, worked well, and gave me the results. The second topology
> has a data set of size 170 MB with the same configuration, but it stopped
> working after some time and didn't complete its results.
> The error I got in the drpc log file is:
> TNonblockingServer [WARN] Got an IOException in internalRead!
> java.io.IOException: Connection reset by peer
> I couldn't figure out where the problem is, as it was supposed to work well
> since the second data set is smaller in size.


[GitHub] storm issue #1950: STORM-2369 [storm-redis] Use binary type for State manage...

2017-06-28 Thread HeartSaVioR
Github user HeartSaVioR commented on the issue:

https://github.com/apache/storm/pull/1950
  
@arunmahadevan Thanks, I've addressed the latest review comment and also
squashed.




[GitHub] storm pull request #1950: STORM-2369 [storm-redis] Use binary type for State...

2017-06-28 Thread arunmahadevan
Github user arunmahadevan commented on a diff in the pull request:

https://github.com/apache/storm/pull/1950#discussion_r124484454
  
--- Diff: 
external/storm-redis/src/main/java/org/apache/storm/redis/state/RedisKeyValueState.java
 ---
@@ -316,10 +340,18 @@ private Long lastPreparedTxid() {
 
     private Long lastId(String key) {
         Long lastId = null;
-        String str = txIds.get(key);
-        if (str != null) {
-            lastId = Long.valueOf(str);
+        String txId = txIds.get(key);
+        if (txId != null) {
+            lastId = Long.valueOf(txId);
         }
         return lastId;
     }
+
+    private ConcurrentNavigableMap<byte[], byte[]> createPendingPrepareMap() {
+        return new ConcurrentSkipListMap<>(UnsignedBytes.lexicographicalComparator());
+    }
+
+    private NavigableMap<byte[], byte[]> createPendingCommitMap() {
--- End diff --

Seems this is not used. If so can be removed.




Re: Lag issues using Storm 1.1.1 latest build with StormKafkaClient 1.1.1 vs old StormKafka spouts

2017-06-28 Thread Alexandre Vermeerbergen
Oops, sent my last mail too fast, let me continue it:

Hello,

Coming back to my original post in this list, we have 3 issues with the
latest 1.1.x StormKafkaClient spout in our setup:

Issue#1:
 Initial lag (which we didn't have using the classic Storm Kafka spout)
   For this issue, my understanding of Kristopher's answer is that this is
"by design" of the StormKafkaClient spout, whose instances progressively
join the Kafka consumer group, which causes consumer rebalancing. This
rebalancing is "slow", which means that until all spout instances are
started, the topology starts with an "initial Kafka lag"
   => Is my understanding correct?
   => Why don't we have such behavior with the old Storm Kafka spout?
   => Is this annoying initial lag tracked by a JIRA?

Issue#2:
The Kafka lag increases constantly, and this leads to the overload
of the Storm worker running the Kafka spout. In the end, the worker crashes
and is automatically restarted by Storm.
=> This is unlike what we observe with the old Storm Kafka spout
=> What is the recommended way to analyze this issue?

Issue#3:
  With the new Kafka spout, we have faced this exception many times:

 org.apache.kafka.clients.consumer.CommitFailedException: Commit cannot be
completed since the group has already rebalanced and assigned the
partitions to another member. This means that the time between subsequent
calls to poll() was longer than the configured max.poll.interval.ms, which
typically implies that the poll loop is spending too much time message
processing. You can address this either by increasing the session timeout
or by reducing the maximum size of batches returned in poll() with
max.poll.records.
at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.sendOffsetCommitRequest(ConsumerCoordinator.java:702)
at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.commitOffsetsSync(ConsumerCoordinator.java:581)
at org.apache.kafka.clients.consumer.KafkaConsumer.commitSync(KafkaConsumer.java:1124)
at org.apache.storm.kafka.spout.KafkaSpout.commitOffsetsForAckedTuples(KafkaSpout.java:384)
at org.apache.storm.kafka.spout.KafkaSpout.nextTuple(KafkaSpout.java:220)
at org.apache.storm.daemon.executor$fn__10780$fn__10795$fn__10826.invoke(executor.clj:646)
at org.apache.storm.util$async_loop$fn__555.invoke(util.clj:484)
at clojure.lang.AFn.run(AFn.java:22)
at java.lang.Thread.run(Thread.java:748)


  => Are we the only ones experiencing such issues with Storm 1.1.0/1.1.x
latest?
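
The two knobs that exception message names are plain Kafka consumer
properties (max.poll.interval.ms exists since kafka-clients 0.10.1). A
minimal sketch of the values involved, with illustrative numbers; how the
map is handed to the spout depends on the storm-kafka-client version in
use:

    import java.util.HashMap;
    import java.util.Map;
    import org.apache.kafka.clients.consumer.ConsumerConfig;

    public class PollTuning {
        public static Map<String, Object> props() {
            Map<String, Object> kafkaProps = new HashMap<>();
            // Fewer records per poll() means less processing time between
            // consecutive polls.
            kafkaProps.put(ConsumerConfig.MAX_POLL_RECORDS_CONFIG, 200);
            // Allow more time between polls before the group coordinator
            // considers the member dead and rebalances.
            kafkaProps.put(ConsumerConfig.MAX_POLL_INTERVAL_MS_CONFIG, 600_000);
            return kafkaProps;
        }
    }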

Note: We are considering writing our own Kafka Spout, as we're time-bound
to move to Kafka 0.10.x consumers & producers (to prepare our next step
with Kafka security, which isn't available with Kafka 0.9.x). We will miss
the integration of Kafka lag in Storm UI, but currently we do not understand
how to solve the regressions we observe with the latest Storm Kafka client
spout.

Are there other Storm developers/users who jumped into this alternative?

Best regards,

Alexandre Vermeerbergen




2017-06-28 9:09 GMT+02:00 Alexandre Vermeerbergen:

> Hello,
>
> Coming back to my original post in this list, we have two issues with
> the latest 1.1.x StormKafkaClient spout in our setup:
>
> Issue#1:
>  Initial lag (which we didn't have using the classic Storm Kafka spout)
>For this issue, my understanding of Kristopher's answer is that this is
> "by design" of the StormKafkaClient spout, whose instances progressively
> join the Kafka consumer group, which causes consumer rebalancing. This
> rebalancing is "slow", which means that until all spout instances are
> started, the topology starts with an "initial Kafka lag"
>=> Is my understanding correct?
>=> Why don't we have such behavior with the old Storm Kafka spout?
>=> Is this annoying initial lag tracked by a JIRA?
>
>
>
> 2017-06-27 17:15 GMT+02:00 Alexandre Vermeerbergen <
> avermeerber...@gmail.com>:
>
>> Hello Kristopher,
>>
>> We built Storm 1.1.1-latest using yesterday's (2017-06-26)  artifacts
>> downloaded from https://github.com/apache/storm/tree/1.x-branch.
>> 
>>
>> Is your latest PR supposed to be in what we downloaded & built, or do we
>> need to upgrade in some way (which?)
>>
>> Should we change anything to our settings?
>>
>> Please note that I mistakenly wrote that our "Kafka consumer strategy is
>> set to EARLY" whereas it's "Kafka consumer strategy is set to LATEST", if
>> that matters.
>>
>> Best regards,
>> Alexandre Vermeerbergen
>>
>> 2017-06-27 16:37 GMT+02:00 Kristopher Kane :
>>
>>> Correction: https://github.com/apache/storm/pull/2174 has all of what I
>>> was
>>> doing and more.
>>>
>>> On Tue, Jun 27, 2017 at 9:33 AM, Kristopher Kane 
>>> wrote:
>>>
>>> > Alexandre,
>>> >
>>> > There are quite a few JIRAs and discussions around this recently.  The
>>> > default behavior for storm-kafka-client is the 'subscribe' API, which
>>> > causes the 

Re: Lag issues using Storm 1.1.1 latest build with StormKafkaClient 1.1.1 vs old StormKafka spouts

2017-06-28 Thread Alexandre Vermeerbergen
Hello,

Coming back to my original post in this list, we have two issues with
the latest 1.1.x StormKafkaClient spout in our setup:

Issue#1:
 Initial lag (which we didn't have using the classic Storm Kafka spout)
   For this issue, my understanding of Kristopher's answer is that this is
"by design" of the StormKafkaClient spout, whose instances progressively
join the Kafka consumer group, which causes consumer rebalancing. This
rebalancing is "slow", which means that until all spout instances are
started, the topology starts with an "initial Kafka lag"
   => Is my understanding correct?
   => Why don't we have such behavior with the old Storm Kafka spout?
   => Is this annoying initial lag tracked by a JIRA?



2017-06-27 17:15 GMT+02:00 Alexandre Vermeerbergen :

> Hello Kristopher,
>
> We built Storm 1.1.1-latest using yesterday's (2017-06-26)  artifacts
> downloaded from https://github.com/apache/storm/tree/1.x-branch.
> 
>
> Is your latest PR supposed to be in what we downloaded & built, or do we
> need to upgrade in some way (which?)
>
> Should we change anything to our settings?
>
> Please note that I mistakenly wrote that our "Kafka consumer strategy is
> set to EARLY" whereas it's "Kafka consumer strategy is set to LATEST", if
> that matters.
>
> Best regards,
> Alexandre Vermeerbergen
>
> 2017-06-27 16:37 GMT+02:00 Kristopher Kane :
>
>> Correction: https://github.com/apache/storm/pull/2174 has all of what I
>> was
>> doing and more.
>>
>> On Tue, Jun 27, 2017 at 9:33 AM, Kristopher Kane 
>> wrote:
>>
>> > Alexandre,
>> >
>> > There are quite a few JIRAs and discussions around this recently.  The
>> > default behavior for storm-kafka-client is the 'subscribe' API, which
>> > causes the immediate lag you see, since a rebalance will happen
>> > spout(n)-1 times just from the spouts spinning up.
>> >
>> > There is a Builder for ManualPartitionNamedSubscription and the
>> > RoundRobinManualPartitioner (which use the 'assign' Kafka consumer API),
>> > but they don't work at all.  I hope to have a PR in today
>> > to fix these on 1.x-branch.
>> >
>> > The other JIRAs I mentioned are for a redesign of this spout or other
>> > more drastic changes.  My goal is a bug fix for a version of the spout
>> > that doesn't provide unnecessary duplicates.
>> >
>> > Kris
>> >
>> > On Tue, Jun 27, 2017 at 8:00 AM, Alexandre Vermeerbergen <
>> > avermeerber...@gmail.com> wrote:
>> >
>> >> Hello All,
>> >>
>> >> We have been running for a while our real-time supervision application
>> >> based on Apache Storm 1.0.3 with Storm Kafka Spouts (old consumer:
>> >> storm-kafka) and with our Kafka Broker cluster based on Apache Kafka
>> >> 0.10.1.0.
>> >>
>> >> Backpressure is activated with default parameters.
>> >>
>> >> Key                                     Value
>> >> backpressure.disruptor.high.watermark   0.9
>> >> backpressure.disruptor.low.watermark    0.4
>> >> task.backpressure.poll.secs             30
>> >> topology.backpressure.enable            true
>> >>
>> >>
>> >>
>> >> We decided to upgrade to Apache Storm 1.1.0 to benefit from the new
>> >> Kafka spout (storm-kafka-client lib), whose consumer no longer depends
>> >> on Zookeeper.
>> >>
>> >> After upgrading, we had several issues with Kafka consumption.
>> >>
>> >>
>> >> We saw that several JIRAs were opened and resolved on Apache Storm
>> 1.1.1.
>> >>
>> >> So we decided to upgrade to the latest available Apache Storm 1.1.x
>> >> code built from source (2017-06-26), but we still have issues:
>> >>
>> >>
>> >>
>> >> 1. The Kafka lag increases constantly, and this leads to the overload
>> >> of the Storm worker running the Kafka spout. In the end, the worker
>> >> crashes and is automatically restarted by Storm.
>> >>
>> >>
>> >>
>> >> With the old Kafka spout version, we had a lag most of the time below
>> >> 1.
>> >>
>> >> With the new one, we are starting with a Kafka lag of about 3 and
>> >> increasing until crash.
>> >>
>> >>
>> >>
>> >> 2. With the new Kafka Spout, we have faced this exception many times:
>> >>
>> >>
>> >>
>> >> org.apache.kafka.clients.consumer.CommitFailedException: Commit cannot be
>> >> completed since the group has already rebalanced and assigned the
>> >> partitions to another member. This means that the time between subsequent
>> >> calls to poll() was longer than the configured max.poll.interval.ms,
>> >> which typically implies that the poll loop is spending too much time
>> >> message processing. You can address this either by increasing the session
>> >> timeout or by reducing the maximum size of batches returned in poll() with
>> >> max.poll.records. at
>> >> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.sendOffsetCommitRequest(ConsumerCoordinator.java:702)
>> >> at
>> 
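
On the 'subscribe' vs 'assign' point quoted above, a minimal illustration
of the two Kafka 0.10.x consumer APIs; this is plain kafka-clients code,
not Storm code, and the broker, group, topic, and partitions are made up:

    import java.util.Arrays;
    import java.util.Collections;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.common.TopicPartition;

    public class SubscribeVsAssign {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "broker1:9092");
            props.put("group.id", "my-group");
            props.put("key.deserializer",
                    "org.apache.kafka.common.serialization.StringDeserializer");
            props.put("value.deserializer",
                    "org.apache.kafka.common.serialization.StringDeserializer");

            // 'subscribe': group-managed. Every consumer joining or leaving
            // the group triggers a rebalance across all members, which is
            // why starting N spout instances rebalances roughly N-1 times.
            KafkaConsumer<String, String> c1 = new KafkaConsumer<>(props);
            c1.subscribe(Collections.singletonList("my-topic"));

            // 'assign': static partition assignment. No group coordination,
            // so no rebalance when other consumers start up.
            KafkaConsumer<String, String> c2 = new KafkaConsumer<>(props);
            c2.assign(Arrays.asList(new TopicPartition("my-topic", 0),
                                    new TopicPartition("my-topic", 1)));

            c1.close();
            c2.close();
        }
    }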

Re: [DISCUSS] Storm 2.0 Roadmap

2017-06-28 Thread Jungtaek Lim
That's great news that the metrics work is ready!

I'm +1 to Taylor's proposal, but in order to respect semantic versioning, I
propose the following modifications to it:

- create 1.1.x-branch with target version 1.1.1-SNAPSHOT and port back only
bug fixes to the 1.1.x-branch
- change the target version of 1.x-branch to 1.2.0-SNAPSHOT

If we all agree on the above, I would like to volunteer for the back-port work.

Thanks,
Jungtaek Lim (HeartSaVioR)

On Wed, Jun 28, 2017 at 10:09 AM, Harsha wrote:

> +1 for the above-stated approach of releasing 1.2.0 with metrics
> -Harsha
>
> On Tue, Jun 27, 2017, at 12:17 PM, P. Taylor Goetz wrote:
> > The work on metrics is ready for a pull request to 1.x-branch from the
> > feature branch. I’ve held off because we haven’t reached consensus on a
> > path forward with the 1.x release lines.
> >
> > I’d like to propose the following for the 1.x line:
> >
> > 1. Create a branch for 1.2 so we have a branch to review the metrics
> > stuff.
> > 2. Release 1.1.1
> > 3. Review/merge metrics work. Port metrics to master.
> > 4. Release 1.2.0
> > 5. Put the entire 1.x line into maintenance mode. Drop support for 1.0.x.
> > (we would only support 1.2.x and 1.1.x which are very closely aligned).
> >
> > Dropping support for 1.0.x line would eliminate the need to maintain one
> > of the fairly heavily diverged branches. The 1.2.x and 1.1.x would be
> > very closely aligned. I just up merged metrics_v2 against 1.x-branch
> > after a while, and there were no conflicts.
> >
> > That would give us a little more bandwidth to focus on 2.0 and needed bug
> > fixes to the 1.x line like some of the issues raised with
> > storm-kafka-client. We could even start releasing alpha/beta versions of
> > 2.0 in parallel to the steps above.
> >
> > Any thoughts on that approach?
> >
> > -Taylor
> >
> >
> > > On Jun 24, 2017, at 1:21 AM, Jungtaek Lim  wrote:
> > >
> > > Yes I prefer option 1, but it might depend on the progress of metrics
> V2.
> > > If it can be done within the predictable near future I'm OK to pick
> > > option 2, but if not, we may be better off focusing on releasing 2.0.0
> > > and making it really happen.
> > >
> > > Whichever way we go, I feel it's time to track the remaining work on
> > > Storm 2.0.0. I found some bugs on the master branch so I filed issues,
> > > and we have remaining port work (UI and logviewer). We have some other
> > > improvements targeted for 2.0.0: worker redesign, Beam integration,
> > > and so on, and we don't track their progress at all. I don't think we
> > > should wait for features whose progress is not transparent (in other
> > > words, we don't know when they will be finished).
> > >
> > > - Jungtaek Lim (HeartSaVioR)
> > >
> > > On Sat, Jun 24, 2017 at 5:19 AM, P. Taylor Goetz wrote:
> > >
> > >> Bobby/Jungtaek,
> > >>
> > >> Are you saying you want to forego the 1.2 “metrics_v2” release and
> > >> include it only in 2.0? (I ask because that work is already based on
> > >> 1.x-branch, and forward-porting it to master is relatively simple.)
> > >> I’d kind of like that work to go out soon.
> > >>
> > >> If we go with option 1, I would want to see a 2.0 release (even if
> > >> it’s a “beta” or “preview”) before putting the 1.x line into
> > >> maintenance mode.
> > >>
> > >> -Taylor
> > >>
> > >>> On Jun 23, 2017, at 9:51 AM, Bobby Evans wrote:
> > >>>
> > >>> I see 2 ways to address this.
> > >>> 1) We put the 1.x line into maintenance mode like with 0.10.  We
> > >>> don't backport anything except bug fixes.
> > >>> 2) We backport a lot of the backwards compatible changes from 2.x
> > >>> to 1.x.
> > >>> My personal preference is 1.  It makes it clear the direction we
> > >>> want to go in.  The biggest issue is that we probably would want to
> > >>> do a 2.x release sooner rather than later.  Even if we don't get all
> > >>> of the features that people want, if we just get a release out we can
> > >>> add in new features if they are backwards compatible, or we can
> > >>> create a 3.x line that would have the breaking changes in it.
> > >>>
> > >>> - Bobby
> > >>>
> > >>>
> > >>> On Thursday, June 22, 2017, 7:39:55 PM CDT, Jungtaek Lim <
> > >> kabh...@gmail.com> wrote:
> > >>>
> > >>> I'd like to bump this again instead of initiating a new discussion
> > >>> thread.
> > >>>
> > >>> I've been having a hard time creating and applying pull requests for
> > >>> both master and 1.x-branch, and that's really painful and sometimes a
> > >>> blocker for me in the merge step.
> > >>> The two branches have diverged more heavily than 0.10 and 1.0.0 did;
> > >>> even the IDE can't switch between the branches smoothly. We didn't
> > >>> even address the checkstyle issue yet, but after addressing it, they
> > >>> could be "completely" diverged. The JDK version is another major
> > >>> issue, since the pull requests targeted for the master branch are not
> > >>> checked against JDK 7, and some of them cause issues regarding the
> > >>> JDK version while 

java.io.IOException: Connection reset by peer in DRPC

2017-06-28 Thread sam mohel
I submitted two topologies in production mode. The first one has a data set
of size 215 MB, worked well, and gave me the results. The second topology
has a data set of size 170 MB with the same configuration, but it stopped
working after some time and didn't complete its results.

The error I got in the drpc log file is:

TNonblockingServer [WARN] Got an IOException in internalRead!
java.io.IOException: Connection reset by peer

I couldn't figure out where the problem is, as it was supposed to work well
since the second data set is smaller in size.
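
One common cause of "Connection reset by peer" in the DRPC server log is
the DRPC client closing its connection after hitting the request timeout,
which a topology that is slower per request can trip even on a smaller
data set. Whether that is what happens here is only a guess; if it is,
raising the timeout on the DRPC servers is one thing to try (the value is
illustrative):

    # storm.yaml on the DRPC server nodes
    drpc.request.timeout.secs: 1200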


What is the difference between UNCOMMITTED_EARLIEST and UNCOMMITTED_LATEST ?

2017-06-28 Thread Zhechao Ma
The storm-kafka-client documentation explains these two values almost
identically, except for the last word.

https://github.com/apache/storm/blob/master/docs/storm-kafka-client.md

   - UNCOMMITTED_EARLIEST (DEFAULT) means that the kafka spout polls
   records from the last committed offset, if any. If no offset has been
   committed, it behaves as EARLIEST.
   - UNCOMMITTED_LATEST means that the kafka spout polls records from the
   last committed offset, if any. If no offset has been committed, it behaves
   as LATEST.

Or is that a mistake?

-- 
Thanks
Zhechao Ma
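
As far as I can tell the near-identical wording is intentional: both
strategies resume from the last committed offset, and they differ only in
the fallback when no commit exists yet (the beginning of the partition vs.
the end of it). A minimal sketch of selecting one, assuming the 1.1.x
builder API; the broker and topic names are illustrative:

    import org.apache.storm.kafka.spout.KafkaSpoutConfig;
    import org.apache.storm.kafka.spout.KafkaSpoutConfig.FirstPollOffsetStrategy;

    public class OffsetStrategyExample {
        public static KafkaSpoutConfig<String, String> conf() {
            return KafkaSpoutConfig.builder("broker1:9092", "events")
                    // Resume from the last committed offset; if none exists,
                    // start from the end of the partition, not the beginning.
                    .setFirstPollOffsetStrategy(
                            FirstPollOffsetStrategy.UNCOMMITTED_LATEST)
                    .build();
        }
    }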