[GitHub] storm pull request #2241: STORM-2306 : Messaging subsystem redesign.

2017-07-31 Thread arunmahadevan
Github user arunmahadevan commented on a diff in the pull request:

https://github.com/apache/storm/pull/2241#discussion_r130523483
  
--- Diff: conf/defaults.yaml ---
@@ -253,11 +244,15 @@ topology.trident.batch.emit.interval.millis: 500
 topology.testing.always.try.serialize: false
 topology.classpath: null
 topology.environment: null
-topology.bolts.outgoing.overflow.buffer.enable: false
-topology.disruptor.wait.timeout.millis: 1000
-topology.disruptor.batch.size: 100
-topology.disruptor.batch.timeout.millis: 1
-topology.disable.loadaware.messaging: false
+topology.disruptor.wait.timeout.millis: 1000  # TODO: Roshan: not used, but we may/not want this behavior
+topology.transfer.buffer.size: 5
+topology.transfer.batch.size: 10
+topology.executor.receive.buffer.size: 5
+topology.producer.batch.size: 1000
+topology.flush.tuple.freq.millis: 100
--- End diff --

@roshannaik, why is it 100 ms? Is it based on some benchmarks?

As per the [design
doc](https://docs.google.com/document/d/1PpQaWVHg06-OqxTzYxQlzg1yEhzA4Y46_NC7HMO6tsI/edit#heading=h.gjdgxs)
posted in the JIRA, the JCTools - MPSCArrayQ provides a throughput of 68
million/sec with 20 producers, and the performance doesn't seem to degrade much
as the number of producers increases. If so, why do we need to batch and flush
the tuples to the consumer queue? If the producers enqueued the events directly
into the receiver's queue, it would simplify the design and address the latency
concerns.

Also, I assume that if the batch size is set to 1, the events are directly
enqueued and the flush threads are not started?
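
To make the question concrete, here is a minimal sketch (not the PR's code) of producer-side batching in front of a bounded receive queue. `BatchingProducer` and its methods are hypothetical names, and `java.util.concurrent.ArrayBlockingQueue` merely stands in for the JCTools queue. In this sketch a batch size of 1 degenerates to direct enqueue; whether the PR behaves the same way is exactly what is being asked above.

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.concurrent.ArrayBlockingQueue;

// Hypothetical illustration only; this is NOT Storm's implementation.
final class BatchingProducer<T> {
    private final Queue<T> receiveQueue;                       // stand-in for the consumer's receive queue
    private final int batchSize;                               // cf. topology.producer.batch.size
    private final ArrayDeque<T> pending = new ArrayDeque<>();  // producer-local batch

    BatchingProducer(Queue<T> receiveQueue, int batchSize) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be >= 1");
        }
        this.receiveQueue = receiveQueue;
        this.batchSize = batchSize;
    }

    // Buffers the tuple and flushes once the batch is full.
    // With batchSize == 1 every tuple is handed to the queue immediately.
    void publish(T tuple) {
        pending.add(tuple);
        if (pending.size() >= batchSize) {
            flush();
        }
    }

    // In a real system a timer (cf. topology.flush.tuple.freq.millis) would
    // call this periodically so that partial batches do not wait forever.
    // Tuples that do not fit into a full queue stay pending.
    void flush() {
        while (!pending.isEmpty() && receiveQueue.offer(pending.peek())) {
            pending.poll();
        }
    }

    int pendingCount() {
        return pending.size();
    }

    public static void main(String[] args) {
        Queue<String> queue = new ArrayBlockingQueue<>(8);
        BatchingProducer<String> producer = new BatchingProducer<>(queue, 3);
        producer.publish("t1");
        producer.publish("t2");
        System.out.println("visible before flush: " + queue.size());
        producer.flush();
        System.out.println("visible after flush: " + queue.size());
    }
}
```

The trade-off shown here is the one under discussion: a partial batch is invisible to the consumer until the next flush, so the flush interval bounds the added latency.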


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] storm pull request #2241: STORM-2306 : Messaging subsystem redesign.

2017-07-31 Thread HeartSaVioR
Github user HeartSaVioR commented on a diff in the pull request:

https://github.com/apache/storm/pull/2241#discussion_r130522690
  
--- Diff: storm-server/src/main/java/org/apache/storm/daemon/supervisor/BasicContainer.java ---
@@ -346,7 +346,7 @@ protected String getWildcardDir(File dir) {
 }
 
 protected List frameworkClasspath(SimpleVersion topoVersion) {
-File stormWorkerLibDir = new File(_stormHome, "lib-worker");
+File stormWorkerLibDir = new File(_stormHome, "lib");
--- End diff --

This was an effort to reduce dependencies on the worker. If the line causes an
issue, let's fix it. Which issues were you seeing before this change?




Re: Considerably slow building website

2017-07-31 Thread Jungtaek Lim
I also found that we don't expose 1.0.4 in the documentation dropdown, and the
1.0.4 directory has not been created in the 'publish/releases' directory. Maybe
that was missed as well.

On Tue, Aug 1, 2017 at 7:36 AM, Jungtaek Lim wrote:

> Hi devs,
>
> I'm trying to modify the 1.0.4 release notes because a user reported an
> incorrect CHANGELOG. Surprisingly, it took about 50 minutes to serve the
> website locally. Any hints for reducing the time? 50 minutes just to build
> the website is really annoying, and nobody wants to wait that long after
> modifying a single file.
>
> I also found that the Storm 1.1.0 release notes markdown file is missing.
> Taylor, could you add it back to the SVN repo?
>
> Thanks,
> Jungtaek Lim (HeartSaVioR)
>


Considerably slow building website

2017-07-31 Thread Jungtaek Lim
Hi devs,

I'm trying to modify the 1.0.4 release notes because a user reported an
incorrect CHANGELOG. Surprisingly, it took about 50 minutes to serve the
website locally. Any hints for reducing the time? 50 minutes just to build
the website is really annoying, and nobody wants to wait that long after
modifying a single file.

I also found that the Storm 1.1.0 release notes markdown file is missing.
Taylor, could you add it back to the SVN repo?

Thanks,
Jungtaek Lim (HeartSaVioR)


[GitHub] storm issue #2250: WIP: STORM-2665: Adapt Kafka's release note generation sc...

2017-07-31 Thread srdo
Github user srdo commented on the issue:

https://github.com/apache/storm/pull/2250
  
Whoops, seems like there's a different jira module in the directory I put 
the script in than I expected. Will move it. Here's some sample output after 
running `$ ./release_notes.py 1.1.1 > relnotes.html` 
https://paste.apache.org/JFDP




[GitHub] storm issue #2250: WIP: STORM-2665: Adapt Kafka's release note generation sc...

2017-07-31 Thread hmcl
Github user hmcl commented on the issue:

https://github.com/apache/storm/pull/2250
  
@srdo Can you please add a screenshot or a link to a sample of the
output? Thanks.




[GitHub] storm issue #2243: STORM-2658: Extract storm-kafka-client examples to storm-...

2017-07-31 Thread hmcl
Github user hmcl commented on the issue:

https://github.com/apache/storm/pull/2243
  
+1. Thanks @srdo 




Re: [DISCUSS] Remove CHANGELOG file

2017-07-31 Thread Jungtaek Lim
I'm seeing several people worried about keeping JIRA up to date.

I think the main reason updates get missed is that we're doing them manually.
If you remember the PR about adopting the Kafka merge script, it also updates
the corresponding JIRA issue at the end of the merge. If you're not aware of
it, please refer to https://github.com/apache/storm/pull/1468 for the long
explanation and discussion.

Our main concern about adopting the Spark/Kafka merge script was that it
squashes commits (and maybe also that it creates no merge commit). I
personally still see a huge benefit in squashing (the commit list itself
becomes the CHANGELOG), and I see that some others are fans of squashed
commits, but we could still modify the script to create a merge commit as we
do today. That's something we can discuss and decide; I don't think it's a
blocker for the merge script.

Let's suppose we get rid of the commit that updates CHANGELOG, and we still
rely on merge commits. Could we determine which JIRA issue is addressed from
the merge commit's title alone? Yes or no, depending on how contributors
name their branches, and we can't force that (forcing branch names would be
really annoying). So the title of the merge commit should conform to a formal
format (say, contain the JIRA title or so), or we should just keep squashed
commits. Refining the merge commit title manually would be another pain for
mergers, so it should be automated as well.
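
As an illustration of the automated approach, a merge script could look for an issue key in the merge commit title. The sketch below is hypothetical: the class name `JiraKeyExtractor` and the assumed `STORM-<number>` key format are for illustration only and are not taken from any existing script.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; not part of the Kafka/Spark merge scripts.
final class JiraKeyExtractor {
    // Assumption: issue keys look like "STORM-" followed by digits.
    private static final Pattern STORM_KEY = Pattern.compile("\\bSTORM-\\d+\\b");

    // Returns the first issue key found in the commit title, if any.
    static Optional<String> extract(String commitTitle) {
        Matcher matcher = STORM_KEY.matcher(commitTitle);
        return matcher.find() ? Optional.of(matcher.group()) : Optional.empty();
    }

    public static void main(String[] args) {
        // Works only when the contributor happened to put the key in the branch name.
        System.out.println(extract("Merge branch 'STORM-2665' of https://github.com/srdo/storm"));
        // A branch named freely gives the script nothing to work with.
        System.out.println(extract("Merge branch 'fix-release-notes' of https://github.com/srdo/storm"));
    }
}
```

The second call shows the problem described above: without an enforced title format, the lookup fails whenever the branch name does not carry the key.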

tl;dr: This is the time to reconsider the merge script, maybe modifying the
Spark/Kafka merge script to fit the Storm project. It would help with
squashing commits (only if we decide to do so), or with setting an
informative title on the merge commit if we stay with merge commits. It would
also help with resolving the corresponding JIRA issue.

Thanks,
Jungtaek Lim (HeartSaVioR)

On Tue, Aug 1, 2017 at 5:48 AM, Stig Rohde Døssing wrote:

> Would it fit alongside the other release artifacts in e.g.
> https://dist.apache.org/repos/dist/dev/storm/apache-storm-1.1.1-rc2/? That
> seems to be what Kafka is doing as well
> https://dist.apache.org/repos/dist/release/kafka/0.11.0.0/.
>
> If we could put the change log up along the other artifacts, we could
> probably get away with not having it included in the src/bin distributions,
> because people could get the change log from the same mirrors they got the
> distributions from.
>
> 2017-07-31 22:37 GMT+02:00 P. Taylor Goetz :
>
> > A couple thoughts/questions regarding the mechanics of publishing the
> > resulting HTML file.
> >
> > When voting on release candidates, in the past we point to the CHANGELOG
> > file in git. What would we do in this case?
> >
> > My assumption is the release manager would generate the file and post it
> > to their account on people.apache.org. After a successful vote, the
> > change log would be published to the storm.a.o website, presumably in a
> > /changelogs/${version}.html file.
> >
> > One could argue we could simply link to a JIRA filter for that release,
> > but I don’t like the idea of linking to something inherently mutable as a
> > release artifact.
> >
> > Would we include the file in the source and/or binary distributions? If
> > so, where, and what would be the process?
> >
> > I’m interested in hearing others’ thoughts.
> >
> > -Taylor
> >
> >
> > > On Jul 31, 2017, at 3:50 PM, Stig Rohde Døssing <
> stigdoess...@gmail.com>
> > wrote:
> > >
> > > Opened JIRA here https://issues.apache.org/jira/browse/STORM-2665 and
> > took
> > > a look at adjusting Kafka's script here
> > > https://github.com/apache/storm/pull/2250
> > >
> > > 2017-07-31 21:02 GMT+02:00 Bobby Evans :
> > >
> > >> So it looks like we all agree, now we just need someone to file a JIRA
> > and
> > >> a corresponding pull request.  The kafka script looks like a good
> place
> > to
> > >> start, but we can iterate on it in the pull request to try and address
> > >> Taylor's concern about JIRA not being up to date.  I would love to do
> > it,
> > >> but I am really overloaded right now so if someone else wants to take
> > lead
> > >> on it that would be great.
> > >>
> > >>
> > >> - Bobby
> > >>
> > >>
> > >> On Monday, July 31, 2017, 1:45:14 PM CDT, P. Taylor Goetz <
> > >> ptgo...@gmail.com> wrote:
> > >>
> > >> I’m all for getting rid of the current process for CHANGELOG. My only
> > >> concern with any JIRA-based solution is that we would need to be very
> > good
> > >> about setting the “Fix Version” field properly when merging a patch
> and
> > >> updating the associated ticket. In the past I’ve seen a lot of patches
> > >> merged without the associated JIRA updated. If we’re going to rely on
> > JIRA
> > >> as the source of truth for change logs, we need to be very
> conscientious
> > >> about updating JIRA as necessary.
> > >>
> > >> -Taylor
> > >>
> > >>> On Jul 31, 2017, at 10:06 AM, Bobby Evans
>  > >
> > >> wrote:
> > >>>
> > >>> I am happy to switch as soon as someone has a working alternative.
> The
> > >> big thing in my opinion is giving end 

[GitHub] storm pull request #2218: STORM-2614: Enhance stateful windowing to persist ...

2017-07-31 Thread srdo
Github user srdo commented on a diff in the pull request:

https://github.com/apache/storm/pull/2218#discussion_r130462981
  
--- Diff: storm-client/src/jvm/org/apache/storm/windowing/WindowManager.java ---
@@ -111,14 +125,86 @@ public void add(Event windowEvent) {
 LOG.debug("Got watermark event with ts {}", windowEvent.getTimestamp());
 }
 track(windowEvent);
-compactWindow();
+if (!stateful) {
+compactWindow();
+}
 }
 
 /**
  * The callback invoked by the trigger policy.
  */
 @Override
 public boolean onTrigger() {
+return stateful ? doOnTriggerStateful() : doOnTrigger();
+}
+
+private static class IteratorStatus {
+private boolean valid = true;
+
+void invalidate() {
+valid = false;
+}
+
+boolean isValid() {
+return valid;
+}
+}
+
+private static <T> Iterator<T> expiringIterator(Iterator<T> inner, IteratorStatus status) {
+return new Iterator<T>() {
+@Override
+public boolean hasNext() {
+if (status.isValid()) {
+return inner.hasNext();
+}
+throw new IllegalStateException("Stale iterator");
+}
+
+@Override
+public T next() {
+if (status.isValid()) {
+return inner.next();
+}
+throw new IllegalStateException("Stale iterator");
+}
+};
+}
+
+private boolean doOnTriggerStateful() {
+Supplier<Iterator<T>> scanEventsStateful = this::scanEventsStateful;
+Iterator<T> it = scanEventsStateful.get();
+boolean hasEvents = it.hasNext();
+if (hasEvents) {
+final IteratorStatus status = new IteratorStatus();
+LOG.debug("invoking windowLifecycleListener onActivation with iterator");
+// reuse the retrieved iterator
+Supplier<Iterator<T>> wrapper = new Supplier<Iterator<T>>() {
+Iterator<T> initial = it;
+@Override
+public Iterator<T> get() {
+if (status.isValid()) {
--- End diff --

Makes sense, thanks. Could you expand the IllegalStateException to mention 
this? Why is the iterator being updated in line 191 btw?
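
For illustration, here is a standalone sketch of the expiring-iterator pattern from the diff, with a more descriptive exception message. The message wording and the `checkValid` helper are assumptions made for this example; the actual reason the message should mention is the one discussed earlier in this review, which is not reproduced here.

```java
import java.util.Iterator;

// Standalone sketch modelled on the diff above; not the PR's final code.
final class ExpiringIterators {

    // Shared flag that the owner flips once the iterator must no longer be used.
    static final class IteratorStatus {
        private boolean valid = true;

        void invalidate() {
            valid = false;
        }

        boolean isValid() {
            return valid;
        }
    }

    // Wraps an iterator so that every access fails once the status is invalidated.
    static <T> Iterator<T> expiringIterator(Iterator<T> inner, IteratorStatus status) {
        return new Iterator<T>() {
            @Override
            public boolean hasNext() {
                checkValid();
                return inner.hasNext();
            }

            @Override
            public T next() {
                checkValid();
                return inner.next();
            }

            private void checkValid() {
                if (!status.isValid()) {
                    // Hypothetical wording; adjust to state the real reason.
                    throw new IllegalStateException(
                        "Stale iterator: it has been invalidated and must not be used any more");
                }
            }
        };
    }
}
```

Centralizing the check in one helper keeps the message in a single place, so expanding it later touches only one line.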




[GitHub] storm issue #2243: STORM-2658: Extract storm-kafka-client examples to storm-...

2017-07-31 Thread srdo
Github user srdo commented on the issue:

https://github.com/apache/storm/pull/2243
  
@hmcl I think I addressed everything. Please look again. Thanks.




Re: [DISCUSS] Remove CHANGELOG file

2017-07-31 Thread Stig Rohde Døssing
Would it fit alongside the other release artifacts in e.g.
https://dist.apache.org/repos/dist/dev/storm/apache-storm-1.1.1-rc2/? That
seems to be what Kafka is doing as well
https://dist.apache.org/repos/dist/release/kafka/0.11.0.0/.

If we could put the change log up along the other artifacts, we could
probably get away with not having it included in the src/bin distributions,
because people could get the change log from the same mirrors they got the
distributions from.

2017-07-31 22:37 GMT+02:00 P. Taylor Goetz :

> A couple thoughts/questions regarding the mechanics of publishing the
> resulting HTML file.
>
> When voting on release candidates, in the past we point to the CHANGELOG
> file in git. What would we do in this case?
>
> My assumption is the release manager would generate the file and post it
> to their account on people.apache.org. After a successful vote, the
> change log would be published to the storm.a.o website, presumably in a
> /changelogs/${version}.html file.
>
> One could argue we could simply link to a JIRA filter for that release,
> but I don’t like the idea of linking to something inherently mutable as a
> release artifact.
>
> Would we include the file in the source and/or binary distributions? If
> so, where, and what would be the process?
>
> I’m interested in hearing others’ thoughts.
>
> -Taylor
>
>
> > On Jul 31, 2017, at 3:50 PM, Stig Rohde Døssing 
> wrote:
> >
> > Opened JIRA here https://issues.apache.org/jira/browse/STORM-2665 and
> took
> > a look at adjusting Kafka's script here
> > https://github.com/apache/storm/pull/2250
> >
> > 2017-07-31 21:02 GMT+02:00 Bobby Evans :
> >
> >> So it looks like we all agree, now we just need someone to file a JIRA
> and
> >> a corresponding pull request.  The kafka script looks like a good place
> to
> >> start, but we can iterate on it in the pull request to try and address
> >> Taylor's concern about JIRA not being up to date.  I would love to do
> it,
> >> but I am really overloaded right now so if someone else wants to take
> lead
> >> on it that would be great.
> >>
> >>
> >> - Bobby
> >>
> >>
> >> On Monday, July 31, 2017, 1:45:14 PM CDT, P. Taylor Goetz <
> >> ptgo...@gmail.com> wrote:
> >>
> >> I’m all for getting rid of the current process for CHANGELOG. My only
> >> concern with any JIRA-based solution is that we would need to be very
> good
> >> about setting the “Fix Version” field properly when merging a patch and
> >> updating the associated ticket. In the past I’ve seen a lot of patches
> >> merged without the associated JIRA updated. If we’re going to rely on
> JIRA
> >> as the source of truth for change logs, we need to be very conscientious
> >> about updating JIRA as necessary.
> >>
> >> -Taylor
> >>
> >>> On Jul 31, 2017, at 10:06 AM, Bobby Evans  >
> >> wrote:
> >>>
> >>> I am happy to switch as soon as someone has a working alternative.  The
> >> big thing in my opinion is giving end users a clear list of all of the
> >> changes that went into a release so they can review it for themselves.
> >> However we do it is fine with me as the current changelog file leaves a
> lot
> >> to be desired. I personally would be fine with us updating the web
> >> page/release notes to have a link to a JIRA query in it as a starting
> point.
> >>>
> >>> https://issues.apache.org/jira/issues/?jql=project%20%
> >> 3D%20STORM%20AND%20fixVersion%20in%20(1.0.4)%20ORDER%20BY%
> >> 20priority%20DESC
> >>> or
> >>> https://issues.apache.org/jira/issues/?jql=project%20%
> >> 3D%20STORM%20AND%20fixVersion%20in%20(1.1.1)%20ORDER%20BY%
> >> 20priority%20DESC
> >>> for example.
> >>> Later on we can start looking at more complex alternatives that run the
> >> above query and join it with the git revision history and possibly pull
> >> requests to give a more complete view for what has happened.
> >>>
> >>> - Bobby
> >>>
> >>>
> >>> On Monday, July 31, 2017, 1:42:11 AM CDT, Jungtaek Lim <
> >> kabh...@gmail.com> wrote:
> >>>
> >>> Let me also put long ago discussion about this:
> >>>
> >>> http://search-hadoop.com/m/Storm/8gnYyUdhVp1eajp31?subj=+
> >> DISCUSSION+More+convenient+way+to+maintain+committer+
> contributor+list+and+
> >> changelogs
> >>>
> >>>
> >>> In my view, from long ago discussion, Haohui and Bobby agreed to not
> >>> maintain CHANGELOG by hand. Haohui also suggested how to get them
> >>> automatically, whereas I just would want to remove it, but that's also
> >> OK)
> >>> We didn’t get agreement clearly about removing CHANGELOG but at least
> saw
> >>> our needs to automate it.
> >>>
> >>>
> >>> And in current discussion, again in my view, Roshan, Hugo, Stig agree
> to
> >>> remove CHANGELOG. I’ve been continuously claiming to remove CHANGELOG,
> >> so 3
> >>> PMC members and 1 contributor seem to agree on removing CHANGELOG, and
> at
> >>> least 2 more PMC members to not maintain CHANGELOG manually.
> >>>
> >>>
> >>> I 

Re: [DISCUSS] Remove CHANGELOG file

2017-07-31 Thread P. Taylor Goetz
A couple thoughts/questions regarding the mechanics of publishing the resulting 
HTML file.

When voting on release candidates, in the past we point to the CHANGELOG file 
in git. What would we do in this case?

My assumption is the release manager would generate the file and post it to 
their account on people.apache.org. After a successful vote, the change log 
would be published to the storm.a.o website, presumably in a 
/changelogs/${version}.html file.

One could argue we could simply link to a JIRA filter for that release, but I 
don’t like the idea of linking to something inherently mutable as a release 
artifact.

Would we include the file in the source and/or binary distributions? If so, 
where, and what would be the process?

I’m interested in hearing others’ thoughts.

-Taylor


> On Jul 31, 2017, at 3:50 PM, Stig Rohde Døssing  
> wrote:
> 
> Opened JIRA here https://issues.apache.org/jira/browse/STORM-2665 and took
> a look at adjusting Kafka's script here
> https://github.com/apache/storm/pull/2250
> 
> 2017-07-31 21:02 GMT+02:00 Bobby Evans :
> 
>> So it looks like we all agree, now we just need someone to file a JIRA and
>> a corresponding pull request.  The kafka script looks like a good place to
>> start, but we can iterate on it in the pull request to try and address
>> Taylor's concern about JIRA not being up to date.  I would love to do it,
>> but I am really overloaded right now so if someone else wants to take lead
>> on it that would be great.
>> 
>> 
>> - Bobby
>> 
>> 
>> On Monday, July 31, 2017, 1:45:14 PM CDT, P. Taylor Goetz <
>> ptgo...@gmail.com> wrote:
>> 
>> I’m all for getting rid of the current process for CHANGELOG. My only
>> concern with any JIRA-based solution is that we would need to be very good
>> about setting the “Fix Version” field properly when merging a patch and
>> updating the associated ticket. In the past I’ve seen a lot of patches
>> merged without the associated JIRA updated. If we’re going to rely on JIRA
>> as the source of truth for change logs, we need to be very conscientious
>> about updating JIRA as necessary.
>> 
>> -Taylor
>> 
>>> On Jul 31, 2017, at 10:06 AM, Bobby Evans 
>> wrote:
>>> 
>>> I am happy to switch as soon as someone has a working alternative.  The
>> big thing in my opinion is giving end users a clear list of all of the
>> changes that went into a release so they can review it for themselves.
>> However we do it is fine with me as the current changelog file leaves a lot
>> to be desired. I personally would be fine with us updating the web
>> page/release notes to have a link to a JIRA query in it as a starting point.
>>> 
>>> https://issues.apache.org/jira/issues/?jql=project%20%
>> 3D%20STORM%20AND%20fixVersion%20in%20(1.0.4)%20ORDER%20BY%
>> 20priority%20DESC
>>> or
>>> https://issues.apache.org/jira/issues/?jql=project%20%
>> 3D%20STORM%20AND%20fixVersion%20in%20(1.1.1)%20ORDER%20BY%
>> 20priority%20DESC
>>> for example.
>>> Later on we can start looking at more complex alternatives that run the
>> above query and join it with the git revision history and possibly pull
>> requests to give a more complete view for what has happened.
>>> 
>>> - Bobby
>>> 
>>> 
>>> On Monday, July 31, 2017, 1:42:11 AM CDT, Jungtaek Lim <
>> kabh...@gmail.com> wrote:
>>> 
>>> Let me also put long ago discussion about this:
>>> 
>>> http://search-hadoop.com/m/Storm/8gnYyUdhVp1eajp31?subj=+
>> DISCUSSION+More+convenient+way+to+maintain+committer+contributor+list+and+
>> changelogs
>>> 
>>> 
>>> In my view, from long ago discussion, Haohui and Bobby agreed to not
>>> maintain CHANGELOG by hand. Haohui also suggested how to get them
>>> automatically, whereas I just would want to remove it, but that's also
>> OK)
>>> We didn’t get agreement clearly about removing CHANGELOG but at least saw
>>> our needs to automate it.
>>> 
>>> 
>>> And in current discussion, again in my view, Roshan, Hugo, Stig agree to
>>> remove CHANGELOG. I’ve been continuously claiming to remove CHANGELOG,
>> so 3
>>> PMC members and 1 contributor seem to agree on removing CHANGELOG, and at
>>> least 2 more PMC members to not maintain CHANGELOG manually.
>>> 
>>> 
>>> I will initiate a VOTE thread if we need to. Again, release managers
>> would
>>> be affected by this change so I would want to hear Taylor’s opinion
>> before
>>> going forward, but this is clear pain point for mergers so will initiate
>> a
>>> VOTE thread in several days (at least in this week) if Taylor doesn’t put
>>> opinion on this or misses this discussion.
>>> 
>>> 
>>> Thanks,
>>> 
>>> Jungtaek Lim (HeartSaVioR)
>>> 
>>> On Fri, Jul 28, 2017 at 10:53 AM, Jungtaek Lim wrote:
>>> 
>>>> correction: other projects -> *some* other projects, though they're
>>>> popular projects (including in competition)
>>>> 
>>>> On Fri, Jul 28, 2017 at 10:51 AM, Jungtaek Lim wrote:
>>>> 
>>>>> I'm happy that there're other 

[GitHub] storm pull request #2243: STORM-2658: Extract storm-kafka-client examples to...

2017-07-31 Thread srdo
Github user srdo commented on a diff in the pull request:

https://github.com/apache/storm/pull/2243#discussion_r130449657
  
--- Diff: examples/storm-kafka-client-examples/src/main/java/org/apache/storm/kafka/spout/test/KafkaSpoutTopologyMainNamedTopics.java ---
@@ -0,0 +1,103 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ *   or more contributor license agreements.  See the NOTICE file
+ *   distributed with this work for additional information
+ *   regarding copyright ownership.  The ASF licenses this file
+ *   to you under the Apache License, Version 2.0 (the
+ *   "License"); you may not use this file except in compliance
+ *   with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ *   Unless required by applicable law or agreed to in writing, software
+ *   distributed under the License is distributed on an "AS IS" BASIS,
+ *   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ *   See the License for the specific language governing permissions and
+ *   limitations under the License.
+ */
+
+package org.apache.storm.kafka.spout.test;
+
+import static org.apache.storm.kafka.spout.KafkaSpoutConfig.FirstPollOffsetStrategy.EARLIEST;
+
+import org.apache.kafka.clients.consumer.ConsumerConfig;
+import org.apache.storm.Config;
+import org.apache.storm.StormSubmitter;
+import org.apache.storm.generated.StormTopology;
+import org.apache.storm.kafka.spout.ByTopicRecordTranslator;
+import org.apache.storm.kafka.spout.KafkaSpout;
+import org.apache.storm.kafka.spout.KafkaSpoutConfig;
+import org.apache.storm.kafka.spout.KafkaSpoutRetryExponentialBackoff;
+import org.apache.storm.kafka.spout.KafkaSpoutRetryExponentialBackoff.TimeInterval;
+import org.apache.storm.kafka.spout.KafkaSpoutRetryService;
+import org.apache.storm.kafka.trident.KafkaProducerTopology;
+import org.apache.storm.topology.TopologyBuilder;
+import org.apache.storm.tuple.Fields;
+import org.apache.storm.tuple.Values;
+
+public class KafkaSpoutTopologyMainNamedTopics {
+
+private static final String TOPIC_2_STREAM = "test_2_stream";
+private static final String TOPIC_0_1_STREAM = "test_0_1_stream";
+private static final String KAFKA_LOCAL_BROKER = "localhost:9092";
+private static final String TOPIC_0 = "kafka-spout-test";
+private static final String TOPIC_1 = "kafka-spout-test-1";
+private static final String TOPIC_2 = "kafka-spout-test-2";
+
+public static void main(String[] args) throws Exception {
+new KafkaSpoutTopologyMainNamedTopics().runMain(args);
+}
+
+protected void runMain(String[] args) throws Exception {
--- End diff --

I remembered why I removed it. LocalCluster is in the storm-server jar,
which isn't included by the example projects. I think including it would cause
conflicts when the jar is deployed to a real cluster. How about I move the
ability to run this from a local cluster to a test class? That should still
leave people able to run on a local cluster from an IDE, but wouldn't interfere
with the generated jar.




Re: [DISCUSS] Remove CHANGELOG file

2017-07-31 Thread Stig Rohde Døssing
Opened JIRA here https://issues.apache.org/jira/browse/STORM-2665 and took
a look at adjusting Kafka's script here
https://github.com/apache/storm/pull/2250

2017-07-31 21:02 GMT+02:00 Bobby Evans :

> So it looks like we all agree, now we just need someone to file a JIRA and
> a corresponding pull request.  The kafka script looks like a good place to
> start, but we can iterate on it in the pull request to try and address
> Taylor's concern about JIRA not being up to date.  I would love to do it,
> but I am really overloaded right now so if someone else wants to take lead
> on it that would be great.
>
>
> - Bobby
>
>
> On Monday, July 31, 2017, 1:45:14 PM CDT, P. Taylor Goetz <
> ptgo...@gmail.com> wrote:
>
> I’m all for getting rid of the current process for CHANGELOG. My only
> concern with any JIRA-based solution is that we would need to be very good
> about setting the “Fix Version” field properly when merging a patch and
> updating the associated ticket. In the past I’ve seen a lot of patches
> merged without the associated JIRA updated. If we’re going to rely on JIRA
> as the source of truth for change logs, we need to be very conscientious
> about updating JIRA as necessary.
>
> -Taylor
>
> > On Jul 31, 2017, at 10:06 AM, Bobby Evans 
> wrote:
> >
> > I am happy to switch as soon as someone has a working alternative.  The
> big thing in my opinion is giving end users a clear list of all of the
> changes that went into a release so they can review it for themselves.
> However we do it is fine with me as the current changelog file leaves a lot
> to be desired. I personally would be fine with us updating the web
> page/release notes to have a link to a JIRA query in it as a starting point.
> >
> > https://issues.apache.org/jira/issues/?jql=project%20%
> 3D%20STORM%20AND%20fixVersion%20in%20(1.0.4)%20ORDER%20BY%
> 20priority%20DESC
> > or
> > https://issues.apache.org/jira/issues/?jql=project%20%
> 3D%20STORM%20AND%20fixVersion%20in%20(1.1.1)%20ORDER%20BY%
> 20priority%20DESC
> > for example.
> > Later on we can start looking at more complex alternatives that run the
> above query and join it with the git revision history and possibly pull
> requests to give a more complete view for what has happened.
> >
> > - Bobby
> >
> >
> > On Monday, July 31, 2017, 1:42:11 AM CDT, Jungtaek Lim <
> kabh...@gmail.com> wrote:
> >
> > Let me also put long ago discussion about this:
> >
> > http://search-hadoop.com/m/Storm/8gnYyUdhVp1eajp31?subj=+
> DISCUSSION+More+convenient+way+to+maintain+committer+contributor+list+and+
> changelogs
> >
> >
> > In my view, from long ago discussion, Haohui and Bobby agreed to not
> > maintain CHANGELOG by hand. Haohui also suggested how to get them
> > automatically, whereas I just would want to remove it, but that's also
> OK)
> > We didn’t get agreement clearly about removing CHANGELOG but at least saw
> > our needs to automate it.
> >
> >
> > And in current discussion, again in my view, Roshan, Hugo, Stig agree to
> > remove CHANGELOG. I’ve been continuously claiming to remove CHANGELOG,
> so 3
> > PMC members and 1 contributor seem to agree on removing CHANGELOG, and at
> > least 2 more PMC members to not maintain CHANGELOG manually.
> >
> >
> > I will initiate a VOTE thread if we need to. Again, release managers
> would
> > be affected by this change so I would want to hear Taylor’s opinion
> before
> > going forward, but this is clear pain point for mergers so will initiate
> a
> > VOTE thread in several days (at least in this week) if Taylor doesn’t put
> > opinion on this or misses this discussion.
> >
> >
> > Thanks,
> >
> > Jungtaek Lim (HeartSaVioR)
> >
> > On Fri, Jul 28, 2017 at 10:53 AM, Jungtaek Lim wrote:
> >
> >> correction: other projects -> *some* other projects, though they're
> >> popular projects (including in competition)
> >>
> >> On Fri, Jul 28, 2017 at 10:51 AM, Jungtaek Lim wrote:
> >>
> >>> I'm happy that there are others having the same difficulty and sharing
> >>> the same feeling.
> >>>
> >>> This discussion has been initiated several times (by me) and got
> >>> some +1s in each thread, but never led to actual work.
> >>>
> >>> We already use JIRA, and I'm subscribed to issues@ and taking care of
> >>> issues that were not marked resolved and/or labeled with fix versions.
> >>> It may sound ideal to let reporters take care of their own issues,
> >>> but committers can also help with that, and in fact the merger is
> >>> responsible for resolving the issue, so this side is irrelevant to the
> >>> contributor.
> >>>
> >>> My other consideration is what is convenient for the release
> >>> manager. Taylor has been the release manager all along (thanks for the
> >>> great work!) and this is directly related to the release announcement, so
> >>> I would like to hear his opinion. If it is more convenient for him, or he
> >>> thinks he can tolerate it, we can just go ahead.
> 

[GitHub] storm pull request #2250: WIP: STORM-2665: Adapt Kafka's release note genera...

2017-07-31 Thread srdo
GitHub user srdo opened a pull request:

https://github.com/apache/storm/pull/2250

WIP: STORM-2665: Adapt Kafka's release note generation script for Storm

See https://issues.apache.org/jira/browse/STORM-2665

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/srdo/storm STORM-2665

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/storm/pull/2250.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2250


commit 4e523f21bb3433a8dcf6f6e24215f72f101e340a
Author: Stig Rohde Døssing 
Date:   2017-07-31T19:48:22Z

WIP: STORM-2665: Adapt Kafka's release note generation script for Storm




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


Re: [VOTE] Release Apache Storm 1.1.1 (rc2)

2017-07-31 Thread P. Taylor Goetz
This vote is now closed and passes with 5 binding +1 votes and no 0 or -1 
votes.

Vote tally (* indicates a binding vote):

+1:
Jungtaek Lim*
Bobby Evans*
Stig Rohde Døssing*
Harsha Chintalapani*
P. Taylor Goetz*

I will release the artifacts from staging and announce the release after 24 
hours.

Thanks to all who voted.

-Taylor

> On Jul 27, 2017, at 2:38 PM, P. Taylor Goetz  wrote:
> 
> This is a call to vote on releasing Apache Storm 1.1.1 (rc2)
> 
> Full list of changes in this release:
> 
> https://git-wip-us.apache.org/repos/asf?p=storm.git;a=blob_plain;f=CHANGELOG.md;hb=41bfea87b1a002565333bd18a06d766af1ca3816
> 
> The tag/commit to be voted upon is v1.1.1:
> 
> https://git-wip-us.apache.org/repos/asf?p=storm.git;a=tree;h=948ce7d63a31fae8c478785985d0ef79808e234e;hb=41bfea87b1a002565333bd18a06d766af1ca3816
> 
> The source archive being voted upon can be found here:
> 
> https://dist.apache.org/repos/dist/dev/storm/apache-storm-1.1.1-rc2/apache-storm-1.1.1-src.tar.gz
> 
> Other release files, signatures and digests can be found here:
> 
> https://dist.apache.org/repos/dist/dev/storm/apache-storm-1.1.1-rc2/
> 
> The release artifacts are signed with the following key:
> 
> https://git-wip-us.apache.org/repos/asf?p=storm.git;a=blob_plain;f=KEYS;hb=22b832708295fa2c15c4f3c70ac0d2bc6fded4bd
> 
> The Nexus staging repository for this release is:
> 
> https://repository.apache.org/content/repositories/orgapachestorm-1050
> 
> Please vote on releasing this package as Apache Storm 1.1.1.
> 
> When voting, please list the actions taken to verify the release.
> 
> This vote will be open for at least 72 hours.
> 
> [ ] +1 Release this package as Apache Storm 1.1.1
> [ ]  0 No opinion
> [ ] -1 Do not release this package because...
> 
> Thanks to everyone who contributed to this release.
> 
> -Taylor



[GitHub] storm pull request #2233: Storm 2258: Streams api - support CoGroupByKey

2017-07-31 Thread arunmahadevan
Github user arunmahadevan commented on a diff in the pull request:

https://github.com/apache/storm/pull/2233#discussion_r130441519
  
--- Diff: 
storm-client/src/jvm/org/apache/storm/streams/processors/CoGroupByKeyProcessor.java
 ---
@@ -0,0 +1,105 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.storm.streams.processors;
+
+import com.google.common.collect.ArrayListMultimap;
+import com.google.common.collect.Multimap;
+import org.apache.storm.streams.Pair;
+import org.apache.storm.streams.operations.PairValueJoiner;
+import org.apache.storm.streams.tuple.Tuple3;
+
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.List;
+
+/**
+ * co-group by key implementation
+ */
+public class CoGroupByKeyProcessor extends 
BaseProcessor> implements BatchProcessor {
+private final PairValueJoiner 
valueJoiner;
+private final String firstStream;
+private final String secondStream;
+private final List> firstRows = new ArrayList<>();
+private final List> secondRows = new ArrayList<>();
+
+public CoGroupByKeyProcessor(String firstStream, String secondStream, 
PairValueJoiner valueJoiner) {
+this.valueJoiner = valueJoiner;
+this.firstStream = firstStream;
+this.secondStream = secondStream;
+}
+
+@Override
+public void execute(Pair input, String sourceStream) {
+K key = input.getFirst();
+if (sourceStream.equals(firstStream)) {
+V1 val = (V1) input.getSecond();
+Pair pair = Pair.of(key, val);
+firstRows.add(pair);
+} else if (sourceStream.equals(secondStream)) {
+V2 val = (V2) input.getSecond();
+Pair pair = Pair.of(key, val);
+secondRows.add(pair);
+}
+if (!context.isWindowed()) {
+joinAndForward(firstRows, secondRows);
+}
+
+}
+
+@Override
+public void finish() {
+joinAndForward(firstRows, secondRows);
+firstRows.clear();
+secondRows.clear();
+}
+
+private void joinAndForward(List> firstRows, List> secondRows) {
+for (Tuple3 res : 
join(getJoinTable(firstRows), getJoinTable(secondRows))) {
+context.forward(Pair.of(res._1, valueJoiner.apply(res._2, 
res._3)));
+
+}
+}
+
+/*
+ * returns list of Tuple3 (key, val from table, val from row)
+ */
+
+private  List> 
join(Multimap tab1, Multimap tab2) {
+List> res = new 
ArrayList<>();
+for (K key : tab1.keys()) {
--- End diff --

`keys` returns a multi-set, I am not sure how this would work. You will end 
up with duplicate results for the same key.
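
To illustrate the concern: Guava's `Multimap.keys()` returns a `Multiset` in which a key appears once per value mapped to it, while `keySet()` yields each distinct key once. The sketch below uses only JDK collections as a stand-in for the multimap (the class and method names are hypothetical, not taken from the PR) to show how looping over keys with multiplicity repeats the join for the same key.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a Guava Multimap, built from JDK collections only.
class KeysVersusKeySet {

    // Mimics Multimap.keys(): each key is listed once per value mapped to it.
    static <K, V> List<K> keysWithMultiplicity(Map<K, List<V>> table) {
        List<K> keys = new ArrayList<>();
        for (Map.Entry<K, List<V>> e : table.entrySet()) {
            for (int i = 0; i < e.getValue().size(); i++) {
                keys.add(e.getKey());
            }
        }
        return keys;
    }

    // Pairs every value of tab1 with every value of tab2, once per key yielded by `keys`.
    static <K> List<String> join(Iterable<K> keys,
                                 Map<K, List<String>> tab1,
                                 Map<K, List<String>> tab2) {
        List<String> rows = new ArrayList<>();
        for (K key : keys) {
            for (String v1 : tab1.getOrDefault(key, Collections.<String>emptyList())) {
                for (String v2 : tab2.getOrDefault(key, Collections.<String>emptyList())) {
                    rows.add(key + ":" + v1 + "," + v2);
                }
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        Map<String, List<String>> tab1 = new LinkedHashMap<>();
        tab1.put("a", Arrays.asList("x1", "x2"));
        Map<String, List<String>> tab2 = new LinkedHashMap<>();
        tab2.put("a", Arrays.asList("y1"));

        // Key "a" is visited twice here, so every joined row is emitted twice.
        System.out.println(join(keysWithMultiplicity(tab1), tab1, tab2));
        // Iterating distinct keys emits each joined row once.
        System.out.println(join(tab1.keySet(), tab1, tab2));
    }
}
```

Iterating over `keySet()` (or over `asMap()` entries) would avoid the repeated rows.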




[GitHub] storm pull request #2233: Storm 2258: Streams api - support CoGroupByKey

2017-07-31 Thread arunmahadevan
Github user arunmahadevan commented on a diff in the pull request:

https://github.com/apache/storm/pull/2233#discussion_r130441506
  
--- Diff: 
storm-client/src/jvm/org/apache/storm/streams/processors/CoGroupByKeyProcessor.java
 ---
@@ -0,0 +1,105 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.storm.streams.processors;
+
+import com.google.common.collect.ArrayListMultimap;
+import com.google.common.collect.Multimap;
+import org.apache.storm.streams.Pair;
+import org.apache.storm.streams.operations.PairValueJoiner;
+import org.apache.storm.streams.tuple.Tuple3;
+
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.List;
+
+/**
+ * co-group by key implementation
+ */
+public class CoGroupByKeyProcessor extends 
BaseProcessor> implements BatchProcessor {
+private final PairValueJoiner 
valueJoiner;
+private final String firstStream;
+private final String secondStream;
+private final List> firstRows = new ArrayList<>();
+private final List> secondRows = new ArrayList<>();
+
+public CoGroupByKeyProcessor(String firstStream, String secondStream, 
PairValueJoiner valueJoiner) {
+this.valueJoiner = valueJoiner;
+this.firstStream = firstStream;
+this.secondStream = secondStream;
+}
+
+@Override
+public void execute(Pair input, String sourceStream) {
+K key = input.getFirst();
+if (sourceStream.equals(firstStream)) {
+V1 val = (V1) input.getSecond();
+Pair pair = Pair.of(key, val);
+firstRows.add(pair);
+} else if (sourceStream.equals(secondStream)) {
+V2 val = (V2) input.getSecond();
+Pair pair = Pair.of(key, val);
+secondRows.add(pair);
+}
+if (!context.isWindowed()) {
+joinAndForward(firstRows, secondRows);
+}
+
+}
+
+@Override
+public void finish() {
+joinAndForward(firstRows, secondRows);
+firstRows.clear();
+secondRows.clear();
+}
+
+private void joinAndForward(List> firstRows, List> secondRows) {
+for (Tuple3 res : 
join(getJoinTable(firstRows), getJoinTable(secondRows))) {
--- End diff --

Once you maintain the two multi-maps, you don't need the join step. You 
could iterate over them directly and forward the entries.
```java
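mm1.asMap().forEach((key, values) -> {
    // removeAll returns (and removes) the second stream's values for this key
    context.forward(Pair.of(key, Pair.of(values, mm2.removeAll(key))));
});

// only keys absent from mm1 remain in mm2 at this point
mm2.asMap().forEach((key, values) -> {
    context.forward(Pair.of(key, Pair.of(mm1.get(key), values)));
});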
```
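
A JDK-only sketch of the same idea, assuming values are buffered per key as they arrive. Here `Map<K, List<V>>` stands in for the Guava multimaps `mm1` and `mm2`; the class name, `addFirst`/`addSecond`, and the shape of the returned map are hypothetical and not part of the Storm API. It produces one entry per key, pairing all values from the first stream with all values from the second, without building an intermediate join table.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical co-group buffer; not the PR's CoGroupByKeyProcessor.
class CoGroupSketch<K, V1, V2> {
    private final Map<K, List<V1>> first = new LinkedHashMap<>();
    private final Map<K, List<V2>> second = new LinkedHashMap<>();

    void addFirst(K key, V1 value) {
        first.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
    }

    void addSecond(K key, V2 value) {
        second.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
    }

    // Returns key -> (values from first stream, values from second stream), then clears the buffers.
    Map<K, Map.Entry<List<V1>, List<V2>>> finish() {
        Map<K, Map.Entry<List<V1>, List<V2>>> out = new LinkedHashMap<>();
        // Keys seen on the first stream: take (and remove) the matching group from the second.
        for (Map.Entry<K, List<V1>> e : first.entrySet()) {
            List<V2> other = second.remove(e.getKey());
            if (other == null) {
                other = Collections.<V2>emptyList();
            }
            out.put(e.getKey(), new SimpleImmutableEntry<List<V1>, List<V2>>(e.getValue(), other));
        }
        // Whatever remains in `second` has no counterpart in `first`.
        for (Map.Entry<K, List<V2>> e : second.entrySet()) {
            out.put(e.getKey(),
                    new SimpleImmutableEntry<List<V1>, List<V2>>(Collections.<V1>emptyList(), e.getValue()));
        }
        first.clear();
        second.clear();
        return out;
    }
}
```

In a processor, `finish()` would correspond to the point where the buffered groups are forwarded downstream and the buffers are reset.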





[GitHub] storm pull request #2233: Storm 2258: Streams api - support CoGroupByKey

2017-07-31 Thread arunmahadevan
Github user arunmahadevan commented on a diff in the pull request:

https://github.com/apache/storm/pull/2233#discussion_r130434940
  
--- Diff: 
storm-client/src/jvm/org/apache/storm/streams/processors/CoGroupByKeyProcessor.java
 ---
@@ -0,0 +1,105 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.storm.streams.processors;
+
+import com.google.common.collect.ArrayListMultimap;
+import com.google.common.collect.Multimap;
+import org.apache.storm.streams.Pair;
+import org.apache.storm.streams.operations.PairValueJoiner;
+import org.apache.storm.streams.tuple.Tuple3;
+
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.List;
+
+/**
+ * co-group by key implementation
+ */
+public class CoGroupByKeyProcessor extends 
BaseProcessor> implements BatchProcessor {
+private final PairValueJoiner 
valueJoiner;
--- End diff --

`Iterable`, `Iterable` instead of `Collection`.




[GitHub] storm pull request #2233: Storm 2258: Streams api - support CoGroupByKey

2017-07-31 Thread arunmahadevan
Github user arunmahadevan commented on a diff in the pull request:

https://github.com/apache/storm/pull/2233#discussion_r130434631
  
--- Diff: storm-client/src/jvm/org/apache/storm/streams/PairStream.java ---
@@ -400,6 +422,18 @@
 return new PairStream<>(streamBuilder, joinNode);
 }
 
+private  PairStream coGroupByKeyPartition(PairStream otherStream,
+   
PairValueJoiner valueJoiner) {
+String firstStream = stream;
+String secondStream = otherStream.stream;
+Node joinNode = addProcessorNode(
--- End diff --

nit: coGroupNode




[GitHub] storm pull request #2233: Storm 2258: Streams api - support CoGroupByKey

2017-07-31 Thread arunmahadevan
Github user arunmahadevan commented on a diff in the pull request:

https://github.com/apache/storm/pull/2233#discussion_r130435926
  
--- Diff: 
storm-client/src/jvm/org/apache/storm/streams/processors/CoGroupByKeyProcessor.java
 ---
@@ -0,0 +1,105 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.storm.streams.processors;
+
+import com.google.common.collect.ArrayListMultimap;
+import com.google.common.collect.Multimap;
+import org.apache.storm.streams.Pair;
+import org.apache.storm.streams.operations.PairValueJoiner;
+import org.apache.storm.streams.tuple.Tuple3;
+
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.List;
+
+/**
+ * co-group by key implementation
+ */
+public class CoGroupByKeyProcessor extends 
BaseProcessor> implements BatchProcessor {
+private final PairValueJoiner 
valueJoiner;
+private final String firstStream;
+private final String secondStream;
+private final List> firstRows = new ArrayList<>();
--- End diff --

Why not just maintain two multi-maps and add directly? This list looks 
redundant.




[GitHub] storm pull request #2233: Storm 2258: Streams api - support CoGroupByKey

2017-07-31 Thread arunmahadevan
Github user arunmahadevan commented on a diff in the pull request:

https://github.com/apache/storm/pull/2233#discussion_r130434218
  
--- Diff: storm-client/src/jvm/org/apache/storm/streams/PairStream.java ---
@@ -380,6 +383,25 @@
 return partitionBy(KEY).updateStateByKeyPartition(stateUpdater);
 }
 
+/**
+ * group the values of this stream with the values having the same key 
from the other stream.
+ * 
+ * Note: The parallelism of this stream is carried forward to the 
joined stream.
+ * 
+ *
+ * @param otherStream the other stream
+ * @param valueJoiner the {@link PairValueJoiner}
+ * @param  the type of the values resulting from the 
grouping
+ * @param the type of the values in the other stream
+ * @return the new stream
+ */
+public  PairStream coGroupByKey(PairStream 
otherStream,
+ 
PairValueJoiner valueJoiner) {
--- End diff --

In the case of Co-group it does not make much sense to expose the 
PairValueJoiner in the public API. If needed this can be an implementation 
detail.

So the API could be something like,
```java
public  PairStream> 
coGroupByKey(PairStream otherStream)
```




[GitHub] storm pull request #2233: Storm 2258: Streams api - support CoGroupByKey

2017-07-31 Thread arunmahadevan
Github user arunmahadevan commented on a diff in the pull request:

https://github.com/apache/storm/pull/2233#discussion_r130429097
  
--- Diff: storm-client/src/jvm/org/apache/storm/streams/PairStream.java ---
@@ -380,6 +383,25 @@
 return partitionBy(KEY).updateStateByKeyPartition(stateUpdater);
 }
 
+/**
+ * group the values of this stream with the values having the same key 
from the other stream.
+ * 
+ * Note: The parallelism of this stream is carried forward to the 
joined stream.
--- End diff --

carried forward to the co-grouped stream.




[GitHub] storm pull request #2233: Storm 2258: Streams api - support CoGroupByKey

2017-07-31 Thread arunmahadevan
Github user arunmahadevan commented on a diff in the pull request:

https://github.com/apache/storm/pull/2233#discussion_r130436004
  
--- Diff: 
storm-client/src/jvm/org/apache/storm/streams/processors/CoGroupByKeyProcessor.java
 ---
@@ -0,0 +1,105 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.storm.streams.processors;
+
+import com.google.common.collect.ArrayListMultimap;
+import com.google.common.collect.Multimap;
+import org.apache.storm.streams.Pair;
+import org.apache.storm.streams.operations.PairValueJoiner;
+import org.apache.storm.streams.tuple.Tuple3;
+
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.List;
+
+/**
+ * co-group by key implementation
+ */
+public class CoGroupByKeyProcessor extends 
BaseProcessor> implements BatchProcessor {
+private final PairValueJoiner 
valueJoiner;
+private final String firstStream;
+private final String secondStream;
+private final List> firstRows = new ArrayList<>();
+private final List> secondRows = new ArrayList<>();
--- End diff --

Why not just maintain two multi-maps and add directly? This list looks 
redundant.





Re: [VOTE] Release Apache Storm 1.1.1 (rc2)

2017-07-31 Thread P. Taylor Goetz
+1

Tested several different topologies on a multi-node cluster.

-Taylor

> On Jul 27, 2017, at 2:38 PM, P. Taylor Goetz  wrote:
> 
> This is a call to vote on releasing Apache Storm 1.1.1 (rc2)
> 
> Full list of changes in this release:
> 
> https://git-wip-us.apache.org/repos/asf?p=storm.git;a=blob_plain;f=CHANGELOG.md;hb=41bfea87b1a002565333bd18a06d766af1ca3816
> 
> The tag/commit to be voted upon is v1.1.1:
> 
> https://git-wip-us.apache.org/repos/asf?p=storm.git;a=tree;h=948ce7d63a31fae8c478785985d0ef79808e234e;hb=41bfea87b1a002565333bd18a06d766af1ca3816
> 
> The source archive being voted upon can be found here:
> 
> https://dist.apache.org/repos/dist/dev/storm/apache-storm-1.1.1-rc2/apache-storm-1.1.1-src.tar.gz
> 
> Other release files, signatures and digests can be found here:
> 
> https://dist.apache.org/repos/dist/dev/storm/apache-storm-1.1.1-rc2/
> 
> The release artifacts are signed with the following key:
> 
> https://git-wip-us.apache.org/repos/asf?p=storm.git;a=blob_plain;f=KEYS;hb=22b832708295fa2c15c4f3c70ac0d2bc6fded4bd
> 
> The Nexus staging repository for this release is:
> 
> https://repository.apache.org/content/repositories/orgapachestorm-1050
> 
> Please vote on releasing this package as Apache Storm 1.1.1.
> 
> When voting, please list the actions taken to verify the release.
> 
> This vote will be open for at least 72 hours.
> 
> [ ] +1 Release this package as Apache Storm 1.1.1
> [ ]  0 No opinion
> [ ] -1 Do not release this package because...
> 
> Thanks to everyone who contributed to this release.
> 
> -Taylor



[GitHub] storm pull request #2240: [STORM-2657] Update SECURITY.MD

2017-07-31 Thread revans2
Github user revans2 commented on a diff in the pull request:

https://github.com/apache/storm/pull/2240#discussion_r130439781
  
--- Diff: docs/SECURITY.md ---
@@ -478,6 +478,35 @@ nimbus.groups:
  
 
 ### DRPC
-Hopefully more on this soon
+ 
+ Storm provides the Access Control List for the DRPC Authorizer.Users can 
see org.apache.storm.security.auth.authorizer.DRPCSimpleACLAuthorizer for more 
details.
+ 
+ There are several DRPC ACL related configurations.
+ 
+ | YAML Setting | Description |
+ |--------------|-------------|
+ | drpc.authorizer.acl | The class for DRPC ACL. |
+ | drpc.authorizer.acl.filename | File name of the DRPC Authorizer ACL.It 
should be set to "drpc-auth-acl.yaml",users can see drpc-auth-acl.yaml.example 
for more details. |
+ | drpc.authorizer.acl.strict| Whether the DRPCSimpleAclAuthorizer should 
deny requests for operations involving functions that have no explicit ACL 
entry. |
--- End diff --

This is kind of confusing, and that is because the config itself is kind of 
confusing.  Some configs go in the main storm.yaml:

| YAML Setting | Description |
|--------------|-------------|
| drpc.authorizer | A class that will perform authorization for DRPC 
operations.  Set this to 
`org.apache.storm.security.auth.authorizer.DRPCSimpleACLAuthorizer` when using 
security. |
| drpc.authorizer.acl.strict | Whether the DRPCSimpleAclAuthorizer should 
deny requests for operations involving functions that have no explicit ACL 
entry. It is useful to set this to false for staging where users may want to 
experiment, but true for production where you want users to be secure. Defaults 
to false. |
| drpc.authorizer.acl.filename | This is the name of a file that the ACLs 
will be loaded from.  It is separate from storm.yaml to allow the file to be 
updated without bringing down a DRPC server. Defaults to drpc-auth-acl.yaml |

The file pointed to by `drpc.authorizer.acl.filename` will have only one 
config in it, `drpc.authorizer.acl`, which should be of the form 

```yaml
drpc.authorizer.acl:
  "functionName1":
    "client.users":
      - "alice"
      - "bob"
    "invocation.user": "bob"
```

In this example, the users `bob` and `alice`, listed under `client.users`, are 
allowed to run DRPC requests against functionName1, but only `bob`, as the 
`invocation.user`, is allowed to run the topology that actually processes those 
requests. 
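
The rules above can be modeled roughly as follows. This is a minimal sketch of the described behavior only, not Storm's `DRPCSimpleACLAuthorizer`: the class `DrpcAclSketch` and its methods are hypothetical, and the handling of functions without an ACL entry assumes the `drpc.authorizer.acl.strict` semantics stated in the table (deny when strict, allow otherwise).

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical model of the ACL rules described above; not Storm's DRPCSimpleACLAuthorizer.
class DrpcAclSketch {

    // One ACL entry per DRPC function name.
    private static final class AclEntry {
        final Set<String> clientUsers;
        final String invocationUser;

        AclEntry(Set<String> clientUsers, String invocationUser) {
            this.clientUsers = clientUsers;
            this.invocationUser = invocationUser;
        }
    }

    private final Map<String, AclEntry> acl = new HashMap<>();
    private final boolean strict; // mirrors drpc.authorizer.acl.strict

    DrpcAclSketch(boolean strict) {
        this.strict = strict;
    }

    void allow(String function, String invocationUser, String... clientUsers) {
        acl.put(function, new AclEntry(new HashSet<>(Arrays.asList(clientUsers)), invocationUser));
    }

    // May `user` submit DRPC requests to `function`?
    boolean mayExecute(String user, String function) {
        AclEntry entry = acl.get(function);
        if (entry == null) {
            return !strict; // no explicit entry: denied only in strict mode
        }
        return entry.clientUsers.contains(user);
    }

    // May `user` run the topology that serves `function`?
    boolean mayServe(String user, String function) {
        AclEntry entry = acl.get(function);
        if (entry == null) {
            return !strict; // no explicit entry: denied only in strict mode
        }
        return entry.invocationUser.equals(user);
    }
}
```

With `allow("functionName1", "bob", "alice", "bob")`, both users may execute requests, while only `bob` may serve them.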




[GitHub] storm pull request #2240: [STORM-2657] Update SECURITY.MD

2017-07-31 Thread revans2
Github user revans2 commented on a diff in the pull request:

https://github.com/apache/storm/pull/2240#discussion_r130436600
  
--- Diff: docs/SECURITY.md ---
@@ -478,6 +478,35 @@ nimbus.groups:
  
 
 ### DRPC
-Hopefully more on this soon
+ 
+ Storm provides the Access Control List for the DRPC Authorizer.Users can 
see org.apache.storm.security.auth.authorizer.DRPCSimpleACLAuthorizer for more 
details.
--- End diff --

It would be great if we could turn this into a link.

```

[org.apache.storm.security.auth.authorizer.DRPCSimpleACLAuthorizer](javadocs/org/apache/storm/security/auth/authorizer/DRPCSimpleACLAuthorizer.html)
```

Should work.




[GitHub] storm pull request #2243: STORM-2658: Extract storm-kafka-client examples to...

2017-07-31 Thread srdo
Github user srdo commented on a diff in the pull request:

https://github.com/apache/storm/pull/2243#discussion_r130438812
  
--- Diff: 
examples/storm-kafka-client-examples/src/main/java/org/apache/storm/kafka/spout/test/KafkaSpoutTopologyMainNamedTopics.java
 ---
@@ -0,0 +1,103 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ *   or more contributor license agreements.  See the NOTICE file
+ *   distributed with this work for additional information
+ *   regarding copyright ownership.  The ASF licenses this file
+ *   to you under the Apache License, Version 2.0 (the
+ *   "License"); you may not use this file except in compliance
+ *   with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ *   Unless required by applicable law or agreed to in writing, software
+ *   distributed under the License is distributed on an "AS IS" BASIS,
+ *   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or 
implied.
+ *   See the License for the specific language governing permissions and
+ *   limitations under the License.
+ */
+
+package org.apache.storm.kafka.spout.test;
+
+import static 
org.apache.storm.kafka.spout.KafkaSpoutConfig.FirstPollOffsetStrategy.EARLIEST;
+
+import org.apache.kafka.clients.consumer.ConsumerConfig;
+import org.apache.storm.Config;
+import org.apache.storm.StormSubmitter;
+import org.apache.storm.generated.StormTopology;
+import org.apache.storm.kafka.spout.ByTopicRecordTranslator;
+import org.apache.storm.kafka.spout.KafkaSpout;
+import org.apache.storm.kafka.spout.KafkaSpoutConfig;
+import org.apache.storm.kafka.spout.KafkaSpoutRetryExponentialBackoff;
+import 
org.apache.storm.kafka.spout.KafkaSpoutRetryExponentialBackoff.TimeInterval;
+import org.apache.storm.kafka.spout.KafkaSpoutRetryService;
+import org.apache.storm.kafka.trident.KafkaProducerTopology;
+import org.apache.storm.topology.TopologyBuilder;
+import org.apache.storm.tuple.Fields;
+import org.apache.storm.tuple.Values;
+
+public class KafkaSpoutTopologyMainNamedTopics {
+
+private static final String TOPIC_2_STREAM = "test_2_stream";
+private static final String TOPIC_0_1_STREAM = "test_0_1_stream";
+private static final String KAFKA_LOCAL_BROKER = "localhost:9092";
+private static final String TOPIC_0 = "kafka-spout-test";
+private static final String TOPIC_1 = "kafka-spout-test-1";
+private static final String TOPIC_2 = "kafka-spout-test-2";
+
+public static void main(String[] args) throws Exception {
+new KafkaSpoutTopologyMainNamedTopics().runMain(args);
+}
+
+protected void runMain(String[] args) throws Exception {
--- End diff --

Yes. I'll restore that bit




[GitHub] storm pull request #2243: STORM-2658: Extract storm-kafka-client examples to...

2017-07-31 Thread srdo
Github user srdo commented on a diff in the pull request:

https://github.com/apache/storm/pull/2243#discussion_r130438441
  
--- Diff: 
external/storm-kafka-client/src/test/java/org/apache/storm/kafka/spout/builders/SingleTopicKafkaSpoutConfiguration.java
 ---
@@ -21,15 +21,10 @@
 import static 
org.apache.storm.kafka.spout.KafkaSpoutConfig.FirstPollOffsetStrategy.EARLIEST;
 
 import org.apache.kafka.clients.consumer.ConsumerConfig;
--- End diff --

Sure, will rename




[GitHub] storm pull request #2243: STORM-2658: Extract storm-kafka-client examples to...

2017-07-31 Thread srdo
Github user srdo commented on a diff in the pull request:

https://github.com/apache/storm/pull/2243#discussion_r130438296
  
--- Diff: 
examples/storm-kafka-client-examples/src/main/java/org/apache/storm/kafka/trident/TridentKafkaClientWordCountNamedTopics.java
 ---
@@ -42,72 +42,68 @@
 import org.apache.storm.tuple.Values;
 
 public class TridentKafkaClientWordCountNamedTopics {
+
 private static final String TOPIC_1 = "test-trident";
 private static final String TOPIC_2 = "test-trident-1";
 private static final String KAFKA_LOCAL_BROKER = "localhost:9092";
 
-private KafkaTridentSpoutOpaque 
newKafkaTridentSpoutOpaque() {
-return new KafkaTridentSpoutOpaque<>(newKafkaSpoutConfig());
+private KafkaTridentSpoutOpaque 
newKafkaTridentSpoutOpaque(KafkaSpoutConfig spoutConfig) {
+return new KafkaTridentSpoutOpaque<>(spoutConfig);
 }
 
 private static Func, List> 
JUST_VALUE_FUNC = new JustValueFunc();
 
 /**
- * Needs to be serializable
+ * Needs to be serializable.
  */
 private static class JustValueFunc implements 
Func, List>, Serializable {
+
 @Override
 public List apply(ConsumerRecord record) {
 return new Values(record.value());
 }
 }
 
-protected KafkaSpoutConfig newKafkaSpoutConfig() {
-return KafkaSpoutConfig.builder(KAFKA_LOCAL_BROKER, TOPIC_1, 
TOPIC_2)
-.setProp(ConsumerConfig.GROUP_ID_CONFIG, 
"kafkaSpoutTestGroup_" + System.nanoTime())
-.setProp(ConsumerConfig.MAX_PARTITION_FETCH_BYTES_CONFIG, 
200)
-.setRecordTranslator(JUST_VALUE_FUNC, new Fields("str"))
-.setRetry(newRetryService())
-.setOffsetCommitPeriodMs(10_000)
-.setFirstPollOffsetStrategy(EARLIEST)
-.setMaxUncommittedOffsets(250)
-.build();
+protected KafkaSpoutConfig newKafkaSpoutConfig(String 
bootstrapServers) {
+return KafkaSpoutConfig.builder(bootstrapServers, TOPIC_1, TOPIC_2)
+.setProp(ConsumerConfig.GROUP_ID_CONFIG, 
"kafkaSpoutTestGroup_" + System.nanoTime())
+.setProp(ConsumerConfig.MAX_PARTITION_FETCH_BYTES_CONFIG, 200)
+.setRecordTranslator(JUST_VALUE_FUNC, new Fields("str"))
+.setRetry(newRetryService())
+.setOffsetCommitPeriodMs(10_000)
+.setFirstPollOffsetStrategy(EARLIEST)
+.setMaxUncommittedOffsets(250)
+.build();
 }
 
 protected KafkaSpoutRetryService newRetryService() {
 return new KafkaSpoutRetryExponentialBackoff(new 
TimeInterval(500L, TimeUnit.MICROSECONDS),
-TimeInterval.milliSeconds(2), Integer.MAX_VALUE, 
TimeInterval.seconds(10));
+TimeInterval.milliSeconds(2), Integer.MAX_VALUE, 
TimeInterval.seconds(10));
 }
 
 public static void main(String[] args) throws Exception {
 new TridentKafkaClientWordCountNamedTopics().run(args);
 }
 
-protected void run(String[] args) throws AlreadyAliveException, 
InvalidTopologyException, AuthorizationException, InterruptedException {
-if (args.length > 0 && Arrays.stream(args).anyMatch(option -> 
option.equals("-h"))) {
-System.out.printf("Usage: java %s [%s] [%s] [%s] [%s]\n", 
getClass().getName(),
-"broker_host:broker_port", "topic1", "topic2", 
"topology_name");
-} else {
-final String brokerUrl = args.length > 0 ? args[0] : 
KAFKA_LOCAL_BROKER;
-final String topic1 = args.length > 1 ? args[1] : TOPIC_1;
-final String topic2 = args.length > 2 ? args[2] : TOPIC_2;
-
-System.out.printf("Running with broker_url: [%s], topics: [%s, 
%s]\n", brokerUrl, topic1, topic2);
-
-Config tpConf = new Config();
-tpConf.setDebug(true);
-tpConf.setMaxSpoutPending(5);
-
-// Producers
-StormSubmitter.submitTopology(topic1 + "-producer", tpConf, 
KafkaProducerTopology.newTopology(brokerUrl, topic1));
-StormSubmitter.submitTopology(topic2 + "-producer", tpConf, 
KafkaProducerTopology.newTopology(brokerUrl, topic2));
-// Consumer
-StormSubmitter.submitTopology("topics-consumer", tpConf, 
TridentKafkaConsumerTopology.newTopology(newKafkaTridentSpoutOpaque()));
-
-// Print results to console, which also causes the print 
filter in the consumer topology to print the results in the worker log
   

[GitHub] storm pull request #2243: STORM-2658: Extract storm-kafka-client examples to...

2017-07-31 Thread srdo
Github user srdo commented on a diff in the pull request:

https://github.com/apache/storm/pull/2243#discussion_r130435817
  
--- Diff: examples/storm-kafka-client-examples/pom.xml ---
@@ -42,7 +42,9 @@
 org.apache.storm
 storm-kafka
 ${project.version}
+

[GitHub] storm pull request #2243: STORM-2658: Extract storm-kafka-client examples to...

2017-07-31 Thread srdo
Github user srdo commented on a diff in the pull request:

https://github.com/apache/storm/pull/2243#discussion_r130435628
  
--- Diff: examples/storm-kafka-client-examples/README.markdown ---
@@ -0,0 +1,10 @@
+## Usage
+This module contains example topologies demonstrating storm-kafka-client 
spout and Trident usage.
+
+The module is built by `mvn clean package`. This will generate the 
`target/storm-kafka-client-examples-VERSION.jar` file. The jar contains all 
dependencies and can be submitted to Storm via the Storm CLI. For example:
--- End diff --

Will fix




Re: [DISCUSS] Remove CHANGELOG file

2017-07-31 Thread Bobby Evans
So it looks like we all agree; now we just need someone to file a JIRA and a 
corresponding pull request.  The kafka script looks like a good place to start, 
but we can iterate on it in the pull request to try and address Taylor's 
concern about JIRA not being up to date.  I would love to do it, but I am 
really overloaded right now, so if someone else wants to take the lead on it 
that would be great.


- Bobby
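
As a rough illustration of the JIRA-based approach discussed in this thread, the sketch below builds a per-release JIRA query link of the kind proposed as a starting point for release notes. The class and method names (`JiraReleaseQuery`, `jql`, `url`) are made up for this example; only the JQL text and the issues.apache.org base URL come from the thread.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Illustration only: a hypothetical helper, not part of Storm or any release script.
class JiraReleaseQuery {

    private static final String BASE = "https://issues.apache.org/jira/issues/?jql=";

    /** JQL selecting every issue of a project with the given fix version. */
    static String jql(String project, String fixVersion) {
        return "project = " + project
            + " AND fixVersion in (" + fixVersion + ")"
            + " ORDER BY priority DESC";
    }

    /** Full link to the query, with the JQL percent-encoded. */
    static String url(String project, String fixVersion) {
        try {
            // URLEncoder produces form encoding, where a space becomes '+';
            // switch those to %20 so the link matches the usual URL style.
            String encoded = URLEncoder.encode(jql(project, fixVersion), "UTF-8")
                .replace("+", "%20");
            return BASE + encoded;
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a conforming JVM.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(url("STORM", "1.0.4"));
        System.out.println(url("STORM", "1.1.1"));
    }
}
```

The generated links percent-encode the parentheses as well, which is an equivalent form of the same query. A list built this way is only as complete as the "Fix Version" fields in JIRA.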


On Monday, July 31, 2017, 1:45:14 PM CDT, P. Taylor Goetz  
wrote:

I’m all for getting rid of the current process for CHANGELOG. My only concern 
with any JIRA-based solution is that we would need to be very good about 
setting the “Fix Version” field properly when merging a patch and updating the 
associated ticket. In the past I’ve seen a lot of patches merged without the 
associated JIRA updated. If we’re going to rely on JIRA as the source of truth 
for change logs, we need to be very conscientious about updating JIRA as 
necessary.

-Taylor

> On Jul 31, 2017, at 10:06 AM, Bobby Evans  wrote:
> 
> I am happy to switch as soon as someone has a working alternative.  The big 
> thing in my opinion is giving end users a clear list of all of the changes 
> that went into a release so they can review it for themselves.  However we do 
> it is fine with me as the current changelog file leaves a lot to be desired. 
> I personally would be fine with us updating the web page/release notes to 
> have a link to a JIRA query in it as a starting point.
> 
> https://issues.apache.org/jira/issues/?jql=project%20%3D%20STORM%20AND%20fixVersion%20in%20(1.0.4)%20ORDER%20BY%20priority%20DESC
> or
> https://issues.apache.org/jira/issues/?jql=project%20%3D%20STORM%20AND%20fixVersion%20in%20(1.1.1)%20ORDER%20BY%20priority%20DESC
> for example.
> Later on we can start looking at more complex alternatives that run the above 
> query and join it with the git revision history and possibly pull requests to 
> give a more complete view for what has happened.
> 
> - Bobby
> 
> 
> On Monday, July 31, 2017, 1:42:11 AM CDT, Jungtaek Lim  
> wrote:
> 
> Let me also link an earlier discussion about this from long ago:
> 
> http://search-hadoop.com/m/Storm/8gnYyUdhVp1eajp31?subj=+DISCUSSION+More+convenient+way+to+maintain+committer+contributor+list+and+changelogs
> 
> 
> As I read that earlier discussion, Haohui and Bobby agreed not to
> maintain CHANGELOG by hand. Haohui also suggested how to generate it
> automatically (I would rather just remove it, but that's also OK).
> We didn’t clearly agree on removing CHANGELOG, but we at least saw
> the need to automate it.
> 
> 
> In the current discussion, again as I read it, Roshan, Hugo, and Stig agree to
> remove CHANGELOG. I have consistently argued for removing it, so 3
> PMC members and 1 contributor seem to agree on removing CHANGELOG, and at
> least 2 more PMC members agree not to maintain it manually.
> 
> 
> I will start a VOTE thread if we need one. Release managers would
> be affected by this change, so I would like to hear Taylor’s opinion before
> going forward. But this is a clear pain point for mergers, so I will start a
> VOTE thread in a few days (at least within this week) if Taylor doesn’t
> weigh in or misses this discussion.
> 
> 
> Thanks,
> 
> Jungtaek Lim (HeartSaVioR)
> 
> On Fri, Jul 28, 2017 at 10:53 AM, Jungtaek Lim wrote:
> 
>> correction: other projects -> *some* other projects, though they're
>> popular projects (including competitors)
>> 
>> On Fri, Jul 28, 2017 at 10:51 AM, Jungtaek Lim wrote:
>> 
>>> I'm happy that others have the same difficulty and share the same
>>> feeling.
>>> 
>>> This discussion has been started several times (by me) and got
>>> some +1s in each thread, but never led to actual work.
>>> 
>>> We already use JIRA, and I subscribe to issues@ and take care of
>>> issues where someone forgot to mark them resolved and/or set the fix version.
>>> It may sound ideal to let reporters take care of their own issues,
>>> but committers can also help with that, and in fact the merger is responsible
>>> for resolving the issue, so this side does not depend on the
>>> contributor.
>>> 
>>> My other consideration is which approach is more convenient for the release
>>> manager. Taylor has been the release manager all along (thanks for the great
>>> work!) and this is directly related to the release announcement, so I would
>>> like to hear his opinion. If it is more convenient, or he thinks he can
>>> tolerate it, we can just go ahead.
>>> 
>>> Please note that other projects don't use merge commits. Most of the time
>>> they squash the commits in a PR into one and put the JIRA issue in the commit
>>> title, so the commit list itself serves as the CHANGELOG. That's another thing
>>> we discussed earlier and I think we need to discuss again, but that can
>>> happen in another thread.
>>> 
>>> Regarding maintaining contributors: easy to explain. Just take 

[GitHub] storm pull request #2218: STORM-2614: Enhance stateful windowing to persist ...

2017-07-31 Thread srdo
Github user srdo commented on a diff in the pull request:

https://github.com/apache/storm/pull/2218#discussion_r130433567
  
--- Diff: 
storm-client/src/jvm/org/apache/storm/windowing/WatermarkCountEvictionPolicy.java
 ---
@@ -17,20 +17,28 @@
  */
 package org.apache.storm.windowing;
 
+import org.apache.storm.streams.Pair;
+
+import java.util.concurrent.atomic.AtomicLong;
+
 /**
  * An eviction policy that tracks count based on watermark ts and
  * evicts events up to the watermark based on a threshold count.
  *
  * @param  the type of event tracked by this policy.
  */
-public class WatermarkCountEvictionPolicy extends 
CountEvictionPolicy {
+public class WatermarkCountEvictionPolicy implements EvictionPolicy> {
--- End diff --

Okay




[GitHub] storm pull request #2218: STORM-2614: Enhance stateful windowing to persist ...

2017-07-31 Thread srdo
Github user srdo commented on a diff in the pull request:

https://github.com/apache/storm/pull/2218#discussion_r130433067
  
--- Diff: 
storm-client/src/jvm/org/apache/storm/topology/PersistentWindowedBoltExecutor.java
 ---
@@ -0,0 +1,596 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.storm.topology;
+
+import java.util.AbstractCollection;
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.Collections;
+import java.util.Deque;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.Iterator;
+import java.util.LinkedList;
+import java.util.List;
+import java.util.Map;
+import java.util.NoSuchElementException;
+import java.util.Objects;
+import java.util.Optional;
+import java.util.Set;
+import java.util.concurrent.ConcurrentLinkedQueue;
+import java.util.concurrent.atomic.AtomicInteger;
+import java.util.concurrent.locks.ReentrantLock;
+import java.util.function.Supplier;
+
+import org.apache.storm.Config;
+import org.apache.storm.state.KeyValueState;
+import org.apache.storm.state.State;
+import org.apache.storm.state.StateFactory;
+import org.apache.storm.task.OutputCollector;
+import org.apache.storm.task.TopologyContext;
+import org.apache.storm.topology.base.BaseWindowedBolt;
+import org.apache.storm.tuple.Tuple;
+import org.apache.storm.windowing.DefaultEvictionContext;
+import org.apache.storm.windowing.Event;
+import org.apache.storm.windowing.EventImpl;
+import org.apache.storm.windowing.WindowLifecycleListener;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import static java.util.Collections.emptyIterator;
+import static org.apache.storm.topology.WindowPartitionCache.CacheLoader;
+import static org.apache.storm.topology.WindowPartitionCache.RemovalCause;
+import static 
org.apache.storm.topology.WindowPartitionCache.RemovalListener;
+
+/**
+ * Wraps a {@link IStatefulWindowedBolt} and handles the execution. Uses 
state and the underlying
+ * checkpointing mechanisms to save the tuples in window to state. The 
tuples are also kept in-memory
+ * by transparently caching the window partitions and checkpointing them 
as needed.
+ */
+public class PersistentWindowedBoltExecutor extends 
WindowedBoltExecutor implements IStatefulBolt {
+private static final Logger LOG = 
LoggerFactory.getLogger(PersistentWindowedBoltExecutor.class);
+private final IStatefulWindowedBolt statefulWindowedBolt;
+private transient TopologyContext topologyContext;
+private transient OutputCollector outputCollector;
+private transient WindowState state;
+private transient boolean stateInitialized;
+private transient boolean prePrepared;
+
+public PersistentWindowedBoltExecutor(IStatefulWindowedBolt bolt) {
+super(bolt);
+statefulWindowedBolt = bolt;
+}
+
+@Override
+public void prepare(Map topoConf, TopologyContext 
context, OutputCollector collector) {
+List registrations = (List) 
topoConf.getOrDefault(Config.TOPOLOGY_STATE_KRYO_REGISTER, new ArrayList<>());
+registrations.add(ConcurrentLinkedQueue.class.getName());
+registrations.add(LinkedList.class.getName());
+registrations.add(AtomicInteger.class.getName());
+registrations.add(EventImpl.class.getName());
+registrations.add(WindowPartition.class.getName());
+registrations.add(DefaultEvictionContext.class.getName());
+topoConf.put(Config.TOPOLOGY_STATE_KRYO_REGISTER, registrations);
+prepare(topoConf, context, collector,
+getWindowState(topoConf, context),
+getPartitionState(topoConf, context),
+getWindowSystemState(topoConf, context));
+}
+
+@Override
+protected void validate(Map topoConf,
+

[GitHub] storm pull request #2243: STORM-2658: Extract storm-kafka-client examples to...

2017-07-31 Thread hmcl
Github user hmcl commented on a diff in the pull request:

https://github.com/apache/storm/pull/2243#discussion_r130418867
  
--- Diff: examples/storm-kafka-client-examples/pom.xml ---
@@ -42,7 +42,9 @@
 org.apache.storm
 storm-kafka
 ${project.version}
+

[GitHub] storm pull request #2243: STORM-2658: Extract storm-kafka-client examples to...

2017-07-31 Thread hmcl
Github user hmcl commented on a diff in the pull request:

https://github.com/apache/storm/pull/2243#discussion_r130419415
  
--- Diff: examples/storm-kafka-client-examples/pom.xml ---
@@ -73,19 +75,27 @@
 org.apache.storm
 storm-kafka-client
 ${project.version}
+

[GitHub] storm pull request #2243: STORM-2658: Extract storm-kafka-client examples to...

2017-07-31 Thread hmcl
Github user hmcl commented on a diff in the pull request:

https://github.com/apache/storm/pull/2243#discussion_r130425918
  
--- Diff: 
external/storm-kafka-client/src/test/java/org/apache/storm/kafka/spout/builders/SingleTopicKafkaSpoutConfiguration.java
 ---
@@ -21,15 +21,10 @@
 import static 
org.apache.storm.kafka.spout.KafkaSpoutConfig.FirstPollOffsetStrategy.EARLIEST;
 
 import org.apache.kafka.clients.consumer.ConsumerConfig;
--- End diff --

I would call the package where this class lives `config.builder` instead of 
`builders`, which is a bit misleading since this is really a configuration 
class.




[GitHub] storm pull request #2243: STORM-2658: Extract storm-kafka-client examples to...

2017-07-31 Thread hmcl
Github user hmcl commented on a diff in the pull request:

https://github.com/apache/storm/pull/2243#discussion_r130431738
  
--- Diff: examples/storm-kafka-client-examples/pom.xml ---
@@ -73,19 +75,27 @@
 org.apache.storm
 storm-kafka-client
 ${project.version}
+
 
 
 org.apache.kafka
 ${storm.kafka.artifact.id}
 ${storm.kafka.client.version}
+compile
+
 
 
 org.apache.kafka
 kafka-clients
 ${storm.kafka.client.version}
+compile
+

[GitHub] storm pull request #2243: STORM-2658: Extract storm-kafka-client examples to...

2017-07-31 Thread hmcl
Github user hmcl commented on a diff in the pull request:

https://github.com/apache/storm/pull/2243#discussion_r130431776
  
--- Diff: examples/storm-kafka-client-examples/pom.xml ---
@@ -73,19 +75,27 @@
 org.apache.storm
 storm-kafka-client
 ${project.version}
+
 
 
 org.apache.kafka
 ${storm.kafka.artifact.id}
 ${storm.kafka.client.version}
+compile
+

[GitHub] storm pull request #2243: STORM-2658: Extract storm-kafka-client examples to...

2017-07-31 Thread hmcl
Github user hmcl commented on a diff in the pull request:

https://github.com/apache/storm/pull/2243#discussion_r130428532
  
--- Diff: 
examples/storm-kafka-client-examples/src/main/java/org/apache/storm/kafka/spout/test/KafkaSpoutTopologyMainNamedTopics.java
 ---
@@ -0,0 +1,103 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ *   or more contributor license agreements.  See the NOTICE file
+ *   distributed with this work for additional information
+ *   regarding copyright ownership.  The ASF licenses this file
+ *   to you under the Apache License, Version 2.0 (the
+ *   "License"); you may not use this file except in compliance
+ *   with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ *   Unless required by applicable law or agreed to in writing, software
+ *   distributed under the License is distributed on an "AS IS" BASIS,
+ *   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or 
implied.
+ *   See the License for the specific language governing permissions and
+ *   limitations under the License.
+ */
+
+package org.apache.storm.kafka.spout.test;
+
+import static 
org.apache.storm.kafka.spout.KafkaSpoutConfig.FirstPollOffsetStrategy.EARLIEST;
+
+import org.apache.kafka.clients.consumer.ConsumerConfig;
+import org.apache.storm.Config;
+import org.apache.storm.StormSubmitter;
+import org.apache.storm.generated.StormTopology;
+import org.apache.storm.kafka.spout.ByTopicRecordTranslator;
+import org.apache.storm.kafka.spout.KafkaSpout;
+import org.apache.storm.kafka.spout.KafkaSpoutConfig;
+import org.apache.storm.kafka.spout.KafkaSpoutRetryExponentialBackoff;
+import 
org.apache.storm.kafka.spout.KafkaSpoutRetryExponentialBackoff.TimeInterval;
+import org.apache.storm.kafka.spout.KafkaSpoutRetryService;
+import org.apache.storm.kafka.trident.KafkaProducerTopology;
+import org.apache.storm.topology.TopologyBuilder;
+import org.apache.storm.tuple.Fields;
+import org.apache.storm.tuple.Values;
+
+public class KafkaSpoutTopologyMainNamedTopics {
+
+private static final String TOPIC_2_STREAM = "test_2_stream";
+private static final String TOPIC_0_1_STREAM = "test_0_1_stream";
+private static final String KAFKA_LOCAL_BROKER = "localhost:9092";
+private static final String TOPIC_0 = "kafka-spout-test";
+private static final String TOPIC_1 = "kafka-spout-test-1";
+private static final String TOPIC_2 = "kafka-spout-test-2";
+
+public static void main(String[] args) throws Exception {
+new KafkaSpoutTopologyMainNamedTopics().runMain(args);
+}
+
+protected void runMain(String[] args) throws Exception {
--- End diff --

Isn't this change removing the ability to run this code in LocalCluster 
mode? I think it is very useful. For example, I use it all the time to run 
these simple test examples from IntelliJ.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] storm pull request #2243: STORM-2658: Extract storm-kafka-client examples to...

2017-07-31 Thread hmcl
Github user hmcl commented on a diff in the pull request:

https://github.com/apache/storm/pull/2243#discussion_r130429805
  
--- Diff: 
external/storm-kafka-client/src/test/java/org/apache/storm/kafka/spout/builders/SingleTopicKafkaSpoutConfiguration.java
 ---
@@ -21,15 +21,10 @@
 import static 
org.apache.storm.kafka.spout.KafkaSpoutConfig.FirstPollOffsetStrategy.EARLIEST;
 
 import org.apache.kafka.clients.consumer.ConsumerConfig;
--- End diff --

I also would call the two getXyz methods in this class createXyz, as they 
are static factory methods. I know that the name was already like that, but 
since we are changing it, we should just make it more conventional.




[GitHub] storm pull request #2243: STORM-2658: Extract storm-kafka-client examples to...

2017-07-31 Thread hmcl
Github user hmcl commented on a diff in the pull request:

https://github.com/apache/storm/pull/2243#discussion_r130417840
  
--- Diff: examples/storm-kafka-client-examples/README.markdown ---
@@ -0,0 +1,10 @@
+## Usage
+This module contains example topologies demonstrating storm-kafka-client 
spout and Trident usage.
+
+The module is built by `mvn clean package`. This will generate the 
`target/storm-kafka-client-examples-VERSION.jar` file. The jar contains all 
dependencies and can be submitted to Storm via the Storm CLI. For example:
--- End diff --

... built running ... the Storm CLI, e.g.:




[GitHub] storm pull request #2243: STORM-2658: Extract storm-kafka-client examples to...

2017-07-31 Thread hmcl
Github user hmcl commented on a diff in the pull request:

https://github.com/apache/storm/pull/2243#discussion_r130422627
  
--- Diff: 
examples/storm-kafka-client-examples/src/main/java/org/apache/storm/kafka/trident/TridentKafkaClientWordCountNamedTopics.java
 ---
@@ -42,72 +42,68 @@
 import org.apache.storm.tuple.Values;
 
 public class TridentKafkaClientWordCountNamedTopics {
+
 private static final String TOPIC_1 = "test-trident";
 private static final String TOPIC_2 = "test-trident-1";
 private static final String KAFKA_LOCAL_BROKER = "localhost:9092";
 
-private KafkaTridentSpoutOpaque 
newKafkaTridentSpoutOpaque() {
-return new KafkaTridentSpoutOpaque<>(newKafkaSpoutConfig());
+private KafkaTridentSpoutOpaque 
newKafkaTridentSpoutOpaque(KafkaSpoutConfig spoutConfig) {
+return new KafkaTridentSpoutOpaque<>(spoutConfig);
 }
 
 private static Func, List> 
JUST_VALUE_FUNC = new JustValueFunc();
 
 /**
- * Needs to be serializable
+ * Needs to be serializable.
  */
 private static class JustValueFunc implements 
Func, List>, Serializable {
+
 @Override
 public List apply(ConsumerRecord record) {
 return new Values(record.value());
 }
 }
 
-protected KafkaSpoutConfig newKafkaSpoutConfig() {
-return KafkaSpoutConfig.builder(KAFKA_LOCAL_BROKER, TOPIC_1, 
TOPIC_2)
-.setProp(ConsumerConfig.GROUP_ID_CONFIG, 
"kafkaSpoutTestGroup_" + System.nanoTime())
-.setProp(ConsumerConfig.MAX_PARTITION_FETCH_BYTES_CONFIG, 
200)
-.setRecordTranslator(JUST_VALUE_FUNC, new Fields("str"))
-.setRetry(newRetryService())
-.setOffsetCommitPeriodMs(10_000)
-.setFirstPollOffsetStrategy(EARLIEST)
-.setMaxUncommittedOffsets(250)
-.build();
+protected KafkaSpoutConfig newKafkaSpoutConfig(String 
bootstrapServers) {
+return KafkaSpoutConfig.builder(bootstrapServers, TOPIC_1, TOPIC_2)
+.setProp(ConsumerConfig.GROUP_ID_CONFIG, 
"kafkaSpoutTestGroup_" + System.nanoTime())
+.setProp(ConsumerConfig.MAX_PARTITION_FETCH_BYTES_CONFIG, 200)
+.setRecordTranslator(JUST_VALUE_FUNC, new Fields("str"))
+.setRetry(newRetryService())
+.setOffsetCommitPeriodMs(10_000)
+.setFirstPollOffsetStrategy(EARLIEST)
+.setMaxUncommittedOffsets(250)
+.build();
 }
 
 protected KafkaSpoutRetryService newRetryService() {
 return new KafkaSpoutRetryExponentialBackoff(new 
TimeInterval(500L, TimeUnit.MICROSECONDS),
-TimeInterval.milliSeconds(2), Integer.MAX_VALUE, 
TimeInterval.seconds(10));
+TimeInterval.milliSeconds(2), Integer.MAX_VALUE, 
TimeInterval.seconds(10));
 }
 
 public static void main(String[] args) throws Exception {
 new TridentKafkaClientWordCountNamedTopics().run(args);
 }
 
-protected void run(String[] args) throws AlreadyAliveException, 
InvalidTopologyException, AuthorizationException, InterruptedException {
-if (args.length > 0 && Arrays.stream(args).anyMatch(option -> 
option.equals("-h"))) {
-System.out.printf("Usage: java %s [%s] [%s] [%s] [%s]\n", 
getClass().getName(),
-"broker_host:broker_port", "topic1", "topic2", 
"topology_name");
-} else {
-final String brokerUrl = args.length > 0 ? args[0] : 
KAFKA_LOCAL_BROKER;
-final String topic1 = args.length > 1 ? args[1] : TOPIC_1;
-final String topic2 = args.length > 2 ? args[2] : TOPIC_2;
-
-System.out.printf("Running with broker_url: [%s], topics: [%s, 
%s]\n", brokerUrl, topic1, topic2);
-
-Config tpConf = new Config();
-tpConf.setDebug(true);
-tpConf.setMaxSpoutPending(5);
-
-// Producers
-StormSubmitter.submitTopology(topic1 + "-producer", tpConf, 
KafkaProducerTopology.newTopology(brokerUrl, topic1));
-StormSubmitter.submitTopology(topic2 + "-producer", tpConf, 
KafkaProducerTopology.newTopology(brokerUrl, topic2));
-// Consumer
-StormSubmitter.submitTopology("topics-consumer", tpConf, 
TridentKafkaConsumerTopology.newTopology(newKafkaTridentSpoutOpaque()));
-
-// Print results to console, which also causes the print 
filter in the consumer topology to print the results in the worker log
   

[GitHub] storm pull request #2249: WIP: STORM-2648/STORM-2357: Add storm-kafka-client...

2017-07-31 Thread srdo
GitHub user srdo opened a pull request:

https://github.com/apache/storm/pull/2249

WIP: STORM-2648/STORM-2357: Add storm-kafka-client support for 
at-most-once processing and a toggle for whether messages should be emitted 
with a message id when not using at-least-once

See https://issues.apache.org/jira/browse/STORM-2357 and 
https://issues.apache.org/jira/browse/STORM-2648.

I'd like to get some opinions on whether this approach is a good idea, or 
whether I've overlooked a better option, before finishing this up with some 
tests. I don't love that we'll end up with 3 different committing behaviors.

In STORM-2357 it was noted that the spout doesn't currently support true 
at-most-once, because using Kafka's auto commit option leaves the possibility 
that the spout receives a tuple, emits it to the topology, crashes and 
recovers, and then receives and emits the same tuple. The linked issue suggests 
solving this by committing polled offsets before emitting them to the topology, 
which is an option added here.

STORM-2648 notes that there is currently no way to make Storm track messages when 
using auto commit with this spout. This prevents Storm UI from showing the 
complete latency for the spout, and I would assume also prevents max spout 
pending from having an effect. I've added a toggle to KafkaSpoutConfig to force 
the spout to emit messages with message ids, even when using auto commit or the 
at-most-once option. The spout does nothing on ack or fail when not doing 
at-least-once.

I'd like to keep the spout config simple for the user, so I've added a 
processing guarantee setting corresponding to the standard at-least-once code 
path, the path that uses auto commit, and the path that commits offsets before 
emitting any tuples. 
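
To make the difference between the commit orderings described above concrete, here is a minimal, self-contained sketch. It deliberately uses no Kafka or Storm classes: `FakeLog`, `Mode`, and `pollAndEmit` are hypothetical names invented for this illustration, the "crash" is simulated by returning early, and the auto commit path is not modeled.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustration only: hypothetical names, not the storm-kafka-client API.
class CommitOrderSketch {

    enum Mode { COMMIT_AFTER_ACK, COMMIT_BEFORE_EMIT }

    /** Stand-in for one partition: its records plus the committed position. */
    static final class FakeLog {
        final List<String> records;
        int committed = 0; // offset a restarted consumer would resume from

        FakeLog(String... records) {
            this.records = Arrays.asList(records);
        }
    }

    /**
     * Polls everything after the committed offset and emits it.
     * The simulated spout crashes once crashAfter records were emitted
     * (pass -1 to never crash). Returns the records that were emitted.
     */
    static List<String> pollAndEmit(FakeLog log, Mode mode, int crashAfter) {
        List<String> emitted = new ArrayList<>();
        int start = log.committed;
        int end = log.records.size();
        if (mode == Mode.COMMIT_BEFORE_EMIT) {
            log.committed = end; // commit the polled offsets first
        }
        for (int i = start; i < end; i++) {
            if (emitted.size() == crashAfter) {
                return emitted; // simulated crash: nothing below runs
            }
            emitted.add(log.records.get(i));
        }
        if (mode == Mode.COMMIT_AFTER_ACK) {
            log.committed = end; // stands in for committing once tuples are acked
        }
        return emitted;
    }

    public static void main(String[] args) {
        for (Mode mode : Mode.values()) {
            FakeLog log = new FakeLog("a", "b", "c");
            List<String> beforeCrash = pollAndEmit(log, mode, 1);
            List<String> afterRestart = pollAndEmit(log, mode, -1);
            System.out.println(mode + ": " + beforeCrash + " then " + afterRestart);
        }
    }
}
```

In this toy model, committing before emitting never emits a record twice but loses the records that were polled and not yet emitted when the crash happened, while committing after ack re-emits the record that was already emitted before the crash.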

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/srdo/storm STORM-2648

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/storm/pull/2249.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2249


commit 4fc4b71f9720f506be20740f780dfef93f2dd036
Author: Stig Rohde Døssing 
Date:   2017-07-31T18:26:55Z

STORM-2648/STORM-2357: Add storm-kafka-client support for at-most-once 
processing and a toggle for whether messages should be emitted with a message 
id when not using at-least-once






Re: [DISCUSS] Remove CHANGELOG file

2017-07-31 Thread P. Taylor Goetz
I’m all for getting rid of the current process for CHANGELOG. My only concern 
with any JIRA-based solution is that we would need to be very good about 
setting the “Fix Version” field properly when merging a patch and updating the 
associated ticket. In the past I’ve seen a lot of patches merged without the 
associated JIRA updated. If we’re going to rely on JIRA as the source of truth 
for change logs, we need to be very conscientious about updating JIRA as 
necessary.

-Taylor
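
One way to catch the problem described above would be to compare the issue keys mentioned in merged commit subjects against the issues JIRA reports for the release. The sketch below is only an illustration under that assumption: `FixVersionAudit` and its methods are hypothetical names, and the inputs in `main` are sample values typed in by hand rather than real `git log` or JIRA output.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustration only: a hypothetical check, not an existing Storm tool.
class FixVersionAudit {

    private static final Pattern ISSUE_KEY = Pattern.compile("\\bSTORM-\\d+\\b");

    /** Collects every STORM-NNNN key found in the commit subjects, in order. */
    static Set<String> issueKeys(List<String> commitSubjects) {
        Set<String> keys = new LinkedHashSet<>();
        for (String subject : commitSubjects) {
            Matcher m = ISSUE_KEY.matcher(subject);
            while (m.find()) {
                keys.add(m.group());
            }
        }
        return keys;
    }

    /** Keys that appear in commits but not among the issues with the fix version set. */
    static Set<String> missingFixVersion(List<String> commitSubjects,
                                         Set<String> issuesWithFixVersion) {
        Set<String> missing = issueKeys(commitSubjects);
        missing.removeAll(issuesWithFixVersion);
        return missing;
    }

    public static void main(String[] args) {
        // Sample input only; a real run would read `git log` and a JIRA query result.
        List<String> subjects = Arrays.asList(
            "STORM-2648/STORM-2357: Add storm-kafka-client support for at-most-once processing",
            "STORM-2658: Extract storm-kafka-client examples",
            "Merge branch 'master'");
        Set<String> inJira = new HashSet<>(Arrays.asList("STORM-2658"));
        System.out.println("Missing fix version: " + missingFixVersion(subjects, inJira));
    }
}
```

Commits that carry no issue key at all are invisible to a check like this, so it would complement the discipline of updating JIRA at merge time rather than replace it.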

> On Jul 31, 2017, at 10:06 AM, Bobby Evans  wrote:
> 
> I am happy to switch as soon as someone has a working alternative.  The big 
> thing in my opinion is giving end users a clear list of all of the changes 
> that went into a release so they can review it for themselves.  However we do 
> it is fine with me as the current changelog file leaves a lot to be desired. 
> I personally would be fine with us updating the web page/release notes to 
> have a link to a JIRA query in it as a starting point.
> 
> https://issues.apache.org/jira/issues/?jql=project%20%3D%20STORM%20AND%20fixVersion%20in%20(1.0.4)%20ORDER%20BY%20priority%20DESC
> or
> https://issues.apache.org/jira/issues/?jql=project%20%3D%20STORM%20AND%20fixVersion%20in%20(1.1.1)%20ORDER%20BY%20priority%20DESC
> for example.
> Later on we can start looking at more complex alternatives that run the above 
> query and join it with the git revision history and possibly pull requests to 
> give a more complete view for what has happened.
> 
> - Bobby
> 
> 
> On Monday, July 31, 2017, 1:42:11 AM CDT, Jungtaek Lim  
> wrote:
> 
> Let me also link an earlier discussion about this from long ago:
> 
> http://search-hadoop.com/m/Storm/8gnYyUdhVp1eajp31?subj=+DISCUSSION+More+convenient+way+to+maintain+committer+contributor+list+and+changelogs
> 
> 
> As I read that earlier discussion, Haohui and Bobby agreed not to
> maintain CHANGELOG by hand. Haohui also suggested how to generate it
> automatically (I would rather just remove it, but that's also OK).
> We didn’t clearly agree on removing CHANGELOG, but we at least saw
> the need to automate it.
> 
> 
> In the current discussion, again as I read it, Roshan, Hugo, and Stig agree to
> remove CHANGELOG. I have consistently argued for removing it, so 3
> PMC members and 1 contributor seem to agree on removing CHANGELOG, and at
> least 2 more PMC members agree not to maintain it manually.
> 
> 
> I will start a VOTE thread if we need one. Release managers would
> be affected by this change, so I would like to hear Taylor’s opinion before
> going forward. But this is a clear pain point for mergers, so I will start a
> VOTE thread in a few days (at least within this week) if Taylor doesn’t
> weigh in or misses this discussion.
> 
> 
> Thanks,
> 
> Jungtaek Lim (HeartSaVioR)
> 
> On Fri, Jul 28, 2017 at 10:53 AM, Jungtaek Lim wrote:
> 
>> correction: other projects -> *some* other projects, though they're
>> popular projects (including in competition)
>> 
>> On Fri, Jul 28, 2017 at 10:51 AM, Jungtaek Lim wrote:
>> 
>>> I'm happy that there're other guys having same difficult and sharing same
>>> feeling.
>>> 
>>> This discussion has been initiating several times (from me) and getting
>>> some +1s for each thread but didn't reach to actual work.
>>> 
>>> We already utilize JIRA, and I'm subscribing issues@ and taking care of
>>> issues forgot to mark resolve and/or labeling fixed versions.
>>> It may sounds ideal for us to let reporters caring about their issues,
>>> but committers can also help that, and in fact merger is in responsible to
>>> take care of resolving the issue, so irrelevant to contributor for this
>>> side.
>>> 
>>> My other consideration is that which thing is convenient for release
>>> manager. Taylor took the release manager all the time (thanks for the great
>>> work!) and it is directly related to release announcement so would like to
>>> hear his opinion. If it is more convenient or he think he can tolerate
>>> that, we can just go on.
>>> 
>>> Please note that other projects don't use merge commit. Most of the time
>>> they squash commits in PR into one, labeling commit title as JIRA issue,
>>> making commit list just as CHANGELOG. That's another thing we discussed
>>> earlier and I think we need to discuss again, but that can be discussed
>>> from another thread.
>>> 
>>> Regarding maintaining contributors: easy to explain. Just take a look at
>>> what Spark has been doing. Some other projects follow the approach as well.
>>> 
>>> We can run the script to extract authors of git commits, and just " |
>>> sort | uniq", and done. Pulling assigner from JIRA issue may be more
>>> accurate, since it requires actual account whereas author information in
>>> commit is not strictly required to identify them. We can apply hybrid
>>> approach as well, but for starter just following git commits looks OK to me.
>>> 
>>> IMHO they don't 

[GitHub] storm issue #2241: STORM-2306 : Messaging subsystem redesign.

2017-07-31 Thread revans2
Github user revans2 commented on the issue:

https://github.com/apache/storm/pull/2241
  
@roshannaik sorry I just reread your other post and it looks like I missed 
some of your questions.

`topology.producer.batch.size` and `topology.flush.tuple.freq.millis` were 
left at their default values; I didn't modify them.  I am happy to explore 
those settings too, but I have not spent time on it yet, and it may take me a 
few days as I have other work to do.

I agree that when we throttle the spout it is not able to truly gauge the 
maximum throughput possible.  TVL was not designed for that, which is why the 
latency measurements are based on a simulation of when a message would have 
been inserted into something like Kafka rather than when the message arrived 
in the spout.  If you set the desired throughput higher than what the topology 
can handle, you can get an idea of what the maximum throughput is, but other 
measurements are likely to be off, so it is of limited value for what I was 
trying to do.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] storm issue #2242: [STORM-1347] display topology version on UI

2017-07-31 Thread Ethanlm
Github user Ethanlm commented on the issue:

https://github.com/apache/storm/pull/2242
  
storm-cassandra failed. It should not be related to this PR.




[GitHub] storm pull request #2248: STORM-2028: Fix for uprooting the JDBC client exce...

2017-07-31 Thread rahuljain373
Github user rahuljain373 commented on a diff in the pull request:

https://github.com/apache/storm/pull/2248#discussion_r130399722
  
--- Diff: 
external/storm-jdbc/src/test/java/org/apache/storm/jdbc/common/JdbcClientTest.java
 ---
@@ -80,6 +85,24 @@ public void testInsertAndSelect() {
 Assert.assertEquals(rows, selectedRows);
 }
 
+@Rule
+public ExpectedException thrown = ExpectedException.none();
+
+@Test
+public void testInsertConnectionError() {
+
+ConnectionProvider connectionProvider = new MockConnectionProvider(null);
+this.client = new JdbcClient(connectionProvider, 60);
+
+List row = createRow(1, "frank");
+List rows  = new ArrayList();
+rows.add(row);
+String query  = "insert into user_details values(?,?,?)";
+
+thrown.expect(MultipleFailureException.class);
--- End diff --

Yup, RuntimeException is expected with earlier code.




[GitHub] storm issue #2233: Storm 2258: Streams api - support CoGroupByKey

2017-07-31 Thread kamleshbhatt
Github user kamleshbhatt commented on the issue:

https://github.com/apache/storm/pull/2233
  
Please review.




[GitHub] storm issue #2241: STORM-2306 : Messaging subsystem redesign.

2017-07-31 Thread revans2
Github user revans2 commented on the issue:

https://github.com/apache/storm/pull/2241
  
Issues that need to be addressed to remove `max.spout.pending` (sorry about 
the wall of text).

1. Worker to Worker backpressure.  The netty client and server have no 
backpressure in them.  If for some reason the network cannot keep up, the netty 
client will continue to buffer messages internally in netty until you get 
GC/OOM issues as described in the design doc.  We ran into this when using 
the ZK based backpressure instead of `max.spout.pending` on a topology with a 
very high throughput when there was a network glitch.
2. Single choke point getting messages off of a worker.  Currently there is 
a single disruptor queue and a single thread that is responsible for routing 
all of the messages from within the worker to external workers.  If any of the 
clients sending messages to other workers did block (backpressure) it would 
stop all other messages from leaving the worker.  In practice this negates the 
"a choke up in a single bolt will not put the brakes on all the topology 
components" from your design.  And as the number of workers in a topology grows 
the impact when this does happen will grow too.
3. Load aware grouping needs to be back on by default and probably improved 
some.  Again one of the stated goals of your design for backpressure is "a 
choke up in a single bolt will not put the brakes on all the topology 
components".  Most of the existing groupings effectively negate this, as each 
upstream component will on average send a message to all downstream components 
relatively frequently.  Unless we can route around these backed up components 
one totally blocked bolt will stop all of a topology from functioning.  If you 
say the throughput drops by 20% when this is on, then we probably want to do 
some profiling and understand why this happens.  
4. Timer Tick Tuples.  There are a number of things that rely on timer 
ticks.  Both system critical things, like ackers and spouts timing out tuples 
and metrics (at least for now); and code within bolts and spouts that want to 
do something periodically without having a separate thread.  Right now there is 
a single thread that does these ticks for each worker.  In the past it was a 
real pain to try and debug failures when it would block trying to insert a 
message into a queue that was full.  Metrics stopped flowing.  Spouts stopped 
timing things out, and other really odd things happened, that I cannot remember 
off the top of my head.  I don't want to have to go back to that.
5. Cycles (non-DAG topologies).  I know we strongly discourage users from 
building these types of things, and quite honestly I am happy to deprecate 
support for them, but we currently do support them.  So if we are going to stop 
supporting them, let's get on with deprecating them in 1.x with clear warnings 
etc.  Otherwise when this goes in there will be angry customers with deadlocked 
topologies.
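
As a rough illustration of points 1 and 2, here is a minimal sketch of a 
bounded send buffer that rejects new messages when it is full, rather than 
buffering without limit or blocking the sending thread.  This is not Storm's 
netty client; `BoundedSendBuffer`, `trySend` and `drainOne` are hypothetical 
names used only for this example.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical illustration only; this is not Storm's netty client.
class BoundedSendBuffer {
    private final BlockingQueue<String> pending;

    BoundedSendBuffer(int capacity) {
        this.pending = new ArrayBlockingQueue<>(capacity);
    }

    /**
     * Tries to enqueue a message without blocking.
     * Returns false when the buffer is full, so the caller can slow down
     * or route elsewhere instead of letting memory grow without bound.
     */
    boolean trySend(String message) {
        return pending.offer(message);
    }

    /** Simulates the network draining one message; returns null if empty. */
    String drainOne() {
        return pending.poll();
    }

    public static void main(String[] args) {
        BoundedSendBuffer buffer = new BoundedSendBuffer(2);
        System.out.println(buffer.trySend("a")); // true
        System.out.println(buffer.trySend("b")); // true
        System.out.println(buffer.trySend("c")); // false: full, producer is pushed back
        buffer.drainOne();
        System.out.println(buffer.trySend("c")); // true: space was freed
    }
}
```

Because `trySend` never blocks, a single routing thread using it would not be 
stalled by one slow destination, which is the concern raised in point 2.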




[GitHub] storm issue #2241: STORM-2306 : Messaging subsystem redesign.

2017-07-31 Thread revans2
Github user revans2 commented on the issue:

https://github.com/apache/storm/pull/2241
  
@roshannaik I am happy to retry it with max spout pending disabled, but in 
my testing I found that disabling it negatively impacted the performance.  My 
initial tests (prior to modifying TVL to have lower parallelism) showed that it 
was having a lot of trouble with GC slowing it down.  It could not handle 
150,000 sentences per second, and would max out at about 120,000 to 130,000

```
15 1 -c topology.workers=1 -c topology.acker.executors=2
```

But when I added in a maximum of 500

```
15 1 -c topology.workers=1 -c topology.acker.executors=2 -c topology.max.spout.pending=500
```

it was able to easily keep up.

Also later on I was trying to tune it to an optimal value, and I tried 
several different values for it.

```
30 1 -c topology.workers=1 -c topology.acker.executors=1 -c topology.max.spout.pending=1000 3 wc-test 1 1 1
```
which maxed out the throughput at about 230,000 sentences per second, but 
setting it to 2000

```
30 1 -c topology.workers=1 -c topology.acker.executors=1 -c topology.max.spout.pending=2000 3 wc-test 1 1 1
```

dropped that maximum to 100,000.  This time I didn't spend the time to 
really dig in and see what the bottleneck was, like I did before, so I cannot 
say whether it was GC or not.

I am also opposed to removing `max.spout.pending` entirely until several 
issues with its removal can be addressed, but I'll address that in a separate 
post as it is kind of long and complicated.
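
For readers less familiar with the setting, here is a toy model of what a cap 
like `topology.max.spout.pending` does: the spout stops emitting once a fixed 
number of tuples are still un-acked.  This is only a sketch under that 
assumption; `PendingLimiter` and its methods are hypothetical names, not 
Storm's implementation.

```java
// Toy model of a max-pending cap; hypothetical, not Storm's spout executor.
class PendingLimiter {
    private final int maxPending;
    private int inFlight = 0;

    PendingLimiter(int maxPending) {
        this.maxPending = maxPending;
    }

    /** Returns true and counts the tuple if the cap allows another emit. */
    synchronized boolean tryEmit() {
        if (inFlight >= maxPending) {
            return false; // cap reached: the spout must wait for acks
        }
        inFlight++;
        return true;
    }

    /** Called when a tuple is acked or failed, freeing one slot. */
    synchronized void ack() {
        if (inFlight > 0) {
            inFlight--;
        }
    }

    synchronized int inFlight() {
        return inFlight;
    }

    public static void main(String[] args) {
        PendingLimiter limiter = new PendingLimiter(2);
        System.out.println(limiter.tryEmit()); // true
        System.out.println(limiter.tryEmit()); // true
        System.out.println(limiter.tryEmit()); // false: two tuples still pending
        limiter.ack();
        System.out.println(limiter.tryEmit()); // true: one slot was freed
    }
}
```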




Re: [DISCUSS] Remove CHANGELOG file

2017-07-31 Thread Stig Rohde Døssing
Maybe we could adapt the script Kafka uses? It looks simple enough
https://github.com/apache/kafka/blob/trunk/release_notes.py. I think the
release notes they have are very readable.

2017-07-31 16:06 GMT+02:00 Bobby Evans :

> I am happy to switch as soon as someone has a working alternative.  The
> big thing in my opinion is giving end users a clear list of all of the
> changes that went into a release so they can review it for themselves.
> However we do it is fine with me as the current changelog file leaves a lot
> to be desired. I personally would be fine with us updating the web
> page/release notes to have a link to a JIRA query in it as a starting point.
>
> https://issues.apache.org/jira/issues/?jql=project%20%3D%20STORM%20AND%20fixVersion%20in%20(1.0.4)%20ORDER%20BY%20priority%20DESC
> or
> https://issues.apache.org/jira/issues/?jql=project%20%3D%20STORM%20AND%20fixVersion%20in%20(1.1.1)%20ORDER%20BY%20priority%20DESC
> for example.
> Later on we can start looking at more complex alternatives that run the
> above query and join it with the git revision history and possibly pull
> requests to give a more complete view for what has happened.
>
> - Bobby
>
>
> On Monday, July 31, 2017, 1:42:11 AM CDT, Jungtaek Lim 
> wrote:
>
> Let me also link a discussion about this from long ago:
>
> http://search-hadoop.com/m/Storm/8gnYyUdhVp1eajp31?subj=+DISCUSSION+More+convenient+way+to+maintain+committer+contributor+list+and+changelogs
>
>
> In my view, in that earlier discussion, Haohui and Bobby agreed not to
> maintain CHANGELOG by hand. Haohui also suggested how to generate it
> automatically (whereas I would just want to remove it, but that's also OK).
> We didn't clearly reach agreement on removing CHANGELOG, but we at least saw
> the need to automate it.
>
>
> And in the current discussion, again in my view, Roshan, Hugo, and Stig agree
> to remove CHANGELOG. I've been continuously proposing to remove CHANGELOG, so
> 3 PMC members and 1 contributor seem to agree on removing CHANGELOG, and at
> least 2 more PMC members agree not to maintain CHANGELOG manually.
>
>
> I will initiate a VOTE thread if we need to. Again, release managers would
> be affected by this change, so I would like to hear Taylor's opinion before
> going forward, but this is a clear pain point for mergers, so I will initiate
> a VOTE thread in several days (at least within this week) if Taylor doesn't
> give an opinion on this or misses this discussion.
>
>
> Thanks,
>
> Jungtaek Lim (HeartSaVioR)
>
> On Fri, Jul 28, 2017 at 10:53 AM, Jungtaek Lim wrote:
>
> > correction: other projects -> *some* other projects, though they're
> > popular projects (including in competition)
> >
> > On Fri, Jul 28, 2017 at 10:51 AM, Jungtaek Lim wrote:
> >
> >> I'm happy that there're other guys having same difficult and sharing
> same
> >> feeling.
> >>
> >> This discussion has been initiating several times (from me) and getting
> >> some +1s for each thread but didn't reach to actual work.
> >>
> >> We already utilize JIRA, and I'm subscribing issues@ and taking care of
> >> issues forgot to mark resolve and/or labeling fixed versions.
> >> It may sounds ideal for us to let reporters caring about their issues,
> >> but committers can also help that, and in fact merger is in responsible
> to
> >> take care of resolving the issue, so irrelevant to contributor for this
> >> side.
> >>
> >> My other consideration is that which thing is convenient for release
> >> manager. Taylor took the release manager all the time (thanks for the
> great
> >> work!) and it is directly related to release announcement so would like
> to
> >> hear his opinion. If it is more convenient or he think he can tolerate
> >> that, we can just go on.
> >>
> >> Please note that other projects don't use merge commit. Most of the time
> >> they squash commits in PR into one, labeling commit title as JIRA issue,
> >> making commit list just as CHANGELOG. That's another thing we discussed
> >> earlier and I think we need to discuss again, but that can be discussed
> >> from another thread.
> >>
> >> Regarding maintaining contributors: easy to explain. Just take a look at
> >> what Spark has been doing. Some other projects follow the approach as
> well.
> >>
> >> We can run the script to extract authors of git commits, and just " |
> >> sort | uniq", and done. Pulling assigner from JIRA issue may be more
> >> accurate, since it requires actual account whereas author information in
> >> commit is not strictly required to identify them. We can apply hybrid
> >> approach as well, but for starter just following git commits looks OK
> to me.
> >>
> >> IMHO they don't feel proud strongly only they're listed in contributors.
> >> Looking at contribution graph works better in this case, given that it
> also
> >> shows commit count and lines of change. (regardless of accuracy)
> >> It may give more proud to mention them as release announce. It will 

[GitHub] storm issue #2242: [STORM-1347] display topology version on UI

2017-07-31 Thread Ethanlm
Github user Ethanlm commented on the issue:

https://github.com/apache/storm/pull/2242
  
Thanks @HeartSaVioR  Regenerated them




Re: Writing orc files with storm via java API

2017-07-31 Thread Bobby Evans
It should be possible to make this work, but it is not going to be simple.  The 
real issue is the format of the ORC file.  It is not written one record at a 
time, like CSV or the other supported formats are.  Sadly, one record at a time 
is currently an assumption in the AbstractHdfsBolt.
https://github.com/apache/storm/blob/master/external/storm-hdfs/src/main/java/org/apache/storm/hdfs/bolt/format/RecordFormat.java
So to support it we would need to make some modifications, not impossible, just 
not a drop in replacement.  If this is something you want to tackle and 
contribute back I think we would all love it.  You might also run into some 
issues with metadata for the format being written at the end of the file.
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+ORC
I am not totally sure how easy it is to recover an ORC file if that footer is 
missing because a worker crashed.  You might end up with data loss in some 
cases if you are not extremely careful.  You might also need to modify the ORC 
APIs themselves to be able to support storing/recovering the metadata in an 
external location for recovery to truly fix it, and then store them in ZK on a 
flush until the file is rotated.

The Trident HdfsState
https://github.com/apache/storm/blob/master/external/storm-hdfs/src/main/java/org/apache/storm/hdfs/trident/HdfsState.java
might be a more appropriate place to start, as the updated state is written out 
in micro batches, but you still have to deal with the footer issues, as Trident 
really cares about exactly-once processing.

So overall it is not a simple problem, and relying on an external server like 
Hive would make it a lot simpler.


- Bobby


On Tuesday, July 25, 2017, 8:38:42 AM CDT, Igor Kuzmenko  
wrote:

Is there any implementation of a Storm bolt that can write files to HDFS in
ORC format, without using the Hive Streaming API?
I've found a Java API for writing ORC files 
and I'm wondering whether there are any existing Hive bolts that use it, or any
plans to create one?


[GitHub] storm pull request #2248: STORM-2028: Fix for uprooting the JDBC client exce...

2017-07-31 Thread RPCMoritz
Github user RPCMoritz commented on a diff in the pull request:

https://github.com/apache/storm/pull/2248#discussion_r130360205
  
--- Diff: 
external/storm-jdbc/src/test/java/org/apache/storm/jdbc/common/JdbcClientTest.java
 ---
@@ -92,3 +115,27 @@ public void cleanup() {
 client.executeSql("drop table " + tableName);
 }
 }
+
+class MockConnectionProvider implements ConnectionProvider {
+
+private Map configMap;
+
+public MockConnectionProvider(Map mockCPConfigMap) {
--- End diff --

The class name could be chosen to be more descriptive -- 
ThrowingConnectionProvider?
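
A minimal sketch of what the renamed test double could look like, assuming 
the provider only needs the three methods visible in the diff.  The 
`ConnectionProvider` interface below is a local stand-in so the example 
compiles on its own; it is not the storm-jdbc class, and the unused config map 
is dropped as an assumption about what the test needs.

```java
import java.sql.Connection;

/** Test double whose only job is to fail when a connection is requested. */
class ThrowingConnectionProvider implements ConnectionProvider {

    @Override
    public void prepare() {
        // Nothing to set up: this provider never hands out a connection.
    }

    @Override
    public Connection getConnection() {
        throw new RuntimeException("connection error");
    }

    @Override
    public void cleanup() {
        // Nothing to release.
    }

    public static void main(String[] args) {
        try {
            new ThrowingConnectionProvider().getConnection();
        } catch (RuntimeException e) {
            System.out.println(e.getMessage()); // prints "connection error"
        }
    }
}

// Local stand-in for the real interface, reduced to the methods shown in the diff.
interface ConnectionProvider {
    void prepare();
    Connection getConnection();
    void cleanup();
}
```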




[GitHub] storm pull request #2248: STORM-2028: Fix for uprooting the JDBC client exce...

2017-07-31 Thread RPCMoritz
Github user RPCMoritz commented on a diff in the pull request:

https://github.com/apache/storm/pull/2248#discussion_r130362674
  
--- Diff: 
external/storm-jdbc/src/test/java/org/apache/storm/jdbc/common/JdbcClientTest.java
 ---
@@ -80,6 +85,24 @@ public void testInsertAndSelect() {
 Assert.assertEquals(rows, selectedRows);
 }
 
+@Rule
+public ExpectedException thrown = ExpectedException.none();
+
+@Test
+public void testInsertConnectionError() {
+
+ConnectionProvider connectionProvider = new MockConnectionProvider(null);
+this.client = new JdbcClient(connectionProvider, 60);
+
+List row = createRow(1, "frank");
+List rows  = new ArrayList();
+rows.add(row);
+String query  = "insert into user_details values(?,?,?)";
+
+thrown.expect(MultipleFailureException.class);
--- End diff --

Is this specific enough to fail with the previous code?




[GitHub] storm pull request #2248: STORM-2028: Fix for uprooting the JDBC client exce...

2017-07-31 Thread RPCMoritz
Github user RPCMoritz commented on a diff in the pull request:

https://github.com/apache/storm/pull/2248#discussion_r130360350
  
--- Diff: 
external/storm-jdbc/src/test/java/org/apache/storm/jdbc/common/JdbcClientTest.java
 ---
@@ -92,3 +115,27 @@ public void cleanup() {
 client.executeSql("drop table " + tableName);
 }
 }
+
+class MockConnectionProvider implements ConnectionProvider {
+
+private Map configMap;
+
+public MockConnectionProvider(Map mockCPConfigMap) {
+this.configMap = mockCPConfigMap;
+}
+
+@Override
+public synchronized void prepare() {
+// To be Implemented
+}
+
+@Override
+public Connection getConnection() {
+throw new RuntimeException("connection error");
+}
+
+@Override
+public void cleanup() {
+// To be Implemented
--- End diff --

This comment is unnecessary/misleading.




[GitHub] storm pull request #2248: STORM-2028: Fix for uprooting the JDBC client exce...

2017-07-31 Thread RPCMoritz
Github user RPCMoritz commented on a diff in the pull request:

https://github.com/apache/storm/pull/2248#discussion_r130360392
  
--- Diff: 
external/storm-jdbc/src/test/java/org/apache/storm/jdbc/common/JdbcClientTest.java
 ---
@@ -92,3 +115,27 @@ public void cleanup() {
 client.executeSql("drop table " + tableName);
 }
 }
+
+class MockConnectionProvider implements ConnectionProvider {
+
+private Map configMap;
+
+public MockConnectionProvider(Map mockCPConfigMap) {
+this.configMap = mockCPConfigMap;
+}
+
+@Override
+public synchronized void prepare() {
+// To be Implemented
--- End diff --

This comment is unnecessary/misleading.




[GitHub] storm pull request #2248: STORM-2028: Fix for uprooting the JDBC client exce...

2017-07-31 Thread RPCMoritz
Github user RPCMoritz commented on a diff in the pull request:

https://github.com/apache/storm/pull/2248#discussion_r130361066
  
--- Diff: 
external/storm-jdbc/src/main/java/org/apache/storm/jdbc/common/JdbcClient.java 
---
@@ -223,13 +237,25 @@ private void 
setPreparedStatementParams(PreparedStatement preparedStatement, Lis
 }
 }
 
-private void closeConnection(Connection connection) {
+private void closeConnection(Connection connection, Exception finalException) {
 if (connection != null) {
 try {
 connection.close();
 } catch (SQLException e) {
-throw new RuntimeException("Failed to close connection", e);
+if (finalException != null)
+{
+LOG.error("Failed to close connection");
--- End diff --

We should still log `e` here.
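
A self-contained sketch of the idea, under a few assumptions: it uses 
`java.util.logging` instead of the project's logger so it runs without 
dependencies, `CloseLogging` and `closeKeepingEarlier` are hypothetical names, 
and attaching the close failure via `addSuppressed` is an extra that the PR 
does not do.

```java
import java.io.IOException;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical helper; not the JdbcClient code from the PR.
class CloseLogging {
    private static final Logger LOG = Logger.getLogger(CloseLogging.class.getName());

    /**
     * Closes the resource. If close() fails while an earlier exception is
     * already being handled, the close failure is logged with its stack trace
     * and attached to the earlier exception; otherwise it is rethrown.
     */
    static void closeKeepingEarlier(AutoCloseable resource, Exception earlier) {
        if (resource == null) {
            return;
        }
        try {
            resource.close();
        } catch (Exception e) {
            if (earlier != null) {
                // Pass e to the logger so the cause of the close failure is not lost.
                LOG.log(Level.SEVERE, "Failed to close connection", e);
                earlier.addSuppressed(e);
            } else {
                throw new RuntimeException("Failed to close connection", e);
            }
        }
    }

    public static void main(String[] args) {
        Exception earlier = new IllegalStateException("insert failed");
        AutoCloseable failing = () -> { throw new IOException("socket already closed"); };
        closeKeepingEarlier(failing, earlier);
        System.out.println(earlier.getSuppressed().length); // prints 1
    }
}
```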




Re: [DISCUSS] Remove CHANGELOG file

2017-07-31 Thread Bobby Evans
I am happy to switch as soon as someone has a working alternative.  The big 
thing in my opinion is giving end users a clear list of all of the changes that 
went into a release so they can review it for themselves.  However we do it is 
fine with me as the current changelog file leaves a lot to be desired. I 
personally would be fine with us updating the web page/release notes to have a 
link to a JIRA query in it as a starting point.

https://issues.apache.org/jira/issues/?jql=project%20%3D%20STORM%20AND%20fixVersion%20in%20(1.0.4)%20ORDER%20BY%20priority%20DESC
or
https://issues.apache.org/jira/issues/?jql=project%20%3D%20STORM%20AND%20fixVersion%20in%20(1.1.1)%20ORDER%20BY%20priority%20DESC
for example.
Later on we can start looking at more complex alternatives that run the above 
query and join it with the git revision history and possibly pull requests to 
give a more complete view for what has happened.
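
A minimal sketch of generating such a link per release, assuming the release 
notes only need the version number filled in.  `JiraReleaseLink` is a 
hypothetical helper, and `URLEncoder.encode(String, Charset)` needs Java 10 or 
later.  Note that it percent-encodes the parentheses (`%28`/`%29`), which the 
hand-written links above leave literal.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for building a per-release JIRA search link.
class JiraReleaseLink {
    private static final String SEARCH_URL = "https://issues.apache.org/jira/issues/?jql=";

    /** Builds the JQL text for all STORM issues fixed in the given version. */
    static String jql(String fixVersion) {
        return "project = STORM AND fixVersion in (" + fixVersion + ") ORDER BY priority DESC";
    }

    /** Builds the full search URL with the JQL percent-encoded. */
    static String url(String fixVersion) {
        // URLEncoder emits "+" for spaces (form encoding); swap to %20 to match the links above.
        String encoded = URLEncoder.encode(jql(fixVersion), StandardCharsets.UTF_8)
                .replace("+", "%20");
        return SEARCH_URL + encoded;
    }

    public static void main(String[] args) {
        System.out.println(url("1.0.4"));
        System.out.println(url("1.1.1"));
    }
}
```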

- Bobby


On Monday, July 31, 2017, 1:42:11 AM CDT, Jungtaek Lim  
wrote:

Let me also link a discussion about this from long ago:

http://search-hadoop.com/m/Storm/8gnYyUdhVp1eajp31?subj=+DISCUSSION+More+convenient+way+to+maintain+committer+contributor+list+and+changelogs


In my view, in that earlier discussion, Haohui and Bobby agreed not to
maintain CHANGELOG by hand. Haohui also suggested how to generate it
automatically (whereas I would just want to remove it, but that's also OK).
We didn't clearly reach agreement on removing CHANGELOG, but we at least saw
the need to automate it.


And in the current discussion, again in my view, Roshan, Hugo, and Stig agree
to remove CHANGELOG. I've been continuously proposing to remove CHANGELOG, so
3 PMC members and 1 contributor seem to agree on removing CHANGELOG, and at
least 2 more PMC members agree not to maintain CHANGELOG manually.


I will initiate a VOTE thread if we need to. Again, release managers would
be affected by this change, so I would like to hear Taylor's opinion before
going forward, but this is a clear pain point for mergers, so I will initiate
a VOTE thread in several days (at least within this week) if Taylor doesn't
give an opinion on this or misses this discussion.


Thanks,

Jungtaek Lim (HeartSaVioR)

On Fri, Jul 28, 2017 at 10:53 AM, Jungtaek Lim wrote:

> correction: other projects -> *some* other projects, though they're
> popular projects (including in competition)
>
> On Fri, Jul 28, 2017 at 10:51 AM, Jungtaek Lim wrote:
>
>> I'm happy that there're other guys having same difficult and sharing same
>> feeling.
>>
>> This discussion has been initiating several times (from me) and getting
>> some +1s for each thread but didn't reach to actual work.
>>
>> We already utilize JIRA, and I'm subscribing issues@ and taking care of
>> issues forgot to mark resolve and/or labeling fixed versions.
>> It may sounds ideal for us to let reporters caring about their issues,
>> but committers can also help that, and in fact merger is in responsible to
>> take care of resolving the issue, so irrelevant to contributor for this
>> side.
>>
>> My other consideration is that which thing is convenient for release
>> manager. Taylor took the release manager all the time (thanks for the great
>> work!) and it is directly related to release announcement so would like to
>> hear his opinion. If it is more convenient or he think he can tolerate
>> that, we can just go on.
>>
>> Please note that other projects don't use merge commit. Most of the time
>> they squash commits in PR into one, labeling commit title as JIRA issue,
>> making commit list just as CHANGELOG. That's another thing we discussed
>> earlier and I think we need to discuss again, but that can be discussed
>> from another thread.
>>
>> Regarding maintaining contributors: easy to explain. Just take a look at
>> what Spark has been doing. Some other projects follow the approach as well.
>>
>> We can run the script to extract authors of git commits, and just " |
>> sort | uniq", and done. Pulling assigner from JIRA issue may be more
>> accurate, since it requires actual account whereas author information in
>> commit is not strictly required to identify them. We can apply hybrid
>> approach as well, but for starter just following git commits looks OK to me.
>>
>> IMHO they don't feel proud strongly only they're listed in contributors.
>> Looking at contribution graph works better in this case, given that it also
>> shows commit count and lines of change. (regardless of accuracy)
>> It may give more proud to mention them as release announce. It will lead
>> contributors to play consistently, trying to participate and be mentioned
>> for releases as many as possible. IMO Spark built a great strategy for this
>> side, and if we all think it is great, why not follow?
>>
>> Thanks,
>> Jungtaek Lim (HeartSaVioR)
>>
>> On Fri, Jul 28, 2017 at 6:58 AM, Stig Rohde Døssing wrote:
>>
>>> We already have to keep JIRA updated, and keeping JIRA consistent is
>>> easier
>>> since there 

Re: [Propose] move website repository from svn to git

2017-07-31 Thread Bobby Evans
+1
I am fine with moving to git, but I would like it to be a different repo.
Our current repo is at least 160MB already (which is a lot to download) but 
nothing compared to the web site, which has lots and lots of things checked in 
(I estimate it at about 1.5GB based on an older version I have locally).


- Bobby


On Monday, July 31, 2017, 1:58:03 AM CDT, Xin Wang  
wrote:

+1 for moving to git.  - Xin



2017-07-31 14:54 GMT+08:00 Jungtaek Lim :

> Bump. I think this is worth addressing soon, since some contributors
> occasionally submit patches for the documentation.
> Personally, SVN no longer feels convenient to use. If we all feel the
> same, let's change then.
>
> -Jungtaek Lim (HeartSaVioR)
>
> On Thu, Jul 13, 2017 at 9:16 AM, Jungtaek Lim wrote:
>
> > Maybe we could try out Gitbox, though every committers should join their
> > Github accounts to 'apache' group and enable 2FA.
> >
> > On Thu, Jul 13, 2017 at 8:38 AM, Jungtaek Lim wrote:
> >
> >> Did we render webpage with asf-site branch? I didn't recognize it.
> >>
> >> Yes I meant separate git repository, like 'storm-site'. I'm happy I'm
> not
> >> the only one who feels inconvenient with SVN repo.
> >> Would it better to initiate VOTE for this?
> >>
> >> Thanks,
> >> Jungtaek Lim (HeartSaVioR)
> >>
> >> On Thu, Jul 13, 2017 at 4:30 AM, P. Taylor Goetz wrote:
> >>
> >>> We were using git before, then a year ago moved back to subversion to
> >>> implement versioned documentation [1].
> >>>
> >>> If we do decide to move back to git for this, I would recommend using a
> >>> separate git repository so it doesn’t bloat our main code repository.
> When
> >>> generating javadoc for a new version, the svn commit to publish the
> site
> >>> can take around 20 minutes.
> >>>
> >>> -Taylor
> >>>
> >>> > On Jul 12, 2017, at 10:33 AM, Jungtaek Lim wrote:
> >>> >
> >>> > Hi devs,
> >>> >
> >>> > I think we discussed moving the website repository from SVN to Git a
> >>> > long time ago, and we were OK with that, but no action was taken.
> >>> >
> >>> > Now I can see that a number of projects (Spark, Kafka, Beam, maybe more)
> >>> > are using a separate Git repository for their website.
> >>> > Although we may still need to keep the version-specific documentation (doc
> >>> > directory) in the code repository and copy the Jekyll build result to the
> >>> > website repo, anyone could look at the whole website code and craft pull
> >>> > requests to help us. Git would also be more convenient for us than SVN
> >>> > (since we're maintaining Storm in Git).
> >>> >
> >>> > So I'd like to propose creating a new repository, 'storm-website' or
> >>> > 'storm-site', with 'asf-site' as the default branch, and moving the SVN
> >>> > contents to Git.
> >>> > (Of course, we would need to ask INFRA to help get the Storm website
> >>> > rendered from the new Git repo.)
> >>> >
> >>> > What do you think?
> >>> >
> >>> > Thanks,
> >>> > Jungtaek Lim (HeartSaVioR)
> >>>
> >>>
>



-- 
Thanks,
Xin

Re: [DISCUSS] Ideas for resolving storm-drpc-server compilation issue on IDE

2017-07-31 Thread Bobby Evans
Those look reasonable to me.


- Bobby


On Monday, July 31, 2017, 2:22:47 AM CDT, Jungtaek Lim  
wrote:

I agree that we should keep the set of shaded & relocated artifacts as small as
possible, but since we used to shade almost everything (meaning that dropping
relocation will affect user experience), we may need to find an exhaustive set
of troublesome artifacts and relocate at least those. (Maybe the union of
everyone's lists?)

On my list are Guava, HttpClient, and Netty (Netty may not need shading for now
if we don't plan to upgrade to 4.x, since the package name differs).

Would it be better to start a poll or discussion in a separate thread?

- Jungtaek Lim (HeartSaVioR)

On Thu, Jul 20, 2017 at 2:27 AM, Bobby Evans wrote:

> I am fine with a separate project for relocated dependencies (or even just
> separate packages: you do a maven install of them and do not include them in
> the IDE at all).  Shading still has some drawbacks, but I think in a few
> cases it makes sense.  I would prefer it if we picked a very small number
> of dependencies that cause people issues and just shade those.  Guava is
> the big one that I worry about. Netty is a possibility and I think asm
> would be another, but it is a transitive dependency, so it would require us
> to ship our own version of kryo exposing the kryo API but pulling in a shaded
> asm.
> The servlet-api concerns me, but it looks like it is tied to the
> IHttpCredentialsPlugin which should move to the server package anyways.
>
> The rest I am not concerned about, are things that are exposed to end
> users, or are for test and not actually shipped.
> $ mvn dependency:tree...
> [INFO] --- maven-dependency-plugin:2.8:tree (default-cli) @ storm-client ---
> [INFO] org.apache.storm:storm-client:jar:2.0.0-SNAPSHOT
> [INFO] +- uk.org.lidalia:sysout-over-slf4j:jar:1.0.2:compile
> [INFO] +- org.slf4j:slf4j-api:jar:1.7.21:compile
> [INFO] +- org.apache.logging.log4j:log4j-api:jar:2.8.2:compile
> [INFO] +- org.apache.logging.log4j:log4j-core:jar:2.8.2:compile
> [INFO] +- org.apache.logging.log4j:log4j-slf4j-impl:jar:2.8.2:compile
> [INFO] +- org.slf4j:log4j-over-slf4j:jar:1.6.6:compile
> [INFO] +- com.google.guava:guava:jar:16.0.1:compile
> [INFO] +- org.apache.thrift:libthrift:jar:0.9.3:compile
> [INFO] |  \- org.apache.httpcomponents:httpcore:jar:4.4.1:compile
> [INFO] +- commons-io:commons-io:jar:2.5:compile
> [INFO] +- commons-lang:commons-lang:jar:2.5:compile
> [INFO] +- commons-collections:commons-collections:jar:3.2.2:compile
> [INFO] +- com.lmax:disruptor:jar:3.3.2:compile
> [INFO] +- com.googlecode.json-simple:json-simple:jar:1.1:compile
> [INFO] +- org.yaml:snakeyaml:jar:1.11:compile
> [INFO] +- io.netty:netty:jar:3.9.0.Final:compile
> [INFO] +- com.esotericsoftware:kryo:jar:3.0.3:compile
> [INFO] |  +- com.esotericsoftware:reflectasm:jar:1.10.1:compile
> [INFO] |  |  \- org.ow2.asm:asm:jar:5.0.3:compile
> [INFO] |  +- com.esotericsoftware:minlog:jar:1.3.0:compile
> [INFO] |  \- org.objenesis:objenesis:jar:2.1:compile
> [INFO] +- org.apache.zookeeper:zookeeper:jar:3.4.6:compile
> [INFO] |  \- jline:jline:jar:0.9.94:compile
> [INFO] +- org.apache.curator:curator-framework:jar:2.12.0:compile
> [INFO] +- org.jgrapht:jgrapht-core:jar:0.9.0:compile
> [INFO] +- javax.servlet:servlet-api:jar:2.5:compile
> [INFO] +- org.apache.httpcomponents:httpclient:jar:4.3.3:compile
> [INFO] |  +- commons-logging:commons-logging:jar:1.1.3:compile
> [INFO] |  \- commons-codec:commons-codec:jar:1.6:compile
> [INFO] +- org.apache.curator:curator-client:jar:2.12.0:compile
> [INFO] +- junit:junit:jar:4.11:test
> [INFO] |  \- org.hamcrest:hamcrest-core:jar:1.3:test
> [INFO] +- org.mockito:mockito-core:jar:1.9.5:test
> [INFO] \- org.hamcrest:hamcrest-library:jar:1.3:test
> - Bobby
>
>
> On Wednesday, July 12, 2017, 9:45:43 AM CDT, Jungtaek Lim <
> kabh...@gmail.com> wrote:
>
> I'd like to bump this again, since we have a few huge issues for Storm
> 2.0.0, and this issue is a kind of regression and effectively a blocker.
> (Please note that the current master branch removes shading for some
> libraries to make the IDE happy.)
>
> At that time I didn't consider option 2 a possible solution, but now Flink
> is going with this option, and I can't find a reason not to do this.
>
> * Repository: https://github.com/apache/flink-shaded
> * Discussion thread:
>
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Changing-Flink-s-shading-model-td17419.html
>
> Thoughts?
>
> Thanks,
> Jungtaek Lim (HeartSaVioR)
>
> On Fri, Mar 31, 2017 at 3:12 PM, Jungtaek Lim wrote:
>
> > Bobby,
> >
> > I've worked on separating worker and daemon classpath.
> >
> > - Issue: STORM-2441: Break down 'storm-core' to extract client (worker)
> > artifacts
> > - PR: https://github.com/apache/storm/pull/2034
> >
> > I didn't address your suggestions about "classpath selection" and "hiding
> > local mode". Please file issues if you would like them addressed.

[GitHub] storm issue #2243: STORM-2658: Extract storm-kafka-client examples to storm-...

2017-07-31 Thread HeartSaVioR
Github user HeartSaVioR commented on the issue:

https://github.com/apache/storm/pull/2243
  
+1 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] storm pull request #2243: STORM-2658: Extract storm-kafka-client examples to...

2017-07-31 Thread srdo
Github user srdo commented on a diff in the pull request:

https://github.com/apache/storm/pull/2243#discussion_r130352763
  
--- Diff: examples/storm-kafka-client-examples/README.markdown ---
@@ -0,0 +1,10 @@
+## Usage
+This module contains example topologies demonstrating storm-kafka-client 
spout and Trident usage.
+
+The module is built by `mvn clean package`. This will generate the 
`target/storm-kafka-client-examples-VERSION.jar` file. The jar contains all 
dependencies and can be submitted to Storm via the Storm CLI. For example:
+```
+storm jar storm-kafka-client-examples-2.0.0-SNAPSHOT.jar 
org.apache.storm.kafka.spout.test.KafkaSpoutTopologyMainNamedTopics
+```
+will submit the topologies set up by KafkaSpoutTopologyMainNamedTopics to 
Storm.
+
+Note that this example produces a jar containing all dependencies for ease 
of use. In a production environment you may want to reduce the jar size by 
extracting some dependencies (e.g. org.apache.kafka:kafka-clients) from the 
jar. You can do this by setting the dependencies you don't want to include in 
the jars to `provided` scope, and then manually copying the dependencies to 
your Storm extlib directory.
--- End diff --

Thanks, I didn't know about this flag. It's much better, will replace 
references to extlib.




[GitHub] storm issue #2247: Force to make some files in final-package use unix line e...

2017-07-31 Thread cluo512
Github user cluo512 commented on the issue:

https://github.com/apache/storm/pull/2247
  
Hi Srdo.
Thank you for your advice.
I created an issue at https://issues.apache.org/jira/browse/STORM-2664, and
I will rename this PR later.





[GitHub] storm issue #2247: Force to make some files in final-package use unix line e...

2017-07-31 Thread srdo
Github user srdo commented on the issue:

https://github.com/apache/storm/pull/2247
  
Hi Cluo. 

This looks good. I checked, and there doesn't seem to be an issue running 
the powershell files with LF EOL, so I don't think there's a reason to keep 
them as CRLF.

Could you open an issue on https://issues.apache.org/jira and rename this 
PR and commit so it references the JIRA number? See for example 
https://github.com/apache/storm/pull/2248/commits.




[GitHub] storm pull request #2248: STORM-2028: Fix for uprooting the JDBC client exce...

2017-07-31 Thread rahuljain373
GitHub user rahuljain373 reopened a pull request:

https://github.com/apache/storm/pull/2248

STORM-2028: Fix for uprooting the JDBC client exceptions in case of s…

Fix for uprooting the JDBC client exceptions in case of subsequent 
connection closure issues

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/rahuljain373/storm STORM-2028

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/storm/pull/2248.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2248


commit 024d2694cca2c98daed769fb82f93c945e715a10
Author: Rahul Jain 
Date:   2017-07-31T07:44:39Z

STORM-2028: Fix for uprooting the JDBC client exceptions in case of 
subsequent connection closure issues






[GitHub] storm pull request #2248: STORM-2028: Fix for uprooting the JDBC client exce...

2017-07-31 Thread rahuljain373
Github user rahuljain373 closed the pull request at:

https://github.com/apache/storm/pull/2248




[GitHub] storm pull request #2248: STORM-2028: Fix for uprooting the JDBC client exce...

2017-07-31 Thread rahuljain373
GitHub user rahuljain373 opened a pull request:

https://github.com/apache/storm/pull/2248

STORM-2028: Fix for uprooting the JDBC client exceptions in case of s…

Fix for uprooting the JDBC client exceptions in case of subsequent 
connection closure issues

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/rahuljain373/storm STORM-2028

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/storm/pull/2248.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2248


commit 024d2694cca2c98daed769fb82f93c945e715a10
Author: Rahul Jain 
Date:   2017-07-31T07:44:39Z

STORM-2028: Fix for uprooting the JDBC client exceptions in case of 
subsequent connection closure issues






Re: [DISCUSS] Ideas for resolving storm-drpc-server compilation issue on IDE

2017-07-31 Thread Jungtaek Lim
I agree that we should keep the set of shaded & relocated artifacts as small as
possible, but since we used to shade almost everything (meaning that dropping
relocation will affect user experience), we may need to find an exhaustive set
of troublesome artifacts and relocate at least those. (Maybe the union of
everyone's lists?)

On my list are Guava, HttpClient, and Netty (Netty may not need shading for now
if we don't plan to upgrade to 4.x, since the package name differs).

Would it be better to start a poll or discussion in a separate thread?

- Jungtaek Lim (HeartSaVioR)

On Thu, Jul 20, 2017 at 2:27 AM, Bobby Evans wrote:

> I am fine with a separate project for relocated dependencies (or even just
> separate packages: you do a maven install of them and do not include them in
> the IDE at all).  Shading still has some drawbacks, but I think in a few
> cases it makes sense.  I would prefer it if we picked a very small number
> of dependencies that cause people issues and just shade those.  Guava is
> the big one that I worry about. Netty is a possibility and I think asm
> would be another, but it is a transitive dependency, so it would require us
> to ship our own version of kryo exposing the kryo API but pulling in a shaded
> asm.
> The servlet-api concerns me, but it looks like it is tied to the
> IHttpCredentialsPlugin which should move to the server package anyways.
>
> The rest I am not concerned about, are things that are exposed to end
> users, or are for test and not actually shipped.
> $ mvn dependency:tree...
> [INFO] --- maven-dependency-plugin:2.8:tree (default-cli) @ storm-client ---
> [INFO] org.apache.storm:storm-client:jar:2.0.0-SNAPSHOT
> [INFO] +- uk.org.lidalia:sysout-over-slf4j:jar:1.0.2:compile
> [INFO] +- org.slf4j:slf4j-api:jar:1.7.21:compile
> [INFO] +- org.apache.logging.log4j:log4j-api:jar:2.8.2:compile
> [INFO] +- org.apache.logging.log4j:log4j-core:jar:2.8.2:compile
> [INFO] +- org.apache.logging.log4j:log4j-slf4j-impl:jar:2.8.2:compile
> [INFO] +- org.slf4j:log4j-over-slf4j:jar:1.6.6:compile
> [INFO] +- com.google.guava:guava:jar:16.0.1:compile
> [INFO] +- org.apache.thrift:libthrift:jar:0.9.3:compile
> [INFO] |  \- org.apache.httpcomponents:httpcore:jar:4.4.1:compile
> [INFO] +- commons-io:commons-io:jar:2.5:compile
> [INFO] +- commons-lang:commons-lang:jar:2.5:compile
> [INFO] +- commons-collections:commons-collections:jar:3.2.2:compile
> [INFO] +- com.lmax:disruptor:jar:3.3.2:compile
> [INFO] +- com.googlecode.json-simple:json-simple:jar:1.1:compile
> [INFO] +- org.yaml:snakeyaml:jar:1.11:compile
> [INFO] +- io.netty:netty:jar:3.9.0.Final:compile
> [INFO] +- com.esotericsoftware:kryo:jar:3.0.3:compile
> [INFO] |  +- com.esotericsoftware:reflectasm:jar:1.10.1:compile
> [INFO] |  |  \- org.ow2.asm:asm:jar:5.0.3:compile
> [INFO] |  +- com.esotericsoftware:minlog:jar:1.3.0:compile
> [INFO] |  \- org.objenesis:objenesis:jar:2.1:compile
> [INFO] +- org.apache.zookeeper:zookeeper:jar:3.4.6:compile
> [INFO] |  \- jline:jline:jar:0.9.94:compile
> [INFO] +- org.apache.curator:curator-framework:jar:2.12.0:compile
> [INFO] +- org.jgrapht:jgrapht-core:jar:0.9.0:compile
> [INFO] +- javax.servlet:servlet-api:jar:2.5:compile
> [INFO] +- org.apache.httpcomponents:httpclient:jar:4.3.3:compile
> [INFO] |  +- commons-logging:commons-logging:jar:1.1.3:compile
> [INFO] |  \- commons-codec:commons-codec:jar:1.6:compile
> [INFO] +- org.apache.curator:curator-client:jar:2.12.0:compile
> [INFO] +- junit:junit:jar:4.11:test
> [INFO] |  \- org.hamcrest:hamcrest-core:jar:1.3:test
> [INFO] +- org.mockito:mockito-core:jar:1.9.5:test
> [INFO] \- org.hamcrest:hamcrest-library:jar:1.3:test
> - Bobby
>
>
> On Wednesday, July 12, 2017, 9:45:43 AM CDT, Jungtaek Lim <
> kabh...@gmail.com> wrote:
>
> I'd like to bump this again, since we have a few huge issues for Storm
> 2.0.0, and this issue is a kind of regression and effectively a blocker.
> (Please note that the current master branch removes shading for some
> libraries to make the IDE happy.)
>
> At that time I didn't consider option 2 a possible solution, but now Flink
> is going with this option, and I can't find a reason not to do this.
>
> * Repository: https://github.com/apache/flink-shaded
> * Discussion thread:
>
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Changing-Flink-s-shading-model-td17419.html
>
> Thoughts?
>
> Thanks,
> Jungtaek Lim (HeartSaVioR)
>
> On Fri, Mar 31, 2017 at 3:12 PM, Jungtaek Lim wrote:
>
> > Bobby,
> >
> > I've worked on separating worker and daemon classpath.
> >
> > - Issue: STORM-2441: Break down 'storm-core' to extract client (worker)
> > artifacts
> > - PR: https://github.com/apache/storm/pull/2034
> >
> > I didn't address your suggestions about "classpath selection" and "hiding
> > local mode". Please file issues if you would like them addressed.
> >
> > Btw, I excluded artifacts from the shade & relocation list, so we still
> > need to address the dependency issue.
> >
> > Folks,
> 

Re: [Propose] move website repository from svn to git

2017-07-31 Thread Xin Wang
+1 for moving to git.  - Xin



2017-07-31 14:54 GMT+08:00 Jungtaek Lim :

> Bump. I think this is worth addressing soon, since some contributors
> occasionally submit documentation patches.
> Personally, I no longer find SVN convenient to use. If we all feel the
> same, let's make the change.
>
> -Jungtaek Lim (HeartSaVioR)
>
> On Thu, Jul 13, 2017 at 9:16 AM, Jungtaek Lim wrote:
>
> > Maybe we could try out Gitbox, though every committer would have to link
> > their GitHub account to the 'apache' group and enable 2FA.
> >
> > On Thu, Jul 13, 2017 at 8:38 AM, Jungtaek Lim wrote:
> >
> >> Did we render the web page from an asf-site branch? I wasn't aware of that.
> >>
> >> Yes, I meant a separate git repository, like 'storm-site'. I'm happy I'm not
> >> the only one who finds the SVN repo inconvenient.
> >> Would it be better to initiate a VOTE for this?
> >>
> >> Thanks,
> >> Jungtaek Lim (HeartSaVioR)
> >>
> >> On Thu, Jul 13, 2017 at 4:30 AM, P. Taylor Goetz wrote:
> >>
> >>> We were using git before, then a year ago moved back to subversion to
> >>> implement versioned documentation [1].
> >>>
> >>> If we do decide to move back to git for this, I would recommend using a
> >>> separate git repository so it doesn’t bloat our main code repository. When
> >>> generating javadoc for a new version, the svn commit to publish the site
> >>> can take around 20 minutes.
> >>>
> >>> -Taylor
> >>>
> >>> > On Jul 12, 2017, at 10:33 AM, Jungtaek Lim wrote:
> >>> >
> >>> > Hi devs,
> >>> >
> >>> > I think we discussed moving the website repository from SVN to Git a
> >>> > long time ago, and we were OK with that, but no action was taken.
> >>> >
> >>> > Now I can see that a number of projects (Spark, Kafka, Beam, maybe more)
> >>> > are using a separate Git repository for their website.
> >>> > Although we may still need to keep the version-specific documentation (doc
> >>> > directory) in the code repository and copy the Jekyll build result to the
> >>> > website repo, anyone could look at the whole website code and craft pull
> >>> > requests to help us. Git would also be more convenient for us than SVN
> >>> > (since we're maintaining Storm in Git).
> >>> >
> >>> > So I'd like to propose creating a new repository, 'storm-website' or
> >>> > 'storm-site', with 'asf-site' as the default branch, and moving the SVN
> >>> > contents to Git.
> >>> > (Of course, we would need to ask INFRA to help get the Storm website
> >>> > rendered from the new Git repo.)
> >>> >
> >>> > What do you think?
> >>> >
> >>> > Thanks,
> >>> > Jungtaek Lim (HeartSaVioR)
> >>>
> >>>
>



-- 
Thanks,
Xin


Re: [Propose] move website repository from svn to git

2017-07-31 Thread Jungtaek Lim
Bump. I think this is worth addressing soon, since some contributors
occasionally submit documentation patches.
Personally, I no longer find SVN convenient to use. If we all feel the
same, let's make the change.

-Jungtaek Lim (HeartSaVioR)

On Thu, Jul 13, 2017 at 9:16 AM, Jungtaek Lim wrote:

> Maybe we could try out Gitbox, though every committer would have to link
> their GitHub account to the 'apache' group and enable 2FA.
>
> On Thu, Jul 13, 2017 at 8:38 AM, Jungtaek Lim wrote:
>
>> Did we render the web page from an asf-site branch? I wasn't aware of that.
>>
>> Yes, I meant a separate git repository, like 'storm-site'. I'm happy I'm not
>> the only one who finds the SVN repo inconvenient.
>> Would it be better to initiate a VOTE for this?
>>
>> Thanks,
>> Jungtaek Lim (HeartSaVioR)
>>
>> On Thu, Jul 13, 2017 at 4:30 AM, P. Taylor Goetz wrote:
>>
>>> We were using git before, then a year ago moved back to subversion to
>>> implement versioned documentation [1].
>>>
>>> If we do decide to move back to git for this, I would recommend using a
>>> separate git repository so it doesn’t bloat our main code repository. When
>>> generating javadoc for a new version, the svn commit to publish the site
>>> can take around 20 minutes.
>>>
>>> -Taylor
>>>
>>> > On Jul 12, 2017, at 10:33 AM, Jungtaek Lim wrote:
>>> >
>>> > Hi devs,
>>> >
>>> > I think we discussed moving the website repository from SVN to Git a
>>> > long time ago, and we were OK with that, but no action was taken.
>>> >
>>> > Now I can see that a number of projects (Spark, Kafka, Beam, maybe more)
>>> > are using a separate Git repository for their website.
>>> > Although we may still need to keep the version-specific documentation (doc
>>> > directory) in the code repository and copy the Jekyll build result to the
>>> > website repo, anyone could look at the whole website code and craft pull
>>> > requests to help us. Git would also be more convenient for us than SVN
>>> > (since we're maintaining Storm in Git).
>>> >
>>> > So I'd like to propose creating a new repository, 'storm-website' or
>>> > 'storm-site', with 'asf-site' as the default branch, and moving the SVN
>>> > contents to Git.
>>> > (Of course, we would need to ask INFRA to help get the Storm website
>>> > rendered from the new Git repo.)
>>> >
>>> > What do you think?
>>> >
>>> > Thanks,
>>> > Jungtaek Lim (HeartSaVioR)
>>>
>>>


Re: [DISCUSS] Remove CHANGELOG file

2017-07-31 Thread Jungtaek Lim
Let me also link a discussion about this from long ago:

http://search-hadoop.com/m/Storm/8gnYyUdhVp1eajp31?subj=+DISCUSSION+More+convenient+way+to+maintain+committer+contributor+list+and+changelogs


As I read that earlier discussion, Haohui and Bobby agreed not to
maintain the CHANGELOG by hand. Haohui also suggested how to generate it
automatically (whereas I would just remove it, but that's also OK).
We didn’t reach clear agreement about removing the CHANGELOG, but at least
we saw the need to automate it.


And in the current discussion, again as I read it, Roshan, Hugo, and Stig agree
to remove the CHANGELOG. I’ve been continuously arguing for removing it, so 3
PMC members and 1 contributor seem to agree on removing the CHANGELOG, and at
least 2 more PMC members agree not to maintain it manually.


I will initiate a VOTE thread if we need to. Again, release managers would
be affected by this change, so I would like to hear Taylor’s opinion before
going forward. But this is a clear pain point for mergers, so I will initiate a
VOTE thread in several days (this week at the latest) if Taylor doesn’t weigh
in or misses this discussion.


Thanks,

Jungtaek Lim (HeartSaVioR)
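
A minimal sketch of the contributor-list idea from the quoted message below
(extract the authors of git commits, then sort and de-duplicate), assuming
Python 3 and a local git clone. The function names and the revision arguments
are hypothetical and are not part of any Storm tooling:

```python
import subprocess
from typing import List


def unique_authors(log_output: str) -> List[str]:
    """Return sorted, de-duplicated names from one-name-per-line text.

    This is the equivalent of piping the names through `sort | uniq`.
    """
    names = {line.strip() for line in log_output.splitlines() if line.strip()}
    return sorted(names)


def authors_between(old_rev: str, new_rev: str, repo: str = ".") -> List[str]:
    """Collect the commit authors in old_rev..new_rev of a local clone."""
    # %aN prints the author name of each commit (honouring .mailmap if present).
    result = subprocess.run(
        ["git", "-C", repo, "log", "--format=%aN", f"{old_rev}..{new_rev}"],
        check=True,
        capture_output=True,
        text=True,
    )
    return unique_authors(result.stdout)
```

Calling `authors_between(previous_tag, release_tag)` with two real revisions
would give the per-release contributor list. As the quoted message notes,
author names in commits are not guaranteed to identify a person, so JIRA
assignees may be more accurate.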

On Fri, Jul 28, 2017 at 10:53 AM, Jungtaek Lim wrote:

> Correction: other projects -> *some* other projects, though they're
> popular projects (including competitors).
>
> On Fri, Jul 28, 2017 at 10:51 AM, Jungtaek Lim wrote:
>
>> I'm happy that there are others having the same difficulty and sharing the
>> same feeling.
>>
>> This discussion has been started several times (by me) and got
>> some +1s in each thread, but never led to actual work.
>>
>> We already use JIRA, and I subscribe to issues@ and take care of
>> issues where someone forgot to mark them resolved and/or set fix versions.
>> It may sound ideal to let reporters take care of their own issues,
>> but committers can also help with that, and in fact the merger is responsible
>> for resolving the issue, so this does not concern contributors.
>>
>> My other consideration is which option is more convenient for the release
>> manager. Taylor has been the release manager all along (thanks for the great
>> work!) and this is directly related to release announcements, so I would like
>> to hear his opinion. If it is more convenient for him, or he thinks he can
>> tolerate it, we can just go ahead.
>>
>> Please note that other projects don't use merge commits. Most of the time
>> they squash the commits in a PR into one and title the commit with the JIRA
>> issue, so the commit list itself serves as the CHANGELOG. That's another thing
>> we discussed earlier and I think we need to discuss again, but that can be
>> done in another thread.
>>
>> Regarding maintaining contributors: easy to explain. Just take a look at
>> what Spark has been doing. Some other projects follow the approach as well.
>>
>> We can run a script to extract the authors of git commits, pipe them through
>> " | sort | uniq", and be done. Pulling the assignee from each JIRA issue may be
>> more accurate, since that requires an actual account, whereas the author
>> information in a commit is not strictly required to identify the person. We
>> could apply a hybrid approach as well, but for starters just following git
>> commits looks OK to me.
>>
>> IMHO contributors don't feel especially proud merely because they're listed
>> as contributors. Looking at the contribution graph works better in this case,
>> given that it also shows commit counts and lines changed (regardless of
>> accuracy).
>> Mentioning them in the release announcement may give them more pride. It would
>> encourage contributors to contribute consistently, trying to participate and
>> be mentioned in as many releases as possible. IMO Spark built a great strategy
>> for this, and if we all think it is great, why not follow it?
>>
>> Thanks,
>> Jungtaek Lim (HeartSaVioR)
>>
>> On Fri, Jul 28, 2017 at 6:58 AM, Stig Rohde Døssing wrote:
>>
>>> We already have to keep JIRA updated, and keeping JIRA consistent is easier
>>> since there isn't one view of the resolved issues for each git branch like
>>> we have with CHANGELOG, so there's no worry about e.g. master having a
>>> different opinion on solved issues in 1.2.0 than 1.x-branch has.
>>>
>>> I think we already have the guideline that only small (e.g. typo) changes
>>> are okay without a JIRA issue. We're already encouraging one commit per
>>> issue, most of the PRs I've seen recently have been squashing before merge.
>>> Is this not your experience?
>>>
>>> I think we have the contributors/committers lists on SVN as well for
>>> generating http://storm.apache.org/contribute/People.html at
>>> https://svn.apache.org/repos/asf/storm/site/_data/. I think Jungtaek was
>>> suggesting keeping the committers list, and generating the contributors
>>> list for each release by either commit authors or JIRA assignees, but he
>>> can probably elaborate better.
>>>
>>> 2017-07-27 23:06 GMT+02:00 Hugo Da Cruz Louro :
>>>
>>> > I am +1 for discontinuing 

[GitHub] storm pull request #2247: Force to make some files in final-package use unix...

2017-07-31 Thread cluo512
GitHub user cluo512 opened a pull request:

https://github.com/apache/storm/pull/2247

Force to make some files in final-package use unix line ending

I built a distribution from source on Windows, but when I tried to
launch it on Linux, I found that the scripts had the wrong line endings.
I then found that the line-ending setting in binary.xml is not set
explicitly, so it is decided by the OS on which the Maven command is run.
I think it is better to set the line ending explicitly to unix, because we
mostly deploy Storm on Linux systems and, at the same time, the
distribution will still run well on Windows.
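
To illustrate the problem described above: a script packaged with CRLF line
endings carries a trailing carriage return on its interpreter line, which Linux
typically rejects. The PR itself changes the Maven assembly descriptor
(binary.xml), which is not shown in this mail. The following is only a sketch
of the equivalent normalization and a check for it, assuming Python 3; the
function names are hypothetical:

```python
def has_crlf(data: bytes) -> bool:
    """Report whether the content contains any Windows (CRLF) line ending."""
    return b"\r\n" in data


def to_unix_line_endings(data: bytes) -> bytes:
    """Convert every CRLF line ending to a plain LF."""
    return data.replace(b"\r\n", b"\n")
```

A check like `has_crlf` could be run over the scripts in a built distribution
to confirm that a Windows build produced LF-only files.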

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/cluo512/storm master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/storm/pull/2247.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2247


commit 63a3ed588f3f597e8c6f186305bfce4ccf7d5387
Author: cluo <051...@163.com>
Date:   2017-07-31T06:08:05Z

Force to make the final-package some files use unix line ending



