date:20170801

[GitHub] storm issue #2240: [STORM-2657] Update SECURITY.MD

2017-08-01 Thread liu-zhaokun

Github user liu-zhaokun commented on the issue:

https://github.com/apache/storm/pull/2240
  
@revans2 
Hello, I have modified this PR following your suggestion. Thanks again for 
your hard work. Please review it.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

Re: [DISCUSS] Ideas for resolving storm-drpc-server compilation issue on IDE

2017-08-01 Thread Hugo Da Cruz Louro

I have been following this discussion thread as part of the storm-core-ui 
migration. I would like to bring up a couple of points:

* The names of the packages "storm-client" and "storm-server" are a bit 
misleading to me. Isn’t what we really mean here "storm-workers" and 
"storm-daemons"? Even if not these names, we should pick names that are as 
close as possible to the “physical system”.

* storm-client-misc
   * I noticed that this module only has two classes [1]. They are currently 
used in the module storm-starter and nowhere else. If that is the case, we 
should just put the classes in the module storm-starter. The concern is that 
some users may be using them in their deployments. Do you know of any users 
using these classes? Perhaps we could poll us...@storm.apache.org and find out.

   * The -misc extension is also very confusing to me. My first thought was that 
it was some sort of library dependency placeholder, or something like that. If 
at all possible, my suggestion would be for us to eliminate this module 
altogether.

   * Since Storm 2.0 is a major release, if we find out that not many users 
(maybe none) are using the classes [1] we could probably just put the classes 
HttpForwardingMetricsConsumer, HttpForwardingMetricsServer in storm-starter. As 
for the concern of breaking backwards compatibility, we could document a 
workaround using storm-starter.

Thanks,
Hugo

[1] - HttpForwardingMetricsConsumer, HttpForwardingMetricsServer

On Jul 31, 2017, at 6:51 AM, Bobby Evans wrote:

Those look reasonable to me.

- Bobby

On Monday, July 31, 2017, 2:22:47 AM CDT, Jungtaek Lim wrote:

I agree that we should keep the set of shaded and relocated artifacts as small
as possible, but since we have shaded almost everything so far (meaning that
dropping relocation will affect user experience), we may need to find an
exhaustive set of troublesome artifacts and relocate at least those. (Maybe
the union of everyone's lists?)

Guava, HttpClient, and Netty (maybe no need to shade Netty for now if we don't
plan to upgrade to 4.x, since the package name differs) are on my list.

Would it be better to start a poll or discussion in a separate thread?

- Jungtaek Lim (HeartSaVioR)

On Thu, Jul 20, 2017 at 2:27 AM, Bobby Evans wrote:

I am fine with a separate project for relocated dependencies (or even just
separate packages, you do a maven install of them and not include them in
the IDE at all).  Shading still has some drawbacks, but I think in a few
cases it makes sense.  I would prefer it if we picked a very small number
of dependencies that cause people issues and just shade those.  Guava is
the big one that I worry about. Netty is a possibility and I think asm
would be another, but it is a transitive dependency so it would require us
to have our own version of kryo exposing the kryo API but pulling in a shaded
asm.
The servlet-api concerns me, but it looks like it is tied to the
IHttpCredentialsPlugin which should move to the server package anyways.

The rest I am not concerned about: they are either things that are exposed to
end users, or are for test and not actually shipped.
$ mvn dependency:tree...
[INFO] --- maven-dependency-plugin:2.8:tree (default-cli) @ storm-client ---
[INFO] org.apache.storm:storm-client:jar:2.0.0-SNAPSHOT
[INFO] +- uk.org.lidalia:sysout-over-slf4j:jar:1.0.2:compile
[INFO] +- org.slf4j:slf4j-api:jar:1.7.21:compile
[INFO] +- org.apache.logging.log4j:log4j-api:jar:2.8.2:compile
[INFO] +- org.apache.logging.log4j:log4j-core:jar:2.8.2:compile
[INFO] +- org.apache.logging.log4j:log4j-slf4j-impl:jar:2.8.2:compile
[INFO] +- org.slf4j:log4j-over-slf4j:jar:1.6.6:compile
[INFO] +- com.google.guava:guava:jar:16.0.1:compile
[INFO] +- org.apache.thrift:libthrift:jar:0.9.3:compile
[INFO] |  \- org.apache.httpcomponents:httpcore:jar:4.4.1:compile
[INFO] +- commons-io:commons-io:jar:2.5:compile
[INFO] +- commons-lang:commons-lang:jar:2.5:compile
[INFO] +- commons-collections:commons-collections:jar:3.2.2:compile
[INFO] +- com.lmax:disruptor:jar:3.3.2:compile
[INFO] +- com.googlecode.json-simple:json-simple:jar:1.1:compile
[INFO] +- org.yaml:snakeyaml:jar:1.11:compile
[INFO] +- io.netty:netty:jar:3.9.0.Final:compile
[INFO] +- com.esotericsoftware:kryo:jar:3.0.3:compile
[INFO] |  +- com.esotericsoftware:reflectasm:jar:1.10.1:compile
[INFO] |  |  \- org.ow2.asm:asm:jar:5.0.3:compile
[INFO] |  +- com.esotericsoftware:minlog:jar:1.3.0:compile
[INFO] |  \- org.objenesis:objenesis:jar:2.1:compile
[INFO] +- org.apache.zookeeper:zookeeper:jar:3.4.6:compile
[INFO] |  \- jline:jline:jar:0.9.94:compile
[INFO] +- org.apache.curator:curator-framework:jar:2.12.0:compile
[INFO] +- org.jgrapht:jgrapht-core:jar:0.9.0:compile
[INFO] +- javax.servlet:servlet-api:jar:2.5:compile
[INFO] +- org.apache.httpcomponents:httpclient:jar:4.3.3:compile
[INFO] |  +-
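
A tree like the one above can be scanned mechanically for the artifacts the thread names as shading candidates (Guava, Netty, asm, HttpClient). The following is a minimal sketch only, assuming the usual `mvn dependency:tree` text layout; the names `SHADE_CANDIDATES`, `parse_tree` and `shade_candidates` are hypothetical and not part of Storm or Maven, and `SAMPLE` is a shortened copy of the output quoted above.

```python
import re

# Artifacts mentioned in the thread as possible shading candidates.
SHADE_CANDIDATES = {
    "com.google.guava:guava",
    "io.netty:netty",
    "org.ow2.asm:asm",
    "org.apache.httpcomponents:httpclient",
}

# Matches lines such as "[INFO] |  |  \- org.ow2.asm:asm:jar:5.0.3:compile".
# Each nesting level is three characters wide: "|  " or three spaces.
DEP_LINE = re.compile(
    r"^\[INFO\] (?P<indent>(?:[| ]  )*)[+\\]- "
    r"(?P<group>[^:\s]+):(?P<artifact>[^:\s]+):(?P<type>[^:\s]+):"
    r"(?P<version>[^:\s]+):(?P<scope>[^:\s]+)$"
)


def parse_tree(text):
    """Return one dict per dependency line; other lines are skipped."""
    deps = []
    for line in text.splitlines():
        match = DEP_LINE.match(line.rstrip())
        if not match:
            continue  # plugin header, project root, or a truncated line
        deps.append({
            "coord": f"{match['group']}:{match['artifact']}",
            "version": match["version"],
            "depth": len(match["indent"]) // 3 + 1,  # 1 = direct dependency
        })
    return deps


def shade_candidates(deps):
    """Keep only the dependencies listed in SHADE_CANDIDATES."""
    return [dep for dep in deps if dep["coord"] in SHADE_CANDIDATES]


SAMPLE = r"""[INFO] org.apache.storm:storm-client:jar:2.0.0-SNAPSHOT
[INFO] +- com.google.guava:guava:jar:16.0.1:compile
[INFO] +- io.netty:netty:jar:3.9.0.Final:compile
[INFO] +- com.esotericsoftware:kryo:jar:3.0.3:compile
[INFO] |  +- com.esotericsoftware:reflectasm:jar:1.10.1:compile
[INFO] |  |  \- org.ow2.asm:asm:jar:5.0.3:compile
[INFO] +- org.apache.httpcomponents:httpclient:jar:4.3.3:compile
[INFO] |  +-"""

if __name__ == "__main__":
    for dep in shade_candidates(parse_tree(SAMPLE)):
        kind = "direct" if dep["depth"] == 1 else "transitive"
        print(f'{dep["coord"]} {dep["version"]} ({kind})')
```

The depth field distinguishes direct dependencies from transitive ones such as asm, which is the case Bobby singles out as needing extra work.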

Re: [DISCUSS] Remove CHANGELOG file

2017-08-01 Thread Jungtaek Lim

FYI: I just created a PR against the master branch:
https://github.com/apache/storm/pull/2253
I'll apply this to the other version line branches as well, so please take a
look at it and comment. I'll apply it tomorrow if there are no
outstanding comments.

Re: [Discussion]: Storm Improvemement Proposal (SIP) to discuss changes

2017-08-01 Thread Jungtaek Lim

Still +1 to introducing this only for non-connectors. We might also want to
skip it for modules that are neither connectors nor storm-core
(storm-client/storm-server), like Flux, SQL, and storm-webapp, though there
is still a small chance they would need it.

Although I voted +1, I still worry about the effort of reviewing SIPs:
this will only work if we (on the dev@ list) are open to participating in and
reviewing SIPs within the desired duration. There are two sides to the coin: it
might lead to a more active community, but it will just fail if the community
is not active enough. Each SIP discussion can easily go stale if we don't care
much about it, and if we also want to introduce a vote for each SIP, it will go
stale even more easily.

So we should all be willing to go along with this. I'm OK with taking on
additional load, but would like to hear others' opinions as well.

- Jungtaek Lim (HeartSaVioR)

On Wed, Aug 2, 2017 at 2:27 AM, Harsha wrote:

> Trying to bring attention to this again.
> We currently have a few big feature PRs going on and there is considerable
> discussion about the design, implementation, etc. My intention in
> starting SIPs is to capture these details before someone goes and writes up a
> PR. Otherwise everyone has to read through the design, sometimes those
> docs are not clear, and we end up having long discussions on the PRs, which
> should mainly be about the code review itself.
> We should at least start making this process mandatory for any new big
> feature, especially in storm-core. I am less concerned about
> connectors and other parts, which should have the path of least resistance
> and are usually easy to review.
> If the devs write down their thoughts and design, go through discussion,
> and get everyone on the same page, then when the PR shows up it will be less
> surprising and everyone involved will know how the PR/code is supposed to work.
>
> -Harsha
>
> On Fri, Jun 9, 2017, at 09:16 PM, Harsha wrote:
> > Arun,
> > For big features we did follow design doc/review. Making it
> > formal makes everyone follow a process.
> > Again, this process is not for bug fixes; as we stated, it's about New
> > Features/Config Changes/Public interface changes. I don't think it adds
> > any extra effort for anyone who is writing a detailed JIRA, but making
> > it formal makes everyone add these details in a central process. Not
> > everyone will look at the mailing list, but it's easier to follow a wiki page.
> > We should at least give it a try before we vote it out.
> >
> > Roshan,
> > Adding a connector should require a SIP as well, and changing any
> > public interfaces should be a KIP. The intention here is that we have a
> > central place where everyone can follow in detail what
> > public interface/new feature changes went in. We've changed
> > KafkaSpout quite a bit and there is a current discussion that's
> > going to change it; having this documented in a central place
> > will make it easy to follow and to record in the release notes
> > as well.
> >
> > Taylor,
> > We can't call it too tedious a process without even giving it a
> > try. This has been followed with great success at Kafka, and
> > Flink has started the process as well:
> > https://cwiki.apache.org/confluence/display/FLINK/Flink+Improvement+Proposals
> > If it actually proves to be more of a hindrance than a help to the
> > community, we can move away from it.
> >
> > " Kafka has somewhat of a reputation for setting potentially too high a
> > bar. I'd rather not see that happen with this community."
> > Sure. But it also depends on the community. Just because some community
> > enforces too high a bar doesn't mean we are trying to do that via this
> > process. Again, if we ever veer too far in the wrong direction, we always
> > have the option to bring it up and improve or remove this process.
> >
> > We should also, as a community, strive for better quality, and I am
> > hoping this will give us a chance not only to let users know what
> > changes are coming in but also to give the dev list a chance to join
> > the discussion.
> >
> > -Harsha
> >
> > On Jun 9, 2017, 7:18 PM -0700, Arun Iyer , wrote:
> > I am for documenting and upfront design reviews, but maybe we should
> > keep it less formal and make it part of the JIRA to start with.
> >
> > Do we have any upcoming features for which we would like to see a
> > proposal? Maybe start with a couple of proposals
> > and see how it works out before making it formal.
> >
> >
> > Thanks,
> > Arun
> >
> >
> >
> > 6/9/17, 6:49 PM, "P. Taylor Goetz"  wrote:
> >
> > -0
> >
> > The KIP process feels kind of heavy. I'd rather start with a lighter
> > effort like improving JIRA submissions and pull requests (some pull
> > requests/JIRAs, even from committers and PMC members, are woefully
> > inadequate in terms of detail), and see how that works out.
> >
> > I share Bobby's concern that doing so might raise the bar for
> >

Re: [Propose] move website repository from svn to git

2017-08-01 Thread Jungtaek Lim

FYI: I just took a first step on this, but I'm blocked at creating the git
repository on reporeq.apache.org.

Just filed https://issues.apache.org/jira/browse/INFRA-14765. In that issue
I also asked how to serve the website from a repository other than the main
project repository.

On Mon, Jul 31, 2017 at 10:56 PM, Bobby Evans wrote:

> +1
> I am fine with moving to git, but I would like it to be a different repo.
> Our current repo is at least 160MB already (which is a lot to download)
> but nothing compared to the web site that has lots and lots of things
> checked in (I estimate it at about 1.5GB on an older version I have locally)
>
>
> - Bobby
>
>
> On Monday, July 31, 2017, 1:58:03 AM CDT, Xin Wang wrote:
>
> +1 for moving to git.  - Xin
>
>
>
> 2017-07-31 14:54 GMT+08:00 Jungtaek Lim :
>
> > Bump. I think this is worth addressing soon, since some contributors
> > occasionally submit patches for the documentation.
> > Personally, I no longer find SVN convenient to use. If we all feel the
> > same, let's change.
> >
> > -Jungtaek Lim (HeartSaVioR)
> >
> > On Thu, Jul 13, 2017 at 9:16 AM, Jungtaek Lim wrote:
> >
> > > Maybe we could try out Gitbox, though every committer would have to add
> > > their GitHub account to the 'apache' group and enable 2FA.
> > >
> > > On Thu, Jul 13, 2017 at 8:38 AM, Jungtaek Lim wrote:
> > >
> > >> Did we render the webpage from the asf-site branch? I didn't realize that.
> > >>
> > >> Yes, I meant a separate git repository, like 'storm-site'. I'm happy I'm
> > >> not the only one who finds the SVN repo inconvenient.
> > >> Would it be better to initiate a VOTE for this?
> > >>
> > >> Thanks,
> > >> Jungtaek Lim (HeartSaVioR)
> > >>
> > >> On Thu, Jul 13, 2017 at 4:30 AM, P. Taylor Goetz wrote:
> > >>
> > >>> We were using git before, then a year ago moved back to subversion to
> > >>> implement versioned documentation [1].
> > >>>
> > >>> If we do decide to move back to git for this, I would recommend using a
> > >>> separate git repository so it doesn’t bloat our main code repository.
> > >>> When generating javadoc for a new version, the svn commit to publish the
> > >>> site can take around 20 minutes.
> > >>>
> > >>> -Taylor
> > >>>
> > >>> > On Jul 12, 2017, at 10:33 AM, Jungtaek Lim wrote:
> > >>> >
> > >>> > Hi devs,
> > >>> >
> > >>> > I think we discussed moving the website repository from SVN to Git a
> > >>> > long time ago, and we were OK with that, but no action was taken.
> > >>> >
> > >>> > Now I can see a number of projects (Spark, Kafka, Beam, maybe more) are
> > >>> > using a separate Git repository for their website.
> > >>> > Although we may still need to keep version-specific documents (the doc
> > >>> > directory) in the code repository and copy the Jekyll build result to
> > >>> > the website repo, anyone can look at the whole website code and craft
> > >>> > pull requests to help us. Git would be more convenient for us than SVN
> > >>> > (since we're maintaining Storm in Git).
> > >>> >
> > >>> > So I'd like to propose having a new repository 'storm-website' or
> > >>> > 'storm-site' with 'asf-site' as the default branch, and moving the SVN
> > >>> > contents to Git.
> > >>> > (Sure, we need to ask INFRA to help get the Storm website rendered
> > >>> > from the new Git repo.)
> > >>> >
> > >>> > What do you think?
> > >>> >
> > >>> > Thanks,
> > >>> > Jungtaek Lim (HeartSaVioR)
> > >>>
> > >>>
> >
>
>
>
> --
> Thanks,
> Xin

[GitHub] storm pull request #2253: Remove CHANGELOG.md and update DEVELOPER.md to ref...

2017-08-01 Thread HeartSaVioR

Github user HeartSaVioR commented on a diff in the pull request:

https://github.com/apache/storm/pull/2253#discussion_r130748562
  
--- Diff: DEVELOPER.md ---
@@ -214,14 +214,9 @@ To pull in a merge request you should generally follow 
the command line instruct
 $ git pull  
 You can use `./dev-tools/storm-merge.py ` to produce the 
above command most of the time.
 
-4.  Assuming that the pull request merges without any conflicts:
-Update the top-level `CHANGELOG.md`, and add in the JIRA ticket number 
(example: `STORM-1234`) and ticket
-description to the change log.  Make sure that you place the JIRA 
ticket number in the commit comments where
-applicable.
+4. Run any sanity tests that you think are needed.
--- End diff --

The commit message mentioned there refers to the commit that updates the 
CHANGELOG. If we want to encourage contributors to place the ticket number in 
their commits, that belongs in the `Create a pull request` section. If you mean 
placing the ticket number in the merge commit, we can describe that in step 3.



[GitHub] storm pull request #2253: Remove CHANGELOG.md and update DEVELOPER.md to ref...

2017-08-01 Thread srdo

Github user srdo commented on a diff in the pull request:

https://github.com/apache/storm/pull/2253#discussion_r130747633
  
--- Diff: DEVELOPER.md ---
@@ -214,14 +214,9 @@ To pull in a merge request you should generally follow 
the command line instruct
 $ git pull  
 You can use `./dev-tools/storm-merge.py ` to produce the 
above command most of the time.
 
-4.  Assuming that the pull request merges without any conflicts:
-Update the top-level `CHANGELOG.md`, and add in the JIRA ticket number 
(example: `STORM-1234`) and ticket
-description to the change log.  Make sure that you place the JIRA 
ticket number in the commit comments where
-applicable.
+4. Run any sanity tests that you think are needed.
--- End diff --

Nit: I think it still makes sense to encourage placing the ticket number in 
the commit message
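
Whether a commit message carries a ticket number can be checked mechanically, which is also what the merge-script discussion in the CHANGELOG thread relies on. The following is a minimal sketch, assuming ticket numbers follow the `STORM-1234` form shown in DEVELOPER.md; the helper names `ticket_ids` and `has_ticket` are hypothetical and do not exist in `dev-tools`.

```python
import re

# JIRA ticket numbers in the form used by the project, e.g. STORM-1234.
TICKET = re.compile(r"\bSTORM-\d+\b")


def ticket_ids(commit_message):
    """Return the distinct ticket numbers in a message, in order of appearance."""
    seen = []
    for ticket in TICKET.findall(commit_message):
        if ticket not in seen:
            seen.append(ticket)
    return seen


def has_ticket(commit_message):
    """True if the message mentions at least one STORM ticket."""
    return bool(ticket_ids(commit_message))


if __name__ == "__main__":
    for message in (
        "STORM-2665: Adapt Kafka's release note generation script",
        "Remove CHANGELOG.md and update DEVELOPER.md to reflect the change",
    ):
        print(has_ticket(message), ticket_ids(message))
```

A check like this could run on either the contributor's commits or the merge commit title; which of the two the project wants to enforce is the open question in the comments above.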



[GitHub] storm pull request #2253: Remove CHANGELOG.md and update DEVELOPER.md to ref...

2017-08-01 Thread HeartSaVioR

GitHub user HeartSaVioR opened a pull request:

https://github.com/apache/storm/pull/2253

Remove CHANGELOG.md and update DEVELOPER.md to reflect the change

This is for the master branch, and I'll apply the same change to all other 
version lines without additional pull requests: 1.x-branch, 1.1.x-branch, 
1.0.x-branch, 0.10.x-branch, 0.9.x-branch.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/HeartSaVioR/storm remove-changelog-master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/storm/pull/2253.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2253


commit 59f0f1872c691446c6474ea92105880796b3d4e2
Author: Jungtaek Lim 
Date:   2017-08-01T22:09:46Z

Remove CHANGELOG.md and update DEVELOPER.md to reflect the change





Re: [DISCUSS] Remove CHANGELOG file

2017-08-01 Thread Jungtaek Lim

Forgot to say, +1 on Stig's explanation. I don't see a critical issue with
where the release notes (previously CHANGELOG) file is located.
After a release, the release notes on the website will also contain a static
representation (string content) of the CHANGELOG, so we will provide the
CHANGELOG in at least two places.

I'll soon open several pull requests removing CHANGELOG.md from all active
version lines. I feel we all agree to do it, so the pull requests are just to
leave a history. I'll also modify DEVELOPER.md in those pull requests to remove
step 4 of "Merge a pull request or patch".

On Tue, Aug 1, 2017 at 7:02 AM, Jungtaek Lim wrote:

> I'm seeing several voices worrying about JIRA updates.
>
> I think the main reason updates get missed is that we're doing them manually.
> If you remember the PR about adopting the Kafka merge script, it also updates
> the corresponding JIRA issue at the end of the merge. If you're not aware of
> it, please refer to https://github.com/apache/storm/pull/1468 for a long
> explanation and discussion.
>
> Our main concern about adopting the Spark/Kafka merge script was that it
> squashes commits (and maybe also creates no merge commit). I personally still
> see a huge benefit in that (the commit list itself becomes the CHANGELOG), and
> I see some others are fans of squashed commits, but we can still modify the
> script to do a merge commit like what we're doing now. That's something we can
> discuss and decide, not a blocker for the merge script, I think.
>
> Let's suppose we get rid of the commit for updating CHANGELOG, and we still
> rely on merge commits. Could we determine which JIRA issue is addressed only
> from the merge commit's title? Yes or no, depending on how the contributor
> names their branch, and we can't force that (and forcing even their branch
> name is going to be really annoying). So the title of the merge commit should
> conform to a formal format (say, contain the JIRA title or so), or we should
> just leave a squashed commit. Refining the title of the merge commit manually
> is going to be another pain for the merger, so it should be automated as well.
>
> tl;dr: This is the time to reconsider the merge script, maybe modifying the
> Spark/Kafka merge script to fit the Storm project. This helps with squashing
> commits (only if we decide to go that way), or with setting an informative
> title on the merge commit if we stay with merge commits. This also helps with
> resolving the corresponding JIRA issue.
>
> Thanks,
> Jungtaek Lim (HeartSaVioR)
>
> On Tue, Aug 1, 2017 at 5:48 AM, Stig Rohde Døssing wrote:
>
>> Would it fit alongside the other release artifacts in e.g.
>> https://dist.apache.org/repos/dist/dev/storm/apache-storm-1.1.1-rc2/?
>> That seems to be what Kafka is doing as well:
>> https://dist.apache.org/repos/dist/release/kafka/0.11.0.0/.
>>
>> If we could put the change log up along the other artifacts, we could
>> probably get away with not having it included in the src/bin distributions,
>> because people could get the change log from the same mirrors they got the
>> distributions from.
>>
>> 2017-07-31 22:37 GMT+02:00 P. Taylor Goetz :
>>
>> > A couple thoughts/questions regarding the mechanics of publishing the
>> > resulting HTML file.
>> >
>> > When voting on release candidates, in the past we point to the CHANGELOG
>> > file in git. What would we do in this case?
>> >
>> > My assumption is the release manager would generate the file and post it
>> > to their account on people.apache.org. After a successful vote, the
>> > change log would be published to the storm.a.o website, presumably in a
>> > /changelogs/${version}.html file.
>> >
>> > One could argue we could simply link to a JIRA filter for that release,
>> > but I don’t like the idea of linking to something inherently mutable as a
>> > release artifact.
>> >
>> > Would we include the file in the source and/or binary distributions? If
>> > so, where, and what would be the process?
>> >
>> > I’m interested in hearing others’ thoughts.
>> >
>> > -Taylor
>> >
>> >
>> > > On Jul 31, 2017, at 3:50 PM, Stig Rohde Døssing <stigdoess...@gmail.com> wrote:
>> > >
>> > > Opened JIRA here https://issues.apache.org/jira/browse/STORM-2665 and
>> > > took a look at adjusting Kafka's script here
>> > > https://github.com/apache/storm/pull/2250
>> > >
>> > > 2017-07-31 21:02 GMT+02:00 Bobby Evans :
>> > >
>> > >> So it looks like we all agree, now we just need someone to file a JIRA
>> > >> and a corresponding pull request.  The kafka script looks like a good
>> > >> place to start, but we can iterate on it in the pull request to try and
>> > >> address Taylor's concern about JIRA not being up to date.  I would love
>> > >> to do it, but I am really overloaded right now so if someone else wants
>> > >> to take lead on it that would be great.
>> > >>
>> > >>
>> > >> - Bobby
>> > >>
>> > >>
>> > >> On Monday, July 31, 2017, 1:45:14 PM CDT, P. Taylor Goetz <ptgo...@gmail.com> wrote:
>> > >>
>> > >> I’m all for getting rid of the current process

Re: Considerably slow building website

2017-08-01 Thread Jungtaek Lim

I can't think of a better way than separating common docs from
release-specific docs.

This approach lets us get rid of all release-specific docs (source md
files) from the storm site repo; we would only need to build them from the
storm repo and copy them to the publish directory of the storm site repo. As a
side effect, release-specific docs will no longer expose links for moving to
other releases.

If we are OK with this approach, dropping the 'First Look' section and
'Documentation' in the footer would be necessary, which means we would no
longer rely on a 'current' release.
The 'Documentation' section clearly corresponds to a specific release, so it is
fine to drop. Docs in 'First Look' barely change, so maybe we can move them
into the common docs, but even if we want to maintain them per release, we can
just move 'First Look' to the top of the 'Documentation' page of each release.

We need another header and footer for release-specific docs so that
they don't rely on other releases. We may also want to confine doc links to
the release-specific docs, but I'm not 100% sure this is what most of us want.
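
The timings quoted further down in this thread (1347.585 secs for the full build, 2.38 secs without the 'releases' directories, 45 secs with only '2.0.0-SNAPSHOT' included) give a rough sense of how much separating release-specific docs could save. The following is a back-of-the-envelope sketch only: the function names are hypothetical, the thread itself notes the build time is not stable, and treating every release as costing the same as 2.0.0-SNAPSHOT is an assumption the thread does not establish.

```python
# Timings reported in the thread for 'jekyll build -d publish/ --profile'.
FULL_BUILD_SECS = 1347.585   # 1. site as it is
NO_RELEASES_SECS = 2.38      # 2. excluding the 'releases' directories
ONE_RELEASE_SECS = 45.0      # 3. only '2.0.0-SNAPSHOT' included


def releases_share(full, base):
    """Fraction of the full build time spent on the 'releases' directories."""
    return (full - base) / full


def per_release_cost(one_release, base):
    """Extra seconds added by building a single release's docs."""
    return one_release - base


def estimated_build(base, per_release, n_releases):
    """Estimated build time if n_releases each cost per_release seconds (assumption)."""
    return base + per_release * n_releases


if __name__ == "__main__":
    share = releases_share(FULL_BUILD_SECS, NO_RELEASES_SECS)
    cost = per_release_cost(ONE_RELEASE_SECS, NO_RELEASES_SECS)
    print(f"share of build time in 'releases': {share:.1%}")
    print(f"cost of one release: {cost:.2f} secs")
    # Five releases is the count proposed in the quoted message below.
    print(f"estimate for 5 releases: {estimated_build(NO_RELEASES_SECS, cost, 5):.2f} secs")
```

Under these assumptions almost all of the build time goes to the per-release docs, which is consistent with the case made above for building them outside the site repo.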

On Wed, Aug 2, 2017 at 3:29 AM, P. Taylor Goetz wrote:

> Thanks for the heads up. If I screw something up we can always regenerate
> from the release tag.
>
> -Taylor
>
> > On Aug 1, 2017, at 2:27 PM, Bobby Evans wrote:
> >
> > Be careful when removing the javadocs.  There are links to the javadocs
> > from within the docs themselves.
> >
> >
> > - Bobby
> >
> >
> > On Tuesday, August 1, 2017, 12:57:56 PM CDT, P. Taylor Goetz <ptgo...@gmail.com> wrote:
> >
> > I cleaned up the download page to remove some of the older releases and
> > added a link to archive.a.o for older releases. I will also clean up dist
> > as requested by infra.
> >
> > While I’m at it, I’ll clean up the javadoc so we only include javadoc
> > for releases on the download page.
> >
> > That should help a little bit, but I agree that the publishing process
> > is painful and would welcome any improvements.
> >
> > One option (I haven’t tested yet) might be to simply move the javadoc to
> > the “publish” directory so it doesn’t get regenerated every time the site
> > gets published. That would mean the javadoc links won’t work when running
> > Jekyll locally, but I think it’s a fair trade off.
> >
> > -Taylor
> >
> >> On Aug 1, 2017, at 9:39 AM, Bobby Evans wrote:
> >>
> >> Rebuilding everything each time is sadly necessary as currently the
> >> header/footer for all of the content is inline in each page.  So if we add
> >> a new release every page changes.  To fix this we would have to change the
> >> header to dynamically include the HTML from another file that gets updated
> >> on its own.
> >> We might also want to think about rearranging things a bit, and reduce
> >> the number of releases that we have on the site.  Do we really need both
> >> 0.9.6 and 0.9.7, or 0.10.0 through 0.10.2?  Maybe there is a way to archive
> >> some of these so they are a part of the final site, but are not generated
> >> each time? (probably would need the header change at a minimum to work)
> >>
> >>
> >> - Bobby
> >>
> >>
> >> On Tuesday, August 1, 2017, 6:01:03 AM CDT, Jungtaek Lim <
> kabh...@gmail.com> wrote:
> >>
> >> I found I forgot to build website with "-d publish/" parameter. Now it
> >> reduced to 1347.585 secs but that is still way too long
> >>
> >> I've done some tests on building website ('jekyll build -d publish/
> >> --profile'):
> >>
> >> 1. as it is : 1347.585 secs
> >> 2. excluding 'releases' directories : 2.38 secs
> >> 3. excluding 'releases' directories, and including '2.0.0-SNAPSHOT'
> >> directory of releases : 45 secs
> >>
> >> The build time is not stable but you can see how much the difference
> is. If
> >> we can separate building doc for each release, that should be best and
> it
> >> should reduce the build time greatly.
> >>
> >> If we can't separate building doc, we may want to take alternative
> >> approach: reducing maintaining releases. You can imagine that if we keep
> >> adding docs for new releases in website repo it should increase overall
> >> build time. I guess we may be better to provide only the last version of
> >> version lines: 0.9.7, 0.10.2, 1.0.4, 1.1.0 (will be 1.1.1 soon),
> >> 2.0.0-SNAPSHOT, total 5 releases. If we respect semantic versioning,
> >> major changes shouldn't be introduced in bug-fix releases so don't need
> >> to maintain docs separately.
> >>
> >> I would like to gather opinions around this along with moving website to
> >> git. Looking forward to hear others opinions.
> >>
> >> Thanks,
> >> Jungtaek Lim (HeartSaVioR)
> >>
> >> On Tue, Aug 1, 2017 at 7:44 AM, Jungtaek Lim wrote:
> >>
> >>> Also found that we don't expose 1.0.4 in documentation dropdown and
> >>> 1.0.4 directory is not created in 'publish/releases' directory. Maybe
> >>> also missed that.
> >>>
> >>> On Tue, Aug 1, 2017 at 7:36 AM, Jungtaek Lim wrote:
> >>>
>  Hi devs,
> 
>  I'm trying

add new tasks to already assigned slots after scheduling for the first time

2017-08-01 Thread AMir Firouzi

I need some migration features for my scheduler in Storm. I schedule the
tasks the first time based on some logic, and after a while I need to
migrate some tasks to other workers. But when I assign a task to an
already used slot, Storm (Nimbus) complains that it cannot use an already
used slot. Is it impossible to do so? Otherwise I have to use other slots
even though some used slots still have the capacity to hold more tasks,
and packing related tasks into one slot reduces inter-worker traffic.

thanks

[GitHub] storm issue #2250: WIP: STORM-2665: Adapt Kafka's release note generation sc...

2017-08-01 Thread HeartSaVioR

Github user HeartSaVioR commented on the issue:

https://github.com/apache/storm/pull/2250
  
+1 for the change. It would also be better to rename the existing Python 
package directories and include this in the dev-tools directory anyway.



Re: Writing orc files with storm via java API

2017-08-01 Thread Kristopher Kane

For ORC specifically, I would ONLY create an ORC HDFS file based on a Tuple
batch and create/flush/close off the ORC file in one go. Adjust batch sizes
and message timeout to what makes sense for your case. Yes, you will
likely have many small files in HDFS, but since this is ORC, the assumption
is you will be leveraging them via Hive. If that is the case, you can use
Hive to concat the ORC files at a partition level.

Other containerized formats need the same care but will need their own post
processing of small files.

Avro is a container format which doesn't have a footer and is thus ideal for
schema + record acknowledgement processing.

Kris

On Mon, Jul 31, 2017 at 10:40 AM, Bobby Evans 
wrote:

> It should be possible to make this work, but it is not going to be
> simple.  The real issue is the format of the orc file.  It is not one
> record at a time, like CSV or other supported formats are.  Sadly this is
> currently an assumption with the AbstractHdfsBolt.
> https://github.com/apache/storm/blob/master/external/
> storm-hdfs/src/main/java/org/apache/storm/hdfs/bolt/format/
> RecordFormat.java
> So to support it we would need to make some modifications, not impossible,
> just not a drop in replacement.  If this is something you want to tackle
> and contribute back I think we would all love it.  You might also run into
> some issues with metadata for the format being written at the end of the
> file.
> https://cwiki.apache.org/confluence/display/Hive/LanguageManual+ORC
> I am not totally sure how easy it is to recover an ORC file if that footer
> is missing because a worker crashed.  You might end up with data loss in
> some cases if you are not extremely careful.  You might also need to modify
> the ORC APIs themselves to be able to support storing/recovering the
> metadata in an external location for recovery to truly fix it, and then
> store them in ZK on a flush until the file is rotated.
>
> The Trident HDFState
> https://github.com/apache/storm/blob/master/external/
> storm-hdfs/src/main/java/org/apache/storm/hdfs/trident/HdfsState.java
> might be a more appropriate place to start, as the updated state is
> written out in micro batches, but you still have to deal with the footer
> issues, as trident really cares about exactly once processing.
>
> So overall it is not a simple problem, and relying on an external server
> like hive would make it a lot simpler.
>
>
> - Bobby
>
>
> On Tuesday, July 25, 2017, 8:38:42 AM CDT, Igor Kuzmenko <
> f1she...@gmail.com> wrote:
>
> Is there any implementation of storm bolt which can write files to HDFS in
> ORC format, without using Hive Streaming API?
> I've found java API for writing ORC files 
> and I'm guessing is there any existing Hive bolts that uses it or any plans
> to create such?
>
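
The batch-per-file approach Kris describes at the top of this message can be sketched roughly as below. This is only an illustration under loud assumptions: plain text files stand in for ORC files, a local directory stands in for HDFS, and "acking" is modeled as appending tuple ids to a list. The class name BatchFileWriter and its methods are hypothetical and are not part of storm-hdfs or the ORC API.

```python
import os


class BatchFileWriter:
    """Hypothetical helper: one output file per tuple batch, ack after close."""

    def __init__(self, out_dir, batch_size):
        self.out_dir = out_dir
        self.batch_size = batch_size
        self.pending = []        # (tuple_id, record) pairs not yet written
        self.acked = []          # tuple ids acked after their file was closed
        self.files_written = 0

    def add(self, tuple_id, record):
        # Buffer the tuple; write a whole file once the batch is full.
        self.pending.append((tuple_id, record))
        if len(self.pending) >= self.batch_size:
            self.flush()

    def flush(self):
        # Create, write and close the file in one go, then ack the batch.
        if not self.pending:
            return None
        name = "batch-%05d.txt" % self.files_written
        path = os.path.join(self.out_dir, name)
        with open(path, "w") as f:
            for _, record in self.pending:
                f.write(record + "\n")
        # The file is closed here, so a crash can no longer leave it
        # without its trailing metadata; only now are the tuples acked.
        self.acked.extend(tuple_id for tuple_id, _ in self.pending)
        self.pending = []
        self.files_written += 1
        return path
```

A timer-driven flush() call (for example on a tick tuple) would be needed in practice so that a partially filled batch does not wait longer than the message timeout.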

Re: possible to have supervisors without _eventlogger and _acker tasks

2017-08-01 Thread AMir Firouzi

Thanks Bobby for your instant & informative reply.
I actually respect these rules. I schedule all of these loggers and ackers,
but right now my scheduler puts all the system tasks (logger and acker
tasks) into one worker on one machine, and I'm not getting the best
performance! I think it's because all of the other tasks have to transfer
data to these tasks on another machine, and network latency slows Storm
down. So I'm wondering: if I put some of these system tasks near other
(bolt/spout) tasks, would it affect the performance?
Thanks again for your answer.

On Tue, Aug 1, 2017 at 6:20 PM Bobby Evans 
wrote:

> By default there are no `_eventlogger` tasks.  To have this feature
> enabled you need to turn it on by setting topology.eventlogger.executors to
> a positive number.  Ackers are on by default, but can be disabled by
> setting the number of topology.acker.executors to 0.  You should respect
> these when scheduling a topology because if they are supposed to be there
> and they are not scheduled messages will be sent to them, but they will be
> lost.  In the case of acking all of the tuples will time out.  In the case
> of the event logger the UI will show it working, but nothing will ever come
> out.
> Now that is on a per topology basis, not on a per worker basis.  These
> bolts are like any other bolt.  They can be in any worker your scheduler
> wants to put them in.  When inserting an acker bolt it is using a keyed
> grouping connected to just about everything in your topology, so where you
> place it is not that critical as it is going to be talking to everything.
> The event logger bolts are similar, but using a fields grouping based off
> of component id.
>
> https://github.com/apache/storm/blob/4c8a986f519cdf3e63bed47e9c4f723e4867267a/storm-client/src/jvm/org/apache/storm/daemon/StormCommon.java#L346-L357
> You could try to be smart to try and collocate the component with the
> logger for it, but honestly this feature slows your topology down so much
> already it is probably not worth trying to optimize it as it really will
> only be used when you need to do some serious debugging.
>
>
> - Bobby
>
>
> On Tuesday, August 1, 2017, 4:44:55 AM CDT, AMir Firouzi <
> firouz...@gmail.com> wrote:
>
> hi guys
> i'm working on my own scheduler for storm. i wonder what happens if i
> create a worker process and put some tasks in it(bolt/spout tasks) but no
> _eventlogger and _acker tasks. what happens? is it a problem? tuples
> transferred/emitted from within tasks in this worker will be skipped or
> they just use another _acker or _loggers in other workers?
>
> thanks in advance
>
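
The fields grouping on component id that Bobby mentions in the quoted reply above means that every event from a given component goes to the same event logger task, wherever that task is placed. A minimal sketch of that routing idea follows. It assumes a CRC32 hash purely for illustration; Storm's real hashing differs, and pick_logger_task is a hypothetical name, not a Storm function.

```python
import zlib


def pick_logger_task(component_id, logger_task_ids):
    """Map a component id to one logger task, always the same one."""
    # CRC32 is a stand-in hash chosen because it is stable across runs;
    # it is NOT what Storm uses internally.
    digest = zlib.crc32(component_id.encode("utf-8"))
    return logger_task_ids[digest % len(logger_task_ids)]
```

Because the target depends only on the component id, a scheduler that wants locality would have to place each logger task near the components that hash to it, which is the co-location idea Bobby considers probably not worth the effort.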

[ANNOUNCE] Apache Storm 1.1.1 Released

2017-08-01 Thread P. Taylor Goetz

The Apache Storm community is pleased to announce the release of Apache Storm 
version 1.1.1.

Storm is a distributed, fault-tolerant, and high-performance realtime 
computation system that provides strong guarantees on the processing of data. 
You can read more about Storm on the project website:

http://storm.apache.org

Downloads of source and binary distributions are listed in our download
section:

http://storm.apache.org/downloads.html

You can read more about this release in the following blog post:

http://storm.apache.org/2017/08/01/storm111-released.html

Distribution artifacts are available in Maven Central at the following 
coordinates:

groupId: org.apache.storm
artifactId: storm-core
version: 1.1.1

The full list of changes is available here[1]. Please let us know [2] if you 
encounter any problems.

Regards,

The Apache Storm Team

[1]: https://github.com/apache/storm/blob/v1.1.1/CHANGELOG.md
[2]: https://issues.apache.org/jira/browse/STORM

[GitHub] storm pull request #2241: STORM-2306 : Messaging subsystem redesign.

2017-08-01 Thread roshannaik

Github user roshannaik commented on a diff in the pull request:

https://github.com/apache/storm/pull/2241#discussion_r130701047
  
--- Diff: conf/defaults.yaml ---
@@ -253,11 +244,15 @@ topology.trident.batch.emit.interval.millis: 500
 topology.testing.always.try.serialize: false
 topology.classpath: null
 topology.environment: null
-topology.bolts.outgoing.overflow.buffer.enable: false
-topology.disruptor.wait.timeout.millis: 1000
-topology.disruptor.batch.size: 100
-topology.disruptor.batch.timeout.millis: 1
-topology.disable.loadaware.messaging: false
+topology.disruptor.wait.timeout.millis: 1000  # TODO: Roshan: not used, but we may/not want this behavior
+topology.transfer.buffer.size: 5
+topology.transfer.batch.size: 10
+topology.executor.receive.buffer.size: 5
+topology.producer.batch.size: 1000
+topology.flush.tuple.freq.millis: 100
--- End diff --

Actually for now it should be set to 1ms, to be in line with the existing 
setting for flusher. Not yet determined what is a reasonable default for the 
new system.

**Batching:** I decided to provide the batching ability as an option, just 
like we have had it previously... although the code would be much simpler 
without it. With batching disabled (batchSz=1), the degradation in throughput 
that I have seen is relatively smaller for JCQueue... but nevertheless some 
degradation exists.
 When you disable batching, JCQueue updates internal metrics more 
frequently, which partly impacts throughput. The flush tuple timer will not 
be started if batchSz=1.
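
The relationship between batch size and the flush timer described in this comment can be sketched as follows. This is a simplified illustration, not the JCQueue code from the pull request: BatchingProducer, its sink callback and needs_flush_timer are hypothetical names.

```python
class BatchingProducer:
    """Hypothetical producer that hands items to a sink in batches."""

    def __init__(self, batch_size, sink):
        if batch_size < 1:
            raise ValueError("batch_size must be >= 1")
        self.batch_size = batch_size
        self.sink = sink          # callable that receives a list of items
        self.batch = []

    def needs_flush_timer(self):
        # With batch_size == 1 nothing is ever left buffered, so a
        # periodic flush has nothing to do and need not be started.
        return self.batch_size > 1

    def publish(self, item):
        if self.batch_size == 1:
            self.sink([item])     # batching disabled: forward immediately
            return
        self.batch.append(item)
        if len(self.batch) >= self.batch_size:
            self.flush()

    def flush(self):
        # Called when the batch fills up, or periodically by a timer.
        if self.batch:
            self.sink(self.batch)
            self.batch = []
```

With batching enabled, a partially filled batch only reaches the sink when flush() is called, which is why a periodic flush is needed in that mode.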



Re: Considerably slow building website

2017-08-01 Thread P. Taylor Goetz

Thanks for the heads up. If I screw something up we can always regenerate from 
the release tag.

-Taylor

> On Aug 1, 2017, at 2:27 PM, Bobby Evans  wrote:
> 
> Be careful when removing the javadocs.  There are links to the javadocs from 
> within the docs themselves.
> 
> 
> - Bobby
> 
> 
> On Tuesday, August 1, 2017, 12:57:56 PM CDT, P. Taylor Goetz 
>  wrote:
> 
> I cleaned up the download page to remove some of the older releases and added 
> a link to archive.a.o for older releases. I will also clean up dist as 
> requested by infra.
> 
> While I’m at it, I’ll clean up the javadoc so we only include javadoc for 
> releases on the download page.
> 
> That should help a little bit, but I agree that the publishing process is 
> painful and would welcome any improvements.
> 
> One option (I haven’t tested yet) might be to simply move the javadoc to the 
> “publish” directory so it doesn’t get regenerated every time the site gets 
> published. That would mean the javadoc links won’t work when running Jekyll 
> locally, but I think it’s a fair trade off.
> 
> -Taylor
> 
>> On Aug 1, 2017, at 9:39 AM, Bobby Evans  wrote:
>> 
>> Rebuilding everything each time is sadly necessary as currently the 
>> header/footer for all of the content is inline in each page.  So if we add a 
>> new release every page changes.  To fix this we would have to change the 
>> header to dynamically include the HTML from another file that gets updated 
>> on its own.
>> We might also want to think about rearranging things a bit, and reduce the 
>> number of releases that we have on the site.  Do we really need both 0.9.6 
>> and 0.9.7, or 0.10.0 through 0.10.2?  Maybe there is a way to archive some 
>> of these so they are a part of the final site, but are not generated each 
>> time? (probably would need the header change at a minimum to work)
>> 
>> 
>> - Bobby
>> 
>> 
>> On Tuesday, August 1, 2017, 6:01:03 AM CDT, Jungtaek Lim  
>> wrote:
>> 
>> I found I forgot to build website with "-d publish/" parameter. Now it
>> reduced to 1347.585 secs but that is still way too long
>> 
>> I've done some tests on building website ('jekyll build -d publish/
>> --profile'):
>> 
>> 1. as it is : 1347.585 secs
>> 2. excluding 'releases' directories : 2.38 secs
>> 3. excluding 'releases' directories, and including '2.0.0-SNAPSHOT'
>> directory of releases : 45 secs
>> 
>> The build time is not stable but you can see how much the difference is. If
>> we can separate building doc for each release, that should be best and it
>> should reduce the build time greatly.
>> 
>> If we can't separate building doc, we may want to take alternative
>> approach: reducing maintaining releases. You can imagine that if we keep
>> adding docs for new releases in website repo it should increase overall
>> build time. I guess we may be better to provide only the last version of
>> version lines: 0.9.7, 0.10.2, 1.0.4, 1.1.0 (will be 1.1.1 soon),
>> 2.0.0-SNAPSHOT, total 5 releases. If we respect semantic versioning, major
>> changes shouldn't be introduced in bug-fix releases so don't need to
>> maintain docs separately.
>> 
>> I would like to gather opinions around this along with moving website to
>> git. Looking forward to hear others opinions.
>> 
>> Thanks,
>> Jungtaek Lim (HeartSaVioR)
>> 
>> On Tue, Aug 1, 2017 at 7:44 AM, Jungtaek Lim wrote:
>> 
>>> Also found that we don't expose 1.0.4 in documentation dropdown and 1.0.4
>>> directory is not created in 'publish/releases' directory. Maybe also missed
>>> that.
>>> 
>>> On Tue, Aug 1, 2017 at 7:36 AM, Jungtaek Lim wrote:
>>> 
>>>> Hi devs,
>>>> 
>>>> I'm trying to modify release note on 1.0.4 one of user reported about
>>>> wrong CHANGELOG. And surprisingly, it took about 50 mins to serve the
>>>> website locally. Any hints to reduce the time? 50 mins for only building
>>>> the website is really annoying and anyone don't want to wait for that if we
>>>> modify "a" file.
>>>> 
>>>> And I found Storm 1.1.0 release note markdown file is missing. Taylor,
>>>> could you add it back to the SVN repo?
>>>> 
>>>> Thanks,
>>>> Jungtaek Lim (HeartSaVioR)

Re: Considerably slow building website

2017-08-01 Thread Bobby Evans

Be careful when removing the javadocs.  There are links to the javadocs from 
within the docs themselves.


- Bobby


On Tuesday, August 1, 2017, 12:57:56 PM CDT, P. Taylor Goetz 
 wrote:

I cleaned up the download page to remove some of the older releases and added a 
link to archive.a.o for older releases. I will also clean up dist as requested 
by infra.

While I’m at it, I’ll clean up the javadoc so we only include javadoc for 
releases on the download page.

That should help a little bit, but I agree that the publishing process is 
painful and would welcome any improvements.

One option (I haven’t tested yet) might be to simply move the javadoc to the 
“publish” directory so it doesn’t get regenerated every time the site gets 
published. That would mean the javadoc links won’t work when running Jekyll 
locally, but I think it’s a fair trade off.

-Taylor
 
> On Aug 1, 2017, at 9:39 AM, Bobby Evans  wrote:
> 
> Rebuilding everything each time is sadly necessary as currently the 
> header/footer for all of the content is inline in each page.  So if we add a 
> new release every page changes.  To fix this we would have to change the 
> header to dynamically include the HTML from another file that gets updated on 
> its own.
> We might also want to think about rearranging things a bit, and reduce the 
> number of releases that we have on the site.  Do we really need both 0.9.6 
> and 0.9.7, or 0.10.0 through 0.10.2?  Maybe there is a way to archive some of 
> these so they are a part of the final site, but are not generated each time? 
> (probably would need the header change at a minimum to work)
> 
> 
> - Bobby
> 
> 
> On Tuesday, August 1, 2017, 6:01:03 AM CDT, Jungtaek Lim  
> wrote:
> 
> I found I forgot to build website with "-d publish/" parameter. Now it
> reduced to 1347.585 secs but that is still way too long
> 
> I've done some tests on building website ('jekyll build -d publish/
> --profile'):
> 
> 1. as it is : 1347.585 secs
> 2. excluding 'releases' directories : 2.38 secs
> 3. excluding 'releases' directories, and including '2.0.0-SNAPSHOT'
> directory of releases : 45 secs
> 
> The build time is not stable but you can see how much the difference is. If
> we can separate building doc for each release, that should be best and it
> should reduce the build time greatly.
> 
> If we can't separate building doc, we may want to take alternative
> approach: reducing maintaining releases. You can imagine that if we keep
> adding docs for new releases in website repo it should increase overall
> build time. I guess we may be better to provide only the last version of
> version lines: 0.9.7, 0.10.2, 1.0.4, 1.1.0 (will be 1.1.1 soon),
> 2.0.0-SNAPSHOT, total 5 releases. If we respect semantic versioning, major
> changes shouldn't be introduced in bug-fix releases so don't need to
> maintain docs separately.
> 
> I would like to gather opinions around this along with moving website to
> git. Looking forward to hear others opinions.
> 
> Thanks,
> Jungtaek Lim (HeartSaVioR)
> 
> On Tue, Aug 1, 2017 at 7:44 AM, Jungtaek Lim wrote:
> 
>> Also found that we don't expose 1.0.4 in documentation dropdown and 1.0.4
>> directory is not created in 'publish/releases' directory. Maybe also missed
>> that.
>> 
>> On Tue, Aug 1, 2017 at 7:36 AM, Jungtaek Lim wrote:
>> 
>>> Hi devs,
>>> 
>>> I'm trying to modify release note on 1.0.4 one of user reported about
>>> wrong CHANGELOG. And surprisingly, it took about 50 mins to serve the
>>> website locally. Any hints to reduce the time? 50 mins for only building
>>> the website is really annoying and anyone don't want to wait for that if we
>>> modify "a" file.
>>> 
>>> And I found Storm 1.1.0 release note markdown file is missing. Taylor,
>>> could you add it back to the SVN repo?
>>> 
>>> Thanks,
>>> Jungtaek Lim (HeartSaVioR)
>>>

Re: Considerably slow building website

2017-08-01 Thread P. Taylor Goetz

I cleaned up the download page to remove some of the older releases and added a 
link to archive.a.o for older releases. I will also clean up dist as requested 
by infra.

While I’m at it, I’ll clean up the javadoc so we only include javadoc for 
releases on the download page.

That should help a little bit, but I agree that the publishing process is 
painful and would welcome any improvements.

One option (I haven’t tested yet) might be to simply move the javadoc to the 
“publish” directory so it doesn’t get regenerated every time the site gets 
published. That would mean the javadoc links won’t work when running Jekyll 
locally, but I think it’s a fair trade off.

-Taylor
 
> On Aug 1, 2017, at 9:39 AM, Bobby Evans  wrote:
> 
> Rebuilding everything each time is sadly necessary as currently the 
> header/footer for all of the content is inline in each page.  So if we add a 
> new release every page changes.  To fix this we would have to change the 
> header to dynamically include the HTML from another file that gets updated on 
> its own.
> We might also want to think about rearranging things a bit, and reduce the 
> number of releases that we have on the site.  Do we really need both 0.9.6 
> and 0.9.7, or 0.10.0 through 0.10.2?  Maybe there is a way to archive some of 
> these so they are a part of the final site, but are not generated each time? 
> (probably would need the header change at a minimum to work)
> 
> 
> - Bobby
> 
> 
> On Tuesday, August 1, 2017, 6:01:03 AM CDT, Jungtaek Lim  
> wrote:
> 
> I found I forgot to build website with "-d publish/" parameter. Now it
> reduced to 1347.585 secs but that is still way too long
> 
> I've done some tests on building website ('jekyll build -d publish/
> --profile'):
> 
> 1. as it is : 1347.585 secs
> 2. excluding 'releases' directories : 2.38 secs
> 3. excluding 'releases' directories, and including '2.0.0-SNAPSHOT'
> directory of releases : 45 secs
> 
> The build time is not stable but you can see how much the difference is. If
> we can separate building doc for each release, that should be best and it
> should reduce the build time greatly.
> 
> If we can't separate building doc, we may want to take alternative
> approach: reducing maintaining releases. You can imagine that if we keep
> adding docs for new releases in website repo it should increase overall
> build time. I guess we may be better to provide only the last version of
> version lines: 0.9.7, 0.10.2, 1.0.4, 1.1.0 (will be 1.1.1 soon),
> 2.0.0-SNAPSHOT, total 5 releases. If we respect semantic versioning, major
> changes shouldn't be introduced in bug-fix releases so don't need to
> maintain docs separately.
> 
> I would like to gather opinions around this along with moving website to
> git. Looking forward to hear others opinions.
> 
> Thanks,
> Jungtaek Lim (HeartSaVioR)
> 
> On Tue, Aug 1, 2017 at 7:44 AM, Jungtaek Lim wrote:
> 
>> Also found that we don't expose 1.0.4 in documentation dropdown and 1.0.4
>> directory is not created in 'publish/releases' directory. Maybe also missed
>> that.
>> 
>> On Tue, Aug 1, 2017 at 7:36 AM, Jungtaek Lim wrote:
>> 
>>> Hi devs,
>>> 
>>> I'm trying to modify release note on 1.0.4 one of user reported about
>>> wrong CHANGELOG. And surprisingly, it took about 50 mins to serve the
>>> website locally. Any hints to reduce the time? 50 mins for only building
>>> the website is really annoying and anyone don't want to wait for that if we
>>> modify "a" file.
>>> 
>>> And I found Storm 1.1.0 release note markdown file is missing. Taylor,
>>> could you add it back to the SVN repo?
>>> 
>>> Thanks,
>>> Jungtaek Lim (HeartSaVioR)
>>>
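
The pruning rule proposed in the quoted thread above (keep only the last release of each version line) can be sketched as below. The input list in the usage note is only illustrative, built from the versions named in the thread; it is not the actual contents of the website's releases directory. The function names are hypothetical.

```python
def parse_version(version):
    """Split '1.0.4' or '2.0.0-SNAPSHOT' into (major, minor, patch)."""
    core = version.split("-", 1)[0]          # drop a '-SNAPSHOT' style suffix
    major, minor, patch = (int(part) for part in core.split("."))
    return major, minor, patch


def latest_per_version_line(versions):
    """Keep the highest patch release for every (major, minor) line."""
    best = {}
    for version in versions:
        major, minor, patch = parse_version(version)
        line = (major, minor)
        if line not in best or patch > parse_version(best[line])[2]:
            best[line] = version
    # Return the survivors ordered by version line.
    return [best[line] for line in sorted(best)]
```

Given 0.9.6, 0.9.7, 0.10.0, 0.10.1, 0.10.2, 1.0.4, 1.1.0 and 2.0.0-SNAPSHOT, this keeps 0.9.7, 0.10.2, 1.0.4, 1.1.0 and 2.0.0-SNAPSHOT, the five releases named in the proposal.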

[GitHub] storm issue #2202: STORM-2623: Add in a whitelist for scheduler strategies

2017-08-01 Thread knusbaum

Github user knusbaum commented on the issue:

https://github.com/apache/storm/pull/2202
  
@HeartSaVioR @kishorvpatil Please have a look.



Re: [Discussion]: Storm Improvement Proposal (SIP) to discuss changes

2017-08-01 Thread Harsha

Trying to bring attention to this again.
We currently have a few big feature PRs going on and there is considerable
discussion about the design, implementation, etc. My intention in
starting SIPs is to capture these details before someone goes and writes up
a PR. Today everyone has to read through the design, sometimes those
docs are not clear, and we end up having long discussions on the PRs, which
should mainly be about the code review itself.
We should at least start making this process mandatory for any new big
feature, especially in storm-core. I am less concerned about
connectors and other parts, which should have the path of least resistance
and are usually easy to review.
If the devs write down their thoughts and design, go through a discussion,
and get everyone on the same page, then when the PR shows up it will be less
surprising and everyone involved will know how the PR/code is supposed to
work.

-Harsha

On Fri, Jun 9, 2017, at 09:16 PM, Harsha wrote:
> Arun,
>For big features we did follow design doc/review. Making it
>formal makes everyone to follow a process. 
> Again this process is not for bug fixes as we stated its about New
> Features/Config Changes/Public interface changes. I don't think it puts
> any extra effort for anyone who is writing detailed JIRA but by making
> it formal makes everyone to add these details in a centra process. Not
> everyone will look at mailing list but its easier to follow a wiki page.
>  We should atleast give it a try before we vote it out.
> 
> Roshan,
>  Adding connector should require a SIP as well and changing any
>  public interfaces should be a KIP. Intention here is we've
>  central place where everyone can follow in detail whats the
>  public interface/new feature changes went in. We've changed
>  KafkaSpout quite a bit and there is current discussion thats
>  going to change it , having this documented in a central place
>  will make it easy to follow and recording them in release notes
>  as well.
> 
> Taylor,
> We can't call it a too tedious process without even giving it a
> try. This has been followed to a greater success at kafka and
> also Flink started the process as well
> 
> https://cwiki.apache.org/confluence/display/FLINK/Flink+Improvement+Proposals
> .
> If it actually proved to more of hindrance than helping the community we
> can move away from it.
> 
> " Kafka has somewhat of a reputation for setting potentially too high a
> bar. I'd rather not see that happen with this community."
> Sure. But it also depends on the community. Just because some community
> enforcing too high bar that doesn't mean we are trying to do it via this
> process. Again we always have option if we ever veer too far in the
> wrong direction to bring up and improve or remove this process.
> 
> We should also as a community strive to have better quality and I am
> hoping this will give us a chance to not only let users know what are
> changes coming in but also keep the dev list to have a chance and join
> the discussion.
> 
> -Harsha
> 
> On Jun 9, 2017, 7:18 PM -0700, Arun Iyer , wrote:
> I am for documenting and upfront design reviews, but maybe we should
> keep it less formal and make it part of the JIRA to start with.
> 
> Do we have any upcoming features for which we would like to see a
> proposal? May be start with a couple of proposals
> and see it works out before making it formal.
> 
> 
> Thanks,
> Arun
> 
> 
> 
> 6/9/17, 6:49 PM, "P. Taylor Goetz"  wrote:
> 
> -0
> 
> The KIP process feels kind of heavy. I'd rather start with a lighter
> effort like improving JIRA submissions and pull requests (some pull
> requests/JIRAs, even from committers and PMC members, are woefully
> inadequate in terms of detail), and see how that works out.
> 
> I share Bobby's concern that doing so might raise the bar for
> contributions and potentially have a chilling effect. We don't want to
> scare away contributors. Kafka has somewhat of a reputation for setting
> potentially too high a bar. I'd rather not see that happen with this
> community.
> 
> I will say that I like the idea of proposals for big features, ideally
> before any coding even begins -- so that others have a chance to
> collaborate. But I'm hesitant to impose too much process, voting, etc.
> That could scare people off.
> 
> I think we should think carefully before going down this trail.
> 
> -Taylor
> 
> On Jun 9, 2017, at 8:57 PM, Priyank Shah  wrote:
> 
> +1 for SIPs including a new connector. The person writing SIP can
> provide details about the external system for which connector is being
> written to let others know why a certain design decision was made. This
> will make it easy for reviewers.
> 
> On 6/9/17, 5:24 PM, "Satish Duggana"  wrote:
> 
> +1 for SIPs. It is so useful as mentioned by others in

Re: Considerably slow building website

2017-08-01 Thread Jungtaek Lim

Another approach is separating release-specific docs from common docs. I
found other projects are already doing that; most of them use the root
path for common docs, and docs// for release-specific docs. Common
docs still need to provide a menu linking to the root page of each
release's docs.

This approach needs some redesign of the docs; that is the only blocker for
the approach.

On Tue, Aug 1, 2017 at 10:39 PM, Bobby Evans wrote:

> Rebuilding everything each time is sadly necessary as currently the
> header/footer for all of the content is inline in each page.  So if we add
> a new release every page changes.  To fix this we would have to change the
> header to dynamically include the HTML from another file that gets updated
> on its own.
> We might also want to think about rearranging things a bit, and reduce the
> number of releases that we have on the site.  Do we really need both 0.9.6
> and 0.9.7, or 0.10.0 through 0.10.2?  Maybe there is a way to archive some
> of these so they are a part of the final site, but are not generated each
> time? (probably would need the header change at a minimum to work)
>
>
> - Bobby
>
>
> On Tuesday, August 1, 2017, 6:01:03 AM CDT, Jungtaek Lim <kabh...@gmail.com> wrote:
>
> I found I forgot to build website with "-d publish/" parameter. Now it
> reduced to 1347.585 secs but that is still way too long
>
> I've done some tests on building website ('jekyll build -d publish/
> --profile'):
>
> 1. as it is : 1347.585 secs
> 2. excluding 'releases' directories : 2.38 secs
> 3. excluding 'releases' directories, and including '2.0.0-SNAPSHOT'
> directory of releases : 45 secs
>
> The build time is not stable but you can see how much the difference is. If
> we can separate building doc for each release, that should be best and it
> should reduce the build time greatly.
>
> If we can't separate building doc, we may want to take alternative
> approach: reducing maintaining releases. You can imagine that if we keep
> adding docs for new releases in website repo it should increase overall
> build time. I guess we may be better to provide only the last version of
> version lines: 0.9.7, 0.10.2, 1.0.4, 1.1.0 (will be 1.1.1 soon),
> 2.0.0-SNAPSHOT, total 5 releases. If we respect semantic versioning, major
> changes shouldn't be introduced in bug-fix releases so don't need to
> maintain docs separately.
>
> I would like to gather opinions around this along with moving website to
> git. Looking forward to hear others opinions.
>
> Thanks,
> Jungtaek Lim (HeartSaVioR)
>
> On Tue, Aug 1, 2017 at 7:44 AM, Jungtaek Lim wrote:
>
> > Also found that we don't expose 1.0.4 in documentation dropdown and 1.0.4
> > directory is not created in 'publish/releases' directory. Maybe also
> > missed that.
> >
> > On Tue, Aug 1, 2017 at 7:36 AM, Jungtaek Lim wrote:
> >
> >> Hi devs,
> >>
> >> I'm trying to modify release note on 1.0.4 one of user reported about
> >> wrong CHANGELOG. And surprisingly, it took about 50 mins to serve the
> >> website locally. Any hints to reduce the time? 50 mins for only building
> >> the website is really annoying and anyone don't want to wait for that if
> >> we modify "a" file.
> >>
> >> And I found Storm 1.1.0 release note markdown file is missing. Taylor,
> >> could you add it back to the SVN repo?
> >>
> >> Thanks,
> >> Jungtaek Lim (HeartSaVioR)
> >>
> >
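
The three timings quoted above give a rough feel for how much of the build is spent on the 'releases' directories. The arithmetic below is only a back-of-the-envelope sketch: it assumes every release costs about as much as 2.0.0-SNAPSHOT and that cost grows linearly, neither of which the thread confirms, and the thread itself notes that the build time is not stable.

```python
# Timings reported in the thread, in seconds.
FULL_BUILD_SECS = 1347.585      # site built as it is
NO_RELEASES_SECS = 2.38         # 'releases' directories excluded
ONE_RELEASE_SECS = 45.0         # only 2.0.0-SNAPSHOT included


def release_share(full_secs, base_secs):
    """Fraction of the full build spent on the release docs."""
    return (full_secs - base_secs) / full_secs


def estimated_build_secs(base_secs, one_release_secs, release_count):
    """Estimate build time for N releases, assuming linear per-release cost."""
    per_release = one_release_secs - base_secs
    return base_secs + release_count * per_release


share = release_share(FULL_BUILD_SECS, NO_RELEASES_SECS)
five = estimated_build_secs(NO_RELEASES_SECS, ONE_RELEASE_SECS, 5)
print("release docs share of build: %.1f%%" % (share * 100))
print("estimated build with 5 releases: %.0f secs" % five)
```

Under these assumptions the release docs account for well over 99% of the build, and keeping only five releases would bring the build down to a few minutes.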

Re: possible to have supervisors without _eventlogger and _acker tasks

2017-08-01 Thread Bobby Evans

By default there are no `_eventlogger` tasks.  To have this feature enabled you 
need to turn it on by setting topology.eventlogger.executors to a positive 
number.  Ackers are on by default, but can be disabled by setting the number of 
topology.acker.executors to 0.  You should respect these when scheduling a 
topology, because if they are supposed to be there and are not scheduled, 
messages will be sent to them but will be lost.  In the case of acking, 
all of the tuples will time out.  In the case of the event logger, the UI will 
show it working, but nothing will ever come out.
Now that is on a per topology basis, not on a per worker basis.  These bolts 
are like any other bolt.  They can be in any worker your scheduler wants to put 
them in.  When inserting an acker bolt it is using a keyed grouping connected 
to just about everything in your topology, so where you place it is not that 
critical as it is going to be talking to everything.  The event logger bolts 
are similar, but using a fields grouping based off of component id.  
https://github.com/apache/storm/blob/4c8a986f519cdf3e63bed47e9c4f723e4867267a/storm-client/src/jvm/org/apache/storm/daemon/StormCommon.java#L346-L357
You could try to be smart to try and collocate the component with the logger 
for it, but honestly this feature slows your topology down so much already it 
is probably not worth trying to optimize it as it really will only be used when 
you need to do some serious debugging.
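
As a rough illustration of the two settings mentioned above, here is a minimal sketch of how a custom scheduler might decide whether acker and event logger bolts are expected for a topology. The class `SystemBoltCheck` and its methods are hypothetical and not part of Storm; only the two config keys come from this message. Treating an unset acker setting as "on" is an assumption based on "ackers are on by default".

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper for illustration only; not a Storm class. */
final class SystemBoltCheck {
    // Config keys as named in the message above.
    static final String EVENTLOGGER_EXECUTORS = "topology.eventlogger.executors";
    static final String ACKER_EXECUTORS = "topology.acker.executors";

    /** Event loggers are expected only when the setting is a positive number. */
    static boolean eventLoggersExpected(Map<String, Object> topoConf) {
        Object value = topoConf.get(EVENTLOGGER_EXECUTORS);
        return value instanceof Number && ((Number) value).intValue() > 0;
    }

    /** Ackers are expected unless the setting is a number that is 0 or less (assumption: unset means on). */
    static boolean ackersExpected(Map<String, Object> topoConf) {
        Object value = topoConf.get(ACKER_EXECUTORS);
        return !(value instanceof Number) || ((Number) value).intValue() > 0;
    }

    public static void main(String[] args) {
        Map<String, Object> topoConf = new HashMap<>();
        topoConf.put(ACKER_EXECUTORS, 0);        // acking disabled
        topoConf.put(EVENTLOGGER_EXECUTORS, 2);  // event logging enabled
        System.out.println("ackers expected: " + ackersExpected(topoConf));
        System.out.println("event loggers expected: " + eventLoggersExpected(topoConf));
    }
}
```

A scheduler could run a check like this once per topology and refuse to produce an assignment that leaves an expected system bolt unscheduled, since the decision is per topology rather than per worker.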


- Bobby


On Tuesday, August 1, 2017, 4:44:55 AM CDT, AMir Firouzi  
wrote:

Hi guys,
I'm working on my own scheduler for Storm. I wonder what happens if I
create a worker process and put some tasks in it (bolt/spout tasks) but no
_eventlogger and _acker tasks. Is that a problem? Will tuples
transferred/emitted from tasks in this worker be skipped, or will they
just use the _acker or _eventlogger tasks in other workers?

thanks in advance

Re: Considerably slow building website

2017-08-01 Thread Bobby Evans

Rebuilding everything each time is sadly necessary, because currently the
header/footer for all of the content is inlined in each page, so if we add a
new release every page changes.  To fix this we would have to change the header
to dynamically include the HTML from another file that gets updated on its own.

We might also want to think about rearranging things a bit and reducing the
number of releases that we have on the site.  Do we really need both 0.9.6 and
0.9.7, or 0.10.0 through 0.10.2?  Maybe there is a way to archive some of these
so they are a part of the final site, but are not regenerated each time
(that would probably need the header change at a minimum to work).

- Bobby

On Tuesday, August 1, 2017, 6:01:03 AM CDT, Jungtaek Lim  
wrote:

I found that I had forgotten to build the website with the "-d publish/"
parameter. With it, the build time drops to 1347.585 secs, but that is still
way too long.

I've done some tests on building website ('jekyll build -d publish/
--profile'):

1. as it is : 1347.585 secs
2. excluding 'releases' directories : 2.38 secs
3. excluding 'releases' directories, and including '2.0.0-SNAPSHOT'
directory of releases : 45 secs

The build time is not stable, but you can see how big the difference is. If
we can build the docs for each release separately, that would be best and
should reduce the build time greatly.

If we can't separate the doc builds, we may want to take an alternative
approach: reducing the number of releases we maintain docs for. If we keep
adding docs for new releases to the website repo, the overall build time will
keep increasing. I think it would be better to provide docs only for the
latest version of each version line: 0.9.7, 0.10.2, 1.0.4, 1.1.0 (will be
1.1.1 soon), and 2.0.0-SNAPSHOT, 5 releases in total. If we respect semantic
versioning, major changes shouldn't be introduced in bug-fix releases, so we
don't need to maintain docs for them separately.

I would like to gather opinions on this, along with moving the website to
git. Looking forward to hearing others' opinions.

Thanks,
Jungtaek Lim (HeartSaVioR)

On Tue, Aug 1, 2017 at 7:44 AM, Jungtaek Lim wrote:

> Also found that we don't expose 1.0.4 in documentation dropdown and 1.0.4
> directory is not created in 'publish/releases' directory. Maybe also missed
> that.
>
> On Tue, Aug 1, 2017 at 7:36 AM, Jungtaek Lim wrote:
>
>> Hi devs,
>>
>> I'm trying to modify release note on 1.0.4 one of user reported about
>> wrong CHANGELOG. And surprisingly, it took about 50 mins to serve the
>> website locally. Any hints to reduce the time? 50 mins for only building
>> the website is really annoying and anyone don't want to wait for that if we
>> modify "a" file.
>>
>> And I found Storm 1.1.0 release note markdown file is missing. Taylor,
>> could you add it back to the SVN repo?
>>
>> Thanks,
>> Jungtaek Lim (HeartSaVioR)
>>
>

[GitHub] storm pull request #2252: Add explanation for issues@ mailing list

2017-08-01 Thread asfgit

Github user asfgit closed the pull request at:

https://github.com/apache/storm/pull/2252



[GitHub] storm issue #2251: [STORM-2613] Tuples that aren't sampled shouldn't be cons...

2017-08-01 Thread HeartSaVioR

Github user HeartSaVioR commented on the issue:

https://github.com/apache/storm/pull/2251
  
The failure looks unrelated.
+1



Re: Considerably slow building website

2017-08-01 Thread Jungtaek Lim

I found that I had forgotten to build the website with the "-d publish/"
parameter. With it, the build time drops to 1347.585 secs, but that is still
way too long.

I've done some tests on building website ('jekyll build -d publish/
--profile'):

1. as it is : 1347.585 secs
2. excluding 'releases' directories : 2.38 secs
3. excluding 'releases' directories, and including '2.0.0-SNAPSHOT'
directory of releases : 45 secs

The build time is not stable, but you can see how big the difference is. If
we can build the docs for each release separately, that would be best and
should reduce the build time greatly.
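
To make the difference concrete, the arithmetic behind the three timings above can be sketched as follows. The class name `BuildTimeBreakdown` is made up for illustration; the three constants are the figures reported in this message, and since the build time is not stable the derived values are only rough estimates.

```java
/** Illustrative arithmetic only; the constants are the timings reported above. */
final class BuildTimeBreakdown {
    static final double FULL_BUILD = 1347.585;      // 1. as it is
    static final double WITHOUT_RELEASES = 2.38;    // 2. excluding 'releases'
    static final double WITH_ONE_RELEASE = 45.0;    // 3. only '2.0.0-SNAPSHOT' included

    /** Seconds of the full build attributable to the 'releases' directories. */
    static double releasesSeconds() {
        return FULL_BUILD - WITHOUT_RELEASES;
    }

    /** Fraction of the full build spent on the 'releases' directories. */
    static double releasesShare() {
        return releasesSeconds() / FULL_BUILD;
    }

    /** Approximate cost of building the docs for a single release. */
    static double oneReleaseSeconds() {
        return WITH_ONE_RELEASE - WITHOUT_RELEASES;
    }

    public static void main(String[] args) {
        System.out.printf("full build: %.1f min%n", FULL_BUILD / 60.0);
        System.out.printf("releases: %.1f s (%.1f%% of the build)%n",
                releasesSeconds(), releasesShare() * 100.0);
        System.out.printf("one release: about %.1f s%n", oneReleaseSeconds());
    }
}
```

Nearly all of the build time goes to the 'releases' directories, which is why building each release separately or keeping fewer releases looks like the main lever.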

If we can't separate the doc builds, we may want to take an alternative
approach: reducing the number of releases we maintain docs for. If we keep
adding docs for new releases to the website repo, the overall build time will
keep increasing. I think it would be better to provide docs only for the
latest version of each version line: 0.9.7, 0.10.2, 1.0.4, 1.1.0 (will be
1.1.1 soon), and 2.0.0-SNAPSHOT, 5 releases in total. If we respect semantic
versioning, major changes shouldn't be introduced in bug-fix releases, so we
don't need to maintain docs for them separately.

I would like to gather opinions on this, along with moving the website to
git. Looking forward to hearing others' opinions.

Thanks,
Jungtaek Lim (HeartSaVioR)

On Tue, Aug 1, 2017 at 7:44 AM, Jungtaek Lim wrote:

> Also found that we don't expose 1.0.4 in documentation dropdown and 1.0.4
> directory is not created in 'publish/releases' directory. Maybe also missed
> that.
>
> On Tue, Aug 1, 2017 at 7:36 AM, Jungtaek Lim wrote:
>
>> Hi devs,
>>
>> I'm trying to modify release note on 1.0.4 one of user reported about
>> wrong CHANGELOG. And surprisingly, it took about 50 mins to serve the
>> website locally. Any hints to reduce the time? 50 mins for only building
>> the website is really annoying and anyone don't want to wait for that if we
>> modify "a" file.
>>
>> And I found Storm 1.1.0 release note markdown file is missing. Taylor,
>> could you add it back to the SVN repo?
>>
>> Thanks,
>> Jungtaek Lim (HeartSaVioR)
>>
>

[GitHub] storm issue #2218: STORM-2614: Enhance stateful windowing to persist the win...

2017-08-01 Thread arunmahadevan

Github user arunmahadevan commented on the issue:

https://github.com/apache/storm/pull/2218
  
@srdo, @satishd, thanks for the review. Pushed some changes to address 
the latest review comments.



possible to have supervisors without _eventlogger and _acker tasks

2017-08-01 Thread AMir Firouzi

Hi guys,
I'm working on my own scheduler for Storm. I wonder what happens if I
create a worker process and put some tasks in it (bolt/spout tasks) but no
_eventlogger and _acker tasks. Is that a problem? Will tuples
transferred/emitted from tasks in this worker be skipped, or will they
just use the _acker or _eventlogger tasks in other workers?

thanks in advance

[GitHub] storm issue #2241: STORM-2306 : Messaging subsystem redesign.

2017-08-01 Thread roshannaik

Github user roshannaik commented on the issue:

https://github.com/apache/storm/pull/2241
  
@revans2 
- Perhaps it is best to wait until I introduce the sleep strategy before 
retrying those lower throughput runs. This is my top priority right now.
- Great inputs on max.spout.pending. My thoughts on the same:

1. Yes. We should not remove it until the netty issue is fixed. It would be 
good to aim for fixing this in Storm 2.0, as max.spout.pending is only useful 
in ACK mode.
2. I am not clear about this issue.
3. Yes, the LoadAware model makes sense from a logical standpoint. Its perf 
bothered me enough that I felt it might be hurting as well as helping the 
larger cause. I anticipated some "backpressure" on this decision. I am OK to 
re-enable it and document this as something to consider for perf tuning. For 
some of this perf testing, IMO we need to be conscious of when to 
enable/disable it. It would be great if someone could look into improving its 
perf.
4. I will address the timers issue in a separate post.
5. I am unclear about some aspects of this issue, but deprecation of cyclic 
topos seems reasonable to me.



[GitHub] storm issue #2251: [STORM-2613] Tuples that aren't sampled shouldn't be cons...

2017-08-01 Thread vinodkc

Github user vinodkc commented on the issue:

https://github.com/apache/storm/pull/2251
  
retest this



[GitHub] storm issue #2252: Add explanation for issues@ mailing list

2017-08-01 Thread srdo

Github user srdo commented on the issue:

https://github.com/apache/storm/pull/2252
  
+1, thanks for adding this



[GitHub] storm pull request #2241: STORM-2306 : Messaging subsystem redesign.

2017-08-01 Thread roshannaik

Github user roshannaik commented on a diff in the pull request:

https://github.com/apache/storm/pull/2241#discussion_r130538411
  
--- Diff: storm-server/src/main/java/org/apache/storm/daemon/supervisor/BasicContainer.java ---
@@ -346,7 +346,7 @@ protected String getWildcardDir(File dir) {
 }
 
 protected List frameworkClasspath(SimpleVersion topoVersion) {
-File stormWorkerLibDir = new File(_stormHome, "lib-worker");
+File stormWorkerLibDir = new File(_stormHome, "lib");
--- End diff --

Will revert it. After speaking to @HeartSaVioR, I figured out the build issue 
on my end that made this change seem necessary. It's not needed.



[GitHub] storm pull request #2252: Add explanation for issues@ mailing list

2017-08-01 Thread HeartSaVioR

GitHub user HeartSaVioR opened a pull request:

https://github.com/apache/storm/pull/2252

Add explanation for issues@ mailing list

We seem to be missing a description of the issues@ list and how to subscribe 
to it.
This will also be applied to the [getting help page on 
website](http://storm.apache.org/getting-help.html) after review.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/HeartSaVioR/storm add-issues-subscription-on-readme

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/storm/pull/2252.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2252


commit 3f386c9174e98bc97f4786baa77539d9fbf17642
Author: Jungtaek Lim 
Date:   2017-08-01T07:32:37Z

Add explanation for issues@ mailing list





[GitHub] storm pull request #2218: STORM-2614: Enhance stateful windowing to persist ...

2017-08-01 Thread arunmahadevan

Github user arunmahadevan commented on a diff in the pull request:

https://github.com/apache/storm/pull/2218#discussion_r130528810
  
--- Diff: storm-client/src/jvm/org/apache/storm/topology/PersistentWindowedBoltExecutor.java ---
@@ -0,0 +1,596 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.storm.topology;
+
+import java.util.AbstractCollection;
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.Collections;
+import java.util.Deque;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.Iterator;
+import java.util.LinkedList;
+import java.util.List;
+import java.util.Map;
+import java.util.NoSuchElementException;
+import java.util.Objects;
+import java.util.Optional;
+import java.util.Set;
+import java.util.concurrent.ConcurrentLinkedQueue;
+import java.util.concurrent.atomic.AtomicInteger;
+import java.util.concurrent.locks.ReentrantLock;
+import java.util.function.Supplier;
+
+import org.apache.storm.Config;
+import org.apache.storm.state.KeyValueState;
+import org.apache.storm.state.State;
+import org.apache.storm.state.StateFactory;
+import org.apache.storm.task.OutputCollector;
+import org.apache.storm.task.TopologyContext;
+import org.apache.storm.topology.base.BaseWindowedBolt;
+import org.apache.storm.tuple.Tuple;
+import org.apache.storm.windowing.DefaultEvictionContext;
+import org.apache.storm.windowing.Event;
+import org.apache.storm.windowing.EventImpl;
+import org.apache.storm.windowing.WindowLifecycleListener;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import static java.util.Collections.emptyIterator;
+import static org.apache.storm.topology.WindowPartitionCache.CacheLoader;
+import static org.apache.storm.topology.WindowPartitionCache.RemovalCause;
+import static org.apache.storm.topology.WindowPartitionCache.RemovalListener;
+
+/**
+ * Wraps a {@link IStatefulWindowedBolt} and handles the execution. Uses state and the underlying
+ * checkpointing mechanisms to save the tuples in window to state. The tuples are also kept in-memory
+ * by transparently caching the window partitions and checkpointing them as needed.
+ */
+public class PersistentWindowedBoltExecutor extends WindowedBoltExecutor implements IStatefulBolt {
+private static final Logger LOG = LoggerFactory.getLogger(PersistentWindowedBoltExecutor.class);
+private final IStatefulWindowedBolt statefulWindowedBolt;
+private transient TopologyContext topologyContext;
+private transient OutputCollector outputCollector;
+private transient WindowState state;
+private transient boolean stateInitialized;
+private transient boolean prePrepared;
+
+public PersistentWindowedBoltExecutor(IStatefulWindowedBolt bolt) {
+super(bolt);
+statefulWindowedBolt = bolt;
+}
+
+@Override
+public void prepare(Map topoConf, TopologyContext context, OutputCollector collector) {
+List registrations = (List) topoConf.getOrDefault(Config.TOPOLOGY_STATE_KRYO_REGISTER, new ArrayList<>());
+registrations.add(ConcurrentLinkedQueue.class.getName());
+registrations.add(LinkedList.class.getName());
+registrations.add(AtomicInteger.class.getName());
+registrations.add(EventImpl.class.getName());
+registrations.add(WindowPartition.class.getName());
+registrations.add(DefaultEvictionContext.class.getName());
+topoConf.put(Config.TOPOLOGY_STATE_KRYO_REGISTER, registrations);
+prepare(topoConf, context, collector,
+getWindowState(topoConf, context),
+getPartitionState(topoConf, context),
+getWindowSystemState(topoConf, context));
+}
+
+@Override
+protected void validate(Map topoConf,
+

[GitHub] storm pull request #2218: STORM-2614: Enhance stateful windowing to persist ...

2017-08-01 Thread arunmahadevan

Github user arunmahadevan commented on a diff in the pull request:

https://github.com/apache/storm/pull/2218#discussion_r130528860
  
--- Diff: storm-client/src/jvm/org/apache/storm/windowing/WindowManager.java ---
@@ -111,14 +125,86 @@ public void add(Event<T> windowEvent) {
             LOG.debug("Got watermark event with ts {}", windowEvent.getTimestamp());
         }
         track(windowEvent);
-        compactWindow();
+        if (!stateful) {
+            compactWindow();
+        }
     }
 
     /**
      * The callback invoked by the trigger policy.
      */
     @Override
     public boolean onTrigger() {
+        return stateful ? doOnTriggerStateful() : doOnTrigger();
+    }
+
+    private static class IteratorStatus {
+        private boolean valid = true;
+
+        void invalidate() {
+            valid = false;
+        }
+
+        boolean isValid() {
+            return valid;
+        }
+    }
+
+    private static <T> Iterator<T> expiringIterator(Iterator<T> inner, IteratorStatus status) {
+        return new Iterator<T>() {
+            @Override
+            public boolean hasNext() {
+                if (status.isValid()) {
+                    return inner.hasNext();
+                }
+                throw new IllegalStateException("Stale iterator");
+            }
+
+            @Override
+            public T next() {
+                if (status.isValid()) {
+                    return inner.next();
+                }
+                throw new IllegalStateException("Stale iterator");
+            }
+        };
+    }
+
+    private boolean doOnTriggerStateful() {
+        Supplier<Iterator<T>> scanEventsStateful = this::scanEventsStateful;
+        Iterator<T> it = scanEventsStateful.get();
+        boolean hasEvents = it.hasNext();
+        if (hasEvents) {
+            final IteratorStatus status = new IteratorStatus();
+            LOG.debug("invoking windowLifecycleListener onActivation with iterator");
+            // reuse the retrieved iterator
+            Supplier<Iterator<T>> wrapper = new Supplier<Iterator<T>>() {
+                Iterator<T> initial = it;
+                @Override
+                public Iterator<T> get() {
+                    if (status.isValid()) {
--- End diff --

Will update the message. 
Within execute, a bolt may invoke `get`/`getIter` multiple times. So here 
we have a supplier and return a new iterator each time.
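
For readers following along, the idea in the diff above can be sketched in isolation roughly as below. This is a simplified, self-contained illustration and not the code in the PR: `ExpiringIteratorSketch` and `expiringSupplier` are hypothetical names, and the supplier here always asks the source for a fresh iterator, whereas the PR reuses the first retrieved iterator.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Supplier;

/** Simplified illustration of a supplier handing out iterators that can be expired together. */
final class ExpiringIteratorSketch {
    /** Shared flag; once invalidated, every iterator tied to it becomes stale. */
    static final class IteratorStatus {
        private boolean valid = true;

        void invalidate() {
            valid = false;
        }

        boolean isValid() {
            return valid;
        }
    }

    /** Wraps an iterator so that it throws once the shared status is invalidated. */
    static <T> Iterator<T> expiringIterator(Iterator<T> inner, IteratorStatus status) {
        return new Iterator<T>() {
            @Override
            public boolean hasNext() {
                if (status.isValid()) {
                    return inner.hasNext();
                }
                throw new IllegalStateException("Stale iterator");
            }

            @Override
            public T next() {
                if (status.isValid()) {
                    return inner.next();
                }
                throw new IllegalStateException("Stale iterator");
            }
        };
    }

    /** Each call to get() returns a new iterator over the source, guarded by the same status. */
    static <T> Supplier<Iterator<T>> expiringSupplier(Iterable<T> source, IteratorStatus status) {
        return () -> expiringIterator(source.iterator(), status);
    }

    public static void main(String[] args) {
        List<String> events = Arrays.asList("e1", "e2");
        IteratorStatus status = new IteratorStatus();
        Supplier<Iterator<String>> supplier = expiringSupplier(events, status);

        // A bolt may ask for the window contents more than once within execute.
        System.out.println(supplier.get().next());
        System.out.println(supplier.get().next());

        // After the window is processed, outstanding iterators must no longer be used.
        status.invalidate();
    }
}
```

Because every iterator checks the same status object, a single `invalidate()` call is enough to expire all iterators handed out for a window, no matter how many times the bolt called `get`.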



[GitHub] storm pull request #2218: STORM-2614: Enhance stateful windowing to persist ...

2017-08-01 Thread arunmahadevan

Github user arunmahadevan commented on a diff in the pull request:

https://github.com/apache/storm/pull/2218#discussion_r130528788
  
--- Diff: storm-client/src/jvm/org/apache/storm/topology/PersistentWindowedBoltExecutor.java ---
@@ -0,0 +1,596 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.storm.topology;
+
+import java.util.AbstractCollection;
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.Collections;
+import java.util.Deque;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.Iterator;
+import java.util.LinkedList;
+import java.util.List;
+import java.util.Map;
+import java.util.NoSuchElementException;
+import java.util.Objects;
+import java.util.Optional;
+import java.util.Set;
+import java.util.concurrent.ConcurrentLinkedQueue;
+import java.util.concurrent.atomic.AtomicInteger;
+import java.util.concurrent.locks.ReentrantLock;
+import java.util.function.Supplier;
+
+import org.apache.storm.Config;
+import org.apache.storm.state.KeyValueState;
+import org.apache.storm.state.State;
+import org.apache.storm.state.StateFactory;
+import org.apache.storm.task.OutputCollector;
+import org.apache.storm.task.TopologyContext;
+import org.apache.storm.topology.base.BaseWindowedBolt;
+import org.apache.storm.tuple.Tuple;
+import org.apache.storm.windowing.DefaultEvictionContext;
+import org.apache.storm.windowing.Event;
+import org.apache.storm.windowing.EventImpl;
+import org.apache.storm.windowing.WindowLifecycleListener;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import static java.util.Collections.emptyIterator;
+import static org.apache.storm.topology.WindowPartitionCache.CacheLoader;
+import static org.apache.storm.topology.WindowPartitionCache.RemovalCause;
+import static org.apache.storm.topology.WindowPartitionCache.RemovalListener;
+
+/**
+ * Wraps a {@link IStatefulWindowedBolt} and handles the execution. Uses state and the underlying
+ * checkpointing mechanisms to save the tuples in window to state. The tuples are also kept in-memory
+ * by transparently caching the window partitions and checkpointing them as needed.
+ */
+public class PersistentWindowedBoltExecutor extends WindowedBoltExecutor implements IStatefulBolt {
+private static final Logger LOG = LoggerFactory.getLogger(PersistentWindowedBoltExecutor.class);
+private final IStatefulWindowedBolt statefulWindowedBolt;
+private transient TopologyContext topologyContext;
+private transient OutputCollector outputCollector;
+private transient WindowState state;
+private transient boolean stateInitialized;
+private transient boolean prePrepared;
+
+public PersistentWindowedBoltExecutor(IStatefulWindowedBolt bolt) {
+super(bolt);
+statefulWindowedBolt = bolt;
+}
+
+@Override
+public void prepare(Map topoConf, TopologyContext context, OutputCollector collector) {
+List registrations = (List) topoConf.getOrDefault(Config.TOPOLOGY_STATE_KRYO_REGISTER, new ArrayList<>());
+registrations.add(ConcurrentLinkedQueue.class.getName());
+registrations.add(LinkedList.class.getName());
+registrations.add(AtomicInteger.class.getName());
+registrations.add(EventImpl.class.getName());
+registrations.add(WindowPartition.class.getName());
+registrations.add(DefaultEvictionContext.class.getName());
+topoConf.put(Config.TOPOLOGY_STATE_KRYO_REGISTER, registrations);
+prepare(topoConf, context, collector,
+getWindowState(topoConf, context),
+getPartitionState(topoConf, context),
+getWindowSystemState(topoConf, context));
+}
+
+@Override
+protected void validate(Map topoConf,
+

[GitHub] storm pull request #2251: [STORM-2613] Tuples that aren't sampled shouldn't ...

2017-08-01 Thread vinodkc

GitHub user vinodkc opened a pull request:

https://github.com/apache/storm/pull/2251

[STORM-2613] Tuples that aren't sampled shouldn't be considered for latency 
calculation

Set delta to -1, so that the time delay for tuples that aren't sampled will 
not be added in BoltExecutorStats.
This is a follow-up PR to https://github.com/apache/storm/pull/2185 that fixes 
the 1.x version.
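
The convention described above (a delta of -1 marks a tuple that was not sampled) can be illustrated with a small standalone sketch. `SampledLatencyStats` is a hypothetical class written for this example and is not the actual BoltExecutorStats API; it only shows how a negative delta might be skipped so it does not distort the average.

```java
/** Hypothetical latency accumulator; not the real BoltExecutorStats. */
final class SampledLatencyStats {
    private long totalMs;
    private long count;

    /** Records a latency; a negative delta marks an unsampled tuple and is ignored. */
    void record(long deltaMs) {
        if (deltaMs < 0) {
            return;
        }
        totalMs += deltaMs;
        count++;
    }

    /** Number of sampled tuples recorded so far. */
    long sampleCount() {
        return count;
    }

    /** Average latency over sampled tuples only; 0.0 when nothing was sampled. */
    double averageMs() {
        return count == 0 ? 0.0 : (double) totalMs / count;
    }

    public static void main(String[] args) {
        SampledLatencyStats stats = new SampledLatencyStats();
        stats.record(10);
        stats.record(-1);  // unsampled tuple, skipped
        stats.record(30);
        System.out.println("samples: " + stats.sampleCount());
        System.out.println("average ms: " + stats.averageMs());
    }
}
```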

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/vinodkc/storm-1 br_fix_stat_issue_in_1.x

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/storm/pull/2251.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2251


commit fa9262d2a1d4cbeb20b25e0f36c084f20308efa1
Author: vinodkc 
Date:   2017-08-01T06:17:50Z

Tuples that aren't sampled shouldn't be considered for latency calculion




