Re: Contribution and committer guidelines

2019-01-29 Thread amol kekre
Justin,
I agree with your thoughts. Vetoes are not rare in Apex. We are trying to
figure out a way to get there.

Amol

On Tue, Jan 29, 2019 at 3:01 PM Justin Mclean 
wrote:

> Hi,
>
> If someone submits what you think is poor quality code just point it out
> to them and ask them to fix it or even better fix it yourself to show them
> what is expected. Vetoing something like that seems a little heavy-handed
> and is not the best way to encourage community growth. It’s better to
> improve the quality of others' contributions rather than blocking them from
> contributing. Vetoes in practice are very rare; how many have actually
> occurred in this project? Wouldn't it be better to focus on practical ways
> to get people involved and increase contribution rather than hypothetical
> situations of when to veto a code change?
>
> Thanks,
> Justin


Re: Contribution and committer guidelines

2019-01-28 Thread amol kekre
Vlad,
We are discussing what qualifies as "technical justification". The
proposal is also for putting a time bound on the process.

Amol

On Mon, Jan 28, 2019 at 1:36 PM Pramod Immaneni 
wrote:

> On Mon, Jan 28, 2019 at 12:45 PM Vlad Rozov  wrote:
>
> > IMO, the performance criteria are quite vague and need to be taken on a
> > case by case basis. Fixing a non-critical bug or adding minor functionality
> > is different from fixing a security issue or data loss/corruption, and while
> > the first one never justifies performance degradation, the second one may
> > justify a significant performance degradation.
> >
>
> Yes, it is only a guideline, and situations like a security issue
> would necessitate or determine the appropriate thing to do.
>
>
> >
> > My question is specific to refactoring and/or code quality. Whether this
> > policy is accepted or not, -1 in code review is still a veto.
> >
>
> So we are discussing how we can improve the situation so contributors feel
> like contributing to the project as opposed to staying away from it, which
> I think we all agree is happening. In your email thread about attic
> discussion, a high bar was cited as the reason by at least 3 members and it
> has come up in the past as well. Hence this discussion on what we can do in
> this respect. Could we relax some requirements without leading to unstable
> or unreliable software? The alternative is that nothing would change and those
> contributors will keep away and the paucity of contributions will continue.
> It is wishful thinking but if some contributors come back and start
> contributing again, others might too and who knows in future we may be able
> to go back to the high bar.
>
> Thanks
>
>
> > Thank you,
> >
> > Vlad
> >
> >
> > > On Jan 28, 2019, at 11:58, Pramod Immaneni 
> > wrote:
> > >
> > > Amol, regarding performance my thoughts were along similar lines but I was
> > > concerned about performance degradation to the real-time path, that new
> > > changes can bring in. I would use stronger language than "do not
> degrade
> > > current performance significantly" at least for the real-time path, we
> > > could say something like "real-time path should have as little performance
> > > degradation as possible". Regarding logic flaws, typically it is cut and
> > > dry and not very subjective. There are exceptions of course. Also,
> what I
> > > have seen with functionality testing, at least in this context where
> > there
> > > is no dedicated QA testing the code, is that not all code paths and
> > > combinations are exercised. Fixing logic issues in the lower level
> > > functions etc, of the code, leads to overall better quality. We could
> > have
> > > the language in the guideline such that it defaults to resolving all
> > > logical flaws but also leaves the door open for exceptions. If there
> are
> > > any scenarios you have in mind, we can discuss those and call it out as
> > > part of those exceptions.
> > >
> > > Regarding Vlad's question, I would encourage folks who brought up this
> > > point in the earlier discussion, point to examples where they
> personally
> > > faced this problem. In my case I have seen long delays in merging PRs,
> > > sometimes months, not because the reviewer(s) didn't have time but
> > because
> > > it was stuck in back and forth discussions and disagreement on one or
> > > two points between contributor and reviewer(s). In the bigger scheme of
> > > things, in my opinion, those points were trivial and caused more angst
> > > than what would have taken to correct them in the future, had we gone
> one
> > > way vs the other. I have seen this both as a contributor and as
> > co-reviewer
> > > from my peer reviewers in the PR. I can dig into the archives and find
> > > those if needed.
> > >
> > > Thanks
> > >
> > > On Mon, Jan 28, 2019 at 8:43 AM Vlad Rozov  wrote:
> > >
> > >> Is there an example from prior PRs where it was not accepted/merged
> due
> > to
> > >> a disagreement between a contributor and a committer on the amount of
> > >> refactoring or code quality?
> > >>
> > >> Thank you,
> > >>
> > >> Vlad
> > >>
> > >>> On Jan 27, 2019, at 06:56, Chinmay Kolhatkar <
> > >> chinmaykolhatka...@gmail.com> wrote:
> > >>>
> > >>> +1.
> > >>>

Re: Contribution and committer guidelines

2019-01-28 Thread amol kekre
Vlad,
This thread is in regards to the code review itself, and we are discussing
under what conditions a reviewer can give a -1. In case a reviewer ignores
the new policies and continues to give -1 using the previous thought process,
we will need to have a mechanism where the -1 is overridden. A -1 needs an
explanation that fits with what we come up with on this thread.

Amol


On Mon, Jan 28, 2019 at 12:45 PM Vlad Rozov  wrote:

> IMO, the performance criteria are quite vague and need to be taken on a
> case by case basis. Fixing a non-critical bug or adding minor functionality
> is different from fixing a security issue or data loss/corruption, and while
> the first one never justifies performance degradation, the second one may
> justify a significant performance degradation.
>
> My question is specific to refactoring and/or code quality. Whether this
> policy is accepted or not, -1 in code review is still a veto.
>
> Thank you,
>
> Vlad
>
>
> > On Jan 28, 2019, at 11:58, Pramod Immaneni 
> wrote:
> >
> > Amol, regarding performance my thoughts were along similar lines but I was
> > concerned about performance degradation to the real-time path, that new
> > changes can bring in. I would use stronger language than "do not degrade
> > current performance significantly" at least for the real-time path, we
> > could say something like "real-time path should have as little performance
> > degradation as possible". Regarding logic flaws, typically it is cut and
> > dry and not very subjective. There are exceptions of course. Also, what I
> > have seen with functionality testing, at least in this context where
> there
> > is no dedicated QA testing the code, is that not all code paths and
> > combinations are exercised. Fixing logic issues in the lower level
> > functions etc, of the code, leads to overall better quality. We could
> have
> > the language in the guideline such that it defaults to resolving all
> > logical flaws but also leaves the door open for exceptions. If there are
> > any scenarios you have in mind, we can discuss those and call it out as
> > part of those exceptions.
> >
> > Regarding Vlad's question, I would encourage folks who brought up this
> > point in the earlier discussion, point to examples where they personally
> > faced this problem. In my case I have seen long delays in merging PRs,
> > sometimes months, not because the reviewer(s) didn't have time but
> because
> > it was stuck in back and forth discussions and disagreement on one or
> > two points between contributor and reviewer(s). In the bigger scheme of
> > things, in my opinion, those points were trivial and caused more angst
> > than what would have taken to correct them in the future, had we gone one
> > way vs the other. I have seen this both as a contributor and as
> co-reviewer
> > from my peer reviewers in the PR. I can dig into the archives and find
> > those if needed.
> >
> > Thanks
> >
> > On Mon, Jan 28, 2019 at 8:43 AM Vlad Rozov  wrote:
> >
> >> Is there an example from prior PRs where it was not accepted/merged due
> to
> >> a disagreement between a contributor and a committer on the amount of
> >> refactoring or code quality?
> >>
> >> Thank you,
> >>
> >> Vlad
> >>
> >>> On Jan 27, 2019, at 06:56, Chinmay Kolhatkar <
> >> chinmaykolhatka...@gmail.com> wrote:
> >>>
> >>> +1.
> >>>
> >>> On Sat, 26 Jan 2019, 11:56 pm amol kekre  >>>
> >>>> +1 for this proposal. The only caveat I have is
> >>>> -> "acceptable performance and resolving logical flaws identified
> during
> >>>> the review process"
> >>>>
> >>>> is subjective. Functionally working should cover any logical issues.
> >>>> Performance should be applicable only to bug fixes and small
> >> enhancements
> >>>> to current features. I would word it as "do not degrade current
> >> performance
> >>>> significantly".
> >>>>
> >>>> Amol
> >>>>
> >>>>
> >>>> On Fri, Jan 25, 2019 at 9:41 PM Sanjay Pujare <
> sanjay.puj...@gmail.com>
> >>>> wrote:
> >>>>
> >>>>> +1
> >>>>>
> >>>>>
> >>>>> On Fri, Jan 25, 2019 at 5:20 PM Pramod Immaneni <
> >>>> pramod.imman...@gmail.com
> >>>>>>
> >>>>> wrote:
> >>>>>

Re: Contribution and committer guidelines

2019-01-26 Thread amol kekre
+1 for this proposal. The only caveat I have is
-> "acceptable performance and resolving logical flaws identified during
the review process"

is subjective. Functionally working should cover any logical issues.
Performance should be applicable only to bug fixes and small enhancements
to current features. I would word it as "do not degrade current performance
significantly".

Amol


On Fri, Jan 25, 2019 at 9:41 PM Sanjay Pujare 
wrote:

> +1
>
>
> On Fri, Jan 25, 2019 at 5:20 PM Pramod Immaneni  >
> wrote:
>
> > Our contributor and committer guidelines haven't changed in a while. In
> > light of the discussion that happened a few weeks ago, where
> > a high commit threshold was cited as one of the factors discouraging
> > submissions, I suggest we discuss some ideas and see if the guidelines
> > should be updated.
> >
> > I have one. We pick some reasonable time period like a month after a PR
> is
> > submitted. If the PR review process is still going on *and* there is a
> > disagreement between the contributor and reviewer, we will look to see if
> > the submission satisfies some acceptable criteria and if it does we
> accept
> > it. We can discuss what those criteria should be in this thread.
> >
> > The basics should be met, such as code format, license, copyright, unit
> > tests passing, functionality working, acceptable performance and
> resolving
> > logical flaws identified during the review process. Beyond that, if there
> > is a disagreement with code quality or refactor depth between committer
> and
> > contributor or the contributor agrees but does not want to spend more
> time
> > on it at that moment, we accept the submission and create a separate JIRA
> > to track any future work. We can revisit the policy in future once code
> > submissions have picked up and do what's appropriate at that time.
> >
> > Thanks
> >
>


Re: [DISCUSS] Time for attic?

2019-01-09 Thread amol kekre
Vlad,
Would you want to continue to be involved in the project, even if this
involvement is itself causing community folks to stay away? If the issue is
cultural, things will not improve. Doing the same thing again and expecting a
different result will not work. Why not change the policies to enforce a
different culture, and then wait 6 months to see if things change? With
regard to listing features, that needs to be something that the
contributors should decide.

Amol


On Wed, Jan 9, 2019 at 9:22 AM Vlad Rozov  wrote:

> Remember that to vote -1 it is necessary to provide justification, so I’d
> like to see the justifications and the plan from those who do not want to
> move Apex to the attic. I am also not very happy that my past efforts will
> be placed in the attic, but let’s face the reality. It is not that I don’t
> want to be involved in the project, but as the PMC I am responsible for
> maintaining the correct state of the project and with the current level of
> contributions, IMO, it belongs in the attic.
>
> Thank you,
>
> Vlad
>
> > On Jan 9, 2019, at 09:02, Pramod Immaneni 
> wrote:
> >
> > What would be the purpose of such a vote? From the discussions it is
> quite
> > apparent that there is a significant, possibly majority, view that the project
> > shouldn’t go to the attic. The same could be reported to the board, can’t it?
> > Like I also said if you or others don’t like where the project is at and
> > feel it is a dead end, you don’t have to continue to be involved with the
> > project and that’s your prerogative. Let others who want to continue take
> > it forward; why try to force your will onto everyone?
> >
> > Thanks
> >
> > On Wed, Jan 9, 2019 at 8:43 AM Vlad Rozov  wrote:
> >
> >> Without concrete details of what will be committed (support for k8s,
> >> hadoop 3.x, kafka 2.x, etc) and what requirements in code submission need
> >> to be relaxed (well written java code, consistent code style, successful
> >> build with passing unit tests in CI, providing unit test, etc) the
> >> statements below are way too vague. Note that I started this e-mail
> thread
> >> with the intention to see what contributions the community may expect.
> >> Without concrete details of the future contribution, I’ll submit a vote
> by
> >> end of January.
> >>
> >> Thank you,
> >>
> >> Vlad
> >>
> >>> On Jan 9, 2019, at 00:47, priyanka gugale  wrote:
> >>>
> >>> I do believe and know of some work done in private forks by people. There
> >>> could be a couple of reasons why it didn't go public. One could be the high
> >>> bar for code submission (I don't have references at hand but that's the
> >>> general feeling amongst committers) and the other could be lack of motivation.
> >>>
> >>> Let's try to put in some effort to revive the work, motivate committers,
> >>> and take hard decisions later if nothing works. A product like Apex /
> >>> Malhar definitely deserves to survive.
> >>>
> >>> -Priyanka
> >>>
> >>> On Wed, Jan 9, 2019 at 12:07 PM Atri Sharma  wrote:
> >>>
>  The reason for a private fork was due to potential IP conflicts with
>  my current organization. I am working to get approvals and clearances,
>  and post that, shall publish the said effort.
> 
>  On Wed, Jan 9, 2019 at 12:02 PM Justin Mclean <
> jus...@classsoftware.com
> >>>
>  wrote:
> >
> > Hi,
> >
> >> I have a private fork for an experimental project. It might be open
> >> sourced in a couple of months.
> >
> > I’m curious, if you don’t mind answering a couple of questions:
> >
> > As you are a committer on this project is there any reason that this
>  work wasn’t done in public fork or even better on a branch of the Apex
>  repo? Why would a delay of a couple of months be required? If it’s “it
>  might be” what realistically are the chances of that happening?
> >
> > Thanks,
> > Justin
> 
>  --
>  Regards,
> 
>  Atri
>  Apache Concerted
> 
> >>
> >> --
> > Thanks,
> > Pramod
> > http://ts.la/pramod3443
>
>


Re: [DISCUSS] Time for attic?

2019-01-08 Thread amol kekre
In the past we have made it difficult for code to be committed. Declining
contributions are a normal outcome of that. We should take a look at relaxing
the threshold for commits and welcome more folks, including into the PMC,
before looking at taking the project to the attic.

Amol


On Mon, Jan 7, 2019 at 8:24 PM Pramod Immaneni 
wrote:

> I would like to point out that there is no agreement among the project PMCs
> that the project should go to attic, not many are for it and quite a few
> oppose it. If a PMC member or committer cannot or does not wish to
> participate at any time, they can choose to remain inactive, there is no
> obligation. If someone feels strongly that they rather not remain inactive
> while being part of the project they can choose to resign, although that
> would be largely regrettable. End of January sounds reasonable for the
> submission, although its outcome should not determine the future of the project.
>
> Thanks
>
> On Mon, Jan 7, 2019 at 6:09 PM Vlad Rozov  wrote:
>
> > Let’s agree on ETA (by end of January?) for the contribution. Also, one
> > contribution in several months is not sufficient to keep the project active,
> > IMO. Anybody else, especially other PMCs?
> >
> > Thank you,
> >
> > Vlad
> >
> >
> > > On Jan 7, 2019, at 12:06, Pramod Immaneni 
> > wrote:
> > >
> > > Yes, I have an operator that I am trying to get clearance on before
> > > submitting. Will likely need a maven server to host a dependency that's
> > not
> > > on central.
> > >
> > > Thanks
> > >
> > > On Mon, Jan 7, 2019 at 9:08 AM Vlad Rozov  wrote:
> > >
> > >> Does anyone plan to contribute to the project in the near future?
> > >> Otherwise, I plan to submit a vote to move the project to Apache
> attic.
> > >>
> > >> Thank you,
> > >>
> > >> Vlad
> > >
> > >
> > >
> > > --
> > > Thanks,
> > > Pramod
> > > http://ts.la/pramod3443
> >
> > --
> Thanks,
> Pramod
> http://ts.la/pramod3443
>


Re: [ANNOUNCE] New Apache Apex PMC Member: Chinmay Kolhatkar

2018-05-24 Thread amol kekre
Congrats Chinmay

Amol

On Thu, May 24, 2018 at 10:34 AM, Hitesh Kapoor 
wrote:

> Congratulations Chinmay!!
>
> Regards,
> Hitesh Kapoor
>
> On Thu 24 May, 2018, 11:02 PM Ilya Ganelin,  wrote:
>
> > Congrats!
> >
> > On Thu, May 24, 2018, 10:09 AM Pramod Immaneni 
> wrote:
> >
> > > Congratulations Chinmay.
> > >
> > > On Thu, May 24, 2018 at 9:39 AM Thomas Weise  wrote:
> > >
> > > > The Apache Apex PMC is pleased to announce that Chinmay Kolhatkar is
> > now
> > > a
> > > > PMC member.
> > > >
> > > > Chinmay has contributed to Apex in many ways, including:
> > > >
> > > > - Various transform operators in Malhar
> > > > - SQL translation based on Calcite
> > > > - Apache Bigtop integration
> > > > - Docker sandbox
> > > > - Blogs and conference presentations
> > > >
> > > > We appreciate all his contributions to the project so far, and are
> > > looking
> > > > forward to more.
> > > >
> > > > Congrats!
> > > > Thomas, for the Apache Apex PMC.
> > > >
> > >
> >
>


Re: Core release 3.7.0

2018-01-17 Thread Amol Kekre
+1

Thks,
Amol


E:a...@datatorrent.com | M: 510-449-2606 | Twitter: @*amolhkekre*

www.datatorrent.com


On Wed, Jan 17, 2018 at 1:55 PM, Pramod Immaneni 
wrote:

> +1
>
> > On Jan 17, 2018, at 7:25 AM, Thomas Weise  wrote:
> >
> > Last release was 3.6.0 in May and following issues are ready for release:
> >
> > https://issues.apache.org/jira/issues/?jql=fixVersion%
> 20%3D%203.7.0%20AND%20project%20%3D%20APEXCORE%20ORDER%20BY%20status%20ASC
> >
> > Any opinions on cutting a release?
> >
> > Any committer interested running the release?
> >
> > Thanks,
> > Thomas
>
>


Re: Malhar release 3.8.0

2017-10-25 Thread Amol Kekre
+1 on a new malhar release.

Thks,
Amol




On Tue, Oct 24, 2017 at 9:12 PM, Tushar Gosavi 
wrote:

> +1 on creating a new malhar release.
>
> - Tushar.
>
>
> On Wed, Oct 25, 2017 at 4:39 AM, Pramod Immaneni 
> wrote:
>
> > +1 on creating a new release. I, unfortunately, do not have the time
> > currently to participate in the release activities.
> >
> > On Mon, Oct 23, 2017 at 7:15 PM, Thomas Weise  wrote:
> >
> > > The last release was back in March, there are quite a few JIRAs that
> have
> > > been completed since and should be released.
> > >
> > > https://issues.apache.org/jira/issues/?jql=fixVersion%
> > > 20%3D%203.8.0%20AND%20project%20%3D%20APEXMALHAR%20ORDER%
> > > 20BY%20status%20ASC
> > >
> > > From looking at the list there is nothing that should stand in the way
> > of a
> > > release?
> > >
> > > Also, once the release is out it would be a good opportunity to effect
> > the
> > > major version change.
> > >
> > > Anyone interested to be the release manager?
> > >
> > > Thanks,
> > > Thomas
> > >
> >
>


Re: [DISCUSS] inactive PR

2017-09-27 Thread Amol Kekre
Vlad,
I am +1. Do proceed. I am not sure what the process is, i.e., wait a day or
so to get folks to give a final opinion, or just proceed. Either way, your
call.

Thks
Amol





On Wed, Sep 27, 2017 at 8:05 PM, Pramod Immaneni 
wrote:

> It should be ok in my opinion to close the currently open inactive PRs that
> fall into that category once we have the guidelines updated.
>
> On Wed, Sep 27, 2017 at 9:52 AM, Vlad Rozov  wrote:
>
> > Based on the discussion I'll update contributor/committer guidelines to
> >
> > 1. ask a contributor to close PR when (s)he is not ready to work on it in
> > a timely manner
> > 2. allow committers to close inactive PRs after 2 months of inactivity
> >
> > Any objections to closing existing (currently open) PRs that have been
> > inactive for 2 months?
> >
> > Thank you,
> >
> > Vlad
> >
> >
> > On 9/24/17 21:19, Vlad Rozov wrote:
> >
> >> Assuming that a contributor tries to open new PR using the same remote
> >> branch as the original PR instead of re-opening closed PR, github
> provides
> >> a notification reminding that one already exists, so I don't see why
> people
> >> will generally miss the old PR.
> >>
> >> The only case where a closed PR can't be re-opened is when the original
> >> (remote) branch was deleted and re-created or after a forced push to the
> >> original remote branch (that github can't distinguish from deleted and
> >> re-created branch). Would you agree that a forced push for a PR that was
> >> inactive for a significant period of time should be avoided as it will
> be
> >> impossible for reviewers to recollect comments without ability to see
> the
> >> old patch they (comments) apply to?
> >>
> >> Thank you,
> >>
> >> Vlad
> >>
> >> On 9/24/17 15:27, Pramod Immaneni wrote:
> >>
> >>> If PR is open, the previous comments are available in the same context
> >>> as new discussions. There is no need to remember to go back to a
> previous
> >>> closed PR to figure out what was discussed or what is outstanding.
> People
> >>> will generally miss the old PR and will either not reopen it or will go
> >>> through it, so it's possible previous reviewers' concerns would be lost.
> >>> Also, I don’t think three months is an unreasonable time to leave PRs
> >>> open; two could work.
> >>>
> >>> On Sep 24, 2017, at 2:56 PM, Vlad Rozov  wrote:
> 
>  If a PR is closed due to inactivity and a contributor fails to
> remember
>  that he/she opened a PR in the past, what is the chance that a
> committer can
>  recollect what was discussed on a PR (whether it stays open or is
> closed)
>  that was inactive for 2-3 month :)? IMO, we should try to optimize
> process
>  for good community members (those who follow contributor guidelines)
> and
>  not those who do not follow.
> 
>  Thank you,
> 
>  Vlad
> 
>  On 9/24/17 09:29, Pramod Immaneni wrote:
> 
> > On Sep 24, 2017, at 9:21 AM, Thomas Weise  wrote:
> >>
> >> On Sun, Sep 24, 2017 at 9:08 AM, Pramod Immaneni <
> >> pra...@datatorrent.com >
> >> wrote:
> >>
> >> On Sep 24, 2017, at 8:28 AM, Thomas Weise  wrote:
> 
>  +1 for closing inactive PRs after documented period of inactivity
>  (contributor guidelines)
> 
>  There is nothing "draconian" or negative about closing a PR, it
> is a
>  function that github provides that should be used to improve
> 
> >>> collaboration.
> >>>
>  PR is a review tool, it is not good to have stale or abandoned PRs
> 
> >>> sitting
> >>>
>  as open. When there is no activity on a PR and it is waiting for
>  action
> 
> >>> by
> >>>
>  the contributor (not ready for review), it should be closed and
> then
>  re-opened once the contributor was able to move it forward and it
>  becomes
>  ready for review.
> 
>  Thomas
> 
> >>> Please refer to my email again, I am not against closing PR if
> there
> >>> is
> >>> inactivity. My issue is with the time period. In reality, most
> >>> people will
> >>> create new PRs instead of reopening old ones and the old
> >>> context/comments
> >>> will be forgotten and not addressed.
> >>>
> >>>
> >>> Why will contributors open new PRs even in cases where changes are
> >> requested on an open PR? Because it is not documented or reviewers
> >> don't
> >> encourage the proper process? We should solve that problem.
> >>
> > In cases where PR was closed due to inactivity and the contributor
> > comes back later to work on it, they are likely going to create a
> new PR as
> > opposed to finding the closed one and reopening it. The guidelines
> 

Re: [DISCUSS] inactive PR

2017-09-24 Thread Amol Kekre
I am +1 on closing inactive PRs. Time-wise, 1 month looks short and 3 months
looks long to me. In either case I do not have a strong opinion on the time
side, so I am +0 on either.

Thks,
Amol





On Sun, Sep 24, 2017 at 3:27 PM, Pramod Immaneni 
wrote:

> If PR is open, the previous comments are available in the same context as
> new discussions. There is no need to remember to go back to a previous
> closed PR to figure out what was discussed or what is outstanding. People
> will generally miss the old PR and will either not reopen it or will go
> through it, so it's possible previous reviewers' concerns would be lost.
> Also, I don’t think three months is an unreasonable time to leave PRs open;
> two could work.
>
> > On Sep 24, 2017, at 2:56 PM, Vlad Rozov  wrote:
> >
> > If a PR is closed due to inactivity and a contributor fails to remember
> that he/she opened a PR in the past, what is the chance that a committer can
> recollect what was discussed on a PR (whether it stays open or is closed)
> that was inactive for 2-3 month :)? IMO, we should try to optimize process
> for good community members (those who follow contributor guidelines) and
> not those who do not follow.
> >
> > Thank you,
> >
> > Vlad
> >
> > On 9/24/17 09:29, Pramod Immaneni wrote:
> >>> On Sep 24, 2017, at 9:21 AM, Thomas Weise  wrote:
> >>>
> >>> On Sun, Sep 24, 2017 at 9:08 AM, Pramod Immaneni <
> pra...@datatorrent.com >
> >>> wrote:
> >>>
> > On Sep 24, 2017, at 8:28 AM, Thomas Weise  wrote:
> >
> > +1 for closing inactive PRs after documented period of inactivity
> > (contributor guidelines)
> >
> > There is nothing "draconian" or negative about closing a PR, it is a
> > function that github provides that should be used to improve
>  collaboration.
> > PR is a review tool, it is not good to have stale or abandoned PRs
>  sitting
> > as open. When there is no activity on a PR and it is waiting for
> action
>  by
> > the contributor (not ready for review), it should be closed and then
> > re-opened once the contributor was able to move it forward and it
> becomes
> > ready for review.
> >
> > Thomas
>  Please refer to my email again, I am not against closing PR if there
> is
>  inactivity. My issue is with the time period. In reality, most people
> will
>  create new PRs instead of reopening old ones and the old
> context/comments
>  will be forgotten and not addressed.
> 
> 
> >>> Why will contributors open new PRs even in cases where changes are
> >>> requested on an open PR? Because it is not documented or reviewers
> don't
> >>> encourage the proper process? We should solve that problem.
> >> In cases where PR was closed due to inactivity and the contributor
> comes back later to work on it, they are likely going to create a new PR as
> opposed to finding the closed one and reopening it. The guidelines can
> include proper process but most likely this is one of those things that
> will require checking on the committers part.
> >>
> >>
> >
>
>


Re: checking dependencies for known vulnerabilities

2017-09-08 Thread Amol Kekre
Vlad,
Assuming you are in agreement that vulnerabilities should not be shown in a
public way, how would failing the build help? The reasons for failure will
have to be noted in public to be worked on. Anyway, IMO Apex may be better off
exposing CVEs, as we are better off knowing these. But if folks want the
details suppressed, I am fine with it.

The more important part is to amortize the cost of fixing CVEs in current
dependencies over time, as you pointed out, by gradually lowering the
severity level.
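[Editor's illustration] The gradual severity-threshold idea above can be sketched as a small build-gate script. This is purely hypothetical and not part of any existing Apex tooling; the report format, dependency names, dates, and threshold schedule are all invented for illustration:

```python
from datetime import date

# Hypothetical schedule: start by failing the build only for critical
# CVEs, then ratchet the threshold down over time so the cost of
# fixing existing vulnerable dependencies is amortized.
THRESHOLD_SCHEDULE = [
    (date(2017, 10, 1), 9.0),   # initially: only critical (CVSS >= 9.0)
    (date(2018, 1, 1), 7.0),    # later: high severity and above
    (date(2018, 4, 1), 4.0),    # eventually: medium severity and above
]

def current_threshold(today):
    """Return the CVSS score at or above which the build should fail."""
    threshold = float("inf")  # before the schedule starts, never fail
    for start, value in THRESHOLD_SCHEDULE:
        if today >= start:
            threshold = value
    return threshold

def check(findings, today):
    """findings: list of (dependency, cvss_score) pairs from a scan report.

    Returns the findings that should fail the build today."""
    limit = current_threshold(today)
    return [(dep, score) for dep, score in findings if score >= limit]

# Example report with made-up dependency names and scores:
report = [("libfoo-1.2", 9.8), ("libbar-3.0", 5.3)]
print(check(report, date(2017, 11, 1)))  # only the critical finding fails
print(check(report, date(2018, 5, 1)))   # both fail at the lower threshold
```

Run manually (e.g. during a release, per the discussion below), this keeps the gate out of the public CI logs while still ratcheting up the pressure to fix known CVEs.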

Thks,
Amol





On Fri, Sep 8, 2017 at 3:21 PM, Pramod Immaneni 
wrote:

> Manually during a release is fine, but we do need to come up with a process
> to shorten the cycle it will take to address these. Maybe we can come up
> with some guidelines on how to identify when something does not affect the
> software; for example, if a vulnerability comes into the picture when a
> particular library is used in some way and we don't use it that way. It
> could serve as an initial filter, and those that make it out of that will
> need deeper analysis to figure out whether they are real issues.
>
> Thanks
>
> On Fri, Sep 8, 2017 at 3:10 PM, Thomas Weise  wrote:
>
> > On Fri, Sep 8, 2017 at 2:33 PM, Pramod Immaneni 
> > wrote:
> >
> > > Second and more importantly, the vulnerabilities cannot be
> > > reported in a public way, which integrating with the open build
> > > systems will do.
> >
> >
> > How about implementing it so that it can be run manually, for example as
> > part of a release?
> >
> > False alarms are a problem, but ultimately relevant vulnerabilities will
> > need to be identified and fixed. It's part of project maintenance (like CI
> > and other items), which cannot be neglected.
> >
>


Re: following committer guideline when merging PR

2017-09-08 Thread Amol Kekre
Makes sense. +1

Thks
Amol




On Fri, Sep 8, 2017 at 9:16 AM, Vlad Rozov  wrote:

> Committers,
>
> Please make sure to follow the Apex community guidelines when merging PRs:
> http://apex.apache.org/contributing.html.
>
> 1. Ensure that basic requirements for a pull request are met. This
>    includes:
>      * Sufficient time has passed for others to review
>      * PR was sufficiently reviewed and comments were addressed.
>        See the voting policy.
>      * When there are multiple reviewers, wait till other reviewers
>        approve, with a timeout of 48 hours before merging
>      * If the PR was open for a long time, email dev@ declaring intent
>        to merge
>      * Commit messages and PR title need to reference JIRA (pull
>        requests will be linked to the ticket)
>      * Travis CI and Jenkins pull request builds need to pass
>      * Ensure tests are added/modified for new features or fixes
>      * Ensure appropriate JavaDoc comments have been added
>      * Verify contributions don't depend on incompatible licences
>        (see https://www.apache.org/legal/resolved.html#category-x)
> 2. Use the github "rebase and merge" option or the git command line to
>    merge the pull request (see "view command line options" on the PR).
> 3. Update JIRA after pushing the changes. Set the "Fix version" field
>    and resolve the JIRA with proper resolution. Also verify that other
>    fields (type, priority, assignee) are correct.
>
>
> A couple of recent PR merges (#661, #669) to apex-malhar require a second
> look from the committers.
>
> Thank you,
>
> Vlad
>


Re: [DISCUSS] Major version change for Apex Library (Malhar)

2017-09-01 Thread Amol Kekre
This vote was not done per process. The discussion was still ongoing. A
decision that is really about code impact (requiring consensus) is being
called a procedural decision (majority vote). Moreover, the end day/time of
the vote was not announced ahead of the vote, so there was no way to know
when the vote ends. This all seems premised on the idea that only a few
people care about the project. All red flags.

Thks,
Amol


E:a...@datatorrent.com | M: 510-449-2606 | Twitter: @*amolhkekre*

www.datatorrent.com


On Fri, Sep 1, 2017 at 5:49 PM, Thomas Weise <t...@apache.org> wrote:

> The first step in allowing a real community to grow would be to wear the
> project hat, participate in discussions as individual, and consider how to
> enable changes vs. trying to block active community members that contribute
> on their own time from taking the project forward.
>
> Versioning and parallel release lines exist for a reason. Nothing needs to
> be reinvented, everything that is needed to not disrupt existing users
> while allowing for changes that evolve a product already exists.
>
> A number of folks don't wear the project hat, don't contribute in a
> constructive manner and are otherwise not actively visible in the project.
> Do they participate in this discussion out of their own interest or because
> they are paid to do so? That combined with a look at the contributor stats
> should provide a fairly good orientation.
>
> This discussion here is about making changes and evolve the project, not to
> organize a DataTorrent & friends blockade. Put your own effort, research,
> follow a discussion thread, present your own opinion. Separately, it will
> be necessary to take up the topic of project independence at the PMC level.
>
> Thomas
>
>
> On Fri, Sep 1, 2017 at 3:14 PM, Sandeep Deshmukh <
> sandeep.deshm...@gmail.com
> > wrote:
>
> > I totally agree with Sandesh. Things are being pushed when there is clear
> > disagreement. If Apex has to grow the community, it can't grow using a
> > divide-and-conquer method.
> >
> > On Fri, Sep 1, 2017 at 10:45 PM, Sandesh Hegde <sand...@datatorrent.com>
> > wrote:
> >
> > > Using all the technicalities and loopholes, we can declare many votes
> > > invalid. What purpose does it solve? This thread is dividing the
> > > community. Instead of recognizing the differences, if we move forward
> > > with this, there is a chance that Apex will alienate many
> > > contributors. What's the end game here? At what cost?
> > >
> > > On Fri, Sep 1, 2017 at 9:31 AM Thomas Weise <t...@apache.org> wrote:
> > >
> > > > Yes, you would need a separate discussion/vote on changes not being
> > > > reflected in master that you make to a branch (current procedure).
> > > >
> > > > Regarding procedural vote, the decision to start development towards
> > new
> > > > major release is a longer term decision, not just code change.
> > > >
> > > > https://www.apache.org/foundation/glossary.html#MajorityApproval
> > > >
> > > > "Refers to a vote (sense 1) which has completed with at least three
> > > binding
> > > > +1 votes and more +1 votes than -1 votes. ( I.e. , a simple majority
> > > with a
> > > > minimum quorum of three positive votes.) Note that in votes requiring
> > > > majority approval a -1 vote is simply a vote against, not a veto.
> > Compare
> > > > Consensus Approval. See also the description of the voting process."
> > > >
> > > >
> > > > For code modifications the rules are different, -1 is a veto that
> needs
> > > to
> > > > have a valid technical reason why the change cannot be made.
> Otherwise
> > it
> > > > is void. None of the -1s in the vote result provide such
> justification.
> > > >
> > > > Thanks,
> > > > Thomas
> > > >
> > > >
> > > >
> > > > On Thu, Aug 31, 2017 at 10:06 PM, Pramod Immaneni <
> > > pra...@datatorrent.com>
> > > > wrote:
> > > >
> > > > > Thomas,
> > > > >
> > > > > Wouldn't you need to call a separate procedural vote for whether
> > > changes
> > > > > cannot be allowed into 3.x without requiring they be submitted to
> 4.x
> > > as
> > > > > there was a disagreement there? Also, I am not sure that the
> > procedural
> > > > > vote argument can be used here for 4.x given that it involves
> > > > modifications
> > > > > to exi

Re: [DISCUSS] changing project name to apex-library

2017-08-25 Thread Amol Kekre
Vlad,
Concerns have not been addressed. There is a disconnect on the need to do
this now, and then on how to do so.

Thks,
Amol



E:a...@datatorrent.com | M: 510-449-2606 | Twitter: @*amolhkekre*

www.datatorrent.com


On Fri, Aug 25, 2017 at 9:01 AM, Vlad Rozov <vro...@apache.org> wrote:

> How do we move from here? If all the concerns regarding version and
> artifactId change are addressed we should move forward with the vote, if
> not, it will be good to raise them here rather than in the voting thread.
>
> Thank you,
>
> Vlad
>
>
> On 8/24/17 10:26, Thomas Weise wrote:
>
>> On Thu, Aug 24, 2017 at 9:42 AM, Amol Kekre <a...@datatorrent.com> wrote:
>>
>>> In terms of rebasing versions, there is no urgency in mimicking some of
>>> the other projects. Apex has already been versioned.
>>>
>>
>> There is an expectation users have for a version number, which is
>> different
>> for 3.x or 1.x or 0.x. Apex library maturity is nowhere near 3.x. That was
>> already discussed.
>>
>> What functional gain do
>>
>>> we have by changing versions, names? Functionality wise Apex users do not
>>> gain anything. With regards to bumping to 4.X, we should wait for a
>>> proposal/plan for a new functional api.
>>>
>> Addition of such an API does not require a major version change. New
>> APIs have been added and no major version change was done. Major version
>> change is for backward incompatible changes.
>>
>> Examples:
>> - rename packages
>> - remove deprecated code
>> - relocate operators that were not designed for production use
>> - change to functionality of operators
>>
>> There is an illusion of backward compatibility (which does not exist
>> today). That cannot be used as justification to not make changes.
>>
>>
>> On Wed, Aug 23, 2017 at 10:26 AM, Vlad Rozov <vro...@apache.org> wrote:
>>>
>>> Please see my comments in-line.
>>>>
>>>> Thank you,
>>>>
>>>> Vlad
>>>>
>>>> On 8/23/17 09:11, Pramod Immaneni wrote:
>>>>
>>>> That is not accurate, I have mentioned and probably others as well that
>>>>> changing the name of the project would be disruptive to users. Users
>>>>> are
>>>>> used to using the malhar project and its artifacts a certain way and
>>>>>
>>>> this
>>>
>>>> would cause them immediate confusion followed by consternation and then
>>>>> changes that could extend beyond their application such as
>>>>> documentation
>>>>> etc.
>>>>>
>>>>> Changing the name is as disruptive to users as changing minor/patch
>>>> version. I don't see a big difference in changing one line in pom.xml
>>>> (version) vs changing 2 lines (version and artifact). There is a bigger
>>>> change/disruption that does IMO require major version change and
>>>> renaming
>>>> project to use the single brand (Apache Apex) at the same time is
>>>> beneficial both to the project and users. Changing package and major
>>>> version will impact documentation in much bigger way compared to
>>>> changing
>>>> artifactId.
>>>>
>>>> Second the project has been around for quite some time and has reached a
>>>>> version 3.x, the second part of the proposed change is to reset it back
>>>>>
>>>> to
>>>
>>>> 1.0-SNAPSHOT. I don't think that is accurate for the project and the
>>>>> maturity it would portray to the users. Not to get subjective but there
>>>>> are
>>>>> operators in malhar that are best of the breed when it comes to
>>>>>
>>>> streaming
>>>
>>>> functionality they achieve.
>>>>>
>>>>> There are many Apache projects that were around much longer than malhar
>>>> and have not yet reached 3.x version even though they are also used in
>>>> production and are considered more stable. Number of evolving packages
>>>>
>>> and
>>>
>>>> interfaces in malhar do not qualify it for 3.x or 4.x. IMO, version must
>>>>
>>> be
>>>
>>>> driven by the engineering/community, not by the marketing.
>>>>
>>>> Third think about all the changes it would need, code, project
>>>>> infrastructure such as github repo and jira project, documentation,
>>>

Re: [DISCUSS] changing project name to apex-library

2017-08-24 Thread Amol Kekre
In terms of rebasing versions, there is no urgency in mimicking some of the
other projects. Apex has already been versioned. What functional gain do
we have by changing versions, names? Functionality wise Apex users do not
gain anything. With regards to bumping to 4.X, we should wait for a
proposal/plan for a new functional api.

Thks,
Amol


E:a...@datatorrent.com | M: 510-449-2606 | Twitter: @*amolhkekre*

www.datatorrent.com


On Wed, Aug 23, 2017 at 10:26 AM, Vlad Rozov  wrote:

> Please see my comments in-line.
>
> Thank you,
>
> Vlad
>
> On 8/23/17 09:11, Pramod Immaneni wrote:
>
>> That is not accurate, I have mentioned and probably others as well that
>> changing the name of the project would be disruptive to users. Users are
>> used to using the malhar project and its artifacts a certain way and this
>> would cause them immediate confusion followed by consternation and then
>> changes that could extend beyond their application such as documentation
>> etc.
>>
> Changing the name is as disruptive to users as changing minor/patch
> version. I don't see a big difference in changing one line in pom.xml
> (version) vs changing 2 lines (version and artifact). There is a bigger
> change/disruption that does IMO require major version change and renaming
> project to use the single brand (Apache Apex) at the same time is
> beneficial both to the project and users. Changing package and major
> version will impact documentation in much bigger way compared to changing
> artifactId.
>
>>
>> Second the project has been around for quite some time and has reached a
>> version 3.x, the second part of the proposed change is to reset it back to
>> 1.0-SNAPSHOT. I don't think that is accurate for the project and the
>> maturity it would portray to the users. Not to get subjective but there
>> are
>> operators in malhar that are best of the breed when it comes to streaming
>> functionality they achieve.
>>
> There are many Apache projects that were around much longer than malhar
> and have not yet reached 3.x version even though they are also used in
> production and are considered more stable. Number of evolving packages and
> interfaces in malhar do not qualify it for 3.x or 4.x. IMO, version must be
> driven by the engineering/community, not by the marketing.
>
>>
>> Third think about all the changes it would need, code, project
>> infrastructure such as github repo and jira project, documentation,
>> website
>> etc and the time all the developers have to spend to adapt to this.
>> Wouldn't we want to spend this time doing something more productive.
>>
> I don't think it is as drastic as it looks to be. It was done in a past
> and is supported by all tools involved.
>
>>
>> I would think changing a project name and resetting the version is a big
>> deal and should be done if there something big to gain for the project by
>> doing this. What is the big gain we achieve to justify all this
>> consternation? If we want to increase adoption, one of the things we need
>> to do is to provide users with a platform that behaves in an expected and
>> stable manner.
>>
> It will be good to provide details why is it "a big deal". Why changing
> groupId was not a big deal and changing artifactId is a big deal?
>
> I completely agree with the increasing adoption, but it comes from the
> quality, not from the quantity and whether version is 1.x, 3.x or 4.x does
> not change the quality of the library.
>
>>
>> Thanks
>>
>>
>> On Wed, Aug 23, 2017 at 8:09 AM Vlad Rozov  wrote:
>>
>>> All -1s are technically void at this point, as the justifications given
>>> are why the project may continue without modifications, not why the
>>> modifications must not be done. Whether we proceed with the vote or with
>>> the discussion, arguments should be about the pros and cons of a code
>>> change, not that the project may continue without them. The same should
>>> apply not only to the current set of changes, but to all future
>>> discussions.
>>>
>>> Thank you,
>>>
>>> Vlad
>>>
>>> On 8/23/17 06:54, Thomas Weise wrote:
>>>
>>>> The discussion already took place [1]. There are two options under vote
>>>> out of that discussion and for the first option there is a single -1.
>>>> Use of -1 during voting (and veto on PR) when not showing up during the
>>>> preceding discussion is problematic.
>>>>
>>>> Thomas
>>>>
>>>> [1] https://lists.apache.org/thread.html/bd1db8a2d01e23b0c0ab98a785f6ee
>>>> 9492a1ac9e52d422568a46e5f3@%3Cdev.apex.apache.org%3E
>>>>
>>>> On Wed, Aug 23, 2017 at 1:58 AM, Justin Mclean <
>>>> jus...@classsoftware.com> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> Votes are only valid on code modifications with a reason. [1]
>>>>>
>>>>> However, it looks to me that there’s no consensus on which way forward
>>>>> is best; I would suggest cancelling the vote and having a discussion
>>>>> of the benefit or not of making the change.
>>>>>
>>>>> Thanks,
>>>>> Justin
>

Re: -1 or veto voting

2017-08-23 Thread Amol Kekre
Thomas,
My worry is that the consequences of the main branch being 4.x have not
been discussed in detail. How about we take that up on the discussion
thread? I can volunteer to put 4.x to a vote after that discussion.

Thks,
Amol


E:a...@datatorrent.com | M: 510-449-2606 | Twitter: @*amolhkekre*

www.datatorrent.com


On Wed, Aug 23, 2017 at 7:03 AM, Thomas Weise <t...@apache.org> wrote:

> The earlier discussion had concerns about making changes in 3.x and the
> expressed preference was major version change. Accordingly the vote is for
> major version change.
>
>
> On Wed, Aug 23, 2017 at 6:56 AM, Amol Kekre <a...@datatorrent.com> wrote:
>
> > The earlier discussion had concerns about this vote and the need to brand
> > to 4.x right now. IMO they were not sufficiently addressed.
> >
> > Thks
> > Amol
> >
> >
> > E:a...@datatorrent.com | M: 510-449-2606 | Twitter: @*amolhkekre*
> >
> > www.datatorrent.com
> >
> >
> > On Wed, Aug 23, 2017 at 6:54 AM, Thomas Weise <t...@apache.org> wrote:
> >
> > > The discussion already took place [1]. There are two options under vote
> > out
> > > of that discussion and for the first option there is a single -1. Use
> of
> > -1
> > > during voting (and veto on PR) when not showing up during the preceding
> > > discussion is problematic.
> > >
> > > Thomas
> > >
> > > [1] https://lists.apache.org/thread.html/
> bd1db8a2d01e23b0c0ab98a785f6ee
> > > 9492a1ac9e52d422568a46e5f3@%3Cdev.apex.apache.org%3E
> > >
> > > On Wed, Aug 23, 2017 at 1:58 AM, Justin Mclean <
> jus...@classsoftware.com
> > >
> > > wrote:
> > >
> > > > Hi,
> > > >
> > > > Votes are only valid on code modifications with a reason. [1]
> > > >
> > > > However, it looks to me that there’s no consensus on which way
> > > > forward is best; I would suggest cancelling the vote and having a
> > > > discussion of the benefit or not of making the change.
> > > >
> > > > Thanks,
> > > > Justin
> > > >
> > > > 1. https://www.apache.org/foundation/voting.html
> > >
> >
>


Re: -1 or veto voting

2017-08-23 Thread Amol Kekre
The earlier discussion had concerns about this vote and the need to brand
to 4.x right now. IMO they were not sufficiently addressed.

Thks
Amol


E:a...@datatorrent.com | M: 510-449-2606 | Twitter: @*amolhkekre*

www.datatorrent.com


On Wed, Aug 23, 2017 at 6:54 AM, Thomas Weise  wrote:

> The discussion already took place [1]. There are two options under vote out
> of that discussion and for the first option there is a single -1. Use of -1
> during voting (and veto on PR) when not showing up during the preceding
> discussion is problematic.
>
> Thomas
>
> [1] https://lists.apache.org/thread.html/bd1db8a2d01e23b0c0ab98a785f6ee
> 9492a1ac9e52d422568a46e5f3@%3Cdev.apex.apache.org%3E
>
> On Wed, Aug 23, 2017 at 1:58 AM, Justin Mclean 
> wrote:
>
> > Hi,
> >
> > Votes are only valid on code modifications with a reason. [1]
> >
> > However, it looks to me that there’s no consensus on which way forward
> > is best; I would suggest cancelling the vote and having a discussion of
> > the benefit or not of making the change.
> >
> > Thanks,
> > Justin
> >
> > 1. https://www.apache.org/foundation/voting.html
>


Re: [VOTE] Major version change for Apex Library (Malhar)

2017-08-22 Thread Amol Kekre
On just the voting part, I remain -1 on both options.

Thks
Amol


E:a...@datatorrent.com | M: 510-449-2606 | Twitter: @*amolhkekre*

www.datatorrent.com


On Tue, Aug 22, 2017 at 4:35 PM, Pramod Immaneni 
wrote:

> I think we should take this discussion to a separate thread as it is a vote
> thread. I don't see a need for this change now as there isn't enough
> justification (such as things are falling apart without this) for the
> disruption it will cause. My earlier point is that there was a
> justification when the project started to change the groupid and it is not
> the same now.
>
> Thanks
>
> On Tue, Aug 22, 2017 at 2:45 PM, Vlad Rozov  wrote:
>
> > Do you mean that prior to groupId change nobody was using that groupId or
> > that nobody was using the library itself :)? If nobody was using the
> > library, the version 3.x at the beginning of the project is questionable.
> >
> > My question is why -1 (veto) as long as things won't fall apart either
> way.
> >
> > Thank you,
> >
> > Vlad
> >
> >
> > On 8/22/17 14:09, Pramod Immaneni wrote:
> >
> >> The groupId change was done at the beginning of the project about two
> >> years
> >> ago before there was an apex release for anyone to use.
> >>
> >> On Tue, Aug 22, 2017 at 1:39 PM, Vlad Rozov  wrote:
> >>
> >> I would argue that things won't fall apart in both cases whether
> >>> artifactId and version are changed or not, so I don't see why it is -1
> >>> for
> >>> the option 2. When groupId was changed from com.datatorrent to
> >>> org.apache.apex, things have not fall apart :).
> >>>
> >>> Thank you,
> >>>
> >>> Vlad
> >>>
> >>>
> >>> On 8/22/17 08:31, Pramod Immaneni wrote:
> >>>
> >>> +1 for option 1
>  -1 for option 2 as I see no impending need to do this now, as in if we
>  don't do this, things will fall apart. It will be a source of more
>  disruption and confusion. Malhar has been around for quite some time,
>  evolving and growing during this period, and going to version 4.0 would
>  be a natural progression. Since this is a major version change, there is
>  more of a license to relegate things that are deemed unsuitable for
>  production use to contrib (an area designated for that purpose), remove
>  deprecated items, move things around, and possibly even make backwards
>  incompatible functionality changes, so I don't see a need to change the
>  artifact id and identity of the project.
> 
>  Thanks
> 
>  On Tue, Aug 22, 2017 at 8:16 AM, Munagala Ramanath <
>  amberar...@yahoo.com.invalid> wrote:
> 
>  +1 for option 2 (primary)
> 
> > +1 for option 1 (secondary)
> > Ram
> >
> >
> > On Tuesday, August 22, 2017, 6:58:46 AM PDT, Vlad Rozov <
> > vro...@apache.org>
> > wrote:
> >
> > +1 for option 2 (primary)
> > +1 for option 1 (secondary)
> >
> > Thank you,
> >
> > Vlad
> >
> > On 8/21/17 23:37, Ananth G wrote:
> >
> > +1 for option 2 and second vote for option 1
> >>
> >> Have we finalized the library name ? Going from Apex-malhar 3.7 to
> >>
>> Apex-malhar-1.0 would be counterintuitive. Also it would be great if we
>> have an agreed process to mark an operator from @evolving to a stable
>> version, given we are trying to address this as well as part of the
>> proposal.
> >
> > Regards
> >> Ananth
> >>
> >> On 22 Aug 2017, at 11:40 am, Thomas Weise  wrote:
> >>
> >>> +1 for option 2 (second vote +1 for option 1)
> >>>
> >>>
> >>> On Mon, Aug 21, 2017 at 6:39 PM, Thomas Weise 
> >>> wrote:
> >>>
>  This is to formalize the major version change for Malhar discussed
>  in [1].
> 
>  There are two options for major version change. Major version change
>  will rename legacy packages to org.apache.apex sub packages while
>  retaining file history in git. Other cleanup such as removing
>  deprecated code is also expected.
> 
>  1. Version 4.0 as major version change from 3.x
> 
>  2. Version 1.0 with simultaneous change of Maven artifact IDs
> 
>  Please refer to the discussion thread [1] for reasoning behind both
>  of the options.
> 
>  Please vote on both options. Primary vote for your preferred option,
>  secondary for the other. Secondary vote can be used when counting
>  primary vote alone isn't conclusive.
> 
>  Vote will be open for at least 72 hours.
> 
>  Thanks,
>  Thomas
> 
>  [1] https://lists.apache.org/thread.html/

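[Editor's note: concretely, the two options differ in what a downstream user must change in their pom.xml. A hedged illustration — the coordinates below are examples only, and the renamed artifactId under option 2 was never finalized in this thread:]

```xml
<!-- Today (3.x line) -->
<dependency>
  <groupId>org.apache.apex</groupId>
  <artifactId>malhar-library</artifactId>
  <version>3.7.0</version>
</dependency>

<!-- Option 1: major version bump only; one line changes -->
<dependency>
  <groupId>org.apache.apex</groupId>
  <artifactId>malhar-library</artifactId>
  <version>4.0.0</version>
</dependency>

<!-- Option 2: version reset plus artifactId rename; two lines change
     (hypothetical artifactId) -->
<dependency>
  <groupId>org.apache.apex</groupId>
  <artifactId>apex-library</artifactId>
  <version>1.0.0</version>
</dependency>
```

This is why the thread keeps framing the choice as "one line vs. two lines" of change for users.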
Re: [VOTE] Major version change for Apex Library (Malhar)

2017-08-22 Thread Amol Kekre
I am -1 on option 2. There is no need to do so, as going back on versions
at this stage has consequences for Apex users.

I am for option 1, but I want to propose an explicit change to the text.
Based on the verbatim text, I am voting -1 on option 1. I believe in the
original discussion thread there was talk about continuing release-3 that
should be explicit in the vote.

option 3 (modified option 1)
3. Version 4.0 as major version change from 3.x. Community members can
continue with release-3 (3.9, 3.10, ...). PR merges into release-3 should
not be blocked if they are not immediately merged into the master branch.

Over a longer period of time, I expect code to progressively move to
version 4. Changing package names is usually not a reason for a major
version upgrade; the cause is usually an API change. Currently we are
moving to version 4 without an ask for an API change.

Thks,
Amol



E:a...@datatorrent.com | M: 510-449-2606 | Twitter: @*amolhkekre*

www.datatorrent.com


On Tue, Aug 22, 2017 at 8:31 AM, Pramod Immaneni 
wrote:

> +1 for option 1
> -1 for option 2 as I see no impending need to do this now, as in if we
> don't do this, things will fall apart. It will be a source of more
> disruption and confusion. Malhar has been around for quite some time,
> evolving and growing during this period and going to version 4.0 would be a
> natural progression. Since this is a major version change, there is more of
> a license to relegate things that are deemed unsuitable for production use
> to contrib (an area designated for that purpose), remove deprecated items,
> move things around and possibly even make backwards incompatible
> functionality changes so I don't see a need to change the artifact id and
> identity of the project.
>
> Thanks
>
> On Tue, Aug 22, 2017 at 8:16 AM, Munagala Ramanath <
> amberar...@yahoo.com.invalid> wrote:
>
> > +1 for option 2 (primary)
> > +1 for option 1 (secondary)
> > Ram
> >
> >
> > On Tuesday, August 22, 2017, 6:58:46 AM PDT, Vlad Rozov <
> vro...@apache.org>
> > wrote:
> >
> > +1 for option 2 (primary)
> > +1 for option 1 (secondary)
> >
> > Thank you,
> >
> > Vlad
> >
> > On 8/21/17 23:37, Ananth G wrote:
> > > +1 for option 2 and second vote for option 1
> > >
> > > Have we finalized the library name ? Going from Apex-malhar 3.7 to
> > Apex-malhar-1.0 would be counter intuitive. Also it would be great if we
> > have an agreed process to mark an operator from @evolving to stable
> version
> > given we are trying to address this as well as part of the proposal
> > >
> > > Regards
> > > Ananth
> > >
> > >> On 22 Aug 2017, at 11:40 am, Thomas Weise  wrote:
> > >>
> > >> +1 for option 2 (second vote +1 for option 1)
> > >>
> > >>
> > >>> On Mon, Aug 21, 2017 at 6:39 PM, Thomas Weise 
> wrote:
> > >>>
> > >>> This is to formalize the major version change for Malhar discussed in
> > [1].
> > >>>
> > >>> There are two options for major version change. Major version change
> > will
> > >>> rename legacy packages to org.apache.apex sub packages while
> retaining
> > file
> > >>> history in git. Other cleanup such as removing deprecated code is
> also
> > >>> expected.
> > >>>
> > >>> 1. Version 4.0 as major version change from 3.x
> > >>>
> > >>> 2. Version 1.0 with simultaneous change of Maven artifact IDs
> > >>>
> > >>> Please refer to the discussion thread [1] for reasoning behind both
> of
> > the
> > >>> options.
> > >>>
> > >>> Please vote on both options. Primary vote for your preferred option,
> > >>> secondary for the other. Secondary vote can be used when counting
> > primary
> > >>> vote alone isn't conclusive.
> > >>>
> > >>> Vote will be open for at least 72 hours.
> > >>>
> > >>> Thanks,
> > >>> Thomas
> > >>>
> > >>> [1] https://lists.apache.org/thread.html/
> > bd1db8a2d01e23b0c0ab98a785f6ee
> > >>> 9492a1ac9e52d422568a46e5f3@%3Cdev.apex.apache.org%3E
> > >>>
> >
> >
> > Thank you,
> >
> > Vlad
> >
>


Re: Java packages: legacy -> org.apache.apex

2017-08-17 Thread Amol Kekre
The following pull request should be taken up in 4.0.0. See my comments in
https://github.com/apache/apex-malhar/pull/664

https://github.com/apache/apex-malhar/pull/662


This merge should not be done without a consensus. This will require code
changes to existing apps.

Thks,
Amol



E:a...@datatorrent.com | M: 510-449-2606 | Twitter: @*amolhkekre*

www.datatorrent.com


On Mon, Aug 14, 2017 at 7:48 PM, Thomas Weise  wrote:

> Hi,
>
> I opened following PRs for the package change:
>
> https://github.com/apache/apex-malhar/pull/662
>
> Moves all classes with history retained (hence 2 commits). Also contains
> checkstyle and other mechanical changes.
>
> https://github.com/apache/apex-malhar/pull/664
>
> Adds backward compatibility jar.
>
> Once above PRs are merged the new artifact can be deployed and introduced
> as dependency in malhar-library.
>
> Please review.
>
> Thanks,
> Thomas
>
>
>
> On Sun, Jul 16, 2017 at 7:04 AM, Thomas Weise  wrote:
>
> > My original list of work items contained the b/w compatibility aspect, I
> > don't think there should be any confusion of whether it will be covered
> > here or not.
> >
> > The proposed shading will provide the old classes and they will be frozen
> > as of release 3.7. That's the same as making a copy of the code and never
> > again making changes to the original classes. This cannot be accomplished
> > by using the older 3.7 release in your project because you cannot use 2
> > different versions of Malhar in parallel unless you apply shading.
> >
> > The shaded artifact will only expose the com.datatorrent classes, and
> they
> > will be self-contained as the rest of the classes that they may depend on
> > are shaded. The shaded artifact does not evolve, there are not more
> changes
> > to com.datatorrent classes after they are relocated in master.
> >
> > Thanks,
> > Thomas
> >
> >
> > On Sun, Jul 16, 2017 at 2:00 AM, Pramod Immaneni  >
> > wrote:
> >
> >> I don't think we can limit the topic strictly to relocation without
> having
> >> a good b/w compatibility story or at least one that goes far enough.
> >>
> >> The shading idea sounds interesting. Why not let the shaded version move
> >> forward with each release till we hit a major release? If it is going to
> >> remain pegged at 3.7.0, why shade in the first place? The regular 3.7.0
> >> release would do the same job, and it would amount to the same loss of
> >> b/w compatibility with newer releases.
> >>
> >> Thanks
> >>
> >> On Sat, Jul 15, 2017 at 7:57 AM, Thomas Weise  wrote:
> >>
> >> > Discussing what in the future might become stable needs to be a
> >> > separate thread; it will be a much bigger discussion.
> >> >
> >> > The topic here is to relocate the packages. With a few exceptions
> >> > relocation won't affect the semantic versioning. Semantic versioning
> is
> >> > essentially not effective for Malhar because almost everything is
> >> @Evolving
> >> > (and there are reasons for that.. -> separate topic)
> >> >
> >> > I don't really like the idea of creating bw compatibility stubs for
> the
> >> > follow up PR. It creates even more clutter in the source tree than
> there
> >> > already is and so here is an alternative suggestion:
> >> >
> >> > https://github.com/tweise/apex-malhar/blob/malhar37-
> >> > compat/shaded-malhar37/pom.xml
> >> >
> >> > Create a shaded artifact that provides the old com.datatorrent.*
> >> classes as
> >> > of release 3.7. Users can include that artifact if they don't want to
> >> > change import statements. At the same time they have an incentive to
> >> switch
> >> > to the relocated classes to take advantage of bug fixes and new
> >> > functionality.
> >> >
> >> > I will work on the first PR that does the relocate. In the meantime,
> we
> >> can
> >> > finalize what backward compatibility support we want to provide and
> how.
> >> >
> >> > Thanks,
> >> > Thomas
> >> >
> >> >
> >> >
> >> >
> >> > On Fri, Jul 14, 2017 at 11:33 AM, Pramod Immaneni <
> >> pra...@datatorrent.com>
> >> > wrote:
> >> >
> >> > > How about coming up with a list of @Evolving operators that we would
> >> like
> >> > > to see in the eventual stable list and move those along with the not
> >> > > @Evolving ones in org.apache.apex with b/w stubs and leave the rest
> as
> >> > they
> >> > > are. Then have a follow up JIRA for the rest to be moved over to
> >> contrib
> >> > > and be deprecated.
> >> > >
> >> > > Thanks
> >> > >
> >> > > On Fri, Jul 14, 2017 at 10:37 AM, Thomas Weise <
> >> thomas.we...@gmail.com>
> >> > > wrote:
> >> > >
> >> > > > We need to keep the discussion here on topic, if other things are
> >> piled
> 

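[Editor's note: the shaded 3.7 compatibility artifact Thomas links above would be built with something along the lines of the maven-shade-plugin configuration below. This is a sketch reconstructed from the description in the thread, not the contents of the linked pom; the relocation pattern and shaded namespace are illustrative. The idea is to bundle the frozen 3.7 classes and relocate the internals they depend on, so that only the legacy com.datatorrent classes are exposed to users.]

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <executions>
    <execution>
      <phase>package</phase>
      <goals>
        <goal>shade</goal>
      </goals>
      <configuration>
        <relocations>
          <!-- Hide the internals that the frozen com.datatorrent
               classes depend on (pattern is illustrative only) -->
          <relocation>
            <pattern>org.apache.apex.malhar</pattern>
            <shadedPattern>com.datatorrent.shaded.malhar</shadedPattern>
          </relocation>
        </relocations>
      </configuration>
    </execution>
  </executions>
</plugin>
```

Users who include such an artifact keep compiling against the old com.datatorrent imports, while new development proceeds against the relocated packages.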
Re: anyone else seeing a 404

2017-07-13 Thread Amol Kekre
me too

Thks
Amol


E:a...@datatorrent.com | M: 510-449-2606 | Twitter: @*amolhkekre*

www.datatorrent.com


On Thu, Jul 13, 2017 at 1:31 PM, Pramod Immaneni 
wrote:

> https://git-wip-us.apache.org/repos/asf?p=apex-core.git
>


Re: Backward compatibility issue in 3.6.0 release

2017-05-15 Thread Amol Kekre
I agree with Vlad; I would suggest option 1 too, but not a patch release.
Documentation should suffice.

Thks,
Amol


E:a...@datatorrent.com | M: 510-449-2606 | Twitter: @*amolhkekre*

www.datatorrent.com


On Mon, May 15, 2017 at 11:13 AM, Vlad Rozov 
wrote:

> I considered proposing 3.6.1 patch release, but rejected it as API changes
> do not qualify for the patch release. Note that this is a compile time
> backward compatibility issue, not a run-time.
>
> Instead of 3.6.1 I would propose to spell out the issue in 3.6.0
> documentation.
>
> Thank you,
> Vlad
>
> Sent from my iPhone
>
> > On May 15, 2017, at 10:40, Munagala Ramanath
>  wrote:
> >
> > I like proposal 1 too; I also agree with Ajay about doing a 3.6.1 patch
> > release.
> > Ram
> >
> >On Monday, May 15, 2017 10:18 AM, AJAY GUPTA 
> wrote:
> >
> >
> > I would vote for 1 and making variables private since it anyways breaks
> > semantic versioning.
> > I think it would it be a good idea to release a 3.6.1 patch release as
> > well.
> >
> >
> > Ajay
> >
> > On Mon, May 15, 2017 at 10:36 PM, Sanjay Pujare 
> > wrote:
> >
> >> I vote for renaming to less common names like __count. The renaming
> breaks
> >> compatibility from 3.6.0 to 3.7.0 but seems to be the best option.
> >>
> >> On Mon, May 15, 2017 at 9:53 AM, Vlad Rozov 
> >> wrote:
> >>
> >>> Hi All,
> >>>
> >>> There is a possible change in operators behavior caused by changes that
> >>> were introduced in the release 3.6.0 into DefaultInputPort and
> >>> DefaultOutputPort. Please see https://issues.apache.org/jira
> >>> /browse/APEXCORE-722. We need to agree how to proceed.
> >>>
> >>> 1. Break semantic versioning for the Default Input and Output Ports in
> >> the
> >>> next release (3.7.0), declare protected variables as private and
> provide
> >>> protected access method. Another option is to rename protected
> variables
> >> to
> >>> use less common names (for example __count).
> >>> 2. Keep protected variables with the risk that the following common
> >>> operator design pattern will be used accidentally by existing operators
> >> and
> >>> newly designed operators:
> >>>
> >>> public class MyOperator extends BaseOperator {
> >>>   private int count;
> >>>   public DefaultInputPort<Object> in = new DefaultInputPort<Object>() {
> >>> @Override
> >>> public void process(Object tuple)
> >>> {
> >>>   count++;  // updates DefaultInputPort count, not MyOperator count!
> >>> }
> >>>   };
> >>> }
> >>>
> >>>
> >>> Thank you,
> >>>
> >>> Vlad
> >>>
> >>
> >
> >
>
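
The hazard behind option 2 can be reproduced in plain Java, with no Apex dependency: inside an anonymous inner class, a field inherited from the superclass takes precedence over a same-named field of the enclosing class, so the operator's own counter is silently bypassed. A minimal sketch (class and field names are illustrative):

```java
// Plain-Java reproduction of the shadowing hazard: an inherited protected
// field wins over the enclosing class's field of the same name inside an
// anonymous subclass.
class Port {
  protected int count;  // stands in for the protected field added in 3.6.0

  void process() {
  }

  int portCount() {
    return count;
  }
}

public class ShadowDemo {
  private int count;  // the "operator's" own counter

  final Port in = new Port() {
    @Override
    void process() {
      count++;  // resolves to the inherited Port.count, NOT ShadowDemo.count
    }
  };

  int operatorCount() {
    return count;
  }

  public static void main(String[] args) {
    ShadowDemo op = new ShadowDemo();
    op.in.process();
    // The operator's counter is untouched; the port's was incremented.
    System.out.println("operator count=" + op.operatorCount()
        + ", port count=" + op.in.portCount());
  }
}
```

If `Port.count` were private (option 1), the same unqualified `count++` would instead resolve to the enclosing class's field, which is why hiding the port's variables behind accessors removes the hazard.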


Re: Proposal to upgrade Apex Core dependency in Malhar to 3.6

2017-05-13 Thread Amol Kekre
+1

Thks
Amol


E:a...@datatorrent.com | M: 510-449-2606 | Twitter: @*amolhkekre*

www.datatorrent.com


On Fri, May 12, 2017 at 3:26 PM, Pramod Immaneni 
wrote:

> +1
>
> On Thu, May 11, 2017 at 5:51 AM, AJAY GUPTA  wrote:
>
> > Hi all,
> >
> > Apex Malhar currently depends on core 3.4. Custom control tuple support
> > has been added to core in 3.6. Hence, the Malhar dependency needs to be
> > updated. This is necessary for changing the Malhar operators for batch
> > use cases.
> >
> > Let me know the community view on the same.
> >
> >
> > Thanks,
> > Ajay
> >
>


Re: binary releases

2017-05-05 Thread Amol Kekre
We should, it is good for the community.

Thks
Amol


E:a...@datatorrent.com | M: 510-449-2606 | Twitter: @*amolhkekre*

www.datatorrent.com


On Fri, May 5, 2017 at 5:03 PM, Pramod Immaneni 
wrote:

> I wanted to see how the community felt about also publishing binaries as
> part of the release. Hadoop, for example, has been doing this
>
> http://hadoop.apache.org/releases.html
> https://dist.apache.org/repos/dist/release/hadoop/common/hadoop-2.8.0/
>
> Thanks
>


Re: [VOTE] Apache Apex Core Release 3.6.0 (RC1)

2017-05-02 Thread Amol Kekre
+1 (binding)

- Signatures, checksums ok
- Build successfully
- README.md, LICENSE, NOTICE, & CHANGELOG.md files

Thks,
Amol



E:a...@datatorrent.com | M: 510-449-2606 | Twitter: @*amolhkekre*

www.datatorrent.com


On Tue, May 2, 2017 at 11:16 AM, Tushar Gosavi 
wrote:

> Hi Thomas,
>
> I had pushed the documentation to the apex-site repository and published it
> now under https://apex.apache.org/docs/apex-3.6/
>
> For javadocs, I had updated the buildbot configuration; somehow the java
> documentation is not prepared yet. Looking into it.
>
> Thanks,
> - Tushar.
>
>
> On Tue, May 2, 2017 at 8:50 PM, Thomas Weise  wrote:
>
> > +1 (binding)
> >
> > - verified signatures and hashes
> > - build passes: mvn clean apache-rat:check install -Dlicense.skip=false
> > - successfully run the Apache Beam validate runner tests with this
> version
> >
> > Minor issue:
> >
> > The email should contain documentation links (javadoc and user
> > documentation).
> > Was the documentation published?
> >
> > Thanks,
> > Thomas
> >
> >
> > On Tue, May 2, 2017 at 6:31 AM, Bhupesh Chawda 
> > wrote:
> >
> > > +1
> > >
> > > Checked the following:
> > > 1. Signatures and checksums okay
> > > 2. Build successful with tests
> > > 3. Presence of README.md, LICENSE, NOTICE and CHANGELOG.md files
> > > 4. Could launch pi demo successfully
> > >
> > > ~ Bhupesh
> > >
> > >
> > >
> > >
> > > ___
> > >
> > > Bhupesh Chawda
> > >
> > > E: bhup...@datatorrent.com | Twitter: @bhupeshsc
> > >
> > > www.datatorrent.com  |  apex.apache.org
> > >
> > >
> > >
> > > On Tue, May 2, 2017 at 2:01 AM, Vlad Rozov 
> > > wrote:
> > >
> > > > +1 (binding)
> > > >
> > > > - verified release signature and hashes
> > > > - verified LICENSE, NOTICE, README.md and CHANGELOG.md are present
> > > > - no unexpected binary files in the source distribution
> > > > - verified build and tests run clean "mvn clean apache-rat:check
> verify
> > > > -Dlicense.skip=false install"
> > > >
> > > > Thank you,
> > > >
> > > > Vlad
> > > >
> > > >
> > > > On 5/1/17 11:49, Tushar Gosavi wrote:
> > > >
> > > >> Dear Community,
> > > >>
> > > >> Please vote on the following Apache Apex Core 3.6.0 release
> candidate
> > 1.
> > > >>
> > > >> This release adds the support for custom control tuples,
> experimental
> > > >> support for plugins
> > > >> and other improvements and important bug fixes.
> > > >>
> > > >> This is a source release with binary artifacts published to Maven.
> > > >>
> > > >> List of all issues fixed: https://s.apache.org/HQ0r
> > > >>
> > > >> Staging directory
> > > >> https://dist.apache.org/repos/dist/dev/apex/apache-apex-
> > core-3.6.0-RC1/
> > > >> Source zip:
> > > >> https://dist.apache.org/repos/dist/dev/apex/apache-apex-core
> > > >> -3.6.0-RC1/apache-apex-core-3.6.0-source-release.zip
> > > >> Source tar.gz:
> > > >> https://dist.apache.org/repos/dist/dev/apex/apache-apex-core
> > > >> -3.6.0-RC1/apache-apex-core-3.6.0-source-release.tar.gz
> > > >> Maven staging repository:
> > > >> https://repository.apache.org/content/repositories/
> orgapacheapex-1028
> > > >>
> > > >> Git source:
> > > >> https://git-wip-us.apache.org/repos/asf?p=apex-core.git;a=co
> > > >> mmit;h=refs/tags/v3.6.0-RC1
> > > >> (commit: 5a517348ae497c06150f32ce39b6915588e92510)
> > > >>
> > > >> PGP key:
> > > >> http://pgp.mit.edu:11371/pks/lookup?op=vindex&search=tus...@apache.org
> > > >> KEYS file:
> > > >> https://dist.apache.org/repos/dist/release/apex/KEYS
> > > >>
> > > >> More information at:
> > > >> http://apex.apache.org
> > > >>
> > > >> Please try the release and vote; vote will be open for at least 72
> > > hours.
> > > >>
> > > >> [ ] +1 approve (and what verification was done)
> > > >> [ ] -1 disapprove (and reason why)
> > > >>
> > > >> http://www.apache.org/foundation/voting.html
> > > >>
> > > >> How to verify release candidate:
> > > >>
> > > >> http://apex.apache.org/verification.html
> > > >>
> > > >> Thanks,
> > > >> Tushar.
> > > >>
> > > >>
> > > >
> > >
> >
>


Re: PR merge policy

2017-04-28 Thread Amol Kekre
That makes sense. The committer should fix upon feedback. PR policy
violation is bad, I am not defending the violation. I think if the committer
takes the feedback and promptly works on a fix (new PR) it should be ok.

Thks,
Amol



E:a...@datatorrent.com | M: 510-449-2606 | Twitter: @*amolhkekre*

www.datatorrent.com


On Fri, Apr 28, 2017 at 7:10 PM, Vlad Rozov <v.ro...@datatorrent.com> wrote:

> I think it is a good idea to make the committer responsible for fixing the
> situation by rolling back the commit and re-opening the PR for further
> review. IMO, committer right comes with the responsibility to respect the
> community and policies it established.
>
> I would disagree that rolling back should be used only in case of a
> disaster unless PR merge policy violation is a disaster :-) (and it
> actually is).
>
> Thank you,
>
> Vlad
>
> On 4/28/17 14:21, Amol Kekre wrote:
>
>> Strongly agree with Ilya. Let's take these events as learning opportunities
>> for folks to learn and improve. There can always be a second commit to fix
>> things in case there is a code issue. If it is a policy issue, we learn and
>> improve. Rolling back should be used rarely, and only when there is a
>> disaster. We need to be cognizant of new contributors worrying about the
>> cost to submit code.
>>
>> I too do not think Apex is hurting from bad code getting in. We are doing
>> great with our current policies.
>>
>> Thks,
>> Amol
>>
>>
>> E:a...@datatorrent.com | M: 510-449-2606 | Twitter: @*amolhkekre*
>>
>> www.datatorrent.com
>>
>>
>> On Fri, Apr 28, 2017 at 1:35 PM, Ganelin, Ilya <
>> ilya.gane...@capitalone.com>
>> wrote:
>>
>> Guess we can all go home then. Our work here is done:
>>>
>>>
>>>
>>>
>>> W.R.T the discussion below, I think rolling back an improperly reviewed
>>> PR
>>> could be considered disrespectful to the committer who merged it in the
>>> first place. I think that such situations, unless they trigger a
>>> disaster,
>>> should be handled by communicating the error to the responsible party and
>>> then allowing them to resolve it. E.g. I improperly commit an unreviewed
>>> PR, someone notices and sends me an email informing me of my error, and I
>>> then have the responsibility of unrolling the change and getting the
>>> appropriate review. I think we should start with the premise that we’re
>>> here in the spirit of collaboration and we should create opportunities
>>> for
>>> individuals to learn from their mistakes, recognize the importance of
>>> particular standards (e.g. good review process leads to stable projects),
>>> and ultimately internalize these ethics.
>>>
>>>
>>>
>>> Internally to our team, we’ve had great success with a policy requiring
>>> two PR approvals and not allowing the creator of a patch to be the one to
>>> merge their PR. While this might feel a little silly, it definitely helps
>>> to build collaboration, familiarity with the code base, and intrinsically
>>> avoids PRs being merged too quickly (without a sufficient period for
>>> review).
>>>
>>>
>>>
>>>
>>>
>>> - Ilya Ganelin
>>>
>>>
>>>
>>>
>>> *From: *Pramod Immaneni <pra...@datatorrent.com>
>>> *Reply-To: *"dev@apex.apache.org" <dev@apex.apache.org>
>>> *Date: *Friday, April 28, 2017 at 10:09 AM
>>> *To: *"dev@apex.apache.org" <dev@apex.apache.org>
>>> *Subject: *Re: PR merge policy
>>>
>>>
>>>
>>>
>>> On a lighter note, looks like the powers that be have been listening on
>>> this conversation and decided to force push an empty repo or maybe
>>> github just decided that this is the best proposal ;)
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> On Thu, Apr 27, 2017 at 10:47 PM, Vlad Rozov <v.ro...@datatorrent.com>
>>> wrote:
>>>
>>> In this case please propose how to deal with PR merge policy violations
>>> in
>>> the future. I will -1 proposal to commit an improvement on top of a
>>> commit.
>>>
>>> Thank you,
>>>
>>> Vlad
>>>
>>>
>>>
>>> On 4/27/17 21:48, Pramod Immaneni wrote:
>>>
>>> I am sorry but I am -1 on the force push in this case.
>>>
>>> On Apr 27, 2017, at

Re: PR merge policy

2017-04-28 Thread Amol Kekre
Strongly agree with Ilya. Let's take these events as learning opportunities
for folks to learn and improve. There can always be a second commit to fix
things in case there is a code issue. If it is a policy issue, we learn and
improve. Rolling back should be used rarely, and only when there is a
disaster. We need to be cognizant of new contributors worrying about the
cost to submit code.

I too do not think Apex is hurting from bad code getting in. We are doing
great with our current policies.

Thks,
Amol


E:a...@datatorrent.com | M: 510-449-2606 | Twitter: @*amolhkekre*

www.datatorrent.com


On Fri, Apr 28, 2017 at 1:35 PM, Ganelin, Ilya 
wrote:

> Guess we can all go home then. Our work here is done:
>
>
>
>
> W.R.T the discussion below, I think rolling back an improperly reviewed PR
> could be considered disrespectful to the committer who merged it in the
> first place. I think that such situations, unless they trigger a disaster,
> should be handled by communicating the error to the responsible party and
> then allowing them to resolve it. E.g. I improperly commit an unreviewed
> PR, someone notices and sends me an email informing me of my error, and I
> then have the responsibility of unrolling the change and getting the
> appropriate review. I think we should start with the premise that we’re
> here in the spirit of collaboration and we should create opportunities for
> individuals to learn from their mistakes, recognize the importance of
> particular standards (e.g. good review process leads to stable projects),
> and ultimately internalize these ethics.
>
>
>
> Internally to our team, we’ve had great success with a policy requiring
> two PR approvals and not allowing the creator of a patch to be the one to
> merge their PR. While this might feel a little silly, it definitely helps
> to build collaboration, familiarity with the code base, and intrinsically
> avoids PRs being merged too quickly (without a sufficient period for
> review).
>
>
>
>
>
> - Ilya Ganelin
>
>
>
>
> *From: *Pramod Immaneni 
> *Reply-To: *"dev@apex.apache.org" 
> *Date: *Friday, April 28, 2017 at 10:09 AM
> *To: *"dev@apex.apache.org" 
> *Subject: *Re: PR merge policy
>
>
>
> On a lighter note, looks like the powers that be have been listening on
> this conversation and decided to force push an empty repo or maybe
> github just decided that this is the best proposal ;)
>
>
>
>
>
>
>
> On Thu, Apr 27, 2017 at 10:47 PM, Vlad Rozov 
> wrote:
>
> In this case please propose how to deal with PR merge policy violations in
> the future. I will -1 proposal to commit an improvement on top of a commit.
>
> Thank you,
>
> Vlad
>
>
>
> On 4/27/17 21:48, Pramod Immaneni wrote:
>
> I am sorry but I am -1 on the force push in this case.
>
> On Apr 27, 2017, at 9:27 PM, Thomas Weise  wrote:
>
> +1 as measure of last resort.
>
> On Thu, Apr 27, 2017 at 9:25 PM, Vlad Rozov 
> wrote:
>
> IMO, force push will bring enough consequent embarrassment to avoid such
> behavior in the future.
>
> Thank you,
>
> Vlad
>
> On 4/27/17 21:16, Munagala Ramanath wrote:
>
> My thought was that leaving the bad commit would be a permanent reminder
> to
> the committer
> (and others) that a policy violation occurred and the consequent
> embarrassment would be an
> adequate deterrent.
>
> Ram
>
> On Thu, Apr 27, 2017 at 9:12 PM, Vlad Rozov 
> wrote:
>
> I also was under impression that everyone agreed to the policy that gives
>
> everyone in the community a chance to raise a concern or to propose an
> improvement to a PR. Unfortunately, it is not the case, and we need to
> discuss it again. I hope that this discussion will lead to no future
> violations so we don't need to forcibly undo such commits, but it will be
> good for the community to agree on the policy that deals with violations.
>
> Ram, committing an improvement on top of a commit should be discouraged,
> not encouraged as it eventually leads to the policy violation and lousy
> PR
> reviews.
>
> Thank you,
>
> Vlad
>
> On 4/27/17 20:54, Thomas Weise wrote:
>
> I also thought that everybody was in agreement about that after the first
>
> round of discussion and as you say it would be hard to argue against it.
> And I think we should not have to be back to the same topic a few days
> later.
>
> While you seem to be focussed on the disagreement on policy violation,
> I'm
> more interested in a style of collaboration that does not require such
> discussion.
>
> Thomas
>
> On Thu, Apr 27, 2017 at 8:45 PM, Munagala Ramanath  wrote:
>
> Everybody seems agreed on what the committers should do -- that waiting a
> day or two for others to have a chance to comment seems like an entirely
> reasonable thing.
> thing.
>
> The disagreement is about what to do when that policy is 

Re: Towards Apache Apex 3.6.0 release

2017-04-13 Thread Amol Kekre
+1 to cut a release

Thks
Amol


E:a...@datatorrent.com | M: 510-449-2606 | Twitter: @*amolhkekre*

www.datatorrent.com


On Thu, Apr 13, 2017 at 9:22 AM, Pramod Immaneni 
wrote:

> +1
>
> I would like to see 699 and 700 addressed as well.
>
> On Wed, Apr 12, 2017 at 10:16 PM, Tushar Gosavi 
> wrote:
>
> > Hi,
> >
> > It has been four months since the 3.5.0 Apex Core release. There are
> > several new features added to core after 3.5.0. I would like to propose
> > the 3.6.0
> > release of Apex Core, to make these features available to users.
> >
> > The list of issues fixed in 3.6.0 are:
> > https://issues.apache.org/jira/issues/?jql=project%20%
> > 3D%20APEXCORE%20AND%20status%20in%20(Resolved%2C%20Closed)%
> > 20AND%20fixVersion%20%3D%203.6.0%20ORDER%20BY%20status%20ASC
> >
> > Apart from the above JIRAs, which bug-fixes/features would people like
> > to see in this release? If you feel your JIRA should be included then
> > please set
> fix
> > version to 3.6.0 with estimated time for work completion, also discuss it
> > here. Some of the pending pull requests could be incorporated in the
> > release. I feel following JIRA should be part of release APEXCORE-649,
> > APEXCORE-678.
> >
> > Let me know about your thoughts.
> >
> > Thanks,
> > Tushar.
> >
>


Re: Why does emit require current thread to be the operator thread?

2017-04-10 Thread Amol Kekre
Not yet, but we could leverage internal structures of Apex as they do the
same thing, for example in container-local streams. There is a catch
though: the queue read by the main thread will only happen when another
data tuple arrives in the process call, or a control tuple arrives for
start or end window.

Thks
Amol



E:a...@datatorrent.com | M: 510-449-2606 | Twitter: @*amolhkekre*

www.datatorrent.com


On Mon, Apr 10, 2017 at 1:01 PM, Ganelin, Ilya <ilya.gane...@capitalone.com>
wrote:

> Thanks, Amol – that makes sense and was the solution I’d arrived at. I
> just was trying to avoid the delay between the data being ready and
> emitting it. Has anyone built a solution where it emits from the parent as
> soon as it’s ready in the child (assuming I don’t care about order).
>
> - Ilya Ganelin
>
>
> On 4/10/17, 12:45 PM, "Amol Kekre" <a...@datatorrent.com> wrote:
>
> Ilya,
> This constraint was introduced because allowing two threads to emit data
> creates lots of bad situations:
> 1. The emit is triggered between end_window and begin_window. This was a
> critical blocker.
> 2. Order is no longer guaranteed; upon replay, events within a window can
> arrive in a different order. This was something to worry about, but not a
> blocker.
>
> We had users report this problem.
>
> The solution is to pass the data to the main thread and have the main
> thread emit it during one of the start-window, process, or end-window
> calls, ideally during start-window or end-window so as to guarantee
> order. Keeping this code in start or end window also ensures that the
> process call remains optimal.
>
> Thks
> Amol
>
>
>
> E:a...@datatorrent.com | M: 510-449-2606 | Twitter: @*amolhkekre*
>
> www.datatorrent.com
>
>
> On Mon, Apr 10, 2017 at 12:39 PM, Ganelin, Ilya <
> ilya.gane...@capitalone.com
> > wrote:
>
> > Hello – I’ve got an operator that runs a cleanup thread (separate
> from the
> > main event loop) and triggers a callback when an item is removed
> from an
> > internal data structure. I would like for this callback to emit data
> from
> > one of the operator’s ports, but I run into the following Exception:
> >
> >
> >
> > (From DefaultOutputPort.java, line 58)
> >
> > if (operatorThread != null && Thread.currentThread() != operatorThread) {
> >   // only under certain modes: enforce this
> >   throw new IllegalStateException("Current thread "
> >       + Thread.currentThread().getName()
> >       + " is different from the operator thread "
> >       + operatorThread.getName());
> > }
> >
> >
> >
> > I could obviously extend DefaultOperatorPort to bypass this but I’d
> like
> > to understand why that constraint is there and if there’s a good way
> to
> > work around it.
> >
> >
> >
> > Would love to hear the community’s thoughts. Thanks!
> >
> >
> >
> > - Ilya Ganelin
> >
> >
> > --
> >
> > The information contained in this e-mail is confidential and/or
> > proprietary to Capital One and/or its affiliates and may only be used
> > solely in performance of work or services for Capital One. The
> information
> > transmitted herewith is intended only for use by the individual or
> entity
> > to which it is addressed. If the reader of this message is not the
> intended
> > recipient, you are hereby notified that any review, retransmission,
> > dissemination, distribution, copying or other use of, or taking of
> any
> > action in reliance upon this information is strictly prohibited. If
> you
> > have received this communication in error, please contact the sender
> and
> > delete the material from your computer.
> >
>
>
> 
>
>


Re: Why does emit require current thread to be the operator thread?

2017-04-10 Thread Amol Kekre
Ilya,
This constraint was introduced because allowing two threads to emit data
creates lots of bad situations:
1. The emit is triggered between end_window and begin_window. This was a
critical blocker.
2. Order is no longer guaranteed; upon replay, events within a window can
arrive in a different order. This was something to worry about, but not a
blocker.

We had users report this problem.

The solution is to pass the data to the main thread and have the main
thread emit it during one of the start-window, process, or end-window
calls, ideally during start-window or end-window so as to guarantee order.
Keeping this code in start or end window also ensures that the process
call remains optimal.

Thks
Amol



E:a...@datatorrent.com | M: 510-449-2606 | Twitter: @*amolhkekre*

www.datatorrent.com


On Mon, Apr 10, 2017 at 12:39 PM, Ganelin, Ilya  wrote:

> Hello – I’ve got an operator that runs a cleanup thread (separate from the
> main event loop) and triggers a callback when an item is removed from an
> internal data structure. I would like for this callback to emit data from
> one of the operator’s ports, but I run into the following Exception:
>
>
>
> (From DefaultOutputPort.java, line 58)
>
> if (operatorThread != null && Thread.currentThread() != operatorThread) {
>   // only under certain modes: enforce this
>   throw new IllegalStateException("Current thread "
>       + Thread.currentThread().getName()
>       + " is different from the operator thread "
>       + operatorThread.getName());
> }
>
>
>
> I could obviously extend DefaultOperatorPort to bypass this but I’d like
> to understand why that constraint is there and if there’s a good way to
> work around it.
>
>
>
> Would love to hear the community’s thoughts. Thanks!
>
>
>
> - Ilya Ganelin
>
>
> --
>
>
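
The hand-off Amol describes, where a background thread queues results and the operator (main) thread drains and emits them at a window boundary, can be sketched in plain Java as below. The `emitted` list stands in for an Apex output port, and all names are illustrative:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;

// Illustrative sketch of the recommended pattern: the background (cleanup)
// thread never emits directly; it hands tuples to the operator thread via a
// thread-safe queue, and the operator thread drains the queue in endWindow()
// so all emits stay inside window boundaries and in deterministic order.
public class QueueHandoffDemo {
  private final ConcurrentLinkedQueue<String> pending = new ConcurrentLinkedQueue<>();
  final List<String> emitted = new ArrayList<>();  // stands in for an output port

  // Called from the cleanup thread when an item is removed.
  void onItemRemoved(String item) {
    pending.add(item);
  }

  // Called by the operator (main) thread at the end of each window.
  void endWindow() {
    String t;
    while ((t = pending.poll()) != null) {
      emitted.add(t);  // in Apex this would be output.emit(t)
    }
  }

  public static void main(String[] args) throws InterruptedException {
    QueueHandoffDemo op = new QueueHandoffDemo();
    Thread cleanup = new Thread(() -> op.onItemRemoved("expired-1"));
    cleanup.start();
    cleanup.join();  // in practice the threads run concurrently
    op.endWindow();  // the operator thread emits the queued tuple
    System.out.println(op.emitted);
  }
}
```

In a real operator the drain loop would live in `endWindow()` (or `beginWindow()`) and call `output.emit(t)`, keeping every emit on the operator thread and inside window boundaries.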


Re: Apex Sandbox in Apex Dev Setup documentation

2017-04-07 Thread Amol Kekre
We should just point it to the apex download page in the docs ->
http://apex.apache.org/downloads.html. I will add my comments to the JIRA.

Thks
Amol


E:a...@datatorrent.com | M: 510-449-2606 | Twitter: @*amolhkekre*

www.datatorrent.com


On Fri, Apr 7, 2017 at 1:33 PM, Dean Lockgaard 
wrote:

> Thanks for the input, Thomas and Pramod.
>
> I've created the following two tickets to track this:
>
> https://issues.apache.org/jira/browse/APEXCORE-692
> https://issues.apache.org/jira/browse/APEXCORE-693
>
> Regards,
> Dean
>
>
>
>
> On Fri, Apr 7, 2017 at 9:22 AM, Pramod Immaneni 
> wrote:
>
> > Agreed, didn't realize this was in the documentation page.
> >
> > On Fri, Apr 7, 2017 at 9:09 AM, Thomas Weise  wrote:
> >
> > > I disagree. Apex documentation is not the place to promote vendor
> > > offerings.
> > >
> > > I suggested to refer to the download page, which already contains the
> > > DataTorrent link.
> > >
> > > Thomas
> > >
> > >
> > > On Fri, Apr 7, 2017 at 8:52 AM, Pramod Immaneni <
> pra...@datatorrent.com>
> > > wrote:
> > >
> > > > Hi Dean,
> > > >
> > > > There aren't many good options out there for users to get an
> > environment
> > > up
> > > > with apex apps running easily or quickly. We all know how difficult
> it
> > > > would be for someone new to get an application up and running even if
> > > they
> > > > had a hadoop sandbox with apex. Also, for many when they download a
> > > sandbox
> > > > they not only want to be able to run something in a few steps, but
> also
> > > > have easy to use tools and see something visually. My suggestion is
> to
> > > keep
> > > > both the sandboxes. You can put the bigtop sandbox first in the list
> as
> > > it
> > > > is vendor neutral.
> > > >
> > > > Thanks
> > > >
> >
>
>
> On Thu, Apr 6, 2017 at 7:18 PM, Thomas Weise  wrote:
>
> > +1 this should be tracked in a JIRA.
> >
> > There are also some improvements that can be done to the instructions on
> > the docker hub (separate activity).
> >
> > I would also suggest to list the Apex binary build on the downloads page
> > for users that have an existing cluster:
> >
> > https://github.com/atrato/apex-cli-package/releases
> >
> > And perhaps mention in the setup tutorial that these other download
> options
> > are listed on the website.
> >
> > Thanks,
> > Thomas
>
>
>
>
>
> > > > On Thu, Apr 6, 2017 at 1:16 PM, Dean Lockgaard <
> > dean.lockga...@gmail.com
> > > >
> > > > wrote:
> > > >
> > > > > All,
> > > > >
> > > > > In the Sandbox section of the Apache Apex Development Environment
> > Setup
> > > > > documentation (
> > > > > https://apex.apache.org/docs/apex/apex_development_setup/#sandbox
> ),
> > > > > instructions are provided for a vendor-specific Sandbox.
> > > > >
> > > > > I would like to propose that these instructions be changed to
> > reference
> > > > the
> > > > > Apache Apex Sandbox instead, which is an Apache Bigtop build and
> > > > available
> > > > > via docker at https://hub.docker.com/r/apacheapex/sandbox.
> > > > >
> > > > > Regards,
> > > > > Dean
> > > > >
> > > >
> > >
> >
>


Re: [Design] - Kudu Output Operator

2017-04-07 Thread Amol Kekre
Ananth,
This is a good proposal. We will work with you.

Thks
Amol


E:a...@datatorrent.com | M: 510-449-2606 | Twitter: @*amolhkekre*

www.datatorrent.com


On Sat, Apr 1, 2017 at 4:29 PM, ananth  wrote:

> Hello All,
>
> I would like to the community's opinion on the implementation of Kudu
> output operator.  A first cut implementation was made available in November
> last year but I guess we did not get time to discuss this thoroughly on the
> mailing list and hence the PR did not get merged.
>
> This operator would allow Apex to stream data into Kudu. A brief
> description of Kudu is here : https://kudu.apache.org/. This would allow
> at a high level the following use cases from Apex point of view:
>
> - Low-latency writes into the Kudu store that allow SQL queries on the Kudu
> store. This essentially means sub-second data updates available for SQL
> querying. As opposed to Parquet-styled data dumps, which would ideally need
> a few minutes to accumulate data to take advantage of Parquet formats, this
> would enable same-second queries on very large datasets on Kudu with Impala.
>
> - Another very interesting use case would be to allow Kudu as a source
> store to stream based on SQL queries. The Kudu input operator is tracked in
> another JIRA (https://issues.apache.org/jira/browse/APEXMALHAR-2472) that
> would cover mechanisms to stream data from Kudu into Apex. This will bring
> in interesting use cases like de-dupe, selective streaming, and out-of-band
> data in a different way if Kudu is part of the ecosystem in a given setup.
>
> Here is the design of the Kudu output operator:
>
>
> 1. The operator would be an AbstractOperator and would allow the concrete
> implementations to set a few behavioral aspects of the operator.
>
> 2. The following are the major phases of the operator:
>
> - During the activate() phase of the operator: establish a connection to the
>   cluster and get the metadata about the table that is being used as the sink.
> - During the setup() phase of the operator: fetch the current window
>   information and use it to decide if we are recovering from a failure mode.
>   (See point 8 below.)
> - During process() of the input port: inspect the incoming ExecutionContext
>   tuple (see below) and perform one of the operations
>   (Insert/Update/Delete/Upsert).
> 3. The following parameters are tunable while establishing a Kudu
> connection:
> Table name, Boss worker threads, Worker threads, Socket read time outs and
> External Consistency mode.
> 4. The user need not specify any schema outright. The pojo fields are
> automatically mapped to the table column names as identified in the schema
> parse in the activate phase.
> 5. Allow the concrete implementation of the operator to override the Pojo
> field name to the table schema column name. This would allow flexibility in
> use cases like table schema column names are not compatible with java bean
> frameworks or in situations when column names cant be controlled as POJO is
> coming from an upstream operator.
> 6. The input tuple that is to be supplied to this operator is of type
> "Kudu Execution Context". This tuple encompasses the actual Pojo that is
> going to be persisted to the Kudu store. Additionally it allows the
> upstream operator to specify the operation that needs to be performed. One
> of the following operations is permitted as part of the context : Insert,
> Upsert, Update and delete on the Pojo that is acting as the payload in the
> Execution Context.
> 7. The concrete implementation of the operator would allow the user to
> specify the actual POJO class definition that would be used to the write to
> the table. The execution context would contain this POJO as well as the
> metadata that defines the behavior of the processing that needs to be done
> on that tuple.
> 8. The operator would allow for a special case of execution mode for the
> first window that is being processed as the operator gets activated. There
> are two modes for the first window of processing of the operator:
> a. Safe Mode: the "happy path execution", as in no extra
> processing is required to perform the Kudu mutation.
> b. Reconciling Mode: there is an additional function that would be
> called to see if the user would like the tuple to be used for mutation.
> This mode is automatically set when OperatorContext.ACTIVATION_WINDOW_ID
> != Stateless.WINDOW_ID during the first window of processing by the
> operator.
>
> This feature is deemed to be useful when an operator is recovering from a
> crash instance of the application and we do not want to perform multiple
> mutations of the same tuple given ATLEAST_ONCE is the default semantics.
>
> 9. The operator is a stateless operator.
> 10. The operator would generate the following autometrics:
>  a. Counts of Inserts, Upserts, Deletes and Updates (separate counters
> for each mutation) for a given window
>  b. Bytes written in a given window
>  c. Write RPCs in the given window
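
Points 6 and 8 above can be sketched roughly as follows. All class, field, and
constant names here are illustrative stand-ins for the proposed design, not the
actual Malhar API:

```java
import java.util.function.Predicate;

class KuduSketch {
  // Point 6: the four mutations permitted as part of the execution context.
  enum KuduMutationType { INSERT, UPSERT, UPDATE, DELETE }

  // The input tuple: the payload POJO plus the requested mutation.
  static class KuduExecutionContext<T> {
    final T payload;
    final KuduMutationType mutationType;
    KuduExecutionContext(T payload, KuduMutationType mutationType) {
      this.payload = payload;
      this.mutationType = mutationType;
    }
  }

  // Stand-in for Stateless.WINDOW_ID, the marker meaning "no checkpointed state".
  static final long STATELESS_WINDOW_ID = -1L;

  // Point 8: reconciling mode is set automatically when the activation
  // window id is a real (checkpointed) window id.
  static boolean isReconcilingMode(long activationWindowId) {
    return activationWindowId != STATELESS_WINDOW_ID;
  }

  // In reconciling mode a user-supplied filter decides whether a recovered
  // tuple should still be mutated, avoiding duplicate writes under
  // at-least-once recovery; in safe mode every tuple is mutated.
  static <T> boolean shouldMutate(KuduExecutionContext<T> ctx,
                                  long activationWindowId,
                                  Predicate<T> userFilter) {
    return !isReconcilingMode(activationWindowId) || userFilter.test(ctx.payload);
  }
}
```

In safe mode the filter is never consulted; in reconciling mode it typically
checks an external store to decide whether the tuple was already applied.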

Re: open/close ports and active/inactive streams

2017-04-01 Thread Amol Kekre
+1. This has come up with Apex customers on batch use cases. This will make
batch use cases easy and robust. Today a lot of external tooling is needed;
for Apex, this means a reduction in the technology stack used.

Thks
Amol



E:a...@datatorrent.com | M: 510-449-2606 | Twitter: @*amolhkekre*

www.datatorrent.com

*Join us at Apex Big Data World Mt View
, April 4, 2017!*
[image: http://www.apexbigdata.com/san-jose-register.html]


On Sat, Apr 1, 2017 at 12:23 PM, Vlad Rozov  wrote:

> Correct, a stateful downstream operator can only be undeployed at a
> checkpoint window after it consumes all data emitted by upstream operator
> on the closed port.
>
> It will be necessary to distinguish between closed port and inactive
> stream. After port is closed, stream may still be active and after port is
> open, stream may still be inactive (not yet ready).
>
> The more contributors participate in the discussion and implementation,
> the more solid the feature will be.
>
> Thank you,
> Vlad
>
> Отправлено с iPhone
>
> > On Apr 1, 2017, at 11:03, Pramod Immaneni 
> wrote:
> >
> > Generally a good idea. Care should be taken around fault tolerance and
> > idempotency. Closing a stream would need to stop accepting new data, but
> > we still can't actually close all the streams and un-deploy operators till
> > committed. Idempotency might require the close to take effect at
> the
> > end of the window. What would it then mean for re-opening streams within
> a
> > window? Also, looks like a larger undertaking, as Ram suggested would be
> > good to understand the use cases and I also suggest that multiple folks
> > participate in the implementation effort to ensure that we are able to
> > address all the scenarios and minimize chances of regression in existing
> > behavior.
> >
> > Thanks
> >
> >> On Sat, Apr 1, 2017 at 8:12 AM, Vlad Rozov 
> wrote:
> >>
> >> All,
> >>
> >> Currently Apex assumes that an operator can emit on any defined output
> >> port and all streams defined by a DAG are active. I'd like to propose an
> >> ability for an operator to open and close output ports. By default all
> >> ports defined by an operator will be open. In the case an operator for
> any
> >> reason decides that it will not emit tuples on the output port, it may
> >> close it. This will make the stream inactive and the application master
> may
> >> undeploy the downstream (for that input stream) operators. If this
> leads to
> >> containers that don't have any active operators, those containers may be
> >> undeployed as well leading to better cluster resource utilization and
> >> better Apex elasticity. Later, the operator may be in a state where it
> >> needs to emit tuples on the closed port. In this case, it needs to
> re-open
> >> the port and wait till the stream becomes active again before emitting
> >> tuples on that port. Making inactive stream active again, requires the
> >> application master to re-allocate containers and re-deploy the
> downstream
> >> operators.
> >>
> >> It should be also possible for an application designer to mark streams
> as
> >> inactive when an application starts. This will allow the application
> master
> >> avoid reserving all containers when the application starts. Later, the
> port
> >> can be open and inactive stream become active.
> >>
> >> Thank you,
> >>
> >> Vlad
> >>
> >>
>
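
The port life cycle Vlad proposes could be sketched roughly as below. None of
these methods exist in Apex today; the class names and behavior are a
hypothetical illustration of the proposal, not an implementation:

```java
class PortStateSketch {
  enum PortState { OPEN, CLOSED }

  static class OutputPort<T> {
    private PortState state = PortState.OPEN; // default: every port is open

    // Closing marks the stream inactive; the application master may then
    // undeploy downstream operators and release idle containers.
    void close() { state = PortState.CLOSED; }

    // Re-opening requires the stream to become active again (containers
    // re-allocated, downstream operators re-deployed) before emitting.
    void open() { state = PortState.OPEN; }

    boolean canEmit() { return state == PortState.OPEN; }

    void emit(T tuple) {
      if (!canEmit()) {
        throw new IllegalStateException(
            "port is closed; re-open it and wait for the stream to become active");
      }
      // ... hand the tuple to the (re)activated stream ...
    }
  }
}
```

The distinction Vlad draws between a closed port and an inactive stream would
sit on top of this: closing only requests deactivation, and opening only
requests reactivation, with the stream state tracked separately by the master.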


Re: [VOTE] Apache Apex Malhar Release 3.7.0 (RC1)

2017-03-31 Thread Amol Kekre
+1 (binding)

Verified signatures and file integrity
Verified build successfully
Verified presence of README.md, CHANGELOG.md, LICENSE, NOTICE files

Thks,
Amol





On Thu, Mar 30, 2017 at 1:00 AM, Bhupesh Chawda 
wrote:

> +1
>
> Verified signatures
> Verified build successfully
> Verified presence of README.md, CHANGELOG.md, LICENSE, NOTICE files
>
> ~ Bhupesh
>
>
>
> ___
>
> Bhupesh Chawda
>
> E: bhup...@datatorrent.com | Twitter: @bhupeshsc
>
> www.datatorrent.com  |  apex.apache.org
>
>
>
> On Thu, Mar 30, 2017 at 12:35 PM, Tushar Gosavi 
> wrote:
>
> > +1
> >
> > Verified file and builder integrity.
> > Built source package successfully.
> >
> > Regards,
> > - Tushar.
> >
> >
> > On Thu, Mar 30, 2017 at 11:39 AM, AJAY GUPTA 
> wrote:
> >
> > > In the release notes, the S3 line-by-line module was listed under Bugs.
> > > It should have been under New Features. I have updated the JIRA type to
> > > New Feature.
> > >
> > > On Thu, Mar 30, 2017 at 9:49 AM, Pramod Immaneni <
> pra...@datatorrent.com
> > >
> > > wrote:
> > >
> > > > +1 binding
> > > >
> > > > Verified file and builder integrity.
> > > > Verified licenses.
> > > > Built source package successfully
> > > > Launched and ran pi demo successfully.
> > > >
> > > > Thanks
> > > >
> > > > On Mon, Mar 27, 2017 at 11:53 PM, Thomas Weise 
> wrote:
> > > >
> > > > > Dear Community,
> > > > >
> > > > > Please vote on the following Apache Apex Malhar 3.7.0 release
> > > candidate.
> > > > >
> > > > > This is a source release with binary artifacts published to Maven.
> > > > >
> > > > > This release is based on Apex Core 3.4 and resolves 69 issues.
> > > > >
> > > > > List of all issues fixed:
> > > > > https://issues.apache.org/jira/secure/ReleaseNote.jspa?
> > > > > version=12338771=12318824
> > > > > User documentation: http://apex.apache.org/docs/malhar-3.7/
> > > > >
> > > > > Staging directory:
> > > > > https://dist.apache.org/repos/dist/dev/apex/apache-apex-
> > > > malhar-3.7.0-RC1/
> > > > > Source zip:
> > > > > https://dist.apache.org/repos/dist/dev/apex/apache-apex-
> > > > > malhar-3.7.0-RC1/apache-apex-malhar-3.7.0-source-release.zip
> > > > > Source tar.gz:
> > > > > https://dist.apache.org/repos/dist/dev/apex/apache-apex-
> > > > > malhar-3.7.0-RC1/apache-apex-malhar-3.7.0-source-release.tar.gz
> > > > > Maven staging repository:
> > > > > https://repository.apache.org/content/repositories/
> > orgapacheapex-1022/
> > > > >
> > > > > Git source:
> > > > > https://git-wip-us.apache.org/repos/asf?p=apex-malhar.git;a=
> > > > > commit;h=refs/tags/v3.7.0-RC1
> > > > >  (commit: 207b373f2f828636a03294bb388a279d03c5593b)
> > > > >
> > > > > PGP key:
> > > > > http://pgp.mit.edu:11371/pks/lookup?op=vindex=thw@
> apache.org
> > > > > KEYS file:
> > > > > https://dist.apache.org/repos/dist/release/apex/KEYS
> > > > >
> > > > > More information at:
> > > > > http://apex.apache.org
> > > > >
> > > > > Please try the release and vote; vote will be open for at least 72
> > > hours.
> > > > >
> > > > > [ ] +1 approve (and what verification was done)
> > > > > [ ] -1 disapprove (and reason why)
> > > > >
> > > > > http://www.apache.org/foundation/voting.html
> > > > >
> > > > > How to verify release candidate:
> > > > >
> > > > > http://apex.apache.org/verification.html
> > > > >
> > > > > Thanks,
> > > > > Thomas
> > > > >
> > > >
> > >
> >
>


Re: Dependencies on libraries licensed as Category X

2017-03-29 Thread Amol Kekre
+1. We should take this as a way to reduce the cost of submitting code for
new contributors, in addition to the fact that it is a must-do.

Thks
Amol





On Wed, Mar 29, 2017 at 10:27 AM, Pramod Immaneni 
wrote:

> It will be good to mention in the guidelines the known compatible licenses
> and the incompatible ones, plus additional guidance on the general
> characteristics of a compatible license (e.g., free to use, no requirement
> to open-source derived code, etc.), because there may be dependencies that
> don't use well-known licenses but still fit the bill of a compatible license.
>
> Thanks
>
> On Wed, Mar 29, 2017 at 9:43 AM, Thomas Weise  wrote:
>
> > +1 in general more attention to licensing is needed and we may need to
> > update the contributor guidelines also.
> >
> > Can you create a JIRA with fix version set to 3.8.0 ?
> >
> >
> > On Wed, Mar 29, 2017 at 9:34 AM, Vlad Rozov 
> > wrote:
> >
> > > There are few samples and the benchmark application in Malhar that
> depend
> > > on libraries licensed under Category X. All such dependencies need to
> be
> > > either optional, be replaced with libraries that are compatible with
> > Apache
> > > license or be removed. Any newly introduced dependency should be either
> > > compatible with the Apache license or be optional.
> > >
> > > Thank you,
> > >
> > > Vlad
> > >
> > >
> >
>


Re: PR merge policy

2017-03-24 Thread Amol Kekre
Pramod,
That is a good idea. A timeout will help make the PR process more efficient.

Thks
Amol




On Fri, Mar 24, 2017 at 11:42 AM, Pramod Immaneni 
wrote:

> For the PR part with multiple reviewers, I suggest we pick a convention
> like the main reviewer has to wait till all the other reviewers say
> something like LGTM or a timeout period like 2 days before merging it. This
> will remove ambiguity, especially in cases where reviewers come in
> later after review has been happening for a while and it is unclear whether
> they have started the review and will have more comments or are just
> making singular comments. We, of course, do not want to encourage the
> behavior and would prefer interested folks join the review process early
> but in reality, these situations happen.
>
> On Fri, Mar 24, 2017 at 8:38 AM, Thomas Weise  wrote:
>
> > +1
> >
> > There are also cases where PRs have been under review by multiple people
> > that are suddenly unilaterally rebased and merged.
> >
> > Furthermore those that review and merge should follow the contributor
> > guidelines (or improve them). For example, JIRAs are supposed to be
> > resolved and marked with the fix version when the PR is merged.
> >
> > Thomas
> >
> >
> >
> >
> >
> >
> > On Thu, Mar 23, 2017 at 12:58 PM, Vlad Rozov 
> > wrote:
> >
> > > Lately there were a few instances where PRs opened against apex-core
> > > and apex-malhar were merged just a few hours after being opened and the
> > > JIRA being raised, without giving other contributors a chance to review
> > > and comment. I'd suggest that we stop such practice no matter how
> > > trivial those changes are. This equally applies to documentation. In the
> > > rare case where a PR is urgent (for example one that fixes a compilation
> > > error), I'd suggest that a committer who plans to merge the PR sends an
> > > explicit notification to dev@apex and gives others a reasonable time to
> > > respond.
> > >
> > > Thank you,
> > >
> > > Vlad
> > >
> > >
> >
>


Re: Operator Node Affinity

2017-03-02 Thread Amol Kekre
Ilya,
Put all nodes on the load-balancer list, and only the ones that get the
operator JVM will respond to the load-balancer's status URL. One place where
you have to tweak "do not depend on host/port of a distributed OS" is the
port number: I believe the port the load-balancer uses is fixed. You could
use a proxy that periodically figures out the host/port and redirects, but
then you have an extra hardware hop in between (uptime issue?) that negates
the load-balancer play a little. You could do a two-proxy-server solution.

Thks
Amol




On Thu, Mar 2, 2017 at 8:59 AM, Ganelin, Ilya <ilya.gane...@capitalone.com>
wrote:

> Thanks – the solution I’m leaning towards is to deploy a load balancer
> with a list of the nodes in the cluster, once Apex spins up, the load
> balancer should be able to establish connections to the deployed operators
> and route data appropriately.
>
> - Ilya Ganelin
>
>
> On 3/2/17, 8:34 AM, "Amol Kekre" <a...@datatorrent.com> wrote:
>
> Ilya,
> As Thomas says, attaching a JVM to an operator is do-able, but is
> against
> the norm in a distributed cluster. A distributed OS cannot guarantee a
> node: it could be down or not have resources. A ZK-based approach or any
> other way
> to discover endpoints post-deployment is the way to go. I think a webservice call
> through Stram to get the specifics will work too.
>
> Thks
> Amol
>
>
>
>
> On Wed, Mar 1, 2017 at 8:16 PM, Thomas Weise <t...@apache.org> wrote:
>
> > If I understand it correctly you want to run a server in an operator,
> > discover its endpoint and push data to it? The preferred way of
> doing that
> > would be to announce the endpoint through a discovery mechanism
> (such as
> > ZooKeeper or a shared file) that the upstream entity can use to find
> the
> > endpoint.
> >
> > If you are looking for a way to force deploy on a specific node,
> then have
> > a look at the OperatorContext.LOCALITY_HOST attribute (and also
> > AffinityRulesTest). AFAIK you can use a specific host name and the
> > scheduler will make best effort to get a container on that host, but
> there
> > isn't a guarantee. Generally, services running on the cluster
> shouldn't
> > make assumptions about hosts and ports and use discovery instead.
> >
> > HTH,
> > Thomas
> >
> >
> > On Wed, Mar 1, 2017 at 7:53 PM, Ganelin, Ilya <
> ilya.gane...@capitalone.com
> > >
> > wrote:
> >
> > > Hello, all – is there any way to deploy a given operator to a
> specific
> > > Node? E.g. if I’m trying to create a listener for a TCP socket
> that can
> > > then pipe data to a DAG, is there any way for the location of that
> > listener
> > > to be deterministic so an upstream entity knows what to connect to?
> > >
> > >
> > >
> > > - Ilya Ganelin
> > >
> > > [image: id:image001.png@01D1F7A4.F3D42980]
> > >
> > > --
> > >
> > > The information contained in this e-mail is confidential and/or
> > > proprietary to Capital One and/or its affiliates and may only be
> used
> > > solely in performance of work or services for Capital One. The
> > information
> > > transmitted herewith is intended only for use by the individual or
> entity
> > > to which it is addressed. If the reader of this message is not the
> > intended
> > > recipient, you are hereby notified that any review, retransmission,
> > > dissemination, distribution, copying or other use of, or taking of
> any
> > > action in reliance upon this information is strictly prohibited.
> If you
> > > have received this communication in error, please contact the
> sender and
> >

Re: Operator Node Affinity

2017-03-02 Thread Amol Kekre
Ilya,
As Thomas says, attaching a JVM to an operator is do-able, but is against
the norm in a distributed cluster. A distributed OS cannot guarantee a
node: it could be down or not have resources. A ZK-based approach or any other way
to discover endpoints post-deployment is the way to go. I think a webservice call
through Stram to get the specifics will work too.

Thks
Amol





On Wed, Mar 1, 2017 at 8:16 PM, Thomas Weise  wrote:

> If I understand it correctly you want to run a server in an operator,
> discover its endpoint and push data to it? The preferred way of doing that
> would be to announce the endpoint through a discovery mechanism (such as
> ZooKeeper or a shared file) that the upstream entity can use to find the
> endpoint.
>
> If you are looking for a way to force deploy on a specific node, then have
> a look at the OperatorContext.LOCALITY_HOST attribute (and also
> AffinityRulesTest). AFAIK you can use a specific host name and the
> scheduler will make best effort to get a container on that host, but there
> isn't a guarantee. Generally, services running on the cluster shouldn't
> make assumptions about hosts and ports and use discovery instead.
>
> HTH,
> Thomas
>
>
> On Wed, Mar 1, 2017 at 7:53 PM, Ganelin, Ilya  >
> wrote:
>
> > Hello, all – is there any way to deploy a given operator to a specific
> > Node? E.g. if I’m trying to create a listener for a TCP socket that can
> > then pipe data to a DAG, is there any way for the location of that
> listener
> > to be deterministic so an upstream entity knows what to connect to?
> >
> >
> >
> > - Ilya Ganelin
> >
> >
>
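
The shared-file variant of the discovery mechanism Thomas mentions could look
roughly like this. The file layout and method names are assumptions for
illustration, not an Apex API; in production the shared file would live on a
filesystem all containers can reach (e.g. HDFS):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

class EndpointDiscovery {
  // Called from the operator once its server socket is bound: announce
  // where we listen. Write via a temp file + atomic move so readers never
  // observe a partially written endpoint.
  static void announce(Path sharedFile, String host, int port) throws IOException {
    Path tmp = sharedFile.resolveSibling(sharedFile.getFileName() + ".tmp");
    Files.writeString(tmp, host + ":" + port);
    Files.move(tmp, sharedFile,
        StandardCopyOption.REPLACE_EXISTING, StandardCopyOption.ATOMIC_MOVE);
  }

  // Called by the upstream entity to find the endpoint to connect to.
  // Returns {host, port-as-string}.
  static String[] discover(Path sharedFile) throws IOException {
    return Files.readString(sharedFile).trim().split(":");
  }
}
```

A ZooKeeper ephemeral node would serve the same purpose with the added benefit
that the announcement disappears automatically when the operator's container
dies.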


Re: APEXCORE-619 Recovery windowId in future during application relaunch.

2017-03-01 Thread Amol Kekre
Hmm! The fact that committedWindowId has moved up (right now in memory of
Stram) should mean that a complete set of checkpoints is available, i.e.,
committedWindowId can be derived. Let's say the next checkpoint window also
gets checkpointed across the app, and committedWindowId is in memory but not
written to Stram state yet; then upon relaunch the latest committedWindowId
should get computed correctly.

This may be just about setting stateless operators to committedWindowId on
relaunch? aka bug/feature?

Thks
Amol




On Wed, Mar 1, 2017 at 1:41 PM, Pramod Immaneni <pra...@datatorrent.com>
wrote:

> Do we need to save committedWindowId? Can't it be computed from existing
> checkpoints by walking through the DAG? We probably do this anyway and I
> suspect there is a minor bug somewhere in there. If an operator is
> stateless you could assume its checkpoint is Long.MAX_VALUE for the sake of
> computation and compute the committed window as the lowest common
> checkpoint. If they are all stateless and you end up with Long.MAX_VALUE,
> you can start with a window id that reflects the current timestamp.
>
> Thanks
>
> On Wed, Mar 1, 2017 at 1:09 PM, Amol Kekre <a...@datatorrent.com> wrote:
>
> > CommitWindowId could be computed from the existing checkpoints. That
> > solution still needs purge to be done after commitWindowId is confirmed
> to
> > be saved in Stram state. Without this the commitWindowId computed from the
> > checkpoints may have some checkpoints missing.
> >
> > Thks
> > Amol
> >
> >
> >
> >
> > On Wed, Mar 1, 2017 at 12:36 PM, Pramod Immaneni <pra...@datatorrent.com
> >
> > wrote:
> >
> > > Can't the committedWindowId be calculated by looking at the physical
> plan
> > > and the existing checkpoints?
> > >
> > > On Wed, Mar 1, 2017 at 5:34 AM, Tushar Gosavi <tus...@apache.org>
> wrote:
> > >
> > > > Help Needed for APEXCORE-619
> > > >
> > > > Issue: When an application is relaunched after a long time with stateless
> > > > operators at the end of the DAG, the stateless operators start with
> a
> > > very
> > > > high windowId. In this case the stateless operator ignores all the
> data
> > > > received till upstream operator catches up with it. This breaks the
> > > > *at-least-once* guarantee while relaunch of the operator or when
> master
> > > is
> > > > killed and application is restarted.
> > > >
> > > > Solutions:
> > > > - Fix windowId for stateless leaf operators from upstream operator.
> But
> > > it
> > > > has some issues when we have a join with two upstream operators at
> > > > different windowId. If we set the windowID to min(upstream windowId),
> > > then
> > > > we need to recalculate the new recovery window ids for upstream
> > > paths
> > > > from this operators.
> > > >
> > > > - Another solution is to create an empty file in checkpoint directory
> for
> > > > stateless operators. This will help us to identify the checkpoints of
> > > > stateless operators during relaunch instead of computing from latest
> > > > timestamp.
> > > >
> > > > - Bring the entire DAG to committedWindowId. This could be achieved
> > using
> > > > writing committedWindowId in a journal. we need to make sure that we
> > are
> > > > not puring the checkpointed state until the committedWundowId is
> saved
> > in
> > > > journal.
> > > >
> > > > Let me know your thoughts on this and the preferred solution.
> > > >
> > > > Regards,
> > > > -Tushar.
> > > >
> > >
> >
>


Re: APEXCORE-619 Recovery windowId in future during application relaunch.

2017-03-01 Thread Amol Kekre
CommitWindowId could be computed from the existing checkpoints. That
solution still needs purge to be done after commitWindowId is confirmed to
be saved in Stram state. Without this the commitWindowId computed from the
checkpoints may have some checkpoints missing.

Thks
Amol





On Wed, Mar 1, 2017 at 12:36 PM, Pramod Immaneni 
wrote:

> Can't the committedWindowId be calculated by looking at the physical plan
> and the existing checkpoints?
>
> On Wed, Mar 1, 2017 at 5:34 AM, Tushar Gosavi  wrote:
>
> > Help Needed for APEXCORE-619
> >
> > Issue: When an application is relaunched after a long time with stateless
> > operators at the end of the DAG, the stateless operators start with a
> > very high windowId. In this case the stateless operator ignores all the
> > data received till the upstream operator catches up with it. This breaks
> > the *at-least-once* guarantee on relaunch of the operator or when the
> > master is killed and the application is restarted.
> >
> > Solutions:
> > - Fix the windowId for stateless leaf operators from the upstream
> > operator. But it has some issues when we have a join with two upstream
> > operators at different windowIds. If we set the windowId to min(upstream
> > windowId), then we need to recalculate the new recovery window ids for
> > the upstream paths from this operator.
> >
> > - Another solution is to create an empty file in the checkpoint directory
> > for stateless operators. This will help us identify the checkpoints of
> > stateless operators during relaunch instead of computing from the latest
> > timestamp.
> >
> > - Bring the entire DAG to committedWindowId. This could be achieved by
> > writing committedWindowId to a journal. We need to make sure that we are
> > not purging the checkpointed state until the committedWindowId is saved
> > in the journal.
> >
> > Let me know your thoughts on this and the preferred solution.
> >
> > Regards,
> > -Tushar.
> >
>


Re: APEXCORE-619 Recovery windowId in future during application relaunch.

2017-03-01 Thread Amol Kekre
The third option should be it.
1. On relaunch the DAG should start at committedWindowId.
2. Pruning of checkpoints should only happen after committedWindowId is
written to Stram state.

Thks
Amol





On Wed, Mar 1, 2017 at 5:34 AM, Tushar Gosavi  wrote:

> Help Needed for APEXCORE-619
>
> Issue: When an application is relaunched after a long time with stateless
> operators at the end of the DAG, the stateless operators start with a very
> high windowId. In this case the stateless operator ignores all the data
> received till the upstream operator catches up with it. This breaks the
> *at-least-once* guarantee on relaunch of the operator or when the master is
> killed and the application is restarted.
>
> Solutions:
> - Fix the windowId for stateless leaf operators from the upstream operator.
> But it has some issues when we have a join with two upstream operators at
> different windowIds. If we set the windowId to min(upstream windowId), then
> we need to recalculate the new recovery window ids for the upstream paths
> from this operator.
>
> - Another solution is to create an empty file in the checkpoint directory
> for stateless operators. This will help us identify the checkpoints of
> stateless operators during relaunch instead of computing from the latest
> timestamp.
>
> - Bring the entire DAG to committedWindowId. This could be achieved by
> writing committedWindowId to a journal. We need to make sure that we are
> not purging the checkpointed state until the committedWindowId is saved in
> the journal.
>
> Let me know your thoughts on this and the preferred solution.
>
> Regards,
> -Tushar.
>
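
Pramod's suggestion above — take the lowest common checkpoint, with stateless
operators contributing Long.MAX_VALUE so they never constrain the result — can
be sketched as follows (the map-based representation is illustrative, not the
actual Stram data structure):

```java
import java.util.Map;

class CommittedWindow {
  // Checkpoint window id per physical operator; a stateless operator is
  // represented as Long.MAX_VALUE so it never lowers the committed window.
  static long compute(Map<String, Long> checkpointByOperator) {
    long committed = Long.MAX_VALUE;
    for (long checkpoint : checkpointByOperator.values()) {
      committed = Math.min(committed, checkpoint);
    }
    // If every operator is stateless the result stays Long.MAX_VALUE and the
    // caller would fall back to a window id derived from the current
    // timestamp, as suggested in the thread.
    return committed;
  }
}
```

As Amol notes, this computed value is only trustworthy if checkpoint purging is
deferred until the committed window has been persisted to Stram state;
otherwise the minimum may be taken over an incomplete set of checkpoints.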


Re: example applications in malhar

2017-02-23 Thread Amol Kekre
Yes, we should merge samples into examples. Ideally the names of the
examples should be more descriptive in terms of "how to" as opposed to a
title. The PI demo, for example, shows lots of ways to do computation, so it
could be named "pi - distributed compute" in examples. Similarly, if the
other examples are named to bring out the features they demonstrate, they
would be more useful to readers.

Thks
Amol




On Thu, Feb 23, 2017 at 10:07 AM, Sanjay Pujare <san...@datatorrent.com>
wrote:

> +1 for renaming to examples. While we are at it, how about merging "samples"
> also in the new "examples" ?
>
> On Thu, Feb 23, 2017 at 9:47 AM, Munagala Ramanath <r...@datatorrent.com>
> wrote:
>
> > +1 for renaming to "examples"
> >
> > Ram
> >
> > On Thu, Feb 23, 2017 at 9:12 AM, Lakshmi Velineni <
> laks...@datatorrent.com
> > >
> > wrote:
> >
> > > I am ready to bring the examples over into the demos folder. I was
> > > wondering if anybody has any input on Thomas's suggestion to rename the
> > > demos folder to examples. I would rather do that first and then bring
> the
> > > examples over instead of doing it the other way around as that would
> lead
> > > to refactoring the new examples again.
> > >
> > > Thanks
> > >
> > > On Wed, Jan 25, 2017 at 8:12 AM, Lakshmi Velineni <
> > laks...@datatorrent.com
> > > >
> > > wrote:
> > >
> > > > Hi,
> > > >
> > > > Since the examples have little history I was planning to have two
> > > > commits for every example, one for the code as the primary author of
> > > > the example and another containing pom.xml and other changes to make
> > > > it work under malhar.
> > > >
> > > > Thanks
> > > >
> > > > On Wed, Nov 2, 2016 at 9:49 PM, Lakshmi Velineni
> > > > <laks...@datatorrent.com> wrote:
> > > > > Thanks for the suggestions and I am working on the process to
> migrate
> > > the
> > > > > examples with the guidelines you mentioned. I will send out a list
> of
> > > > > examples and the destination modules very soon.
> > > > >
> > > > >
> > > > > On Thu, Oct 27, 2016 at 1:43 PM, Thomas Weise <
> > thomas.we...@gmail.com>
> > > > > wrote:
> > > > >>
> > > > >> Maybe a good first step would be to identify which examples to
> bring
> > > > over
> > > > >> and where appropriate how to structure them in Malhar (for
> example,
> > I
> > > > see
> > > > >> multiple hdfs related apps that could go into the same Maven
> > module).
> > > > >>
> > > > >>
> > > > >> On Tue, Oct 25, 2016 at 1:00 PM, Thomas Weise <t...@apache.org>
> > wrote:
> > > > >>
> > > > >> > That would be great. There are a few things to consider when
> > working
> > > > on
> > > > >> > it:
> > > > >> >
> > > > >> > * preserve attribution
> > > > >> > * ensure there is a test that runs the application in the CI
> > > > >> > * check that dependencies have compatible licenses
> > > > >> > * maybe extract common boilerplate code from pom.xml
> > > > >> >
> > > > >> > etc.
> > > > >> >
> > > > >> > Existing examples are under https://github.com/apache/
> > > > >> > apex-malhar/tree/master/demos
> > > > >> >
> > > > >> > Perhaps we should rename it to "examples"
> > > > >> >
> > > > >> > I also propose that each app has a README and we add those for
> > > > existing
> > > > >> > apps as well.
> > > > >> >
> > > > >> > Thanks,
> > > > >> > Thomas
> > > > >> >
> > > > >> >
> > > > >> >
> > > > >> >
> > > > >> > On Tue, Oct 2

Re: example applications in malhar

2017-02-23 Thread Amol Kekre
+1 on renaming it to examples. Additionally we should ensure that the word
"demo" is not used at all. The word "demos" is very sales oriented and
should not be used. "Examples" are what folks learn from.

Thks
Amol




On Thu, Feb 23, 2017 at 9:47 AM, Munagala Ramanath <r...@datatorrent.com>
wrote:

> +1 for renaming to "examples"
>
> Ram
>
> On Thu, Feb 23, 2017 at 9:12 AM, Lakshmi Velineni <laks...@datatorrent.com
> >
> wrote:
>
> > I am ready to bring the examples over into the demos folder. I was
> > wondering if anybody has any input on Thomas's suggestion to rename the
> > demos folder to examples. I would rather do that first and then bring the
> > examples over instead of doing it the other way around as that would lead
> > to refactoring the new examples again.
> >
> > Thanks
> >
> > On Wed, Jan 25, 2017 at 8:12 AM, Lakshmi Velineni <
> laks...@datatorrent.com
> > >
> > wrote:
> >
> > > Hi,
> > >
> > > Since the examples have little history I was planning to have two
> > > commits for every example, one for the code as the primary author of
> > > the example and another containing pom.xml and other changes to make
> > > it work under malhar.
> > >
> > > Thanks
> > >
> > > On Wed, Nov 2, 2016 at 9:49 PM, Lakshmi Velineni
> > > <laks...@datatorrent.com> wrote:
> > > > Thanks for the suggestions and I am working on the process to migrate
> > the
> > > > examples with the guidelines you mentioned. I will send out a list of
> > > > examples and the destination modules very soon.
> > > >
> > > >
> > > > On Thu, Oct 27, 2016 at 1:43 PM, Thomas Weise <
> thomas.we...@gmail.com>
> > > > wrote:
> > > >>
> > > >> Maybe a good first step would be to identify which examples to bring
> > > over
> > > >> and where appropriate how to structure them in Malhar (for example,
> I
> > > see
> > > >> multiple hdfs related apps that could go into the same Maven
> module).
> > > >>
> > > >>
> > > >> On Tue, Oct 25, 2016 at 1:00 PM, Thomas Weise <t...@apache.org>
> wrote:
> > > >>
> > > >> > That would be great. There are a few things to consider when
> working
> > > on
> > > >> > it:
> > > >> >
> > > >> > * preserve attribution
> > > >> > * ensure there is a test that runs the application in the CI
> > > >> > * check that dependencies are compatible license
> > > >> > * maybe extract common boilerplate code from pom.xml
> > > >> >
> > > >> > etc.
> > > >> >
> > > >> > Existing examples are under https://github.com/apache/
> > > >> > apex-malhar/tree/master/demos
> > > >> >
> > > >> > Perhaps we should rename it to "examples"
> > > >> >
> > > >> > I also propose that each app has a README and we add those for
> > > existing
> > > >> > apps as well.
> > > >> >
> > > >> > Thanks,
> > > >> > Thomas
> > > >> >
> > > >> >
> > > >> >
> > > >> >
> > > >> > On Tue, Oct 25, 2016 at 12:49 PM, Lakshmi Velineni <
> > > >> > laks...@datatorrent.com> wrote:
> > > >> >
> > > >> >>   Can i work on this?
> > > >> >>
> > > >> >> Thanks
> > > >> >> Lakshmi Prasanna
> > > >> >>
> > > >> >> On Mon, Sep 12, 2016 at 9:41 PM, Ashwin Chandra Putta <
> > > >> >> ashwinchand...@gmail.com> wrote:
> > > >> >>
> > > >> >> > Here is the JIRA:
> > > >> >> > https://issues.apache.org/jira/browse/APEXMALHAR-2233
> > > >> >> >
> > > >> >> > On Tue, Sep 6, 2016 at 10:20 PM, Amol Kekre <
> > a...@datatorrent.com>
> > > >> &g

Re: Redshift Output Operator

2017-02-21 Thread Amol Kekre
Chaitanya,
This is good first cut. Post this work, do take a look at loading data
before file rotation.

Thks
Amol




On Mon, Feb 20, 2017 at 10:56 PM, Chaitanya Chebolu <
chaita...@datatorrent.com> wrote:

> Created JIRA for this task: APEXMALHAR-2416
>
> On Mon, Feb 13, 2017 at 4:14 PM, Chaitanya Chebolu <
> chaita...@datatorrent.com> wrote:
>
> > Hi All,
> >
> >   I am proposing Amazon Redshift output module.
> >   Please refer below link about the Redshift: https://aws.amazon.com/
> > redshift/
> >
> >   The primary functionality of this module is to load data into Redshift tables
> > from data files using the copy command. Refer to the below link about the copy
> > command:
> > http://docs.aws.amazon.com/redshift/latest/dg/r_COPY.html
> >
> > Input type to this module is byte[].
> >
> >   I am proposing the below design:
> > 1) Write the tuples into EMR/S3. By default, it writes to S3.
> > 2) Once the file is rolled, upload the file into Redshift using copy
> > command.
> >
> > Please share your thoughts on design.
> >
> > Regards,
> > Chaitanya
> >
>
>
>
> --
>
> *Chaitanya*
>
> Software Engineer
>
> E: chaita...@datatorrent.com | Twitter: @chaithu1403
>
> www.datatorrent.com  |  apex.apache.org
>
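The S3-then-COPY flow proposed in this thread boils down to issuing a Redshift COPY statement once a file is rolled. The following is a minimal, hedged sketch of assembling such a statement; the builder class, its fluent API, and the IAM-role-based credentials are illustrative assumptions, not Malhar code. Execution of the rendered SQL would happen over a JDBC connection to the cluster.

```java
// Hypothetical sketch: building the Redshift COPY statement that the
// proposed output module would issue after a file is rolled on S3.
// Class and method names are illustrative, not from Malhar.
public class RedshiftCopyCommandBuilder {
    private String tableName;
    private String s3Path;          // e.g. "s3://bucket/apex/output/part-0001"
    private String iamRole;         // credentials supplied via an IAM role ARN
    private String delimiter = ",";

    public RedshiftCopyCommandBuilder table(String t) { this.tableName = t; return this; }
    public RedshiftCopyCommandBuilder from(String p) { this.s3Path = p; return this; }
    public RedshiftCopyCommandBuilder iamRole(String r) { this.iamRole = r; return this; }
    public RedshiftCopyCommandBuilder delimiter(String d) { this.delimiter = d; return this; }

    /** Renders the COPY statement; execution would happen over JDBC. */
    public String build() {
        if (tableName == null || s3Path == null || iamRole == null) {
            throw new IllegalStateException("table, s3 path and IAM role are required");
        }
        return "COPY " + tableName
            + " FROM '" + s3Path + "'"
            + " IAM_ROLE '" + iamRole + "'"
            + " DELIMITER '" + delimiter + "'";
    }
}
```

A sink could render this once per rolled file and run it via `Statement.execute()`, which matches the two-step design (write to S3, then load on file rotation) described above.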


Re: [DISCUSS] Proposal for adapting Malhar operators for batch use cases

2017-02-18 Thread Amol Kekre
Bhupesh,
That is true, but in reality watermarks do not solve a design problem in
the DAG where data is getting mixed up. All the watermarks do is convey
"start" and "end" within the stream. The start and end control tuples
should have the physical operator id plus a monotonically increasing number.
Both of these are inserted by the engine and are not user supplied, i.e. the
engine takes up the guarantee of identifying these watermarks. This concept is
the same as our current start-window and end-window (which has worked well).

Today Apex does not have watermarks, and let's say I am sending "start
something", "end something" through another port. I would still need to avoid
mixing data in a transform operator downstream. That problem exists today and
will continue. Putting the filename on every tuple is too much of a performance
hit. Secondly, a lot of batch operations are not file related (i.e. file to
file); they are collections of "data" split into part files (for performance
reasons), and grouping/dimensions/event time/... are done based on the
internals of the file. In the case of file-to-file copy, the user should be
expected to route the data properly (parallel partition?).

Event-time based watermarks need a separate thread. I am certain that the
engine will need to be event-time aware, and will need to take this into
account for proper layout.

Thks
Amol



On Sat, Feb 18, 2017 at 8:17 AM, Bhupesh Chawda <bhup...@datatorrent.com>
wrote:

> Amol, agreed. We can address event time based watermarks once file batch is
> done.
> Regarding file batch support: by allowing an input (file)
> operator to be partitioned, we are implicitly mixing multiple batches. Even if the user does
> not do any transformations, we should be able to write the correct data to
> right files at the destination.
>
> ~ Bhupesh
>
>
> ___
>
> Bhupesh Chawda
>
> Software Engineer
>
> E: bhup...@datatorrent.com | Twitter: @bhupeshsc
>
> www.datatorrent.com  |  apex.apache.org
>
>
>
> On Sat, Feb 18, 2017 at 12:26 PM, Amol Kekre <a...@datatorrent.com> wrote:
>
> > Thomas,
> > The watermarks we have in Apex (start-window and end-window) are working
> > good. It is fine to take a look at event time, but basic file I/O does
> not
> > need anything more than start and end. Lets say they are start-something,
> > end-something. The main difference here is that the tuples are user
> > generated, other than that they should follow similar principle as
> > start-window & end-window. The commonality includes
> > - dedup of start-st and end-st
> > - First start-st passes through
> > - Last end-st passes through
> > - Engine identifies them with chronologically increasing number and
> source
> >
> > The only main difference is that an emit of these is user controlled and
> > cannot be guaranteed to happen as such. BTW, part files are rarely done
> > based on event time, they are almost always split by size. A vast
> majority
> > of batch cases have hourly files bound by arrival time and not event
> time.
> >
> > Bhupesh,
> > Attaching file names to tuples does not scale. If user mixes two batches,
> > then the user would need to handle the transformations. Post file batch
> > support, we should look at event time support. Unlike file based batches,
> > event time will overlap each other, i.e. at a given time at least two (if
> > not more) event times will be active. I think the engine will need to be
> > event time aware.
> >
> > Thks
> > Amol
> >
> >
> >
> >
> > On Wed, Feb 15, 2017 at 9:07 PM, Thomas Weise <t...@apache.org> wrote:
> >
> > > I don't think this should be designed based on a simplistic file
> > > input-output scenario. It would be good to include a stateful
> > > transformation based on event time.
> > >
> > > More complex pipelines contain stateful transformations that depend on
> > > windowing and watermarks. I think we need a watermark concept that is
> > based
> > > on progress in event time (or other monotonic increasing sequence) that
> > > other operators 

Re: [DISCUSS] Proposal for adapting Malhar operators for batch use cases

2017-02-18 Thread Amol Kekre
Thomas,
I believe Bhupesh's proposal is to have a monotonically increasing
watermark and filename as extra information. The usage of "file start" may
have caused confusion. I agree, we do not need an explicit "file start"
watermark. I am at a loss for words, maybe "start <something>" -> "end
<something>"; and then a "final-all done" watermark.

Thks
Amol




On Sat, Feb 18, 2017 at 8:54 AM, Thomas Weise  wrote:

> Hi Bhupesh,
>
> I think this needs a generic watermark concept that is independent of
> source and destination and can be understood by intermediate
> transformations. File names don't meet this criteria.
>
> One possible approach is to have a monotonic increasing file sequence
> (instead of time, if it is not applicable) that can be mapped to watermark.
> You can still tag on the file name to the control tuple as extra
> information so that a file output operator that understands it can do
> whatever it wants with it. But it should also work without it, let's say
> when we write the output to the console.
>
> The key here is that you can demonstrate that an intermediate stateful
> transformation will work. I would suggest to try wordcount per input file
> with the window operator that emits the counts at file boundary, without
> knowing anything about files.
>
> Thanks,
> Thomas
>
>
> On Sat, Feb 18, 2017 at 8:04 AM, Bhupesh Chawda 
> wrote:
>
> > Hi Thomas,
> >
> > For an input operator which is supposed to generate watermarks for
> > downstream operators, I can think about the following watermarks that the
> > operator can emit:
> > 1. Time based watermarks (the high watermark / low watermark)
> > 2. Number of tuple based watermarks (Every n tuples)
> > 3. File based watermarks (Start file, end file)
> > 4. Final watermark
> >
> > File based watermarks seem to be applicable for batch (file based) as
> well,
> > and hence I thought of looking at these first. Does this seem to be in
> line
> > with the thought process?
> >
> > ~ Bhupesh
> >
> >
> >
> > ___
> >
> > Bhupesh Chawda
> >
> > Software Engineer
> >
> > E: bhup...@datatorrent.com | Twitter: @bhupeshsc
> >
> > www.datatorrent.com  |  apex.apache.org
> >
> >
> >
> > On Thu, Feb 16, 2017 at 10:37 AM, Thomas Weise  wrote:
> >
> > > I don't think this should be designed based on a simplistic file
> > > input-output scenario. It would be good to include a stateful
> > > transformation based on event time.
> > >
> > > More complex pipelines contain stateful transformations that depend on
> > > windowing and watermarks. I think we need a watermark concept that is
> > based
> > > on progress in event time (or other monotonic increasing sequence) that
> > > other operators can generically work with.
> > >
> > > Note that even file input in many cases can produce time based
> > watermarks,
> > > for example when you read part files that are bound by event time.
> > >
> > > Thanks,
> > > Thomas
> > >
> > >
> > > On Wed, Feb 15, 2017 at 4:02 AM, Bhupesh Chawda <
> bhup...@datatorrent.com
> > >
> > > wrote:
> > >
> > > > For better understanding the use case for control tuples in batch, ​I
> > am
> > > > creating a prototype for a batch application using File Input and
> File
> > > > Output operators.
> > > >
> > > > To enable basic batch processing for File IO operators, I am
> proposing
> > > the
> > > > following changes to File input and output operators:
> > > > 1. File Input operator emits a watermark each time it opens and
> closes
> > a
> > > > file. These can be "start file" and "end file" watermarks which
> include
> > > the
> > > > corresponding file names. The "start file" tuple should be sent
> before
> > > any
> > > > of the data from that file flows.
> > > > 2. File Input operator can be configured to end the application
> after a
> > > > single or n scans of the directory (a batch). This is where the
> > operator
> > > > emits the final watermark (the end of application control tuple).
> This
> > > will
> > > > also shutdown the application.
> > > > 3. The File output operator handles these control tuples. "Start
> file"
> > > > initializes the file name for the incoming tuples. "End file"
> watermark
> > > > forces a finalize on that file.
> > > >
> > > > The user would be able to enable the operators to send only those
> > > > watermarks that are needed in the application. If none of the options
> > are
> > > > configured, the operators behave as in a streaming application.
> > > >
> > > > There are a few challenges in the implementation where the input
> > operator
> > > > is partitioned. In this case, the correlation between the start/end
> > for a
> > > > file and the data tuples for that file is lost. Hence we need to
> 
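Thomas's suggested test above — wordcount per input file, emitted at the file boundary, where the counting logic knows nothing about files — can be sketched as follows. This is a minimal illustration under assumed names, not the Malhar window operator: the accumulation only reacts to generic start/end control signals and flushes its state when the end watermark arrives.

```java
// A self-contained sketch of per-batch word counting: the operator never
// inspects file names, it only reacts to start/end control signals.
// Names are illustrative; this is not actual Malhar code.
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PerFileWordCount {
    private final Map<String, Integer> counts = new HashMap<>();
    private final List<Map<String, Integer>> emitted = new ArrayList<>();

    /** Called on the "start batch" control tuple: reset state. */
    public void onStartBatch() { counts.clear(); }

    /** Called per data tuple. */
    public void onWord(String word) {
        counts.merge(word, 1, Integer::sum);
    }

    /** Called on the "end batch" watermark: flush the counts and reset. */
    public void onEndBatch() {
        emitted.add(new HashMap<>(counts));
        counts.clear();
    }

    public List<Map<String, Integer>> results() { return emitted; }
}
```

The key property being demonstrated is the one argued for in the thread: an intermediate stateful transformation that works purely off the watermark protocol, regardless of whether the boundary came from a file, a directory scan, or some other monotonic sequence.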

Re: [DISCUSS] Proposal for adapting Malhar operators for batch use cases

2017-02-17 Thread Amol Kekre
Thomas,
The watermarks we have in Apex (start-window and end-window) are working
well. It is fine to take a look at event time, but basic file I/O does not
need anything more than start and end. Let's say they are start-something,
end-something. The main difference here is that the tuples are user
generated; other than that, they should follow similar principles as
start-window & end-window. The commonality includes:
- dedup of start-st and end-st
- First start-st passes through
- Last end-st passes through
- Engine identifies them with a chronologically increasing number and source

The only main difference is that emitting these is user controlled and
cannot be guaranteed to happen as such. BTW, part files are rarely split
based on event time; they are almost always split by size. A vast majority
of batch cases have hourly files bound by arrival time and not event time.

Bhupesh,
Attaching file names to tuples does not scale. If the user mixes two batches,
then the user would need to handle the transformations. After file batch
support, we should look at event-time support. Unlike file based batches,
event times will overlap each other, i.e. at a given time at least two (if
not more) event times will be active. I think the engine will need to be
event time aware.

Thks
Amol





On Wed, Feb 15, 2017 at 9:07 PM, Thomas Weise  wrote:

> I don't think this should be designed based on a simplistic file
> input-output scenario. It would be good to include a stateful
> transformation based on event time.
>
> More complex pipelines contain stateful transformations that depend on
> windowing and watermarks. I think we need a watermark concept that is based
> on progress in event time (or other monotonic increasing sequence) that
> other operators can generically work with.
>
> Note that even file input in many cases can produce time based watermarks,
> for example when you read part files that are bound by event time.
>
> Thanks,
> Thomas
>
>
> On Wed, Feb 15, 2017 at 4:02 AM, Bhupesh Chawda 
> wrote:
>
> > For better understanding the use case for control tuples in batch, ​I am
> > creating a prototype for a batch application using File Input and File
> > Output operators.
> >
> > To enable basic batch processing for File IO operators, I am proposing
> the
> > following changes to File input and output operators:
> > 1. File Input operator emits a watermark each time it opens and closes a
> > file. These can be "start file" and "end file" watermarks which include
> the
> > corresponding file names. The "start file" tuple should be sent before
> any
> > of the data from that file flows.
> > 2. File Input operator can be configured to end the application after a
> > single or n scans of the directory (a batch). This is where the operator
> > emits the final watermark (the end of application control tuple). This
> will
> > also shutdown the application.
> > 3. The File output operator handles these control tuples. "Start file"
> > initializes the file name for the incoming tuples. "End file" watermark
> > forces a finalize on that file.
> >
> > The user would be able to enable the operators to send only those
> > watermarks that are needed in the application. If none of the options are
> > configured, the operators behave as in a streaming application.
> >
> > There are a few challenges in the implementation where the input operator
> > is partitioned. In this case, the correlation between the start/end for a
> > file and the data tuples for that file is lost. Hence we need to maintain
> > the filename as part of each tuple in the pipeline.
> >
> > The "start file" and "end file" control tuples in this example are
> > temporary names for watermarks. We can have generic "start batch" / "end
> > batch" tuples which could be used for other use cases as well. The Final
> > watermark is common and serves the same purpose in each case.
> >
> > Please let me know your thoughts on this.
> >
> > ~ Bhupesh
> >
> >
> >
> > On Wed, Jan 18, 2017 at 12:22 AM, Bhupesh Chawda <
> bhup...@datatorrent.com>
> > wrote:
> >
> > > Yes, this can be part of operator configuration. Given this, for a user
> > to
> > > define a batch application, would mean configuring the connectors
> (mostly
> > > the input operator) in the application for the desired behavior.
> > Similarly,
> > > there can be other use cases that can be achieved other than batch.
> > >
> > > We may also need to take care of the following:
> > > 1. Make sure that the watermarks or control tuples are consistent
> across
> > > sources. Meaning an HDFS sink should be able to interpret the watermark
> > > tuple sent out by, say, a JDBC source.
> > > 2. In addition to I/O connectors, we should also look at the need 

Re: PojoInnerJoin Accumulation emitting Map

2017-02-15 Thread Amol Kekre
yes it should be POJO

Thks
Amol



On Wed, Feb 15, 2017 at 7:34 AM, AJAY GUPTA  wrote:

> Yes, it should be emitting a POJO. This POJO can then be further used to
> join with a third POJO stream, thus behaving like a DB join.
>
> It would be best if we can incorporate this into Schema Discovery design.
>
>
> Ajay
>
> On Wed, 15 Feb 2017 at 5:30 PM, Chinmay Kolhatkar 
> wrote:
>
> Dear Community,
>
> Currently PojoInnerJoin accumulation is accepting 2 POJOs but emitting a
> Map.
>
> I think it should be emitting POJO instead of Map.
>
> Please share your thoughts about this.
>
> Thanks,
> Chinmay.
>
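The behavior being proposed — the inner-join accumulation emitting a typed POJO instead of a Map, so the result can feed a further join like a DB join — can be sketched as below. The POJO types, the join key, and the manual field copy are illustrative assumptions; the real accumulation would derive the output type from the discovered schema.

```java
// Hedged sketch of an inner join that emits a typed output POJO rather
// than a Map. All types and field names here are illustrative, not from
// the Malhar PojoInnerJoin accumulation.
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PojoInnerJoin {
    public static class Customer {
        public int id; public String name;
        public Customer(int id, String name) { this.id = id; this.name = name; }
    }
    public static class Order {
        public int customerId; public double amount;
        public Order(int customerId, double amount) { this.customerId = customerId; this.amount = amount; }
    }
    /** Typed join result emitted instead of a Map. */
    public static class CustomerOrder {
        public int id; public String name; public double amount;
    }

    public static List<CustomerOrder> join(List<Customer> customers, List<Order> orders) {
        Map<Integer, Customer> byId = new HashMap<>();
        for (Customer c : customers) { byId.put(c.id, c); }
        List<CustomerOrder> out = new ArrayList<>();
        for (Order o : orders) {
            Customer c = byId.get(o.customerId);
            if (c == null) { continue; }          // inner join: drop unmatched rows
            CustomerOrder co = new CustomerOrder();
            co.id = c.id; co.name = c.name; co.amount = o.amount;
            out.add(co);
        }
        return out;
    }
}
```

Because the output is a POJO stream, it can be joined again with a third POJO stream downstream, which is exactly the chaining Ajay describes above.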


Re: [DISCUSS] Custom Control Tuples Design

2017-02-15 Thread Amol Kekre
ith propagation of control tuples is the
> > > operator
> > > > >> developer. I think the clear way for the operator developer to
> > > override
> > > > >> the
> > > > >> propagation behavior is in code and if that is possible there is
> no
> > > need
> > > > >> for other things such as attributes or other port level settings.
> > > > >>
> > > > >> Thomas
> > > > >>
> > > > >>
> > > > >> On Wed, Jan 4, 2017 at 10:20 PM, Bhupesh Chawda <
> > > > bhup...@datatorrent.com>
> > > > >> wrote:
> > > > >>
> > > > >>> I think we all agree on the use case for selective propagation.
> The
> > > > >>> question is about where to have the control - at the operator
> level
> > > or
> > > > at
> > > > >>> the port level.
> > > > >>>
> > > > >>> For this ability, we have the following options:
> > > > >>>
> > > > >>> 1. Operator disables the propagation on selected output
> ports.
> > > > Other
> > > > >>> output ports propagate by default.
> > > > >>> 2. Operator disables propagation for the entire operator (by
> > > means
> > > > of
> > > > >>
> > > > >> an
> > > > >>>
> > > > >>> attribute). Operator developer explicitly emits the received
> > > > control
> > > > >>> tuples
> > > > >>> on selected output ports.
> > > > >>>
> > > > >>> If the decision is to completely block the propagation, then
> > Option 2
> > > > is
> > > > >>> easier to use as just an attribute needs to be set, as opposed to
> > > > Option
> > > > >>> 1
> > > > >>> where user needs to set the annotation on each output port.
> > > > >>>
> > > > >>> However, if selective propagation is needed, Option 1 would just
> > need
> > > > the
> > > > >>> user to disable propagation on certain ports; rest are propagated
> > by
> > > > >>> default, while Option 2 requires the user to explicitly emit the
> > > > control
> > > > >>> tuples.
> > > > >>> ~ Bhupesh
> > > > >>>
> > > > >>>
> > > > >>> On Thu, Jan 5, 2017 at 3:46 AM, Thomas Weise <t...@apache.org>
> > wrote:
> > > > >>>
> > > > >>>> Yes, I think that for any of these cases the operator developer
> > will
> > > > >>
> > > > >> turn
> > > > >>>>
> > > > >>>> of implicit propagation for the operator and then write the code
> > to
> > > > >>
> > > > >> route
> > > > >>>>
> > > > >>>> or create control tuples as needed.
> > > > >>>>
> > > > >>>> Thomas
> > > > >>>>
> > > > >>>> On Wed, Jan 4, 2017 at 12:59 PM, Amol Kekre <
> a...@datatorrent.com
> > >
> > > > >>>
> > > > >>> wrote:
> > > > >>>>>
> > > > >>>>> I agree that by default the propagation must be implicit, i.e.
> if
> > > the
> > > > >>>>> operator does nothing, the control tuple propagates. I do think
> > > users
> > > > >>>>> should have control on deciding to "not propagate" or "create
> > new"
> > > > and
> > > > >>>
> > > > >>> in
> > > > >>>>>
> > > > >>>>> these cases they would need to do something explicit
> (override)?
> > > > >>>>>
> > > > >>>>> The following cases come to mind
> > > > >>>>> 1. Sole consumer of a particular control signal (for example
> end
> > of
> > > > >>>
> > > > >>> file)
> > > > >>>>>
> > > > >>>>> 2. Creator of a particular control signal (start of file, or a
> > > signal
> > >
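Option 1 debated above — propagation on by default, with the operator developer disabling it on selected output ports — can be illustrated with a tiny per-port flag, as in this sketch. Nothing here is actual Apex engine API; the class and method names are assumptions for illustration only.

```java
// Illustrative sketch of Option 1 from the discussion: control tuples
// propagate on every output port by default, and the operator developer
// opts out per port. Not actual Apex engine code.
import java.util.HashMap;
import java.util.Map;

public class ControlTuplePropagation {
    private final Map<String, Boolean> propagateByPort = new HashMap<>();

    /** Ports propagate control tuples unless explicitly disabled. */
    public void disablePropagation(String port) {
        propagateByPort.put(port, false);
    }

    public boolean shouldPropagate(String port) {
        return propagateByPort.getOrDefault(port, true);
    }
}
```

Under Option 2, the equivalent effect would be a single operator-level attribute plus explicit emits in code; the trade-off discussed above is between the convenience of the default-on behavior and the fine-grained control of explicit emission.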

Re: [DISCUSS] Tweets from ApacheApex handle

2017-02-05 Thread Amol Kekre
Thomas,
The proposal is for credentials to remain with the PMC, with committers in
effect sending a request to tweet. Here is some relevant advice on do's and
don'ts from the ASF for social media. I have pasted it as is.

   - Feel free to share stories about your project, whether they come from
   the community, tech press, or folks outside of the press and community.
   Avoid sharing negative stories about "competing" projects.
   - Keep posts/reposts relevant to your project community and technology.
   Most people love LOLCats, but it's probably best not to share them from
   your project's social media accounts.
   - Please share event information so long as it's related directly to
   your project. Promoting an event where there are talks about CloudStack
   (for example) is spot-on. Promoting an event only because a vendor that has
   an interest in CloudStack is participating would be outside the scope of
   CloudStack social media accounts.

So let's say we find a vendor/customer/someone saying "I used Apex for ...;
here is my use case; .. Apex worked for me"; or a blog on some nuances of
Apex. These are stories about Apex; why would we not want to tweet/retweet
them? For the adoption of Apex, it is good for everyone to know about such
stories. Moreover, it is empowering for committers to know that they can be
the eyes and ears for Apex stories.

Thks,
Amol



On Sun, Feb 5, 2017 at 12:11 PM, Thomas Weise <t...@apache.org> wrote:

> The link that I provided gives context about how projects overall work. I
> find that useful to understand before embarking into further discussion.
> And I also found some more guidance regarding social media activity:
>
> http://www.apache.org/foundation/marks/socialmedia
>
> My preference is that the handle will only tweet official communication
> from the PMC. I looked at a number of other projects that may serve as good
> role models and found that to be the case also.
>
> For retweets, we should probably come up with similar guidelines as for
> listing events, including clean representation of Apache Apex as project,
> vendor neutrality, unrelated to commercial interests.
>
> Thomas
>
>
> On Thu, Feb 2, 2017 at 12:22 PM, Pramod Immaneni <pra...@datatorrent.com>
> wrote:
>
> > I prefer option 2 at least till there is a critical mass of social
> activity
> > and awareness happening outside around apex that we don't need to tweet
> > actively on the community page.
> >
> > Thanks
> >
> > On Thu, Feb 2, 2017 at 10:31 AM, Amol Kekre <a...@datatorrent.com>
> wrote:
> >
> > > Thomas,
> > > The discussion is not about handle being operated outside PMC. It is
> > > perfectly fine to have handle credentials accessible to only PMC
> > members. I
> > > started this discussion to get Apex community opinion on guidelines on
> > what
> > > gets tweeted. The ASF does not have a formal guideline on twitter. So
> far
> > > there are two proposals
> > >
> > > 1. Only do release tweets, and do retweet of tweets on individual
> handle
> > by
> > > requesting PMC on dev@; no special consideration to committers
> > > 2. Have a more open policy on original tweets, and in addition retweet
> of
> > > tweets on individual handle by request PMC on dev@; take up
> committer's
> > > request as semi-binding.
> > >
> > > In #1, the release tweets are clear, so nothing else goes through. In #2
> > PMC
> > > member can do a trust-but-verify that the tweet is relevant to Apex;
> > which
> > > makes it semi-binding, not binding.
> > >
> > > Thks,
> > > Amol
> > >
> > >
> > > On Wed, Feb 1, 2017 at 7:34 PM, Thomas Weise <t...@apache.org> wrote:
> > >
> > > > Note this is follow-up from a previous discussion on the PMC list.
> > > >
> > > > I think the decision on activity for the ApacheApex handle should be
> > with
> > > > the PMC and the handle should be operated by PMC members.
> > > >
> > > > See http://apache.org/foundation/how-it-works.html for more context
> > > about
> > > > roles and responsibilities.
> > > >
> > > > What gets posted on the handle should be evaluated with ASF hat on.
> > > >
> > > > I also think that direct tweets (read on for retweets) should be
> > > restricted
> > > > to official PMC communication (releases etc.).

Re: [DISCUSS] Tweets from ApacheApex handle

2017-02-02 Thread Amol Kekre
Thomas,
The discussion is not about handle being operated outside PMC. It is
perfectly fine to have handle credentials accessible to only PMC members. I
started this discussion to get Apex community opinion on guidelines on what
gets tweeted. The ASF does not have a formal guideline on twitter. So far
there are two proposals

1. Only do release tweets, and do retweet of tweets on individual handle by
requesting PMC on dev@; no special consideration to committers
2. Have a more open policy on original tweets, and in addition retweet of
tweets on individual handle by request PMC on dev@; take up committer's
request as semi-binding.

In #1, the release tweets are clear, so nothing else goes through. In #2, a PMC
member can do a trust-but-verify check that the tweet is relevant to Apex, which
makes it semi-binding, not binding.

Thks,
Amol


On Wed, Feb 1, 2017 at 7:34 PM, Thomas Weise <t...@apache.org> wrote:

> Note this is follow-up from a previous discussion on the PMC list.
>
> I think the decision on activity for the ApacheApex handle should be with
> the PMC and the handle should be operated by PMC members.
>
> See http://apache.org/foundation/how-it-works.html for more context about
> roles and responsibilities.
>
> What gets posted on the handle should be evaluated with ASF hat on.
>
> I also think that direct tweets (read on for retweets) should be restricted
> to official PMC communication (releases etc.). All other tweets should come
> from individual handles. I like the idea of retweet suggestions on the dev@
> list.
>
> Thomas
>
>
> On Wed, Feb 1, 2017 at 4:43 PM, Amol Kekre <a...@datatorrent.com> wrote:
>
> > I want to see what the community feels about posting tweets on ApacheApex
> > handle. My thoughts are that the committers should have the right to
> post a
> > tweet on ApacheApex. For contributors and anyone else, we could have a
> > mechanism where they request it on dev@apex.apache.org and a committer
> can
> > help out. In essence, we treat a tweet the same as code. I would expect
> committers
> > to ensure that the tweet is related to Apex, I believe that will more or
> > less remain true.
> >
> > Aside from this, I think the release-manager should be expected to post
> > release tweets on ApacheApex handle.
> >
> > Thoughts?
> >
> > Thks
> > Amol
> >
>


Re: relevant conferences and meetups on website

2017-02-01 Thread Amol Kekre
I am ok if there is a volunteer.

Thks
Amol


On Wed, Feb 1, 2017 at 4:34 PM, Pramod Immaneni <pra...@datatorrent.com>
wrote:

> How about asking for volunteers in the community to manually maintain the
> list. How about the specific request to add the conference link to the main
> page?
>
> Thanks
>
> On Wed, Feb 1, 2017 at 10:33 AM, Amol Kekre <a...@datatorrent.com> wrote:
>
> > I agree with Thomas on not putting meetup events. meetup.com does "index
> > by
> > Apex", and the search would simply pick up anyone putting Apex in the
> event
> > and at times they are not relevant to Apex. We get too much clutter. So
> the
> > only way out is to manually select the events, and then take them out
> after
> > they are done. This does not scale well.
> >
> > I however do not agree with a hard filter on vendor content. Apex as a
> > community is losing out on a lot of content that is being created by
> > vendor(s), which helps with adoption. In longer run google search will
> > become more relevant to Apex than apex.apache.org. This is something
> that
> > Apex community should discuss and vote on.
> >
> > Thks
> > Amol
> >
> >
> > On Wed, Feb 1, 2017 at 8:55 AM, Thomas Weise <t...@apache.org> wrote:
> >
> > > Since the conference was beforehand discussed with the PMC I think you
> > > could list it under:
> > >
> > > http://apex.apache.org/community.html#events
> > >
> > > There is already a link to all meetup groups that have "Apache Apex" as
> > > topic.
> > >
> > > As for other meetup references, IMO direct vendor meetup group
> references
> > > don't belong on the project web site.
> > >
> > > Thomas
> > >
> > >
> > >
> > > On Wed, Feb 1, 2017 at 8:27 AM, Pramod Immaneni <
> pra...@datatorrent.com>
> > > wrote:
> > >
> > > > Hi,
> > > >
> > > > I wanted to discuss a couple of items. First, as many of you are
> > aware, a
> > > > conference specifically for apex, apex big data world 2017, is coming
> > up
> > > in
> > > > the next couple of months. There is one in Pune India and another in
> > San
> > > > Jose, USA. I wanted to see if we can put links/information about it
> on
> > a
> > > > section of the website. Here is the information about the event
> > > >
> > > > http://www.apexbigdata.com/
> > > >
> > > > Other projects like flink, for example, has the details about their
> > > > conference flink forward on their website.
> > > >
> > > > Second, we used to have a section on our website that showed upcoming
> > > > meetups and conferences. Looks like it has been removed. How about we
> > > bring
> > > > it back. It can be in a different form, if there were issues with how
> > we
> > > > did it before.
> > > >
> > > > While we are discussing the second point can we still go ahead and
> put
> > up
> > > > the links to the above conference on the website as soon as possible
> as
> > > the
> > > > events are very close.
> > > >
> > > > Thanks
> > > >
> > >
> >
>


[DISCUSS] Tweets from ApacheApex handle

2017-02-01 Thread Amol Kekre
I want to see what the community feels about posting tweets on ApacheApex
handle. My thoughts are that the committers should have the right to post a
tweet on ApacheApex. For contributors and anyone else, we could have a
mechanism where they request it on dev@apex.apache.org and a committer can
help out. In essence, we treat a tweet the same as code. I would expect committers
to ensure that the tweet is related to Apex, I believe that will more or
less remain true.

Aside from this, I think the release-manager should be expected to post
release tweets on ApacheApex handle.

Thoughts?

Thks
Amol


Re: relevant conferences and meetups on website

2017-02-01 Thread Amol Kekre
I agree with Thomas on not putting up meetup events. meetup.com does index by
"Apex", but that search simply picks up anyone putting Apex in an event, and
at times those events are not relevant to Apex. We get too much clutter. So
the only way out is to manually select the events and then take them out
after they are done, which does not scale well.

I do not, however, agree with a hard filter on vendor content. Apex as a
community is losing out on a lot of content that is being created by
vendor(s), which helps with adoption. In the longer run a Google search will
become more relevant to Apex than apex.apache.org. This is something the Apex
community should discuss and vote on.

Thks
Amol


On Wed, Feb 1, 2017 at 8:55 AM, Thomas Weise  wrote:

> Since the conference was beforehand discussed with the PMC I think you
> could list it under:
>
> http://apex.apache.org/community.html#events
>
> There is already a link to all meetup groups that have "Apache Apex" as
> topic.
>
> As for other meetup references, IMO direct vendor meetup group references
> don't belong on the project web site.
>
> Thomas
>
>
>
> On Wed, Feb 1, 2017 at 8:27 AM, Pramod Immaneni 
> wrote:
>
> > Hi,
> >
> > I wanted to discuss a couple of items. First, as many of you are aware, a
> > conference specifically for apex, apex big data world 2017, is coming up
> in
> > the next couple of months. There is one in Pune India and another in San
> > Jose, USA. I wanted to see if we can put links/information about it on a
> > section of the website. Here is the information about the event
> >
> > http://www.apexbigdata.com/
> >
> > Other projects like flink, for example, has the details about their
> > conference flink forward on their website.
> >
> > Second, we used to have a section on our website that showed upcoming
> > meetups and conferences. Looks like it has been removed. How about we
> bring
> > it back. It can be in a different form, if there were issues with how we
> > did it before.
> >
> > While we are discussing the second point can we still go ahead and put up
> > the links to the above conference on the website as soon as possible as
> the
> > events are very close.
> >
> > Thanks
> >
>


Re: Upgrade Apache Bigtop to Apex Core 3.5.0

2017-01-20 Thread Amol Kekre
+1

Thks
Amol


On Thu, Jan 19, 2017 at 8:55 PM, Priyanka Gugale 
wrote:

> +1
>
> On Fri, Jan 20, 2017 at 9:25 AM, Chinmay Kolhatkar <
> chin...@datatorrent.com>
> wrote:
>
> > Sanjay,
> > It's not a lot of work, just a version change, but primarily following
> > the Apache process for Bigtop.
> > Powered by Page of bigtop is here:
> > https://cwiki.apache.org/confluence/display/BIGTOP/Powered+By+Bigtop
> > Looking at the existing content, I would like to know what we can add
> > there.
> >
> > All,
> > Thanks for feedback. I'll start communication on bigtop mailing list for
> > version upgrade.
> >
> >
> >
> > On Fri, Jan 20, 2017 at 12:09 AM, Sanjay Pujare 
> > wrote:
> >
> > > +1 assuming not a lot of work is involved.
> > >
> > > What does it take to add a mention of Apex in
> http://bigtop.apache.org/
> > ?
> > >
> > > On 1/19/17, 8:12 AM, "Pramod Immaneni"  wrote:
> > >
> > > +1
> > >
> > > On Thu, Jan 19, 2017 at 12:53 AM, Chinmay Kolhatkar <
> > > chin...@datatorrent.com
> > > > wrote:
> > >
> > > > Dear Community,
> > > >
> > > > Now that Apex core 3.5.0 is released, is it good time to upgrade
> > > Apache
> > > > Bigtop for Apex to 3.5.0?
> > > >
> > > > -Chinmay.
> > > >
> > >
> > >
> > >
> > >
> >
>


Re: Shutdown of an Apex app

2017-01-18 Thread Amol Kekre
Agreed on the input adapter way.

Enforcing that all input adapters send shutdown in the same window cannot be
guaranteed, so there will be a corner case where windows are not aligned.
Asking Stram to schedule the shutdown for a window far in the future reduces
the scope of the corner case, but then defeats the purpose. Even for a single
logical input adapter, all of its partitions will need to issue the shutdown
control tuple.

So a best-effort attempt to align within a window should be made.

Thks
Amol
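The alignment corner case described above can be made concrete with a small
toy model (this is illustrative only, not the Apex API): each input partition
requests shutdown at some window id, and a downstream operator that merges
them can only shut down once every upstream partition's shutdown marker has
arrived, i.e. at the maximum of the requested windows.

```java
import java.util.Arrays;

// Toy model of window-aligned shutdown across input partitions.
// Names are hypothetical; nothing here is Apex code.
public class ShutdownAlignment {

    // Window at which the merged stream can actually complete shutdown:
    // the latest shutdown request among all upstream partitions.
    static long effectiveShutdownWindow(long[] requestedWindows) {
        return Arrays.stream(requestedWindows).max().getAsLong();
    }

    // Extra windows a partition keeps processing past its own request.
    static long lagFor(long requested, long effective) {
        return effective - requested;
    }

    public static void main(String[] args) {
        long[] requests = {100, 100, 103}; // third partition is behind
        long effective = effectiveShutdownWindow(requests);
        System.out.println("shutdown completes at window " + effective);
        for (long r : requests) {
            System.out.println("partition requested " + r
                    + ", extra windows processed: " + lagFor(r, effective));
        }
    }
}
```

The lagging partition forces the other two to process three extra windows,
which is exactly why only a best-effort alignment is possible.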


On Wed, Jan 18, 2017 at 12:14 AM, Bhupesh Chawda <bhup...@datatorrent.com>
wrote:

> Yes, Amol. That will be the case for a kill type of shutdown.
> But for graceful shutdown, it is important that each operator in the DAG
> processes exactly the same amount of data. That is the reason the shutdown
> needs to start at input operators and stop their functioning as the first
> step. Thereafter the control tuple can propagate down the DAG and shutdown
> operators as and when they encounter the control tuple.
>
> ~ Bhupesh
>
> On Wed, Jan 18, 2017 at 1:10 PM, Amol Kekre <a...@datatorrent.com> wrote:
>
> > Can be done by sending "shutdown" message via heartbeat to Stram. Then on
> > stram can shutdown the entire app
> >
> > Thks
> > Amol
> >
> >
> > On Tue, Jan 17, 2017 at 11:05 PM, Bhupesh Chawda <
> bhup...@datatorrent.com>
> > wrote:
> >
> > > Yes Ajay, for a graceful shutdown, the data sent out should be
> processed.
> > >
> > > On Wed, Jan 18, 2017 at 12:19 PM, AJAY GUPTA <ajaygit...@gmail.com>
> > wrote:
> > >
> > > > +1 to idea.
> > > >
> > > > Will this ensure downstream operators to process all data received
> > before
> > > > shutdown is called?
> > > > Also, how do we plan to handle cases where 2 sub-DAGs merge to a
> single
> > > > operator somewhere downstream, and an operator in one of the sub-DAGs
> > > sends
> > > > ShutdownException.
> > > >
> > > >
> > > > Ajay
> > > >
> > > > On Wed, Jan 18, 2017 at 12:00 PM, Bhupesh Chawda <
> > > bhup...@datatorrent.com>
> > > > wrote:
> > > >
> > > > > This JIRA is to stop the DAG in a crude manner, based on an error
> > > > > condition. I think this might also need similar functionality as an
> > > error
> > > > > condition can occur anywhere in the DAG.
> > > > >
> > > > > Perhaps we can modify the same JIRA to include a graceful +
> > ungraceful
> > > > > (kill) shutdown from any operator in the DAG.
> > > > >
> > > > > ~ Bhupesh
> > > > >
> > > > > On Wed, Jan 18, 2017 at 11:55 AM, Tushar Gosavi <
> > > tus...@datatorrent.com>
> > > > > wrote:
> > > > >
> > > > > > I think this would be a great addition for batch use cases or use
> > > > > > cases were DAG needs to be shutdown after detecting some
> > > > > > completion/error condition through the operator. We have one Jira
> > > > > > Opened for such functionality
> > > > > > https://issues.apache.org/jira/browse/APEXCORE-503.
> > > > > >
> > > > > > - Tushar.
> > > > > >
> > > > > >
> > > > > > On Wed, Jan 18, 2017 at 11:45 AM, Bhupesh Chawda
> > > > > > <bhup...@datatorrent.com> wrote:
> > > > > > > Hi All,
> > > > > > >
> > > > > > > Currently we can shutdown an Apex app in the following ways:
> > > > > > > 1. Throw ShutdownException() from *all* the input operators
> > > > > > > 2. Use Apex CLI to shutdown an app using the YARN App Id
> > > > > > >
> > > > > > > I think we should have some way of shutting down an application
> > > from
> > > > > > within
> > > > > > > an operator. It is not always true that the trigger for
> shutdown
> > is
> > > > > sent
> > > > > > by
> > > > > > > the input operator only. Sometimes, an end condition may be
> > > detected
> > > > by
> > > > > > > some operator in the DAG which wants the processing to end.
> Such
> > a
> > > > > > > shutdown, although triggered from some intermediate operator in
> > the
> > > > > DAG,
> > > > > > > should guarantee graceful shut down of the application.
> > > > > > >
> > > > > > > Thoughts?
> > > > > > >
> > > > > > > ~ Bhupesh
> > > > > >
> > > > >
> > > >
> > >
> >
>


[jira] [Commented] (APEXCORE-503) support KillException

2017-01-17 Thread Amol Kekre (JIRA)

[ 
https://issues.apache.org/jira/browse/APEXCORE-503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15827550#comment-15827550
 ] 

Amol Kekre commented on APEXCORE-503:
-


This is a good idea. We could have a way for an operator to inform Stram to
initiate a graceful shutdown; Stram can then orchestrate it. Possible ways
could be:

1. Initiate shutdown via a control tuple through the input operators. But this
causes slow operators that are behind to lag until they catch up before shutting
down.
2. Stram issues a shutdown to StramChild (this exists today, i.e. the same result
as if the shutdown came from a web service).

> support KillException
> -
>
> Key: APEXCORE-503
> URL: https://issues.apache.org/jira/browse/APEXCORE-503
> Project: Apache Apex Core
>  Issue Type: Improvement
>Reporter: Sandesh
>
> The current way for operators to stop the whole app is to use 
> "ShutdownException", but that is considered a graceful stop. To stop the 
> whole app when an error condition happens, a new exception should be supported, 
> called "KillException" or "KillAppException".



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Sharing jars among different Apex apps on cluster

2017-01-16 Thread Amol Kekre
Bhupesh,
If Stram changes the logical DAG (runs only the sub-DAG) on each run, it
should be fine.

Thks
Amol


On Mon, Jan 16, 2017 at 9:18 AM, Bhupesh Chawda 
wrote:

> Yes, I thought of that. That way it will be a single application
> throughout.
> Just wanted to see if there can be other options.
>
> ~ Bhupesh
>
> On Mon, Jan 16, 2017 at 9:17 PM, Thomas Weise  wrote:
>
> > Sounds like a fit for https://github.com/apache/apex-core/pull/410 ?
> >
> > On Mon, Jan 16, 2017 at 3:27 AM, Bhupesh Chawda  >
> > wrote:
> >
> > > Hi All,
> > >
> > > We have a use case where I need to launch a number of DAGs on the
> cluster
> > > one after the other in sequence programatically.
> > >
> > > We are using the StramAppLauncher and StramAppFactory classes to
> launch a
> > > DAG programatically on the cluster and adding any third party
> > dependencies
> > > as part of the configuration.
> > >
> > > It is working fine except for the following issue:
> > > Every time a DAG is launched, it copies the dependencies to the
> > application
> > > folder and hence spends a good amount of time before the app actually
> > > starts running. All of the apps I run belong to the same project and
> > hence
> > > don't actually need separate set of jars.
> > >
> > > Is there any way I can make all the applications "share" the jars which
> > are
> > > uploaded when the first application is run?
> > >
> > > Thanks.
> > >
> > > ~ Bhupesh
> > >
> >
>
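The jar-sharing idea raised in this thread (reuse dependencies already
uploaded by an earlier launch instead of copying them per application) can be
sketched as a content-addressed cache. This is a hypothetical sketch, not the
StramAppLauncher API: jars are keyed by their digest, so a second launch with
identical dependencies triggers no new upload.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of sharing dependency jars across app launches
// by deduplicating uploads on content digest.
public class SharedJarCache {
    private final Map<String, String> uploaded = new HashMap<>(); // digest -> remote path
    int uploads = 0;

    static String digest(byte[] content) {
        try {
            byte[] d = MessageDigest.getInstance("SHA-256").digest(content);
            StringBuilder sb = new StringBuilder();
            for (byte b : d) sb.append(String.format("%02x", b));
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // Returns the remote path, "uploading" only if this content is new.
    String ensureUploaded(String name, byte[] content) {
        String key = digest(content);
        String path = uploaded.get(key);
        if (path == null) {
            path = "/shared-jars/" + key + "/" + name; // simulated copy to shared storage
            uploaded.put(key, path);
            uploads++;
        }
        return path;
    }

    public static void main(String[] args) {
        SharedJarCache cache = new SharedJarCache();
        byte[] jar = "fake-jar-bytes".getBytes(StandardCharsets.UTF_8);
        String first = cache.ensureUploaded("dep.jar", jar);  // first app: real upload
        String second = cache.ensureUploaded("dep.jar", jar); // second app: cache hit
        System.out.println("same path: " + first.equals(second) + ", uploads: " + cache.uploads);
    }
}
```

The pull request Thomas references solves this differently (keeping one
application alive across DAGs), which avoids the re-upload entirely.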


Re: Contribution Process before PR

2017-01-16 Thread Amol Kekre
I do see folks discussing issues for the most part now. To me this does not
look like an issue that is hurting the Apex community. I do, however, want to
discuss cases where code gets blocked and the issue spills into a larger
context. As a culture and a process, we have a very high overhead that deters
contributions.

Thks
Amol


On Sun, Jan 15, 2017 at 8:50 PM, Pramod Immaneni 
wrote:

> Yes, it will be good to have these points added to the contributor
> guidelines but I also see for the most part folks do bring up issues for
> discussion, try to address concerns and come to a consensus and in
> generally participate in the community. I also think we should have some
> latitude in the process when it comes to bug fixes that are contained and
> don't spill into re-design of components otherwise, the overhead will deter
> contributions especially from folks who are new to the project and want to
> start contributing by fixing low hanging bugs.
>
> Thanks
>
> On Sun, Jan 15, 2017 at 7:50 PM, Thomas Weise  wrote:
>
> > Hi,
> >
> > I want to propose additions to the contributor guidelines that place
> > stronger emphasis on open collaboration and the early part of the
> > contribution process.
> >
> > Specifically, I would like to suggest that *thought process* and *design
> > discussion* are more important than the final code produced. It is
> > necessary to develop the community and invest in the future of the
> project.
> >
> > I start this discussion based on observation over time. I have seen cases
> > (non trivial changes) where code and JIRAs appear at the same time, where
> > the big picture is discussed after the PR is already open, or where
> > information that would be valuable to other contributors or users isn't
> on
> > record.
> >
> > Let's consider a non-trivial change or a feature. It would normally start
> > with engagement on the mailing list to ensure time is well spent and the
> > proposal is welcomed by the community, does not conflict with other
> > initiatives etc.
> >
> > Once that is cleared, we would want to think about design, the how in the
> > larger picture. In many cases that would involve discussion, questions,
> > suggestions, consensus building towards agreed approach. Or maybe it is
> > done through prototyping. In any case, before a PR is raised, it will be
> > good to have as prerequisite that *thought process and approach have been
> > documented*. I would prefer to see that on the JIRA, what do others
> think?
> >
> > Benefits:
> >
> > * Contributor does not waste time and there is no frustration due to a PR
> > being turned down for reasons that could be avoided with upfront
> > communication.
> >
> > * Contributor benefits from suggestions, questions, guidance of those
> with
> > in depth knowledge of particular areas.
> >
> > * Other community members have an opportunity to learn from discussion,
> the
> > knowledge base broadens.
> >
> > * Information gets indexed, user later looking at JIRAs will find
> valuable
> > information on how certain problems were solved that they would never
> > obtain from a PR.
> >
> > The ASF and "Apache Way", a read for the bigger picture with more links
> in
> > it:
> > http://krzysztof-sobkowiak.net/blog/celebrating-17-years-
> > of-the-apache-software-foundation/
> >
> > Looking forward to feedback and discussion,
> > Thomas
> >
>


Re: [DISCUSS] Custom Control Tuples Design

2017-01-04 Thread Amol Kekre
I agree that by default the propagation must be implicit, i.e. if the
operator does nothing, the control tuple propagates. I do think users should
have control over deciding to "not propagate" or "create new", and in those
cases they would need to do something explicit (an override)?

The following cases come to mind:
1. Sole consumer of a particular control signal (for example, end of file)
2. Creator of a particular control signal (start of file, or a signal to
pause on something, etc.)
3. One port on a data pipeline and the other port on a metadata pipeline

In the above cases, emitting will be decided per output port. #1 is the only
case where all output ports will suppress the tuple; #2 and #3 will most
likely be selective.

Thks
Amol
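The behavior being discussed (implicit forwarding by default, an explicit
per-port switch to suppress upstream control tuples, and operator-created
tuples that are always emitted) can be sketched as below. All names here are
hypothetical, modeled on the setPropagateControlTuples(boolean) idea from the
thread, and are not the final Apex API.

```java
import java.util.ArrayList;
import java.util.List;

// Toy sketch of per-output-port control-tuple propagation.
// Hypothetical names; not the Apex custom control tuple API.
public class ControlTuplePort {
    private boolean propagateControlTuples = true; // default: implicit propagation
    final List<String> delivered = new ArrayList<>();

    void setPropagateControlTuples(boolean propagate) {
        this.propagateControlTuples = propagate;
    }

    // An upstream control tuple arriving at this port: forwarded unless suppressed.
    void onUpstreamControlTuple(String tuple) {
        if (propagateControlTuples) {
            delivered.add(tuple);
        }
    }

    // A control tuple the operator itself creates: always emitted, never filtered.
    void emitControlTuple(String tuple) {
        delivered.add(tuple);
    }

    public static void main(String[] args) {
        ControlTuplePort port = new ControlTuplePort();
        port.onUpstreamControlTuple("END_OF_FILE");   // forwarded (default behavior)
        port.setPropagateControlTuples(false);        // case #1: sole consumer opts out
        port.onUpstreamControlTuple("END_OF_FILE");   // swallowed
        port.emitControlTuple("START_OF_FILE");       // case #2: operator-created signal
        System.out.println(port.delivered);
    }
}
```

With two such ports on one operator, each can take a different setting, which
is the finer-grained control Bhupesh describes for split pipelines (case #3).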


On Wed, Jan 4, 2017 at 12:25 PM, Thomas Weise  wrote:

> I think there is (1) implicit propagation just like other control tuples
> where the operator code isn't involved and (2) where the operator developer
> wants to decide how control tuples are created or routed and will receive
> and emit them on the output ports as desired.
>
> I don't see a use case for hybrid approaches? Maybe propagation does not
> need to be tied to ports at all, maybe just by annotation at the operator
> level?
>
> Thomas
>
>
> On Wed, Jan 4, 2017 at 10:59 AM, Bhupesh Chawda 
> wrote:
>
> > Wouldn't having this with output ports give a finer control on the
> > propagation of control tuples?
> > We might have an operator with two output ports each of which creates two
> > different pipelines downstream. We would be able to say that one pipeline
> > gets the control tuples and the other doesn't.
> >
> > ~ Bhupesh
> >
> >
> > On Jan 4, 2017 11:55 PM, "Thomas Weise"  wrote:
> >
> > I'm referring to the operator that needs to make the decision to
> propagate
> > or not. The tuples come from an input port, so it seems appropriate to
> say
> > "don't propagate control tuples from this port". No matter how many
> output
> > ports there are.
> >
> > Output ports are there for an operator to emit new tuples, in the case
> you
> > are discussing you don't emit new control tuples.
> >
> > Thomas
> >
> >
> > On Wed, Jan 4, 2017 at 9:39 AM, Bhupesh Chawda 
> > wrote:
> >
> > > Hi Thomas,
> > >
> > > Are you suggesting an attribute on the input port for controlling the
> > > propagation of control tuples to downstream operators?
> > > I think it should be better to do it on the output port since the
> > decision
> > > to block the propagation will be made at the upstream operator rather
> > than
> > > at the downstream.
> > > Also, we need another way of controlling the propagation at run time
> and
> > > hence I was thinking about the method call on the output port, in
> > addition
> > > to the annotation on the output port (which is the static way).
> > >
> > > Please correct me if I have misunderstood your question.
> > >
> > > ~ Bhupesh
> > >
> > > On Wed, Jan 4, 2017 at 7:26 PM, Thomas Weise  wrote:
> > >
> > > > Wouldn't it be more intuitive to control this with an attribute on
> the
> > > > input port?
> > > >
> > > >
> > > > On Tue, Jan 3, 2017 at 11:06 PM, Bhupesh Chawda <
> > bhup...@datatorrent.com
> > > >
> > > > wrote:
> > > >
> > > > > Hi Pramod,
> > > > >
> > > > > I was thinking of a method setPropagateControlTuples(boolean
> > > propagate)
> > > > on
> > > > > the output port of the operator.
> > > > > The operator could disable this in the code at any point of time.
> > > > > Note however that this is to block the propagation of control
> tuples
> > > from
> > > > > upstream. Any control tuples emitted explicitly by the operator
> would
> > > > still
> > > > > be emitted and sent to the downstream operators.
> > > > >
> > > > > Please see
> > > > > https://github.com/apache/apex-core/pull/440/files#diff-
> > > > > 8aa0ca1a3e645fa60e9b376c118c00a3R68
> > > > > in the PR.
> > > > >
> > > > > ~ Bhupesh
> > > > >
> > > > > On Wed, Jan 4, 2017 at 6:53 AM, Pramod Immaneni <
> > > pra...@datatorrent.com>
> > > > > wrote:
> > > > >
> > > > > > 2 sounds good. Have you thought about what the method would look
> > > like.
> > > > > >
> > > > > > On Sat, Dec 31, 2016 at 8:29 PM, Bhupesh Chawda <
> > > > bhup...@datatorrent.com
> > > > > >
> > > > > > wrote:
> > > > > >
> > > > > > > Yes, that makes sense.
> > > > > > > We have following options:
> > > > > > > 1. Make the annotation false by default and force the user to
> > > forward
> > > > > the
> > > > > > > control tuples explicitly.
> > > > > > > 2. Annotation is true by default and static way of blocking
> stays
> > > as
> > > > it
> > > > > > is.
> > > > > > > We provide another way for blocking programmatically, perhaps
> by
> > > > means
> > > > > of
> > > > > > > another method call on the port.
> > > > > > >
> > > > > > > ~ Bhupesh
> > > > > > >
> > > > > > > On Dec 30, 2016 00:09, "Pramod Immaneni" <
> pra...@datatorrent.com
> > >
> > > > > wrote:
> > > > > 

Re: [DISCUSS] Custom Control Tuples Design

2017-01-04 Thread Amol Kekre
Yes, there is a chance that two output ports will have different send
requirements.

Thks
Amol


On Wed, Jan 4, 2017 at 10:59 AM, Bhupesh Chawda 
wrote:

> Wouldn't having this with output ports give a finer control on the
> propagation of control tuples?
> We might have an operator with two output ports each of which creates two
> different pipelines downstream. We would be able to say that one pipeline
> gets the control tuples and the other doesn't.
>
> ~ Bhupesh
>
>
> On Jan 4, 2017 11:55 PM, "Thomas Weise"  wrote:
>
> I'm referring to the operator that needs to make the decision to propagate
> or not. The tuples come from an input port, so it seems appropriate to say
> "don't propagate control tuples from this port". No matter how many output
> ports there are.
>
> Output ports are there for an operator to emit new tuples, in the case you
> are discussing you don't emit new control tuples.
>
> Thomas
>
>
> On Wed, Jan 4, 2017 at 9:39 AM, Bhupesh Chawda 
> wrote:
>
> > Hi Thomas,
> >
> > Are you suggesting an attribute on the input port for controlling the
> > propagation of control tuples to downstream operators?
> > I think it should be better to do it on the output port since the
> decision
> > to block the propagation will be made at the upstream operator rather
> than
> > at the downstream.
> > Also, we need another way of controlling the propagation at run time and
> > hence I was thinking about the method call on the output port, in
> addition
> > to the annotation on the output port (which is the static way).
> >
> > Please correct me if I have misunderstood your question.
> >
> > ~ Bhupesh
> >
> > On Wed, Jan 4, 2017 at 7:26 PM, Thomas Weise  wrote:
> >
> > > Wouldn't it be more intuitive to control this with an attribute on the
> > > input port?
> > >
> > >
> > > On Tue, Jan 3, 2017 at 11:06 PM, Bhupesh Chawda <
> bhup...@datatorrent.com
> > >
> > > wrote:
> > >
> > > > Hi Pramod,
> > > >
> > > > I was thinking of a method setPropagateControlTuples(boolean
> > propagate)
> > > on
> > > > the output port of the operator.
> > > > The operator could disable this in the code at any point of time.
> > > > Note however that this is to block the propagation of control tuples
> > from
> > > > upstream. Any control tuples emitted explicitly by the operator would
> > > still
> > > > be emitted and sent to the downstream operators.
> > > >
> > > > Please see
> > > > https://github.com/apache/apex-core/pull/440/files#diff-
> > > > 8aa0ca1a3e645fa60e9b376c118c00a3R68
> > > > in the PR.
> > > >
> > > > ~ Bhupesh
> > > >
> > > > On Wed, Jan 4, 2017 at 6:53 AM, Pramod Immaneni <
> > pra...@datatorrent.com>
> > > > wrote:
> > > >
> > > > > 2 sounds good. Have you thought about what the method would look
> > like.
> > > > >
> > > > > On Sat, Dec 31, 2016 at 8:29 PM, Bhupesh Chawda <
> > > bhup...@datatorrent.com
> > > > >
> > > > > wrote:
> > > > >
> > > > > > Yes, that makes sense.
> > > > > > We have following options:
> > > > > > 1. Make the annotation false by default and force the user to
> > forward
> > > > the
> > > > > > control tuples explicitly.
> > > > > > 2. Annotation is true by default and static way of blocking stays
> > as
> > > it
> > > > > is.
> > > > > > We provide another way for blocking programmatically, perhaps by
> > > means
> > > > of
> > > > > > another method call on the port.
> > > > > >
> > > > > > ~ Bhupesh
> > > > > >
> > > > > > On Dec 30, 2016 00:09, "Pramod Immaneni"  >
> > > > wrote:
> > > > > >
> > > > > > > Bhupesh,
> > > > > > >
> > > > > > > Annotation seems like a static way to stop propagation. Give
> > these
> > > > are
> > > > > > > programmatically generated I would think the operators should
> be
> > > able
> > > > > to
> > > > > > > stop (consume without propagating) programmatically as well.
> > > > > > >
> > > > > > > Thanks
> > > > > > >
> > > > > > > On Thu, Dec 29, 2016 at 8:48 AM, Bhupesh Chawda <
> > > > > bhup...@datatorrent.com
> > > > > > >
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Thanks Vlad, I am trying out the approach you mentioned
> > regarding
> > > > > > having
> > > > > > > > another interface which allows sinks to put a control tuple.
> > > > > > > >
> > > > > > > > Regarding the delivery of control tuples, here is what I am
> > > > planning
> > > > > to
> > > > > > > do:
> > > > > > > > All the control tuples which are emitted in a particular
> window
> > > are
> > > > > > > > delivered after all the data tuples have been delivered to
> the
> > > > > > respective
> > > > > > > > ports, but before the endWindow() call. The operator can then
> > > > process
> > > > > > the
> > > > > > > > control tuples in that window and can do any finalization in
> > the
> > > > end
> > > > > > > window
> > > > > > > > call. There will be no delivery of control tuples after
> > > endWindow()
> > > > > and
> > > > > > > > before the next 

Re: "ExcludeNodes" for an Apex application

2016-12-02 Thread Amol Kekre
Stram node exclusion should be done via Yarn; a poison pill is not a good
way, as it induces a termination for the wrong reasons.

Thks
Amol
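The MAX_CONSECUTIVE_CONTAINER_FAILURES_FOR_BLACKLIST behavior mentioned later
in this thread can be modeled as below. This is an illustrative sketch of the
policy only, assuming a node is blacklisted after N consecutive container
failures and a success resets its streak; as Ram points out, such a mechanism
only takes effect after failures and cannot exclude a node from the get-go.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy model of blacklisting a node after N consecutive container failures.
// Illustrates the policy, not the actual Apex/YARN implementation.
public class NodeBlacklist {
    private final int threshold;
    private final Map<String, Integer> consecutiveFailures = new HashMap<>();
    private final Set<String> blacklisted = new HashSet<>();

    NodeBlacklist(int threshold) {
        this.threshold = threshold;
    }

    void containerFailed(String node) {
        int failures = consecutiveFailures.merge(node, 1, Integer::sum);
        if (failures >= threshold) {
            blacklisted.add(node); // stop scheduling containers on this node
        }
    }

    void containerSucceeded(String node) {
        consecutiveFailures.put(node, 0); // a success resets the streak
    }

    boolean isBlacklisted(String node) {
        return blacklisted.contains(node);
    }

    public static void main(String[] args) {
        NodeBlacklist bl = new NodeBlacklist(3);
        bl.containerFailed("node1");
        bl.containerSucceeded("node1"); // streak broken, counter back to zero
        bl.containerFailed("node1");
        bl.containerFailed("node1");
        bl.containerFailed("node1");    // third consecutive failure
        System.out.println(bl.isBlacklisted("node1")); // true
    }
}
```

An up-front exclude list, by contrast, would have to be honored by the
resource manager itself, which is the crux of the thread's Yarn-vs-Apex
debate.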


On Fri, Dec 2, 2016 at 7:13 AM, Munagala Ramanath <r...@datatorrent.com>
wrote:

> Could STRAM include a poison pill where it simply exits with diagnostic if
> its host name is blacklisted ?
>
> Ram
>
> On Thu, Dec 1, 2016 at 11:52 PM, Amol Kekre <a...@datatorrent.com> wrote:
>
> > Yarn will deploy the AM (Stram) on a node of its choice, thereby rendering
> > any attribute within the app unenforceable as far as not deploying the
> > master on a node.
> >
> > Thks
> > Amol
> >
> >
> > On Thu, Dec 1, 2016 at 11:19 PM, Milind Barve <mili...@gmail.com> wrote:
> >
> > > Additionally, this would apply to Stram as well i.e. the master should
> > also
> > > not be deployed on these nodes. Not sure if anti-affinity goes beyond
> > > operators.
> > >
> > > On Fri, Dec 2, 2016 at 12:47 PM, Milind Barve <mili...@gmail.com>
> wrote:
> > >
> > > > My previous mail explains it, but just forgot to add : -1 to cover
> this
> > > > under anti affinity.
> > > >
> > > > On Fri, Dec 2, 2016 at 12:46 PM, Milind Barve <mili...@gmail.com>
> > wrote:
> > > >
> > > >> While it is possible to extend anti-affinity to take care of this, I
> > > feel
> > > >> it will cause confusion from a user perspective. As a user, when I
> > think
> > > >> about anti-affinity, what comes to mind right away is a relative
> > > relation
> > > >> between operators.
> > > >>
> > > >> On the other hand, the current ask is not that, but a relation at an
> > > >> application level w.r.t. a node. (Further, we might even think of
> > > extending
> > > >> this at an operator level - which would mean do not deploy an
> operator
> > > on a
> > > >> particular node)
> > > >>
> > > >> We would be better off clearly articulating and allowing users to
> > > >> configure it seperately as against using anti-affinity.
> > > >>
> > > >> On Fri, Dec 2, 2016 at 10:03 AM, Bhupesh Chawda <
> > > bhup...@datatorrent.com>
> > > >> wrote:
> > > >>
> > > >>> Okay, I think that serves an alternate purpose of detecting any
> newly
> > > >>> gone
> > > >>> bad node and excluding it.
> > > >>>
> > > >>> +1 for covering the original scenario under anti-affinity.
> > > >>>
> > > >>> ~ Bhupesh
> > > >>>
> > > >>> On Fri, Dec 2, 2016 at 9:14 AM, Munagala Ramanath <
> > r...@datatorrent.com
> > > >
> > > >>> wrote:
> > > >>>
> > > >>> > It only takes effect after failures -- no way to exclude from the
> > > >>> get-go.
> > > >>> >
> > > >>> > Ram
> > > >>> >
> > > >>> > On Dec 1, 2016 7:15 PM, "Bhupesh Chawda" <
> bhup...@datatorrent.com>
> > > >>> wrote:
> > > >>> >
> > > >>> > > As suggested by Sandesh, the parameter
> > > >>> > > MAX_CONSECUTIVE_CONTAINER_FAILURES_FOR_BLACKLIST seems to do
> > > exactly
> > > >>> > what
> > > >>> > > is needed.
> > > >>> > > Why would this not work?
> > > >>> > >
> > > >>> > > ~ Bhupesh
> > > >>> > >
> > > >>> >
> > > >>>
> > > >>
> > > >>
> > > >>
> > > >> --
> > > >> ~Milind bee at gee mail dot com
> > > >>
> > > >
> > > >
> > > >
> > > > --
> > > > ~Milind bee at gee mail dot com
> > > >
> > >
> > >
> > >
> > > --
> > > ~Milind bee at gee mail dot com
> > >
> >
>


Re: "ExcludeNodes" for an Apex application

2016-12-01 Thread Amol Kekre
Yarn will deploy the AM (Stram) on a node of its choice, thereby rendering
any attribute within the app unenforceable as far as not deploying the master
on a node.

Thks
Amol


On Thu, Dec 1, 2016 at 11:19 PM, Milind Barve  wrote:

> Additionally, this would apply to Stram as well i.e. the master should also
> not be deployed on these nodes. Not sure if anti-affinity goes beyond
> operators.
>
> On Fri, Dec 2, 2016 at 12:47 PM, Milind Barve  wrote:
>
> > My previous mail explains it, but just forgot to add : -1 to cover this
> > under anti affinity.
> >
> > On Fri, Dec 2, 2016 at 12:46 PM, Milind Barve  wrote:
> >
> >> While it is possible to extend anti-affinity to take care of this, I
> feel
> >> it will cause confusion from a user perspective. As a user, when I think
> >> about anti-affinity, what comes to mind right away is a relative
> relation
> >> between operators.
> >>
> >> On the other hand, the current ask is not that, but a relation at an
> >> application level w.r.t. a node. (Further, we might even think of
> extending
> >> this at an operator level - which would mean do not deploy an operator
> on a
> >> particular node)
> >>
> >> We would be better off clearly articulating and allowing users to
> >> configure it seperately as against using anti-affinity.
> >>
> >> On Fri, Dec 2, 2016 at 10:03 AM, Bhupesh Chawda <
> bhup...@datatorrent.com>
> >> wrote:
> >>
> >>> Okay, I think that serves an alternate purpose of detecting any newly
> >>> gone
> >>> bad node and excluding it.
> >>>
> >>> +1 for covering the original scenario under anti-affinity.
> >>>
> >>> ~ Bhupesh
> >>>
> >>> On Fri, Dec 2, 2016 at 9:14 AM, Munagala Ramanath  >
> >>> wrote:
> >>>
> >>> > It only takes effect after failures -- no way to exclude from the
> >>> get-go.
> >>> >
> >>> > Ram
> >>> >
> >>> > On Dec 1, 2016 7:15 PM, "Bhupesh Chawda" 
> >>> wrote:
> >>> >
> >>> > > As suggested by Sandesh, the parameter
> >>> > > MAX_CONSECUTIVE_CONTAINER_FAILURES_FOR_BLACKLIST seems to do
> exactly
> >>> > what
> >>> > > is needed.
> >>> > > Why would this not work?
> >>> > >
> >>> > > ~ Bhupesh
> >>> > >
> >>> >
> >>>
> >>
> >>
> >>
> >> --
> >> ~Milind bee at gee mail dot com
> >>
> >
> >
> >
> > --
> > ~Milind bee at gee mail dot com
> >
>
>
>
> --
> ~Milind bee at gee mail dot com
>


Re: "ExcludeNodes" for an Apex application

2016-12-01 Thread Amol Kekre
sing
> > > > > > problems for their
> > > > > > app, having a simple way to exclude it would be very helpful
> > > since
> > > > > it gives
> > > > > > them a way
> > > > > > to bypass communication and process issues within their own
> > > > > organization.
> > > > > >
> > > > > > Ram
> > > > > >
> > > > > > On Wed, Nov 30, 2016 at 10:58 AM, Sanjay Pujare <
> > > > > san...@datatorrent.com>
> > > > > > wrote:
> > > > > >
> > > > > > > To me both use cases appear to be generic resource
> management
> > > use
> > > > > cases.
> > > > > > > For example, a randomly rebooting node is not good for any
> > > > purpose
> > > > > esp.
> > > > > > > long running apps so it is a bit of a stretch to imagine
> that
> > > > > these nodes
> > > > > > > will be acceptable for some batch jobs in Yarn. So such a
> > node
> > > > > should be
> > > > > > > marked “Bad” or Unavailable in Yarn itself.
> > > > > > >
> > > > > > > Second use case is also typical anti-affinity use case
> which
> > > > > ideally
> > > > > > > should be implemented in Yarn – Milind’s example can also
> > apply
> > > > to
> > > > > > non-Apex
> > > > > > > batch jobs. In any case it looks like Yarn still doesn’t
> have
> > > it
> > > > (
> > > > > > > https://issues.apache.org/jira/browse/YARN-1042) so if
> Apex
> > > > needs
> > > > > it we
> > > > > > > will need to do it ourselves.
> > > > > > >
> > > > > > > On 11/30/16, 10:39 AM, "Munagala Ramanath" <
> > > r...@datatorrent.com>
> > > > > wrote:
> > > > > > >
> > > > > > > But then, what's the solution to the 2 problem
> scenarios
> > > that
> > > > > Milind
> > > > > > > describes ?
> > > > > > >
> > > > > > > Ram
> > > > > > >
> > > > > > > On Wed, Nov 30, 2016 at 10:34 AM, Sanjay Pujare <
> > > > > > > san...@datatorrent.com>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > I think “exclude nodes” and such is really the job of
> > the
> > > > > resource
> > > > > > > manager
> > > > > > > > i.e. Yarn. So I am not sure taking over some of these
> > > tasks
> > > > > in Apex
> > > > > > > would
> > > > > > > > be very useful.
> > > > > > > >
> > > > > > > > I agree with Amol that apps should be node neutral.
> > > > Resource
> > > > > > > management in
> > > > > > > > Yarn together with fault tolerance in Apex should
> > > minimize
> > > > > the need
> > > > > > > for
> > > > > > > > this feature although I am sure one can find use
> cases.
> > > > > > > >
> > > > > > > >
> > > > > > > > On 11/29/16, 10:41 PM, "Amol Kekre" <
> > > a...@datatorrent.com>
> > > > > wrote:
> > > > > > > >
> > > > > > > > We do have this feature in Yarn, but that applies
> > to
> > > > all
> > > > > > > applications.
> > > > > > > > I am
> > > > > > > > not sure if Yarn has anti-affinity. This feature
> > may
> > > be
> > > > > used,
> > > > > > > but in
> > > > > > > > general there is danger is an application taking
> > over
> > > > > resource
> > > > > > > > allocation.
> > > > > > > > Another

Re: "ExcludeNodes" for an Apex application

2016-11-30 Thread Amol Kekre
I agree, a randomly rebooting node is a Yarn issue. Even anti-affinity between
apps should be in Yarn in the long run. We could contribute to the above jira.

Thks
Amol


On Wed, Nov 30, 2016 at 10:58 AM, Sanjay Pujare <san...@datatorrent.com>
wrote:

> To me both use cases appear to be generic resource management use cases.
> For example, a randomly rebooting node is not good for any purpose esp.
> long running apps so it is a bit of a stretch to imagine that these nodes
> will be acceptable for some batch jobs in Yarn. So such a node should be
> marked “Bad” or Unavailable in Yarn itself.
>
> Second use case is also typical anti-affinity use case which ideally
> should be implemented in Yarn – Milind’s example can also apply to non-Apex
> batch jobs. In any case it looks like Yarn still doesn’t have it (
> https://issues.apache.org/jira/browse/YARN-1042) so if Apex needs it we
> will need to do it ourselves.
>
> On 11/30/16, 10:39 AM, "Munagala Ramanath" <r...@datatorrent.com> wrote:
>
> But then, what's the solution to the 2 problem scenarios that Milind
> describes ?
>
> Ram
>
> On Wed, Nov 30, 2016 at 10:34 AM, Sanjay Pujare <
> san...@datatorrent.com>
> wrote:
>
> > I think “exclude nodes” and such is really the job of the resource
> manager
> > i.e. Yarn. So I am not sure taking over some of these tasks in Apex
> would
> > be very useful.
> >
> > I agree with Amol that apps should be node neutral. Resource
> management in
> > Yarn together with fault tolerance in Apex should minimize the need
> for
> > this feature although I am sure one can find use cases.
> >
> >
> > On 11/29/16, 10:41 PM, "Amol Kekre" <a...@datatorrent.com> wrote:
> >
> > We do have this feature in Yarn, but that applies to all
> applications.
> > I am
> > not sure if Yarn has anti-affinity. This feature may be used,
> but in
> > general there is danger in an application taking over resource
> > allocation.
> > Another quirk is that big data apps should ideally be
> node-neutral.
> > This is
> > a good idea, if we are able to carve out something where need is
> app
> > specific.
> >
> > Thks
> > Amol
> >
> >
> > On Tue, Nov 29, 2016 at 10:00 PM, Milind Barve <
> mili...@gmail.com>
> > wrote:
> >
> > > We have seen 2 cases mentioned below, where, it would have
> been nice
> > if
> > > Apex allowed us to exclude a node from the cluster for an
> > application.
> > >
> > > 1. A node in the cluster had gone bad (was randomly rebooting)
> and
> > so an
> > > Apex app should not use it - other apps can use it as they were
> > batch jobs.
> > > 2. A node is being used for a mission critical app (Could be
> an Apex
> > app
> > > itself), but another Apex app which is mission critical should
> not
> > be using
> > > resources on that node.
> > >
> > > Can we have a way in which, Stram and YARN can coordinate
> between
> > each
> > > other to not use a set of nodes for the application. It can be
> done
> > in 2 way
> > > s-
> > >
> > > 1. Have a list of "exclude" nodes with Stram- when YARN
> allocates
> > resources
> > > on either of these, STRAM rejects and gets resources allocated
> again
> > from
> > > YARN
> > > 2. Have a list of nodes that can be used for an app - This can
> be a
> > part of
> > > config. However, I don't think this would be the right way to do
> so as
> > we will
> > > need support from YARN as well. Further, this might be
> difficult to
> > change
> > > at runtime if need be.
> > >
> > > Any thoughts?
> > >
> > >
> > > --
> > > ~Milind bee at gee mail dot com
> > >
> >
> >
> >
> >
>
>
>
>


Re: "ExcludeNodes" for an Apex application

2016-11-29 Thread Amol Kekre
We do have this feature in Yarn, but that applies to all applications. I am
not sure if Yarn has anti-affinity. This feature may be used, but in
general there is danger in an application taking over resource allocation.
Another quirk is that big data apps should ideally be node-neutral. This is
a good idea if we are able to carve out something where the need is
app-specific.

Thks
Amol


On Tue, Nov 29, 2016 at 10:00 PM, Milind Barve  wrote:

> We have seen 2 cases mentioned below, where, it would have been nice if
> Apex allowed us to exclude a node from the cluster for an application.
>
> 1. A node in the cluster had gone bad (was randomly rebooting) and so an
> Apex app should not use it - other apps can use it as they were batch jobs.
> 2. A node is being used for a mission critical app (Could be an Apex app
> itself), but another Apex app which is mission critical should not be using
> resources on that node.
>
> Can we have a way in which Stram and YARN can coordinate with each
> other to not use a set of nodes for the application? It can be done in 2
> ways:
>
> 1. Have a list of "exclude" nodes with Stram - when YARN allocates resources
> on either of these, STRAM rejects them and gets resources allocated again from
> YARN
> 2. Have a list of nodes that can be used for an app - This can be a part of
> config. However, I don't think this would be the right way to do so as we will
> need support from YARN as well. Further, this might be difficult to change
> at runtime if need be.
>
> Any thoughts?
>
>
> --
> ~Milind bee at gee mail dot com
>
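Option 1 above (Stram keeping an "exclude" list and rejecting allocations on those hosts) can be sketched minimally. This is an illustrative model only: the class and method names below are invented for the example and are not actual Apex or YARN APIs. For reference, YARN's AMRMClient does expose updateBlacklist() to steer allocations away from given hosts at the resource-request level.

```java
import java.util.HashSet;
import java.util.Set;

// Illustrative model of option 1: the application master keeps an "exclude"
// set and rejects any container allocated on one of those hosts, then
// re-requests capacity from the resource manager. All names are hypothetical.
public class ExcludeNodeFilter {
    private final Set<String> excludedHosts = new HashSet<>();

    public void excludeHost(String host) {
        excludedHosts.add(host);
    }

    /** Returns true if an allocation on this host should be accepted. */
    public boolean accept(String host) {
        return !excludedHosts.contains(host);
    }
}
```

The accept() check would sit where the application master processes allocated containers; a rejected container is released and an equivalent request is re-submitted.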


Re: Apex internal documentation.

2016-11-29 Thread Amol Kekre
+1

Thks
Amol


On Tue, Nov 29, 2016 at 2:31 AM, Mohit Jotwani 
wrote:

> +1
>
> Regards,
> Mohit
>
> On Tue, Nov 29, 2016 at 12:30 PM, Aniruddha Thombare <
> anirud...@datatorrent.com> wrote:
>
> > +1
> > I would also request graphical representation for ease of understanding.
> > I could not find any illustrations about Apex internals.
> > If someone knows, please point me to it.
> > If it's not present, we should have illustrations.
> >
> >
> > Thanks,
> >
> >
> > Aniruddha
> >
> > _
> > Always finding your faults, just like your Mom!
> > #QA
> >
> > On Tue, Nov 29, 2016 at 12:20 PM, Pradeep A. Dalvi 
> > wrote:
> >
> > > +1
> > >
> > > The following might also be a good addition in Startup of Application:
> > >  - Handling initial communication with StrAM before & after application
> > is
> > > in Running state (trackingURL w/o & w/ SSL and non-secure & secure
> mode)
> > >
> >
>


Re: JSON License and Apache Projects

2016-11-28 Thread amol kekre
Chinmay,
You can do the honor of responding to them so that they know we acted on
their request.

Thks
Amol


On Mon, Nov 28, 2016 at 1:04 PM, David Yan <da...@datatorrent.com> wrote:

> As far as I know, we don't use anything from json.org directly. We use the
> json library from jettison:
> https://mvnrepository.com/artifact/org.codehaus.jettison/jettison.
> From the dependency tree in apex-core, I don't see anything that says
> json.org.
>
> David
>
> On Fri, Nov 25, 2016 at 10:22 AM, Amol Kekre <a...@datatorrent.com> wrote:
>
> > Chinmay
> > +1, Do you want to drive :)
> >
> > Thks
> > Amol
> >
> > On Thu, Nov 24, 2016 at 10:02 PM, Chinmay Kolhatkar <chin...@apache.org>
> > wrote:
> >
> > > Yes... That's the mail. There are a couple of related conversations that can
> be
> > > seen here too:
> > > https://lists.apache.org/list.html?legal-disc...@apache.org
> > >
> > > I suggest we take a look at it and do the needful from our end too.
> > >
> > > -Chinmay.
> > >
> > >
> > > On Fri, Nov 25, 2016 at 10:15 AM, Amol Kekre <a...@datatorrent.com>
> > wrote:
> > >
> > > > Chinmay,
> > > > Is this the thread you were looking for?
> > > >
> > > > Thks
> > > > Amol
> > > >
> > > > -- Forwarded message --
> > > > From: Ted Dunning <ted.dunn...@gmail.com>
> > > > Date: Thu, Nov 24, 2016 at 2:28 PM
> > > > Subject: Re: JSON License and Apache Projects
> > > > To: "gene...@incubator.apache.org" <gene...@incubator.apache.org>
> > > >
> > > >
> > > > Stephan,
> > > >
> > > > What you suggest should work (if you add another dependency to
> provide
> > > the
> > > > needed classes).
> > > >
> > > > You have to be careful, however, because your consumers may expect to
> > get
> > > > the full json.org API.
> > > >
> > > > I would suggest that exclusions like this should only be used while
> > your
> > > > direct dependency still has the dependency on json.org. When they
> fix
> > > it,
> > > > you can drop the exclusion and all will be good.
> > > >
> > > >
> > > >
> > > > On Thu, Nov 24, 2016 at 2:21 AM, Stephan Ewen <se...@apache.org>
> > wrote:
> > > >
> > > > > Just to be on the safe side:
> > > > >
> > > > > If project X depends on another project Y that uses json.org (and
> > thus
> > > > > project X has json.org as a transitive dependency) is it
> sufficient
> > to
> > > > > exclude the transitive json.org dependency in the reference to
> > project
> > > > Y?
> > > > >
> > > > > Something like that:
> > > > >
> > > > > <dependency>
> > > > >   <groupId>org.apache.hive.hcatalog</groupId>
> > > > >   <artifactId>hcatalog-core</artifactId>
> > > > >   <version>0.12.0</version>
> > > > >   <exclusions>
> > > > >     <exclusion>
> > > > >       <groupId>org.json</groupId>
> > > > >       <artifactId>json</artifactId>
> > > > >     </exclusion>
> > > > >   </exclusions>
> > > > > </dependency>
> > > > >
> > > > > Thanks,
> > > > > Stephan
> > > > >
> > > > >
> > > > > On Thu, Nov 24, 2016 at 10:00 AM, Jochen Theodorou <
> > blackd...@gmx.org>
> > > > > wrote:
> > > > >
> > > > > > is that library able to deal with the jdk9 module system?
> > > > > >
> > > > > >
> > > > > > On 24.11.2016 02:16, James Bognar wrote:
> > > > > >
> > > > > >> Shameless plug for Apache Juneau that has a cleanroom
> > implementation
> > > > of
> > > > > a
> > > > > >> JSON serializer and parser in context of a common serialization
> > API
> > > > that
> > > > > >> includes a variety of serialization languages for POJOs.
> > > > > >>
> > > > > >> On Wed, Nov 23, 2016 at 8:10 PM Ted Dunning <
> > ted.dunn...@gmail.com>
> > > > > >> wrote:
> > > > > >>
> > > > > >> The VP Legal for Apache has determined that the JSON processing
> > > > library
> > > > > >>> from json.org <https://github.com/

Re: Megh operator library

2016-11-28 Thread Amol Kekre
I am not sure where we are on this. As the Apex community we need to take a
hard look before we reject code that passes license and normal pull request
requirements. A lot of Megh code is in production and is stable. Is there
a reason why we cannot accommodate Megh code in a directory that clarifies
its origins? Assuming there are duplicates, the code can still be taken into a
directory that marks it as such.

Without this, a lot of customer custom code that they want to contribute to
Malhar will be stuck in "replace with ...". That will not happen, as folks
do not change production code once it works and is stabilized. If the word
"contrib" is an issue, I suggest we get a new name. HDHT etc. are in
production and it makes sense to let them reside in a directory (to be
named) in Malhar.

Thks
Amol


On Mon, Sep 26, 2016 at 10:22 PM, Pramod Immaneni <pra...@datatorrent.com>
wrote:

> Added a section for flume based on the feedback.
>
> Thanks
>
> On Mon, Sep 26, 2016 at 8:51 AM, Pramod Immaneni <pra...@datatorrent.com>
> wrote:
>
> > Hi Thomas,
> >
> > My responses are inline
> >
> > On Sun, Sep 25, 2016 at 11:39 AM, Thomas Weise <thomas.we...@gmail.com>
> > wrote:
> >
> >> Thanks for putting it together. It looks like there are really only 2
> >> operators?
> >>
> >
> > There were others but looked like they were already good implementations
> > or alternatives for it in Malhar. For example, enrichment and deduper
> have
> > implementations already, for laggards operator looked like the concept is
> > already covered in the new windowing work.
> >
> >
> >>
> >> +1 for the Flume connector. It would be good to also look what has
> changed
> >> in Flume since it was written. It needs its own Maven module and
> >> documentation is also needed.
> >>
> >
> > Yes in the table in the document I have it going to its own module and
> > path. Will make a note in the document about checking against newer flume
> > versions and documentation.
> >
> >
> >> I don't agree with the proposed "as-is" move for the dimension compute
> >> operator into contrib. It does not belong there. Contrib is for new,
> >> incomplete work ("immature" and under the radar WRT CI etc.), with
> >> particular focus to provide an easier entry path for new contributors.
> >>
> >> I would like to see the following changes to dimension computation:
> >> * Replace HDHT with managed state (or spillable DS)
> >> * Move to org.apache.apex.malhar.lib.*
> >> * Documentation (your draft is a good start towards that), it also needs
> >> to
> >> cover query support.
> >>
> >> I think it is a very valuable operator that should be a first class
> >> citizen
> >> and the folks familiar with the operator and state management should
> take
> >> up the work to port it. Tim indicated he may be able to take it up.
> >>
> >> In the meantime, the operator can remain in the Megh repository under
> >> existing name and consumed from there.
> >>
> >
> > I thought it could eventually have its own module under Malhar but
> > suggested contrib as an intermediate location till any porting is
> > completed. I agree with the documentation, I just wrote up something
> quick
> > to highlight the operator, Tim has more detailed docs for it I think.
> Since
> > the operator(s) are readily usable in production applications, implement
> > quite a bit of functionality and provide valuable functionality, I am of
> > the opinion that we do the minimal now to make it available and parallely
> > start the work on porting some of the internal subsystems to newer
> > components.
> >
> > Thanks
> >
> >
> >>
> >> Thomas
> >>
> >> On Sat, Sep 24, 2016 at 12:29 PM, Pramod Immaneni <
> pra...@datatorrent.com
> >> >
> >> wrote:
> >>
> >> > Hi,
> >> >
> >> > Here is the initial proposal. Please go through it and you can comment
> >> > right on the document. Regarding the discussions around Dimensional
> >> > operators, there is a specific section for it and future plans. After
> >> the
> >> > comments are addressed, I can start with one of the components such as
> >> > flume and document the steps involved. Then others can take up the
> other
> >> > components and use the steps in a similar fashion.
> >> >
> >> > https://docs.google.com/document/d/1B

Re: StramEvents for AppDataTracker for debugging and monitoring.

2016-11-28 Thread Amol Kekre
Deepak,
This is a great idea. Given that the AppDataTracker is not part of Apex, dev@ may
not be the place to take up this thread.

Thks
Amol


On Mon, Nov 28, 2016 at 8:53 AM, Deepak Narkhede 
wrote:

> Dear Community,
>
> Is it worthwhile to add StramEvents to the AppDataTracker for debugging
> and monitoring?
>
> Use case(s):
> 1) Debugging operators and containers around events such as kill/stop/start etc.
> 2) Monitoring the events and having alerts or corrective measures
> based on events.
> 3) By extending the current transport, being able to push objects to third
> parties or other services for event analysis, like Kafka etc.
>
> Solution Approach:
> 1) Planning to add a listener, similar to the event recorder, which will generate
> the stram events. The existing event recorder(s) are currently tied to the
> eventBus, which is registered directly to WebSockets, so we avoid using them
> as we don't necessarily always want to push events through the gateway.
> 2) Add a functionality/handler to the AppDataPush agent to handle stram events
> coming from the new listener.
> 3) Push the stram events through the existing transport interface in the form
> of JSON, similar to stats.
> 4) We can also add aggregators for these events.
>
> Extending further, if we add other transports to the existing transport
> interface, we can achieve use case #3.
>
> Please let me know your feedback.
>
> --
> Thanks & Regards
>
> Deepak Narkhede
>
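The proposed approach (a listener decoupled from the websocket-bound event bus, emitting JSON over a pluggable transport) can be sketched roughly as below. All names here are hypothetical, not the actual Apex engine API; the hand-built JSON stands in for whatever serialization the real implementation would use.

```java
import java.util.ArrayList;
import java.util.List;

// Hedged sketch: a recorder, registered independently of the websocket-bound
// event bus, that turns STRAM events into JSON strings which any pluggable
// transport can push (e.g. to the app data tracker or Kafka).
public class StramEventRecorder {
    public interface Transport {
        void push(String json);
    }

    private final List<Transport> transports = new ArrayList<>();

    public void addTransport(Transport t) {
        transports.add(t);
    }

    /** Called by the engine when a container/operator event occurs. */
    public void onEvent(String type, String detail, long timestamp) {
        // Minimal JSON by hand; a real implementation would use a JSON library.
        String json = String.format(
            "{\"type\":\"%s\",\"detail\":\"%s\",\"ts\":%d}", type, detail, timestamp);
        for (Transport t : transports) {
            t.push(json);
        }
    }
}
```

Registering a second Transport is how use case #3 (pushing to Kafka or other services) would slot in without touching the recorder itself.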


Re: JSON License and Apache Projects

2016-11-25 Thread Amol Kekre
Chinmay
+1, Do you want to drive :)

Thks
Amol

On Thu, Nov 24, 2016 at 10:02 PM, Chinmay Kolhatkar <chin...@apache.org>
wrote:

> Yes... That's the mail. There are a couple of related conversations that can be
> seen here too:
> https://lists.apache.org/list.html?legal-disc...@apache.org
>
> I suggest we take a look at it and do the needful from our end too.
>
> -Chinmay.
>
>
> On Fri, Nov 25, 2016 at 10:15 AM, Amol Kekre <a...@datatorrent.com> wrote:
>
> > Chinmay,
> > Is this the thread you were looking for?
> >
> > Thks
> > Amol
> >
> > -- Forwarded message --
> > From: Ted Dunning <ted.dunn...@gmail.com>
> > Date: Thu, Nov 24, 2016 at 2:28 PM
> > Subject: Re: JSON License and Apache Projects
> > To: "gene...@incubator.apache.org" <gene...@incubator.apache.org>
> >
> >
> > Stephan,
> >
> > What you suggest should work (if you add another dependency to provide
> the
> > needed classes).
> >
> > You have to be careful, however, because your consumers may expect to get
> > the full json.org API.
> >
> > I would suggest that exclusions like this should only be used while your
> > direct dependency still has the dependency on json.org. When they fix
> it,
> > you can drop the exclusion and all will be good.
> >
> >
> >
> > On Thu, Nov 24, 2016 at 2:21 AM, Stephan Ewen <se...@apache.org> wrote:
> >
> > > Just to be on the safe side:
> > >
> > > If project X depends on another project Y that uses json.org (and thus
> > > project X has json.org as a transitive dependency) is it sufficient to
> > > exclude the transitive json.org dependency in the reference to project
> > Y?
> > >
> > > Something like that:
> > >
> > > <dependency>
> > >   <groupId>org.apache.hive.hcatalog</groupId>
> > >   <artifactId>hcatalog-core</artifactId>
> > >   <version>0.12.0</version>
> > >   <exclusions>
> > >     <exclusion>
> > >       <groupId>org.json</groupId>
> > >       <artifactId>json</artifactId>
> > >     </exclusion>
> > >   </exclusions>
> > > </dependency>
> > >
> > > Thanks,
> > > Stephan
> > >
> > >
> > > On Thu, Nov 24, 2016 at 10:00 AM, Jochen Theodorou <blackd...@gmx.org>
> > > wrote:
> > >
> > > > is that library able to deal with the jdk9 module system?
> > > >
> > > >
> > > > On 24.11.2016 02:16, James Bognar wrote:
> > > >
> > > >> Shameless plug for Apache Juneau that has a cleanroom implementation
> > of
> > > a
> > > >> JSON serializer and parser in context of a common serialization API
> > that
> > > >> includes a variety of serialization languages for POJOs.
> > > >>
> > > >> On Wed, Nov 23, 2016 at 8:10 PM Ted Dunning <ted.dunn...@gmail.com>
> > > >> wrote:
> > > >>
> > > >> The VP Legal for Apache has determined that the JSON processing
> > library
> > > >>> from json.org <https://github.com/stleary/JSON-java> is not usable
> > as
> > > a
> > > >>> dependency by Apache projects. This is because the license
> includes a
> > > >>> line
> > > >>> that places a field of use condition on downstream users in a way
> > that
> > > is
> > > >>> not compatible with Apache's license.
> > > >>>
> > > >>> This decision is, unfortunately, a change from the previous
> > situation.
> > > >>> While the current decision is correct, it would have been nice if
> we
> > > had
> > > >>> had this decision originally.
> > > >>>
> > > >>> As such, some existing projects may be impacted because they
> assumed
> > > that
> > > >>> the json.org dependency was OK to use.
> > > >>>
> > > >>> Incubator projects that are currently using the json.org library
> > have
> > > >>> several courses of action:
> > > >>>
> > > >>> 1) just drop it. Some projects like Storm have demos that use
> > twitter4j
> > > >>> which incorporates the problematic code. These demos aren't core
> and
> > > >>> could
> > > >>> just be dropped for a time.
> > > >>>
> > > >>> 2) help dependencies move away from problem code. I have sent a
> pull
> > > >>> request to twitter4 <https://github.com/yusuke/twitter4j/pull/254
&

Fwd: JSON License and Apache Projects

2016-11-24 Thread Amol Kekre
Chinmay,
Is this the thread you were looking for?

Thks
Amol

-- Forwarded message --
From: Ted Dunning 
Date: Thu, Nov 24, 2016 at 2:28 PM
Subject: Re: JSON License and Apache Projects
To: "gene...@incubator.apache.org" 


Stephan,

What you suggest should work (if you add another dependency to provide the
needed classes).

You have to be careful, however, because your consumers may expect to get
the full json.org API.

I would suggest that exclusions like this should only be used while your
direct dependency still has the dependency on json.org. When they fix it,
you can drop the exclusion and all will be good.



On Thu, Nov 24, 2016 at 2:21 AM, Stephan Ewen  wrote:

> Just to be on the safe side:
>
> If project X depends on another project Y that uses json.org (and thus
> project X has json.org as a transitive dependency) is it sufficient to
> exclude the transitive json.org dependency in the reference to project Y?
>
> Something like that:
>
> <dependency>
>   <groupId>org.apache.hive.hcatalog</groupId>
>   <artifactId>hcatalog-core</artifactId>
>   <version>0.12.0</version>
>   <exclusions>
>     <exclusion>
>       <groupId>org.json</groupId>
>       <artifactId>json</artifactId>
>     </exclusion>
>   </exclusions>
> </dependency>
>
> Thanks,
> Stephan
>
>
> On Thu, Nov 24, 2016 at 10:00 AM, Jochen Theodorou 
> wrote:
>
> > is that library able to deal with the jdk9 module system?
> >
> >
> > On 24.11.2016 02:16, James Bognar wrote:
> >
> >> Shameless plug for Apache Juneau that has a cleanroom implementation of
> a
> >> JSON serializer and parser in context of a common serialization API
that
> >> includes a variety of serialization languages for POJOs.
> >>
> >> On Wed, Nov 23, 2016 at 8:10 PM Ted Dunning 
> >> wrote:
> >>
> >> The VP Legal for Apache has determined that the JSON processing library
> >>> from json.org  is not usable as
> a
> >>> dependency by Apache projects. This is because the license includes a
> >>> line
> >>> that places a field of use condition on downstream users in a way that
> is
> >>> not compatible with Apache's license.
> >>>
> >>> This decision is, unfortunately, a change from the previous situation.
> >>> While the current decision is correct, it would have been nice if we
> had
> >>> had this decision originally.
> >>>
> >>> As such, some existing projects may be impacted because they assumed
> that
> >>> the json.org dependency was OK to use.
> >>>
> >>> Incubator projects that are currently using the json.org library have
> >>> several courses of action:
> >>>
> >>> 1) just drop it. Some projects like Storm have demos that use
twitter4j
> >>> which incorporates the problematic code. These demos aren't core and
> >>> could
> >>> just be dropped for a time.
> >>>
> >>> 2) help dependencies move away from problem code. I have sent a pull
> >>> request to twitter4j,
> for
> >>> example, that eliminates the problem. If they accept the pull, then
all
> >>> would be good for the projects that use twitter4j (and thus json.org)
> >>>
> >>> 3) replace the json.org artifact with a compatible one that is open
> >>> source.
> >>> I have created and published an artifact based on clean-room Android
> code
> >>>  that replicates the most
> >>> important
> >>> parts of the json.org code. This code is compatible, but lacks some
> >>> coverage. It also could lead to jar hell if used unjudiciously because
> it
> >>> uses the org.json package. Shading and exclusion in a pom might help.
> Or
> >>> not. Go with caution here.
> >>>
> >>> 4) switch to safer alternatives such as Jackson. This requires code
> >>> changes, but is probably a good thing to do. This option is the one
> that
> >>> is
> >>> best in the long-term but is also the most expensive.
> >>>
> >>>
> >>> -- Forwarded message --
> >>> From: Jim Jagielski 
> >>> Date: Wed, Nov 23, 2016 at 6:10 AM
> >>> Subject: JSON License and Apache Projects
> >>> To: ASF Board 
> >>>
> >>>
> >>> (forwarded from legal-discuss@)
> >>>
> >>> As some of you may know, recently the JSON License has been
> >>> moved to Category X (https://www.apache.org/legal/resolved#category-x
> ).
> >>>
> >>> I understand that this has impacted some projects, especially
> >>> those in the midst of doing a release. I also understand that
> >>> up until now, really, there has been no real "outcry" over our
> >>> usage of it, especially from end-users and other consumers of
> >>> our projects which use it.
> >>>
> >>> As compelling as that is, the fact is that the JSON license
> >>> itself is not OSI approved and is therefore not, by definition,
> >>> an "Open Source license" and, as such, cannot be considered as
> >>> one which is acceptable as related to categories.
> >>>
> >>> Therefore, w/ my VP Legal hat on, I am making the following
> >>> statements:
> >>>
> >>>  o No new project, sub-project or codebase, which has not
> >>>used 

Re: Proposing a new feature to persist logical and physical plan snapshots in HDFS

2016-11-23 Thread Amol Kekre
Persisting the plan on DFS is good. I am +1 for it. This could be both of the
following:

1. Attribute: if set, then upon a change in the plan, persist it to DFS
2. On demand

Thks
Amol


On Wed, Nov 23, 2016 at 4:15 PM, Sanjay Pujare <san...@datatorrent.com>
wrote:

> Okay, but this “state” is gone after the app is “dead” isn’t that true?
> Also the reason for this enhancement is debuggability/troubleshooting of
> Apex apps so it is good to have separate explicit user visible files that
> contain the plan information instead of overloading the state for this
> purpose (in my opinion).
>
> In terms of on-demand, it sounds like a good idea - I didn’t think of it.
> But I would like to drill down the use cases. In most cases,
> logical/physical plan changes are spontaneous or rather internal to the app
> so an external entity making a REST call to save the plan on demand might
> not sync up with when the plan changes took place inside the app. So saving
> the plan JSON files on events described previously seems to be the most
> efficient thing to do (as discussed with @Ashwin Putta) but if there are
> use cases I think it is a good idea to do it on demand as well.
>
> On 11/23/16, 3:00 PM, "Amol Kekre" <a...@datatorrent.com> wrote:
>
> Good idea. Stram does save state, and maybe a script that translates
> may
> work. But explicit plan saving is also a good idea. Could this be "on
> demand"? a rest call that writes out the plan(s) to specifid hdfs
> files?
>
> We could do both (write on any change/set call) and/or on-demand.
>
> Thks
> Amol
>
>
> On Wed, Nov 23, 2016 at 2:40 PM, Sanjay Pujare <san...@datatorrent.com
> >
> wrote:
>
> > To help Apex developers/users with debugging or troubleshooting
> “dead”
> > applications, I am proposing a new feature to persist logical and
> physical
> > plan snapshots in HDFS.
> >
> >
> >
> > Similar to how the Apex engine persists container data per
> application
> > attempt in HDFS as containers_NNN.json (where NNN is 1 for first app
> > attempt, 2 for the second app attempt and so on), we will create 2
> more
> > sets of files under the …/apps/{appId} directory for an application:
> >
> >
> >
> > logicalPlan_NNN_MMM.json
> >
> > physicalPlan_NNN_MMM.json
> >
> >
> >
> > where NNN stands for the app attempt index (similar to NNN above 1,
> 2, 3
> > and so on) and MMM is a running index starting at 1 which stands for
> a
> > snapshot within an app attempt. Note that a logical or physical plan
> may
> > change within an app attempt for any number of reasons.
> >
> >
> >
> > The StreamingContainerManager class maintains the current
> logical/physical
> > plans in the “plan” member variable. New methods will be added in
> > StreamingContainerManager to save the logical or physical plan as
> JSON
> > representations in the app directory (as described above). The logic
> is
> > similar to com.datatorrent.stram.webapp.StramWebServices.
> getLogicalPlan(String)
> > and com.datatorrent.stram.webapp.StramWebServices.getPhysicalPlan()
> used
> > inside the Stram Web service. There will be running indexes in
> > StreamingContainerManager to keep track of MMM for the logical plan
> and
> > physical plan. The appropriate save method will be called on the
> occurrence
> > of any event that updates the logical or physical plan for example:
> >
> >
> >
> > inside com.datatorrent.stram.StreamingContainerManager.
> > LogicalPlanChangeRunnable.call()  for logical plan change event
> >
> >
> >
> > inside com.datatorrent.stram.plan.physical.PhysicalPlan.
> redoPartitions(PMapping,
> > String) for physical plan change event (i.e. redoing partitioning)
> >
> >
> >
> > Once these files are created, any user or a tool (such as the Apex
> CLI or
> > the DT Gateway) can look up these files for
> troubleshooting/researching of
> > “dead” applications and significant events in their lifetime in
> terms of
> > logical or physical plan changes. Pls send me your feedback.
> >
> >
> >
> > Sanjay
> >
> >
> >
> >
>
>
>
>


Re: Proposing a new feature to persist logical and physical plan snapshots in HDFS

2016-11-23 Thread Amol Kekre
Good idea. Stram does save state, and maybe a script that translates it may
work. But explicit plan saving is also a good idea. Could this be "on
demand"? A REST call that writes out the plan(s) to specified HDFS files?

We could do both (write on any change/set call) and/or on-demand.

Thks
Amol


On Wed, Nov 23, 2016 at 2:40 PM, Sanjay Pujare 
wrote:

> To help Apex developers/users with debugging or troubleshooting “dead”
> applications, I am proposing a new feature to persist logical and physical
> plan snapshots in HDFS.
>
>
>
> Similar to how the Apex engine persists container data per application
> attempt in HDFS as containers_NNN.json (where NNN is 1 for first app
> attempt, 2 for the second app attempt and so on), we will create 2 more
> sets of files under the …/apps/{appId} directory for an application:
>
>
>
> logicalPlan_NNN_MMM.json
>
> physicalPlan_NNN_MMM.json
>
>
>
> where NNN stands for the app attempt index (similar to NNN above 1, 2, 3
> and so on) and MMM is a running index starting at 1 which stands for a
> snapshot within an app attempt. Note that a logical or physical plan may
> change within an app attempt for any number of reasons.
>
>
>
> The StreamingContainerManager class maintains the current logical/physical
> plans in the “plan” member variable. New methods will be added in
> StreamingContainerManager to save the logical or physical plan as JSON
> representations in the app directory (as described above). The logic is
> similar to 
> com.datatorrent.stram.webapp.StramWebServices.getLogicalPlan(String)
> and com.datatorrent.stram.webapp.StramWebServices.getPhysicalPlan() used
> inside the Stram Web service. There will be running indexes in
> StreamingContainerManager to keep track of MMM for the logical plan and
> physical plan. The appropriate save method will be called on the occurrence
> of any event that updates the logical or physical plan for example:
>
>
>
> inside com.datatorrent.stram.StreamingContainerManager.
> LogicalPlanChangeRunnable.call()  for logical plan change event
>
>
>
> inside 
> com.datatorrent.stram.plan.physical.PhysicalPlan.redoPartitions(PMapping,
> String) for physical plan change event (i.e. redoing partitioning)
>
>
>
> Once these files are created, any user or a tool (such as the Apex CLI or
> the DT Gateway) can look up these files for troubleshooting/researching of
> “dead” applications and significant events in their lifetime in terms of
> logical or physical plan changes. Pls send me your feedback.
>
>
>
> Sanjay
>
>
>
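The naming convention above (NNN for the app attempt, MMM as a running index, kept separately for the logical and physical plans) can be sketched as follows. This is an assumption-laden illustration: it writes with java.nio for the sake of a self-contained example, whereas the real code would go through Hadoop's FileSystem API against the .../apps/{appId} directory, and the exact file-name format is not fixed by the proposal.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

// Sketch of the proposed snapshot naming and persistence; class and method
// names are hypothetical, not StreamingContainerManager's actual API.
public class PlanSnapshotWriter {
    // Running MMM index, tracked per kind ("logical" / "physical").
    private final Map<String, Integer> snapshotIndexes = new HashMap<>();

    /** Builds names like logicalPlan_1_1.json (NNN = app attempt, MMM = snapshot). */
    public String nextFileName(String kind, int appAttempt) {
        int mmm = snapshotIndexes.merge(kind, 1, Integer::sum);
        return String.format("%sPlan_%d_%d.json", kind, appAttempt, mmm);
    }

    /** Writes the plan JSON under the app directory; java.nio stands in for
        Hadoop's FileSystem here purely for illustration. */
    public Path save(Path appDir, String kind, int appAttempt, String planJson)
            throws IOException {
        Path target = appDir.resolve(nextFileName(kind, appAttempt));
        return Files.write(target, planJson.getBytes(StandardCharsets.UTF_8));
    }
}
```

Calling save() from both the event-driven path (on plan change) and a REST endpoint would cover the "on any change" and "on demand" modes discussed above.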
>


Re: Visitor API for DAG

2016-11-17 Thread amol kekre
+1. Opening up the API for users to put in their own code is good. In
general we should enable users to register their code in a lot of scenarios.

Thks
Amol

On Thu, Nov 17, 2016 at 9:06 AM, Tushar Gosavi 
wrote:

> Yes, it could happen after the current DAG validation and before the
> application master is launched.
>
> - Tushar.
>
>
> On Thu, Nov 17, 2016 at 8:32 PM, Munagala Ramanath 
> wrote:
> > When would the visits happen ? Just before normal validation ?
> >
> > Ram
> >
> > On Wed, Nov 16, 2016 at 9:50 PM, Tushar Gosavi 
> wrote:
> >
> >> Hi All,
> >>
> >> How about adding visitor like API for DAG in Apex, and an api to
> >> register visitor for the DAG.
> >> Possible use cases are
> >> -  Validator visitor which could validate the dag
> >> -  Visitor to inject properties/attribute in the operator/streams from
> >> some external sources.
> >> -  Platform does not support validation of individual operators.
> >> developer could write a validator visitor which would call validate
> >> function of operator if it implements Validator interface.
> >> - generate output schema based on operator config and input schema,
> >> and set the schema on output stream.
> >>
> >> Sample API :
> >>
> >> dag.registerVisitor(DAGVisitor visitor);
> >>
> >> Call order of visitorFunctions.
> >> - preVisitDAG(Attributes) // dag attributes
> >>   for all operators
> >>   - visitOperator(OperatorMeta meta) // access to operator, name,
> >> attributes, properties
> >>  ports
> >>   - visitStream(StreamMeta meta) // access to
> >> stream/name/attributes/properties/ports
> >> - postVisitDAG()
> >>
> >> Regards,
> >> -Tushar.
> >>
>
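The visitor API proposed in this thread could look roughly like the following self-contained sketch. The interface and call order follow Tushar's proposal; everything else (`SimpleDag`, string-based metadata, the traversal code) is an illustrative assumption, not the actual Apex implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the proposed DAG visitor API; not actual Apex code.
interface DAGVisitor {
    void preVisitDAG(String dagAttributes);   // DAG attributes
    void visitOperator(String operatorMeta);  // operator name/attributes/ports
    void visitStream(String streamMeta);      // stream name/attributes/ports
    void postVisitDAG();
}

class SimpleDag {
    private final List<String> operators = new ArrayList<>();
    private final List<String> streams = new ArrayList<>();
    private final List<DAGVisitor> visitors = new ArrayList<>();

    void addOperator(String meta) { operators.add(meta); }
    void addStream(String meta) { streams.add(meta); }
    void registerVisitor(DAGVisitor v) { visitors.add(v); }

    // Would run after DAG validation and before the app master is launched.
    void accept(String dagAttributes) {
        for (DAGVisitor v : visitors) {
            v.preVisitDAG(dagAttributes);
            for (String op : operators) v.visitOperator(op);
            for (String s : streams) v.visitStream(s);
            v.postVisitDAG();
        }
    }
}

public class VisitorSketch {
    // Runs one visitor over a two-operator DAG and records the call order.
    public static List<String> run() {
        final List<String> calls = new ArrayList<>();
        SimpleDag dag = new SimpleDag();
        dag.addOperator("input");
        dag.addOperator("counter");
        dag.addStream("input->counter");
        dag.registerVisitor(new DAGVisitor() {
            public void preVisitDAG(String a) { calls.add("pre"); }
            public void visitOperator(String m) { calls.add("op:" + m); }
            public void visitStream(String m) { calls.add("stream:" + m); }
            public void postVisitDAG() { calls.add("post"); }
        });
        dag.accept("attrs");
        return calls;
    }

    public static void main(String[] args) {
        System.out.println(run());
        // -> [pre, op:input, op:counter, stream:input->counter, post]
    }
}
```

A property-injecting or validating visitor (the use cases mentioned in the thread) would plug in by implementing `DAGVisitor` and mutating or checking the metadata it is handed.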


Re: Proposal for apex/malhar extensions

2016-11-16 Thread Amol Kekre
+1

Thks
Amol

On Wed, Nov 16, 2016 at 5:37 AM, AJAY GUPTA  wrote:

> +1
> This is a good idea.
>
> Ajay
>
> On Wed, Nov 16, 2016 at 4:47 PM, Chinmay Kolhatkar 
> wrote:
>
> > Dear Community,
> >
> > This is in relation to malhar cleanup work that is ongoing.
> >
> > In one of the talks during Apache BigData Europe, I got to know about
> > Spark-Packages (https://spark-packages.org/) (I believe lot of you must
> be
> > aware of it).
> > Spark package is basically functionality over and above and using Spark
> > core functionality. The spark packages can initially present in someone's
> > public repository and one could register that with
> > https://spark-packages.org/ and later on as it matures and finds more
> use,
> > it gets consumed in mainstream Spark repository and releases.
> >
> > I found this idea quite interesting to keep our apex-malhar releases
> > cleaner.
> >
> > One could have extension to apex-malhar in their own repository and just
> > register itself with Apache Apex. As it matures and find more and more
> use
> > we can consume that in mainstream releases.
> > Advantages to this are multiple:
> > 1. The entry point for registering extensions with Apache Apex can be
> > minimal. This way we get more indirect contributions.
> > 2. Faster way to add more feature in the project.
> > 3. We keep our releases cleaner.
> > 4. One could progress on feature-set faster balancing both Apache Way as
> > well as their own Enterprise Interests.
> >
> > Please share your thoughts on this.
> >
> > Thanks,
> > Chinmay.
> >
>


Re: Malhar release 3.6

2016-10-27 Thread Amol Kekre
+1

Thks
Amol


On Wed, Oct 26, 2016 at 11:22 PM, Milind Barve  wrote:

> +1
>
> On Thu, Oct 27, 2016 at 11:21 AM, Chinmay Kolhatkar 
> wrote:
>
> > +1.
> >
> > On Thu, Oct 27, 2016 at 1:41 AM, Thomas Weise  wrote:
> >
> > > Hi,
> > >
> > > I'm proposing another release of Malhar in November. There are 49
> issues
> > > marked for the release, including important bug fixes, new
> documentation,
> > > SQL support and the work for windowed operator state management:
> > >
> > > https://issues.apache.org/jira/issues/?jql=fixVersion%
> > > 20%3D%203.6.0%20AND%20project%20%3D%20APEXMALHAR%20ORDER%
> > > 20BY%20status%20ASC
> > >
> > > Currently there is at least one blocker, the join operator is broken
> > after
> > > change in managed state. It also affects the SQL feature.
> > >
> > > Thanks,
> > > Thomas
> > >
> >
>
>
>
> --
> ~Milind bee at gee mail dot com
>


Re: can operators emit on a different from the operator itself thread?

2016-10-13 Thread Amol Kekre
Vlad,
I agree that the check should be ON by default. The ability to turn it off
for the entire app is fine; a per-port option is not needed.

Thks
Amol

On Wed, Oct 12, 2016 at 10:34 PM, Tushar Gosavi <tus...@datatorrent.com>
wrote:

> +1 for on by default and ability to turn it off for entire application.
>
> - Tushar.
>
>
> On Thu, Oct 13, 2016 at 11:00 AM, Pradeep A. Dalvi <p...@apache.org>
> wrote:
> > +1 for ON by default
> > +1 for disabling it for all output ports
> >
> > With the kind of issues we have observed being faced by developers in the
> > past, I strongly believe this check should be ON by default.
> > However at the same time I feel, it shall be one-time check, mostly in
> > Development phase and before going into Production. Having said that, if
> > disabling it at application level i.e. for all operators and their
> > respective output ports would it make implementation simpler, then that
> can
> > be targeted first. Thoughts?
> >
> > --prad
> >
> > On Thu, Oct 13, 2016 at 7:32 AM, Vlad Rozov <v.ro...@datatorrent.com>
> wrote:
> >
> >> I run jmh test and check takes 1ns on my MacBook Pro and on the lab
> >> machine. This corresponds to 3% degradation at 30 million
> events/second. I
> >> think we can move forward with the check ON by default. Do we need an
> >> ability to turn OFF check for a specific operator and/or port? My
> thought
> >> is that such ability is not necessary and it should be OK to disable
> check
> >> for all output ports in an application.
> >>
> >> Vlad
> >>
> >>
> >> On 10/12/16 11:56, Amol Kekre wrote:
> >>
> >>> In case there turns out to be a penalty, we can introduce a "check for
> >>> thread affinity" mode that triggers this check. My initial thought is
> to
> >>> make this check ON by default. We should wait till benchmarks are
> >>> available
> >>> before discussing adding this check.
> >>>
> >>> Thks
> >>> Amol
> >>>
> >>>
> >>> On Wed, Oct 12, 2016 at 11:07 AM, Sanjay Pujare <
> san...@datatorrent.com>
> >>> wrote:
> >>>
> >>> A JIRA has been created for adding this thread affinity check
> >>>> https://issues.apache.org/jira/browse/APEXCORE-510 . I have made this
> >>>> enhancement in a branch
> >>>> https://github.com/sanjaypujare/apex-core/tree/malhar-510.
> >>>> thread_affinity
> >>>> and I have been benchmarking the performance with this change. I will
> be
> >>>> publishing the results in the above JIRA where we can discuss them and
> >>>> hopefully agree on merging this change.
> >>>>
> >>>> On Thu, Aug 11, 2016 at 1:41 PM, Sanjay Pujare <
> san...@datatorrent.com>
> >>>> wrote:
> >>>>
> >>>> You are right, I was subconsciously thinking about the THREAD_LOCAL
> case
> >>>>> with a single container and a simple DAG and in that case Vlad’s
> >>>>>
> >>>> assumption
> >>>>
> >>>>> might not be valid but may be it is.
> >>>>>
> >>>>> On 8/11/16, 11:47 AM, "Munagala Ramanath" <r...@datatorrent.com>
> wrote:
> >>>>>
> >>>>>  If I understand Vlad correctly, what he is saying is that each
> >>>>>
> >>>> operator
> >>>>
> >>>>>  saves currentThread in
> >>>>>  its own setup() and checks it in its own output methods. The
> >>>>> threads
> >>>>>
> >>>> in
> >>>>
> >>>>>  different operators are
> >>>>>  running potentially on different nodes and/or processes and
> there
> >>>>>
> >>>> will
> >>>>
> >>>>> be
> >>>>>  no connection between them.
> >>>>>
> >>>>>  Ram
> >>>>>
> >>>>>  On Thu, Aug 11, 2016 at 11:41 AM, Sanjay Pujare <
> >>>>> san...@datatorrent.com>
> >>>>>  wrote:
> >>>>>
> >>>>>  > Name check is expensive, agreed, but there isn’t anything else
> >>>>> currently.
> >>>>>  > Ideally the stram engine (considering that it is an engine
> >>>>>

Re: can operators emit on a different from the operator itself thread?

2016-10-12 Thread Amol Kekre
In case there turns out to be a penalty, we can introduce a "check for
thread affinity" mode that triggers this check. My initial thought is to
make this check ON by default. We should wait till benchmarks are available
before discussing adding this check.

Thks
Amol


On Wed, Oct 12, 2016 at 11:07 AM, Sanjay Pujare <san...@datatorrent.com>
wrote:

> A JIRA has been created for adding this thread affinity check
> https://issues.apache.org/jira/browse/APEXCORE-510 . I have made this
> enhancement in a branch
> https://github.com/sanjaypujare/apex-core/tree/malhar-510.thread_affinity
> and I have been benchmarking the performance with this change. I will be
> publishing the results in the above JIRA where we can discuss them and
> hopefully agree on merging this change.
>
> On Thu, Aug 11, 2016 at 1:41 PM, Sanjay Pujare <san...@datatorrent.com>
> wrote:
>
> > You are right, I was subconsciously thinking about the THREAD_LOCAL case
> > with a single container and a simple DAG and in that case Vlad’s
> assumption
> > might not be valid but may be it is.
> >
> > On 8/11/16, 11:47 AM, "Munagala Ramanath" <r...@datatorrent.com> wrote:
> >
> > If I understand Vlad correctly, what he is saying is that each
> operator
> > saves currentThread in
> > its own setup() and checks it in its own output methods. The threads
> in
> > different operators are
> > running potentially on different nodes and/or processes and there
> will
> > be
> > no connection between them.
> >
> > Ram
> >
> > On Thu, Aug 11, 2016 at 11:41 AM, Sanjay Pujare <
> > san...@datatorrent.com>
> > wrote:
> >
> > > Name check is expensive, agreed, but there isn’t anything else
> > currently.
> > > Ideally the stram engine (considering that it is an engine
> providing
> > > resources like threads etc) should use a ThreadFactory or a
> > ThreadGroup to
> > > create operator threads so identification and adding functionality
> is
> > > easier.
> > >
> > > The idea of checking for the same thread between setup() and emit()
> > won’t
> > > work because the emit() check will have to be in the Sink hierarchy
> > and
> > > AFAIK a Sink object doesn’t have access to the corresponding
> > operator,
> > > right? Another more fundamental problem probably is that these
> > threads
> > > don’t have to match. The emit() for any operator (or rather a Sink
> > related
> > > to an operator) is ultimately triggered by an emitTuple() on the
> > topmost
> > > input operator in that path which happens in that input operator’s
> > thread
> > > which doesn’t have to match the thread calling setup() in the
> > downstream
> > > operators, right?
> > >
> > >
> > > On 8/11/16, 10:59 AM, "Vlad Rozov" <v.ro...@datatorrent.com>
> wrote:
> > >
> > > Name verification is too expensive, it will be sufficient to
> > store
> > > currentThread during setup() and verify that it is the same
> > during
> > > emit.
> > > Checks should be supported not only for DefaultOutputPort, so
> we
> > may
> > > have it implemented in various Sinks.
> > >
> > > Vlad
> > >
> > > On 8/11/16 10:21, Sanjay Pujare wrote:
> > > > Thinking more about this – all of the “operator” threads are
> > created
> > > by the Stram engine with appropriate names. So we can put checks in
> > the
> > > DefaultOutputPort.emit() or in the various implementations of
> > Sink.put()
> > > that the current-thread is one created by the Stram engine (by
> > verifying
> > > the name).
> > > >
> > > > We can even use a special Thread object for operator threads
> > so the
> > > above detection is easier.
> > > >
> > > >
> > > >
> > > > On 8/10/16, 6:11 PM, "Amol Kekre" <a...@datatorrent.com>
> > wrote:
> > > >
> > > >  +1 on debug proposal. Even if tuples lands up within the
> > > window, it breaks
> > > >  all guarantees. A rerun (after restart from a
> checkpoint)
> > can
> > > have tuples
> > > >  in different windows from 
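The check debated throughout this thread (capture the operator thread in `setup()`, verify it on every `emit()`) can be sketched in a self-contained way. The class and method names are illustrative only, not Apex's actual `DefaultOutputPort`/`Sink` implementation, and the real check would live in the engine, not user code.

```java
// Illustrative sketch of the thread-affinity check discussed above.
public class AffinityCheckedPort<T> {
    private Thread operatorThread;
    private boolean checkEnabled = true;  // per the thread: could be disabled app-wide

    public void setup() {
        // The platform calls setup() on the operator thread, so remember it.
        operatorThread = Thread.currentThread();
    }

    public void emit(T tuple) {
        // Reference comparison against the saved thread is cheap (~1ns per Vlad's jmh run).
        if (checkEnabled && Thread.currentThread() != operatorThread) {
            throw new IllegalStateException("emit() called from thread "
                + Thread.currentThread().getName() + " but operator runs on "
                + operatorThread.getName());
        }
        // ... hand the tuple to the sink ...
    }

    // Demonstrates the check: same-thread emit passes, cross-thread emit fails.
    public static boolean demo() throws InterruptedException {
        final AffinityCheckedPort<String> port = new AffinityCheckedPort<>();
        port.setup();
        port.emit("ok");  // same thread: allowed

        final boolean[] caught = {false};
        Thread worker = new Thread(new Runnable() {
            public void run() {
                try {
                    port.emit("bad");  // different thread: rejected
                } catch (IllegalStateException e) {
                    caught[0] = true;
                }
            }
        });
        worker.start();
        worker.join();
        return caught[0];
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("violation caught: " + demo());  // true
    }
}
```

Note that this addresses Sanjay's objection about name checks being expensive: comparing the saved `Thread` reference avoids any string work on the hot path.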

Re: google analytics for apex website?

2016-10-11 Thread Amol Kekre
Yes, it should be accessible to the project. How do we restrict it to dev@
only? Is there a convention for sharing this data openly outside dev@?

Thks
Amol


On Mon, Oct 10, 2016 at 6:59 PM, Thomas Weise  wrote:

> +1 this will be very valuable
>
> How would the data be accessible to the project?
>
>
> On Mon, Oct 10, 2016 at 6:19 PM, Ashwin Chandra Putta <
> ashwinchand...@gmail.com> wrote:
>
> > I do not see a UA tracking code on the apex.apache.org webpages. Can we
> > add
> > it? Google analytics is very valuable in learning about the website
> traffic
> > and what users are looking for.
> >
> > I looked around other apache projects and they do have it. Spark, storm,
> > flink etc.
> >
> > --
> >
> > Regards,
> > Ashwin.
> >
>


Re: Kudu store operators

2016-10-02 Thread Amol Kekre
Ananth,
This would be great to have. +1

Thks
Amol

On Sun, Oct 2, 2016 at 8:38 AM, Munagala Ramanath 
wrote:

> +1
>
> Kudu looks impressive from the overview, though it seems to still be
> maturing.
>
> Ram
>
>
> On Sat, Oct 1, 2016 at 11:42 PM, ananth  wrote:
>
> > Hello All,
> >
> > I was wondering if it would be worthwhile for the community to consider
> > support for Apache Kudu as a store ( as a contrib operator inside Apache
> > Malhar ) .
> >
> > Here are some benefits I see:
> >
> > 1. Kudu has just reached 1.0 and been declared production ready.
> > 2. Kudu as a store might be a good fit for many architectures in the
> >    years to come because of its capabilities to provide mutability of
> >    data ( unlike HDFS ) and optimized storage formats for scans.
> > 3. It also seems to withstand high-throughput write patterns, which
> >    makes it a stable sink for Apex workflows that operate at very high
> >    volumes.
> >
> >
> > Here are some links
> >
> >  *  From the recent Strata conference
> >https://kudu.apache.org/2016/09/26/strata-nyc-kudu-talks.html
> >  * https://kudu.apache.org/overview.html
> >
> > I can implement this operator if the community feels it is worth adding
> it
> > to our code base. If so, could someone please assign the JIRA to me. I
> have
> > created this JIRA to track this : https://issues.apache.org/jira
> > /browse/APEXMALHAR-2278
> >
> >
> > Regards,
> >
> > Ananth
> >
> >
>


Re: checkpoint statistics

2016-09-24 Thread Amol Kekre
+1. Very important stat for deciding a crucial question -> "Whether to
checkpoint an operator?". It affects SLA, design, ...

Thks
Amol


On Sat, Sep 24, 2016 at 10:01 AM, Vlad Rozov 
wrote:

> IMO, it may be useful to provide checkpoint statistics for example, total
> size of checkpoint for particular window or average size of checkpoints for
> a particular operator. Also, how long it takes to write checkpoints to
> storage.
>
> Thank you,
>
> Vlad
>
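The statistics Vlad lists (total/average checkpoint size per operator, time to write the checkpoint to storage) amount to a small per-operator accumulator. A minimal sketch, with all names assumed for illustration:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the per-operator checkpoint statistics suggested above;
// not actual Apex code.
public class CheckpointStats {
    static class OperatorStats {
        long checkpoints;      // number of checkpoints recorded
        long totalBytes;       // cumulative checkpoint size
        long totalWriteNanos;  // cumulative time spent writing to storage

        double avgBytes() {
            return checkpoints == 0 ? 0 : (double) totalBytes / checkpoints;
        }
    }

    private final Map<String, OperatorStats> byOperator = new HashMap<>();

    // Record one completed checkpoint for an operator.
    public void record(String operator, long bytes, long writeNanos) {
        OperatorStats s = byOperator.computeIfAbsent(operator, k -> new OperatorStats());
        s.checkpoints++;
        s.totalBytes += bytes;
        s.totalWriteNanos += writeNanos;
    }

    public double averageBytes(String operator) {
        OperatorStats s = byOperator.get(operator);
        return s == null ? 0 : s.avgBytes();
    }

    public static void main(String[] args) {
        CheckpointStats stats = new CheckpointStats();
        stats.record("wordCount", 1_000, 5_000_000);
        stats.record("wordCount", 3_000, 7_000_000);
        System.out.println(stats.averageBytes("wordCount"));  // 2000.0
    }
}
```

Exposing these numbers per window (or per operator) is exactly what would let an app designer answer Amol's question of whether checkpointing a given operator is worth its SLA cost.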


Re: example applications in malhar

2016-09-06 Thread Amol Kekre
Good idea to consolidate them into Malhar. We should bring in as many from
this GitHub repository as possible.

Thks
Amol


On Tue, Sep 6, 2016 at 6:02 PM, Thomas Weise  wrote:

> I'm also for consolidating these different example locations. We should
> also look if all of it is still relevant.
>
> The stuff from the DT repository needs to be brought into shape wrt
> licensing, checkstyle, CI support etc.
>
>
> On Tue, Sep 6, 2016 at 4:34 PM, Pramod Immaneni 
> wrote:
>
> > Sounds like a good idea. How about merging demos with apps as well?
> >
> > On Tue, Sep 6, 2016 at 4:30 PM, Ashwin Chandra Putta <
> > ashwinchand...@gmail.com> wrote:
> >
> > > Hi All,
> > >
> > > We have a lot of examples for apex malhar operators in the following
> > > repository which resides outside of malhar.
> > > https://github.com/DataTorrent/examples/tree/master/tutorials
> > >
> > > Now that it has grown quite a bit, does it make sense to bring some of
> > the
> > > most common examples to malhar repository? Probably under apps
> directory?
> > >
> > > That way folks looking at malhar repository will have some samples to
> > look
> > > at without having to search elsewhere.
> > >
> > > --
> > >
> > > Regards,
> > > Ashwin.
> > >
> >
>


Re: anti-affinity - parameter to control the containers of an operator on a node

2016-08-12 Thread Amol Kekre
Eventually, in the long run, YARN (or another distributed OS) should handle
isolation for all resources. Currently I/O is not isolated by YARN, and Apex
will need to help out for a year or two. Secondly, if Apex runs on another
OS, we will also need to check whether isolation is available for CPU and
memory. Overall, IMHO this feature, though temporary, may exist longer than
we anticipate.

Thks
Amol


On Fri, Aug 12, 2016 at 12:04 PM, Venkatesh Kottapalli <
venkat...@datatorrent.com> wrote:

> Yes Thomas.
>
> YARN allocates based on “availability” and it also depends on the other
> jobs running in the cluster.
>
> As an example, in a 20 node cluster with 64 partitions of Operator A, I
> want to set a rule saying - not more than 4 containers of Operator A should
> be deployed on the same node because it is processing intensive. Is this
> possible with the current versions?
>
> It is possible that there will be other jobs running in the cluster and 6
> to 7 containers of the operator A get deployed on the same node leading to
> a CPU utilization of 90% on that node while other nodes have cpu
> utilization less than 20 which is not optimal. Instead the job can
> wait/preempt for the resources to get allocated as per the configuration
> above.
>
> I think it is good to provide handle to the users to configure such things
> as they might decide depending on the job priorities. Let me know if I am
> missing something here.
>
> -Venkatesh.
>
>
> > On Aug 12, 2016, at 11:15 AM, Thomas Weise 
> wrote:
> >
> > Venky,
> >
> > Please think about this in terms of resources ("capable of handling").
> YARN
> > controls memory and CPU and you define how much memory and CPU your
> > operator needs. The scheduler uses that information to find a suitable
> node
> > and it is already clear how many containers of an operator fit on a node.
> >
> > Maybe you are thinking of another resource that is not managed by YARN?
> >
> > Thomas
> >
> > On Wed, Aug 10, 2016 at 11:27 AM, Venkatesh Kottapalli <
> > venkat...@datatorrent.com> wrote:
> >
> >> Hi team,
> >>
> >> On the anti-affinity rules while deploying containers, do we have a
> >> feature which can control the number of containers of the same operator
> >> that get deployed on the same node. If the environment is capable of
> >> handling, then this will be a good feature to have as it is possible
> that
> >> certain operators could be resource hungry and this would distribute the
> >> load uniformly on all the nodes.
> >>
> >> Please share your thoughts on this.
> >>
> >> -Venkatesh.
>
>
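Venkatesh's 20-node example (at most 4 containers of operator A per node) boils down to a simple per-node counting constraint at placement time. The sketch below is purely illustrative: the real decision would live in the YARN resource-request logic, and all names here are assumptions.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the placement constraint discussed above: at most `maxPerNode`
// containers of the same operator on any one node.
public class MaxPerNodePlacement {
    private final int maxPerNode;
    private final Map<String, Integer> containersPerNode = new HashMap<>();

    public MaxPerNodePlacement(int maxPerNode) {
        this.maxPerNode = maxPerNode;
    }

    // Returns true and records the placement if the node still has room;
    // otherwise the caller would wait or try another node, as Venkatesh suggests.
    public boolean tryPlace(String node) {
        int current = containersPerNode.getOrDefault(node, 0);
        if (current >= maxPerNode) {
            return false;
        }
        containersPerNode.put(node, current + 1);
        return true;
    }

    public static void main(String[] args) {
        MaxPerNodePlacement p = new MaxPerNodePlacement(4);
        for (int i = 0; i < 4; i++) {
            System.out.println(p.tryPlace("node1"));  // true, four times
        }
        System.out.println(p.tryPlace("node1"));      // false: node1 is full
        System.out.println(p.tryPlace("node2"));      // true: spread the load
    }
}
```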


Re: can operators emit on a different from the operator itself thread?

2016-08-10 Thread Amol Kekre
+1 on debug proposal. Even if tuples land within the window, it breaks
all guarantees. A rerun (after a restart from a checkpoint) can have tuples
in different windows from this thread. A separate thread simply exposes
users to unwarranted risks.

Thks
Amol


On Wed, Aug 10, 2016 at 6:05 PM, Vlad Rozov  wrote:

> Tuples emitted between end and begin windows is only one of possible
> behaviors that emitting tuples on a separate from the operator thread may
> introduce. It will be good to have both checks in place at run-time and if
> checking for the operator thread for every emitted tuple is too expensive,
> we may have it enabled only in DEBUG or mode with more checks in place.
>
> Vlad
>
>
> Sanjay just reminded me of my typo -> I meant between end_window and
>> start_window :)
>>
>> Thks
>> Amol
>>
>> On Wed, Aug 10, 2016 at 2:36 PM, Sanjay Pujare 
>> wrote:
>>
>> If the goal is to do this validation through static analysis of operator
>>> code, I guess it is possible but is going to be non-trivial. And there
>>> could be false positives and false negatives.
>>>
>>> Also I suppose this discussion applies to processor operators (those
>>> having both in and out ports) so Ram’s example of JdbcPollInputOperator
>>> may
>>> not be applicable here?
>>>
>>> On 8/10/16, 2:04 PM, "Ashwin Chandra Putta" 
>>> wrote:
>>>
>>>  In a separate thread I mean.
>>>
>>>  Regards,
>>>  Ashwin.
>>>
>>>  On Wed, Aug 10, 2016 at 2:01 PM, Ashwin Chandra Putta <
>>>  ashwinchand...@gmail.com> wrote:
>>>
>>>  > + dev@apex.apache.org
>>>  > - us...@apex.apache.org
>>>  >
>>>  > This is one of those best practices that we learn by experience
>>> during
>>>  > operator development. It will save a lot of time during operator
>>>  > development if we can catch and throw validation error when
>>> someone
>>> emits
>>>  > tuples in a non separate thread.
>>>  >
>>>  > Regards,
>>>  > Ashwin
>>>  >
>>>  > On Wed, Aug 10, 2016 at 1:57 PM, Munagala Ramanath <
>>> r...@datatorrent.com>
>>>  > wrote:
>>>  >
>>>  >> For cases where use of a different thread is needed, it can write
>>> tuples
>>>  >> to a queue from where the operator thread pulls them --
>>>  >> JdbcPollInputOperator in Malhar has an example.
>>>  >>
>>>  >> Ram
>>>  >>
>>>  >> On Wed, Aug 10, 2016 at 1:50 PM, hsy...@gmail.com <
>>> hsy...@gmail.com
>>>  >> wrote:
>>>  >>
>>>  >>> Hey Vlad,
>>>  >>>
>>>  >>> Thanks for bringing this up. Is there an easy way to detect
>>> unexpected
>>>  >>> use of emit method without hurt the performance. Or at least if
>>> we
>>> can
>>>  >>> detect this in debug mode.
>>>  >>>
>>>  >>> Regards,
>>>  >>> Siyuan
>>>  >>>
>>>  >>> On Wed, Aug 10, 2016 at 11:27 AM, Vlad Rozov <
>>> v.ro...@datatorrent.com>
>>>  >>> wrote:
>>>  >>>
>>>   The short answer is no, creating worker thread to emit tuples
>>> is
>>> not
>>>   supported by Apex and will lead to an undefined behavior.
>>> Operators in Apex
>>>   have strong thread affinity and all interaction with the
>>> platform
>>> must
>>>   happen on the operator thread.
>>>  
>>>   Vlad
>>>  
>>>  >>>
>>>  >>>
>>>  >>
>>>  >
>>>  >
>>>  > --
>>>  >
>>>  > Regards,
>>>  > Ashwin.
>>>  >
>>>
>>>
>>>
>>>  --
>>>
>>>  Regards,
>>>  Ashwin.
>>>
>>>
>>>
>>>
>>>
>


Re: can operators emit on a different from the operator itself thread?

2016-08-10 Thread Amol Kekre
Sanjay just reminded me of my typo -> I meant between end_window and
start_window :)

Thks
Amol

On Wed, Aug 10, 2016 at 2:36 PM, Sanjay Pujare 
wrote:

> If the goal is to do this validation through static analysis of operator
> code, I guess it is possible but is going to be non-trivial. And there
> could be false positives and false negatives.
>
> Also I suppose this discussion applies to processor operators (those
> having both in and out ports) so Ram’s example of JdbcPollInputOperator may
> not be applicable here?
>
> On 8/10/16, 2:04 PM, "Ashwin Chandra Putta" 
> wrote:
>
> In a separate thread I mean.
>
> Regards,
> Ashwin.
>
> On Wed, Aug 10, 2016 at 2:01 PM, Ashwin Chandra Putta <
> ashwinchand...@gmail.com> wrote:
>
> > + dev@apex.apache.org
> > - us...@apex.apache.org
> >
> > This is one of those best practices that we learn by experience
> during
> > operator development. It will save a lot of time during operator
> > development if we can catch and throw validation error when someone
> emits
> > tuples in a non separate thread.
> >
> > Regards,
> > Ashwin
> >
> > On Wed, Aug 10, 2016 at 1:57 PM, Munagala Ramanath <
> r...@datatorrent.com>
> > wrote:
> >
> >> For cases where use of a different thread is needed, it can write
> tuples
> >> to a queue from where the operator thread pulls them --
> >> JdbcPollInputOperator in Malhar has an example.
> >>
> >> Ram
> >>
> >> On Wed, Aug 10, 2016 at 1:50 PM, hsy...@gmail.com  >
> >> wrote:
> >>
> >>> Hey Vlad,
> >>>
> >>> Thanks for bringing this up. Is there an easy way to detect
> unexpected
> >>> use of emit method without hurt the performance. Or at least if we
> can
> >>> detect this in debug mode.
> >>>
> >>> Regards,
> >>> Siyuan
> >>>
> >>> On Wed, Aug 10, 2016 at 11:27 AM, Vlad Rozov <
> v.ro...@datatorrent.com>
> >>> wrote:
> >>>
>  The short answer is no, creating worker thread to emit tuples is
> not
>  supported by Apex and will lead to an undefined behavior.
> Operators in Apex
>  have strong thread affinity and all interaction with the platform
> must
>  happen on the operator thread.
> 
>  Vlad
> 
> >>>
> >>>
> >>
> >
> >
> > --
> >
> > Regards,
> > Ashwin.
> >
>
>
>
> --
>
> Regards,
> Ashwin.
>
>
>
>
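The safe alternative Ram describes (a worker thread writes tuples to a queue, and the operator thread pulls from it) can be sketched self-contained. The names below are illustrative; this is not Malhar's actual `JdbcPollInputOperator`, just the same handoff pattern.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Sketch of the queue-handoff pattern: the worker thread never emits
// directly; the operator thread drains the queue in emitTuples().
public class QueueHandoffInput {
    private final BlockingQueue<String> queue = new ArrayBlockingQueue<>(1024);
    final List<String> emitted = new ArrayList<>();

    // Runs on a background worker thread (e.g. polling an external source).
    void poll(String tuple) throws InterruptedException {
        queue.put(tuple);  // blocks when full, giving natural back-pressure
    }

    // Runs on the operator thread, once per streaming-window iteration.
    void emitTuples() {
        String t;
        while ((t = queue.poll()) != null) {
            emitted.add(t);  // stands in for outputPort.emit(t)
        }
    }

    public static void main(String[] args) throws InterruptedException {
        final QueueHandoffInput op = new QueueHandoffInput();
        Thread worker = new Thread(() -> {
            try {
                for (int i = 0; i < 5; i++) {
                    op.poll("tuple-" + i);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        worker.start();
        worker.join();      // worker done: all tuples are queued
        op.emitTuples();    // only the operator thread ever emits
        System.out.println(op.emitted.size());  // 5
    }
}
```

Because every emit happens on the operator thread, this pattern preserves the windowing and checkpoint guarantees that direct cross-thread emits break.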


Re: can operators emit on a different from the operator itself thread?

2016-08-10 Thread Amol Kekre
Sent too soon. A quicker way would be to catch an emit happening between
start_window and end_window and flag an error. Catching "another thread"
for every tuple may have a huge performance hit.

Thks
Amol


On Wed, Aug 10, 2016 at 2:31 PM, Amol Kekre <a...@datatorrent.com> wrote:

>
> Currently user can code it that way. IMHO Apex should catch this and flag
> error.
>
> Thks
> Amol
>
>
> On Wed, Aug 10, 2016 at 2:04 PM, Ashwin Chandra Putta <
> ashwinchand...@gmail.com> wrote:
>
>> In a separate thread I mean.
>>
>> Regards,
>> Ashwin.
>>
>> On Wed, Aug 10, 2016 at 2:01 PM, Ashwin Chandra Putta <
>> ashwinchand...@gmail.com> wrote:
>>
>> > + dev@apex.apache.org
>> > - us...@apex.apache.org
>> >
>> > This is one of those best practices that we learn by experience during
>> > operator development. It will save a lot of time during operator
>> > development if we can catch and throw validation error when someone
>> emits
>> > tuples in a non separate thread.
>> >
>> > Regards,
>> > Ashwin
>> >
>> > On Wed, Aug 10, 2016 at 1:57 PM, Munagala Ramanath <r...@datatorrent.com
>> >
>> > wrote:
>> >
>> >> For cases where use of a different thread is needed, it can write
>> tuples
>> >> to a queue from where the operator thread pulls them --
>> >> JdbcPollInputOperator in Malhar has an example.
>> >>
>> >> Ram
>> >>
>> >> On Wed, Aug 10, 2016 at 1:50 PM, hsy...@gmail.com <hsy...@gmail.com>
>> >> wrote:
>> >>
>> >>> Hey Vlad,
>> >>>
>> >>> Thanks for bringing this up. Is there an easy way to detect unexpected
>> >>> use of emit method without hurt the performance. Or at least if we can
>> >>> detect this in debug mode.
>> >>>
>> >>> Regards,
>> >>> Siyuan
>> >>>
>> >>> On Wed, Aug 10, 2016 at 11:27 AM, Vlad Rozov <v.ro...@datatorrent.com
>> >
>> >>> wrote:
>> >>>
>> >>>> The short answer is no, creating worker thread to emit tuples is not
>> >>>> supported by Apex and will lead to an undefined behavior. Operators
>> in Apex
>> >>>> have strong thread affinity and all interaction with the platform
>> must
>> >>>> happen on the operator thread.
>> >>>>
>> >>>> Vlad
>> >>>>
>> >>>
>> >>>
>> >>
>> >
>> >
>> > --
>> >
>> > Regards,
>> > Ashwin.
>> >
>>
>>
>>
>> --
>>
>> Regards,
>> Ashwin.
>>
>
>


Re: can operators emit on a different from the operator itself thread?

2016-08-10 Thread Amol Kekre
Currently a user can code it that way. IMHO Apex should catch this and flag
an error.

Thks
Amol


On Wed, Aug 10, 2016 at 2:04 PM, Ashwin Chandra Putta <
ashwinchand...@gmail.com> wrote:

> In a separate thread I mean.
>
> Regards,
> Ashwin.
>
> On Wed, Aug 10, 2016 at 2:01 PM, Ashwin Chandra Putta <
> ashwinchand...@gmail.com> wrote:
>
> > + dev@apex.apache.org
> > - us...@apex.apache.org
> >
> > This is one of those best practices that we learn by experience during
> > operator development. It will save a lot of time during operator
> > development if we can catch and throw validation error when someone emits
> > tuples in a non separate thread.
> >
> > Regards,
> > Ashwin
> >
> > On Wed, Aug 10, 2016 at 1:57 PM, Munagala Ramanath 
> > wrote:
> >
> >> For cases where use of a different thread is needed, it can write tuples
> >> to a queue from where the operator thread pulls them --
> >> JdbcPollInputOperator in Malhar has an example.
> >>
> >> Ram
> >>
> >> On Wed, Aug 10, 2016 at 1:50 PM, hsy...@gmail.com 
> >> wrote:
> >>
> >>> Hey Vlad,
> >>>
> >>> Thanks for bringing this up. Is there an easy way to detect unexpected
> >>> use of the emit method without hurting performance? Or at least can we
> >>> detect this in debug mode?
> >>>
> >>> Regards,
> >>> Siyuan
> >>>
> >>> On Wed, Aug 10, 2016 at 11:27 AM, Vlad Rozov 
> >>> wrote:
> >>>
>  The short answer is no, creating worker thread to emit tuples is not
>  supported by Apex and will lead to an undefined behavior. Operators
> in Apex
>  have strong thread affinity and all interaction with the platform must
>  happen on the operator thread.
> 
>  Vlad
> 
> >>>
> >>>
> >>
> >
> >
> > --
> >
> > Regards,
> > Ashwin.
> >
>
>
>
> --
>
> Regards,
> Ashwin.
>


Re: [ANNOUNCE] New Apache Apex Committer: Devendra Tagare

2016-08-10 Thread Amol Kekre
Dev,
Welcome aboard

Thks
Amol


On Wed, Aug 10, 2016 at 1:13 PM, Siyuan Hua  wrote:

> Welcome, Devendra!
>
> On Wed, Aug 10, 2016 at 12:28 PM, Thomas Weise  wrote:
>
> > The Project Management Committee (PMC) for Apache Apex has asked Devendra
> > Tagare to become a committer and we are pleased to announce that he has
> > accepted.
> >
> > Devendra has been contributing to Apex for several months now, for
> example
> > the Avro support and JDBC poll. He also did a few Apex meetup
> presentations
> > and developed sample applications.
> >
> > Welcome, Devendra, and congratulations!
> > Thomas, for the Apache Apex PMC.
> >
>


Re: anti-affinity - parameter to control the containers of an operator on a node

2016-08-10 Thread Amol Kekre
Good idea.

Thks
Amol


On Wed, Aug 10, 2016 at 11:27 AM, Venkatesh Kottapalli <
venkat...@datatorrent.com> wrote:

> Hi team,
>
> On the anti-affinity rules while deploying containers, do we have a
> feature which can control the number of containers of the same operator
> that get deployed on the same node. If the environment is capable of
> handling, then this will be a good feature to have as it is possible that
> certain operators could be resource hungry and this would distribute the
> load uniformly on all the nodes.
>
> Please share your thoughts on this.
>
> -Venkatesh.


Re: empty operator/stream/module names

2016-08-05 Thread Amol Kekre
Agreed, we should either do this such that we handle all corner cases, or
simply disallow the empty string. I am now inclined to simply disallow the
empty string, as the ramifications of characters like "@" could be bad.

Thks
Amol


On Fri, Aug 5, 2016 at 9:37 AM, Vlad Rozov <v.ro...@datatorrent.com> wrote:

> Introducing "invalid" characters into user provided names will lead to
> - possible incompatibility with the existing applications as currently "@"
> and other special characters are considered to be valid
> - confusion with end user, why it is not possible to reuse system
> generated name
>
> I am more inclined to disallowing null and empty strings and not providing
> system generated names. As long as DAG is created by a developer, DAG is
> likely to have only handful number of operators and it is not a big deal to
> provide meaningful names. Once we talk about automatic DAG generation by
> higher level API or an execution plan generator, it will be responsibility
> of the corresponding system to generate meaningful names.
>
> Vlad
>
>
> On 8/5/16 09:04, Amol Kekre wrote:
>
>> Pradeep,
>> The clash is if an user explicitly names the operator later with same
>> formula we have (say a typo by user). This is a name scoping issue, and
>> one
>> way to solve it is by Stram using a character/delimiter in auto generated
>> name, that is explicitly disallowed in user specified names.
>>
>> Anyone knows how compilers do name mangling for function signatures? Same
>> issue. I am guessing it is "disallowed delimiter/character" method (for eg
>> "@").
>>
>> Thks
>> Amol
>>
>> On Fri, Aug 5, 2016 at 12:04 AM, Pradeep A. Dalvi <p...@apache.org>
>> wrote:
>>
>> Just curious. If we choose an approach where system generated name for
>>> operator/module =~ operator/module class name + some identifier (index of
>>> operator in DAG), how difficult would that be?
>>>
>>> As it is done elsewhere, we certainly would have to pick user defined
>>> names
>>> first and then work on system generated names.
>>>
>>> Also another possible approach could be of having system-generated
>>> identifiers and (user definable) names. If name is not given by user,
>>> system generated identifier would be used as name.
>>>
>>> --prad
>>>
>>> On Thu, Aug 4, 2016 at 11:50 PM, Tushar Gosavi <tus...@datatorrent.com>
>>> wrote:
>>>
>>>> System-generated names can also be problematic: a user-given name may
>>>> collide with a system-generated one. We cannot generate the name when a
>>>> component is added to the DAG; we would have to wait until all
>>>> components are added and then generate the names. I am -0 on
>>>> system-generated names. Providing a name for an operator/stream/module
>>>> is not much of an effort. +1 on not supporting null/empty
>>>> operator/stream/module names.
>>>>
>>>> -Tushar.
>>>>
>>>>
>>>> On Fri, Aug 5, 2016 at 12:06 PM, Yogi Devendra <yogideven...@apache.org
>>>> >
>>>> wrote:
>>>>
>>>>> 1. I am not clear how the end user will configure properties for
>>>>> operators with system generated names.
>>>>> 2. If we are going for system generated names, we should make sure
>>>>> that names are deterministic and consistent. An operator should get
>>>>> the same system generated name across multiple runs.
>>>>> 3. System generated names should be human readable and reflect the
>>>>> underlying operator. For example, the name should be something like
>>>>> HDFSOutput_019 rather than Operator_019.
>>>>>
>>>>>
>>>>>
>>>>> ~ Yogi
>>>>>
>>>>> On 5 August 2016 at 10:47, Tushar Gosavi <tus...@datatorrent.com>
>>>>>
>>>> wrote:
>>>
>>>>>> When we need to change the plan dynamically through dtcli, we need a
>>>>>> name to delete or attach to an existing operator/port. I am fine with
>>>>>> using a system-generated name when the user does not provide one
>>>>>> while adding an operator/module/stream.
>>>>>>
>>>>>> -Tushar.
>>>>>>
>>>>>>
>>>>>> On Thu, Aug 4, 201
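The delimiter-based scheme discussed in this thread (Amol's "disallowed delimiter" idea, combined with Yogi's requirements for deterministic, human-readable names and Tushar's point that generation must wait until all components are added) can be sketched roughly as follows. This is an illustration only; `NamingDag`, `add_operator`, and `finalize` are hypothetical names, not Apex's actual DAG API.

```python
# Sketch of the naming scheme discussed above: user-supplied names may not
# contain a reserved delimiter ("@"), so system-generated names built with
# that delimiter can never collide with them. Generated names are assigned
# only after all operators are added, so they are deterministic across runs.
# All class and method names here are hypothetical, not Apex's real API.

RESERVED = "@"

class NamingDag:
    def __init__(self):
        self.operators = {}  # name -> operator class name
        self._unnamed = []   # operator classes awaiting generated names

    def add_operator(self, op_class, name=None):
        if name is None:
            self._unnamed.append(op_class)
            return
        if not name or RESERVED in name:
            raise ValueError("name must be non-empty and must not contain "
                             + repr(RESERVED))
        if name in self.operators:
            raise ValueError("duplicate operator name: " + name)
        self.operators[name] = op_class

    def finalize(self):
        # Deterministic: the same add order always yields the same names,
        # and the names stay human readable (class name + index).
        for index, op_class in enumerate(self._unnamed):
            self.operators["%s%s%d" % (op_class, RESERVED, index)] = op_class
        self._unnamed = []

dag = NamingDag()
dag.add_operator("HDFSOutput", name="mySink")
dag.add_operator("HDFSOutput")   # no name given
dag.add_operator("KafkaInput")   # no name given
dag.finalize()
print(sorted(dag.operators))
```

Because "@" is rejected in user names, a user typo can never reproduce a generated name, which addresses the scoping clash raised above.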

Re: [Proposal] Named Checkpoints

2016-08-04 Thread Amol Kekre
hmm! Actually it may be a good debugging tool too: keep the named
checkpoints around. At its core the feature is to retain checkpoints, which
could be done simply by providing an option not to delete them, but naming
them makes it more operational. Send a command from the CLI -> fetch the
checkpoint -> know it is the one you need because the file name contains the
string you sent with the command -> debug. This is different from querying a
state, as it gives you the entire app checkpoint to debug with.

Thks
Amol
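As a rough illustration of the retention idea above, a registry could map a user-supplied tag to a checkpoint and protect it from normal purging. All names here (`CheckpointRegistry`, `purge_committed`, and so on) are hypothetical sketches of the proposal, not an existing Apex API.

```python
# Minimal sketch of a named-checkpoint registry, assuming checkpoints are
# identified by window id. Purely illustrative of the proposal in this
# thread; none of these names exist in Apex.

class CheckpointRegistry:
    def __init__(self):
        self.checkpoints = set()  # window ids with a stored checkpoint
        self.names = {}           # user-supplied tag -> window id

    def checkpointed(self, window_id):
        self.checkpoints.add(window_id)

    def name_checkpoint(self, tag, window_id):
        # e.g. triggered daily at midnight from the CLI; the tag makes the
        # checkpoint easy to locate later for a re-run or for debugging.
        if window_id not in self.checkpoints:
            raise KeyError("no checkpoint at window %d" % window_id)
        self.names[tag] = window_id

    def purge_committed(self, committed_window):
        # Normal purging drops checkpoints older than the committed window,
        # but named checkpoints are retained for later rollback/re-run.
        keep = set(self.names.values())
        self.checkpoints = {w for w in self.checkpoints
                            if w >= committed_window or w in keep}

    def lookup(self, tag):
        return self.names[tag]

reg = CheckpointRegistry()
for w in (10, 20, 30, 40):
    reg.checkpointed(w)
reg.name_checkpoint("audit-2016-08-04", 20)
reg.purge_committed(35)
# Window 20 survives purging because it is named; 10 and 30 are gone.
```

The actual proposal would additionally carry the tag in a control tuple and record the checkpoint file path; this sketch only shows the name-to-checkpoint lookup and retention behavior.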


On Thu, Aug 4, 2016 at 11:41 AM, Venkatesh Kottapalli <
venkat...@datatorrent.com> wrote:

> + 1 for the idea.
>
> It might be helpful to developers as well, when dealing with a variety of
> data in large volumes, if this lets them run from the checkpointed state
> rather than rerunning the application altogether in case of issues.
>
> I have seen cases where the application runs for more than 10 hours and
> some partitions fail because of the variety of data that it is dealing
> with. In such cases, the application has to be restarted and it will be
> helpful to developers with a feature of this kind.
>
>  The ease of enabling/disabling this feature to run the app will also be
> important.
>
> -Venkatesh.
>
>
> > On Aug 4, 2016, at 10:29 AM, Amol Kekre <a...@datatorrent.com> wrote:
> >
> > We had a user who wanted roll-back and restart for audit purposes. At
> > the time we did not have timed windows. Named checkpoints would have
> > helped a little bit.
> >
> > Problem statement: auditors ask for a rerun of yesterday's computations
> > for verification. Assume that these computations depend on previous
> > state (i.e. data from the day before yesterday).
> >
> > Solution:
> > 1. Take a named checkpoint at midnight every day (an input adapter
> > triggers it).
> > 2. The app spools raw logs into HDFS along with window ids and event
> > times.
> > 3. The re-run is a separate app that starts from a named checkpoint
> > (midnight yesterday).
> >
> > Technically the solution will not be as simple, and the "new audit app"
> > will need a lot of other checks (dedup, dropping events not in
> > yesterday's window, waiting for late arrivals, ...), but named
> > checkpoints help.
> >
> > I do agree with Pramod that replay within the same running app is not
> > viable in a data-in-motion architecture. But it helps somewhat in a new
> > audit app. Named checkpoints help data-in-motion architectures handle
> > batch apps better. In case #2 above, spooling with event timestamp plus
> > state suffices; the state part comes from the named checkpoint.
> >
> > Thks,
> > Amol
> >
> >
> >
> >
> > On Thu, Aug 4, 2016 at 10:12 AM, Sanjay Pujare <san...@datatorrent.com>
> > wrote:
> >
> >> I agree. A specific use case would be useful to support this feature.
> >> Also, the ability to replay from a named checkpoint will be limited
> >> because of various factors, won't it?
> >>
> >> On 8/4/16, 9:00 AM, "Pramod Immaneni" <pra...@datatorrent.com> wrote:
> >>
> >>There is a problem here: keeping old checkpoints and recovering from
> >>them means preserving the old input data along with the state. This is
> >>more than the mechanism of actually creating named checkpoints; it means
> >>having the ability for operators to move forward (a.k.a. committing and
> >>dropping committed state and buffer data) while still having the ability
> >>to replay from that point from the input source, and providing a way for
> >>operators (at first look, input operators) to distinguish that. Why
> >>would someone need this with idempotent processing? Is there a specific
> >>use case you are looking at? Suppose we do go do this; for the
> >>mechanism, I would be in favor of reusing an existing tuple.
> >>
> >>On Thu, Aug 4, 2016 at 8:44 AM, Vlad Rozov <v.ro...@datatorrent.com>
> >> wrote:
> >>
> >>> +1 for the feature. At first look I am more in favor of reusing
> >> existing
> >>> control tuple.
> >>>
> >>> Thank you,
> >>>
> >>> Vlad
> >>>
> >>>
> >>> On 8/4/16 08:17, Sandesh Hegde wrote:
> >>>
> >>>> @Chinmay
> >>>> We can enhance the existing checkpoint tuple but that one is more
> >>>> frequently used than this feature, so why burden Checkpoint tuple
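The audit re-run checks Amol lists earlier in this thread (dedup of replayed events, dropping events outside yesterday's window) amount to a filter over the spooled log once a named checkpoint supplies the starting state. A toy sketch, with all field and function names made up for illustration:

```python
# Toy sketch of the "audit re-run" checks described above: starting from
# spooled events (event_time, event_id, payload), keep only yesterday's
# events and deduplicate replayed events by id. Purely illustrative; the
# field names and the function are hypothetical, not part of Apex.

def audit_rerun(events, day_start, day_end):
    seen = set()
    result = []
    for event_time, event_id, payload in sorted(events):
        if not (day_start <= event_time < day_end):
            continue  # drop events outside yesterday's audit window
        if event_id in seen:
            continue  # dedup events replayed from the checkpoint onward
        seen.add(event_id)
        result.append((event_time, event_id, payload))
    return result

spooled = [
    (95, "a", 1),   # day before yesterday: out of window
    (100, "b", 2),
    (100, "b", 2),  # duplicate caused by replay
    (150, "c", 3),
    (210, "d", 4),  # today: out of window
]
print(audit_rerun(spooled, day_start=100, day_end=200))
```

Late arrivals would need an additional watermark-style wait before closing the window; that part is omitted here.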

Re: empty operator/stream/module names

2016-08-04 Thread Amol Kekre
I agree with Sanjay. Errors should be reserved for what the engine cannot do
without. The less work a user is forced to do (a choice rather than a forced
option), the better. So users can name these objects if they want to, but if
they don't, it still works.

Thks
Amol


On Thu, Aug 4, 2016 at 10:03 AM, Sanjay Pujare 
wrote:

> I differ. For the UI to render a DAG, the names are useful; but if the
> name is not required by the engine, i.e. the engine is able to execute
> your application fine with empty or null strings as names, is there any
> reason to make them mandatory?
>
> On the other hand, we can come up with a scheme for system generated names
> when the caller doesn’t provide a name. I have some ideas.
>
>
> On 8/4/16, 9:48 AM, "Munagala Ramanath"  wrote:
>
> I don't see any reason to allow either.
>
> Ram
>
> On Thu, Aug 4, 2016 at 8:51 AM, Vlad Rozov 
> wrote:
>
> > Currently addOperator/addStream/addModule allows both null and empty
> > string in the operator/stream/module names. Is there any reason to
> allow
> > empty string? Should empty string and null be disallowed in those
> APIs?
> >
> > Vlad
> >
>
>
>
>


Re: [Proposal] Support storing apps in a Configuration Package

2016-07-21 Thread Amol Kekre
+1. Good idea

Thks
Amol

On Thu, Jul 21, 2016 at 5:28 PM, Sandesh Hegde 
wrote:

> Configuration packages will contain JSON apps.
> During an app launch users can choose to see and launch only the apps
> present in the Config package.
>
> It is like a hub-and-spoke model: a single AppPackage and multiple custom
> views.
>
> Initial work is here,
> https://github.com/apache/apex-core/pull/360
>
> On Thu, Jul 21, 2016 at 5:02 PM Sasha Parfenov  wrote:
>
> > Sounds promising. Perhaps you can elaborate: does it mean that we're
> > adding JSON apps to the Configuration Packages spec? Or that we're
> > providing support to link Configuration Packages to existing App Package
> > apps? Or something else?
> >
> > Thanks,
> > Sasha
> >
> >
> >
> > On Tue, Jul 19, 2016 at 5:37 PM, Sandesh Hegde 
> > wrote:
> >
> > > Hi All,
> > >
> > > Apex supports configuration packages, which separate the application
> > > package from the actual configuration.
> > > (http://docs.datatorrent.com/configuration_packages/)
> > >
> > > We want to enhance the configuration package by adding support to "add
> > > Apps" (json format).
> > >
> > > Use case: multiple users share the same app package, but each has a
> > > different view of the golden copy of the app package.
> > >
> > > Note: This feature is requested by an Apex user.
> > >
> > > Thanks
> > >
> >
>


Re: Bleeding edge branch ?

2016-07-20 Thread Amol Kekre
Sandesh,
Not worrying about EOL is a big deal. It creates problems for current
users, and also sends a message to new users (pre-adoption) on how we will
take care of them. Two branches, etc. need to be thought through by all of
us in terms of our ability to support. IMHO, we are rushing on this topic.

Thks,
Amol


On Wed, Jul 20, 2016 at 8:30 AM, Sandesh Hegde 
wrote:

> Our current model of supporting the oldest supported Hadoop penalizes
> users of the latest Hadoop versions by favoring the slow movers.
> Also, we won't benefit from the increased maturity of the Hadoop platform,
> as we will be working against a many-years-old version of Hadoop.
> We also need to incentivize our customers to upgrade their Hadoop version
> by making use of new features.
>
> My vote is to start the work on Hadoop 2.6 (or any other version) in a
> different branch, without waiting for the EOL policies.
>
> On Tue, Jul 12, 2016 at 1:16 AM Thomas Weise 
> wrote:
>
> > -0
> >
> > I read the thread twice; it is not clear to me what benefit Apex users
> > derive from this exercise. A branch normally contains development work
> > that is eventually brought back to the main line and into a release.
> > Here, the suggestion seems to be an open-ended effort to play with the
> > latest tech; isn't that something anyone (including a group of folks)
> > can do in a fork? I don't see value in a permanent branch for that: who
> > is going to maintain such code, and who will ever use it?
> >
> > There was a point that we can find out about potential problems with
> > later versions. The way to find such issues is to take the releases and
> > run them on those later versions (that's what users do), not to change
> > the code!
> >
> > Regarding the Java version: our users don't use Apex in a vacuum. Please
> > have a look at ASF Hadoop and the distros' EOL policies. That will
> > answer the question of what Java version is appropriate. I would be
> > surprised if something that works on Java 7 falls flat on its face with
> > Java 8, as a lot of diligence goes into backward compatibility. Again,
> > the way to test this is to run verification with existing Apex releases
> > on a Java 8 based stack.
> >
> > Regarding the Hadoop version: this has been discussed off the record
> > several times, and there are actual JIRA tickets marked accordingly so
> > that the work is done when we move. It is a separate discussion; no need
> > to mix Java versions and branching with it. I agree with what David
> > said: if someone can show that we can move up to 2.6 based on EOL
> > policies and what known Apex users have in production, then we should
> > work on that upgrade. The way I imagine it would work is that we have a
> > Hadoop-2.6 (or whatever version) branch, make all the upgrade related
> > changes there (which should be a list of JIRAs) and then merge it back
> > to master when we are satisfied. After that, the branch can be deleted.
> >
> > Thomas
> >
> >
> >
> > On Tue, Jul 12, 2016 at 8:36 AM, Chinmay Kolhatkar <
> > chin...@datatorrent.com>
> > wrote:
> >
> > > I'm -0 on this idea.
> > >
> > > Here is the reason:
> > > Unless we see a real case where users want to see everything on
> > > latest, this branch might quickly become neglected and eventually
> > > obsolete, because it is anyway a "no guarantee" branch.
> > >
> > > We have a bunch of dependencies that we'll have to take care of to
> > > really make it bleeding edge. Especially for Malhar, it's a long list.
> > > That looks like quite significant work.
> > > Moreover, if this branch is going to be in a "may or may not work"
> > > state, I, as a user or developer, would bank on what certainly works.
> > >
> > > I also think that if it's going to be "no guarantee", then it's worth
> > > spending contribution time on master rather than on a bleeding-edge
> > > branch.
> > >
> > > If a question of "should we upgrade?" comes, the community is mature to
> > > take that call then and work accordingly.
> > >
> > > -Chinmay.
> > >
> > >
> > >
> > > On Tue, Jul 12, 2016 at 11:42 AM, Priyanka Gugale 
> > > wrote:
> > >
> > > > +1 for creating such a branch.
> > > > One of us will have to rebase it onto the master branch at
> > > > intervals; I don't think everyone will cherry-pick their commits
> > > > here. We can make it a once-a-month activity. Are we considering
> > > > updating all dependency library versions as well?
> > > >
> > > > -Priyanka
> > > >
> > > > On Tue, Jul 12, 2016 at 2:34 AM, Munagala Ramanath <
> > r...@datatorrent.com>
> > > > wrote:
> > > >
> > > > > Following up on some comments, wanted to clarify what I have in
> mind
> > > for
> > > > > this branch:
> > > > >
> > > > > 1. The main goal is to stay up-to-date with new releases, so if a
> > > > question
> > > > > of the form
> > > > > "A new release of X is available, should we upgrade ?" comes
> up,
> > > the
> > > > > 

Re: ApacheCon Europe Call For Papers Open

2016-07-13 Thread Amol Kekre
Atri,
Can you provide the link to your talk and its agenda? I can help with some
content once we know the entire context.

Thks
Amol


On Tue, Jul 12, 2016 at 11:28 PM, Atri Sharma  wrote:

> FWIW, I am doing a talk on a streaming planner with Apache Calcite and
> Apache Apex. Anybody with inputs on that, please suggest.
>
> On Tue, Jul 12, 2016 at 11:55 PM, Rich Bowen  wrote:
>
> > As you are no doubt already aware, we will be holding ApacheCon in
> > Seville, Spain, the week of November 14th, 2016. The call for papers
> > (CFP) for this event is now open, and will remain open until
> > September 9th.
> >
> > The event is divided into two parts, each with its own CFP. The first
> > part of the event, called Apache Big Data, focuses on Big Data
> > projects and related technologies.
> >
> > Website: http://events.linuxfoundation.org/events/apache-big-data-europe
> > CFP:
> >
> http://events.linuxfoundation.org/events/apache-big-data-europe/program/cfp
> >
> > The second part, called ApacheCon Europe, focuses on the Apache
> > Software Foundation as a whole, covering all projects, community
> > issues, governance, and so on.
> >
> > Website: http://events.linuxfoundation.org/events/apachecon-europe
> > CFP:
> http://events.linuxfoundation.org/events/apachecon-europe/program/cfp
> >
> > ApacheCon is the official conference of the Apache Software
> > Foundation, and is the best place to meet members of your project and
> > other ASF projects, and strengthen your project's community.
> >
> > If your organization is interested in sponsoring ApacheCon, contact me
> > at e...@apache.org  ApacheCon is a great place to find the brightest
> > developers in the world, and experts on a huge range of technologies.
> >
> > I hope to see you in Seville!
> >
> >
>
>
> --
> Regards,
>
> Atri
> *l'apprenant*
>


Re: A proposal for Malhar

2016-07-12 Thread Amol Kekre
My vote is to do 2&3

Thks
Amol


On Tue, Jul 12, 2016 at 12:14 PM, Kottapalli, Venkatesh <
vkottapa...@directv.com> wrote:

> +1 for deprecating the packages listed below.
>
> -Original Message-
> From: hsy...@gmail.com [mailto:hsy...@gmail.com]
> Sent: Tuesday, July 12, 2016 12:01 PM
>
> +1
>
> On Tue, Jul 12, 2016 at 11:53 AM, David Yan <da...@datatorrent.com> wrote:
>
> > Hi all,
> >
> > I would like to renew the discussion of retiring operators in Malhar.
> >
> > As stated before, the reason we would like to retire operators in
> > Malhar is that some of them were written a long time ago, before Apache
> > incubation; they do not pertain to real use cases, are not up to par in
> > code quality, have no potential for improvement, and are probably
> > completely unused.
> >
> > We do not want contributors to use them as a model for their
> > contributions, or users to use them thinking they are of quality and
> > then hit a wall.
> > Neither scenario is beneficial to the reputation of Apex.
> >
> > The initial 3 packages that we would like to target are *lib/algo*,
> > *lib/math*, and *lib/streamquery*.
> >
> > I'm adding this thread to the users list. Please speak up if you are
> > using any operator in these 3 packages. We would like to hear from you.
> >
> > These are the options I can think of for retiring those operators:
> >
> > 1) Completely remove them from the malhar repository.
> > 2) Move them from malhar-library into a separate artifact called
> > malhar-misc
> > 3) Mark them deprecated and add to their javadoc that they are no
> > longer supported
> >
> > Note that 2 and 3 are not mutually exclusive. Any thoughts?
> >
> > David
> >
> > On Tue, Jun 7, 2016 at 2:27 PM, Pramod Immaneni
> > <pra...@datatorrent.com>
> > wrote:
> >
> >> I wanted to close the loop on this discussion. In general everyone
> >> seemed favorable to this idea, with no serious objections. Folks had
> >> good suggestions, like documenting the capabilities of operators,
> >> coming up with well-defined criteria for graduating operators, and
> >> deciding what to do with existing operators that are not yet mature or
> >> are unused.
> >>
> >> I am going to summarize the key points that resulted from the
> >> discussion and would like to proceed with them.
> >>
> >>- Operators that do not yet provide the key platform capabilities that
> >>make an operator useful across different applications (such as
> >>reusability, static or dynamic partitioning, idempotency, exactly-once)
> >>will still be accepted as long as they are functionally correct and
> >>have unit tests, and will go into a separate module.
> >>- A contrib module was suggested as the place for new contributions
> >>that don't yet have all the platform capabilities and are not yet
> >>mature. If there are no other suggestions, we will go with this one.
> >>- It was suggested that an operator's documentation list the platform
> >>capabilities it currently provides from the list above. I will
> >>document a structure for this in the contribution guidelines.
> >>- Folks wanted to know what the criteria would be to graduate an
> >>operator to the big leagues :). I will kick off a separate thread for
> >>it, as I think it requires its own discussion, and hopefully we can
> >>come up with a set of guidelines for it.
> >>- David brought up the state of some of the existing operators and
> >>their retirement, and the layout of operators in Malhar in general and
> >>how it causes problems with development. I will ask him to lead the
> >>discussion on that.
> >>
> >> Thanks
> >>
> >> On Fri, May 27, 2016 at 7:47 PM, David Yan <da...@datatorrent.com>
> wrote:
> >>
> >> > The two ideas are not conflicting, but rather complementary.
> >> >
> >> > On the contrary, putting a new process in place for people trying to
> >> > contribute while NOT addressing the old, unused, subpar operators in
> >> > the repository is what is conflicting.
> >> >
> >> > Keep in mind that when people try to contribute, they always look
> >> > at the existing operators already in the r

[jira] [Created] (APEXMALHAR-2137) Create an Integrated With page

2016-07-11 Thread Amol Kekre (JIRA)
Amol Kekre created APEXMALHAR-2137:
--

 Summary: Create an Integrated With page
 Key: APEXMALHAR-2137
 URL: https://issues.apache.org/jira/browse/APEXMALHAR-2137
 Project: Apache Apex Malhar
  Issue Type: Task
Reporter: Amol Kekre



A page that lists the logos of all the technologies we integrate with. This
way users can visually see all the technologies on a single page.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

