Re: project dependencies - important

2018-05-04 Thread Munagala Ramanath
 Similarly, some links at http://apex.apache.org/docs.html are now broken (for
example http://docs.datatorrent.com/tutorials/topnwords/ and
http://docs.datatorrent.com/beginner/).
It would be useful if such content could be rehosted elsewhere.
Ram
On Thursday, May 3, 2018, 9:46:04 AM PDT, Thomas Weise  
wrote:  
 
 Pramod,

There are several Apex-related blogs on the DT web site that are referenced
from http://apex.apache.org/docs.html (there are probably a few more, but
those should be easy to identify).

Would it be possible to export that subset (for example by saving as full
page HTML) and check them into another GitHub repository under the ASL
license? This way they would remain available to users and we could continue
to reference them.

Thanks


On Thu, May 3, 2018 at 6:27 AM, Pramod Immaneni 
wrote:

> There may be copyright issues to deal with in case of a move.
>
> Thanks
>
> On Wed, May 2, 2018 at 10:22 PM, Vlad Rozov  wrote:
>
> > For netlet we can either move it to apex-core (it will require package
> > name change and major version update) or keep it in public github
> > repository. Other libraries are available on maven central and I see that
> > you already open a PR (+1).
> >
> > Thank you,
> >
> > Vlad
> >
> >
> > On 5/2/18 19:38, Thomas Weise wrote:
> >
> >> That one is not needed and I think we can nuke the zmq operator also.
> >>
> >> R has an example, so we might want to find a way to keep that alive.
> >>
> >>
> >> On Wed, May 2, 2018 at 7:33 PM, Pramod Immaneni
> >> wrote:
> >>
> >> I see a Kafka 0.7.1 dependency. Not sure if it is still being used.
> >>>
> >>> Thanks
> >>>
> >>> On Wed, May 2, 2018 at 7:05 PM, Thomas Weise  wrote:
> >>>
> >>> Hi Pramod,
> 
>  Thanks for the heads-up.
> 
>  Do you have the list of malhar dependencies that live in the DT
> server?
> 
>  I remember R and zmq, anything else?
> 
>  Thanks,
>  Thomas
> 
> 
>  On Wed, May 2, 2018 at 6:53 PM, Pramod Immaneni <pra...@datatorrent.com>
>  wrote:
> 
>  Hello Community,
> >
> > As you may know, DataTorrent was the original developer and initial
> > contributor of Apex. There are a couple of dependencies the project
> > still has on DataTorrent.
> >
> > One of them is the netlet project, hosted on DataTorrent's GitHub, which
> > provides a networking library for data transfer. Second, malhar depends
> > on third-party maven artifacts hosted on the DataTorrent maven server.
> > Going forward, DataTorrent will no longer be hosting these resources.
> >
> > The netlet repository has been transferred to
> > https://github.com/dtpublic/netlet which is a general account unrelated
> > to the DataTorrent org. This project can also be brought into apex if we
> > think that would be best for the library and the apex project.
> >
> > There are a small number of custom maven artifacts under
> > https://www.datatorrent.com/maven/content/repositories/thirdparty/.
> > A new home needs to be found for these, or malhar changed to not depend
> > on these. The maven server will be shut down tomorrow. One idea for the
> > immediate need is to temporarily store these under apex-site.
> >
> > Thanks
> >
> >
> >
>
  

Re: [Proposal] Simulate setting for application launch

2017-12-19 Thread Munagala Ramanath
 +1
Ram
On Tuesday, December 19, 2017, 8:33:21 AM PST, Pramod Immaneni 
 wrote:  
 
 I have a mini proposal. The command get-app-package-info runs the
populateDAG method of an application to construct the DAG but does not
actually launch the DAG. An application developer does not know in which
context populateDAG is being called. For example, if they are recording
application starts in an external system from populateDAG, they will have
false entries there. This could be solved in different ways, such as
introducing another method in StreamingApplication or more parameters
to populateDAG, but a non-disruptive option would be to add a property in
the configuration object that is passed to populateDAG to indicate whether
it is a simulate/test mode or a real launch. An application developer can
use this property to take the appropriate action.
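
For illustration, a minimal sketch of how an application might consume such
a property; the property key "apex.application.simulationMode" is
hypothetical, since no name has been agreed on yet:

import com.datatorrent.api.DAG;
import com.datatorrent.api.StreamingApplication;
import org.apache.hadoop.conf.Configuration;

public class MyApplication implements StreamingApplication
{
  @Override
  public void populateDAG(DAG dag, Configuration conf)
  {
    // hypothetical key; the real one would be defined by the platform
    boolean simulation = conf.getBoolean("apex.application.simulationMode", false);
    // ... construct the DAG as usual ...
    if (!simulation) {
      recordApplicationStart();  // notify the external system only on a real launch
    }
  }

  private void recordApplicationStart()
  {
    // call out to the external tracking system
  }
}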

Thanks
  

Re: checking dependencies for known vulnerabilities

2017-11-01 Thread Munagala Ramanath
 Failing the CI build seems too drastic. My suggestion is to create a new
profile that activates this functionality and encourage people to run with
this profile and file JIRAs when something exciting happens.
Ram
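
For illustration (the profile name below is hypothetical, since none exists
yet), developers could then opt in on demand with something like:

mvn verify -Pvulnerability-check

and file a JIRA for anything it flags, without the check gating every PR build.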
On Wednesday, November 1, 2017, 1:26:00 PM PDT, Thomas Weise 
 wrote:  
 
 Considering typical behavior, unless the CI build fails, very few will be
interested in fixing the issues.

Perhaps if after a CI failure the issue can be identified as pre-existing,
we can whitelist and create a JIRA that must be addressed prior to the next
release?


On Wed, Nov 1, 2017 at 7:51 PM, Pramod Immaneni 
wrote:

> I would like to hear what others think. At this point I am -1 on merging
> the change as is, which would fail all PR builds when a matching CVE is
> discovered, regardless of whether the PR was the cause of the CVE.
>
> On Wed, Nov 1, 2017 at 12:07 PM, Vlad Rozov  wrote:
>
> > On 11/1/17 11:39, Pramod Immaneni wrote:
> >
> >> On Wed, Nov 1, 2017 at 11:36 AM, Vlad Rozov  wrote:
> >>
> >>> There is no independent build and the check is still necessary to
> >>> prevent new dependencies with CVEs being introduced.
> >>>
> >> There isn't one today but one could be added. What kind of effort is
> >> needed?
> >
> > After it is added, we can discuss whether it will make sense to move the
> > check to the newly created build. Even if one is added, the check needs
> > to be present in the CI builds that verify PRs, so it is in the right
> > place already, IMO.
> >
> >>
> >>
> >>> Look at the Malhar 3.8.0 thread. There are libraries from Category X
> >>> introduced as a dependency, so now instead of dealing with the issue
> >>> when such dependencies were introduced, somebody else needs to deal
> >>> with removing/fixing those dependencies.
> >>>
> >>> Those were directly introduced in PRs. I am not against adding
> additional
> >> checks that verify the PR better.
> >>
> > Right, and it would be much better to catch the problem at the time it
> > was introduced, but the Category X list (as well as the list of known
> > CVEs) is not static.
> >
> >
> >>
> >> Thank you,
> >>>
> >>> Vlad
> >>>
> >>>
> >>> On 11/1/17 11:21, Pramod Immaneni wrote:
> >>>
> >>> My original concern still remains. I think what you have is valuable but
> >>> would prefer that it be activated in an independent build that notifies
> >>> the interested parties.
> 
>  On Wed, Nov 1, 2017 at 11:13 AM, Vlad Rozov 
> wrote:
> 
>  Any other concerns regarding merging the PR? Looking at the active PRs
>  on apex core, the entire conversation looks to be moot.
> >
> > Thank you,
> >
> > Vlad
> >
> >
> > On 10/30/17 18:50, Vlad Rozov wrote:
> >
> > On 10/30/17 17:30, Pramod Immaneni wrote:
> >
> >> On Sat, Oct 28, 2017 at 7:47 AM, Vlad Rozov 
> >> wrote:
> >>
> >>> Don't we use unit tests to make sure that a PR does not break existing
> >>> functionality? For that we use a CI environment that we do not control
> >>> and do not introduce any changes to, but for example the Apache
> >>> infrastructure team may decide to upgrade Jenkins and that may impact
> >>> Apex builds. The same applies to the CVE check. It is there to prevent
> >>> dependencies with severe vulnerabilities.
> 
> >>> Infrastructure changes are quite different, IMO, from this proposal.
> >>> While they are not in our control, in the majority of cases the changes
> >>> maintain compatibility so everything on top will continue to run the
> >>> same. In this case a CVE will always fail all PRs, even though the code
> >>> changes have nothing to do with introducing the CVE. I did make the
> >>> exception that if a PR is bringing in the CVE, yes, do fail it.
> >>>
> >> There were just two recent changes, one on the Travis CI side and
> >> another on the Jenkins side, that caused builds for all PRs to fail,
> >> and none of them was caused by code changes in any of the open PRs, so
> >> I don't see how it is different.
> >>
> >> A code change may or may not have a relation to the CVE introduced. For
> >> newly introduced dependencies, there may be known CVEs. In any case I
> >> don't think it is important to differentiate how a CVE is introduced;
> >> it is important to eliminate dependencies with known CVEs.
> >>
> >>
> >> There is no "stick" in a failed build or keeping PR open until
> >>> dependency
> >>>
> >>> issue is resolved or unit test failure is fixed. Unless an employer
>  punishes its employee for not delivering PR based on that vendor
>  priority,
>  there is no 

Re: [VOTE] Major version change for Apex Library (Malhar)

2017-08-22 Thread Munagala Ramanath
+1 for option 2 (primary)
+1 for option 1 (secondary)
Ram


On Tuesday, August 22, 2017, 6:58:46 AM PDT, Vlad Rozov  
wrote:

+1 for option 2 (primary)
+1 for option 1 (secondary)

Thank you,

Vlad

On 8/21/17 23:37, Ananth G wrote:
> +1 for option 2 and second vote for option 1
>
> Have we finalized the library name? Going from Apex-malhar 3.7 to
> Apex-malhar 1.0 would be counterintuitive. Also, it would be great if we had
> an agreed process to mark an operator from @evolving to a stable version,
> given that we are trying to address this as well as part of the proposal.
>
> Regards
> Ananth
>
>> On 22 Aug 2017, at 11:40 am, Thomas Weise  wrote:
>>
>> +1 for option 2 (second vote +1 for option 1)
>>
>>
>>> On Mon, Aug 21, 2017 at 6:39 PM, Thomas Weise  wrote:
>>>
>>> This is to formalize the major version change for Malhar discussed in [1].
>>>
>>> There are two options for major version change. Major version change will
>>> rename legacy packages to org.apache.apex sub packages while retaining file
>>> history in git. Other cleanup such as removing deprecated code is also
>>> expected.
>>>
>>> 1. Version 4.0 as major version change from 3.x
>>>
>>> 2. Version 1.0 with simultaneous change of Maven artifact IDs
>>>
>>> Please refer to the discussion thread [1] for reasoning behind both of the
>>> options.
>>>
>>> Please vote on both options. Primary vote for your preferred option,
>>> secondary for the other. Secondary vote can be used when counting primary
>>> vote alone isn't conclusive.
>>>
>>> Vote will be open for at least 72 hours.
>>>
>>> Thanks,
>>> Thomas
>>>
>>> [1] https://lists.apache.org/thread.html/bd1db8a2d01e23b0c0ab98a785f6ee
>>> 9492a1ac9e52d422568a46e5f3@%3Cdev.apex.apache.org%3E
>>>


Thank you,

Vlad


Re: Backward compatibility issue in 3.6.0 release

2017-05-15 Thread Munagala Ramanath
I like proposal 1 too; I also agree with Ajay about doing a 3.6.1 patch release.
Ram 

On Monday, May 15, 2017 10:18 AM, AJAY GUPTA  wrote:
 

 I would vote for 1 and making the variables private, since it breaks
semantic versioning anyway.
I think it would be a good idea to release a 3.6.1 patch release as
well.


Ajay

On Mon, May 15, 2017 at 10:36 PM, Sanjay Pujare 
wrote:

> I vote for renaming to less common names like __count. The renaming breaks
> compatibility from 3.6.0 to 3.7.0 but seems to be the best option.
>
> On Mon, May 15, 2017 at 9:53 AM, Vlad Rozov 
> wrote:
>
> > Hi All,
> >
> > There is a possible change in operator behavior caused by changes that
> > were introduced in release 3.6.0 into DefaultInputPort and
> > DefaultOutputPort. Please see
> > https://issues.apache.org/jira/browse/APEXCORE-722. We need to agree on
> > how to proceed.
> >
> > 1. Break semantic versioning for the default input and output ports in
> > the next release (3.7.0): declare the protected variables as private and
> > provide protected access methods. Another option is to rename the
> > protected variables to use less common names (for example __count).
> > 2. Keep the protected variables, with the risk that the following common
> > operator design pattern will be used accidentally by existing and newly
> > designed operators:
> >
> > public class MyOperator extends BaseOperator {
> >   private int count;
> >   public DefaultInputPort<Object> in = new DefaultInputPort<Object>() {
> >     @Override
> >     public void process(Object tuple)
> >     {
> >       count++;  // updates the DefaultInputPort count, not MyOperator's count!
> >     }
> >   };
> > }
> >
> >
> > Thank you,
> >
> > Vlad
> >
>
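
For illustration, a minimal sketch of what option 1 from the message above
might look like inside the port class (field and accessor names are assumed,
not taken from the actual 3.6.0 source):

public abstract class DefaultInputPort<T>
{
  private int count;            // private instead of protected

  protected int getCount()      // protected accessor for subclasses that need it
  {
    return count;
  }

  // ...
}

With the field private, the shadowing in the operator example above becomes
impossible and the operator's own count is updated as intended.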


   

Re: Image processing library

2017-05-12 Thread Munagala Ramanath
The injunction that tuple processing should be "as fast as possible" is based
on an assumption and a fact:
1. In most cases, users want to maximize application throughput.
2. If a callback (like beginWindow(), process(), endWindow(), etc.) takes too
long, the platform deems the operator hung and restarts it.
Neither imposes a hard constraint: if, for a particular class of applications,
it is OK to sacrifice throughput to allow some CPU-intensive computations to
occur, that is certainly possible; the constraint of (2) can be relaxed by
simply increasing the TIMEOUT_WINDOW_COUNT attribute for some or all
operators.
Secondly, nothing prevents an operator from starting worker threads that
asynchronously perform CPU-intensive computations. Naturally, careful
synchronization will be necessary between the main and worker threads to
ensure correctness and timely delivery of results.
Ram
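
For example, a sketch of relaxing the timeout for a single operator from
populateDAG; ImageProcessor is a hypothetical operator class and the value
of 1200 windows is arbitrary:

import com.datatorrent.api.Context.OperatorContext;
import com.datatorrent.api.DAG;

// inside populateDAG(DAG dag, Configuration conf):
ImageProcessor processor = dag.addOperator("processor", new ImageProcessor());
// allow up to 1200 streaming windows before the platform deems the operator hung
dag.setAttribute(processor, OperatorContext.TIMEOUT_WINDOW_COUNT, 1200);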

On Friday, May 12, 2017 6:38 PM, Ananth G  wrote:
 

  I guess the use cases as documented look really compelling. There might be
more comments from a code review perspective; below is more from a use case
perspective only.

I was wondering if you have any latency measurements for the tests you ran.

If the image processing calls (in the process function overridden from the
Toolkit class) are time consuming, it might not be an ideal use case for a
streaming engine. A very old blog post (2012) talks about latencies anywhere
between tens of milliseconds and almost a second depending on the use case
and image size. Of course there have been hardware improvements since, and
those numbers might no longer hold, hence the question (of course the
latencies also depend on the hardware being used).

This brings me to the next question, in general, about Apex to the community:
what is considered an acceptable tolerance level in terms of latencies for a
streaming compute engine like Apex? Is there a way to tune the acceptable
tolerance level depending on the use case? I keep reading on the mailing
lists that tuple processing is part of the main thread and hence should be
as fast as possible.

Regards
Ananth

> On 12 May 2017, at 9:05 pm, Aditya gholba  wrote:
> 
> Hello,
> I have been working on an image processing library for Malhar and a few of
> the operators are ready. I would like to merge them into Malhar contrib. You
> can read about the operators and the applications I have created so far
> here.
> 
> 
> Link to my GitHub 
> 
> All suggestions and opinions are welcome.
> 
> 
> Thanks,
> Aditya.

   

Javadocs for 3.6 apex-core

2017-05-08 Thread Munagala Ramanath
Now available at:
https://ci.apache.org/projects/apex-core/apex-core-javadoc-release-3.6/index.html
Ram


Re: PR merge policy

2017-04-29 Thread Munagala Ramanath
Amol Kekre wrote:
> >>>
> >>>> Strongly agree with Ilya. Let's take these events as learning
> >>>> opportunities for folks to learn and improve. There can always be a
> >>>> second commit to fix things in case there is a code issue. If it is a
> >>>> policy issue, we learn and improve. Rolling back should be used rarely,
> >>>> and it does need to be a disaster. We need to be cognizant of new
> >>>> contributors worrying about the cost to submit code.
> >>>>
> >>>> I too do not think Apex is hurting from bad code getting in. We are
> >>>> doing
> >>>> great with our current policies.
> >>>>
> >>>> Thks,
> >>>> Amol
> >>>>
> >>>>
> >>>> E:a...@datatorrent.com | M: 510-449-2606 | Twitter: @*amolhkekre*
> >>>>
> >>>> www.datatorrent.com
> >>>>
> >>>>
> >>>> On Fri, Apr 28, 2017 at 1:35 PM, Ganelin, Ilya <
> >>>> ilya.gane...@capitalone.com>
> >>>> wrote:
> >>>>
> >>>> Guess we can all go home then. Our work here is done:
> >>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>> W.r.t. the discussion below, I think rolling back an improperly
> >>>>> reviewed PR could be considered disrespectful to the committer who
> >>>>> merged it in the first place. I think that such situations, unless
> >>>>> they trigger a disaster, should be handled by communicating the error
> >>>>> to the responsible party and then allowing them to resolve it. E.g. I
> >>>>> improperly commit an unreviewed PR, someone notices and sends me an
> >>>>> email informing me of my error, and I then have the responsibility of
> >>>>> unrolling the change and getting the appropriate review. I think we
> >>>>> should start with the premise that we’re here in the spirit of
> >>>>> collaboration and we should create opportunities for individuals to
> >>>>> learn from their mistakes, recognize the importance of particular
> >>>>> standards (e.g. a good review process leads to stable projects), and
> >>>>> ultimately internalize these ethics.
> >>>>>
> >>>>> Internally to our team, we’ve had great success with a policy
> >>>>> requiring two PR approvals and not allowing the creator of a patch to
> >>>>> be the one to merge their PR. While this might feel a little silly, it
> >>>>> definitely helps to build collaboration and familiarity with the code
> >>>>> base, and intrinsically avoids PRs being merged too quickly (without a
> >>>>> sufficient period for review).
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>> - Ilya Ganelin
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>> *From: *Pramod Immaneni <pra...@datatorrent.com>
> >>>>> *Reply-To: *"dev@apex.apache.org" <dev@apex.apache.org>
> >>>>> *Date: *Friday, April 28, 2017 at 10:09 AM
> >>>>> *To: *"dev@apex.apache.org" <dev@apex.apache.org>
> >>>>> *Subject: *Re: PR merge policy
> >>>>>
> >>>>>
> >>>>>
> >>>>> On a lighter note, looks like the powers that be have been listening
> >>>>> on this conversation and decided to force push an empty repo, or maybe
> >>>>> github just decided that this is the best proposal ;)
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>> On Thu, Apr 27, 2017 at 10:47 PM, Vlad Rozov <v.ro...@datatorrent.com>
> >>>>> wrote:

Re: PR merge policy

2017-04-29 Thread Munagala Ramanath
Given the divergence of views on this topic, I think we need to clarify what
"last resort" means; to me it means:

In most cases when PRs are prematurely merged, the preferred mechanism to
address it will be to open a new PR with corrections, resulting in a new
commit on top. A forced reversion of the prior commit will occur only when
it is judged to be disastrously flawed.

Ram
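
For illustration, the two mechanisms in git terms (the commit hash is a
placeholder):

# preferred: a corrective commit on top, history preserved
git revert abc1234
git push origin master

# last resort: remove the bad commit (here assumed to be the branch tip)
git reset --hard abc1234^
git push --force origin master   # rewrites published history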



On Sat, Apr 29, 2017 at 8:33 AM, Vlad Rozov <v.ro...@datatorrent.com> wrote:

> Yes, the rollback is the last resort as Thomas mentioned. I hope that we
> will not need it.
>
> Possibly I misjudge, but I don't see forced push as an extremely
> disturbing event for the community especially if it is done quickly.
>
> Thank you,
>
> Vlad
>
> On 4/28/17 23:33, Bhupesh Chawda wrote:
>
>> Vlad,
>>
>> Your point regarding not merging any PR without allowing the community to
>> review is well taken.
>> I agree that the committer should be responsible for rolling back the
>> change, and I think we should establish the way in which it should be done.
>>
>> There can be a delay for the committer to notice that a commit should not
>> have been merged and needs to be reverted, and by then a force push might
>> create problems for the community. Perhaps a new PR could be the way to go.
>>
>> Note that I agree that undoing a PR may not even be needed given that the
>> community has had a chance to review it, but there might still be cases
>> where such undo commits need to be done.
>>
>> ~ Bhupesh
>>
>>
>> ___
>>
>> Bhupesh Chawda
>>
>> E: bhup...@datatorrent.com | Twitter: @bhupeshsc
>>
>> www.datatorrent.com  |  apex.apache.org
>>
>>
>>
>> On Sat, Apr 29, 2017 at 11:07 AM, Vlad Rozov <v.ro...@datatorrent.com>
>> wrote:
>>
>> If a committer is fast to pull the trigger to merge and slow to rollback
>>> when the policy is violated, the case can be sent to PMC. It will be a
>>> clear indication that a committer does not respect the community, so I
>>> disagree that this is "just kicking the can down the road" as committer
>>> right may be eventually revoked (hope we don't need to go that far ever).
>>>
>>> Not all policies are immediately reflected at
>>> http://apex.apache.org/contributing.html. The vote clearly happened a
>>> week ago when I initially proposed that a PR can be merged only once the
>>> community has a chance to review it. http://apex.apache.org/contributing.html
>>> is for new contributors and new committers; existing
>>> committers should be fully aware of the PR merge policy and if in doubt,
>>> raise a question here (dev@apex).
>>>
>>> We are all humans and may overlook problems in PRs that we review. The
>>> concern is not that Travis build may fail and a commit needs to be
>>> reverted. It will not be a valid reason to undo the commit (note that it
>>> is
>>> still committer responsibility to thoroughly review changes/new code and
>>> ensure that license is in place, build is free from compilation and other
>>> errors, code is optimal and readable, and basically put his/her name next
>>> to the contributor name). The concern is when a committer does not give
>>> community a chance for review. It is against "community before code"
>>> behavior that we want to prohibit. I am strongly for requesting a
>>> contributor to undo a commit in such cases and disallowing additional
>>> "fix"
>>> commits.
>>>
>>> Thank you,
>>>
>>> Vlad
>>>
>>> On 4/28/17 20:17, Munagala Ramanath wrote:
>>>
>>>> That's just kicking the can down the road: what if the committer is not
>>>> inclined to perform the rollback? Other reviewers can provide feedback on the
>>>> closed PR
>>>> but the committer may choose to add a new commit on top.
>>>>
>>>> Firstly, the point about waiting a day or two is not even a formal
>>>> policy
>>>> requirement
>>>> outlined at http://apex.apache.org/contributing.html in the "Merging a
>>>> Pull
>>>> Request"
>>>> section, so it only has an advisory status currently in my view.
>>>>
>>>> Secondly, I think a variety of so called "violations"

Re: PR merge policy

2017-04-28 Thread Munagala Ramanath
too quickly (without a sufficient period for
> >>> review).
> >>>
> >>>
> >>>
> >>>
> >>>
> >>> - Ilya Ganelin
> >>>
> >>>
> >>>
> >>>
> >>> *From: *Pramod Immaneni <pra...@datatorrent.com>
> >>> *Reply-To: *"dev@apex.apache.org" <dev@apex.apache.org>
> >>> *Date: *Friday, April 28, 2017 at 10:09 AM
> >>> *To: *"dev@apex.apache.org" <dev@apex.apache.org>
> >>> *Subject: *Re: PR merge policy
> >>>
> >>>
> >>>
> >>>
> >>> On a lighter note, looks like the powers that be have been listening on
> >>> this conversation and decided to force push an empty repo or maybe
> >>> github just decided that this is the best proposal ;)
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>> On Thu, Apr 27, 2017 at 10:47 PM, Vlad Rozov <v.ro...@datatorrent.com>
> >>> wrote:
> >>>
> >>> In this case please propose how to deal with PR merge policy violations
> >>> in
> >>> the future. I will -1 proposal to commit an improvement on top of a
> >>> commit.
> >>>
> >>> Thank you,
> >>>
> >>> Vlad
> >>>
> >>>
> >>>
> >>> On 4/27/17 21:48, Pramod Immaneni wrote:
> >>>
> >>> I am sorry but I am -1 on the force push in this case.
> >>>
> >>> On Apr 27, 2017, at 9:27 PM, Thomas Weise <t...@apache.org> wrote:
> >>>
> >>> +1 as measure of last resort.
> >>>
> >>> On Thu, Apr 27, 2017 at 9:25 PM, Vlad Rozov <v.ro...@datatorrent.com>
> >>> wrote:
> >>>
> >>> IMO, force push will bring enough consequent embarrassment to avoid such
> >>> behavior in the future.
> >>>
> >>> Thank you,
> >>>
> >>> Vlad
> >>>
> >>> On 4/27/17 21:16, Munagala Ramanath wrote:
> >>>
> >>> My thought was that leaving the bad commit would be a permanent reminder
> >>> to the committer (and others) that a policy violation occurred and the
> >>> consequent embarrassment would be an adequate deterrent.
> >>>
> >>> Ram
> >>>
> >>> On Thu, Apr 27, 2017 at 9:12 PM, Vlad Rozov <v.ro...@datatorrent.com>
> >>> wrote:
> >>>
> >>> I also was under the impression that everyone agreed to the policy that
> >>> gives everyone in the community a chance to raise a concern or to
> >>> propose an improvement to a PR. Unfortunately, it is not the case, and
> >>> we need to discuss it again. I hope that this discussion will lead to no
> >>> future violations so we don't need to forcibly undo such commits, but it
> >>> will be good for the community to agree on the policy that deals with
> >>> violations.
> >>>
> >>> Ram, committing an improvement on top of a commit should be discouraged,
> >>> not encouraged, as it eventually leads to policy violations and lousy PR
> >>> reviews.
> >>>
> >>> Thank you,
> >>>
> >>> Vlad
> >>>
> >>> On 4/27/17 20:54, Thomas Weise wrote:
> >>>
> >>> I also thought that everybody was in agreement about that after the
> >>> first round of discussion, and as you say it would be hard to argue
> >>> against it. And I think we should not have to be back at the same topic
> >>> a few days later.
> >>>
> >>> While you seem to be focussed on the disagreement on policy violation,
> >>> I'm
> >>> more interested in a style of collaboration that does not require such
> >>> discussion.
> >>>
> >>> Thomas
> >>>
> >>> On Thu, Apr 27, 2017 at 8:45 PM, Munagala Ramanath <r...@datatorrent.com>
> >>> wrote:
> >>>
> >>> Everybody seems agreed on what the committers should do -- that waiting
> >>> a day or two for others to have a chance to comment seems like an
> >>> entirely reasonable thing.

Re: Github's disappearing mirrors

2017-04-28 Thread Munagala Ramanath
I got this about an hour ago from github:

--
Hi Ram,

Just wanted to let you know that we've fixed the issue and mirrors are now
syncing normally again. If the issue with your specific mirrors hasn't
resolved yet, it will when the next sync happens.

Cheers,
Jamie


On Fri, Apr 28, 2017 at 3:06 PM, Pramod Immaneni <pra...@datatorrent.com>
wrote:

> The message below seems to indicate it is a temporary workaround and the
> fix isn't ready yet.
>
> Thanks
>
> On Fri, Apr 28, 2017 at 2:57 PM, Munagala Ramanath <r...@datatorrent.com>
> wrote:
>
> > The repos are OK now -- github has fixed the issue.
> >
> > Ram
> >
> > On Fri, Apr 28, 2017 at 2:44 PM, Pramod Immaneni <pra...@datatorrent.com
> >
> > wrote:
> >
> > > Could happen again so don't be alarmed.
> > >
> > > Thanks
> > >
> > > Begin forwarded message:
> > >
> > > *From:* Chris Lambertus <c...@apache.org>
> > > *Date:* April 28, 2017 at 12:22:50 PM PDT
> > > *To:* committers <committ...@apache.org>
> > > *Subject:* *Github's disappearing mirrors*
> > > *Reply-To:* us...@infra.apache.org
> > >
> > > Hello committers,
> > >
> > > We have received quite a few reports of github mirrors gone missing.
> > We’ve
> > > tracked this down to an errant process at Github which appears to be
> > > deleting
> > > not only ours but also other orgs’ mirrors. We contacted Github but
> have
> > > yet to
> > > receive a reply. Another organization also contacted github and
> received
> > > the
> > > following reply:
> > >
> > > "Hi there, Sorry for the trouble! We've now had a couple of reports of
> > this
> > > problem, and we've opened an issue internally to investigate.  I don't
> > have
> > > an
> > > ETA on a fix, but we'll be in touch if we need more information from
> you
> > or
> > > if
> > > we have any information to share.  Regards, Laura GitHub Support”
> > >
> > >
> > > We have no further information at this time. We have been restoring the
> > > mirrors
> > > wherever possible, but until the root cause is resolved on Github’s
> side,
> > > we
> > > expect mirrors to continue to be erroneously removed.
> > >
> > > Access to the repos via the usual https://git-wip-us.apache.org/
> channel
> > > remains functional.
> > >
> > > -Chris
> > > ASF Infra
> > >
> >
> >
> >
> > --
> >
> > ___
> >
> > Munagala V. Ramanath
> >
> > Software Engineer
> >
> > E: r...@datatorrent.com | M: (408) 331-5034 | Twitter: @UnknownRam
> >
> > www.datatorrent.com  |  apex.apache.org
> >
>



-- 

___

Munagala V. Ramanath

Software Engineer

E: r...@datatorrent.com | M: (408) 331-5034 | Twitter: @UnknownRam

www.datatorrent.com  |  apex.apache.org


Re: Github's disappearing mirrors

2017-04-28 Thread Munagala Ramanath
The repos are OK now -- github has fixed the issue.

Ram

On Fri, Apr 28, 2017 at 2:44 PM, Pramod Immaneni 
wrote:

> Could happen again so don't be alarmed.
>
> Thanks
>
> Begin forwarded message:
>
> *From:* Chris Lambertus 
> *Date:* April 28, 2017 at 12:22:50 PM PDT
> *To:* committers 
> *Subject:* *Github's disappearing mirrors*
> *Reply-To:* us...@infra.apache.org
>
> Hello committers,
>
> We have received quite a few reports of github mirrors gone missing. We’ve
> tracked this down to an errant process at Github which appears to be
> deleting
> not only ours but also other orgs’ mirrors. We contacted Github but have
> yet to
> receive a reply. Another organization also contacted github and received
> the
> following reply:
>
> "Hi there, Sorry for the trouble! We've now had a couple of reports of this
> problem, and we've opened an issue internally to investigate.  I don't have
> an
> ETA on a fix, but we'll be in touch if we need more information from you or
> if
> we have any information to share.  Regards, Laura GitHub Support”
>
>
> We have no further information at this time. We have been restoring the
> mirrors
> wherever possible, but until the root cause is resolved on Github’s side,
> we
> expect mirrors to continue to be erroneously removed.
>
> Access to the repos via the usual https://git-wip-us.apache.org/ channel
> remains functional.
>
> -Chris
> ASF Infra
>



-- 

___

Munagala V. Ramanath

Software Engineer

E: r...@datatorrent.com | M: (408) 331-5034 | Twitter: @UnknownRam

www.datatorrent.com  |  apex.apache.org


Re: PR merge policy

2017-04-27 Thread Munagala Ramanath
My thought was that leaving the bad commit would be a permanent reminder to
the committer
(and others) that a policy violation occurred and the consequent
embarrassment would be an
adequate deterrent.

Ram

On Thu, Apr 27, 2017 at 9:12 PM, Vlad Rozov <v.ro...@datatorrent.com> wrote:

> I also was under impression that everyone agreed to the policy that gives
> everyone in the community a chance to raise a concern or to propose an
> improvement to a PR. Unfortunately, it is not the case, and we need to
> discuss it again. I hope that this discussion will lead to no future
> violations so we don't need to forcibly undo such commits, but it will be
> good for the community to agree on the policy that deals with violations.
>
> Ram, committing an improvement on top of a commit should be discouraged,
> not encouraged as it eventually leads to the policy violation and lousy PR
> reviews.
>
> Thank you,
>
> Vlad
>
> On 4/27/17 20:54, Thomas Weise wrote:
>
>> I also thought that everybody was in agreement about that after the first
>> round of discussion and as you say it would be hard to argue against it.
>> And I think we should not have to be back to the same topic a few days
>> later.
>>
>> While you seem to be focussed on the disagreement on policy violation, I'm
>> more interested in a style of collaboration that does not require such
>> discussion.
>>
>> Thomas
>>
>> On Thu, Apr 27, 2017 at 8:45 PM, Munagala Ramanath <r...@datatorrent.com>
>> wrote:
>>
>> Everybody seems agreed on what the committers should do -- that waiting a
>>> day or two for
>>> others to have a chance to comment seems like an entirely reasonable
>>> thing.
>>>
>>> The disagreement is about what to do when that policy is violated.
>>>
>>> Ram
>>>
>>> On Thu, Apr 27, 2017 at 8:37 PM, Thomas Weise <t...@apache.org> wrote:
>>>
>>> Forced push should not be necessary if committers follow the development
>>>> process.
>>>>
>>>> So why not focus on what the committer should do? Development process
>>>> and
>>>> guidelines are there for a reason, and the issue was raised before.
>>>>
>>>> In this specific case, there is a string of commits related to the
>>>> plugin
>>>> feature that IMO should be part of the original review. There wasn't any
>>>> need to rush this, the change wasn't important for the release.
>>>>
>>>> Thomas
>>>>
>>>>
>>>> On Thu, Apr 27, 2017 at 8:11 PM, Munagala Ramanath <r...@datatorrent.com
>>>> >
>>>> wrote:
>>>>
>>>>> I agree with Pramod that force pushing should be a rare event done only
>>>>> when there is an immediate need to undo something serious. Doing it just
>>>>> for a policy violation should itself be codified in our policies as a
>>>>> policy violation.
>>>>>
>>>>> Why not just commit an improvement on top?
>>>>>
>>>>> Ram
>>>>>
>>>>> On Thu, Apr 27, 2017 at 7:55 PM, Vlad Rozov <v.ro...@datatorrent.com>
>>>>> wrote:
>>>>>
>>>>>> Violation of the PR merge policy is a sufficient reason to forcibly undo
>>>>>> the commit, and such violations affect everyone in the community.
>>>>>>
>>>>>> Thank you,
>>>>>>
>>>>>> Vlad
>>>>>>
>>>>>> On 4/27/17 19:36, Pramod Immaneni wrote:
>>>>>>
>>>>>>> I agree that PRs should not be merged without a chance for others to
>>>>>>> review. I don't agree to force push and altering the commit tree unless
>>>>>>> it is absolutely needed, as it affects everyone. This case doesn't
>>>>>>> warrant this step; one scenario where a force push might be needed is
>>>>>>> if somebody pushed some copyrighted code.
>>>>>>>
>>>>>>> Thanks
>>>>>>>
>>>>>>> On Thu, Apr 27, 2017 at 6:44 PM, Vlad Rozov <

Re: PR merge policy

2017-04-27 Thread Munagala Ramanath
Idealism is a wonderful thing but reality sometimes intrudes.

Ram

On Thu, Apr 27, 2017 at 8:54 PM, Thomas Weise <t...@apache.org> wrote:

> I also thought that everybody was in agreement about that after the first
> round of discussion and as you say it would be hard to argue against it.
> And I think we should not have to be back to the same topic a few days
> later.
>
> While you seem to be focussed on the disagreement on policy violation, I'm
> more interested in a style of collaboration that does not require such
> discussion.
>
> Thomas
>
> On Thu, Apr 27, 2017 at 8:45 PM, Munagala Ramanath <r...@datatorrent.com>
> wrote:
>
> > Everybody seems agreed on what the committers should do -- that waiting a
> > day or two for
> > others to have a chance to comment seems like an entirely reasonable
> thing.
> >
> > The disagreement is about what to do when that policy is violated.
> >
> > Ram
> >
> > On Thu, Apr 27, 2017 at 8:37 PM, Thomas Weise <t...@apache.org> wrote:
> >
> > > Forced push should not be necessary if committers follow the
> > > development process.
> > >
> > > So why not focus on what the committer should do? Development process
> > > and guidelines are there for a reason, and the issue was raised before.
> > >
> > > In this specific case, there is a string of commits related to the
> > > plugin feature that IMO should be part of the original review. There
> > > wasn't any need to rush this; the change wasn't important for the
> > > release.
> > >
> > > Thomas
> > >
> > >
> > > On Thu, Apr 27, 2017 at 8:11 PM, Munagala Ramanath <r...@datatorrent.com>
> > > wrote:
> > >
> > > > I agree with Pramod that force pushing should be a rare event done
> > > > only when there is an immediate need to undo something serious. Doing
> > > > it just for a policy violation should itself be codified in our
> > > > policies as a policy violation.
> > > >
> > > > Why not just commit an improvement on top ?
> > > >
> > > > Ram
> > > >
> > > > On Thu, Apr 27, 2017 at 7:55 PM, Vlad Rozov <v.ro...@datatorrent.com>
> > > > wrote:
> > > >
> > > > > Violation of the PR merge policy is a sufficient reason to forcibly
> > > > > undo the commit, and such violations affect everyone in the community.
> > > > >
> > > > > Thank you,
> > > > >
> > > > > Vlad
> > > > >
> > > > > On 4/27/17 19:36, Pramod Immaneni wrote:
> > > > >
> > > > >> I agree that PRs should not be merged without a chance for others
> > > > >> to review. I don't agree to force push and altering the commit tree
> > > > >> unless it is absolutely needed, as it affects everyone. This case
> > > > >> doesn't warrant this step; one scenario where a force push might be
> > > > >> needed is if somebody pushed some copyrighted code.
> > > > >>
> > > > >> Thanks
> > > > >>
> > > > >> On Thu, Apr 27, 2017 at 6:44 PM, Vlad Rozov <
> > v.ro...@datatorrent.com>
> > > > >> wrote:
> > > > >>
> > > > >>> I am open to both approaches - two commits or a joint commit. Both
> > > > >>> have pros and cons that we may discuss. What I am strongly against
> > > > >>> are PRs that are merged without a chance for other
> > > > >>> contributors/committers to review. There should be a way to
> > > > >>> forcibly undo such commits.
> > > > >>>
> > > > >>> Thank you,
> > > > >>>
> > > > >>> Vlad
> > > > >>>
> > > > >>>
> > > > >>> On 4/27/17 12:41, Pramod Immaneni wrote:
> > > > >>>
> > > > >>> My comments inline..
> > > > >>>>
> > > > >>>> On Thu, Apr 27, 2017 at 12:01 PM, Thomas Weise <t...@apache.org>
> > > > wrote:
> > > > >>>>
> > > > >>>> I'm -1 on using the author field like this in Apex for the reason stated

Re: PR merge policy

2017-04-27 Thread Munagala Ramanath
Everybody seems agreed on what the committers should do -- that waiting a
day or two for
others to have a chance to comment seems like an entirely reasonable thing.

The disagreement is about what to do when that policy is violated.

Ram

On Thu, Apr 27, 2017 at 8:37 PM, Thomas Weise <t...@apache.org> wrote:

> Forced push should not be necessary if committers follow the development
> process.
>
> So why not focus on what the committer should do? Development process and
> guidelines are there for a reason, and the issue was raised before.
>
> In this specific case, there is a string of commits related to the plugin
> feature that IMO should be part of the original review. There wasn't any
> need to rush this, the change wasn't important for the release.
>
> Thomas
>
>
> On Thu, Apr 27, 2017 at 8:11 PM, Munagala Ramanath <r...@datatorrent.com>
> wrote:
>
> > I agree with Pramod that force pushing should be a rare event done only
> > when there is an immediate
> > need to undo something serious. Doing it just for a policy violation
> should
> > itself be codified in our
> > policies as a policy violation.
> >
> > Why not just commit an improvement on top ?
> >
> > Ram
> >
> > On Thu, Apr 27, 2017 at 7:55 PM, Vlad Rozov <v.ro...@datatorrent.com>
> > wrote:
> >
> > > Violation of the PR merge policy is a sufficient reason to forcibly
> undo
> > > the commit and such violations affect everyone in the community.
> > >
> > > Thank you,
> > >
> > > Vlad
> > >
> > > On 4/27/17 19:36, Pramod Immaneni wrote:
> > >
> > >> I agree that PRs should not be merged without a chance for others to
> > >> review. I don't agree to force push and altering the commit tree
> unless
> > it
> > >> is absolutely needed, as it affects everyone. This case doesn't
> warrant
> > >> this step, one scenario where a force push might be needed is if
> > somebody
> > >> pushed some copyrighted code.
> > >>
> > >> Thanks
> > >>
> > >> On Thu, Apr 27, 2017 at 6:44 PM, Vlad Rozov <v.ro...@datatorrent.com>
> > >> wrote:
> > >>
> > >>> I am open to both approaches - two commits or a joint commit. Both have
> > >>> pros and cons that we may discuss. What I am strongly against are PRs
> > >>> that
> > >>> are merged without a chance for other contributors/committers to
> > review.
> > >>> There should be a way to forcibly undo such commits.
> > >>>
> > >>> Thank you,
> > >>>
> > >>> Vlad
> > >>>
> > >>>
> > >>> On 4/27/17 12:41, Pramod Immaneni wrote:
> > >>>
> > >>> My comments inline..
> > >>>>
> > >>>> On Thu, Apr 27, 2017 at 12:01 PM, Thomas Weise <t...@apache.org>
> > wrote:
> > >>>>
> > >>>> I'm -1 on using the author field like this in Apex for the reason stated
> > >>>>
> > >>>>> (it is also odd to see something like this showing up without prior
> > >>>>> discussion).
> > >>>>>
> > >>>>>
> > >>>>> I am not set on this for future commits but would like to say, do we
> > >>>>> really verify the author field and treat it with importance? For
> > >>>>> example, I don't think we ever check if the author is the person they
> > >>>>> say they are, check name, email etc. If I were to use slightly
> > >>>>> different variations of my name or email (not that I would do that),
> > >>>>> would reviewers really verify that? I also have checked that tools
> > >>>>> don't fail on reading a commit because the author needs to be in a
> > >>>>> certain format. I guess contribution stats are the ones that will be
> > >>>>> affected, but if used rarely I don't see that being a big problem. I
> > >>>>> can understand if we wanted to have strict requirements for the
> > >>>>> committer field.
> > >>>>
> > >>>> Thanks
> > >>>>
> > >>>>
> > >>>> Consider adding the additional author information 

Re: PR merge policy

2017-04-27 Thread Munagala Ramanath
I agree with Pramod that force pushing should be a rare event done only
when there is an immediate
need to undo something serious. Doing it just for a policy violation should
itself be codified in our
policies as a policy violation.

Why not just commit an improvement on top ?

Ram

On Thu, Apr 27, 2017 at 7:55 PM, Vlad Rozov  wrote:

> Violation of the PR merge policy is a sufficient reason to forcibly undo
> the commit and such violations affect everyone in the community.
>
> Thank you,
>
> Vlad
>
> On 4/27/17 19:36, Pramod Immaneni wrote:
>
>> I agree that PRs should not be merged without a chance for others to
>> review. I don't agree to force push and altering the commit tree unless it
>> is absolutely needed, as it affects everyone. This case doesn't warrant
>> this step, one scenario where a force push might be needed is if somebody
>> pushed some copyrighted code.
>>
>> Thanks
>>
>> On Thu, Apr 27, 2017 at 6:44 PM, Vlad Rozov 
>> wrote:
>>
>> I am open to both approaches - two commits or a joint commit. Both have
>>> pros and cons that we may discuss. What I am strongly against are PRs
>>> that
>>> are merged without a chance for other contributors/committers to review.
>>> There should be a way to forcibly undo such commits.
>>>
>>> Thank you,
>>>
>>> Vlad
>>>
>>>
>>> On 4/27/17 12:41, Pramod Immaneni wrote:
>>>
>>> My comments inline..

 On Thu, Apr 27, 2017 at 12:01 PM, Thomas Weise  wrote:

 I'm -1 on using the author field like this in Apex for the reason stated

> (it is also odd to see something like this showing up without prior
> discussion).
>
>
> I am not set on this for future commits but would like to say, do we
>
 really
 verify the author field and treat it with importance. For example, I
 don't
 think we ever check if the author is the person they say they are, check
 name, email etc. If I were to use slightly different variations of my
 name
 or email (not that I would do that) would reviewers really verify that.
 I
 also have checked that tools don't fail on reading a commit because
 author
 needs to be in a certain format. I guess contribution stats are the ones
 that will be affected but if used rarely I dont see that being a big
 problem. I can understand if we wanted to have strict requirements for
 the
 committer field.

 Thanks


 Consider adding the additional author information to the commit message.

> Thomas
>
> On Thu, Apr 27, 2017 at 11:55 AM, Pramod Immaneni <
> pra...@datatorrent.com>
> wrote:
>
> Agreed it is not regular and should only be used in special
> circumstances.
>
> One example of this is pair programming. It has been done before in
>> other
>> projects and searching on google or stackoverflow you can see how
>> other
>> people have tried to handle it
>>
>> https://bugs.eclipse.org/bugs/show_bug.cgi?id=375536
>> https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=451880
>> http://stackoverflow.com/questions/7442112/attributing-
>> a-single-commit-to-multiple-developers
>>
>> Thanks
>>
>> On Thu, Apr 27, 2017 at 9:57 AM, Thomas Weise  wrote:
>>
>> commit 9856080ede62a4529d730bcb6724c757f5010990
>>
>>> Author: Pramod Immaneni & Vlad Rozov >> >
>>> Date:   Tue Apr 18 09:37:22 2017 -0700
>>>
>>> Please don't use the author field in such a way, it leads to
>>> incorrect
>>> tracking of contributions.
>>>
>>> Either have separate commits or have one author.
>>>
>>> Thanks
>>>
>>>
>>>
>>> On Thu, Apr 27, 2017 at 9:31 AM, Pramod Immaneni <
>>>
>>> pra...@datatorrent.com
>> wrote:
>>
>>> The issue was that two different plugin models were developed, one for
>>> pre-launch and the other for post-launch. I felt that the one built
>>> later was better and it would be better to have a uniform interface for
>>> the users, and hence asked for the changes.

 On Thu, Apr 27, 2017 at 9:05 AM, Thomas Weise 

 wrote:
>>>
>> I think the plugins feature could have benefited from better original
>> review, which would have eliminated much of the back and forth after
>> the fact.

>
> On Thu, Apr 27, 2017 at 8:20 AM, Vlad Rozov <
>
> v.ro...@datatorrent.com

>>> wrote:
>>
>>> Pramod,
>
>> No, it is not a request to update apex.apache.org (to do that we need
>> to file a JIRA). It is a reminder that 

Re: Programmatic log4j appender in Apex

2017-04-10 Thread Munagala Ramanath
I don't have one; I thought that was the intent of the proposal, but it
looks like I misunderstood. After re-reading some of the earlier responses,
I understand the proposal better.

Ram



On Mon, Apr 10, 2017 at 5:39 PM, Vlad Rozov <v.ro...@datatorrent.com> wrote:

> I don't see a use case where individual operators need to define a
> specific appender; can you provide one?
>
> Thank you,
>
> Vlad
>
> On 4/10/17 16:53, Munagala Ramanath wrote:
>
>> Yes, totally agree, it would be helpful to have a detailed use case and/or
>> a detailed spec
>> of the desired capabilities -- not necessarily a complete spec but with
>> enough detail to
>> understand why existing capabilities are inadequate.
>>
>> Ram
>>
>> On Mon, Apr 10, 2017 at 4:43 PM, Vlad Rozov <v.ro...@datatorrent.com>
>> wrote:
>>
>> It will be good to understand a use case where an operator needs a
>>> specific appender.
>>>
>>> IMO, an operator designer defines *what* should be logged and the dev-ops
>>> team defines *where* to log.
>>>
>>> Thank you,
>>>
>>> Vlad
>>> On 4/10/17 16:27, Munagala Ramanath wrote:
>>>
>>>> Yes, I understand, I was just wondering if individual operators could
>>>> define the appenders they potentially need at compile time and then the
>>>> operator callbacks could simply check the desired runtime condition and
>>>> add the appropriate appender.
>>>>
>>>> Or are we saying there are scenarios where we absolutely cannot create
>>>> the
>>>> appender beforehand ?
>>>>
>>>> So broadly speaking, my question is whether the combination of providing
>>>> predefined appenders
>>>> and the PropertyConfigurator capabilities meets the need.
>>>>
>>>> Ram
>>>>
>>>> On Mon, Apr 10, 2017 at 2:18 PM, Sergey Golovko <ser...@datatorrent.com
>>>> >
>>>> wrote:
>>>>
>>>> Ram,
>>>>
>>>>> Really the new appender class must extend the abstract class
>>>>> AppenderSkeleton. And in order to add a new appender programmatically
>>>>> in
>>>>> Java, some code in Apex should call the following log4j method:
>>>>>
>>>>> org.apache.log4j.Logger.getRootLogger().addAppender(Appender
>>>>> newAppender)
>>>>>
>>>>> The general idea of my proposal is "*based on some runtime parameter(s)
>>>>> to
>>>>> provide ability to create an appender instance via reflection and add
>>>>> it
>>>>> to
>>>>> the list of active log4j appenders*".
>>>>>
>>>>> Thanks,
>>>>> Sergey
>>>>>
>>>>>
>>>>> On Mon, Apr 10, 2017 at 2:04 PM, Vlad Rozov <v.ro...@datatorrent.com>
>>>>> wrote:
>>>>>
>>>>> It will require application recompilation and repackaging. The proposed
>>>>>
>>>>>> functionality is for dev-ops to be able to route application logging
>>>>>> to
>>>>>> a
>>>>>> preferred destination without recompiling applications. It is run-time
>>>>>> configuration vs compile time hardcoded appender.
>>>>>>
>>>>>> Thank you,
>>>>>>
>>>>>> Vlad
>>>>>>
>>>>>> On 4/10/17 11:23, Munagala Ramanath wrote:
>>>>>>
>>>>>> You can do it in a trivial derived class without changing the base
>>>>>> class.
>>>>>> Ram
>>>>>>
>>>>>>> On Mon, Apr 10, 2017 at 11:19 AM, Vlad Rozov <
>>>>>>> v.ro...@datatorrent.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>> Does not the proposal to use Logger.addAppender() requires
>>>>>>> modifications
>>>>>>>
>>>>>>> to used operators code?
>>>>>>>>
>>>>>>>> Thank you,
>>>>>>>>
>>>>>>>> Vlad
>>>>>>>>
>>>>>>>> On 4/10/17 10:58, Munagala Ramanath wrote:
>>>>>>>>
>>>>>>>> People can currently do this by simply implementing the Appender
>>>>>>>&

Re: Programmatic log4j appender in Apex

2017-04-10 Thread Munagala Ramanath
http://logging.apache.org/log4j/1.2/faq.html#3.6

Log4j has the ability to dynamically reload changes to the properties file
via the *configureAndWatch()* method of *PropertyConfigurator*. If this is
already baked in, could devops simply change the properties file if we
provide an enhanced set of appenders for the desired use cases?

Ram
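
For reference, a minimal sketch of that mechanism; the file path and the
60-second poll interval are arbitrary:

import org.apache.log4j.PropertyConfigurator;

public class LoggingSetup
{
  public static void main(String[] args)
  {
    // re-read the properties file every 60 seconds and apply any changes
    PropertyConfigurator.configureAndWatch("/etc/myapp/log4j.properties", 60000L);
    // ... application continues; appender/level edits in the file take effect live
  }
}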

On Mon, Apr 10, 2017 at 2:04 PM, Vlad Rozov <v.ro...@datatorrent.com> wrote:

> It will require application recompilation and repackaging. The proposed
> functionality is for dev-ops to be able to route application logging to a
> preferred destination without recompiling applications. It is run-time
> configuration vs compile time hardcoded appender.
>
> Thank you,
>
> Vlad
> On 4/10/17 11:23, Munagala Ramanath wrote:
>
>> You can do it in a trivial derived class without changing the base class.
>>
>> Ram
>>
>> On Mon, Apr 10, 2017 at 11:19 AM, Vlad Rozov <v.ro...@datatorrent.com>
>> wrote:
>>
>> Does not the proposal to use Logger.addAppender() requires modifications
>>> to used operators code?
>>>
>>> Thank you,
>>>
>>> Vlad
>>>
>>> On 4/10/17 10:58, Munagala Ramanath wrote:
>>>
>>> People can currently do this by simply implementing the Appender
>>>> interface
>>>> and adding it
>>>> with Logger.addAppender() in the setup method. Why do we need something
>>>> more elaborate ?
>>>>
>>>> Ram
>>>>
>>>> On Mon, Apr 10, 2017 at 10:30 AM, Sergey Golovko <
>>>> ser...@datatorrent.com>
>>>> wrote:
>>>>
>>>> The configuration of a log4j appender via log4j configuration file is a
>>>>
>>>>> static configuration that cannot be disabled/enabled and managed
>>>>> dynamically by an application designer. The programmatic approach will
>>>>> allow  an application designer to specify which of the available log4j
>>>>> appenders should be used for the specific application.
>>>>>
>>>>> It is not necessary Apex should use the predefined log4j appenders
>>>>> only.
>>>>> The log4j events contain useful but the very limited number of
>>>>> properties
>>>>> which values can be printed into output log4j sources. But based on the
>>>>> knowledge of the software product workflow, the custom defined log4j
>>>>> appender can extend a list of predefined output log events properties
>>>>> and,
>>>>> for instance for Apex, return: node, user name, application name,
>>>>> application id, container id, operator name, etc.
>>>>>
>>>>> Also the output log events that are generated by a custom defined log4j
>>>>> appender can be stored and indexed by any type of a full text search
>>>>> database. It will allow the customers and developers to simplify
>>>>> collection
>>>>> of log events statistics and searching/filtering of specific events for
>>>>> debugging and investigation.
>>>>>
>>>>> Thanks,
>>>>> Sergey
>>>>>
>>>>>
>>>>> On Mon, Apr 10, 2017 at 6:34 AM, Vlad Rozov <v.ro...@datatorrent.com>
>>>>> wrote:
>>>>>
>>>>> +1 Apex engine does not own log4j config file - it is provided either
>>>>> by
>>>>>
>>>>>> Hadoop or an application. Hadoop log4j config does not necessarily
>>>>>> meet
>>>>>> application logging requirements, but if log4j is provided by an
>>>>>> application designer, who can only specify what to log, it may not
>>>>>> meet
>>>>>> operations requirements. Dev-ops should have an ability to specify
>>>>>> where
>>>>>>
>>>>>> to
>>>>>
>>>>> log depending on the available infrastructure at run-time.
>>>>>>
>>>>>> It will be good to have an ability to specify extra log4j appenders
>>>>>> not only at launch time but also at run-time, the same way log4j
>>>>>> logger levels may be changed.
>>>>>>
>>>>>> Thank you,
>>>>>>
>>>>>> Vlad
>>>>>>
>>>>>> On 4/9/17 23:14, Priyanka Gugale wrote:
>>>>>>
>>>>>> We can always wr

Re: Programmatic log4j appender in Apex

2017-04-10 Thread Munagala Ramanath
You can do it in a trivial derived class without changing the base class.

Ram
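
For concreteness, a minimal sketch of that approach against the log4j 1.2
API; ForwardingAppender and its destination are illustrative only:

import org.apache.log4j.AppenderSkeleton;
import org.apache.log4j.Logger;
import org.apache.log4j.spi.LoggingEvent;

public class ForwardingAppender extends AppenderSkeleton
{
  @Override
  protected void append(LoggingEvent event)
  {
    // forward the rendered message to some external system (stubbed out here)
    System.out.println("forwarding: " + event.getRenderedMessage());
  }

  @Override
  public void close()
  {
    // release any resources held by the appender
  }

  @Override
  public boolean requiresLayout()
  {
    return false;
  }
}

// e.g. in an operator's setup() method:
// Logger.getRootLogger().addAppender(new ForwardingAppender());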

On Mon, Apr 10, 2017 at 11:19 AM, Vlad Rozov <v.ro...@datatorrent.com>
wrote:

> Does not the proposal to use Logger.addAppender() requires modifications
> to used operators code?
>
> Thank you,
>
> Vlad
>
> On 4/10/17 10:58, Munagala Ramanath wrote:
>
>> People can currently do this by simply implementing the Appender interface
>> and adding it
>> with Logger.addAppender() in the setup method. Why do we need something
>> more elaborate ?
>>
>> Ram
>>
>> On Mon, Apr 10, 2017 at 10:30 AM, Sergey Golovko <ser...@datatorrent.com>
>> wrote:
>>
>> The configuration of a log4j appender via log4j configuration file is a
>>> static configuration that cannot be disabled/enabled and managed
>>> dynamically by an application designer. The programmatic approach will
>>> allow  an application designer to specify which of the available log4j
>>> appenders should be used for the specific application.
>>>
>>> It is not necessary Apex should use the predefined log4j appenders only.
>>> The log4j events contain useful but the very limited number of properties
>>> which values can be printed into output log4j sources. But based on the
>>> knowledge of the software product workflow, the custom defined log4j
>>> appender can extend a list of predefined output log events properties
>>> and,
>>> for instance for Apex, return: node, user name, application name,
>>> application id, container id, operator name, etc.
>>>
>>> Also the output log events that are generated by a custom defined log4j
>>> appender can be stored and indexed by any type of a full text search
>>> database. It will allow the customers and developers to simplify
>>> collection
>>> of log events statistics and searching/filtering of specific events for
>>> debugging and investigation.
>>>
>>> Thanks,
>>> Sergey
>>>
>>>
>>> On Mon, Apr 10, 2017 at 6:34 AM, Vlad Rozov <v.ro...@datatorrent.com>
>>> wrote:
>>>
>>> +1 Apex engine does not own log4j config file - it is provided either by
>>>> Hadoop or an application. Hadoop log4j config does not necessarily meet
>>>> application logging requirements, but if log4j is provided by an
>>>> application designer, who can only specify what to log, it may not meet
>>>> operations requirements. Dev-ops should have an ability to specify where
>>>>
>>> to
>>>
>>>> log depending on the available infrastructure at run-time.
>>>>
>>>> It will be good to have an ability to specify extra log4j appenders
>>>> not only at launch time but also at run-time, the same way log4j
>>>> logger levels may be changed.
>>>>
>>>> Thank you,
>>>>
>>>> Vlad
>>>>
>>>> On 4/9/17 23:14, Priyanka Gugale wrote:
>>>>
>>>>> We can always write a custom appender and add it by changing the root
>>>>> appender in the log4j config file. Can you explain how adding an
>>>>> appender programmatically would help?
>>>>>
>>>>> -Priyanka
>>>>>
>>>>> On Sun, Apr 9, 2017 at 11:50 AM, Sanjay Pujare <san...@datatorrent.com
>>>>> >
>>>>> wrote:
>>>>>
>>>>> Please give some examples and/or use cases of this programmatic log4j
>>>>>
>>>>>> appender.
>>>>>>
>>>>>> On Fri, Apr 7, 2017 at 8:40 PM, Sergey Golovko <
>>>>>> ser...@datatorrent.com
>>>>>> wrote:
>>>>>>
>>>>>> Hi All,
>>>>>>
>>>>>>> I'd like to add support for a custom-defined log4j appender that can
>>>>>>> be added to the Apex Application Master and containers and be
>>>>>>> configurable programmatically.
>>>>>>>
>>>>>>> Sometimes it is not trivial to control the log4j configuration via
>>>>>>> log4j properties. I think having a way to add a log4j appender
>>>>>>> programmatically will allow customers and developers to plug in their
>>>>>>> own custom-defined log4j appenders and be much more flexible in
>>>>>>> streaming and collecting Apex log events.
>>>>>>>
>>>>>>> I plan to provide a generic approach for defining the programmatic
>>>>>>> log4j appender and to pass all configuration parameters, including
>>>>>>> the name of the Java class implementing the log4j appender, via
>>>>>>> system and/or command-line properties.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Sergey
>>>>>>>
>>>>>>>
>>>>>>>
>>
>>
>


-- 

___

Munagala V. Ramanath

Software Engineer

E: r...@datatorrent.com | M: (408) 331-5034 | Twitter: @UnknownRam

www.datatorrent.com  |  apex.apache.org


Re: open/close ports and active/inactive streams

2017-04-01 Thread Munagala Ramanath
What's a use case for this ?

Ram

On Sat, Apr 1, 2017 at 8:12 AM, Vlad Rozov  wrote:

> All,
>
> Currently Apex assumes that an operator can emit on any defined output
> port and all streams defined by a DAG are active. I'd like to propose an
> ability for an operator to open and close output ports. By default all
> ports defined by an operator will be open. In the case an operator for any
> reason decides that it will not emit tuples on the output port, it may
> close it. This will make the stream inactive and the application master may
> undeploy the downstream (for that input stream) operators. If this leads to
> containers that don't have any active operators, those containers may be
> undeployed as well leading to better cluster resource utilization and
> better Apex elasticity. Later, the operator may be in a state where it
> needs to emit tuples on the closed port. In this case, it needs to re-open
> the port and wait till the stream becomes active again before emitting
> tuples on that port. Making an inactive stream active again requires the
> application master to re-allocate containers and re-deploy the downstream
> operators.
>
> It should also be possible for an application designer to mark streams as
> inactive when an application starts. This will allow the application master
> to avoid reserving all containers when the application starts. Later, the
> port can be opened and the inactive stream becomes active.
>
> Thank you,
>
> Vlad
>
>
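For concreteness, usage under the proposal might look something like this
(purely hypothetical sketch -- no open()/close()/isActive() methods exist on
output ports in Apex today):

public class SelectiveSource extends BaseOperator implements InputOperator
{
  public final transient DefaultOutputPort<String> rare = new DefaultOutputPort<>();
  private boolean needRareStream = false;

  @Override
  public void setup(Context.OperatorContext context)
  {
    rare.close();  // hypothetical: start with the stream inactive
  }

  @Override
  public void emitTuples()
  {
    if (needRareStream) {
      rare.open();             // hypothetical: master re-deploys downstream
      if (rare.isActive()) {   // hypothetical: wait until the stream is live
        rare.emit("rare event");
        needRareStream = false;
      }
    }
  }
}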


-- 

___

Munagala V. Ramanath

Software Engineer

E: r...@datatorrent.com | M: (408) 331-5034 | Twitter: @UnknownRam

www.datatorrent.com  |  apex.apache.org


Re: RC1 documentation issues

2017-03-28 Thread Munagala Ramanath
The javadoc for 3.7 is there now -- I triggered a build via IRC.

Commit triggering was not working for me for some reason, so it is set to
build nightly.

Ram

On Mon, Mar 27, 2017 at 11:58 PM, Thomas Weise  wrote:

> The following two issues observed while preparing 3.7.0 RC1:
>
> User doc contains duplicate entries in left nav:
>
> http://apex.apache.org/docs/malhar-3.7/
>
> Javadoc wasn't created in expected location after updating the buildbot
> file:
>
> https://ci.apache.org/projects/apex-malhar/apex-
> malhar-javadoc-release-3.7/index.html
>
> Ram, can you have a look please?
>
> Thanks,
> Thomas
>



-- 

___

Munagala V. Ramanath

Software Engineer

E: r...@datatorrent.com | M: (408) 331-5034 | Twitter: @UnknownRam

www.datatorrent.com  |  apex.apache.org


Re: RC1 documentation issues

2017-03-28 Thread Munagala Ramanath
Fixed the duplicate entries issue:
https://github.com/apache/apex-malhar/pull/592

Looking at the other one.

Ram

On Mon, Mar 27, 2017 at 11:58 PM, Thomas Weise  wrote:

> The following two issues observed while preparing 3.7.0 RC1:
>
> User doc contains duplicate entries in left nav:
>
> http://apex.apache.org/docs/malhar-3.7/
>
> Javadoc wasn't created in expected location after updating the buildbot
> file:
>
> https://ci.apache.org/projects/apex-malhar/apex-
> malhar-javadoc-release-3.7/index.html
>
> Ram, can you have a look please?
>
> Thanks,
> Thomas
>



-- 

___

Munagala V. Ramanath

Software Engineer

E: r...@datatorrent.com | M: (408) 331-5034 | Twitter: @UnknownRam

www.datatorrent.com  |  apex.apache.org


Re: Python Binding implementation for design review.

2017-03-27 Thread Munagala Ramanath
Getting: "Sorry, the file you have requested does not exist."

Ram


On Sun, Mar 26, 2017 at 11:18 PM, vikram patil 
wrote:

> Hello All,
>
> I have developed a Python implementation that allows Apex developers to
> create a basic Apex application using Python. There are still some issues
> that need to be resolved, so I would like to start a discussion on the
> requirements and a review of the proposed implementation.
>
> Please find the technical design and functional specification here:
> https://docs.google.com/document/d/1nSrNvdUkwNVU2vS574C3GtxpjvY-
> RieKgWQ-3tVWM5E/edit#
>
> Thanks & Regards,
> Vikram
>



-- 

___

Munagala V. Ramanath

Software Engineer

E: r...@datatorrent.com | M: (408) 331-5034 | Twitter: @UnknownRam

www.datatorrent.com  |  apex.apache.org


Re: Local mode execution

2017-03-03 Thread Munagala Ramanath
There was a response from Thomas 2 days ago which also included a code
fragment.

Let me know if you are unable to locate it and I'll copy it here.
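In the meantime, here is a minimal sketch along the same lines (it assumes
your DAG is defined by a MyApplication class implementing
StreamingApplication; the RUN_MILLIS attribute ends the embedded run after
the given time):

import org.apache.apex.api.EmbeddedAppLauncher;
import org.apache.apex.api.Launcher;
import org.apache.apex.api.Launcher.LaunchMode;
import org.apache.hadoop.conf.Configuration;
import com.datatorrent.api.Attribute;

public class LocalLaunch
{
  public static void main(String[] args) throws Exception
  {
    EmbeddedAppLauncher<?> launcher = Launcher.getLauncher(LaunchMode.EMBEDDED);
    Attribute.AttributeMap attrs = new Attribute.AttributeMap.DefaultAttributeMap();
    attrs.put(EmbeddedAppLauncher.RUN_MILLIS, 10000L);  // run ~10s, then return
    Configuration conf = new Configuration(false);      // add app properties here
    launcher.launchApp(new MyApplication(), conf, attrs);
  }
}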

Ram

On Fri, Mar 3, 2017 at 9:10 AM, Ganelin, Ilya 
wrote:

> Hey all - any word on this? Would love to be able to test locally using
> the new framework. Thanks!
>
>
>
>
> 
> From: Ganelin, Ilya 
> Sent: Wednesday, March 1, 2017 12:23:06 PM
> To: dev@apex.apache.org
> Subject: Local mode execution
>
> Hi all, I've returned to writing Apex apps after a hiatus, and it seems
> that LocalMode is now deprecated, having been replaced by the Launcher
> interface. Is there an example or documentation anywhere of using this new
> API?
>
> Please let me know, thanks!
>
> - Ilya Ganelin
>
> 
>
> The information contained in this e-mail is confidential and/or
> proprietary to Capital One and/or its affiliates and may only be used
> solely in performance of work or services for Capital One. The information
> transmitted herewith is intended only for use by the individual or entity
> to which it is addressed. If the reader of this message is not the intended
> recipient, you are hereby notified that any review, retransmission,
> dissemination, distribution, copying or other use of, or taking of any
> action in reliance upon this information is strictly prohibited. If you
> have received this communication in error, please contact the sender and
> delete the material from your computer.
> 
>



-- 

___

Munagala V. Ramanath

Software Engineer

E: r...@datatorrent.com | M: (408) 331-5034 | Twitter: @UnknownRam

www.datatorrent.com  |  apex.apache.org


Re: example applications in malhar

2017-02-23 Thread Munagala Ramanath
+1 for renaming to "examples"

Ram

On Thu, Feb 23, 2017 at 9:12 AM, Lakshmi Velineni 
wrote:

> I am ready to bring the examples over into the demos folder. I was
> wondering if anybody has any input on Thomas's suggestion to rename the
> demos folder to examples. I would rather do that first and then bring the
> examples over instead of doing it the other way around as that would lead
> to refactoring the new examples again.
>
> Thanks
>
> On Wed, Jan 25, 2017 at 8:12 AM, Lakshmi Velineni  >
> wrote:
>
> > Hi,
> >
> > Since the examples have little history I was planning to have two
> > commits for every example, one for the code as the primary author of
> > the example and another containing pom.xml and other changes to make
> > it work under malhar.
> >
> > Thanks
> >
> > On Wed, Nov 2, 2016 at 9:49 PM, Lakshmi Velineni
> >  wrote:
> > > Thanks for the suggestions and I am working on the process to migrate
> the
> > > examples with the guidelines you mentioned. I will send out a list of
> > > examples and the destination modules very soon.
> > >
> > >
> > > On Thu, Oct 27, 2016 at 1:43 PM, Thomas Weise 
> > > wrote:
> > >>
> > >> Maybe a good first step would be to identify which examples to bring
> > over
> > >> and where appropriate how to structure them in Malhar (for example, I
> > see
> > >> multiple hdfs related apps that could go into the same Maven module).
> > >>
> > >>
> > >> On Tue, Oct 25, 2016 at 1:00 PM, Thomas Weise  wrote:
> > >>
> > >> > That would be great. There are a few things to consider when working
> > on
> > >> > it:
> > >> >
> > >> > * preserve attribtion
> > >> > * ensure there is a test that runs the application in the CI
> > >> > * check that dependencies are compatible license
> > >> > * maybe extract common boilerplate code from pom.xml
> > >> >
> > >> > etc.
> > >> >
> > >> > Existing examples are under https://github.com/apache/
> > >> > apex-malhar/tree/master/demos
> > >> >
> > >> > Perhaps we should rename it to "examples"
> > >> >
> > >> > I also propose that each app has a README and we add those for
> > existing
> > >> > apps as well.
> > >> >
> > >> > Thanks,
> > >> > Thomas
> > >> >
> > >> >
> > >> >
> > >> >
> > >> > On Tue, Oct 25, 2016 at 12:49 PM, Lakshmi Velineni <
> > >> > laks...@datatorrent.com> wrote:
> > >> >
> > >> >>   Can i work on this?
> > >> >>
> > >> >> Thanks
> > >> >> Lakshmi Prasanna
> > >> >>
> > >> >> On Mon, Sep 12, 2016 at 9:41 PM, Ashwin Chandra Putta <
> > >> >> ashwinchand...@gmail.com> wrote:
> > >> >>
> > >> >> > Here is the JIRA:
> > >> >> > https://issues.apache.org/jira/browse/APEXMALHAR-2233
> > >> >> >
> > >> >> > On Tue, Sep 6, 2016 at 10:20 PM, Amol Kekre <
> a...@datatorrent.com>
> > >> >> wrote:
> > >> >> >
> > >> >> > > Good idea to consolidate them into Malhar. We should bring in
> as
> > >> >> > > many
> > >> >> > from
> > >> >> > > this gitHub as possible.
> > >> >> > >
> > >> >> > > Thks
> > >> >> > > Amol
> > >> >> > >
> > >> >> > >
> > >> >> > > On Tue, Sep 6, 2016 at 6:02 PM, Thomas Weise
> > >> >> > > 
> > >> >> > > wrote:
> > >> >> > >
> > >> >> > > > I'm also for consolidating these different example locations.
> > We
> > >> >> should
> > >> >> > > > also look if all of it is still relevant.
> > >> >> > > >
> > >> >> > > > The stuff from the DT repository needs to be brought into
> shape
> > >> >> > > > wrt
> > >> >> > > > licensing, checkstyle, CI support etc.
> > >> >> > > >
> > >> >> > > >
> > >> >> > > > On Tue, Sep 6, 2016 at 4:34 PM, Pramod Immaneni <
> > >> >> > pra...@datatorrent.com>
> > >> >> > > > wrote:
> > >> >> > > >
> > >> >> > > > > Sounds like a good idea. How about merging demos with apps
> as
> > >> >> well?
> > >> >> > > > >
> > >> >> > > > > On Tue, Sep 6, 2016 at 4:30 PM, Ashwin Chandra Putta <
> > >> >> > > > > ashwinchand...@gmail.com> wrote:
> > >> >> > > > >
> > >> >> > > > > > Hi All,
> > >> >> > > > > >
> > >> >> > > > > > We have a lot of examples for apex malhar operators in
> the
> > >> >> > following
> > >> >> > > > > > repository which resides outside of malhar.
> > >> >> > > > > > https://github.com/DataTorrent/examples/tree/
> > master/tutorials
> > >> >> > > > > >
> > >> >> > > > > > Now that it has grown quite a bit, does it make sense to
> > >> >> > > > > > bring
> > >> >> some
> > >> >> > > of
> > >> >> > > > > the
> > >> >> > > > > > most common examples to malhar repository? Probably under
> > >> >> > > > > > apps
> > >> >> > > > directory?
> > >> >> > > > > >
> > >> >> > > > > > That way folks looking at malhar repository will have
> some
> > >> >> samples
> > >> >> > to
> > >> >> > > > > look
> > >> >> > > > > > at without having to search elsewhere.
> > >> >> > > > > >
> > >> >> > > > > > --
> > >> >> > > > > >
> > >> >> > > > > > Regards,
> > >> >> > > > > > Ashwin.
> > >> >> > > > > >
> > >> >> > > > >
> > >> >> > > >
> > >> >> > 

Re: Occasional Out of order tuples when emitting from a thread

2017-02-21 Thread Munagala Ramanath
Please note that tuples should not be emitted by any thread other than the
main operator thread.

A common pattern is to use a thread-safe queue and have worker threads
enqueue
tuples there; the main operator thread then pulls tuples from the queue and
emits them.
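
A minimal sketch of that pattern (class and method names are illustrative):

import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import com.datatorrent.api.DefaultInputPort;
import com.datatorrent.api.DefaultOutputPort;
import com.datatorrent.common.util.BaseOperator;

public class ConditionalForwarder extends BaseOperator
{
  public final transient DefaultOutputPort<String> out = new DefaultOutputPort<>();
  // thread-safe handoff: worker threads enqueue, only the operator thread emits
  private final transient Queue<String> ready = new ConcurrentLinkedQueue<>();

  public final transient DefaultInputPort<String> in = new DefaultInputPort<String>()
  {
    @Override
    public void process(String tuple)
    {
      if (conditionMet(tuple)) {
        out.emit(tuple);  // safe: we are on the operator thread
      } else {
        cache(tuple);     // worker threads move it to 'ready' later
      }
      drain();
    }
  };

  @Override
  public void endWindow()
  {
    drain();  // pick up anything queued since the last input tuple
  }

  private void drain()
  {
    String t;
    while ((t = ready.poll()) != null) {
      out.emit(t);
    }
  }

  // placeholders for the application-specific pieces
  private boolean conditionMet(String tuple)
  {
    return tuple != null && !tuple.isEmpty();
  }

  private void cache(String tuple)
  {
    // hand off to the worker threads; they call ready.add(tuple) when done
  }
}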

Ram

On Tue, Feb 21, 2017 at 10:05 AM, Sunil Parmar 
wrote:

> Hi there,
> We have the following setup:
>
>- we have a generic operator that’s processing tuples in its input port
>- in the input port’s process method, we check for a condition, and:
>   - if the condition is met, the tuple is emitted to the next
>   operator right away (in the process method)
>   - otherwise, if the condition is not met, we store the tuple in some
>   cache and we use some threads that periodically check for the condition
>   to become true. Once the condition is true, the threads call the emit
>   method on the stored tuples.
>
> With this setup, we occasionally encounter the following error:
> 2017-02-15 17:29:09,364 ERROR com.datatorrent.stram.engine.GenericNode:
> Catastrophic Error: Out of sequence BEGIN_WINDOW tuple 58a404613b7f on
> port transformedJSON while expecting 58a404613b7e
>
> Is there a way to make the above work correctly?
> If not, can you recommend a better way of doing this?
> How can we ensure window assignment is done synchronously before emitting
> tuples ?
>
> Thanks very much in advance…
> -allan
>



-- 

___

Munagala V. Ramanath

Software Engineer

E: r...@datatorrent.com | M: (408) 331-5034 | Twitter: @UnknownRam

www.datatorrent.com  |  apex.apache.org


Re: Infinite loop in CircularBuffer constructor (corner case)

2017-02-09 Thread Munagala Ramanath
Created https://github.com/DataTorrent/Netlet/issues/60 and
pull request https://github.com/DataTorrent/Netlet/pull/59

Ram

On Sun, Jan 29, 2017 at 6:42 PM, Vlad Rozov <v.ro...@datatorrent.com> wrote:

> Netlet is not an Apache project. Please open an issue on github
> https://github.com/DataTorrent/netlet/issues.
>
> Thank you,
>
> Vlad
>
> On 12/22/16 18:02, Munagala Ramanath wrote:
>
>> Actually, this will happen whenever the parameter n satisfies: 2**30 < n
>> <=
>> Integer.MAX_VALUE
>>
>> Ram
>>
>> On Thu, Dec 22, 2016 at 5:34 PM, Munagala Ramanath <r...@datatorrent.com>
>> wrote:
>>
>> In Netlet CircularBuffer constructor, we have an infinite loop if the
>>> first parameter (*n*)
>>> is *Integer.MAX_VALUE* because the loop counter left-shifts 1 till it
>>> drops into the sign
>>> bit at which point the value is negative and fails the loop exit test.
>>> The
>>> next left shift
>>> yields 0 which, of course, stays that way forever; here is the fragment:
>>>
>>> int i = 1;
>>> while (i < n) {
>>>   i <<= 1;
>>> }
>>>
>>> Ram
>>>
>>>
>


Re: Setting JAVA Serializer to be used at App Level.

2017-02-07 Thread Munagala Ramanath
There is some discussion of Kryo serializer issues at:
http://docs.datatorrent.com/troubleshooting/
under the heading "Application throwing following Kryo exception."
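
For reference, the Kryo-to-Java delegation mentioned below looks like this on
a tuple class (a sketch; MyTuple is an illustrative name):

import java.io.Serializable;
import com.esotericsoftware.kryo.DefaultSerializer;
import com.esotericsoftware.kryo.serializers.JavaSerializer;

// Kryo will hand (de)serialization of this class to Java serialization.
@DefaultSerializer(JavaSerializer.class)
public class MyTuple implements Serializable
{
  private static final long serialVersionUID = 1L;
  public String key;
  public long value;
}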

Ram

On Tue, Feb 7, 2017 at 12:22 AM, Ambarish Pande <
ambarish.pande2...@gmail.com> wrote:

> I understand that it affects performance, but for now the library that I
> am trying to use has some issues with Kryo. So I have no choice but to use
> Java Serializer for serializing objects of that library.
>
> Okay, I can use this too. Currently I am using FieldSerializer annotation.
>
> I'll keep in mind to post such questions on users@apex.
>
> Thank You.
>
>
> On Tue, Feb 7, 2017 at 1:40 AM, Vlad Rozov 
> wrote:
>
>> Kryo allows delegating serialization to Java. Add
>> "@DefaultSerializer(JavaSerializer.class)" to a tuple class definition.
>> As Santosh already mentioned, using the Java serializer is not recommended
>> as it affects performance.
>>
>> I would recommend in the future to post similar questions on the
>> user@apex. dev@apex is used to discuss development of the platform and
>> the library.
>>
>> Thank you,
>>
>> Vlad
>>
>> On 2/5/17 22:34, Ambarish Pande wrote:
>>
>> This is exactly what i wanted.
>> Thank You.
>>
>> On Mon, Feb 6, 2017 at 11:35 AM, Hitesh Kapoor  
>> 
>> wrote:
>>
>>
>> Hi Ambarish,
>>
>> Yes you can plug in your own serializer. You will have to set the
>> "STREAM_CODEC" port attribute to achieve the same.
>> You can refer to xmlParserApplication from the examples repo
>> (https://github.com/DataTorrent/examples).
>>
>> Regards,
>> Hitesh
>>
>>
>> On Mon, Feb 6, 2017 at 11:07 AM, Ambarish Pande 
>>  wrote:
>>
>>
>> Hello,
>>
>> Is there a way to set up JAVA Serializer as the default serializer to be
>> used for a particular application. Currently, Kryo is the default
>> serializer and the library that I am using has compatibility issues with
>> Kryo.
>>
>> Thank You.
>>
>>
>>
>>
>


Re: relevant conferences and meetups on website

2017-02-03 Thread Munagala Ramanath
Maybe they are too big and being stripped out by the mail programs ?
'cause I'm still not seeing them.

Ram

On Fri, Feb 3, 2017 at 12:15 PM, Michelle Xiao 
wrote:

> Here are the screenshots.
>
>
> Thanks,
> Michelle
>
>
>
> On Fri, Feb 3, 2017 at 11:56 AM, Pramod Immaneni 
> wrote:
>
>> Michelle,
>>
>> Could you attach the screenshots.
>>
>> Thanks
>>
>>
>> On Fri, Feb 3, 2017 at 7:15 AM, Thomas Weise  wrote:
>>
>> > I don't see a screen shot attached to the email.
>> >
>> > Thomas
>> >
>> > On Thu, Feb 2, 2017 at 4:30 PM, Michelle Xiao > >
>> > wrote:
>> >
>> > > I worked locally to add events section on Apex home page. Attached
>> are a
>> > > couple of screenshots show how Apex home page look like with this
>> events
>> > > section. We can reduce the height of "Tweets by @ApacheApex" section
>> when
>> > > embedding it, to make sure the left Announcement section is not
>> shorter
>> > > than the right side.
>> > >
>> > > Thanks,
>> > > Michelle
>> > >
>> > > On Thu, Feb 2, 2017 at 3:48 PM, Amol Kekre 
>> wrote:
>> > >
>> > >>
>> > >> Perfect. Sorry for my paranoia
>> > >>
>> > >> Thks
>> > >> Amol
>> > >>
>> > >>
>> > >> On Thu, Feb 2, 2017 at 3:31 PM, Pramod Immaneni <
>> pra...@datatorrent.com
>> > >
>> > >> wrote:
>> > >>
>> > >>> Not auto generated. I want to be able to send a sample to dev list
>> so
>> > >>> that folks can look at it and approve. Michelle is trying to do
>> that.
>> > >>>
>> > >>> Thanks
>> > >>>
>> > >>> On Thu, Feb 2, 2017 at 12:54 PM, Amol Kekre 
>> > >>> wrote:
>> > >>>
>> > 
>> >  Pramod, Michelle,
>> >  This cannot be a page that is auto-generated. I am a little
>> paranoid,
>> >  because auto-generation will get all events and then this will be
>> shut
>> >  down. It will need hand creating the events on the page, and then
>> > moving
>> >  them into "past events" once a month/quarter or so.
>> > 
>> >  Thks
>> >  Amol
>> > 
>> > 
>> > 
>> >  On Thu, Feb 2, 2017 at 12:52 PM, Michelle Xiao <
>> >  miche...@datatorrent.com> wrote:
>> > 
>> > > Sure, I will.
>> > >
>> > > On Thu, Feb 2, 2017 at 12:48 PM, Pramod Immaneni <
>> > > pra...@datatorrent.com> wrote:
>> > >
>> > >> Could you send an email with screenshot Michelle to the dev
>> group in
>> > >> the email thread where we are discussing this when you are done?
>> > >>
>> > >> Thanks
>> > >>
>> > >> On Thu, Feb 2, 2017 at 12:46 PM, Michelle Xiao <
>> > >> miche...@datatorrent.com> wrote:
>> > >>
>> > >>> Hi Pramod,
>> > >>>
>> > >>> I will work on my local server to add events section on Apex
>> home
>> > >>> page, will show you how it looks like when I am done.
>> > >>>
>> > >>> Thanks,
>> > >>> Michelle
>> > >>>
>> > >>> On Thu, Feb 2, 2017 at 7:22 AM, Pramod Immaneni <
>> > >>> pra...@datatorrent.com> wrote:
>> > >>>
>> >  Hey Guys,
>> > 
>> >  I had started a discussion on apex list to put conference
>> details
>> >  and meetup events on the apex website. So far the community has
>> > agreed on
>> >  putting up the conference details on Community -> Events ->
>> > Meetup Groups
>> >  section. The discussion to put conference link and other meetup
>> > events
>> >  directly on the main page is ongoing as you can see from the
>> > discussion
>> >  below. Would it be possible to see how the page will look like
>> if
>> > we put an
>> >  events section on the page next to the tweet section or at the
>> > bottom of
>> >  the page?
>> > 
>> >  Thanks
>> > 
>> >  -- Forwarded message --
>> >  From: Ashwin Chandra Putta 
>> >  Date: Wed, Feb 1, 2017 at 4:45 PM
>> >  Subject: Re: relevant conferences and meetups on website
>> >  To: dev@apex.apache.org
>> > 
>> > 
>> >  +1 for community volunteers to manually maintain the list.
>> >  +1 for adding the events to the main page.
>> > 
>> >  The community as a whole is working toward Apex adoption, we
>> > should
>> >  collectively publish all relevant events that help toward the
>> >  common goal.
>> >  All Apex events should be treated equally 

Re: One Yarn with Multiple Apex Applications

2017-01-30 Thread Munagala Ramanath
Are you running on the sandbox (if so, what version ?) or your own cluster ?

In either case, please check the following configuration item in
capacity-scheduler.xml:

<property>
  <name>yarn.scheduler.capacity.maximum-am-resource-percent</name>
  <value>0.1</value>
  <description>
    Maximum percent of resources in the cluster which can be used to run
    application masters i.e. controls number of concurrent running
    applications.
  </description>
</property>

Try increasing the value from 0.1 to 0.5, restart YARN and try launching
multiple applications again.

Ram

On Mon, Jan 30, 2017 at 12:54 AM, Santhosh Kumari G <
santhosh.kum...@qolsys.com> wrote:

> Hi,
>
>   Can we launch more than one (multiple) Apex application on one node, with
> multiple terminals and one YARN running? If yes, what is the process?
>
> I tried launching 2 Apex apps with 2 Apex engines. The first Apex app is
> running without any issue using port 8042 configured in yarn-default.xml.
> Then I tried to launch the 2nd app; it is shown as accepted but not
> running, as port 8042 is already in use. When I killed the 1st app, the 2nd
> app got launched automatically.
>
> So can we manage one YARN with multiple Apex engine apps?
>
> Thank you,
> Santhosh Kumari G.
>


Re: Contribution Process before PR

2017-01-15 Thread Munagala Ramanath
Yes, I think an initial discussion that includes some or all of the
following
would be invaluable both for feature implementations and for bug fixes:
1. Discussion of current implementation and how/why it fails to meet
current need.
2. Possible approaches and tradeoffs (if any) for each.
3. Recommended best approach if author has reached such a conclusion.

However, some contributors may want to throw in a prototype implementation
as a
PR as part of the discussion since code has a clarity and precision that,
ultimately, cannot be matched by natural language. We should allow and even
encourage this since it helps, in many cases, to avoid a long chain of Q&A.
It also helps others to try out working code to make their own assessment
of its strengths and weaknesses. Additionally, it helps contributors whose
English may not be as strong as their coding skills, to participate in the
discussion.

In such cases, it should be clearly understood that the PR is potentially
throwaway code, intended to further the discussion and not necessarily
intended to be merged.

Ram

On Mon, Jan 16, 2017 at 9:20 AM, Thomas Weise  wrote:

> Hi,
>
> I want to propose additions to the contributor guidelines that place
> stronger emphasis on open collaboration and the early part of the
> contribution process.
>
> Specifically, I would like to suggest that *thought process* and *design
> discussion* are more important than the final code produced. It is
> necessary to develop the community and invest in the future of the project.
>
> I start this discussion based on observation over time. I have seen cases
> (non trivial changes) where code and JIRAs appear at the same time, where
> the big picture is discussed after the PR is already open, or where
> information that would be valuable to other contributors or users isn't on
> record.
>
> Let's consider a non-trivial change or a feature. It would normally start
> with engagement on the mailing list to ensure time is well spent and the
> proposal is welcomed by the community, does not conflict with other
> initiatives etc.
>
> Once that is cleared, we would want to think about design, the how in the
> larger picture. In many cases that would involve discussion, questions,
> suggestions, consensus building towards agreed approach. Or maybe it is
> done through prototyping. In any case, before a PR is raised, it will be
> good to have as prerequisite that *thought process and approach have been
> documented*. I would prefer to see that on the JIRA, what do others think?
>
> Benefits:
>
> * Contributor does not waste time and there is no frustration due to a PR
> being turned down for reasons that could be avoided with upfront
> communication.
>
> * Contributor benefits from suggestions, questions, guidance of those with
> in depth knowledge of particular areas.
>
> * Other community members have an opportunity to learn from discussion, the
> knowledge base broadens.
>
> * Information gets indexed, user later looking at JIRAs will find valuable
> information on how certain problems were solved that they would never
> obtain from a PR.
>
> The ASF and "Apache Way", a read for the bigger picture with more links in
> it:
> http://krzysztof-sobkowiak.net/blog/celebrating-17-years-
> of-the-apache-software-foundation/
>
> Looking forward to feedback and discussion,
> Thomas
>


Re: Infinite loop in CircularBuffer constructor (corner case)

2016-12-22 Thread Munagala Ramanath
Actually, this will happen whenever the parameter n satisfies: 2**30 < n <=
Integer.MAX_VALUE

Ram

On Thu, Dec 22, 2016 at 5:34 PM, Munagala Ramanath <r...@datatorrent.com>
wrote:

> In Netlet CircularBuffer constructor, we have an infinite loop if the
> first parameter (*n*)
> is *Integer.MAX_VALUE* because the loop counter left-shifts 1 till it
> drops into the sign
> bit at which point the value is negative and fails the loop exit test. The
> next left shift
> yields 0 which, of course, stays that way forever; here is the fragment:
>
> int i = 1;
> while (i < n) {
>   i <<= 1;
> }
>
> Ram
>


Infinite loop in CircularBuffer constructor (corner case)

2016-12-22 Thread Munagala Ramanath
In Netlet CircularBuffer constructor, we have an infinite loop if the first
parameter (*n*)
is *Integer.MAX_VALUE* because the loop counter left-shifts 1 till it drops
into the sign
bit at which point the value is negative and fails the loop exit test. The
next left shift
yields 0 which, of course, stays that way forever; here is the fragment:

int i = 1;
while (i < n) {
  i <<= 1;
}
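
One way to avoid the overflow (a sketch; not necessarily the exact fix that
went into Netlet):

int i = 1;
while (i < n && i > 0) {  // stop if the shift overflows into the sign bit
  i <<= 1;
}
if (i <= 0) {
  i = 1 << 30;  // largest power of two representable as a positive int
}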

Ram


Re: "ExcludeNodes" for an Apex application

2016-12-02 Thread Munagala Ramanath
Agree it should be via YARN; the poison pill would be the final barrier in
the event
all other mechanisms have failed -- sort of like an API call which
documents that a parameter
should be non-null but nevertheless checks it internally and throws an
exception if it finds null.

Additionally, it also helps teams that do not have control over YARN
configuration.

Ram

On Fri, Dec 2, 2016 at 7:15 AM, Amol Kekre <a...@datatorrent.com> wrote:

> Stram node exclusion should be via Yarn; a poison pill is not a good way,
> as it induces a termination for the wrong reasons.
>
> Thks
> Amol
>
>
> On Fri, Dec 2, 2016 at 7:13 AM, Munagala Ramanath <r...@datatorrent.com>
> wrote:
>
> > Could STRAM include a poison pill where it simply exits with diagnostic
> if
> > its host name is blacklisted ?
> >
> > Ram
> >
> > On Thu, Dec 1, 2016 at 11:52 PM, Amol Kekre <a...@datatorrent.com>
> wrote:
> >
> > > Yarn will deploy AM (Stram) on a node of its choice, thereby rendering
> > > any attribute within the app un-enforceable in terms of not deploying
> > > the master on a node.
> > >
> > > Thks
> > > Amol
> > >
> > >
> > > On Thu, Dec 1, 2016 at 11:19 PM, Milind Barve <mili...@gmail.com>
> wrote:
> > >
> > > > Additionally, this would apply to Stram as well i.e. the master
> should
> > > also
> > > > not be deployed on these nodes. Not sure if anti-affinity goes beyond
> > > > operators.
> > > >
> > > > On Fri, Dec 2, 2016 at 12:47 PM, Milind Barve <mili...@gmail.com>
> > wrote:
> > > >
> > > > > My previous mail explains it, but just forgot to add : -1 to cover
> > this
> > > > > under anti affinity.
> > > > >
> > > > > On Fri, Dec 2, 2016 at 12:46 PM, Milind Barve <mili...@gmail.com>
> > > wrote:
> > > > >
> > > > >> While it is possible to extend anti-affinity to take care of
> this, I
> > > > feel
> > > > >> it will cause confusion from a user perspective. As a user, when I
> > > think
> > > > >> about anti-affinity, what comes to mind right away is a relative
> > > > relation
> > > > >> between operators.
> > > > >>
> > > > >> On the other hand, the current ask is not that, but a relation at
> an
> > > > >> application level w.r.t. a node. (Further, we might even think of
> > > > extending
> > > > >> this at an operator level - which would mean do not deploy an
> > operator
> > > > on a
> > > > >> particular node)
> > > > >>
> > > > >> We would be better off clearly articulating and allowing users to
> > > > >> configure it seperately as against using anti-affinity.
> > > > >>
> > > > >> On Fri, Dec 2, 2016 at 10:03 AM, Bhupesh Chawda <
> > > > bhup...@datatorrent.com>
> > > > >> wrote:
> > > > >>
> > > > >>> Okay, I think that serves an alternate purpose of detecting any
> > newly
> > > > >>> gone
> > > > >>> bad node and excluding it.
> > > > >>>
> > > > >>> +1 for covering the original scenario under anti-affinity.
> > > > >>>
> > > > >>> ~ Bhupesh
> > > > >>>
> > > > >>> On Fri, Dec 2, 2016 at 9:14 AM, Munagala Ramanath <
> > > r...@datatorrent.com
> > > > >
> > > > >>> wrote:
> > > > >>>
> > > > >>> > It only takes effect after failures -- no way to exclude from
> the
> > > > >>> get-go.
> > > > >>> >
> > > > >>> > Ram
> > > > >>> >
> > > > >>> > On Dec 1, 2016 7:15 PM, "Bhupesh Chawda" <
> > bhup...@datatorrent.com>
> > > > >>> wrote:
> > > > >>> >
> > > > >>> > > As suggested by Sandesh, the parameter
> > > > >>> > > MAX_CONSECUTIVE_CONTAINER_FAILURES_FOR_BLACKLIST seems to do
> > > > exactly
> > > > >>> > what
> > > > >>> > > is needed.
> > > > >>> > > Why would this not work?
> > > > >>> > >
> > > > >>> > > ~ Bhupesh
> > > > >>> > >
> > > > >>> >
> > > > >>>
> > > > >>
> > > > >>
> > > > >>
> > > > >> --
> > > > >> ~Milind bee at gee mail dot com
> > > > >>
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > > ~Milind bee at gee mail dot com
> > > > >
> > > >
> > > >
> > > >
> > > > --
> > > > ~Milind bee at gee mail dot com
> > > >
> > >
> >
>


Re: "ExcludeNodes" for an Apex application

2016-12-02 Thread Munagala Ramanath
The OP is claiming (in the comment to the first response) that he actually
tried the proposed solution and that it did not work for him, and shows the
RM code fragment that is clobbering his preference.

Ram

On Fri, Dec 2, 2016 at 12:17 AM, Sandesh Hegde <sand...@datatorrent.com>
wrote:

> Yarn allows the AppMaster to run on the selected node, Apex shouldn't
> select the blacklisted nodes, so it is possible to achieve not running the
> Apex containers on certain nodes.
>
> http://stackoverflow.com/questions/29302659/run-my-own-
> application-master-on-a-specific-node-in-a-yarn-cluster
>
>
> On Thu, Dec 1, 2016 at 11:52 PM Amol Kekre <a...@datatorrent.com> wrote:
>
> > Yarn will deploy AM (Stram) on a node of its choice, thereby rendering any
> > attribute within the app un-enforceable in terms of not deploying master
> on
> > a node.
> >
> > Thks
> > Amol
> >
> >
> > On Thu, Dec 1, 2016 at 11:19 PM, Milind Barve <mili...@gmail.com> wrote:
> >
> > > Additionally, this would apply to Stram as well i.e. the master should
> > also
> > > not be deployed on these nodes. Not sure if anti-affinity goes beyond
> > > operators.
> > >
> > > On Fri, Dec 2, 2016 at 12:47 PM, Milind Barve <mili...@gmail.com>
> wrote:
> > >
> > > > My previous mail explains it, but just forgot to add : -1 to cover
> this
> > > > under anti affinity.
> > > >
> > > > On Fri, Dec 2, 2016 at 12:46 PM, Milind Barve <mili...@gmail.com>
> > wrote:
> > > >
> > > >> While it is possible to extend anti-affinity to take care of this, I
> > > feel
> > > >> it will cause confusion from a user perspective. As a user, when I
> > think
> > > >> about anti-affinity, what comes to mind right away is a relative
> > > relation
> > > >> between operators.
> > > >>
> > > >> On the other hand, the current ask is not that, but a relation at an
> > > >> application level w.r.t. a node. (Further, we might even think of
> > > extending
> > > >> this at an operator level - which would mean do not deploy an
> operator
> > > on a
> > > >> particular node)
> > > >>
> > > >> We would be better off clearly articulating and allowing users to
> > > >> configure it seperately as against using anti-affinity.
> > > >>
> > > >> On Fri, Dec 2, 2016 at 10:03 AM, Bhupesh Chawda <
> > > bhup...@datatorrent.com>
> > > >> wrote:
> > > >>
> > > >>> Okay, I think that serves an alternate purpose of detecting any
> newly
> > > >>> gone
> > > >>> bad node and excluding it.
> > > >>>
> > > >>> +1 for covering the original scenario under anti-affinity.
> > > >>>
> > > >>> ~ Bhupesh
> > > >>>
> > > >>> On Fri, Dec 2, 2016 at 9:14 AM, Munagala Ramanath <
> > r...@datatorrent.com
> > > >
> > > >>> wrote:
> > > >>>
> > > >>> > It only takes effect after failures -- no way to exclude from the
> > > >>> get-go.
> > > >>> >
> > > >>> > Ram
> > > >>> >
> > > >>> > On Dec 1, 2016 7:15 PM, "Bhupesh Chawda" <
> bhup...@datatorrent.com>
> > > >>> wrote:
> > > >>> >
> > > >>> > > As suggested by Sandesh, the parameter
> > > >>> > > MAX_CONSECUTIVE_CONTAINER_FAILURES_FOR_BLACKLIST seems to do
> > > exactly
> > > >>> > what
> > > >>> > > is needed.
> > > >>> > > Why would this not work?
> > > >>> > >
> > > >>> > > ~ Bhupesh
> > > >>> > >
> > > >>> >
> > > >>>
> > > >>
> > > >>
> > > >>
> > > >> --
> > > >> ~Milind bee at gee mail dot com
> > > >>
> > > >
> > > >
> > > >
> > > > --
> > > > ~Milind bee at gee mail dot com
> > > >
> > >
> > >
> > >
> > > --
> > > ~Milind bee at gee mail dot com
> > >
> >
>


Re: "ExcludeNodes" for an Apex application

2016-12-02 Thread Munagala Ramanath
Could STRAM include a poison pill where it simply exits with a diagnostic if
its host name is blacklisted ?
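
Something along these lines, purely as an illustration (blacklistedHosts
would have to come from application configuration; no such mechanism exists
today):

// hypothetical startup check inside STRAM
String host = java.net.InetAddress.getLocalHost().getHostName();
if (blacklistedHosts.contains(host)) {
  LOG.error("Host {} is excluded for this application, exiting", host);
  System.exit(1);
}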

Ram

On Thu, Dec 1, 2016 at 11:52 PM, Amol Kekre <a...@datatorrent.com> wrote:

> Yarn will deploy AM (Stram) on a node of its choice, thereby rendering any
> attribute within the app un-enforceable in terms of not deploying master on
> a node.
>
> Thks
> Amol
>
>
> On Thu, Dec 1, 2016 at 11:19 PM, Milind Barve <mili...@gmail.com> wrote:
>
> > Additionally, this would apply to Stram as well i.e. the master should
> also
> > not be deployed on these nodes. Not sure if anti-affinity goes beyond
> > operators.
> >
> > On Fri, Dec 2, 2016 at 12:47 PM, Milind Barve <mili...@gmail.com> wrote:
> >
> > > My previous mail explains it, but just forgot to add : -1 to cover this
> > > under anti affinity.
> > >
> > > On Fri, Dec 2, 2016 at 12:46 PM, Milind Barve <mili...@gmail.com>
> wrote:
> > >
> > >> While it is possible to extend anti-affinity to take care of this, I
> > feel
> > >> it will cause confusion from a user perspective. As a user, when I
> think
> > >> about anti-affinity, what comes to mind right away is a relative
> > relation
> > >> between operators.
> > >>
> > >> On the other hand, the current ask is not that, but a relation at an
> > >> application level w.r.t. a node. (Further, we might even think of
> > extending
> > >> this at an operator level - which would mean do not deploy an operator
> > on a
> > >> particular node)
> > >>
> > >> We would be better off clearly articulating and allowing users to
> > >> configure it seperately as against using anti-affinity.
> > >>
> > >> On Fri, Dec 2, 2016 at 10:03 AM, Bhupesh Chawda <
> > bhup...@datatorrent.com>
> > >> wrote:
> > >>
> > >>> Okay, I think that serves an alternate purpose of detecting any newly
> > >>> gone
> > >>> bad node and excluding it.
> > >>>
> > >>> +1 for covering the original scenario under anti-affinity.
> > >>>
> > >>> ~ Bhupesh
> > >>>
> > >>> On Fri, Dec 2, 2016 at 9:14 AM, Munagala Ramanath <
> r...@datatorrent.com
> > >
> > >>> wrote:
> > >>>
> > >>> > It only takes effect after failures -- no way to exclude from the
> > >>> get-go.
> > >>> >
> > >>> > Ram
> > >>> >
> > >>> > On Dec 1, 2016 7:15 PM, "Bhupesh Chawda" <bhup...@datatorrent.com>
> > >>> wrote:
> > >>> >
> > >>> > > As suggested by Sandesh, the parameter
> > >>> > > MAX_CONSECUTIVE_CONTAINER_FAILURES_FOR_BLACKLIST seems to do
> > exactly
> > >>> > what
> > >>> > > is needed.
> > >>> > > Why would this not work?
> > >>> > >
> > >>> > > ~ Bhupesh
> > >>> > >
> > >>> >
> > >>>
> > >>
> > >>
> > >>
> > >> --
> > >> ~Milind bee at gee mail dot com
> > >>
> > >
> > >
> > >
> > > --
> > > ~Milind bee at gee mail dot com
> > >
> >
> >
> >
> > --
> > ~Milind bee at gee mail dot com
> >
>


Re: "ExcludeNodes" for an Apex application

2016-12-01 Thread Munagala Ramanath
It only takes effect after failures -- no way to exclude from the get-go.

Ram

On Dec 1, 2016 7:15 PM, "Bhupesh Chawda"  wrote:

> As suggested by Sandesh, the parameter
> MAX_CONSECUTIVE_CONTAINER_FAILURES_FOR_BLACKLIST seems to do exactly what
> is needed.
> Why would this not work?
>
> ~ Bhupesh
>


Re: Apex malhar Guava dependency

2016-12-01 Thread Munagala Ramanath
Bright,

We have this in contrib/pom.xml which I think is causing the problem:

<dependency>
  <groupId>com.google.guava</groupId>
  <artifactId>guava</artifactId>
  <version>16.0.1</version>
  <scope>provided</scope>
  <optional>true</optional>
</dependency>

whereas the hadoop-common 2.2.0 depends on 11.0.2

Ram

On Thu, Dec 1, 2016 at 4:11 PM, Bright Chen  wrote:

> Hi All,
> I got a runtime exception (in a unit test in local mode) when trying to test
> Guava BloomFilter in malhar; the exception is as follows. It seems that
> malhar depended on Guava 11.0.2 at compile time while depending on a higher
> version at runtime, which caused this compatibility issue.
>
> Anyone got same issue or any idea how to solve or get around this issue?
>
> java.lang.AbstractMethodError:
> org.apache.apex.malhar.lib.state.managed.SliceFunnel.
> funnel(Ljava/lang/Object;Lcom/google/common/hash/PrimitiveSink;)V
>
> at
> com.google.common.hash.AbstractStreamingHashFunction$
> AbstractStreamingHasher.putObject(
> AbstractStreamingHashFunction.java:223)
>
> at com.google.common.hash.AbstractStreamingHashFunction.hashObject(
> AbstractStreamingHashFunction.java:37)
>
> thanks
> Bright
>


Re: "ExcludeNodes" for an Apex application

2016-11-30 Thread Munagala Ramanath
Not sure if this is what Milind had in mind but we often run into
situations where the dev group
working with Apex has no control over cluster configuration -- to make any
changes to the cluster they need to
go through an elaborate process that can take many days.

Meanwhile, if they notice that a particular node is consistently causing
problems for their
app, having a simple way to exclude it would be very helpful since it gives
them a way
to bypass communication and process issues within their own organization.

Ram

On Wed, Nov 30, 2016 at 10:58 AM, Sanjay Pujare <san...@datatorrent.com>
wrote:

> To me both use cases appear to be generic resource management use cases.
> For example, a randomly rebooting node is not good for any purpose esp.
> long running apps so it is a bit of a stretch to imagine that these nodes
> will be acceptable for some batch jobs in Yarn. So such a node should be
> marked “Bad” or Unavailable in Yarn itself.
>
> Second use case is also typical anti-affinity use case which ideally
> should be implemented in Yarn – Milind’s example can also apply to non-Apex
> batch jobs. In any case it looks like Yarn still doesn’t have it (
> https://issues.apache.org/jira/browse/YARN-1042) so if Apex needs it we
> will need to do it ourselves.
>
> On 11/30/16, 10:39 AM, "Munagala Ramanath" <r...@datatorrent.com> wrote:
>
> But then, what's the solution to the 2 problem scenarios that Milind
> describes ?
>
> Ram
>
> On Wed, Nov 30, 2016 at 10:34 AM, Sanjay Pujare <
> san...@datatorrent.com>
> wrote:
>
> > I think “exclude nodes” and such is really the job of the resource
> manager
> > i.e. Yarn. So I am not sure taking over some of these tasks in Apex
> would
> > be very useful.
> >
> > I agree with Amol that apps should be node neutral. Resource
> management in
> > Yarn together with fault tolerance in Apex should minimize the need
> for
> > this feature although I am sure one can find use cases.
> >
> >
> > On 11/29/16, 10:41 PM, "Amol Kekre" <a...@datatorrent.com> wrote:
> >
> > We do have this feature in Yarn, but that applies to all
> applications.
> > I am
> > not sure if Yarn has anti-affinity. This feature may be used,
> but in
> > general there is danger in an application taking over resource
> > allocation.
> > Another quirk is that big data apps should ideally be
> node-neutral.
> > This is
> > a good idea, if we are able to carve out something where need is
> app
> > specific.
> >
> > Thks
> > Amol
> >
> >
> > On Tue, Nov 29, 2016 at 10:00 PM, Milind Barve <
> mili...@gmail.com>
> > wrote:
> >
> > > We have seen 2 cases mentioned below, where, it would have
> been nice
> > if
> > > Apex allowed us to exclude a node from the cluster for an
> > application.
> > >
> > > 1. A node in the cluster had gone bad (was randomly rebooting)
> and
> > so an
> > > Apex app should not use it - other apps can use it as they were
> > batch jobs.
> > > 2. A node is being used for a mission critical app (Could be
> an Apex
> > app
> > > itself), but another Apex app which is mission critical should
> not
> > be using
> > > resources on that node.
> > >
> > > Can we have a way in which Stram and YARN can coordinate with each
> > > other to not use a set of nodes for the application? It can be done
> > > in 2 ways -
> > >
> > > 1. Have a list of "exclude" nodes with Stram - when YARN allocates
> > > resources on either of these, STRAM rejects and gets resources
> > > allocated again from YARN.
> > > 2. Have a list of nodes that can be used for an app - this can be a
> > > part of config. However, I don't think this would be the right way to
> > > do it, as we will need support from YARN as well. Further, this might
> > > be difficult to change at runtime if need be.
> > >
> > > Any thoughts?
> > >
> > >
> > > --
> > > ~Milind bee at gee mail dot com
> > >
> >
> >
> >
> >
>
>
>
>


Re: "ExcludeNodes" for an Apex application

2016-11-30 Thread Munagala Ramanath
But then, what's the solution to the 2 problem scenarios that Milind
describes ?

Ram

On Wed, Nov 30, 2016 at 10:34 AM, Sanjay Pujare 
wrote:

> I think “exclude nodes” and such is really the job of the resource manager
> i.e. Yarn. So I am not sure taking over some of these tasks in Apex would
> be very useful.
>
> I agree with Amol that apps should be node neutral. Resource management in
> Yarn together with fault tolerance in Apex should minimize the need for
> this feature although I am sure one can find use cases.
>
>
> On 11/29/16, 10:41 PM, "Amol Kekre"  wrote:
>
> We do have this feature in Yarn, but that applies to all applications.
> I am
> not sure if Yarn has anti-affinity. This feature may be used, but in
> general there is danger in an application taking over resource
> allocation.
> Another quirk is that big data apps should ideally be node-neutral.
> This is
> a good idea, if we are able to carve out something where need is app
> specific.
>
> Thks
> Amol
>
>
> On Tue, Nov 29, 2016 at 10:00 PM, Milind Barve 
> wrote:
>
> > We have seen 2 cases mentioned below, where, it would have been nice
> if
> > Apex allowed us to exclude a node from the cluster for an
> application.
> >
> > 1. A node in the cluster had gone bad (was randomly rebooting) and
> so an
> > Apex app should not use it - other apps can use it as they were
> batch jobs.
> > 2. A node is being used for a mission critical app (Could be an Apex
> app
> > itself), but another Apex app which is mission critical should not
> be using
> > resources on that node.
> >
> > Can we have a way in which Stram and YARN can coordinate with each
> > other to not use a set of nodes for the application? It can be done
> > in 2 ways -
> >
> > 1. Have a list of "exclude" nodes with Stram - when YARN allocates
> > resources on either of these, STRAM rejects and gets resources allocated
> > again from YARN.
> > 2. Have a list of nodes that can be used for an app - this can be a part
> > of config. However, I don't think this would be the right way to do it,
> > as we will need support from YARN as well. Further, this might be
> > difficult to change at runtime if need be.
> >
> > Any thoughts?
> >
> >
> > --
> > ~Milind bee at gee mail dot com
> >
>
>
>
>


Re: Visitor API for DAG

2016-11-17 Thread Munagala Ramanath
As Tushar mentions, properties/attributes can be injected from external
sources. We've
already had multiple questions on the mailing list asking how to do this.

Ram

On Thu, Nov 17, 2016 at 10:05 AM, Sanjay Pujare <san...@datatorrent.com>
wrote:

> There is a risk if the user-written code blocks the thread or crashes the
> process. What are the real-life examples of this use case?
>
>
> On 11/17/16, 9:21 AM, "amol kekre" <amolhke...@gmail.com> wrote:
>
> +1. Opening up the API for users to put in their own code is good. In
> general we should enable users to register their code in a lot of
> scenerios.
>
> Thks
> Amol
>
> On Thu, Nov 17, 2016 at 9:06 AM, Tushar Gosavi <tus...@datatorrent.com
> >
> wrote:
>
> > Yes, It could happen after current DAG validation and before the
> > application master is launched.
> >
> > - Tushar.
> >
> >
> > On Thu, Nov 17, 2016 at 8:32 PM, Munagala Ramanath <
> r...@datatorrent.com>
> > wrote:
> > > When would the visits happen ? Just before normal validation ?
> > >
> > > Ram
> > >
> > > On Wed, Nov 16, 2016 at 9:50 PM, Tushar Gosavi <tus...@apache.org>
> > wrote:
> > >
> > >> Hi All,
> > >>
> > >> How about adding a visitor-like API for the DAG in Apex, and an API to
> > >> register a visitor for the DAG?
> > >> Possible use cases are
> > >> -  Validator visitor which could validate the dag
> > >> -  Visitor to inject properties/attribute in the operator/streams
> from
> > >> some external sources.
> > >> -  Platform does not support validation of individual operators.
> > >> developer could write a validator visitor which would call
> validate
> > >> function of operator if it implements Validator interface.
> > >> - generate output schema based on operator config and input
> schema,
> > >> and set the schema on output stream.
> > >>
> > >> Sample API :
> > >>
> > >> dag.registerVisitor(DAGVisitor visitor);
> > >>
> > >> Call order of visitorFunctions.
> > >> - preVisitDAG(Attributes) // dag attributes
> > >>   for all operators
> > >>   - visitOperator(OperatorMeta meta) // access to operator, name,
> > >> attributes, properties
> > >>  ports
> > >>   - visitStream(StreamMeta meta) // access to
> > >> stream/name/attributes/properties/ports
> > >> - postVisitDAG()
> > >>
> > >> Regards,
> > >> -Tushar.
> > >>
> >
>
>
>
>


Re: Visitor API for DAG

2016-11-17 Thread Munagala Ramanath
When would the visits happen ? Just before normal validation ?

Ram

On Wed, Nov 16, 2016 at 9:50 PM, Tushar Gosavi  wrote:

> Hi All,
>
> How about adding a visitor-like API for the DAG in Apex, and an API to
> register a visitor for the DAG?
> Possible use cases are:
> - a validator visitor which could validate the DAG
> - a visitor to inject properties/attributes into the operators/streams from
> some external sources
> - the platform does not support validation of individual operators; a
> developer could write a validator visitor which would call the validate
> function of an operator if it implements a Validator interface
> - generate an output schema based on operator config and input schema,
> and set the schema on the output stream
>
> Sample API :
>
> dag.registerVisitor(DAGVisitor visitor);
>
> Call order of visitorFunctions.
> - preVisitDAG(Attributes) // dag attributes
>   for all operators
>   - visitOperator(OperatorMeta meta) // access to operator, name,
> attributes, properties
>  ports
>   - visitStream(StreamMeta meta) // access to
> stream/name/attributes/properties/ports
> - postVisitDAG()
>
> Regards,
> -Tushar.
>
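
For concreteness, a rough Java sketch of the proposed (hypothetical)
interface, following the call order above:

// None of this exists in Apex today; it only restates the proposal.
public interface DAGVisitor
{
  void preVisitDAG(Attribute.AttributeMap dagAttributes);

  // access to operator name, attributes, properties, ports
  void visitOperator(DAG.OperatorMeta operatorMeta);

  // access to stream name, attributes, ports
  void visitStream(DAG.StreamMeta streamMeta);

  void postVisitDAG();
}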


Re: Need help to initialize a list from properties.xml

2016-11-07 Thread Munagala Ramanath
Try changing *setFieldInfo* to *setFieldInfoItem*
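
That is, keep the same body and just rename the indexed setter (sketch):

// renamed so the platform can populate indexed properties like fieldInfo[0]
public void setFieldInfoItem(int index, String value)
{
  // same body as the setFieldInfo(index, value) shown below
}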

Ram

On Mon, Nov 7, 2016 at 4:23 AM, Hitesh Kapoor 
wrote:

> Hi All,
>
>
> Currently, in the JdbcPOJOInsertOutput operator, we cannot configure
> JdbcFieldInfo via properties.xml, and the user has to do the necessary
> coding in his application.
> To start solving this issue I followed the steps mentioned on
> http://docs.datatorrent.com/application_packages/#operator-properties
>
> And added the following code in AbstractJdbcPOJOOutputOperator (just for
> learning/testing)
>
> public void setFieldInfo(int index, String value)
>   {
> LOG.info("In setting field info");
> JdbcFieldInfo jvalue = new JdbcFieldInfo();
> StringTokenizer st = new StringTokenizer(value);
> jvalue.setColumnName(st.nextToken());
> jvalue.setPojoFieldExpression(st.nextToken());
> jvalue.setType(FieldInfo.SupportType.valueOf(st.nextToken()));
> jvalue.setSqlType(Integer.parseInt(st.nextToken()));
>
> final int need = index - fieldInfos.size() + 1;
> for (int i = 0; i < need; i++) {
>   fieldInfos.add(null);
> }
> fieldInfos.set(index, jvalue);
>   }
>
> In my corresponding application I added the following lines in
> properties.xml:
> <property>
>   <name>dt.operator.jdbcOutput.fieldInfo[0]</name>
>   <value>customerPhone customerPhone STRING 0</value>
> </property>
> //Added similar properties for remaining field infos.
>
>
> The issue I am facing is that setFieldInfo() is not being called. Am I
> missing something?
>
> Regards,
> Hitesh
>


Re: [VOTE] Hadoop upgrade

2016-10-04 Thread Munagala Ramanath
+1 for 2.6.x

Ram

On Mon, Oct 3, 2016 at 1:47 PM, David Yan  wrote:

> Hi all,
>
> Thomas created this ticket for upgrading our Hadoop dependency version a
> couple weeks ago:
>
> https://issues.apache.org/jira/browse/APEXCORE-536
>
> We'd like to get the ball rolling and would like to take a vote from the
> community which version we would like to upgrade to. We have these choices:
>
> 2.2.0 (no upgrade)
> 2.4.x
> 2.5.x
> 2.6.x
>
> We are not considering 2.7.x because we already know that many Apex users
> are using Hadoop distros that are based on 2.6.
>
> Please note that Apex works with all versions of Hadoop higher or equal to
> the Hadoop version Apex depends on, as long as it's 2.x.x. We are not
> considering Hadoop 3.0.0-alpha yet at this time.
>
> When voting, please keep these in mind:
>
> - The features that are added in 2.4.x, 2.5.x, and 2.6.x respectively, and
> how useful those features are for Apache Apex
> - The Hadoop versions the major distros (Cloudera, Hortonworks, MapR, EMR,
> etc) are supporting
> - The Hadoop versions what typical Apex users are using
>
> Thanks,
>
> David
>


Re: checkpoint statistics

2016-09-25 Thread Munagala Ramanath
We've seen cases where operator state continues to grow without bound
either because
the developer was unaware of the importance of keeping state small or
because of some
anomaly downstream. In such cases, the operators could get killed with an
OOM exception because
these checkpoints are building up in memory faster than they can be written
to disk.

These stats may be useful in such cases to identify the root cause of
failure.

Ram

On Sun, Sep 25, 2016 at 7:39 AM, Sandesh Hegde 
wrote:

> Say it takes x MB size and y seconds to do the checkpoint. What does the
> user do with that information?
>
> On Sun, Sep 25, 2016, 6:51 AM Tushar Gosavi 
> wrote:
>
> > +1
> >
> > -Tushar
> >
> > On Sun, Sep 25, 2016, 8:54 AM Sanjay Pujare 
> > wrote:
> >
> > > +1
> > >
> > > Sanjay
> > >
> > >
> > > On Sun, Sep 25, 2016 at 7:06 AM, Devendra Tagare <
> > > devend...@datatorrent.com>
> > > wrote:
> > >
> > > > +1
> > > >
> > > > Thanks,
> > > > Dev
> > > >
> > > > On Sep 25, 2016 1:17 AM, "Pramod Immaneni" 
> > > wrote:
> > > >
> > > > > +1
> > > > >
> > > > > > On Sep 24, 2016, at 10:01 AM, Vlad Rozov <
> v.ro...@datatorrent.com>
> > > > > wrote:
> > > > > >
> > > > > > IMO, it may be useful to provide checkpoint statistics for
> example,
> > > > > total size of checkpoint for particular window or average size of
> > > > > checkpoints for a particular operator. Also, how long it takes to
> > write
> > > > > checkpoints to storage.
> > > > > >
> > > > > > Thank you,
> > > > > >
> > > > > > Vlad
> > > > >
> > > >
> > >
> >
>


Re: excluding hadoop dependencies from application package

2016-09-22 Thread Munagala Ramanath
Definitely worth adding.

Ram

On Wed, Sep 21, 2016 at 1:20 PM, Pramod Immaneni <pra...@datatorrent.com>
wrote:

> Candidate to be added here?
>
> https://apex.apache.org/docs/apex/development_best_practices/
>
> On Wed, Sep 21, 2016 at 12:24 PM, Munagala Ramanath <r...@datatorrent.com>
> wrote:
>
> > Some info here:
> > http://docs.datatorrent.com/troubleshooting/#hadoop-
> dependencies-conflicts
> >
> > Ram
> >
> >
> > On Wed, Sep 21, 2016 at 12:00 PM, Vlad Rozov <v.ro...@datatorrent.com>
> > wrote:
> >
> > > Is subject already documented?
> > >
> > > Thank you,
> > >
> > > Vlad
> > >
> >
>


Re: excluding hadoop dependencies from application package

2016-09-21 Thread Munagala Ramanath
Some info here:
http://docs.datatorrent.com/troubleshooting/#hadoop-dependencies-conflicts

Ram


On Wed, Sep 21, 2016 at 12:00 PM, Vlad Rozov 
wrote:

> Is subject already documented?
>
> Thank you,
>
> Vlad
>


Re: Testing operators / CI

2016-09-12 Thread Munagala Ramanath
A good start would be to revise the archetype to include as many
illustrative tests as reasonably possible -- people seem more willing to
follow examples than to follow instructions.
Ram

On Sep 12, 2016 5:26 PM, "Thomas Weise"  wrote:

Hi,

Recently there was a bit of discussion on how to write tests for operators
that will result in good coverage and high confidence in the results of the
CI. Experience from past releases shows that those operators with good
coverage are less likely to break down (with a user) due to subsequent
changes, while those that don't have coverage in the CI (think contrib) are
likely to suffer breakdown even due to trivial changes that are otherwise
easily caught.

IMO writing good tests is as important as the operator main code (and
documentation and examples..). It was also part of the maturity framework
that Ashwin proposed a while ago (Ashwin, maybe you can also share a few
points). I suggest we expand the contribution guidelines to reflect an
agreed set of expectations that contributors can follow when submitting PRs
or even come up with a checklist for submitting PRs:

http://apex.apache.org/malhar-contributing.html

Here are a few recurring problems and suggestions, in no particular order:

   - Unit tests are for testing small pieces of code in isolation ("unit").
   Running a DAG in embedded mode is not a unit test, it is an integration
   test.
   - When writing an operator or making changes to fix bugs etc., it is
   recommended to write or modify the granular test that exercises this
change
   and as little as possible around it. This happens before writing or
running
   an application and can be done in fast iterations inside the IDE without
   extensive test data setup or application assembly.
   - When an operator consists of multiple other components, testing for
   those should also be broken down into units. For example, managed state
   is not tested by testing the dedup or join operator (which are special
   use cases), but through separate tests that exercise the full spectrum
   (or at least close to it) of managed state.
   - So what about serialization, don't I need to create a DAG to test it?
   You only need Kryo to test serialization of an operator. Use the existing
   utilities or contribute to utilities that are shared between tests (also
   covered in the sketch after this list).
   - Don't I need to run a DAG to test the lifecycle of an operator? No,
   the sequence of calls to an operator's lifecycle methods is documented
   (or how else would I implement an operator to start with). There are
   quite a few tests that "execute" the operator directly (see the sketch
   after this list). They have access to the state and can assert that with
   a certain process invocation the expected changes occur. That is much
   more difficult when running a DAG.
   - I have to write a lot of code to do such testing and possibly I will
   forget some calls? Not when following test-driven development. IMO that
   mostly happens when tests are written as an afterthought, and that's a
   waste of time. I would suggest, though, developing a single operator
   test driver that ensures all methods are called for a basic sanity
   check.
   - Integration tests: with proper unit test coverage, the integration
   test is more like an example of how to use an operator. Nice for users,
   because they can use it as a starting point for writing their own app,
   including the configuration.
   - I wrote a nice integration test app with configuration. It runs for
   exactly n seconds (localmode.run(n)), returns, and all looks green. It
   even prints some nice stuff in the console. What's wrong? You have not
   tested anything! An operator may fail in setup and the test still
   passes. Travis CI is not reading the console (instead, it will complain
   that tests are filling up 4MB too fast and really important logs go
   under). Instead, assert in your test code that the DAG execution
   produces the expected results. Instead of waiting for n seconds, wait
   until the expected results are in and cap it with a timeout. This is yet
   another area where a few utilities for recurring test code will come in
   handy.
   - Tests sometimes fail, but they work on my local machine? Every
   environment is different and good tests don't depend on
   environment-specific factors (timing dependencies, excessive resource
   utilization, etc.). It is important that tests pass in the CI
   consistently and that issues found there are investigated and fixed.
   Isn't it nice to see the green check mark in the PR instead of having to
   close/reopen several times so that the unrelated flaky test does not
   fail? If we collectively track and fix such failures, life will be
   better for everyone.
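
To make several of the points above concrete, here is a minimal sketch of a
direct operator test plus a Kryo serialization round-trip, with no DAG
involved. MyCounter and its ports are hypothetical (an operator that counts
tuples per window and emits the count on endWindow()); CollectorTestSink is
the Malhar test utility, and the Kryo calls are the plain library API:

import java.io.ByteArrayOutputStream;

import org.junit.Assert;
import org.junit.Test;

import com.datatorrent.lib.testbench.CollectorTestSink;
import com.esotericsoftware.kryo.Kryo;
import com.esotericsoftware.kryo.io.Input;
import com.esotericsoftware.kryo.io.Output;

public class MyCounterTest
{
  @Test
  public void testLifecycleDirectly()
  {
    MyCounter op = new MyCounter();                 // hypothetical operator
    CollectorTestSink<Object> sink = new CollectorTestSink<>();
    op.output.setSink(sink);                        // no DAG needed

    op.setup(null);                                 // drive the documented lifecycle
    op.beginWindow(0);
    op.input.process("a");
    op.input.process("b");
    op.endWindow();
    op.teardown();

    Assert.assertEquals("one count emitted", 1, sink.collectedTuples.size());
    Assert.assertEquals("count value", 2L, sink.collectedTuples.get(0));
  }

  @Test
  public void testSerialization()
  {
    // only Kryo is needed to verify the operator round-trips
    Kryo kryo = new Kryo();
    ByteArrayOutputStream bos = new ByteArrayOutputStream();
    Output out = new Output(bos);
    kryo.writeClassAndObject(out, new MyCounter());
    out.close();
    Object copy = kryo.readClassAndObject(new Input(bos.toByteArray()));
    Assert.assertTrue("operator round-trips", copy instanceof MyCounter);
  }
}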

Looking forward to feedback, additions and, most importantly, volunteers who
will help make the Apex CI better.

Thanks,
Thomas


Re: Default value for cooldownMillis in StatsAwareStatelessPartitioner

2016-08-18 Thread Munagala Ramanath
Just out of curiosity, what is the smallest value that fixes the issue?

Ram

On Thu, Aug 18, 2016 at 3:55 AM, Yogi Devendra 
wrote:

> Hi,
>
> Current default value is 2000 ms.
>
> I tried an application which uses the default value. It was noticed that it
> leads to continuous re-deployment of the operator. The reason could be that
> 2000 ms is not sufficient for re-deploying and reaching a steady state.
>
> How about changing default to 30 sec or 1 min?
>
> ~ Yogi
>
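
Until any such change lands, the cooldown can be raised when configuring the
partitioner. A minimal sketch in populateDAG, assuming the standard setter;
the concrete partitioner subclass and operator below are illustrative:

// give the operator more than the 2000 ms default to redeploy and reach
// a steady state before stats are evaluated again
StatelessThroughputBasedPartitioner<MyOperator> partitioner =
    new StatelessThroughputBasedPartitioner<>();
partitioner.setCooldownMillis(30000);  // 30 s
dag.setAttribute(myOperator, Context.OperatorContext.PARTITIONER, partitioner);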


Re: can operators emit on a different from the operator itself thread?

2016-08-11 Thread Munagala Ramanath
> >  >>>  Regards,
> >  >>>  Ashwin.
> >  >>>
> >  >>>  On Wed, Aug 10, 2016 at 2:01 PM, Ashwin Chandra Putta <
> >  >>>  ashwinchand...@gmail.com> wrote:
> >  >>>
> >  >>>  > + dev@apex.apache.org
> >  >>>  > - us...@apex.apache.org
> >  >>>  >
> >  >>>  > This is one of those best practices that we learn by experience
> >  >>>  > during operator development. It will save a lot of time during
> >  >>>  > operator development if we can catch and throw a validation error
> >  >>>  > when someone emits tuples from a separate thread.
> >  >>>  >
> >  >>>  > Regards,
> >  >>>  > Ashwin
> >  >>>  >
> >  >>>  > On Wed, Aug 10, 2016 at 1:57 PM, Munagala Ramanath <
> >  >>> r...@datatorrent.com>
> >  >>>  > wrote:
> >  >>>  >
> >  >>>  >> For cases where use of a different thread is needed, it can
> >  >>>  >> write tuples to a queue from where the operator thread pulls
> >  >>>  >> them -- JdbcPollInputOperator in Malhar has an example (see the
> >  >>>  >> sketch at the end of this thread).
> >  >>>  >>
> >  >>>  >> Ram
> >  >>>  >>
> >  >>>  >> On Wed, Aug 10, 2016 at 1:50 PM, hsy...@gmail.com <
> >  >>> hsy...@gmail.com
> >  >>>  >> wrote:
> >  >>>  >>
> >  >>>  >>> Hey Vlad,
> >  >>>  >>>
> >  >>>  >>> Thanks for bringing this up. Is there an easy way to detect
> >  >>>  >>> unexpected use of the emit method without hurting performance?
> >  >>>  >>> Or at least, can we detect this in debug mode?
> >  >>>  >>>
> >  >>>  >>> Regards,
> >  >>>  >>> Siyuan
> >  >>>  >>>
> >  >>>  >>> On Wed, Aug 10, 2016 at 11:27 AM, Vlad Rozov <
> >  >>> v.ro...@datatorrent.com>
> >  >>>  >>> wrote:
> >  >>>  >>>
> >  >>>  >>>> The short answer is no, creating a worker thread to emit
> >  >>>  >>>> tuples is not supported by Apex and will lead to undefined
> >  >>>  >>>> behavior. Operators in Apex have strong thread affinity and
> >  >>>  >>>> all interaction with the platform must happen on the operator
> >  >>>  >>>> thread.
> >  >>>  >>>>
> >  >>>  >>>> Vlad
> >  >>>  >>>>
> >  >>>  >>>
> >  >>>  >>>
> >  >>>  >>
> >  >>>  >
> >  >>>  >
> >  >>>  > --
> >  >>>  >
> >  >>>  > Regards,
> >  >>>  > Ashwin.
> >  >>>  >
> >  >>>
> >  >>>
> >  >>>
> >  >>>  --
> >  >>>
> >  >>>  Regards,
> >  >>>  Ashwin.
> >  >>>
> >  >>>
> >  >>>
> >  >>>
> >  >>>
> >  >
> >
> >
> >
>
>
>
>
>
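
A minimal sketch of the queue hand-off Ram describes above; the class and
field names are illustrative, not taken from JdbcPollInputOperator, and the
import paths assume Apex 3.x:

import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

import com.datatorrent.api.Context.OperatorContext;
import com.datatorrent.api.DefaultOutputPort;
import com.datatorrent.api.InputOperator;
import com.datatorrent.common.util.BaseOperator;

public class PollingInputOperator extends BaseOperator implements InputOperator
{
  public final transient DefaultOutputPort<String> output = new DefaultOutputPort<>();
  private transient Queue<String> queue;
  private transient volatile boolean running;
  private transient Thread poller;

  @Override
  public void setup(OperatorContext context)
  {
    queue = new ConcurrentLinkedQueue<>();
    running = true;
    poller = new Thread(new Runnable()
    {
      @Override
      public void run()
      {
        // the worker thread only fills the queue; it never touches the port
        while (running) {
          queue.offer(fetchFromExternalSource());
        }
      }
    });
    poller.start();
  }

  @Override
  public void emitTuples()
  {
    // called on the operator thread, so emitting here is safe
    String record;
    while ((record = queue.poll()) != null) {
      output.emit(record);
    }
  }

  @Override
  public void teardown()
  {
    running = false;
  }

  private String fetchFromExternalSource()
  {
    return "record";  // stand-in for a real blocking fetch
  }
}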


Re: empty operator/stream/module names

2016-08-04 Thread Munagala Ramanath
A couple of minor issues with empty names:

It will not be possible to configure such operators from an XML file other
than through wildcards (see the sketch at the end of this thread).

Also, the AM log messages will not be as informative with an empty name,
e.g.

2016-08-02 10:04:24,294 INFO com.datatorrent.stram.ResourceRequestHandler:
Strict anti-affinity = [] for container with operators
PTOperator[id=3,name=JdbcOutput]

Ram

On Thu, Aug 4, 2016 at 10:03 AM, Sanjay Pujare <san...@datatorrent.com>
wrote:

> I differ. For the UI to render a DAG the names are useful, but if the name
> is not required by the engine i.e. the engine is able to execute your
> application fine with empty or null strings as names, is there any reason
> to make them mandatory?
>
> On the other hand, we can come up with a scheme for system generated names
> when the caller doesn’t provide a name. I have some ideas.
>
>
> On 8/4/16, 9:48 AM, "Munagala Ramanath" <r...@datatorrent.com> wrote:
>
> I don't see any reason to allow either.
>
> Ram
>
> On Thu, Aug 4, 2016 at 8:51 AM, Vlad Rozov <v.ro...@datatorrent.com>
> wrote:
>
> > Currently addOperator/addStream/addModule allows both null and empty
> > string in the operator/stream/module names. Is there any reason to
> allow
> > empty string? Should empty string and null be disallowed in those
> APIs?
> >
> > Vlad
> >
>
>
>
>
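
A minimal sketch of the wildcard-only configuration mentioned above; with an
empty operator name there is no specific name to put between the dots, so a
wildcard is the only handle left (the attribute and value are illustrative):

<property>
  <name>dt.application.*.operator.*.attr.MEMORY_MB</name>
  <value>512</value>
</property>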


Re: empty operator/stream/module names

2016-08-04 Thread Munagala Ramanath
It will not be possible to configure such operators from an XML file other
than through
wildcards -- but maybe that's OK.

Ram

On Thu, Aug 4, 2016 at 10:03 AM, Sanjay Pujare <san...@datatorrent.com>
wrote:

> I differ. For the UI to render a DAG the names are useful, but if the name
> is not required by the engine i.e. the engine is able to execute your
> application fine with empty or null strings as names, is there any reason
> to make them mandatory?
>
> On the other hand, we can come up with a scheme for system generated names
> when the caller doesn’t provide a name. I have some ideas.
>
>
> On 8/4/16, 9:48 AM, "Munagala Ramanath" <r...@datatorrent.com> wrote:
>
> I don't see any reason to allow either.
>
> Ram
>
> On Thu, Aug 4, 2016 at 8:51 AM, Vlad Rozov <v.ro...@datatorrent.com>
> wrote:
>
> > Currently addOperator/addStream/addModule allows both null and empty
> > string in the operator/stream/module names. Is there any reason to
> allow
> > empty string? Should empty string and null be disallowed in those
> APIs?
> >
> > Vlad
> >
>
>
>
>


Re: empty operator/stream/module names

2016-08-04 Thread Munagala Ramanath
I don't see any reason to allow either.

Ram

On Thu, Aug 4, 2016 at 8:51 AM, Vlad Rozov  wrote:

> Currently addOperator/addStream/addModule allows both null and empty
> string in the operator/stream/module names. Is there any reason to allow
> empty string? Should empty string and null be disallowed in those APIs?
>
> Vlad
>


Re: [Proposal] Named Checkpoints

2016-08-04 Thread Munagala Ramanath
+1

Ram

On Thu, Aug 4, 2016 at 12:10 AM, Sandesh Hegde 
wrote:

> Hello Team,
>
> This thread is to discuss the Named Checkpoint feature for Apex. (
> https://issues.apache.org/jira/browse/APEXCORE-498)
>
> Named checkpoints allow the following workflow:
>
> 1. Users can trigger a checkpoint and give it a name
> 2. Relaunch the application from the named checkpoint.
> 3. These checkpoints survive the "purge of old checkpoints".
>
> The current idea is to add a new control tuple, NamedCheckPointTuple, which
> contains the user-specified name; it traverses the DAG and along the way
> the necessary actions are taken.
>
> Please let me know your thoughts on this.
>
> Thanks
>


Re: Container & memory resource allocation

2016-07-20 Thread Munagala Ramanath
Please note that there are multiple sites making the claim that memory
allocation is in multiples of *yarn.scheduler.minimum-allocation-mb*; this
may have been true at one time but is no longer true (thanks to Sandesh for
fact-checking this).

There is a (?new?) parameter, *yarn.scheduler.increment-allocation-mb*,
which serves this purpose as discussed here:
http://blog.cloudera.com/blog/2013/11/migrating-to-mapreduce-2-on-yarn-for-operators/

Ram

On Tue, Jul 19, 2016 at 11:27 AM, Pradeep A. Dalvi <p...@apache.org> wrote:

> Thanks Chinmay & Ram.
>
> Troubleshooting page sounds the appropriate location. I shall raise PR with
> the given suggestions.
>
> --prad
>
> On Tue, Jul 19, 2016 at 5:49 AM, Munagala Ramanath <r...@datatorrent.com>
> wrote:
>
> > There is already a link to a troubleshooting page at bottom of
> > https://apex.apache.org/docs.html
> > That page already has some discussion under the section entitled
> > "Calculating Container Memory"
> > so adding new content there seems like the right thing to do.
> >
> > Ram
> >
> > On Mon, Jul 18, 2016 at 11:27 PM, Chinmay Kolhatkar <
> > chin...@datatorrent.com
> > > wrote:
> >
> > > Hi Pradeep,
> > >
> > > This is great content to add to the documents. These are the common set
> > > of errors which might get googled and hence great to get indexed as
> > > well.
> > >
> > > You can take a look at:
> > > https://github.com/apache/apex-core/tree/master/docs
> > >
> > > The docs for apex reside there in markdown format. It's probably good
> > > to create a troubleshooting page where all such common questions can
> > > reside.
> > >
> > > After you have the content ready, you can create a pull request to
> > > apex-core repo which can get merged to apex-core and later deployed to
> > the
> > > website by committers.
> > >
> > > -Chinmay.
> > >
> > >
> > >
> > >
> > > On Tue, Jul 19, 2016 at 10:46 AM, Pradeep A. Dalvi <p...@apache.org>
> > > wrote:
> > >
> > >> Container & memory resource allocation has been a common question
> > >> around and so I thought it would be good to explain related
> > >> configuration parameters.
> > >>
> > >> Please feel free to let me know your thoughts.
> > >>
> > >> Also I'm planning to add the following set of information under Apex
> > >> Docs. How could one add this to Apex Docs?
> > >>
> > >> =-=-=-=
> > >>
> > >> "Container is running beyond physical memory limits. Current usage: X
> GB
> > >> of
> > >> Y GB physical memory used; A GB of B GB virtual memory used. Killing
> > >> container."
> > >>
> > >> This is basically for some better understanding on Application
> Master's
> > >> container requests & Resource Manager's memory resource allocation.
> > Please
> > >> note that these are individual container request params. All these
> > >> parameters are in MB i.e. 1024 => 1GB.
> > >>
> > >> - AM's container requests to RM shall contain memory in multiples of
> > >> *yarn.scheduler.minimum-allocation-mb* & not exceeding
> > >> *yarn.scheduler.maximum-allocation-mb*
> > >>    - If *yarn.scheduler.minimum-allocation-mb* is configured as 1024 and
> > >> the container memory requirement is 1025 ( <= 2048 ), the container will
> > >> be allocated 2048 memory.
> > >>
> > >> - With Apex applications, operator memory can be specified by the property
> > >> *dt.application.<APP_NAME>.operator.<OPERATOR_NAME>.attr.MEMORY_MB*
> > >>    - Please note this parameter is at the Operator level; container memory
> > >> is calculated based on the number of Operators deployed in a container,
> > >> plus additional memory required depending on physical deployment
> > >> requirements, e.g. a unifier or bufferserver
> > >>    - Wildcard * can be used for APP_NAME and/or OPERATOR_NAME
> > >>
> > >> - If container memory is not specified, then AM would request 1 unit of
> > >> *yarn.scheduler.minimum-allocation-mb*, and RM would provision the
> > >> container taking that into consideration.
> > >>
> > >> Node Manager monitors memory usage of each of these containers and
> kills
> > >> the ones crossing the configured limit.
> > >>
> > >> Almost similar stuff is applicable for CPUs.
> > >>
> > >> --prad
> > >>
> > >
> > >
> >
>
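
A minimal sketch of the operator memory property described above; the
application and operator names are illustrative:

<property>
  <name>dt.application.MyFirstApplication.operator.console.attr.MEMORY_MB</name>
  <value>512</value>
</property>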


Re: Container & memory resource allocation

2016-07-19 Thread Munagala Ramanath
There is already a link to a troubleshooting page at bottom of
https://apex.apache.org/docs.html
That page already has some discussion under the section entitled
"Calculating Container Memory"
so adding new content there seems like the right thing to do.

Ram

On Mon, Jul 18, 2016 at 11:27 PM, Chinmay Kolhatkar  wrote:

> Hi Pradeep,
>
> This is great content to add to the documents. These are the common set
> of errors which might get googled and hence great to get indexed as well.
>
> You can take a look at:
> https://github.com/apache/apex-core/tree/master/docs
>
> The docs for apex reside there in markdown format. It's probably good to
> create a troubleshooting page where all such common questions can reside.
>
> After you have the content ready, you can create a pull request to
> apex-core repo which can get merged to apex-core and later deployed to the
> website by committers.
>
> -Chinmay.
>
>
>
>
> On Tue, Jul 19, 2016 at 10:46 AM, Pradeep A. Dalvi 
> wrote:
>
>> Container & memory resource allocation has been a common question around
>> and so I thought it would be good to explain related configuration
>> parameters.
>>
>> Please feel free to let me know your thoughts.
>>
>> Also I'm planning to add the following set of information under Apex Docs.
>> How could one add this to Apex Docs?
>>
>> =-=-=-=
>>
>> "Container is running beyond physical memory limits. Current usage: X GB
>> of
>> Y GB physical memory used; A GB of B GB virtual memory used. Killing
>> container."
>>
>> This is basically for some better understanding on Application Master's
>> container requests & Resource Manager's memory resource allocation. Please
>> note that these are individual container request params. All these
>> parameters are in MB i.e. 1024 => 1GB.
>>
>> - AM's container requests to RM shall contain memory in multiples of
>> *yarn.scheduler.minimum-allocation-mb* & not exceeding
>> *yarn.scheduler.maximum-allocation-mb*
>>    - If *yarn.scheduler.minimum-allocation-mb* is configured as 1024 and
>> the container memory requirement is 1025 ( <= 2048 ), the container will
>> be allocated 2048 memory.
>>
>> - With Apex applications, operator memory can be specified by the property
>> *dt.application.<APP_NAME>.operator.<OPERATOR_NAME>.attr.MEMORY_MB*
>>    - Please note this parameter is at the Operator level; container memory
>> is calculated based on the number of Operators deployed in a container,
>> plus additional memory required depending on physical deployment
>> requirements, e.g. a unifier or bufferserver
>>    - Wildcard * can be used for APP_NAME and/or OPERATOR_NAME
>>
>> - If container memory is not specified, then AM would request 1 unit of
>> *yarn.scheduler.minimum-allocation-mb*, and RM would provision the
>> container taking that into consideration.
>>
>> Node Manager monitors memory usage of each of these containers and kills
>> the ones crossing the configured limit.
>>
>> Almost similar stuff is applicable for CPUs.
>>
>> --prad
>>
>
>


Re: auto-generated emails

2016-07-12 Thread Munagala Ramanath
+1

Ram

On Tue, Jul 12, 2016 at 2:35 PM, Pramod Immaneni 
wrote:

> Hi,
>
> I was wondering how everyone felt about the volume of auto-generated emails
> on this list. It looks like multiple emails are generated and sent to
> everyone on the list even for relatively small actions such as commenting
> on a pull request - one from git, another from JIRA, etc.
>
> Understanding that there is a need for openness, how about finding a
> balance. Here are some ideas. I do not know if all of these are technically
> feasible.
>
> 1. An email is sent to all in the list when a new pull request is created
> or merged but email notifications for back and forth comments during the
> review are only sent to participants in that particular pull request.
> 2. Similar process as above with JIRA. If someone is interested in all the
> updates to JIRA, including those that come from the pull request, they can
> add themselves to the watch list for that particular JIRA.
>
> Thanks
>


Re: Getting partition null or empty error @Apache Apex-Kafka Integration

2016-07-11 Thread Munagala Ramanath
Anuj,

Could you provide additional details such as: What does your DAG look like?
What operators (custom as well as from Malhar) are you using? Does this
exception happen immediately upon launch or after some time?

Ram

On Mon, Jul 11, 2016 at 10:12 PM, ANUJ THAKWANI 
wrote:

> Hi,
>
> I have subscribed to the dev list. Let me know the way forward.
>
> I am a newbie in this project.
>
> On Tue, Jul 12, 2016 at 10:31 AM, Amol Kekre  wrote:
>
> >
> > Anuj,
> > You are not yet subscribed to dev list, and hence will have trouble
> > posting to this forum. Please subscribe by clicking on "subscribe" on the
> > dev list as listed on http://apex.apache.org/community.html.
> >
> > OR you can send an email to dev-subscr...@apex.apache.org with a subject
> > text.
> >
> > Thks
> > Amol
> >
> >
> > On Mon, Jul 11, 2016 at 9:39 PM, ANUJ THAKWANI 
> > wrote:
> >
> >> Hi,
> >>
> >> I am getting below given exception while launching kafka Application.
> >> Please let me know how to fix this.
> >>
> >> java.lang.IllegalStateException: Partitioner returns null or empty.
> >> at
> >>
> com.datatorrent.stram.plan.physical.PhysicalPlan.initPartitioning(PhysicalPlan.java:605)
> >> at
> >>
> com.datatorrent.stram.plan.physical.PhysicalPlan.addLogicalOperator(PhysicalPlan.java:1497)
> >> at
> >>
> com.datatorrent.stram.plan.physical.PhysicalPlan.<init>(PhysicalPlan.java:344)
> >> at
> >>
> com.datatorrent.stram.StreamingContainerManager.<init>(StreamingContainerManager.java:362)
> >> at
> >>
> com.datatorrent.stram.StreamingContainerManager.getInstance(StreamingContainerManager.java:2979)
> >> at
> >>
> com.datatorrent.stram.StreamingAppMasterService.serviceInit(StreamingAppMasterService.java:550)
> >> at
> >> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> >> at
> >>
> com.datatorrent.stram.StreamingAppMaster.main(StreamingAppMaster.java:101)
> >> 2016-07-11 22:24:48,250 ERROR
> >> com.datatorrent.stram.StreamingAppMaster: Exiting Application Master
> >> java.lang.IllegalStateException: Partitioner returns null or empty.
> >> at
> >>
> com.datatorrent.stram.plan.physical.PhysicalPlan.initPartitioning(PhysicalPlan.java:605)
> >> at
> >>
> com.datatorrent.stram.plan.physical.PhysicalPlan.addLogicalOperator(PhysicalPlan.java:1497)
> >> at
> >>
> com.datatorrent.stram.plan.physical.PhysicalPlan.<init>(PhysicalPlan.java:344)
> >> at
> >>
> com.datatorrent.stram.StreamingContainerManager.<init>(StreamingContainerManager.java:362)
> >> at
> >>
> com.datatorrent.stram.StreamingContainerManager.getInstance(StreamingContainerManager.java:2979)
> >> at
> >>
> com.datatorrent.stram.StreamingAppMasterService.serviceInit(StreamingAppMasterService.java:550)
> >> at
> >> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> >> at
> >>
> com.datatorrent.stram.StreamingAppMaster.main(StreamingAppMaster.java:101)
> >>
> >
> >
>
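
For context on the exception above: the engine rejects a Partitioner whose
definePartitions returns null or an empty collection. A minimal sketch of
that contract; the operator type MyKafkaInput is purely hypothetical:

import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Map;

import com.datatorrent.api.DefaultPartition;
import com.datatorrent.api.Partitioner;

public class MyKafkaPartitioner implements Partitioner<MyKafkaInput>
{
  @Override
  public Collection<Partition<MyKafkaInput>> definePartitions(
      Collection<Partition<MyKafkaInput>> partitions, PartitioningContext context)
  {
    // must return at least one partition; returning null or an empty
    // collection produces "Partitioner returns null or empty."
    List<Partition<MyKafkaInput>> result = new ArrayList<>();
    result.add(new DefaultPartition<>(new MyKafkaInput()));
    return result;
  }

  @Override
  public void partitioned(Map<Integer, Partition<MyKafkaInput>> partitions)
  {
  }
}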


Re: How to specify affinity rules from application properties file

2016-07-11 Thread Munagala Ramanath
Could you share how you resolved the issue?

Ram

On Mon, Jul 11, 2016 at 9:59 PM, Akshay Gore  wrote:

> Thank you Pradeep for the response. I have already resolved the issue.
>
> Regards,
> Akshay
>
> On Tue, Jul 12, 2016 at 10:10 AM, Pradeep Kumbhar  >
> wrote:
>
> > Should
> > "dt.application.AffinityRulesSampleApplication.attr.AFFINITY_RULES_SET"
> > be
> > "dt.application.PiDemo.attr.AFFINITY_RULES_SET" ?
> > or it's just an example.
> >
> >
> > On Mon, Jul 11, 2016 at 4:02 PM, Akshay Gore 
> > wrote:
> >
> > > Hi,
> > >
> > > I am trying to explore affinity feature in Apex. For this, I am using
> > > pi-demo application from apex-malhar. As per the affinity document
> > > <
> > >
> >
> https://github.com/apache/apex-core/blob/master/docs/application_development.md
> > > >,
> > > I have updated the demo application's properties file as follows:
> > >
> > > <property>
> > >   <name>dt.application.AffinityRulesSampleApplication.attr.AFFINITY_RULES_SET</name>
> > >   <value>
> > >   {
> > >     "affinityRules": [
> > >       {
> > >         "operatorsList": [
> > >           "rand",
> > >           "picalc"
> > >         ],
> > >         "locality": "NODE_LOCAL",
> > >         "type": "AFFINITY",
> > >         "relaxLocality": false
> > >       }
> > >     ]
> > >   }
> > >   </value>
> > > </property>
> > >
> > >
> > > After launching the demo application, I don't see containers for
> > > operators "rand" and "picalc" on the same node, although it's working
> > > fine when I set the rule from application code. Can anyone please help
> > > me here?
> > >
> > > Thanks,
> > > Akshay
> > >
> >
> >
> >
> > --
> > *regards,*
> > *~pradeep*
> >
>


Bleeding edge branch ?

2016-07-11 Thread Munagala Ramanath
We've had a number of issues recently related to dependencies on old
versions of various packages/libraries such as Hadoop itself, Google guava,
HTTPClient, mbassador, etc.

How about we create a "bleeding-edge" branch in both Core and Malhar which
will use the latest versions of these various dependencies, upgrade to Java
8 so we can use the new Java features, etc.?

This will give us an opportunity to discover these sorts of problems early
and, when we are ready to pull the trigger for a major version, we have a
branch ready for merge with, hopefully, minimal additional effort.

There will be no guarantees w.r.t. this branch, so people who use it do so
at their own risk.

Ram


Re: [DISCUSSION] Custom Control Tuples

2016-06-25 Thread Munagala Ramanath
What would the API look like for option 1? Another operator callback
called controlTuple(), or does the operator code have to check each
incoming tuple to see if it was data or control? (A sketch follows below.)

Ram
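
A minimal sketch of what such a dedicated callback (option 1) might look
like; this interface is purely hypothetical, not an existing Apex API:

// Hypothetical -- shown only to make the question concrete. The engine
// would invoke this once per window after all input ports have seen the
// control tuple, mirroring how BEGIN_WINDOW/END_WINDOW propagate.
public interface CustomControlTupleHandler
{
  void processControlTuple(Object payload);
}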

On Fri, Jun 24, 2016 at 11:42 PM, David Yan  wrote:

> It looks like option 1 is preferred by the community. But let me elaborate
> why I brought up the option of piggybacking on BEGIN and END_WINDOW.
>
> Option 2 implicitly enforces that the operations related to the custom
> control tuple be done at the streaming window boundary.
>
> For most operations, it makes sense to have that enforcement. Option 1
> opens the door to the possibility of sending and handling control tuples
> within a window, thus imposing a challenge of ensuring idempotency. In
> fact, allowing that would make idempotency extremely difficult to achieve.
>
> David
>
> On Fri, Jun 24, 2016 at 4:38 PM, Vlad Rozov 
> wrote:
>
> > +1 for option 1.
> >
> > Thank you,
> >
> > Vlad
> >
> >
> > On 6/24/16 14:35, Bright Chen wrote:
> >
> >> +1
> >> It can also help to shut down the application gracefully.
> >> Bright
> >>
> >> On Jun 24, 2016, at 1:35 PM, Siyuan Hua  wrote:
> >>>
> >>> +1
> >>>
> >>> I think it's good to have custom control tuple and I prefer the 1
> option.
> >>>
> >>> Also I think we should think about couple different callbacks, that
> could
> >>> be operator level(triggered when an operator receives an control tuple)
> >>> or
> >>> dag level(triggered when control tuple flow over the whole dag)
> >>>
> >>> Regards,
> >>> Siyuan
> >>>
> >>>
> >>>
> >>>
> >>> On Fri, Jun 24, 2016 at 12:42 PM, David Yan 
> >>> wrote:
> >>>
> My initial thinking is that the custom control tuples, just like the
> existing control tuples, will only be generated from the input operators
> and will be propagated downstream to all operators in the DAG. So the NxM
> partitioning scenario works just like how other control tuples work, i.e.
> the callback will not be called unless all ports have received the control
> tuple for a particular window. This creates a little bit of complication
> with multiple input operators though.
> 
>  David
> 
> 
>  On Fri, Jun 24, 2016 at 12:03 PM, Tushar Gosavi <
> tus...@datatorrent.com
>  >
>  wrote:
> 
>  +1 for the feature
> >
> > I am in favor of option 1, but we may need a helper method to avoid a
> > compiler error on a typed port, as calling port.emit(controlTuple) will
> > be an error if the types of the control tuple and the port do not match;
> > or a new method on the output port object, emitControlTuple(ControlTuple).
> >
> > Can you give an example of piggybacking a tuple on the current
> > BEGIN_WINDOW and END_WINDOW control tuples?
> >
> > In case of NxM partitioning, each downstream operator will receive N
> > control tuples. Will it call the user handler N times for each downstream
> > operator or just once?
> >
> > Regards,
> > - Tushar.
> >
> >
> >
> > On Fri, Jun 24, 2016 at 11:52 PM, David Yan 
> >
>  wrote:
> 
> > Hi all,
> >>
> >> I would like to propose a new feature to the Apex core engine -- the
> >> support of custom control tuples. Currently, we have control tuples
> >> such as BEGIN_WINDOW, END_WINDOW, CHECKPOINT, and so on, but we don't
> >> have the support for applications to insert their own control tuples.
> >> The way currently to get around this is to use data tuples and have a
> >> separate port for such tuples that sends tuples to all partitions of
> >> the downstream operators, which is not exactly developer friendly.
> >>
> >> We have already seen a number of use cases that can use this feature:
> >>
> >> 1) Batch support: We need to tell all operators of the physical DAG
> >> when a batch starts and ends, so the operators can do whatever is
> >> needed upon the start or the end of a batch.
> >>
> >> 2) Watermark: To support the concepts of event time windowing, the
> >> watermark control tuple is needed to tell which windows should be
> >> considered late.
> >>
> >> 3) Changing operator properties: We do have the support of changing
> >> operator properties on the fly, but with a custom control tuple, the
> >> command to change operator properties can be window aligned for all
> >> partitions and also across the DAG.
> >>
> >> 4) Recording tuples: Like changing operator properties, we do have this
> >> support now but only at the individual physical operator level, and
> >> without control of which window to record tuples for. With a 

Re: APC

2016-06-07 Thread Munagala Ramanath
Tim,

Are you building a config package by any chance? That builds an *apc* file.
Can you post the exact archetype command you ran?

I just ran the archetype command for 3.4.0 and 3.5.0-SNAPSHOT and it built
an *apa* just as it always does.

Ram

On Tue, Jun 7, 2016 at 12:44 PM, Timothy Farkas <
timothytiborfar...@gmail.com> wrote:

> Hi All,
>
> I noticed that the new project template generates an apc, which doesn't
> seem to launch in the sandbox I just downloaded. Can I get some hints
> about what version works with what, and what the differences between an
> apc and the old apa are?
>
> Thanks!
> Tim
>


Re: Proposal : DAG - SetOperatorAttribute

2016-06-07 Thread Munagala Ramanath
+1

Since we have *setInputPortAttribute* and *setOutputPortAttribute*, it
seems reasonable
to add *setOperatorAttribute*.

Ram

On Mon, Jun 6, 2016 at 1:39 PM, Sandesh Hegde 
wrote:

> Currently, *setAttribute* is used to set the operator attributes. The other
> two attribute-setting APIs are specific to input ports
> (*setInputPortAttributes*) and output ports (*setOutputPortsAttributes*).
>
> The proposal is to have a *setOperatorAttribute*
> API, which will clearly indicate that the user wants to set attributes on
> the operator.
> ( setOperatorAttribute(Operator operator, Attribute key, T value) )
>
> The following will be the roles for the APIs:
> *setAttributes* --> for setting Attributes for the whole DAG (
> setAttribute(Operator operator, Attribute key, T value) - can be
> deprecated )
> *setOperatorAttributes* --> for setting Attributes for the operator
>
> Let me know your thoughts.
>
> Thanks
>
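
A minimal sketch of how the proposed method would read next to today's
overload; the attribute and operator below are illustrative:

// today: the generic overload doubles as the operator-attribute setter
dag.setAttribute(op, OperatorContext.MEMORY_MB, 512);
// proposed (signature from this thread): the intent is explicit in the name
dag.setOperatorAttribute(op, OperatorContext.MEMORY_MB, 512);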


Parquet output operator ?

2016-06-02 Thread Munagala Ramanath
Anybody know if there are plans for a Parquet writer operator? If so, can
anyone share the status and timeline?

Thanks.

Ram