Re: [VOTE] Apache Apex Malhar Release 3.6.0 (RC1)

2016-11-30 Thread Vlad Rozov

+1 (binding)

Verified checksums
Verified LICENSE, NOTICE and README.md.
Built with:

mvn clean apache-rat:check verify -Dlicense.skip=false -Pall-modules install -DskipTests
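The checksum step mentioned above can be scripted; a minimal sketch using sha512sum, with a locally created stand-in file instead of the real downloaded artifact:

```shell
# Minimal sketch of release checksum verification with sha512sum.
# The file below is a stand-in; in practice you would download the real
# artifact and its published .sha512 file from the staging directory.
set -e
printf 'example release payload' > artifact.zip
sha512sum artifact.zip > artifact.zip.sha512

# "sha512sum -c" recomputes the digest and prints "artifact.zip: OK" on match.
sha512sum -c artifact.zip.sha512
```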


Thank you,

Vlad


On 11/30/16 22:56, Bhupesh Chawda wrote:

+1

  - Verified signatures
  - Build and test successful
  - LICENSE, NOTICE, README, CHANGELOG.md exist

~ Bhupesh



On Thu, Dec 1, 2016 at 11:24 AM, David Yan  wrote:


+1 (binding)

Verified existence of LICENSE, NOTICE, README.md and CHANGELOG.md files
Built with this command:

   mvn clean apache-rat:check verify -Dlicense.skip=false -Pall-modules install

with no errors.
Verified pi demo


On Wed, Nov 30, 2016 at 11:37 AM, Siyuan Hua 
wrote:


+1

Verified checksums
Verified compilation
Verified build and test
Verified pi demo

On Wed, Nov 30, 2016 at 9:50 AM, Tushar Gosavi 
wrote:


+1

Verified checksums
Verified compilation

- Tushar.


On Wed, Nov 30, 2016 at 7:43 PM, Thomas Weise  wrote:

Can folks please verify the release.

Thanks

--
sent from mobile
On Nov 26, 2016 6:32 PM, "Thomas Weise"  wrote:


Dear Community,

Please vote on the following Apache Apex Malhar 3.6.0 release

candidate.

This is a source release with binary artifacts published to Maven.

This release is based on Apex Core 3.4 and resolves 69 issues.

The release adds the first iteration of SQL support via Apache Calcite, an
alternative Cassandra output operator (non-transactional, upsert based), an
enrichment operator, improvements to window storage and new user
documentation for several operators, along with many other enhancements and
bug fixes.

List of all issues fixed: https://s.apache.org/9b0t
User documentation: http://apex.apache.org/docs/malhar-3.6/

Staging directory:
https://dist.apache.org/repos/dist/dev/apex/apache-apex-malhar-3.6.0-RC1/

Source zip:
https://dist.apache.org/repos/dist/dev/apex/apache-apex-malhar-3.6.0-RC1/apache-apex-malhar-3.6.0-source-release.zip
Source tar.gz:
https://dist.apache.org/repos/dist/dev/apex/apache-apex-malhar-3.6.0-RC1/apache-apex-malhar-3.6.0-source-release.tar.gz
Maven staging repository:
https://repository.apache.org/content/repositories/orgapacheapex-1020/

Git source:
https://git-wip-us.apache.org/repos/asf?p=apex-malhar.git;a=commit;h=refs/tags/v3.6.0-RC1
  (commit: 43d524dc5d5326b8d94593901cad026528bb62a1)

PGP key:
http://pgp.mit.edu:11371/pks/lookup?op=vindex=t...@apache.org
KEYS file:
https://dist.apache.org/repos/dist/release/apex/KEYS

More information at:
http://apex.apache.org

Please try the release and vote; the vote will be open until Wed, 11/30 EOD
PST considering the US holiday weekend.

[ ] +1 approve (and what verification was done)
[ ] -1 disapprove (and reason why)

http://www.apache.org/foundation/voting.html

How to verify release candidate:

http://apex.apache.org/verification.html

Thanks,
Thomas






Re: "ExcludeNodes" for an Apex application

2016-11-30 Thread Sanjay Pujare
Yes, Ram explained to me that in practice this would be a useful feature for
Apex devops, who typically have no control over the Hadoop/Yarn cluster.

On 11/30/16, 9:22 PM, "Mohit Jotwani"  wrote:

This is a practical scenario where developers would need to exclude
certain nodes, as those nodes might be required for some mission critical
applications. It would be good to have this feature.

I understand that Stram should not get into resourcing and should still rely
on Yarn; however, as the App Master it should have the right to reject the
nodes offered by Yarn and request other resources.

Regards,
Mohit

On Thu, Dec 1, 2016 at 2:34 AM, Sandesh Hegde 
wrote:

> Apex has automatic blacklisting of the troublesome nodes, please take a
> look at the following attributes,
>
> MAX_CONSECUTIVE_CONTAINER_FAILURES_FOR_BLACKLIST
> https://www.datatorrent.com/docs/apidocs/com/datatorrent/
> api/Context.DAGContext.html#MAX_CONSECUTIVE_CONTAINER_
> FAILURES_FOR_BLACKLIST
>
> BLACKLISTED_NODE_REMOVAL_TIME_MILLIS
>
> Thanks
>
>
>
> On Wed, Nov 30, 2016 at 12:56 PM Munagala Ramanath 
> wrote:
>
> Not sure if this is what Milind had in mind but we often run into
> situations where the dev group
> working with Apex has no control over cluster configuration -- to make any
> changes to the cluster they need to
> go through an elaborate process that can take many days.
>
> Meanwhile, if they notice that a particular node is consistently causing
> problems for their app, having a simple way to exclude it would be very
> helpful since it gives them a way to bypass communication and process
> issues within their own organization.
>
> Ram
>
> On Wed, Nov 30, 2016 at 10:58 AM, Sanjay Pujare 
> wrote:
>
> > To me both use cases appear to be generic resource management use cases.
> > For example, a randomly rebooting node is not good for any purpose esp.
> > long running apps so it is a bit of a stretch to imagine that these nodes
> > will be acceptable for some batch jobs in Yarn. So such a node should be
> > marked “Bad” or Unavailable in Yarn itself.
> >
> > The second use case is also a typical anti-affinity use case which ideally
> > should be implemented in Yarn – Milind’s example can also apply to non-Apex
> > batch jobs. In any case it looks like Yarn still doesn’t have it (
> > https://issues.apache.org/jira/browse/YARN-1042) so if Apex needs it we
> > will need to do it ourselves.
> >
> > On 11/30/16, 10:39 AM, "Munagala Ramanath"  wrote:
> >
> > But then, what's the solution to the 2 problem scenarios that Milind
> > describes ?
> >
> > Ram
> >
> > On Wed, Nov 30, 2016 at 10:34 AM, Sanjay Pujare <
> > san...@datatorrent.com>
> > wrote:
> >
> > > I think “exclude nodes” and such is really the job of the resource manager,
> > > i.e. Yarn. So I am not sure taking over some of these tasks in Apex would
> > > be very useful.
> > >
> > > I agree with Amol that apps should be node neutral. Resource management in
> > > Yarn together with fault tolerance in Apex should minimize the need for
> > > this feature although I am sure one can find use cases.
> > >
> > >
> > > On 11/29/16, 10:41 PM, "Amol Kekre"  wrote:
> > >
> > > We do have this feature in Yarn, but that applies to all applications. I am
> > > not sure if Yarn has anti-affinity. This feature may be used, but in
> > > general there is danger in an application taking over resource allocation.
> > > Another quirk is that big data apps should ideally be node-neutral. This is
> > > a good idea, if we are able to carve out something where the need is app
> > > specific.
> > >
> > > Thks
> > > Amol
> > >
> > >
> > > On Tue, Nov 29, 2016 at 10:00 PM, Milind Barve <
> > mili...@gmail.com>
> > > wrote:
> > >
> > > > We have seen 2 cases mentioned below, where, it would have been nice if
> > > > Apex allowed us to exclude a node from the cluster for an application.
> > > >
> > > > 1. A node in the cluster had gone bad (was randomly rebooting) and so an
> > > > Apex app should not use it - other apps can use it as they were batch jobs.
> > > > 2. A node is 

Re: [VOTE] Apache Apex Malhar Release 3.6.0 (RC1)

2016-11-30 Thread David Yan
+1 (binding)

Verified existence of LICENSE, NOTICE, README.md and CHANGELOG.md files
Built with this command:

  mvn clean apache-rat:check verify -Dlicense.skip=false -Pall-modules install

with no errors.
Verified pi demo


On Wed, Nov 30, 2016 at 11:37 AM, Siyuan Hua  wrote:

> +1
>
> Verified checksums
> Verified compilation
> Verified build and test
> Verified pi demo
>
> On Wed, Nov 30, 2016 at 9:50 AM, Tushar Gosavi 
> wrote:
>
> > +1
> >
> > Verified checksums
> > Verified compilation
> >
> > - Tushar.
> >
> >
> > On Wed, Nov 30, 2016 at 7:43 PM, Thomas Weise  wrote:
> > > Can folks please verify the release.
> > >
> > > Thanks
> > >
> > > --
> > > sent from mobile
> > > On Nov 26, 2016 6:32 PM, "Thomas Weise"  wrote:
> > >
> > >> Dear Community,
> > >>
> > >> Please vote on the following Apache Apex Malhar 3.6.0 release
> candidate.
> > >>
> > >> This is a source release with binary artifacts published to Maven.
> > >>
> > >> This release is based on Apex Core 3.4 and resolves 69 issues.
> > >>
> > >> The release adds the first iteration of SQL support via Apache Calcite, an
> > >> alternative Cassandra output operator (non-transactional, upsert
> based),
> > >> enrichment operator, improvements to window storage and new user
> > >> documentation for several operators along with many other enhancements
> > and
> > >> bug fixes.
> > >>
> > >> List of all issues fixed: https://s.apache.org/9b0t
> > >> User documentation: http://apex.apache.org/docs/malhar-3.6/
> > >>
> > >> Staging directory:
> > >> https://dist.apache.org/repos/dist/dev/apex/apache-apex-malhar-3.6.0-RC1/
> > >> Source zip:
> > >> https://dist.apache.org/repos/dist/dev/apex/apache-apex-malhar-3.6.0-RC1/apache-apex-malhar-3.6.0-source-release.zip
> > >> Source tar.gz:
> > >> https://dist.apache.org/repos/dist/dev/apex/apache-apex-malhar-3.6.0-RC1/apache-apex-malhar-3.6.0-source-release.tar.gz
> > >> Maven staging repository:
> > >> https://repository.apache.org/content/repositories/orgapacheapex-1020/
> > >>
> > >> Git source:
> > >> https://git-wip-us.apache.org/repos/asf?p=apex-malhar.git;a=commit;h=refs/tags/v3.6.0-RC1
> > >>  (commit: 43d524dc5d5326b8d94593901cad026528bb62a1)
> > >>
> > >> PGP key:
> > >> http://pgp.mit.edu:11371/pks/lookup?op=vindex=t...@apache.org
> > >> KEYS file:
> > >> https://dist.apache.org/repos/dist/release/apex/KEYS
> > >>
> > >> More information at:
> > >> http://apex.apache.org
> > >>
> > >> Please try the release and vote; the vote will be open until Wed, 11/30 EOD
> > PST
> > >> considering the US holiday weekend.
> > >>
> > >> [ ] +1 approve (and what verification was done)
> > >> [ ] -1 disapprove (and reason why)
> > >>
> > >> http://www.apache.org/foundation/voting.html
> > >>
> > >> How to verify release candidate:
> > >>
> > >> http://apex.apache.org/verification.html
> > >>
> > >> Thanks,
> > >> Thomas
> > >>
> > >>
> >
>


[jira] [Resolved] (APEXMALHAR-2022) S3 Output Module for file copy

2016-11-30 Thread Bhupesh Chawda (JIRA)

 [ 
https://issues.apache.org/jira/browse/APEXMALHAR-2022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bhupesh Chawda resolved APEXMALHAR-2022.

   Resolution: Fixed
Fix Version/s: 3.7.0

> S3 Output Module for file copy
> --
>
> Key: APEXMALHAR-2022
> URL: https://issues.apache.org/jira/browse/APEXMALHAR-2022
> Project: Apache Apex Malhar
>  Issue Type: Task
>Reporter: Chaitanya
>Assignee: Chaitanya
> Fix For: 3.7.0
>
>
> The primary functionality of this module is to copy files into an S3 bucket
> using a block-by-block approach.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (APEXMALHAR-2022) S3 Output Module for file copy

2016-11-30 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/APEXMALHAR-2022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15710953#comment-15710953
 ] 

ASF GitHub Bot commented on APEXMALHAR-2022:


Github user asfgit closed the pull request at:

https://github.com/apache/apex-malhar/pull/483


> S3 Output Module for file copy
> --
>
> Key: APEXMALHAR-2022
> URL: https://issues.apache.org/jira/browse/APEXMALHAR-2022
> Project: Apache Apex Malhar
>  Issue Type: Task
>Reporter: Chaitanya
>Assignee: Chaitanya
>
> The primary functionality of this module is to copy files into an S3 bucket
> using a block-by-block approach.





[GitHub] apex-malhar pull request #483: APEXMALHAR-2022 Developed S3 Output Module

2016-11-30 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/apex-malhar/pull/483


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


Re: [DISCUSSION] Custom Control Tuples

2016-11-30 Thread Bhupesh Chawda
I would like to work on https://issues.apache.org/jira/browse/APEXCORE-580.

~ Bhupesh

On Thu, Dec 1, 2016 at 5:42 AM, Sandesh Hegde 
wrote:

> I am interested in working on the following subtask
>
> https://issues.apache.org/jira/browse/APEXCORE-581
>
> Thanks
>
>
> On Wed, Nov 30, 2016 at 2:07 PM David Yan  wrote:
>
> > I have created an umbrella ticket for control tuple support:
> >
> > https://issues.apache.org/jira/browse/APEXCORE-579
> >
> > Currently it has two subtasks. Please have a look at them and see whether
> > I'm missing anything or if you have anything to add. You are welcome to
> add
> > more subtasks or comment on the existing subtasks.
> >
> > We would like to kick start the implementation soon.
> >
> > Thanks!
> >
> > David
> >
> > On Mon, Nov 28, 2016 at 5:22 PM, Bhupesh Chawda  >
> > wrote:
> >
> > > +1 for the plan.
> > >
> > > I would be interested in contributing to this feature.
> > >
> > > ~ Bhupesh
> > >
> > > On Nov 29, 2016 03:26, "Sandesh Hegde" 
> wrote:
> > >
> > > > I am interested in contributing to this feature.
> > > >
> > > > On Mon, Nov 28, 2016 at 1:54 PM David Yan 
> > wrote:
> > > >
> > > > > I think we should probably go ahead with option 1 since this works
> > with
> > > > > most use cases and prevents developers from shooting themselves in
> > the
> > > > foot
> > > > > in terms of idempotency.
> > > > >
> > > > > We can have a configuration property that enables option 2 later if
> > we
> > > > have
> > > > > concrete use cases that call for it.
> > > > >
> > > > > Please share your thoughts if you think you don't agree with this
> > plan.
> > > > > Also, please indicate if you're interested in contributing to this
> > > > feature.
> > > > >
> > > > > David
> > > > >
> > > > > On Sun, Nov 27, 2016 at 9:02 PM, Bhupesh Chawda <
> > > bhup...@datatorrent.com
> > > > >
> > > > > wrote:
> > > > >
> > > > > > It appears that option 1 is more favored due to unavailability
> of a
> > > use
> > > > > > case which could use option 2.
> > > > > >
> > > > > > However, option 2 is problematic in specific cases, like presence
> > of
> > > > > > multiple input ports for example. In case of a linear DAG where
> > > control
> > > > > > tuples are flowing in order with the data tuples, it should not
> be
> > > > > > difficult to guarantee idempotency. For example, cases where
> there
> > > > could
> > > > > be
> > > > > > multiple changes in behavior of an operator during a single
> window,
> > > it
> > > > > > should not wait until end window for these changes to take
> effect.
> > > > Since,
> > > > > > we don't have a concrete use case right now, perhaps we do not
> want
> > > to
> > > > go
> > > > > > that road. This feature should be available through a platform
> > > > attribute
> > > > > > (may be at a later point in time) where the default is option 1.
> > > > > >
> > > > > > I think option 1 is suitable for a starting point in the
> > > implementation
> > > > > of
> > > > > > this feature and we should proceed with it.
> > > > > >
> > > > > > ~ Bhupesh
> > > > > >
> > > > > >
> > > > > >
> > > > > > On Fri, Nov 11, 2016 at 12:59 AM, David Yan <
> da...@datatorrent.com
> > >
> > > > > wrote:
> > > > > >
> > > > > > > Good question Tushar. The callback should be called only once.
> > > > > > > The way to implement this is to keep a list of control tuple
> > hashes
> > > > for
> > > > > > the
> > > > > > > given streaming window and only do the callback when the
> operator
> > > has
> > > > > not
> > > > > > > seen it before.
> > > > > > >
> > > > > > > Other thoughts?
> > > > > > >
> > > > > > > David
> > > > > > >
> > > > > > > On Thu, Nov 10, 2016 at 9:32 AM, Tushar Gosavi <
> > > > tus...@datatorrent.com
> > > > > >
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Hi David,
> > > > > > > >
> > > > > > > > What would be the behaviour in case where we have a DAG with
> > > > > following
> > > > > > > > operators, the number in bracket is number of partitions, X
> is
> > > NxM
> > > > > > > > partitioning.
> > > > > > > > A(1) X B(4) X C(2)
> > > > > > > >
> > > > > > > > If A sends a control tuple, it will be sent to all 4
> partition
> > of
> > > > B,
> > > > > > > > and from each partition from B it goes to C, i.e each
> partition
> > > of
> > > > C
> > > > > > > > will receive same control tuple originated from A multiple
> > times
> > > > > > > > (number of upstream partitions of C). In this case will the
> > > > callback
> > > > > > > > function get called multiple times or just once.
> > > > > > > >
> > > > > > > > -Tushar.
> > > > > > > >
> > > > > > > >
> > > > > > > > On Fri, Nov 4, 2016 at 12:14 AM, David Yan <
> > > da...@datatorrent.com>
> > > > > > > wrote:
> > > > > > > > > Hi Bhupesh,
> > > > > > > > >
> > > > > > > > > Since each input port has its own incoming control tuple, I
> > > would
> > > > > > > imagine
> > > > > > > > > 
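The callback-once behavior discussed in this thread (dedupe by control tuple hash within a streaming window, reset at the window boundary) can be sketched in plain Java; class and method names here are illustrative, not the actual Apex Core API:

```java
import java.util.HashSet;
import java.util.Set;
import java.util.function.Consumer;

// Sketch of callback-once delivery: when the same control tuple arrives via
// several upstream partitions within one streaming window, the user callback
// fires only for the first copy. Names are hypothetical, not the Apex API.
public class ControlTupleDedup {
    private final Set<Integer> seenThisWindow = new HashSet<>();

    // Invoke the callback only if this tuple's hash is new for this window.
    public void onControlTuple(Object tuple, Consumer<Object> callback) {
        if (seenThisWindow.add(tuple.hashCode())) {
            callback.accept(tuple);
        }
    }

    // Reset the seen-hashes set at the streaming window boundary.
    public void endWindow() {
        seenThisWindow.clear();
    }
}
```

In Tushar's A(1) X B(4) X C(2) example, each partition of C would receive the tuple once per upstream B partition, but the callback would still fire only once per window.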

[jira] [Commented] (APEXMALHAR-2361) Optimise SpillableWindowedKeyedStorage remove(Window) to improve the performance

2016-11-30 Thread David Yan (JIRA)

[ 
https://issues.apache.org/jira/browse/APEXMALHAR-2361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15710326#comment-15710326
 ] 

David Yan commented on APEXMALHAR-2361:
---

I think removing it is better since you don't want to trigger again next time 
with the default value.

> Optimise SpillableWindowedKeyedStorage remove(Window) to improve the 
> performance
> 
>
> Key: APEXMALHAR-2361
> URL: https://issues.apache.org/jira/browse/APEXMALHAR-2361
> Project: Apache Apex Malhar
>  Issue Type: Improvement
>Reporter: bright chen
>Assignee: bright chen
>   Original Estimate: 120h
>  Remaining Estimate: 120h
>
> Currently, SpillableWindowedKeyedStorage remove(Window) will go through each
> key and mark all of them as deleted. This can be expensive when there are
> lots of keys, especially when these entries have already spilled out of
> memory (the common case when remove() is called).
> Suggest marking the whole window as deleted. When the window is marked as
> deleted, it will not be allowed to add/update any entry of this window (this
> should match the requirement, as remove(Window) is only called after the
> allowed lateness).
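The window-level tombstone proposed in the issue can be sketched in plain Java; the class and method names below are illustrative, not the actual Malhar SpillableWindowedKeyedStorage API:

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Illustrative sketch of the proposed optimization: instead of marking every
// key of a window as deleted, record a single window-level tombstone.
// Names (WindowedStore, put, get, remove) are hypothetical.
public class WindowedStore {
    private final Map<Long, Map<String, String>> data = new HashMap<>();
    private final Set<Long> deletedWindows = new HashSet<>();

    public void put(long window, String key, String value) {
        if (deletedWindows.contains(window)) {
            // A tombstoned window rejects further adds/updates.
            throw new IllegalStateException("window " + window + " was removed");
        }
        data.computeIfAbsent(window, w -> new HashMap<>()).put(key, value);
    }

    public String get(long window, String key) {
        if (deletedWindows.contains(window)) {
            return null; // O(1) tombstone check; no per-key lookup or spill I/O
        }
        Map<String, String> m = data.get(window);
        return m == null ? null : m.get(key);
    }

    // O(1) removal: one tombstone instead of touching every key of the window.
    public void remove(long window) {
        deletedWindows.add(window);
        data.remove(window);
    }
}
```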





[jira] [Commented] (APEXMALHAR-2361) Optimise SpillableWindowedKeyedStorage remove(Window) to improve the performance

2016-11-30 Thread bright chen (JIRA)

[ 
https://issues.apache.org/jira/browse/APEXMALHAR-2361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15710303#comment-15710303
 ] 

bright chen commented on APEXMALHAR-2361:
-

Probably we can handle DISCARDING by updating the value to the default value
instead of clearing the window. Then remove(Window) would only be used for
windows past the allowed lateness.

> Optimise SpillableWindowedKeyedStorage remove(Window) to improve the 
> performance
> 
>
> Key: APEXMALHAR-2361
> URL: https://issues.apache.org/jira/browse/APEXMALHAR-2361
> Project: Apache Apex Malhar
>  Issue Type: Improvement
>Reporter: bright chen
>Assignee: bright chen
>   Original Estimate: 120h
>  Remaining Estimate: 120h
>
> Currently, SpillableWindowedKeyedStorage remove(Window) will go through each
> key and mark all of them as deleted. This can be expensive when there are
> lots of keys, especially when these entries have already spilled out of
> memory (the common case when remove() is called).
> Suggest marking the whole window as deleted. When the window is marked as
> deleted, it will not be allowed to add/update any entry of this window (this
> should match the requirement, as remove(Window) is only called after the
> allowed lateness).





[jira] [Updated] (APEXCORE-581) Delivery of Custom Control Tuples

2016-11-30 Thread David Yan (JIRA)

 [ 
https://issues.apache.org/jira/browse/APEXCORE-581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Yan updated APEXCORE-581:
---
Description: 
The behavior should be as follows:

- The control tuples should only be sent to downstream at streaming window 
boundaries
- The control tuples should be sent to all partitions downstream
- The control tuples should be sent in the same order of arrival.
- Within a streaming window, do not send the same control tuple twice, even if 
the same control tuple is received multiple times within that window. This is 
possible if the operator has two input ports. (The LinkedHashMap should be 
easily able to ensure both order and uniqueness.)
- The delivery of control tuples needs to stop at DelayOperator. 
- When a streaming window is committed, remove the associated LinkedHashMap 
that belong to windows with IDs that are less than the committed window
- It's safe to assume the control tuples are rare enough and can fit in memory

This will involve an additional MessageType to represent a custom control 
tuple. 
We probably need to have a data structure (possibly a LinkedHashMap) per 
streaming window that stores the control tuple in the buffer server.
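The per-window bookkeeping described above can be sketched in plain Java; class and method names are illustrative, not the actual Apex Core buffer server API:

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.TreeMap;

// Sketch of the bookkeeping in the description: one insertion-ordered set of
// control tuples per streaming window, pruned when a window is committed.
// Names (ControlTupleBuffer, add, tuplesFor, committed) are hypothetical.
public class ControlTupleBuffer {
    // TreeMap keeps window IDs sorted, which makes pruning by committed ID easy.
    private final TreeMap<Long, Set<Object>> tuplesByWindow = new TreeMap<>();

    // Record a control tuple for a window; a LinkedHashSet preserves arrival
    // order and drops duplicates seen within the same window.
    public void add(long windowId, Object controlTuple) {
        tuplesByWindow
            .computeIfAbsent(windowId, w -> new LinkedHashSet<>())
            .add(controlTuple);
    }

    // Tuples to deliver downstream at the window boundary, in arrival order.
    public Set<Object> tuplesFor(long windowId) {
        return tuplesByWindow.getOrDefault(windowId, new LinkedHashSet<>());
    }

    // On commit, drop bookkeeping for all windows below the committed window ID.
    public void committed(long committedWindowId) {
        tuplesByWindow.headMap(committedWindowId).clear();
    }
}
```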


  was:

The behavior should be as follow:

- The control tuples should only be sent to downstream at streaming window 
boundaries
- The control tuples should be sent to all partitions downstream
- The control tuples should be sent in the same order of arrival.
- Within a streaming window, do not send the same control tuple twice, even if 
the same control tuple is received multiple times within that window. This is 
possible if the operator has two input ports. (The LinkedHashMap should be 
easily able to do ensure both order and uniqueness.)
- The delivery of control tuples needs to stop at DelayOperator. 
- When a streaming window is committed, remove the associated LinkedHashMap 
that belong to windows with IDs that are less than the committed window
- It's safe to assume the control tuples are rare enough and can fit in memory

This will involve an additional MessageType to represent a custom control 
tuple. 
We probably need to have a data structure (possibly a LinkedHashMap) per 
streaming window that stores the control tuple in the buffer server.



> Delivery of Custom Control Tuples
> -
>
> Key: APEXCORE-581
> URL: https://issues.apache.org/jira/browse/APEXCORE-581
> Project: Apache Apex Core
>  Issue Type: Sub-task
>Reporter: David Yan
>
> The behavior should be as follows:
> - The control tuples should only be sent to downstream at streaming window 
> boundaries
> - The control tuples should be sent to all partitions downstream
> - The control tuples should be sent in the same order of arrival.
> - Within a streaming window, do not send the same control tuple twice, even 
> if the same control tuple is received multiple times within that window. This 
> is possible if the operator has two input ports. (The LinkedHashMap should be 
> easily able to ensure both order and uniqueness.)
> - The delivery of control tuples needs to stop at DelayOperator. 
> - When a streaming window is committed, remove the associated LinkedHashMap 
> that belong to windows with IDs that are less than the committed window
> - It's safe to assume the control tuples are rare enough and can fit in memory
> This will involve an additional MessageType to represent a custom control 
> tuple. 
> We probably need to have a data structure (possibly a LinkedHashMap) per 
> streaming window that stores the control tuple in the buffer server.





Re: [DISCUSSION] Custom Control Tuples

2016-11-30 Thread Sandesh Hegde
I am interested in working on the following subtask

https://issues.apache.org/jira/browse/APEXCORE-581

Thanks


On Wed, Nov 30, 2016 at 2:07 PM David Yan  wrote:

> I have created an umbrella ticket for control tuple support:
>
> https://issues.apache.org/jira/browse/APEXCORE-579
>
> Currently it has two subtasks. Please have a look at them and see whether
> I'm missing anything or if you have anything to add. You are welcome to add
> more subtasks or comment on the existing subtasks.
>
> We would like to kick start the implementation soon.
>
> Thanks!
>
> David
>
> On Mon, Nov 28, 2016 at 5:22 PM, Bhupesh Chawda 
> wrote:
>
> > +1 for the plan.
> >
> > I would be interested in contributing to this feature.
> >
> > ~ Bhupesh
> >
> > On Nov 29, 2016 03:26, "Sandesh Hegde"  wrote:
> >
> > > I am interested in contributing to this feature.
> > >
> > > On Mon, Nov 28, 2016 at 1:54 PM David Yan 
> wrote:
> > >
> > > > I think we should probably go ahead with option 1 since this works
> with
> > > > most use cases and prevents developers from shooting themselves in
> the
> > > foot
> > > > in terms of idempotency.
> > > >
> > > > We can have a configuration property that enables option 2 later if
> we
> > > have
> > > > concrete use cases that call for it.
> > > >
> > > > Please share your thoughts if you think you don't agree with this
> plan.
> > > > Also, please indicate if you're interested in contributing to this
> > > feature.
> > > >
> > > > David
> > > >
> > > > On Sun, Nov 27, 2016 at 9:02 PM, Bhupesh Chawda <
> > bhup...@datatorrent.com
> > > >
> > > > wrote:
> > > >
> > > > > It appears that option 1 is more favored due to unavailability of a
> > use
> > > > > case which could use option 2.
> > > > >
> > > > > However, option 2 is problematic in specific cases, like presence
> of
> > > > > multiple input ports for example. In case of a linear DAG where
> > control
> > > > > tuples are flowing in order with the data tuples, it should not be
> > > > > difficult to guarantee idempotency. For example, cases where there
> > > could
> > > > be
> > > > > multiple changes in behavior of an operator during a single window,
> > it
> > > > > should not wait until end window for these changes to take effect.
> > > Since,
> > > > > we don't have a concrete use case right now, perhaps we do not want
> > to
> > > go
> > > > > that road. This feature should be available through a platform
> > > attribute
> > > > > (may be at a later point in time) where the default is option 1.
> > > > >
> > > > > I think option 1 is suitable for a starting point in the
> > implementation
> > > > of
> > > > > this feature and we should proceed with it.
> > > > >
> > > > > ~ Bhupesh
> > > > >
> > > > >
> > > > >
> > > > > On Fri, Nov 11, 2016 at 12:59 AM, David Yan  >
> > > > wrote:
> > > > >
> > > > > > Good question Tushar. The callback should be called only once.
> > > > > > The way to implement this is to keep a list of control tuple
> hashes
> > > for
> > > > > the
> > > > > > given streaming window and only do the callback when the operator
> > has
> > > > not
> > > > > > seen it before.
> > > > > >
> > > > > > Other thoughts?
> > > > > >
> > > > > > David
> > > > > >
> > > > > > On Thu, Nov 10, 2016 at 9:32 AM, Tushar Gosavi <
> > > tus...@datatorrent.com
> > > > >
> > > > > > wrote:
> > > > > >
> > > > > > > Hi David,
> > > > > > >
> > > > > > > What would be the behaviour in case where we have a DAG with
> > > > following
> > > > > > > operators, the number in bracket is number of partitions, X is
> > NxM
> > > > > > > partitioning.
> > > > > > > A(1) X B(4) X C(2)
> > > > > > >
> > > > > > > If A sends a control tuple, it will be sent to all 4 partition
> of
> > > B,
> > > > > > > and from each partition from B it goes to C, i.e each partition
> > of
> > > C
> > > > > > > will receive same control tuple originated from A multiple
> times
> > > > > > > (number of upstream partitions of C). In this case will the
> > > callback
> > > > > > > function get called multiple times or just once.
> > > > > > >
> > > > > > > -Tushar.
> > > > > > >
> > > > > > >
> > > > > > > On Fri, Nov 4, 2016 at 12:14 AM, David Yan <
> > da...@datatorrent.com>
> > > > > > wrote:
> > > > > > > > Hi Bhupesh,
> > > > > > > >
> > > > > > > > Since each input port has its own incoming control tuple, I
> > would
> > > > > > imagine
> > > > > > > > there would be an additional DefaultInputPort.processControl
> > > method
> > > > > > that
> > > > > > > > operator developers can override.
> > > > > > > > If we go for option 1, my thinking is that the control tuples
> > > would
> > > > > > > always
> > > > > > > > be delivered at the next window boundary, even if the emit
> > method
> > > > is
> > > > > > > called
> > > > > > > > within a window.
> > > > > > > >
> > > > > > > > David
> > > > > > > >
> > > > > > > > On Thu, Nov 3, 2016 at 

[GitHub] apex-malhar pull request #519: APEXMALHAR-2362 #resolve clearing the removed...

2016-11-30 Thread davidyan74
GitHub user davidyan74 opened a pull request:

https://github.com/apache/apex-malhar/pull/519

APEXMALHAR-2362 #resolve clearing the removedSets at endWindow in 
SpillableSetMultimapImpl

@brightchen please review and merge

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/davidyan74/apex-malhar APEXMALHAR-2362

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/apex-malhar/pull/519.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #519


commit 87e4cbddaf82c72d6a6b08bbda3b33b25fb01765
Author: David Yan 
Date:   2016-11-30T23:18:46Z

APEXMALHAR-2362 #resolve clearing the removedSets at endWindow in 
SpillableSetMultimapImpl






[jira] [Moved] (APEXMALHAR-2362) SpillableSetMulitmapImpl.removedSets keeps growing

2016-11-30 Thread David Yan (JIRA)

 [ 
https://issues.apache.org/jira/browse/APEXMALHAR-2362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Yan moved APEXCORE-582 to APEXMALHAR-2362:


Workflow: Default workflow, editable Closed status  (was: jira)
 Key: APEXMALHAR-2362  (was: APEXCORE-582)
 Project: Apache Apex Malhar  (was: Apache Apex Core)

> SpillableSetMulitmapImpl.removedSets keeps growing
> --
>
> Key: APEXMALHAR-2362
> URL: https://issues.apache.org/jira/browse/APEXMALHAR-2362
> Project: Apache Apex Malhar
>  Issue Type: Bug
>Reporter: David Yan
>Assignee: David Yan
>
> That list is only added to but not removed and it will grow over time.





[jira] [Created] (APEXCORE-582) SpillableSetMulitmapImpl.removedSets keeps growing

2016-11-30 Thread David Yan (JIRA)
David Yan created APEXCORE-582:
--

 Summary: SpillableSetMulitmapImpl.removedSets keeps growing
 Key: APEXCORE-582
 URL: https://issues.apache.org/jira/browse/APEXCORE-582
 Project: Apache Apex Core
  Issue Type: Bug
Reporter: David Yan
Assignee: David Yan


That list is only added to but not removed and it will grow over time.





[jira] [Commented] (APEXMALHAR-2361) Optimise SpillableWindowedKeyedStorage remove(Window) to improve the performance

2016-11-30 Thread David Yan (JIRA)

[ 
https://issues.apache.org/jira/browse/APEXMALHAR-2361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15710094#comment-15710094
 ] 

David Yan commented on APEXMALHAR-2361:
---

It makes sense. This potentially skips a lot of lookups because we can simply 
delete the windows and not worry about the keys in the windows.

> Optimise SpillableWindowedKeyedStorage remove(Window) to improve the 
> performance
> 
>
> Key: APEXMALHAR-2361
> URL: https://issues.apache.org/jira/browse/APEXMALHAR-2361
> Project: Apache Apex Malhar
>  Issue Type: Improvement
>Reporter: bright chen
>Assignee: bright chen
>   Original Estimate: 120h
>  Remaining Estimate: 120h
>
> Currently, SpillableWindowedKeyedStorage.remove(Window) goes through each 
> key and marks all of them as deleted. This is expensive when there are 
> lots of keys, especially when the entries have already spilled out of memory 
> (the common case when remove() is called).
> Suggestion: mark the whole window as deleted. Once a window is marked as 
> deleted, adding or updating entries in that window is no longer allowed 
> (this matches the requirement, as remove(Window) is only called after the 
> allowed lateness).





[jira] [Commented] (APEXMALHAR-2339) Windowed Operator benchmarking

2016-11-30 Thread bright chen (JIRA)

[ 
https://issues.apache.org/jira/browse/APEXMALHAR-2339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15710031#comment-15710031
 ] 

bright chen commented on APEXMALHAR-2339:
-

SpillableWindowedKeyedStorage.remove(Window) goes through each entry of the 
window and marks it as deleted, which can be expensive. I suggest optimizing it. 
See https://issues.apache.org/jira/browse/APEXMALHAR-2361

> Windowed Operator benchmarking
> --
>
> Key: APEXMALHAR-2339
> URL: https://issues.apache.org/jira/browse/APEXMALHAR-2339
> Project: Apache Apex Malhar
>  Issue Type: Task
>Reporter: bright chen
>Assignee: bright chen
> Attachments: Screen Shot 2016-11-21 at 10.34.38 AM.png
>
>






[jira] [Created] (APEXMALHAR-2361) Optimise SpillableWindowedKeyedStorage remove(Window) to improve the performance

2016-11-30 Thread bright chen (JIRA)
bright chen created APEXMALHAR-2361:
---

 Summary: Optimise SpillableWindowedKeyedStorage remove(Window) to 
improve the performance
 Key: APEXMALHAR-2361
 URL: https://issues.apache.org/jira/browse/APEXMALHAR-2361
 Project: Apache Apex Malhar
  Issue Type: Improvement
Reporter: bright chen
Assignee: bright chen


Currently, SpillableWindowedKeyedStorage.remove(Window) goes through each 
key and marks all of them as deleted. This is expensive when there are lots 
of keys, especially when the entries have already spilled out of memory (the 
common case when remove() is called).

Suggestion: mark the whole window as deleted. Once a window is marked as 
deleted, adding or updating entries in that window is no longer allowed (this 
matches the requirement, as remove(Window) is only called after the allowed 
lateness).
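The window-level deletion suggested above can be sketched as follows. This is a minimal, self-contained illustration with hypothetical names — `WindowTombstoneStorage` is not the actual Malhar class, and the real storage is spillable rather than an in-memory map:

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch -- not the actual Malhar class. Instead of iterating
// every key of a window on remove(Window), keep a tombstone set of removed
// window IDs so removal is O(1).
public class WindowTombstoneStorage {
  private final Map<Long, Map<String, String>> data = new HashMap<>();
  private final Set<Long> removedWindows = new HashSet<>();

  public void put(long windowId, String key, String value) {
    if (removedWindows.contains(windowId)) {
      // remove(Window) is only called after the allowed lateness, so
      // rejecting further writes matches the stated requirement
      throw new IllegalStateException("window " + windowId + " was removed");
    }
    data.computeIfAbsent(windowId, w -> new HashMap<>()).put(key, value);
  }

  // O(1) tombstone instead of marking each key as deleted
  public void remove(long windowId) {
    removedWindows.add(windowId);
    data.remove(windowId);
  }

  public String get(long windowId, String key) {
    Map<String, String> m = data.get(windowId);
    return m == null ? null : m.get(key);
  }
}
```

The key saving is that `remove` no longer touches entries that may already have spilled to disk.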





[jira] [Commented] (APEXMALHAR-2359) Optimise fire trigger to avoid go through all data

2016-11-30 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/APEXMALHAR-2359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15709958#comment-15709958
 ] 

ASF GitHub Bot commented on APEXMALHAR-2359:


GitHub user brightchen opened a pull request:

https://github.com/apache/apex-malhar/pull/518

APEXMALHAR-2359 #resolve #comment Optimise fire trigger to avoid go t…

…hrough all data

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/brightchen/apex-malhar APEXMALHAR-2359

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/apex-malhar/pull/518.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #518


commit 665452248dfd437c176e5364665266839943c30b
Author: brightchen 
Date:   2016-11-29T23:05:09Z

APEXMALHAR-2359 #resolve #comment Optimise fire trigger to avoid go through 
all data




> Optimise fire trigger to avoid go through all data
> --
>
> Key: APEXMALHAR-2359
> URL: https://issues.apache.org/jira/browse/APEXMALHAR-2359
> Project: Apache Apex Malhar
>  Issue Type: Improvement
>Reporter: bright chen
>Assignee: bright chen
>   Original Estimate: 144h
>  Remaining Estimate: 144h
>
> KeyedWindowedOperatorImpl.fireNormalTrigger(Window, boolean) currently goes 
> through each window and key to check the value. The data collection can be 
> very large, as the discard period can be relatively long. If 
> fireOnlyUpdatedPanes is false, there is probably not much room for 
> improvement. But if fireOnlyUpdatedPanes is true, we don't have to go 
> through the whole data collection; we only need to visit the windows and 
> keys that were touched after the last trigger.
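The idea above — visiting only the panes touched since the last trigger when fireOnlyUpdatedPanes is true — can be sketched like this. `UpdatedPaneTracker` is a hypothetical name; the real operator works against spillable storage:

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of the fireOnlyUpdatedPanes idea -- not the actual
// operator code. Record which keys changed per window; a trigger then visits
// only those keys instead of scanning the whole data collection.
public class UpdatedPaneTracker {
  private final Map<Long, Set<String>> updatedSinceLastTrigger = new HashMap<>();

  // Called on every add/update of a (window, key) pane
  public void recordUpdate(long windowId, String key) {
    updatedSinceLastTrigger
        .computeIfAbsent(windowId, w -> new LinkedHashSet<>())
        .add(key);
  }

  // Returns only the keys touched since the last trigger, then resets them
  public Set<String> keysToFire(long windowId) {
    Set<String> updated = updatedSinceLastTrigger.remove(windowId);
    return updated == null ? Collections.emptySet() : updated;
  }
}
```

A trigger that fires only these keys never has to iterate panes that were not modified, which is where the savings come from when most panes are cold.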





[GitHub] apex-malhar pull request #518: APEXMALHAR-2359 #resolve #comment Optimise fi...

2016-11-30 Thread brightchen
GitHub user brightchen opened a pull request:

https://github.com/apache/apex-malhar/pull/518

APEXMALHAR-2359 #resolve #comment Optimise fire trigger to avoid go t…

…hrough all data

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/brightchen/apex-malhar APEXMALHAR-2359

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/apex-malhar/pull/518.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #518


commit 665452248dfd437c176e5364665266839943c30b
Author: brightchen 
Date:   2016-11-29T23:05:09Z

APEXMALHAR-2359 #resolve #comment Optimise fire trigger to avoid go through 
all data




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


Re: [DISCUSSION] Custom Control Tuples

2016-11-30 Thread David Yan
I have created an umbrella ticket for control tuple support:

https://issues.apache.org/jira/browse/APEXCORE-579

Currently it has two subtasks. Please have a look at them and see whether
I'm missing anything or if you have anything to add. You are welcome to add
more subtasks or comment on the existing subtasks.

We would like to kick start the implementation soon.

Thanks!

David

On Mon, Nov 28, 2016 at 5:22 PM, Bhupesh Chawda 
wrote:

> +1 for the plan.
>
> I would be interested in contributing to this feature.
>
> ~ Bhupesh
>
> On Nov 29, 2016 03:26, "Sandesh Hegde"  wrote:
>
> > I am interested in contributing to this feature.
> >
> > On Mon, Nov 28, 2016 at 1:54 PM David Yan  wrote:
> >
> > > I think we should probably go ahead with option 1 since this works with
> > > most use cases and prevents developers from shooting themselves in the
> > foot
> > > in terms of idempotency.
> > >
> > > We can have a configuration property that enables option 2 later if we
> > have
> > > concrete use cases that call for it.
> > >
> > > Please share your thoughts if you think you don't agree with this plan.
> > > Also, please indicate if you're interested in contributing to this
> > feature.
> > >
> > > David
> > >
> > > On Sun, Nov 27, 2016 at 9:02 PM, Bhupesh Chawda <
> bhup...@datatorrent.com
> > >
> > > wrote:
> > >
> > > > It appears that option 1 is more favored due to the unavailability of
> > > > a use case which could use option 2.
> > > >
> > > > However, option 2 is problematic in specific cases, like the presence
> > > > of multiple input ports. In the case of a linear DAG where control
> > > > tuples flow in order with the data tuples, it should not be difficult
> > > > to guarantee idempotency. For example, in cases where there could be
> > > > multiple changes in the behavior of an operator during a single
> > > > window, it should not wait until end window for these changes to take
> > > > effect. Since we don't have a concrete use case right now, perhaps we
> > > > do not want to go down that road. This feature should be available
> > > > through a platform attribute (maybe at a later point in time) where
> > > > the default is option 1.
> > > >
> > > > I think option 1 is suitable for a starting point in the
> implementation
> > > of
> > > > this feature and we should proceed with it.
> > > >
> > > > ~ Bhupesh
> > > >
> > > >
> > > >
> > > > On Fri, Nov 11, 2016 at 12:59 AM, David Yan 
> > > wrote:
> > > >
> > > > > Good question Tushar. The callback should be called only once.
> > > > > The way to implement this is to keep a list of control tuple hashes
> > for
> > > > the
> > > > > given streaming window and only do the callback when the operator
> has
> > > not
> > > > > seen it before.
> > > > >
> > > > > Other thoughts?
> > > > >
> > > > > David
> > > > >
> > > > > On Thu, Nov 10, 2016 at 9:32 AM, Tushar Gosavi <
> > tus...@datatorrent.com
> > > >
> > > > > wrote:
> > > > >
> > > > > > Hi David,
> > > > > >
> > > > > > What would be the behaviour in the case where we have a DAG with
> > > > > > the following operators, where the number in brackets is the
> > > > > > number of partitions and X is NxM partitioning:
> > > > > > A(1) X B(4) X C(2)
> > > > > >
> > > > > > If A sends a control tuple, it will be sent to all 4 partitions of
> > > > > > B, and from each partition of B it goes to C, i.e. each partition
> > > > > > of C will receive the same control tuple originated from A
> > > > > > multiple times (the number of upstream partitions of C). In this
> > > > > > case, will the callback function get called multiple times or
> > > > > > just once?
> > > > > >
> > > > > > -Tushar.
> > > > > >
> > > > > >
> > > > > > On Fri, Nov 4, 2016 at 12:14 AM, David Yan <
> da...@datatorrent.com>
> > > > > wrote:
> > > > > > > Hi Bhupesh,
> > > > > > >
> > > > > > > Since each input port has its own incoming control tuple, I
> would
> > > > > imagine
> > > > > > > there would be an additional DefaultInputPort.processControl
> > method
> > > > > that
> > > > > > > operator developers can override.
> > > > > > > If we go for option 1, my thinking is that the control tuples
> > would
> > > > > > always
> > > > > > > be delivered at the next window boundary, even if the emit
> method
> > > is
> > > > > > called
> > > > > > > within a window.
> > > > > > >
> > > > > > > David
> > > > > > >
> > > > > > > On Thu, Nov 3, 2016 at 1:46 AM, Bhupesh Chawda <
> > > > > bhup...@datatorrent.com>
> > > > > > > wrote:
> > > > > > >
> > > > > > >> I have a question regarding the callback for a control tuple.
> > Will
> > > > it
> > > > > be
> > > > > > >> similar to InputPort::process() method? Something like
> > > > > > >> InputPort::processControlTuple(t)
> > > > > > >> ? Or will it be a method of the operator similar to
> > beginWindow()?
> > > > > > >>
> > > > > > >> When we say that the control tuple will be delivered at 

[jira] [Updated] (APEXCORE-581) Delivery of Custom Control Tuples

2016-11-30 Thread David Yan (JIRA)

 [ 
https://issues.apache.org/jira/browse/APEXCORE-581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Yan updated APEXCORE-581:
---
Description: 

The behavior should be as follows:

- The control tuples should only be sent downstream at streaming window 
boundaries
- The control tuples should be sent to all partitions downstream
- The control tuples should be sent in the same order of arrival.
- Within a streaming window, do not send the same control tuple twice, even if 
the same control tuple is received multiple times within that window. This is 
possible if the operator has two input ports. (The LinkedHashMap should easily 
be able to ensure both order and uniqueness.)
- The delivery of control tuples needs to stop at DelayOperator. 
- When a streaming window is committed, remove the associated LinkedHashMaps 
that belong to windows with IDs less than the committed window's
- It's safe to assume the control tuples are rare enough to fit in memory

This will involve an additional MessageType to represent a custom control 
tuple. 
We probably need to have a data structure (possibly a LinkedHashMap) per 
streaming window that stores the control tuple in the buffer server.


  was:
This will involve an additional MessageType to represent a custom control 
tuple. 
We probably need to have a data structure (possibly a LinkedHashMap) per 
streaming window that stores the control tuple in the buffer server.

The behavior should be as follows:

- The control tuples should only be sent downstream at streaming window 
boundaries
- The control tuples should be sent to all partitions downstream
- The control tuples should be sent in the same order of arrival.
- Within a streaming window, do not send the same control tuple twice, even if 
the same control tuple is received multiple times within that window. This is 
possible if the operator has two input ports. (The LinkedHashMap should easily 
be able to ensure both order and uniqueness.)
- The delivery of control tuples needs to stop at DelayOperator. 
- When a streaming window is committed, remove the associated LinkedHashMaps 
that belong to windows with IDs less than the committed window's
- It's safe to assume the control tuples are rare enough to fit in memory



> Delivery of Custom Control Tuples
> -
>
> Key: APEXCORE-581
> URL: https://issues.apache.org/jira/browse/APEXCORE-581
> Project: Apache Apex Core
>  Issue Type: Sub-task
>Reporter: David Yan
>
> The behavior should be as follows:
> - The control tuples should only be sent downstream at streaming window 
> boundaries
> - The control tuples should be sent to all partitions downstream
> - The control tuples should be sent in the same order of arrival.
> - Within a streaming window, do not send the same control tuple twice, even 
> if the same control tuple is received multiple times within that window. This 
> is possible if the operator has two input ports. (The LinkedHashMap should 
> easily be able to ensure both order and uniqueness.)
> - The delivery of control tuples needs to stop at DelayOperator. 
> - When a streaming window is committed, remove the associated LinkedHashMaps 
> that belong to windows with IDs less than the committed window's
> - It's safe to assume the control tuples are rare enough to fit in memory
> This will involve an additional MessageType to represent a custom control 
> tuple. 
> We probably need to have a data structure (possibly a LinkedHashMap) per 
> streaming window that stores the control tuple in the buffer server.
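The per-window dedup-and-order behavior listed above can be sketched with a LinkedHashSet per window. This is a minimal in-memory illustration with a hypothetical `ControlTupleBuffer` name — not the actual buffer-server code:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of the behavior above -- not the actual buffer-server
// code. A LinkedHashSet per streaming window keeps control tuples unique
// while preserving arrival order.
public class ControlTupleBuffer {
  private final Map<Long, LinkedHashSet<Object>> perWindow = new TreeMap<>();

  // Returns true only the first time a given tuple is seen in this window
  public boolean offer(long windowId, Object controlTuple) {
    return perWindow
        .computeIfAbsent(windowId, w -> new LinkedHashSet<>())
        .add(controlTuple);
  }

  // Delivered at the window boundary: arrival order kept, duplicates dropped
  public List<Object> drain(long windowId) {
    LinkedHashSet<Object> set = perWindow.remove(windowId);
    return set == null ? Collections.emptyList() : new ArrayList<>(set);
  }

  // On commit, drop state for all windows with IDs below the committed one
  public void committed(long windowId) {
    perWindow.keySet().removeIf(w -> w < windowId);
  }
}
```

LinkedHashSet covers both requirements at once: `add()` returns false for a duplicate, and iteration follows insertion order.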





[jira] [Updated] (APEXCORE-580) Interface for processing and emitting control tuples

2016-11-30 Thread David Yan (JIRA)

 [ 
https://issues.apache.org/jira/browse/APEXCORE-580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Yan updated APEXCORE-580:
---
Description: 
DefaultOutputPort needs to have an emitControl method that operator code can 
call to emit a control tuple.

DefaultInputPort needs to have a processControl method so that the operator 
can act on the arrival of a control tuple.

Similar to a regular data tuple, we also need to provide a way for the user to 
supply custom serialization for the control tuple.

We need to design this so that the default behavior is to propagate control 
tuples to all output ports, while allowing the user to easily change that 
behavior. The user can selectively propagate control tuples to certain output 
ports, or block the propagation altogether.

  was:
DefaultOutputPort needs to have an emitControl method that operator code can 
call to emit a control tuple.

DefaultInputPort needs to have a processControl method so that the operator 
can act on the arrival of a control tuple.

We need to design this so that the default behavior is to propagate control 
tuples to all output ports, while allowing the user to easily change that 
behavior. The user can selectively propagate control tuples to certain output 
ports, or block the propagation altogether.


> Interface for processing and emitting control tuples
> 
>
> Key: APEXCORE-580
> URL: https://issues.apache.org/jira/browse/APEXCORE-580
> Project: Apache Apex Core
>  Issue Type: Sub-task
>Reporter: David Yan
>
> DefaultOutputPort needs to have an emitControl method that operator code 
> can call to emit a control tuple.
> DefaultInputPort needs to have a processControl method so that the operator 
> can act on the arrival of a control tuple.
> Similar to a regular data tuple, we also need to provide a way for the user 
> to supply custom serialization for the control tuple.
> We need to design this so that the default behavior is to propagate control 
> tuples to all output ports, while allowing the user to easily change 
> that behavior. The user can selectively propagate control tuples to certain 
> output ports, or block the propagation altogether.
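The proposed port methods could look roughly like this. The names emitControl/processControl come from the discussion above, but these signatures are illustrative only, not the final Apex Core API:

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: the names emitControl/processControl come from the
// discussion, but these signatures are NOT the final Apex Core API.
public class ControlPortSketch {
  interface ControlAwareInputPort<T, C> {
    void process(T tuple);        // existing data path
    void processControl(C tuple); // proposed: react to an incoming control tuple
  }

  // A trivial port that records what it received, for illustration
  static class RecordingPort implements ControlAwareInputPort<String, String> {
    final List<String> data = new ArrayList<>();
    final List<String> control = new ArrayList<>();
    @Override public void process(String t) { data.add(t); }
    @Override public void processControl(String c) { control.add(c); }
  }
}
```

Separating the two callbacks keeps the data path unchanged while giving operators a distinct hook for control tuples.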





[jira] [Updated] (APEXCORE-580) Interface for processing and emitting control tuples

2016-11-30 Thread David Yan (JIRA)

 [ 
https://issues.apache.org/jira/browse/APEXCORE-580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Yan updated APEXCORE-580:
---
Summary: Interface for processing and emitting control tuples  (was: Add 
methods for processing and emitting control tuples)

> Interface for processing and emitting control tuples
> 
>
> Key: APEXCORE-580
> URL: https://issues.apache.org/jira/browse/APEXCORE-580
> Project: Apache Apex Core
>  Issue Type: Sub-task
>Reporter: David Yan
>
> DefaultOutputPort needs to have an emitControl method that operator code 
> can call to emit a control tuple.
> DefaultInputPort needs to have a processControl method so that the operator 
> can act on the arrival of a control tuple.
> We need to design this so that the default behavior is to propagate control 
> tuples to all output ports, while allowing the user to easily change 
> that behavior. The user can selectively propagate control tuples to certain 
> output ports, or block the propagation altogether.





[jira] [Updated] (APEXCORE-580) Add methods for processing and emitting control tuples

2016-11-30 Thread David Yan (JIRA)

 [ 
https://issues.apache.org/jira/browse/APEXCORE-580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Yan updated APEXCORE-580:
---
Description: 
DefaultOutputPort needs to have an emitControl method that operator code can 
call to emit a control tuple.

DefaultInputPort needs to have a processControl method so that the operator 
can act on the arrival of a control tuple.

We need to design this so that the default behavior is to propagate control 
tuples to all output ports, while allowing the user to easily change that 
behavior. The user can selectively propagate control tuples to certain output 
ports, or block the propagation altogether.

  was:
DefaultOutputPort needs to have an emitControl method that operator code can 
call to emit a control tuple.

DefaultInputPort needs to have a processControl method so that the operator 
can act on the arrival of a control tuple. 




> Add methods for processing and emitting control tuples
> --
>
> Key: APEXCORE-580
> URL: https://issues.apache.org/jira/browse/APEXCORE-580
> Project: Apache Apex Core
>  Issue Type: Sub-task
>Reporter: David Yan
>
> DefaultOutputPort needs to have an emitControl method that operator code 
> can call to emit a control tuple.
> DefaultInputPort needs to have a processControl method so that the operator 
> can act on the arrival of a control tuple.
> We need to design this so that the default behavior is to propagate control 
> tuples to all output ports, while allowing the user to easily change 
> that behavior. The user can selectively propagate control tuples to certain 
> output ports, or block the propagation altogether.





[jira] [Created] (APEXCORE-581) Delivery of Custom Control Tuples

2016-11-30 Thread David Yan (JIRA)
David Yan created APEXCORE-581:
--

 Summary: Delivery of Custom Control Tuples
 Key: APEXCORE-581
 URL: https://issues.apache.org/jira/browse/APEXCORE-581
 Project: Apache Apex Core
  Issue Type: Sub-task
Reporter: David Yan


This will involve an additional MessageType to represent a custom control 
tuple. 
We probably need to have a data structure (possibly a LinkedHashMap) per 
streaming window that stores the control tuple in the buffer server.

The behavior should be as follows:

- The control tuples should only be sent downstream at streaming window 
boundaries
- The control tuples should be sent to all partitions downstream
- The control tuples should be sent in the same order of arrival.
- Within a streaming window, do not send the same control tuple twice, even if 
the same control tuple is received multiple times within that window. This is 
possible if the operator has two input ports. (The LinkedHashMap should easily 
be able to ensure both order and uniqueness.)
- The delivery of control tuples needs to stop at DelayOperator. 
- When a streaming window is committed, remove the associated LinkedHashMaps 
that belong to windows with IDs less than the committed window's
- It's safe to assume the control tuples are rare enough to fit in memory






[jira] [Created] (APEXCORE-580) Add methods for processing and emitting control tuples

2016-11-30 Thread David Yan (JIRA)
David Yan created APEXCORE-580:
--

 Summary: Add methods for processing and emitting control tuples
 Key: APEXCORE-580
 URL: https://issues.apache.org/jira/browse/APEXCORE-580
 Project: Apache Apex Core
  Issue Type: Sub-task
Reporter: David Yan


DefaultOutputPort needs to have an emitControl method that operator code can 
call to emit a control tuple.

DefaultInputPort needs to have a processControl method so that the operator 
can act on the arrival of a control tuple. 







Re: "ExcludeNodes" for an Apex application

2016-11-30 Thread Sandesh Hegde
Apex has automatic blacklisting of troublesome nodes; please take a
look at the following attributes:

MAX_CONSECUTIVE_CONTAINER_FAILURES_FOR_BLACKLIST
https://www.datatorrent.com/docs/apidocs/com/datatorrent/api/Context.DAGContext.html#MAX_CONSECUTIVE_CONTAINER_FAILURES_FOR_BLACKLIST

BLACKLISTED_NODE_REMOVAL_TIME_MILLIS
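Assuming the usual `dt.attr.` property prefix for DAG attributes (an assumption — check the Apex configuration docs for the exact path and defaults), the attributes above could be set in the application properties file along these lines:

```xml
<configuration>
  <!-- Blacklist a node after 3 consecutive container failures (illustrative value) -->
  <property>
    <name>dt.attr.MAX_CONSECUTIVE_CONTAINER_FAILURES_FOR_BLACKLIST</name>
    <value>3</value>
  </property>
  <!-- Remove a node from the blacklist after one hour (illustrative value) -->
  <property>
    <name>dt.attr.BLACKLISTED_NODE_REMOVAL_TIME_MILLIS</name>
    <value>3600000</value>
  </property>
</configuration>
```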

Thanks



On Wed, Nov 30, 2016 at 12:56 PM Munagala Ramanath 
wrote:

Not sure if this is what Milind had in mind but we often run into
situations where the dev group
working with Apex has no control over cluster configuration -- to make any
changes to the cluster they need to
go through an elaborate process that can take many days.

Meanwhile, if they notice that a particular node is consistently causing
problems for their
app, having a simple way to exclude it would be very helpful since it gives
them a way
to bypass communication and process issues within their own organization.

Ram

On Wed, Nov 30, 2016 at 10:58 AM, Sanjay Pujare 
wrote:

> To me both use cases appear to be generic resource management use cases.
> For example, a randomly rebooting node is not good for any purpose esp.
> long running apps so it is a bit of a stretch to imagine that these nodes
> will be acceptable for some batch jobs in Yarn. So such a node should be
> marked “Bad” or Unavailable in Yarn itself.
>
> Second use case is also typical anti-affinity use case which ideally
> should be implemented in Yarn – Milind’s example can also apply to
non-Apex
> batch jobs. In any case it looks like Yarn still doesn’t have it (
> https://issues.apache.org/jira/browse/YARN-1042) so if Apex needs it we
> will need to do it ourselves.
>
> On 11/30/16, 10:39 AM, "Munagala Ramanath"  wrote:
>
> But then, what's the solution to the 2 problem scenarios that Milind
> describes ?
>
> Ram
>
> On Wed, Nov 30, 2016 at 10:34 AM, Sanjay Pujare <
> san...@datatorrent.com>
> wrote:
>
> > I think “exclude nodes” and such is really the job of the resource
> manager
> > i.e. Yarn. So I am not sure taking over some of these tasks in Apex
> would
> > be very useful.
> >
> > I agree with Amol that apps should be node neutral. Resource
> management in
> > Yarn together with fault tolerance in Apex should minimize the need
> for
> > this feature although I am sure one can find use cases.
> >
> >
> > On 11/29/16, 10:41 PM, "Amol Kekre"  wrote:
> >
> > We do have this feature in Yarn, but that applies to all applications.
> > I am not sure if Yarn has anti-affinity. This feature may be used, but in
> > general there is danger in an application taking over resource allocation.
> > Another quirk is that big data apps should ideally be node-neutral. This is
> > a good idea, if we are able to carve out something where the need is app
> > specific.
> >
> > Thks
> > Amol
> >
> >
> > On Tue, Nov 29, 2016 at 10:00 PM, Milind Barve <
> mili...@gmail.com>
> > wrote:
> >
> > > We have seen 2 cases mentioned below, where, it would have
> been nice
> > if
> > > Apex allowed us to exclude a node from the cluster for an
> > application.
> > >
> > > 1. A node in the cluster had gone bad (was randomly rebooting)
> and
> > so an
> > > Apex app should not use it - other apps can use it as they
were
> > batch jobs.
> > > 2. A node is being used for a mission critical app (Could be
> an Apex
> > app
> > > itself), but another Apex app which is mission critical should
> not
> > be using
> > > resources on that node.
> > >
> > > Can we have a way in which, Stram and YARN can coordinate
> between
> > each
> > > other to not use a set of nodes for the application. It can be done
> > > in 2 ways -
> > >
> > > 1. Have a list of "exclude" nodes with Stram - when YARN allocates
> > > resources on either of these, Stram rejects them and gets resources
> > > allocated again from YARN
> > > 2. Have a list of nodes that can be used for an app - this can be a
> > > part of config. However, I don't think this would be the right way to
> > > do so, as we will need support from YARN as well. Further, this might
> > > be difficult to change at runtime if need be.
> > >
> > > Any thoughts?
> > >
> > >
> > > --
> > > ~Milind bee at gee mail dot com
> > >
> >
> >
> >
> >
>
>
>
>


Re: "ExcludeNodes" for an Apex application

2016-11-30 Thread Munagala Ramanath
Not sure if this is what Milind had in mind but we often run into
situations where the dev group
working with Apex has no control over cluster configuration -- to make any
changes to the cluster they need to
go through an elaborate process that can take many days.

Meanwhile, if they notice that a particular node is consistently causing
problems for their
app, having a simple way to exclude it would be very helpful since it gives
them a way
to bypass communication and process issues within their own organization.

Ram

On Wed, Nov 30, 2016 at 10:58 AM, Sanjay Pujare 
wrote:

> To me both use cases appear to be generic resource management use cases.
> For example, a randomly rebooting node is not good for any purpose esp.
> long running apps so it is a bit of a stretch to imagine that these nodes
> will be acceptable for some batch jobs in Yarn. So such a node should be
> marked “Bad” or Unavailable in Yarn itself.
>
> Second use case is also typical anti-affinity use case which ideally
> should be implemented in Yarn – Milind’s example can also apply to non-Apex
> batch jobs. In any case it looks like Yarn still doesn’t have it (
> https://issues.apache.org/jira/browse/YARN-1042) so if Apex needs it we
> will need to do it ourselves.
>
> On 11/30/16, 10:39 AM, "Munagala Ramanath"  wrote:
>
> But then, what's the solution to the 2 problem scenarios that Milind
> describes ?
>
> Ram
>
> On Wed, Nov 30, 2016 at 10:34 AM, Sanjay Pujare <
> san...@datatorrent.com>
> wrote:
>
> > I think “exclude nodes” and such is really the job of the resource
> manager
> > i.e. Yarn. So I am not sure taking over some of these tasks in Apex
> would
> > be very useful.
> >
> > I agree with Amol that apps should be node neutral. Resource
> management in
> > Yarn together with fault tolerance in Apex should minimize the need
> for
> > this feature although I am sure one can find use cases.
> >
> >
> > On 11/29/16, 10:41 PM, "Amol Kekre"  wrote:
> >
> > We do have this feature in Yarn, but that applies to all applications.
> > I am not sure if Yarn has anti-affinity. This feature may be used, but in
> > general there is danger in an application taking over resource allocation.
> > Another quirk is that big data apps should ideally be node-neutral. This is
> > a good idea, if we are able to carve out something where the need is app
> > specific.
> >
> > Thks
> > Amol
> >
> >
> > On Tue, Nov 29, 2016 at 10:00 PM, Milind Barve <
> mili...@gmail.com>
> > wrote:
> >
> > > We have seen 2 cases mentioned below, where, it would have
> been nice
> > if
> > > Apex allowed us to exclude a node from the cluster for an
> > application.
> > >
> > > 1. A node in the cluster had gone bad (was randomly rebooting)
> and
> > so an
> > > Apex app should not use it - other apps can use it as they were
> > batch jobs.
> > > 2. A node is being used for a mission critical app (Could be
> an Apex
> > app
> > > itself), but another Apex app which is mission critical should
> not
> > be using
> > > resources on that node.
> > >
> > > Can we have a way in which, Stram and YARN can coordinate
> between
> > each
> > > other to not use a set of nodes for the application. It can be done
> > > in 2 ways -
> > >
> > > 1. Have a list of "exclude" nodes with Stram - when YARN allocates
> > > resources on either of these, Stram rejects them and gets resources
> > > allocated again from YARN
> > > 2. Have a list of nodes that can be used for an app - this can be a
> > > part of config. However, I don't think this would be the right way to
> > > do so, as we will need support from YARN as well. Further, this might
> > > be difficult to change at runtime if need be.
> > >
> > > Any thoughts?
> > >
> > >
> > > --
> > > ~Milind bee at gee mail dot com
> > >
> >
> >
> >
> >
>
>
>
>


Re: [VOTE] Apache Apex Malhar Release 3.6.0 (RC1)

2016-11-30 Thread Siyuan Hua
+1

Verified checksums
Verified compilation
Verified build and test
Verified pi demo

On Wed, Nov 30, 2016 at 9:50 AM, Tushar Gosavi 
wrote:

> +1
>
> Verified checksums
> Verified compilation
>
> - Tushar.
>
>
> On Wed, Nov 30, 2016 at 7:43 PM, Thomas Weise  wrote:
> > Can folks please verify the release.
> >
> > Thanks
> >
> > --
> > sent from mobile
> > On Nov 26, 2016 6:32 PM, "Thomas Weise"  wrote:
> >
> >> Dear Community,
> >>
> >> Please vote on the following Apache Apex Malhar 3.6.0 release candidate.
> >>
> >> This is a source release with binary artifacts published to Maven.
> >>
> >> This release is based on Apex Core 3.4 and resolves 69 issues.
> >>
> >> The release adds first iteration of SQL support via Apache Calcite, an
> >> alternative Cassandra output operator (non-transactional, upsert based),
> >> enrichment operator, improvements to window storage and new user
> >> documentation for several operators along with many other enhancements
> and
> >> bug fixes.
> >>
> >> List of all issues fixed: https://s.apache.org/9b0t
> >> User documentation: http://apex.apache.org/docs/malhar-3.6/
> >>
> >> Staging directory:
> >> https://dist.apache.org/repos/dist/dev/apex/apache-apex-
> malhar-3.6.0-RC1/
> >> Source zip:
> >> https://dist.apache.org/repos/dist/dev/apex/apache-apex-
> >> malhar-3.6.0-RC1/apache-apex-malhar-3.6.0-source-release.zip
> >> Source tar.gz:
> >> https://dist.apache.org/repos/dist/dev/apex/apache-apex-
> >> malhar-3.6.0-RC1/apache-apex-malhar-3.6.0-source-release.tar.gz
> >> Maven staging repository:
> >> https://repository.apache.org/content/repositories/orgapacheapex-1020/
> >>
> >> Git source:
> >> https://git-wip-us.apache.org/repos/asf?p=apex-malhar.git;a=
> >> commit;h=refs/tags/v3.6.0-RC1
> >>  (commit: 43d524dc5d5326b8d94593901cad026528bb62a1)
> >>
> >> PGP key:
> >> http://pgp.mit.edu:11371/pks/lookup?op=vindex=t...@apache.org
> >> KEYS file:
> >> https://dist.apache.org/repos/dist/release/apex/KEYS
> >>
> >> More information at:
> >> http://apex.apache.org
> >>
> >> Please try the release and vote; vote will be open until Wed, 11/30 EOD
> PST
> >> considering the US holiday weekend.
> >>
> >> [ ] +1 approve (and what verification was done)
> >> [ ] -1 disapprove (and reason why)
> >>
> >> http://www.apache.org/foundation/voting.html
> >>
> >> How to verify release candidate:
> >>
> >> http://apex.apache.org/verification.html
> >>
> >> Thanks,
> >> Thomas
> >>
> >>
>


Re: "ExcludeNodes" for an Apex application

2016-11-30 Thread Amol Kekre
I agree, a randomly rebooting node is a Yarn issue. Even anti-affinity between
apps should be in Yarn in the long run. We could contribute the above jira.

Thks
Amol


On Wed, Nov 30, 2016 at 10:58 AM, Sanjay Pujare 
wrote:

> To me both use cases appear to be generic resource management use cases.
> For example, a randomly rebooting node is not good for any purpose esp.
> long running apps so it is a bit of a stretch to imagine that these nodes
> will be acceptable for some batch jobs in Yarn. So such a node should be
> marked “Bad” or Unavailable in Yarn itself.
>
> Second use case is also typical anti-affinity use case which ideally
> should be implemented in Yarn – Milind’s example can also apply to non-Apex
> batch jobs. In any case it looks like Yarn still doesn’t have it (
> https://issues.apache.org/jira/browse/YARN-1042) so if Apex needs it we
> will need to do it ourselves.
>
> On 11/30/16, 10:39 AM, "Munagala Ramanath"  wrote:
>
> But then, what's the solution to the 2 problem scenarios that Milind
> describes ?
>
> Ram
>
> On Wed, Nov 30, 2016 at 10:34 AM, Sanjay Pujare <
> san...@datatorrent.com>
> wrote:
>
> > I think “exclude nodes” and such is really the job of the resource
> manager
> > i.e. Yarn. So I am not sure taking over some of these tasks in Apex
> would
> > be very useful.
> >
> > I agree with Amol that apps should be node neutral. Resource
> management in
> > Yarn together with fault tolerance in Apex should minimize the need
> for
> > this feature although I am sure one can find use cases.
> >
> >
> > On 11/29/16, 10:41 PM, "Amol Kekre"  wrote:
> >
> > We do have this feature in Yarn, but that applies to all
> applications.
> > I am
> > not sure if Yarn has anti-affinity. This feature may be used,
> but in
> > general there is danger of an application taking over resource
> > allocation.
> > Another quirk is that big data apps should ideally be
> node-neutral.
> > This is
> > a good idea, if we are able to carve out something where need is
> app
> > specific.
> >
> > Thks
> > Amol
> >
> >
> > On Tue, Nov 29, 2016 at 10:00 PM, Milind Barve <
> mili...@gmail.com>
> > wrote:
> >
> > > We have seen 2 cases mentioned below, where, it would have
> been nice
> > if
> > > Apex allowed us to exclude a node from the cluster for an
> > application.
> > >
> > > 1. A node in the cluster had gone bad (was randomly rebooting)
> and
> > so an
> > > Apex app should not use it - other apps can use it as they were
> > batch jobs.
> > > 2. A node is being used for a mission critical app (Could be
> an Apex
> > app
> > > itself), but another Apex app which is mission critical should
> not
> > be using
> > > resources on that node.
> > >
> > > Can we have a way in which, Stram and YARN can coordinate
> between
> > each
> > > other to not use a set of nodes for the application. It can be done
> > > in 2 ways -
> > >
> > > 1. Have a list of "exclude" nodes with Stram - when YARN allocates
> > resources
> > > on either of these, STRAM rejects and gets resources allocated
> again
> > from
> > > YARN
> > > 2. Have a list of nodes that can be used for an app - This can
> be a
> > part of
> > > config. However, I don't think this would be the right way to do
> so as
> > we will
> > > need support from YARN as well. Further, this might be
> difficult to
> > change
> > > at runtime if need be.
> > >
> > > Any thoughts?
> > >
> > >
> > > --
> > > ~Milind bee at gee mail dot com
> > >
> >
> >
> >
> >
>
>
>
>


Re: "ExcludeNodes" for an Apex application

2016-11-30 Thread Sanjay Pujare
To me both use cases appear to be generic resource management use cases. For 
example, a randomly rebooting node is not good for any purpose esp. long 
running apps so it is a bit of a stretch to imagine that these nodes will be 
acceptable for some batch jobs in Yarn. So such a node should be marked “Bad” 
or Unavailable in Yarn itself.

Second use case is also typical anti-affinity use case which ideally should be 
implemented in Yarn – Milind’s example can also apply to non-Apex batch jobs. 
In any case it looks like Yarn still doesn’t have it 
(https://issues.apache.org/jira/browse/YARN-1042) so if Apex needs it we will 
need to do it ourselves.

On 11/30/16, 10:39 AM, "Munagala Ramanath"  wrote:

But then, what's the solution to the 2 problem scenarios that Milind
describes ?

Ram

On Wed, Nov 30, 2016 at 10:34 AM, Sanjay Pujare 
wrote:

> I think “exclude nodes” and such is really the job of the resource manager
> i.e. Yarn. So I am not sure taking over some of these tasks in Apex would
> be very useful.
>
> I agree with Amol that apps should be node neutral. Resource management in
> Yarn together with fault tolerance in Apex should minimize the need for
> this feature although I am sure one can find use cases.
>
>
> On 11/29/16, 10:41 PM, "Amol Kekre"  wrote:
>
> We do have this feature in Yarn, but that applies to all applications.
> I am
> not sure if Yarn has anti-affinity. This feature may be used, but in
> general there is danger of an application taking over resource
> allocation.
> Another quirk is that big data apps should ideally be node-neutral.
> This is
> a good idea, if we are able to carve out something where need is app
> specific.
>
> Thks
> Amol
>
>
> On Tue, Nov 29, 2016 at 10:00 PM, Milind Barve 
> wrote:
>
> > We have seen 2 cases mentioned below, where, it would have been nice
> if
> > Apex allowed us to exclude a node from the cluster for an
> application.
> >
> > 1. A node in the cluster had gone bad (was randomly rebooting) and
> so an
> > Apex app should not use it - other apps can use it as they were
> batch jobs.
> > 2. A node is being used for a mission critical app (Could be an Apex
> app
> > itself), but another Apex app which is mission critical should not
> be using
> > resources on that node.
> >
> > Can we have a way in which, Stram and YARN can coordinate between
> each
> > other to not use a set of nodes for the application. It can be done
> > in 2 ways -
> >
> > 1. Have a list of "exclude" nodes with Stram - when YARN allocates
> resources
> > on either of these, STRAM rejects and gets resources allocated again
> from
> > YARN
> > 2. Have a list of nodes that can be used for an app - This can be a
> part of
> > config. However, I don't think this would be the right way to do so as
> we will
> > need support from YARN as well. Further, this might be difficult to
> change
> at runtime if need be.
> >
> > Any thoughts?
> >
> >
> > --
> > ~Milind bee at gee mail dot com
> >
>
>
>
>





Re: "ExcludeNodes" for an Apex application

2016-11-30 Thread Munagala Ramanath
But then, what's the solution to the 2 problem scenarios that Milind
describes ?

Ram

On Wed, Nov 30, 2016 at 10:34 AM, Sanjay Pujare 
wrote:

> I think “exclude nodes” and such is really the job of the resource manager
> i.e. Yarn. So I am not sure taking over some of these tasks in Apex would
> be very useful.
>
> I agree with Amol that apps should be node neutral. Resource management in
> Yarn together with fault tolerance in Apex should minimize the need for
> this feature although I am sure one can find use cases.
>
>
> On 11/29/16, 10:41 PM, "Amol Kekre"  wrote:
>
> We do have this feature in Yarn, but that applies to all applications.
> I am
> not sure if Yarn has anti-affinity. This feature may be used, but in
> general there is danger of an application taking over resource
> allocation.
> Another quirk is that big data apps should ideally be node-neutral.
> This is
> a good idea, if we are able to carve out something where need is app
> specific.
>
> Thks
> Amol
>
>
> On Tue, Nov 29, 2016 at 10:00 PM, Milind Barve 
> wrote:
>
> > We have seen 2 cases mentioned below, where, it would have been nice
> if
> > Apex allowed us to exclude a node from the cluster for an
> application.
> >
> > 1. A node in the cluster had gone bad (was randomly rebooting) and
> so an
> > Apex app should not use it - other apps can use it as they were
> batch jobs.
> > 2. A node is being used for a mission critical app (Could be an Apex
> app
> > itself), but another Apex app which is mission critical should not
> be using
> > resources on that node.
> >
> > Can we have a way in which, Stram and YARN can coordinate between
> each
> > other to not use a set of nodes for the application. It can be done
> > in 2 ways -
> >
> > 1. Have a list of "exclude" nodes with Stram - when YARN allocates
> resources
> > on either of these, STRAM rejects and gets resources allocated again
> from
> > YARN
> > 2. Have a list of nodes that can be used for an app - This can be a
> part of
> > config. However, I don't think this would be the right way to do so as
> we will
> > need support from YARN as well. Further, this might be difficult to
> change
> at runtime if need be.
> >
> > Any thoughts?
> >
> >
> > --
> > ~Milind bee at gee mail dot com
> >
>
>
>
>


Re: "ExcludeNodes" for an Apex application

2016-11-30 Thread Sanjay Pujare
I think “exclude nodes” and such is really the job of the resource manager i.e. 
Yarn. So I am not sure taking over some of these tasks in Apex would be very 
useful.

I agree with Amol that apps should be node neutral. Resource management in Yarn 
together with fault tolerance in Apex should minimize the need for this feature 
although I am sure one can find use cases.


On 11/29/16, 10:41 PM, "Amol Kekre"  wrote:

We do have this feature in Yarn, but that applies to all applications. I am
not sure if Yarn has anti-affinity. This feature may be used, but in
general there is danger of an application taking over resource allocation.
Another quirk is that big data apps should ideally be node-neutral. This is
a good idea, if we are able to carve out something where need is app
specific.

Thks
Amol


On Tue, Nov 29, 2016 at 10:00 PM, Milind Barve  wrote:

> We have seen 2 cases mentioned below, where, it would have been nice if
> Apex allowed us to exclude a node from the cluster for an application.
>
> 1. A node in the cluster had gone bad (was randomly rebooting) and so an
> Apex app should not use it - other apps can use it as they were batch 
jobs.
> 2. A node is being used for a mission critical app (Could be an Apex app
> itself), but another Apex app which is mission critical should not be 
using
> resources on that node.
>
> Can we have a way in which, Stram and YARN can coordinate between each
> other to not use a set of nodes for the application. It can be done in 2
> ways -
>
> 1. Have a list of "exclude" nodes with Stram - when YARN allocates resources
> on either of these, STRAM rejects and gets resources allocated again from
> YARN
> 2. Have a list of nodes that can be used for an app - This can be a part 
of
> config. However, I don't think this would be the right way to do so as we 
will
> need support from YARN as well. Further, this might be difficult to change
> at runtime if need be.
>
> Any thoughts?
>
>
> --
> ~Milind bee at gee mail dot com
>





Re: [VOTE] Apache Apex Malhar Release 3.6.0 (RC1)

2016-11-30 Thread Thomas Weise
Can folks please verify the release.

Thanks

--
sent from mobile
On Nov 26, 2016 6:32 PM, "Thomas Weise"  wrote:

> Dear Community,
>
> Please vote on the following Apache Apex Malhar 3.6.0 release candidate.
>
> This is a source release with binary artifacts published to Maven.
>
> This release is based on Apex Core 3.4 and resolves 69 issues.
>
> The release adds a first iteration of SQL support via Apache Calcite, an
> alternative Cassandra output operator (non-transactional, upsert based),
> enrichment operator, improvements to window storage and new user
> documentation for several operators along with many other enhancements and
> bug fixes.
>
> List of all issues fixed: https://s.apache.org/9b0t
> User documentation: http://apex.apache.org/docs/malhar-3.6/
>
> Staging directory:
> https://dist.apache.org/repos/dist/dev/apex/apache-apex-malhar-3.6.0-RC1/
> Source zip:
> https://dist.apache.org/repos/dist/dev/apex/apache-apex-
> malhar-3.6.0-RC1/apache-apex-malhar-3.6.0-source-release.zip
> Source tar.gz:
> https://dist.apache.org/repos/dist/dev/apex/apache-apex-
> malhar-3.6.0-RC1/apache-apex-malhar-3.6.0-source-release.tar.gz
> Maven staging repository:
> https://repository.apache.org/content/repositories/orgapacheapex-1020/
>
> Git source:
> https://git-wip-us.apache.org/repos/asf?p=apex-malhar.git;a=
> commit;h=refs/tags/v3.6.0-RC1
>  (commit: 43d524dc5d5326b8d94593901cad026528bb62a1)
>
> PGP key:
> http://pgp.mit.edu:11371/pks/lookup?op=vindex=t...@apache.org
> KEYS file:
> https://dist.apache.org/repos/dist/release/apex/KEYS
>
> More information at:
> http://apex.apache.org
>
> Please try the release and vote; vote will be open until Wed, 11/30 EOD PST
> considering the US holiday weekend.
>
> [ ] +1 approve (and what verification was done)
> [ ] -1 disapprove (and reason why)
>
> http://www.apache.org/foundation/voting.html
>
> How to verify release candidate:
>
> http://apex.apache.org/verification.html
>
> Thanks,
> Thomas
>
>
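
For the "verified checksums" step reported by several voters, a minimal sketch of what that check amounts to. File names, arguments, and the checksum-file format here are assumptions, not taken from this thread:

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.MessageDigest;

public class VerifyChecksum {
    // Compute the SHA-512 digest of a file as lowercase hex, to compare
    // against the digest published alongside the release artifact.
    static String sha512Hex(Path file) throws Exception {
        byte[] digest = MessageDigest.getInstance("SHA-512")
                .digest(Files.readAllBytes(file));
        StringBuilder sb = new StringBuilder();
        for (byte b : digest) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        // args[0]: downloaded artifact, args[1]: published checksum file
        // (the checksum file is assumed to begin with the hex digest)
        String expected = new String(Files.readAllBytes(Paths.get(args[1])))
                .trim().split("\\s+")[0];
        String actual = sha512Hex(Paths.get(args[0]));
        System.out.println(actual.equalsIgnoreCase(expected) ? "checksum OK" : "checksum MISMATCH");
    }
}
```

Signature verification would additionally check the .asc file against the KEYS file with gpg; the checksum file naming above is a guess at common conventions, not confirmed by this thread.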


[jira] [Commented] (APEXMALHAR-2022) S3 Output Module for file copy

2016-11-30 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/APEXMALHAR-2022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15708479#comment-15708479
 ] 

ASF GitHub Bot commented on APEXMALHAR-2022:


Github user chaithu14 closed the pull request at:

https://github.com/apache/apex-malhar/pull/483


> S3 Output Module for file copy
> --
>
> Key: APEXMALHAR-2022
> URL: https://issues.apache.org/jira/browse/APEXMALHAR-2022
> Project: Apache Apex Malhar
>  Issue Type: Task
>Reporter: Chaitanya
>Assignee: Chaitanya
>
> Primary functionality of this module is to copy files into an S3 bucket
> using a block-by-block approach.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (APEXMALHAR-2022) S3 Output Module for file copy

2016-11-30 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/APEXMALHAR-2022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15708481#comment-15708481
 ] 

ASF GitHub Bot commented on APEXMALHAR-2022:


GitHub user chaithu14 reopened a pull request:

https://github.com/apache/apex-malhar/pull/483

APEXMALHAR-2022 Developed S3 Output Module



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/chaithu14/incubator-apex-malhar 
APEXMALHAR-2022-S3Output-multiPart

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/apex-malhar/pull/483.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #483


commit a5e8fa3facca750f5d7402c2c29e7cbabe53bd9e
Author: chaitanya 
Date:   2016-11-30T05:17:36Z

APEXMALHAR-2022 Development of S3 Output Module




> S3 Output Module for file copy
> --
>
> Key: APEXMALHAR-2022
> URL: https://issues.apache.org/jira/browse/APEXMALHAR-2022
> Project: Apache Apex Malhar
>  Issue Type: Task
>Reporter: Chaitanya
>Assignee: Chaitanya
>
> Primary functionality of this module is to copy files into an S3 bucket
> using a block-by-block approach.





[GitHub] apex-malhar pull request #483: APEXMALHAR-2022 Developed S3 Output Module

2016-11-30 Thread chaithu14
GitHub user chaithu14 reopened a pull request:

https://github.com/apache/apex-malhar/pull/483

APEXMALHAR-2022 Developed S3 Output Module



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/chaithu14/incubator-apex-malhar 
APEXMALHAR-2022-S3Output-multiPart

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/apex-malhar/pull/483.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #483


commit a5e8fa3facca750f5d7402c2c29e7cbabe53bd9e
Author: chaitanya 
Date:   2016-11-30T05:17:36Z

APEXMALHAR-2022 Development of S3 Output Module




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] apex-malhar pull request #483: APEXMALHAR-2022 Developed S3 Output Module

2016-11-30 Thread chaithu14
Github user chaithu14 closed the pull request at:

https://github.com/apache/apex-malhar/pull/483




[jira] [Commented] (APEXMALHAR-2022) S3 Output Module for file copy

2016-11-30 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/APEXMALHAR-2022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15708392#comment-15708392
 ] 

ASF GitHub Bot commented on APEXMALHAR-2022:


Github user chaithu14 closed the pull request at:

https://github.com/apache/apex-malhar/pull/483


> S3 Output Module for file copy
> --
>
> Key: APEXMALHAR-2022
> URL: https://issues.apache.org/jira/browse/APEXMALHAR-2022
> Project: Apache Apex Malhar
>  Issue Type: Task
>Reporter: Chaitanya
>Assignee: Chaitanya
>
> Primary functionality of this module is to copy files into an S3 bucket
> using a block-by-block approach.





[GitHub] apex-malhar pull request #483: APEXMALHAR-2022 Developed S3 Output Module

2016-11-30 Thread chaithu14
GitHub user chaithu14 reopened a pull request:

https://github.com/apache/apex-malhar/pull/483

APEXMALHAR-2022 Developed S3 Output Module



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/chaithu14/incubator-apex-malhar 
APEXMALHAR-2022-S3Output-multiPart

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/apex-malhar/pull/483.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #483


commit a5e8fa3facca750f5d7402c2c29e7cbabe53bd9e
Author: chaitanya 
Date:   2016-11-30T05:17:36Z

APEXMALHAR-2022 Development of S3 Output Module






[jira] [Commented] (APEXMALHAR-2022) S3 Output Module for file copy

2016-11-30 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/APEXMALHAR-2022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15708393#comment-15708393
 ] 

ASF GitHub Bot commented on APEXMALHAR-2022:


GitHub user chaithu14 reopened a pull request:

https://github.com/apache/apex-malhar/pull/483

APEXMALHAR-2022 Developed S3 Output Module



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/chaithu14/incubator-apex-malhar 
APEXMALHAR-2022-S3Output-multiPart

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/apex-malhar/pull/483.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #483


commit a5e8fa3facca750f5d7402c2c29e7cbabe53bd9e
Author: chaitanya 
Date:   2016-11-30T05:17:36Z

APEXMALHAR-2022 Development of S3 Output Module




> S3 Output Module for file copy
> --
>
> Key: APEXMALHAR-2022
> URL: https://issues.apache.org/jira/browse/APEXMALHAR-2022
> Project: Apache Apex Malhar
>  Issue Type: Task
>Reporter: Chaitanya
>Assignee: Chaitanya
>
> Primary functionality of this module is to copy files into an S3 bucket
> using a block-by-block approach.





[GitHub] apex-malhar pull request #483: APEXMALHAR-2022 Developed S3 Output Module

2016-11-30 Thread chaithu14
Github user chaithu14 closed the pull request at:

https://github.com/apache/apex-malhar/pull/483

