[GitHub] apex-malhar pull request #441: Apexmalhar 2017

2016-10-03 Thread prasannapramod
GitHub user prasannapramod opened a pull request:

https://github.com/apache/apex-malhar/pull/441

Apexmalhar 2017

@tweise @PramodSSImmaneni please review. Originally submitted by Pramod. 
Multiple tests were failing against the latest code, so I fixed them.

I had accidentally pushed the wrong branch in the earlier pull request and have 
since closed it. Please review this one instead. 

@tweise regarding your comments on earlier pull request

a) Extending both interfaces is for japicmp (the API compatibility check).
b) Regarding getting the file output operator changes reviewed: Pramod had 
already asked Chandni in his original pull request and she was OK with the 
changes. I will ask her again.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/prasannapramod/apex-malhar APEXMALHAR-2017

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/apex-malhar/pull/441.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #441


commit 3bebe3b329390c8678b850890d85adff8b61f627
Author: Pramod Immaneni 
Date:   2016-03-17T03:57:39Z

Using CheckpointNotificationListener and beforeCheckpoint callback to do IO 
in a more optimized fashion
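The idea in this commit is to move IO out of the per-window path and into the `beforeCheckpoint(long windowId)` callback of Apex's `CheckpointNotificationListener`, which is invoked before the operator is checkpointed. The sketch below is a minimal, self-contained illustration under stated assumptions: the local interface only mirrors the callback names (in Apex the real interface extends `CheckpointListener`), and `BufferedWriterOperator` with its in-memory "sink" is hypothetical. Records are buffered across windows and written out in one batch per checkpoint.

```java
import java.util.ArrayList;
import java.util.List;

// Local stand-in mirroring the Apex callback names; NOT the real Apex interface.
interface CheckpointNotificationListener {
  void beforeCheckpoint(long windowId);
  void checkpointed(long windowId);
  void committed(long windowId);
}

// Hypothetical operator: buffers tuples in memory and performs the (simulated) IO
// only when a checkpoint is about to be taken, rather than at every endWindow.
class BufferedWriterOperator implements CheckpointNotificationListener {
  private final List<String> buffer = new ArrayList<>();
  // Simulated durable sink; a real operator would write to a file or external store.
  final List<String> sink = new ArrayList<>();
  int flushCount = 0;

  void process(String tuple) {
    buffer.add(tuple);
  }

  void endWindow() {
    // Intentionally no IO here: data stays buffered across windows.
  }

  @Override
  public void beforeCheckpoint(long windowId) {
    // Flush everything buffered since the last checkpoint in one batch,
    // so the checkpointed state does not have to carry these records.
    sink.addAll(buffer);
    buffer.clear();
    flushCount++;
  }

  @Override
  public void checkpointed(long windowId) {}

  @Override
  public void committed(long windowId) {}

  int buffered() {
    return buffer.size();
  }
}

class CheckpointDemo {
  public static void main(String[] args) {
    BufferedWriterOperator op = new BufferedWriterOperator();
    op.process("a");
    op.endWindow();
    op.process("b");
    op.endWindow();
    op.beforeCheckpoint(2L);
    System.out.println(op.sink + " flushes=" + op.flushCount); // prints "[a, b] flushes=1"
  }
}
```

The trade-off is memory for fewer, larger writes: the buffer grows for the length of the checkpoint interval instead of being drained every window.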

commit c6c0185c6eadf357b4e9cadad32f907866a2b53c
Author: Lakshmi Prasanna Velineni 
Date:   2016-10-03T16:01:30Z

APEXMALHAR-2017 - Fixed & Tested




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] apex-malhar pull request #440: Apexmalhar 2017

2016-10-03 Thread prasannapramod
Github user prasannapramod closed the pull request at:

https://github.com/apache/apex-malhar/pull/440




Re: [VOTE] Hadoop upgrade

2016-10-03 Thread Bhupesh Chawda
+1 for 2.6

~ Bhupesh

On Oct 4, 2016 4:14 AM, "Siyuan Hua"  wrote:

> +1 for 2.6.x
>
> On Mon, Oct 3, 2016 at 3:41 PM, Pramod Immaneni 
> wrote:
>
> > +1 for 2.6.x
> >
>


[jira] [Resolved] (APEXCORE-543) Enhance the ContainerInfo to contain operator id and name.

2016-10-03 Thread Sandesh (JIRA)

 [ 
https://issues.apache.org/jira/browse/APEXCORE-543?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sandesh resolved APEXCORE-543.
--
   Resolution: Fixed
Fix Version/s: 3.5.0

> Enhance the ContainerInfo to contain operator id and name.
> --
>
> Key: APEXCORE-543
> URL: https://issues.apache.org/jira/browse/APEXCORE-543
> Project: Apache Apex Core
>  Issue Type: Improvement
>Reporter: Sandesh
>Assignee: Sandesh
> Fix For: 3.5.0
>
>
> ContainerInfo contains only the number of operators; enhance it to also 
> contain the operator ids and names.
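The enhancement amounts to replacing a bare operator count with an id-to-name mapping. A minimal sketch follows; `ContainerInfoSketch` and its methods are hypothetical and are not the actual Apex `ContainerInfo` class. Keeping the mapping means the old count can still be derived from it.

```java
import java.util.Collections;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for a container-info bean; not the real Apex class.
class ContainerInfoSketch {
  final String id;
  // Operator id -> operator name for every operator deployed in this container.
  private final Map<Integer, String> operators = new TreeMap<>();

  ContainerInfoSketch(String id) {
    this.id = id;
  }

  void addOperator(int operatorId, String name) {
    operators.put(operatorId, name);
  }

  // The previously exposed information (a plain count) is derivable from the map.
  int getNumOperators() {
    return operators.size();
  }

  Map<Integer, String> getOperators() {
    return Collections.unmodifiableMap(operators);
  }

  public static void main(String[] args) {
    ContainerInfoSketch c = new ContainerInfoSketch("container_01");
    c.addOperator(1, "input");
    c.addOperator(3, "parser");
    System.out.println(c.getNumOperators() + " " + c.getOperators()); // prints "2 {1=input, 3=parser}"
  }
}
```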



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (APEXCORE-543) Enhance the ContainerInfo to contain operator id and name.

2016-10-03 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/APEXCORE-543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15543783#comment-15543783
 ] 

ASF GitHub Bot commented on APEXCORE-543:
-

Github user asfgit closed the pull request at:

https://github.com/apache/apex-core/pull/400


> Enhance the ContainerInfo to contain operator id and name.
> --
>
> Key: APEXCORE-543
> URL: https://issues.apache.org/jira/browse/APEXCORE-543
> Project: Apache Apex Core
>  Issue Type: Improvement
>Reporter: Sandesh
>Assignee: Sandesh
>
> ContainerInfo contains only the number of operators; enhance it to also 
> contain the operator ids and names.





[GitHub] apex-core pull request #400: APEXCORE-543 ContainerInfo will contain Map of ...

2016-10-03 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/apex-core/pull/400




[jira] [Updated] (APEXMALHAR-2273) Retraction trigger is fired incorrectly when fireOnlyUpdatedPanes is true

2016-10-03 Thread David Yan (JIRA)

 [ 
https://issues.apache.org/jira/browse/APEXMALHAR-2273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Yan updated APEXMALHAR-2273:
--
Fix Version/s: 3.6.0

> Retraction trigger is fired incorrectly when fireOnlyUpdatedPanes is true
> -
>
> Key: APEXMALHAR-2273
> URL: https://issues.apache.org/jira/browse/APEXMALHAR-2273
> Project: Apache Apex Malhar
>  Issue Type: Bug
>Reporter: David Yan
>Assignee: David Yan
> Fix For: 3.6.0
>
>
> The retraction trigger is fired incorrectly when fireOnlyUpdatedPanes is true 
> and the retraction value is the same as the current value. 
> When fireOnlyUpdatedPanes is true and the value has not changed, no trigger 
> should be fired for that value.
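The expected behavior can be modeled as a small decision function: with fireOnlyUpdatedPanes set, an unchanged value produces no output at all, including no retraction. This is a hypothetical model (`TriggerDecision.decide`), not Malhar's windowed-operator code, and it assumes an accumulating mode in which a retraction of the previously fired value precedes the new value.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

// Hypothetical model of what a trigger should emit for one pane.
class TriggerDecision {
  /**
   * @param lastFired value emitted by the previous trigger, or null if none
   * @param current current accumulated value
   * @param fireOnlyUpdatedPanes skip panes whose value has not changed
   * @param retracting whether retraction values are emitted
   * @return the values to emit, retraction first
   */
  static List<String> decide(Object lastFired, Object current,
                             boolean fireOnlyUpdatedPanes, boolean retracting) {
    List<String> out = new ArrayList<>();
    if (fireOnlyUpdatedPanes && Objects.equals(lastFired, current)) {
      // Unchanged value: fire nothing, including the retraction.
      return out;
    }
    if (retracting && lastFired != null) {
      out.add("retract:" + lastFired);
    }
    out.add("value:" + current);
    return out;
  }

  public static void main(String[] args) {
    System.out.println(decide(5, 5, true, true)); // prints "[]"
    System.out.println(decide(5, 7, true, true)); // prints "[retract:5, value:7]"
  }
}
```

The bug described in the issue corresponds to emitting the retraction before (or regardless of) the equality check.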





[jira] [Commented] (APEXMALHAR-2273) Retraction trigger is fired incorrectly when fireOnlyUpdatedPanes is true

2016-10-03 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/APEXMALHAR-2273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15543648#comment-15543648
 ] 

ASF GitHub Bot commented on APEXMALHAR-2273:


Github user asfgit closed the pull request at:

https://github.com/apache/apex-malhar/pull/436


> Retraction trigger is fired incorrectly when fireOnlyUpdatedPanes is true
> -
>
> Key: APEXMALHAR-2273
> URL: https://issues.apache.org/jira/browse/APEXMALHAR-2273
> Project: Apache Apex Malhar
>  Issue Type: Bug
>Reporter: David Yan
>Assignee: David Yan
>
> The retraction trigger is fired incorrectly when fireOnlyUpdatedPanes is true 
> and the retraction value is the same as the current value. 
> When fireOnlyUpdatedPanes is true and the value has not changed, no trigger 
> should be fired for that value.





[GitHub] apex-malhar pull request #436: APEXMALHAR-2273 #resolve do not fire retracti...

2016-10-03 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/apex-malhar/pull/436




Re: [VOTE] Hadoop upgrade

2016-10-03 Thread Pramod Immaneni
+1 for 2.6.x



Re: [Discuss] Hadoop upgrade

2016-10-03 Thread Pramod Immaneni
Thanks for the information.

On Mon, Oct 3, 2016 at 3:30 PM, David Yan  wrote:

> Just from scanning the API, I see the following feature additions from 2.4
> to 2.6:
>
> - Ability to get memory-seconds and vcore-seconds
> from ApplicationResourceUsageReport.
> - Node labels
> - Resource reservation
>
> We already know we would like to use the following features that were added
> between 2.2 and 2.4:
> - Application tags
> - API that returns information about Application Attempts
>
> David
>


[jira] [Resolved] (APEXCORE-519) Add support for DIGEST enabled hadoop web services environment

2016-10-03 Thread Vlad Rozov (JIRA)

 [ 
https://issues.apache.org/jira/browse/APEXCORE-519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vlad Rozov resolved APEXCORE-519.
-
Resolution: Implemented

> Add support for DIGEST enabled hadoop web services environment
> --
>
> Key: APEXCORE-519
> URL: https://issues.apache.org/jira/browse/APEXCORE-519
> Project: Apache Apex Core
>  Issue Type: Improvement
>Reporter: Pramod Immaneni
>Assignee: Pramod Immaneni
>
> Add support for Apex to work with DIGEST-enabled Hadoop web services. The work 
> would share common elements with BASIC authentication (APEXCORE-517).
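For context on what DIGEST support involves: unlike BASIC, the client never sends the password, only an MD5-based response computed from the server's nonce. The sketch below computes that response as defined in RFC 2617 for `qop=auth`, using only `java.security.MessageDigest`. It illustrates the protocol only and is not taken from the Apex change; the class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Computes the RFC 2617 Digest "response" value for qop=auth with MD5.
class DigestResponse {
  static String md5Hex(String s) {
    try {
      MessageDigest md = MessageDigest.getInstance("MD5");
      byte[] digest = md.digest(s.getBytes(StandardCharsets.ISO_8859_1));
      StringBuilder sb = new StringBuilder();
      for (byte b : digest) {
        sb.append(String.format("%02x", b)); // lowercase hex, two chars per byte
      }
      return sb.toString();
    } catch (NoSuchAlgorithmException e) {
      throw new IllegalStateException("MD5 not available", e);
    }
  }

  static String response(String user, String realm, String password,
                         String method, String uri,
                         String nonce, String nc, String cnonce) {
    String ha1 = md5Hex(user + ":" + realm + ":" + password); // derived from the secret
    String ha2 = md5Hex(method + ":" + uri);                  // derived from the request
    return md5Hex(ha1 + ":" + nonce + ":" + nc + ":" + cnonce + ":auth:" + ha2);
  }

  public static void main(String[] args) {
    // Example inputs from RFC 2617, section 3.5.
    System.out.println(response("Mufasa", "testrealm@host.com", "Circle Of Life",
        "GET", "/dir/index.html", "dcd98b7102dd2f0e8b11d0f600bfb0c093", "00000001", "0a4f113b"));
  }
}
```

Because the nonce is part of the hash, a captured response cannot simply be replayed against a fresh challenge, which is the main difference from BASIC.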





[jira] [Resolved] (APEXCORE-519) Add support for DIGEST enabled hadoop web services environment

2016-10-03 Thread Vlad Rozov (JIRA)

 [ 
https://issues.apache.org/jira/browse/APEXCORE-519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vlad Rozov resolved APEXCORE-519.
-
Resolution: Fixed

> Add support for DIGEST enabled hadoop web services environment
> --
>
> Key: APEXCORE-519
> URL: https://issues.apache.org/jira/browse/APEXCORE-519
> Project: Apache Apex Core
>  Issue Type: Improvement
>Reporter: Pramod Immaneni
>Assignee: Pramod Immaneni
>
> Add support for Apex to work with DIGEST-enabled Hadoop web services. The work 
> would share common elements with BASIC authentication (APEXCORE-517).





[jira] [Reopened] (APEXCORE-519) Add support for DIGEST enabled hadoop web services environment

2016-10-03 Thread Vlad Rozov (JIRA)

 [ 
https://issues.apache.org/jira/browse/APEXCORE-519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vlad Rozov reopened APEXCORE-519:
-

> Add support for DIGEST enabled hadoop web services environment
> --
>
> Key: APEXCORE-519
> URL: https://issues.apache.org/jira/browse/APEXCORE-519
> Project: Apache Apex Core
>  Issue Type: Improvement
>Reporter: Pramod Immaneni
>Assignee: Pramod Immaneni
>
> Add support for Apex to work with DIGEST-enabled Hadoop web services. The work 
> would share common elements with BASIC authentication (APEXCORE-517).





[jira] [Commented] (APEXCORE-474) Unifier placement during M*1 deployment

2016-10-03 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/APEXCORE-474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15543634#comment-15543634
 ] 

ASF GitHub Bot commented on APEXCORE-474:
-

Github user asfgit closed the pull request at:

https://github.com/apache/apex-core/pull/395


> Unifier placement during M*1 deployment
> ---
>
> Key: APEXCORE-474
> URL: https://issues.apache.org/jira/browse/APEXCORE-474
> Project: Apache Apex Core
>  Issue Type: Improvement
>Reporter: Sandesh
>Assignee: Sandesh
> Fix For: 3.5.0
>
>
> During M*1 deployment, the unifier was deployed in a separate container, but 
> there is no advantage in doing that. 
> It is better to make the unifier THREAD_LOCAL with the downstream operator.
> ( https://issues.apache.org/jira/browse/APEXCORE-482 )
> Note:
> We recently saw a Kafka ETL app that had a total of 18 containers allocated, 
> 5 of which were allocated to default unifiers. This also means that a lot 
> of time is spent in SerDe. 
> Implementing this feature should improve performance significantly, since it 
> saves those containers and the SerDe between the unifier and the downstream operator.





[jira] [Commented] (APEXCORE-519) Add support for DIGEST enabled hadoop web services environment

2016-10-03 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/APEXCORE-519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15543635#comment-15543635
 ] 

ASF GitHub Bot commented on APEXCORE-519:
-

Github user asfgit closed the pull request at:

https://github.com/apache/apex-core/pull/390


> Add support for DIGEST enabled hadoop web services environment
> --
>
> Key: APEXCORE-519
> URL: https://issues.apache.org/jira/browse/APEXCORE-519
> Project: Apache Apex Core
>  Issue Type: Improvement
>Reporter: Pramod Immaneni
>Assignee: Pramod Immaneni
>
> Add support for Apex to work with DIGEST-enabled Hadoop web services. The work 
> would share common elements with BASIC authentication (APEXCORE-517).





[GitHub] apex-core pull request #390: APEXCORE-519 Added DIGEST authentication

2016-10-03 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/apex-core/pull/390




[GitHub] apex-core pull request #395: APEXCORE-474 In M*1 case, deploy the unifier in...

2016-10-03 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/apex-core/pull/395




Re: [Discuss] Hadoop upgrade

2016-10-03 Thread David Yan
Just from scanning the API, I see the following feature additions from 2.4
to 2.6:

- Ability to get memory-seconds and vcore-seconds
from ApplicationResourceUsageReport.
- Node labels
- Resource reservation

We already know we would like to use the following features that were added
between 2.2 and 2.4:
- Application tags
- API that returns information about Application Attempts

David

On Mon, Oct 3, 2016 at 2:02 PM, Pramod Immaneni 
wrote:

> It would be good to know the API additions between 2.4.x and 2.6.x that we
> think we can use immediately for some feature, to have a more informed
> vote.
>
> Thanks
>


[jira] [Commented] (APEXMALHAR-2254) File input operator is not idempotent with closing files on replay

2016-10-03 Thread Matt Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/APEXMALHAR-2254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15543549#comment-15543549
 ] 

Matt Zhang commented on APEXMALHAR-2254:


I'm interested in working on this.

Thanks,
Matt

> File input operator is not idempotent with closing files on replay
> --
>
> Key: APEXMALHAR-2254
> URL: https://issues.apache.org/jira/browse/APEXMALHAR-2254
> Project: Apache Apex Malhar
>  Issue Type: Bug
>Reporter: Pramod Immaneni
>Assignee: Pramod Immaneni
>
> With the file input operator, when windows are replayed after a failure, the 
> operator emits the same data it emitted before the failure for every window 
> replayed since the checkpoint. To do this, the operator keeps track of the 
> files and offsets for every window and replays the data based on that. 
> However, suppose that before the failure a file was fully processed and 
> closed just before the end of a window, and the next file was opened and 
> processed in a new window. During the replay, the first file is not closed 
> in the earlier window but in the later one. This can cause problems if an 
> operator depends on file closing also happening in an idempotent manner.
> Improve the operator to save file opening and closing in the idempotent state 
> as well, so that these events are also replayed idempotently.
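The proposed improvement can be sketched as recording, per window, not only the reads (file and offset range) but also the file open and close events, and replaying that log verbatim. The class below is hypothetical and only models the idea; Malhar's actual file input operator persists its per-window state through its own mechanism, and the string-encoded events are an illustrative simplification.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model: every action taken in a window is logged, including
// opening and closing files, so a replay reproduces the same sequence.
class WindowEventLog {
  // Events are simple strings such as "open:f1", "read:f1:0-100", "close:f1".
  private final Map<Long, List<String>> eventsByWindow = new HashMap<>();

  void record(long windowId, String event) {
    eventsByWindow.computeIfAbsent(windowId, w -> new ArrayList<>()).add(event);
  }

  // On replay, return exactly what happened in that window before the failure.
  List<String> replay(long windowId) {
    return eventsByWindow.getOrDefault(windowId, new ArrayList<>());
  }

  public static void main(String[] args) {
    WindowEventLog log = new WindowEventLog();
    // Window 10: file f1 finishes and is closed just before the window ends.
    log.record(10, "read:f1:900-1000");
    log.record(10, "close:f1");
    // Window 11: the next file is opened and processed.
    log.record(11, "open:f2");
    log.record(11, "read:f2:0-500");
    // Replaying window 10 closes f1 in window 10, not in window 11.
    System.out.println(log.replay(10)); // prints "[read:f1:900-1000, close:f1]"
  }
}
```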





Re: [VOTE] Hadoop upgrade

2016-10-03 Thread Devendra Tagare
+1 for 2.6 upgrade.

Thanks,
Dev

On Mon, Oct 3, 2016 at 2:07 PM, Sandesh Hegde 
wrote:

> +1 for 2.6
>
> On Mon, Oct 3, 2016, 2:06 PM Sasha Parfenov  wrote:
>
> > +1 for Hadoop 2.6 upgrade.
> >
> > Thanks,
> > Sasha
> >
> > On Monday, October 3, 2016, Thomas Weise  wrote:
> >
> > > +1 for 2.6 upgrade
> > >
> > >
> >
>


Re: [VOTE] Hadoop upgrade

2016-10-03 Thread Sandesh Hegde
+1 for 2.6

On Mon, Oct 3, 2016, 2:06 PM Sasha Parfenov  wrote:

> +1 for Hadoop 2.6 upgrade.
>
> Thanks,
> Sasha
>
> On Monday, October 3, 2016, Thomas Weise  wrote:
>
> > +1 for 2.6 upgrade
> >
> >
>


Re: [VOTE] Hadoop upgrade

2016-10-03 Thread Sasha Parfenov
+1 for Hadoop 2.6 upgrade.

Thanks,
Sasha

On Monday, October 3, 2016, Thomas Weise  wrote:

> +1 for 2.6 upgrade
>
>


Re: [VOTE] Hadoop upgrade

2016-10-03 Thread Thomas Weise
+1 for 2.6 upgrade




[Discuss] Hadoop upgrade

2016-10-03 Thread Pramod Immaneni
It would be good to know which API additions between 2.4.x and 2.6.x we think
we can use immediately for some feature, so that we can cast a more informed vote.

Thanks



[VOTE] Hadoop upgrade

2016-10-03 Thread David Yan
Hi all,

Thomas created this ticket for upgrading our Hadoop dependency version a
couple of weeks ago:

https://issues.apache.org/jira/browse/APEXCORE-536

We'd like to get the ball rolling and take a vote from the community on
which version we should upgrade to. We have these choices:

2.2.0 (no upgrade)
2.4.x
2.5.x
2.6.x

We are not considering 2.7.x because we already know that many Apex users
are using Hadoop distros that are based on 2.6.

Please note that Apex works with all versions of Hadoop higher than or equal
to the Hadoop version Apex depends on, as long as it's 2.x.x. We are not
considering Hadoop 3.0.0-alpha at this time.
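The compatibility rule above (any 2.x.x release at or above the version Apex is built against) can be expressed as a small check. This is a hypothetical helper for illustration, not part of Apex; it assumes `major.minor.patch` version strings and simply drops any vendor suffix after the first '-' (for example a distro build tag).

```java
// Hypothetical helper: the runtime Hadoop must be 2.x.x and >= the build version.
class HadoopCompat {
  // Parses "2.6.0" or "2.6.0-cdh5.8.0" into {2, 6, 0}; missing parts count as 0.
  static int[] parse(String version) {
    String core = version.split("-", 2)[0];
    String[] parts = core.split("\\.");
    int[] v = new int[3];
    for (int i = 0; i < 3 && i < parts.length; i++) {
      v[i] = Integer.parseInt(parts[i]);
    }
    return v;
  }

  static boolean isCompatible(String buildVersion, String runtimeVersion) {
    int[] b = parse(buildVersion);
    int[] r = parse(runtimeVersion);
    if (r[0] != 2) {
      return false; // the rule only covers Hadoop 2.x.x
    }
    // Lexicographic comparison of (major, minor, patch).
    for (int i = 0; i < 3; i++) {
      if (r[i] != b[i]) {
        return r[i] > b[i];
      }
    }
    return true; // identical versions
  }

  public static void main(String[] args) {
    System.out.println(isCompatible("2.6.0", "2.7.3"));          // true
    System.out.println(isCompatible("2.6.0", "2.2.0"));          // false
    System.out.println(isCompatible("2.6.0", "2.6.0-cdh5.8.0")); // true
    System.out.println(isCompatible("2.6.0", "3.0.0-alpha1"));   // false
  }
}
```

Under this rule, raising the build dependency from 2.2.0 to 2.6.x drops support for clusters older than 2.6, which is why the distro versions matter for the vote.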

When voting, please keep these in mind:

- The features that are added in 2.4.x, 2.5.x, and 2.6.x respectively, and
how useful those features are for Apache Apex
- The Hadoop versions the major distros (Cloudera, Hortonworks, MapR, EMR,
etc) are supporting
- The Hadoop versions that typical Apex users are using

Thanks,

David


[jira] [Created] (APEXCORE-546) Web site: Add use case section to powered by page

2016-10-03 Thread Thomas Weise (JIRA)
Thomas Weise created APEXCORE-546:
-

 Summary: Web site: Add use case section to powered by page
 Key: APEXCORE-546
 URL: https://issues.apache.org/jira/browse/APEXCORE-546
 Project: Apache Apex Core
  Issue Type: Task
Reporter: Thomas Weise


http://apex.apache.org/powered-by-apex.html

The page currently has a list of users. For some of these organizations, we have 
presentations on their use cases. There should be a section on the page that 
lists those use cases with links to the SlideShare decks and videos.






[jira] [Resolved] (APEXCORE-474) Unifier placement during M*1 deployment

2016-10-03 Thread Thomas Weise (JIRA)

 [ 
https://issues.apache.org/jira/browse/APEXCORE-474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Weise resolved APEXCORE-474.
---
   Resolution: Fixed
Fix Version/s: 3.5.0

> Unifier placement during M*1 deployment
> ---
>
> Key: APEXCORE-474
> URL: https://issues.apache.org/jira/browse/APEXCORE-474
> Project: Apache Apex Core
>  Issue Type: Improvement
>Reporter: Sandesh
>Assignee: Sandesh
> Fix For: 3.5.0
>
>
> During M*1 deployment, the unifier was deployed in a separate container, but 
> there is no advantage in doing that. 
> It is better to make the unifier THREAD_LOCAL with the downstream operator.
> ( https://issues.apache.org/jira/browse/APEXCORE-482 )
> Note:
> We recently saw a Kafka ETL app that had a total of 18 containers allocated, 
> 5 of which were allocated to default unifiers. This also means that a lot 
> of time is spent in SerDe. 
> Implementing this feature should improve performance significantly, since it 
> saves those containers and the SerDe between the unifier and the downstream operator.





[jira] [Commented] (APEXMALHAR-2220) Move the FunctionOperator to Malhar library

2016-10-03 Thread Siyuan Hua (JIRA)

[ 
https://issues.apache.org/jira/browse/APEXMALHAR-2220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15543057#comment-15543057
 ] 

Siyuan Hua commented on APEXMALHAR-2220:


Sorry, I meant that org.apache.apex.malhar.lib.function would be good.

> Move the FunctionOperator to Malhar library
> ---
>
> Key: APEXMALHAR-2220
> URL: https://issues.apache.org/jira/browse/APEXMALHAR-2220
> Project: Apache Apex Malhar
>  Issue Type: Improvement
>Reporter: Siyuan Hua
>Assignee: Dongming Liang
>
> FunctionOperator was initially designed only for the high-level API, but we 
> think it can also be useful for people who want to build stateless 
> transformations and work with other operators directly. Since FunctionOperator 
> can be reused, we should move it to the Malhar library.





[jira] [Commented] (APEXMALHAR-2220) Move the FunctionOperator to Malhar library

2016-10-03 Thread Siyuan Hua (JIRA)

[ 
https://issues.apache.org/jira/browse/APEXMALHAR-2220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15542972#comment-15542972
 ] 

Siyuan Hua commented on APEXMALHAR-2220:


Maybe call them com.datatorrent.lib.function?

> Move the FunctionOperator to Malhar library
> ---
>
> Key: APEXMALHAR-2220
> URL: https://issues.apache.org/jira/browse/APEXMALHAR-2220
> Project: Apache Apex Malhar
>  Issue Type: Improvement
>Reporter: Siyuan Hua
>Assignee: Dongming Liang
>
> FunctionOperator was initially designed only for the high-level API, but we 
> think it can also be useful for people who want to build stateless 
> transformations and work with other operators directly. Since FunctionOperator 
> can be reused, we should move FO to the Malhar library.





Re: Kudu store operators

2016-10-03 Thread Mohit Jotwani
+1

Regards,
Mohit

On Mon, Oct 3, 2016 at 8:42 AM, Chaitanya Chebolu  wrote:

> +1
>
> Regards,
> Chaitanya
>
> On Mon, Oct 3, 2016 at 6:01 PM, Sanjay Pujare 
> wrote:
>
> > +1
> >
> > On Oct 3, 2016 5:33 PM, "Sandeep Deshmukh" 
> > wrote:
> >
> > > +1
> > >
> > > Regards,
> > > Sandeep
> > >
> > > On Mon, Oct 3, 2016 at 10:16 AM, Tushar Gosavi
> > > wrote:
> > >
> > > > +1, It will be great to have this operator.
> > > >
> > > > - Tushar.
> > > >
> > > > On Mon, Oct 3, 2016 at 8:15 AM, Chinmay Kolhatkar
> > > >  wrote:
> > > > > +1.
> > > > >
> > > > > - Chinmay.
> > > > >
> > > > > On 3 Oct 2016 7:25 a.m., "Amol Kekre" wrote:
> > > > >
> > > > >> Ananth,
> > > > >> This would be great to have. +1
> > > > >>
> > > > >> Thks
> > > > >> Amol
> > > > >>
> > > > >> On Sun, Oct 2, 2016 at 8:38 AM, Munagala Ramanath <r...@datatorrent.com>
> > > > >> wrote:
> > > > >>
> > > > >> > +1
> > > > >> >
> > > > >> > Kudu looks impressive from the overview, though it seems to still be
> > > > >> > maturing.
> > > > >> >
> > > > >> > Ram
> > > > >> >
> > > > >> >
> > > > >> > On Sat, Oct 1, 2016 at 11:42 PM, ananth
> > > > >> > wrote:
> > > > >> >
> > > > >> > > Hello All,
> > > > >> > >
> > > > >> > > I was wondering if it would be worthwhile for the community to consider
> > > > >> > > support for Apache Kudu as a store (as a contrib operator inside Apache
> > > > >> > > Malhar).
> > > > >> > >
> > > > >> > > Here are some benefits I see:
> > > > >> > >
> > > > >> > > 1. Kudu has just reached version 1.0 and has just been declared
> > > > >> > >    production ready.
> > > > >> > > 2. Kudu as a store might be a good fit for many architectures in the
> > > > >> > >    years to come because of its ability to provide mutability of
> > > > >> > >    data (unlike HDFS) and optimized storage formats for scans.
> > > > >> > > 3. It also seems to withstand high-throughput write patterns, which
> > > > >> > >    makes it a stable sink for Apex workflows that operate at very high
> > > > >> > >    volumes.
> > > > >> > >
> > > > >> > >
> > > > >> > > Here are some links:
> > > > >> > >
> > > > >> > >  * From the recent Strata conference:
> > > > >> > >    https://kudu.apache.org/2016/09/26/strata-nyc-kudu-talks.html
> > > > >> > >  * https://kudu.apache.org/overview.html
> > > > >> > >
> > > > >> > > I can implement this operator if the community feels it is worth adding
> > > > >> > > it to our code base. If so, could someone please assign the JIRA to me? I
> > > > >> > > have created this JIRA to track this:
> > > > >> > > https://issues.apache.org/jira/browse/APEXMALHAR-2278
> > > > >> > >
> > > > >> > >
> > > > >> > > Regards,
> > > > >> > >
> > > > >> > > Ananth
> > > > >> > >
> > > > >> > >
> > > > >> >
> > > > >>
> > > >
> > >
> >
>


[jira] [Commented] (APEXMALHAR-2220) Move the FunctionOperator to Malhar library

2016-10-03 Thread Dongming Liang (JIRA)

[ 
https://issues.apache.org/jira/browse/APEXMALHAR-2220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15542762#comment-15542762
 ] 

Dongming Liang commented on APEXMALHAR-2220:


[~siyuan] I'm not sure where we should merge them. Should it be a new 
package, or should we put them somewhere under 
"com.datatorrent.lib.stream"? Please clarify. Thanks!

> Move the FunctionOperator to Malhar library
> ---
>
> Key: APEXMALHAR-2220
> URL: https://issues.apache.org/jira/browse/APEXMALHAR-2220
> Project: Apache Apex Malhar
>  Issue Type: Improvement
>Reporter: Siyuan Hua
>Assignee: Dongming Liang
>
> FunctionOperator was initially designed only for the high-level API, but we 
> think it can also be useful for people who want to build stateless 
> transformations and work with other operators directly. Since FunctionOperator 
> can be reused, we should move FO to the Malhar library.





[GitHub] apex-malhar pull request #440: Apexmalhar 2017

2016-10-03 Thread prasannapramod
GitHub user prasannapramod opened a pull request:

https://github.com/apache/apex-malhar/pull/440

Apexmalhar 2017

@tweise @PramodSSImmaneni please review. Originally submitted by Pramod. 
Multiple tests were failing with the latest code, so I fixed the tests.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/prasannapramod/apex-malhar APEXMALHAR-2017

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/apex-malhar/pull/440.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #440


commit 0339e2cea68b6c10b6082a1c9bbd5bd293ef8612
Author: Pramod Immaneni 
Date:   2016-03-17T03:57:39Z

Using CheckpointNotificationListener and beforeCheckpoint callback to do IO 
in a more optimized fashion

commit 3599abbba304f7ab7fa333484ca701d817625c24
Author: Pramod Immaneni 
Date:   2016-03-17T17:21:07Z

Fixed fileoutput operator failing tests

commit ae1c2eabb64382af5e20f905112a7d67b7fe80e1
Author: Pramod Immaneni 
Date:   2016-03-17T17:42:45Z

Fixes for japicmp

commit ebd5e72e24b6d6da5e4ce8dde3508465c1bf08a8
Author: Pramod Immaneni 
Date:   2016-03-17T18:08:31Z

Fixed checkstyle issues

commit b410e2055c685fce98d4e7db81c7fc5d19d96a88
Author: Pramod Immaneni 
Date:   2016-03-20T19:15:28Z

Added missing method causing build error

commit 196258b0142a0b8acf9510e97779938b63cf3256
Author: Pramod Immaneni 
Date:   2016-03-20T19:21:37Z

Fixed japicmp error

commit e9af80b61eb42bf481ee78a6a61f03eb3bf70331
Author: Lakshmi Prasanna Velineni 
Date:   2016-10-02T18:42:36Z

Cleared Merge Conflicts






[jira] [Commented] (APEXMALHAR-2017) Use pre checkpoint notification to optimize operator IO

2016-10-03 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/APEXMALHAR-2017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15542746#comment-15542746
 ] 

ASF GitHub Bot commented on APEXMALHAR-2017:


Github user PramodSSImmaneni closed the pull request at:

https://github.com/apache/apex-malhar/pull/218


> Use pre checkpoint notification to optimize operator IO
> ---
>
> Key: APEXMALHAR-2017
> URL: https://issues.apache.org/jira/browse/APEXMALHAR-2017
> Project: Apache Apex Malhar
>  Issue Type: Improvement
>Reporter: Pramod Immaneni
>Assignee: Velineni Lakshmi Prasanna
>
> Currently many output operators enforce persistence of data in endWindow by 
> calling flush, hflush or equivalent methods. This was done to help recovery: 
> it ensures that the data corresponding to the checkpointed state is always 
> present at recovery.
> A recent addition to the engine uses a callback to let operators know about 
> an impending checkpoint just before it happens. Operators can now enforce 
> persistence of data once, in this callback, instead of at the end of every 
> window. This results in better performance, as data is no longer forced to 
> persistent storage at every window.





Re: Kudu store operators

2016-10-03 Thread Chaitanya Chebolu
+1

Regards,
Chaitanya

On Mon, Oct 3, 2016 at 6:01 PM, Sanjay Pujare 
wrote:

> +1
>
> On Oct 3, 2016 5:33 PM, "Sandeep Deshmukh" 
> wrote:
>
> > +1
> >
> > Regards,
> > Sandeep
> >
> > On Mon, Oct 3, 2016 at 10:16 AM, Tushar Gosavi 
> > wrote:
> >
> > > +1, It will be great to have this operator.
> > >
> > > - Tushar.
> > >
> > > On Mon, Oct 3, 2016 at 8:15 AM, Chinmay Kolhatkar
> > >  wrote:
> > > > +1.
> > > >
> > > > - Chinmay.
> > > >
> > > > On 3 Oct 2016 7:25 a.m., "Amol Kekre"  wrote:
> > > >
> > > >> Ananth,
> > > >> This would be great to have. +1
> > > >>
> > > >> Thks
> > > >> Amol
> > > >>
> > > >> On Sun, Oct 2, 2016 at 8:38 AM, Munagala Ramanath <r...@datatorrent.com>
> > > >> wrote:
> > > >>
> > > >> > +1
> > > >> >
> > > >> > Kudu looks impressive from the overview, though it seems to still be
> > > >> > maturing.
> > > >> >
> > > >> > Ram
> > > >> >
> > > >> >
> > > >> > On Sat, Oct 1, 2016 at 11:42 PM, ananth
> > > >> > wrote:
> > > >> >
> > > >> > > Hello All,
> > > >> > >
> > > >> > > I was wondering if it would be worthwhile for the community to consider
> > > >> > > support for Apache Kudu as a store (as a contrib operator inside Apache
> > > >> > > Malhar).
> > > >> > >
> > > >> > > Here are some benefits I see:
> > > >> > >
> > > >> > > 1. Kudu has just reached version 1.0 and has just been declared
> > > >> > >    production ready.
> > > >> > > 2. Kudu as a store might be a good fit for many architectures in the
> > > >> > >    years to come because of its ability to provide mutability of
> > > >> > >    data (unlike HDFS) and optimized storage formats for scans.
> > > >> > > 3. It also seems to withstand high-throughput write patterns, which
> > > >> > >    makes it a stable sink for Apex workflows that operate at very high
> > > >> > >    volumes.
> > > >> > >
> > > >> > >
> > > >> > > Here are some links:
> > > >> > >
> > > >> > >  * From the recent Strata conference:
> > > >> > >    https://kudu.apache.org/2016/09/26/strata-nyc-kudu-talks.html
> > > >> > >  * https://kudu.apache.org/overview.html
> > > >> > >
> > > >> > > I can implement this operator if the community feels it is worth adding
> > > >> > > it to our code base. If so, could someone please assign the JIRA to me? I
> > > >> > > have created this JIRA to track this:
> > > >> > > https://issues.apache.org/jira/browse/APEXMALHAR-2278
> > > >> > >
> > > >> > >
> > > >> > > Regards,
> > > >> > >
> > > >> > > Ananth
> > > >> > >
> > > >> > >
> > > >> >
> > > >>
> > >
> >
>


[jira] [Commented] (APEXCORE-542) Fix debug level verbose option for apex cli

2016-10-03 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/APEXCORE-542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15542492#comment-15542492
 ] 

ASF GitHub Bot commented on APEXCORE-542:
-

GitHub user deepak-narkhede opened a pull request:

https://github.com/apache/apex-core/pull/403

APEXCORE-542 - Fix debug level verbose option for Apex cli.

This change fixes the debug level console logging for the Apex CLI.

The default logger level for the Apex CLI is INFO. So even if we specify the 
threshold level as DEBUG for the ConsoleAppender, the parent logger level is 
less verbose than the ConsoleAppender threshold, and DEBUG messages never 
reach the appender. 
The fix is to set the logger level to DEBUG initially and set the threshold 
level of the ConsoleAppender according to the argument passed to the Apex CLI.

Note: currently two appenders are used: the Event and Console appenders.
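The same two-stage filtering can be reproduced with `java.util.logging`, used here only as a stand-in for the log4j setup in the Apex CLI: a message must first pass the logger's level, and only then is it checked against a handler's threshold (the appender threshold in log4j terms). In this sketch `FINE` plays the role of DEBUG, and `CollectingHandler` and `VerboseOptionDemo` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

// Handler that keeps the messages it accepts, standing in for a console appender.
class CollectingHandler extends Handler {
  final List<String> messages = new ArrayList<>();

  @Override
  public void publish(LogRecord record) {
    if (isLoggable(record)) { // applies this handler's own level (the "threshold")
      messages.add(record.getMessage());
    }
  }

  @Override
  public void flush() {}

  @Override
  public void close() {}
}

class VerboseOptionDemo {
  // Logs one INFO and one FINE (debug) message and returns what the handler received.
  static List<String> run(Level loggerLevel, Level handlerThreshold) {
    Logger logger = Logger.getAnonymousLogger();
    logger.setUseParentHandlers(false); // keep the demo isolated from the root console handler
    logger.setLevel(loggerLevel);
    CollectingHandler handler = new CollectingHandler();
    handler.setLevel(handlerThreshold);
    logger.addHandler(handler);
    logger.info("info message");
    logger.fine("debug message");
    return handler.messages;
  }
}
```

`run(Level.INFO, Level.FINE)` reproduces the bug (the debug message never reaches the handler), while `run(Level.FINE, Level.FINE)` corresponds to the fix: the logger is opened up and the handler threshold alone decides what is shown.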

For instrumentation details and unit testing please find the attached file.

 

[apex-cli-debug-verbose-fix.txt](https://github.com/apache/apex-core/files/506397/apex-cli-debug-verbose-fix.txt)


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/deepak-narkhede/apex-core APEXCORE-542-review

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/apex-core/pull/403.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #403


commit cf3d5de028bf195478bfd5ec333e4c8411a57194
Author: deepak-narkhede 
Date:   2016-10-03T13:53:07Z

APEXCORE-542 - Fix debug level verbose option for Apex cli.




> Fix debug level verbose option for apex cli
> ---
>
> Key: APEXCORE-542
> URL: https://issues.apache.org/jira/browse/APEXCORE-542
> Project: Apache Apex Core
>  Issue Type: Bug
>Reporter: Deepak Narkhede
>Assignee: Deepak Narkhede
>Priority: Minor
>
> Fix debug level verbose option for apex cli. Currently the "" option displays 
> INFO level messages, but it should display DEBUG level messages. 





Re: Kudu store operators

2016-10-03 Thread Sanjay Pujare
+1

On Oct 3, 2016 5:33 PM, "Sandeep Deshmukh"  wrote:

> +1
>
> Regards,
> Sandeep
>
> On Mon, Oct 3, 2016 at 10:16 AM, Tushar Gosavi 
> wrote:
>
> > +1, It will be great to have this operator.
> >
> > - Tushar.
> >
> > On Mon, Oct 3, 2016 at 8:15 AM, Chinmay Kolhatkar
> >  wrote:
> > > +1.
> > >
> > > - Chinmay.
> > >
> > > On 3 Oct 2016 7:25 a.m., "Amol Kekre"  wrote:
> > >
> > >> Ananth,
> > >> This would be great to have. +1
> > >>
> > >> Thks
> > >> Amol
> > >>
> > >> On Sun, Oct 2, 2016 at 8:38 AM, Munagala Ramanath <r...@datatorrent.com>
> > >> wrote:
> > >>
> > >> > +1
> > >> >
> > >> > Kudu looks impressive from the overview, though it seems to still be
> > >> > maturing.
> > >> >
> > >> > Ram
> > >> >
> > >> >
> > >> > On Sat, Oct 1, 2016 at 11:42 PM, ananth
> > >> > wrote:
> > >> >
> > >> > > Hello All,
> > >> > >
> > >> > > I was wondering if it would be worthwhile for the community to consider
> > >> > > support for Apache Kudu as a store (as a contrib operator inside Apache
> > >> > > Malhar).
> > >> > >
> > >> > > Here are some benefits I see:
> > >> > >
> > >> > > 1. Kudu has just reached version 1.0 and has just been declared
> > >> > >    production ready.
> > >> > > 2. Kudu as a store might be a good fit for many architectures in the
> > >> > >    years to come because of its ability to provide mutability of
> > >> > >    data (unlike HDFS) and optimized storage formats for scans.
> > >> > > 3. It also seems to withstand high-throughput write patterns, which
> > >> > >    makes it a stable sink for Apex workflows that operate at very high
> > >> > >    volumes.
> > >> > >
> > >> > >
> > >> > > Here are some links:
> > >> > >
> > >> > >  * From the recent Strata conference:
> > >> > >    https://kudu.apache.org/2016/09/26/strata-nyc-kudu-talks.html
> > >> > >  * https://kudu.apache.org/overview.html
> > >> > >
> > >> > > I can implement this operator if the community feels it is worth adding
> > >> > > it to our code base. If so, could someone please assign the JIRA to me? I
> > >> > > have created this JIRA to track this:
> > >> > > https://issues.apache.org/jira/browse/APEXMALHAR-2278
> > >> > >
> > >> > >
> > >> > > Regards,
> > >> > >
> > >> > > Ananth
> > >> > >
> > >> > >
> > >> >
> > >>
> >
>


Re: Kudu store operators

2016-10-03 Thread Sandeep Deshmukh
+1

Regards,
Sandeep

On Mon, Oct 3, 2016 at 10:16 AM, Tushar Gosavi 
wrote:

> +1, It will be great to have this operator.
>
> - Tushar.
>
> On Mon, Oct 3, 2016 at 8:15 AM, Chinmay Kolhatkar
>  wrote:
> > +1.
> >
> > - Chinmay.
> >
> > On 3 Oct 2016 7:25 a.m., "Amol Kekre"  wrote:
> >
> >> Ananth,
> >> This would be great to have. +1
> >>
> >> Thks
> >> Amol
> >>
> >> On Sun, Oct 2, 2016 at 8:38 AM, Munagala Ramanath 
> >> wrote:
> >>
> >> > +1
> >> >
> >> > Kudu looks impressive from the overview, though it seems to still be
> >> > maturing.
> >> >
> >> > Ram
> >> >
> >> >
> >> > On Sat, Oct 1, 2016 at 11:42 PM, ananth
> >> > wrote:
> >> >
> >> > > Hello All,
> >> > >
> >> > > I was wondering if it would be worthwhile for the community to consider
> >> > > support for Apache Kudu as a store (as a contrib operator inside Apache
> >> > > Malhar).
> >> > >
> >> > > Here are some benefits I see:
> >> > >
> >> > > 1. Kudu has just reached version 1.0 and has just been declared
> >> > >    production ready.
> >> > > 2. Kudu as a store might be a good fit for many architectures in the
> >> > >    years to come because of its ability to provide mutability of
> >> > >    data (unlike HDFS) and optimized storage formats for scans.
> >> > > 3. It also seems to withstand high-throughput write patterns, which
> >> > >    makes it a stable sink for Apex workflows that operate at very high
> >> > >    volumes.
> >> > >
> >> > >
> >> > > Here are some links:
> >> > >
> >> > >  * From the recent Strata conference:
> >> > >    https://kudu.apache.org/2016/09/26/strata-nyc-kudu-talks.html
> >> > >  * https://kudu.apache.org/overview.html
> >> > >
> >> > > I can implement this operator if the community feels it is worth adding
> >> > > it to our code base. If so, could someone please assign the JIRA to me? I
> >> > > have created this JIRA to track this:
> >> > > https://issues.apache.org/jira/browse/APEXMALHAR-2278
> >> > >
> >> > >
> >> > > Regards,
> >> > >
> >> > > Ananth
> >> > >
> >> > >
> >> >
> >>
>


Re: Fixed Width Record Parser

2016-10-03 Thread Chinmay Kolhatkar
Hi Hitesh,

In general I'm not in favor of reinventing the wheel: first, it takes effort
to maintain the library, and second, a self-written library might take longer
to mature and become stable for production use.

Hence, -1 from me for creating our own library for fixed-length parsing.

I looked at the libraries that you proposed and want to add one more to
the list: jFFP (http://jffp.sourceforge.net/).

To me, jFFP and univocity look like good options. I'm personally more inclined
towards univocity because it seems to be under active development (last commit
4 days ago), and this library has already been used in the Fixed Length File
Loader for Enrichment.

My overall vote is to use univocity as much as possible, and if any feature
that is important to us is missing in univocity, it should be added on top
of it in our operator.

Thanks,
Chinmay.


On Mon, Oct 3, 2016 at 2:12 PM, Hitesh Kapoor 
wrote:

> Hi All,
>
> Thank you for your feedback.
> So, as per the votes/comments, I will not be going ahead with approach 2, as
> it is not clean.
>
> For approach 1, I have looked at the possibility of using existing parsing
> libraries like flatworm, flatpack and univocity.
> The following are the problems with using existing libraries:
> 1) These libraries take the input schema in a specific format and are
> complicated to use.
> For example, the most famous library (as per Stack Overflow), flatworm, would
> involve giving the input schema in XML format (refer
> http://flatworm.sourceforge.net/), so we would lose consistency with
> existing parsers like CsvParser, where we take the input in JSON format. Apart
> from consistency, it would also be more difficult for the user to give input in
> flatworm-specific XML.
> If we decide to convert our JSON to flatworm-specific XML, it will involve
> a lot more work than writing our own library.
> 2) They do only limited type checking. For example, for a Date type that
> adheres to dd/mm/, a date may parse correctly for input 12/13/2000 (month
> is beyond 12).
> 3) Boolean and Date datatypes are difficult to handle.
> 4) Future scalability may take a hit. For example, if we want to add more
> constraints to our parser, like a min value for an integer or a pattern for a
> string, it won't be possible to do so with existing libraries.
> 5) Retrieving the values to create a POJO is not user (coder) friendly.
>
> In my opinion, we should write our own library to do the parsing and
> validation, as using an existing library will involve more work.
> The work involved in coding the library is easy and straightforward.
> It will be easier for us to scale, and it will also make life easier for the
> end user when providing the input schema.
> The reason we are not going ahead with approach 2 is that it is not clean;
> the twisting and turning involved in (forcefully) using existing
> libraries seems messier to me.
>
> Regards,
> Hitesh
>
>
>
> On Thu, Sep 8, 2016 at 1:37 PM, Yogi Devendra <
> devendra.vyavah...@gmail.com>
> wrote:
>
> > If we specify the order of the fields and the length of each field, then
> > the start and end positions can be computed.
> > Why do we need the end user to specify the start position for each field?
> >
> > ~ Yogi
> >
> > On 8 September 2016 at 12:48, Chinmay Kolhatkar
> > wrote:
> >
> > > A few points/questions:
> > > 1. Agree with Yogi. Approach 2 does not look clean.
> > > 2. Do we need "recordwidthlength"?
> > > 3. "recordseparator" should be "\n" and not "/n".
> > > 4. In general, providing the schema as JSON is tedious from the user's
> > > perspective.
> > > I suggest we find a simpler format for specifying the schema. For eg.
> > > ,,,
> > > 5. I suggest we first provide a basic parser to Malhar that does only
> > > parsing and type checking. Constraints, IMO, are not part of the parsing
> > > module, or if needed they can be added as a phase 2 improvement of this
> > > parser.
> > > 6. I would suggest using some existing library for parsing. There is no
> > > point in reinventing the wheel, and making something robust can be
> > > time-consuming.
> > >
> > > -Chinmay.
> > >
> > >
> > > On Wed, Sep 7, 2016 at 4:33 PM, Yogi Devendra <
> > > devendra.vyavah...@gmail.com>
> > > wrote:
> > >
> > > > Approach 2 does not look like a clean solution.
> > > >
> > > > -1 for Approach 2.
> > > >
> > > > ~ Yogi
> > > >
> > > > On 7 September 2016 at 15:25, Hitesh Kapoor 
> > > > wrote:
> > > >
> > > > > Hi All,
> > > > >
> > > > > An operator for parsing fixed width records has to be implemented.
> > > > > This operator will be used to parse fixed-width byte arrays/tuples based
> > > > > on a JSON schema and emit the parsed byte array on one port, the converted
> > > > > POJO on another port, and the failed byte arrays/tuples on an error port.
> > > > >
> > > > >
> > > > > The user will provide a JSON schema definition as shown below.
> > > > >
> > > > > {
> > > > >
> > > 

[jira] [Updated] (APEXMALHAR-2022) S3 Output Module for file copy

2016-10-03 Thread Chaitanya (JIRA)

 [ 
https://issues.apache.org/jira/browse/APEXMALHAR-2022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chaitanya updated APEXMALHAR-2022:
--
Assignee: Hitesh Kapoor  (was: Chaitanya)

> S3 Output Module for file copy
> --
>
> Key: APEXMALHAR-2022
> URL: https://issues.apache.org/jira/browse/APEXMALHAR-2022
> Project: Apache Apex Malhar
>  Issue Type: Task
>Reporter: Chaitanya
>Assignee: Hitesh Kapoor
>
> The primary functionality of this module is to copy files into an S3 bucket 
> using a block-by-block approach.
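A block-by-block copy starts by cutting each file into fixed-size byte ranges that can be read and written independently. A minimal sketch of that planning step follows; the `BlockPlanner` class and the block size are illustrative assumptions, and how the module actually sizes blocks or uploads them to S3 is not shown.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical planner: splits a file into (offset, length) blocks.
class BlockPlanner {
  // One block of a file: starting byte offset and length in bytes.
  static final class Block {
    final long offset;
    final long length;

    Block(long offset, long length) {
      this.offset = offset;
      this.length = length;
    }
  }

  // Splits a file of fileLength bytes into blocks of at most blockSize bytes; the last block may be shorter.
  static List<Block> plan(long fileLength, long blockSize) {
    if (blockSize <= 0) {
      throw new IllegalArgumentException("blockSize must be positive");
    }
    List<Block> blocks = new ArrayList<>();
    for (long offset = 0; offset < fileLength; offset += blockSize) {
      blocks.add(new Block(offset, Math.min(blockSize, fileLength - offset)));
    }
    return blocks;
  }
}
```

A 10-byte file with a 4-byte block size yields blocks at offsets 0, 4 and 8, the last one 2 bytes long; each block can then be handled by a separate reader/writer.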





Re: Fixed Width Record Parser

2016-10-03 Thread Hitesh Kapoor
Hi All,

Thank you for your feedback.
So, as per the votes/comments, I will not be going ahead with approach 2, as
it is not clean.

For approach 1, I have looked at the possibility of using existing parsing
libraries like flatworm, flatpack and univocity.
The following are the problems with using existing libraries:
1) These libraries take the input schema in a specific format and are
complicated to use.
For example, the most famous library (as per Stack Overflow), flatworm, would
involve giving the input schema in XML format (refer
http://flatworm.sourceforge.net/), so we would lose consistency with
existing parsers like CsvParser, where we take the input in JSON format. Apart
from consistency, it would also be more difficult for the user to give input in
flatworm-specific XML.
If we decide to convert our JSON to flatworm-specific XML, it will involve
a lot more work than writing our own library.
2) They do only limited type checking. For example, for a Date type that
adheres to dd/mm/, a date may parse correctly for input 12/13/2000 (month
is beyond 12).
3) Boolean and Date datatypes are difficult to handle.
4) Future scalability may take a hit. For example, if we want to add more
constraints to our parser, like a min value for an integer or a pattern for a
string, it won't be possible to do so with existing libraries.
5) Retrieving the values to create a POJO is not user (coder) friendly.
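Regarding point 2, one way such a strict check could be written in plain Java, whichever parsing route is chosen, is with `java.time` and `ResolverStyle.STRICT`, which rejects impossible months and days instead of adjusting them. This is an illustrative sketch; the `dd/MM/uuuu` pattern and the `StrictDateField` name are assumptions, not part of the proposal.

```java
import java.time.DateTimeException;
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.ResolverStyle;
import java.util.Optional;

// Hypothetical validator for a date column in a fixed-width record.
class StrictDateField {
  // 'uuuu' (proleptic year) is used because STRICT resolution of 'yyyy' (year-of-era) would also need an era.
  private static final DateTimeFormatter FORMAT =
      DateTimeFormatter.ofPattern("dd/MM/uuuu").withResolverStyle(ResolverStyle.STRICT);

  // Returns the parsed date, or empty if the text is not a real calendar date in this format.
  static Optional<LocalDate> parse(String text) {
    try {
      return Optional.of(LocalDate.parse(text.trim(), FORMAT));
    } catch (DateTimeException e) { // DateTimeParseException is a subclass
      return Optional.empty(); // a parser operator could route such a record to its error port
    }
  }
}
```

`parse("12/13/2000")` and `parse("31/02/2000")` both return empty, while `parse("13/12/2000")` returns 13 December 2000. With the formatter's default SMART resolver style, `31/02/2000` would instead be silently adjusted to 29 February.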

In my opinion, we should write our own library to do the parsing and
validation, as using an existing library will involve more work.
The work involved in coding the library is easy and straightforward.
It will be easier for us to scale, and it will also make life easier for the
end user when providing the input schema.
The reason we are not going ahead with approach 2 is that it is not clean;
the twisting and turning involved in (forcefully) using existing
libraries seems messier to me.
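To give a sense of the core slicing step, here is a minimal sketch that follows Yogi's suggestion in the quoted thread: the schema lists fields in order with their widths, start and end offsets are derived by a running sum, and a record of the wrong length is rejected. `FieldSpec` and `FixedWidthSlicer` are hypothetical names; type conversion, constraints and POJO creation are left out.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical schema entry: field name plus width in characters.
class FieldSpec {
  final String name;
  final int width;

  FieldSpec(String name, int width) {
    this.name = name;
    this.width = width;
  }
}

class FixedWidthSlicer {
  private final List<FieldSpec> fields;
  private final int[] starts; // derived start offset of each field
  private final int recordLength; // sum of all widths

  FixedWidthSlicer(List<FieldSpec> fields) {
    this.fields = new ArrayList<>(fields);
    this.starts = new int[fields.size()];
    int offset = 0;
    for (int i = 0; i < fields.size(); i++) {
      starts[i] = offset; // start = sum of the widths of all earlier fields
      offset += fields.get(i).width;
    }
    this.recordLength = offset;
  }

  int start(int i) {
    return starts[i];
  }

  int end(int i) {
    return starts[i] + fields.get(i).width; // exclusive end offset
  }

  // Splits one record into trimmed field values; throws if the record has the wrong length.
  Map<String, String> slice(String record) {
    if (record.length() != recordLength) {
      throw new IllegalArgumentException(
          "expected " + recordLength + " chars but got " + record.length());
    }
    Map<String, String> values = new LinkedHashMap<>();
    for (int i = 0; i < fields.size(); i++) {
      values.put(fields.get(i).name, record.substring(start(i), end(i)).trim());
    }
    return values;
  }
}
```

For fields of widths 4, 6 and 5, the derived ranges are [0, 4), [4, 10) and [10, 15), so the user never has to supply start positions.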

Regards,
Hitesh



On Thu, Sep 8, 2016 at 1:37 PM, Yogi Devendra 
wrote:

> If we specify the order of the fields and the length of each field, then the
> start and end positions can be computed.
> Why do we need the end user to specify the start position for each field?
>
> ~ Yogi
>
> On 8 September 2016 at 12:48, Chinmay Kolhatkar 
> wrote:
>
> > A few points/questions:
> > 1. Agree with Yogi. Approach 2 does not look clean.
> > 2. Do we need "recordwidthlength"?
> > 3. "recordseparator" should be "\n" and not "/n".
> > 4. In general, providing the schema as JSON is tedious from the user's
> > perspective.
> > I suggest we find a simpler format for specifying the schema. For eg.
> > ,,,
> > 5. I suggest we first provide a basic parser to Malhar that does only
> > parsing and type checking. Constraints, IMO, are not part of the parsing
> > module, or if needed they can be added as a phase 2 improvement of this
> > parser.
> > 6. I would suggest using some existing library for parsing. There is no
> > point in reinventing the wheel, and making something robust can be
> > time-consuming.
> >
> > -Chinmay.
> >
> >
> > On Wed, Sep 7, 2016 at 4:33 PM, Yogi Devendra <
> > devendra.vyavah...@gmail.com>
> > wrote:
> >
> > > Approach 2 does not look like a clean solution.
> > >
> > > -1 for Approach 2.
> > >
> > > ~ Yogi
> > >
> > > On 7 September 2016 at 15:25, Hitesh Kapoor 
> > > wrote:
> > >
> > > > Hi All,
> > > >
> > > > An operator for parsing fixed width records has to be implemented.
> > > > This operator will be used to parse fixed-width byte arrays/tuples based
> > > > on a JSON schema and emit the parsed byte array on one port, the converted
> > > > POJO on another port, and the failed byte arrays/tuples on an error port.
> > > >
> > > >
> > > > The user will provide a JSON schema definition as shown below.
> > > >
> > > > {
> > > >
> > > > "recordwidthlength": "Integer",
> > > >
> > > > "recordseparator": "/n", // this would be blank if there is no record
> > > > separator, default - a newline character
> > > >
> > > > "fields": [
> > > >
> > > > {
> > > >
> > > > "name": "",
> > > >
> > > > "type": "",
> > > >
> > > > "startCharNum": "",
> > > >
> > > > "endCharNum": "",
> > > >
> > > > "constraints": {
> > > >
> > > > }
> > > >
> > > > },
> > > >
> > > > {
> > > >
> > > > "name": "adName",
> > > >
> > > > "type": "String",
> > > >
> > > > "startCharNum": "Integer",
> > > >
> > > > "endCharNum": "Integer",
> > > >
> > > > "constraints": {
> > > >
> > > > "required": "true",
> > > >
> > > > "pattern": "[a-z].*[a-z]$",
> > > >
> > > > }
> > > >
> > > > }
> > > > ]
> > > > }
> > > >
> > > >
> > > > Below are the options to implement this operator.
> > > >
> > > > 1) Write a new custom library for parsing fixed width records, as existing
> > > > libraries for the same (e.g. flatworm, jFFP etc.) do not have a mechanism
> > > > for constraint checking.
> > > > The challenges in this approach will be to write a robust library from
> > > > scratch to handle all our 

[jira] [Updated] (APEXMALHAR-2279) Add build status symbol for malhar master in README.md

2016-10-03 Thread Chinmay Kolhatkar (JIRA)

 [ 
https://issues.apache.org/jira/browse/APEXMALHAR-2279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chinmay Kolhatkar updated APEXMALHAR-2279:
--
Description: 
Pick build status symbol from here: https://travis-ci.org/apache/apex-malhar

Take a look at other Apache projects on GitHub to see how it's done.

> Add build status symbol for malhar master in README.md
> --
>
> Key: APEXMALHAR-2279
> URL: https://issues.apache.org/jira/browse/APEXMALHAR-2279
> Project: Apache Apex Malhar
>  Issue Type: Improvement
>Reporter: Chinmay Kolhatkar
>Priority: Trivial
>  Labels: newbie
>
> Pick build status symbol from here: https://travis-ci.org/apache/apex-malhar
> Take a look at other Apache projects on GitHub to see how it's done.





[jira] [Created] (APEXCORE-545) Add build status symbol for apex core master in README.md

2016-10-03 Thread Chinmay Kolhatkar (JIRA)
Chinmay Kolhatkar created APEXCORE-545:
--

 Summary: Add build status symbol for apex core master in README.md
 Key: APEXCORE-545
 URL: https://issues.apache.org/jira/browse/APEXCORE-545
 Project: Apache Apex Core
  Issue Type: Improvement
Reporter: Chinmay Kolhatkar
Priority: Trivial


Pick build status symbol from here: https://travis-ci.org/apache/apex-core

Take a look at other Apache projects on GitHub to see how it's done.





[jira] [Created] (APEXMALHAR-2279) Add build status symbol for malhar master in README.md

2016-10-03 Thread Chinmay Kolhatkar (JIRA)
Chinmay Kolhatkar created APEXMALHAR-2279:
-

 Summary: Add build status symbol for malhar master in README.md
 Key: APEXMALHAR-2279
 URL: https://issues.apache.org/jira/browse/APEXMALHAR-2279
 Project: Apache Apex Malhar
  Issue Type: Improvement
Reporter: Chinmay Kolhatkar
Priority: Trivial








[GitHub] apex-core pull request #402: APEXCORE-532: Fix issue where new operators add...

2016-10-03 Thread tushargosavi
GitHub user tushargosavi opened a pull request:

https://github.com/apache/apex-core/pull/402

APEXCORE-532: Fix issue where new operators added to dag starts from 
initial checkpoint

When new operators are added dynamically, they should start from the windowId 
of the upstream operator. The issue was that new operators get added using 
addLogicalOperator, which sets recoveryWindowId to INITIAL_CHECKPOINT and 
prevents getActivationCheckpoint from returning the correct window id of the 
operator during deployment.

@PramodSSImmaneni  please review.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/tushargosavi/incubator-apex-core APEXCORE-532

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/apex-core/pull/402.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #402


commit 30091217dced6d1581a8630374c942a811ab6ed2
Author: Tushar R. Gosavi 
Date:   2016-10-03T07:07:59Z

APEXCORE-532: Fix issue where new operators added to dag starts from 
initial checkpoint






[jira] [Commented] (APEXCORE-532) New dynamically added operator does not start with correct windowId.

2016-10-03 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/APEXCORE-532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15541757#comment-15541757
 ] 

ASF GitHub Bot commented on APEXCORE-532:
-

GitHub user tushargosavi opened a pull request:

https://github.com/apache/apex-core/pull/402

APEXCORE-532: Fix issue where new operators added to dag starts from 
initial checkpoint

When new operators are added dynamically, they should start from the 
windowId of the upstream operator. The issue was that new operators get added 
using addLogicalOperator, which sets recoveryWindowId to INITIAL_CHECKPOINT; 
this prevents getActivationCheckpoint from returning the correct window id of 
the operator during deploy.

@PramodSSImmaneni  please review.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/tushargosavi/incubator-apex-core APEXCORE-532

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/apex-core/pull/402.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #402


commit 30091217dced6d1581a8630374c942a811ab6ed2
Author: Tushar R. Gosavi 
Date:   2016-10-03T07:07:59Z

APEXCORE-532: Fix issue where new operators added to dag starts from 
initial checkpoint




> New dynamically added operator does not start with correct windowId.
> 
>
> Key: APEXCORE-532
> URL: https://issues.apache.org/jira/browse/APEXCORE-532
> Project: Apache Apex Core
>  Issue Type: Bug
>Reporter: Tushar Gosavi
>Priority: Critical
>
> During a dynamic DAG change, if a new operator is added and connected to an 
> existing operator, it does not start with the correct windowId. Its baseSeconds 
> is set to 0, causing windowId management problems at the master and effectively 
> halting purges from the buffer server.





[jira] [Commented] (APEXMALHAR-2272) sequencialFileRead property on FSInputModule not functioning as expected

2016-10-03 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/APEXMALHAR-2272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15541741#comment-15541741
 ] 

ASF GitHub Bot commented on APEXMALHAR-2272:


GitHub user yogidevendra opened a pull request:

https://github.com/apache/apex-malhar/pull/439

APEXMALHAR-2272 : Fixed sequentialFileRead on FSInputModule



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/yogidevendra/apex-malhar APEXMALHAR-2272-sequencialFileRead

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/apex-malhar/pull/439.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #439


commit f01b56b5a671572f37e487f422db4cbe5537
Author: yogidevendra 
Date:   2016-10-01T02:49:56Z

APEXMALHAR-2272 : Fixed sequencialFileRead on FSInputModule




> sequencialFileRead property on FSInputModule not functioning as expected
> 
>
> Key: APEXMALHAR-2272
> URL: https://issues.apache.org/jira/browse/APEXMALHAR-2272
> Project: Apache Apex Malhar
>  Issue Type: Bug
>Reporter: Yogi Devendra
>Assignee: Yogi Devendra
>Priority: Minor
>
> When there is a single large file in the input directory, BlockReader has 
> multiple partitions, and sequencialFileRead is set to true, only one 
> BlockReader instance should be active; the other BlockReader instances 
> should remain idle.
> This is because sequencialFileRead ensures that the blocks of a file are 
> read serially by the same BlockReader instance. 
> The observed behavior is that all BlockReader instances read data, which 
> means the sequencialFileRead property is not functioning as expected.
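The expected behavior can be sketched as a block-to-reader assignment rule. One common way to guarantee that a single reader handles every block of a file is to route blocks by a hash of the file path; the sketch below assumes that approach and does not reproduce Malhar's actual FSInputModule or BlockReader code. The Block class and assignPartitions method are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of assigning file blocks to BlockReader partitions. */
public class SequentialReadSketch {
  // Hypothetical block descriptor; Malhar's real block metadata is not modeled.
  static final class Block {
    final String filePath;
    final long blockId;

    Block(String filePath, long blockId) {
      this.filePath = filePath;
      this.blockId = blockId;
    }
  }

  /** Returns the reader partition index chosen for each block. */
  static int[] assignPartitions(List<Block> blocks, int partitions, boolean sequencialFileRead) {
    int[] result = new int[blocks.size()];
    for (int i = 0; i < blocks.size(); i++) {
      Block b = blocks.get(i);
      if (sequencialFileRead) {
        // Every block of a file maps to the same reader, so the file is read serially.
        result[i] = Math.floorMod(b.filePath.hashCode(), partitions);
      } else {
        // Blocks are spread across readers for parallel reads.
        result[i] = (int) Math.floorMod(b.blockId, (long) partitions);
      }
    }
    return result;
  }

  public static void main(String[] args) {
    List<Block> blocks = List.of(
        new Block("/input/big.dat", 0), new Block("/input/big.dat", 1),
        new Block("/input/big.dat", 2), new Block("/input/big.dat", 3));
    // All four entries are identical: only one reader is active.
    System.out.println(Arrays.toString(assignPartitions(blocks, 4, true)));
    System.out.println(Arrays.toString(assignPartitions(blocks, 4, false))); // [0, 1, 2, 3]
  }
}
```

The reported bug matches the second branch being taken even when sequencialFileRead is true.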





[GitHub] apex-malhar pull request #439: APEXMALHAR-2272 : Fixed sequentialFileRead on...

2016-10-03 Thread yogidevendra
GitHub user yogidevendra opened a pull request:

https://github.com/apache/apex-malhar/pull/439

APEXMALHAR-2272 : Fixed sequentialFileRead on FSInputModule



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/yogidevendra/apex-malhar APEXMALHAR-2272-sequencialFileRead

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/apex-malhar/pull/439.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #439


commit f01b56b5a671572f37e487f422db4cbe5537
Author: yogidevendra 
Date:   2016-10-01T02:49:56Z

APEXMALHAR-2272 : Fixed sequencialFileRead on FSInputModule



