Re: Megh operator library

2016-09-08 Thread Lakshmi Velineni
I would like to help as well.

Thanks,
Lakshmi Prasanna

On Thu, Sep 8, 2016 at 9:15 PM, Dongming Liang 
wrote:

> +1 for me.
>
> Thanks,
> - Dongming
>
> Dongming LIANG
> 
> dongming.li...@gmail.com
>
> On Thu, Sep 8, 2016 at 9:02 PM, Devendra Tagare
> wrote:
>
> > Hi,
> >
> > Count me in for one operator as well.
> >
> > Thanks,
> > Dev
> >
> > On Fri, Sep 9, 2016 at 8:33 AM, Yogi Devendra <
> > devendra.vyavah...@gmail.com>
> > wrote:
> >
> > > Hi,
> > >
> > > I would like to contribute to this effort for one operator. Please
> > > count me in.
> > >
> > > ~ Yogi
> > >
> > > On 9 September 2016 at 03:10, Pramod Immaneni 
> > > wrote:
> > >
> > > > Hi,
> > > >
> > > > DataTorrent, the initial contributor to Apex and the company I work
> > > > for, recently opened up a library of operators called Megh to the
> > > > public and made the repository available under the Apache License.
> > > > The link to the repository is below. These operators, for the most
> > > > part, contain functionality that is complementary to what the Malhar
> > > > library provides and were developed to solve business use cases that
> > > > arose over time. Also, some operators in Malhar were inspired by
> > > > early implementations in the Megh library and were built upon
> > > > knowledge gained in doing the original implementations.
> > > >
> > > > Our goal is not to keep Megh as a separate library but rather to
> > > > bring these operators into Malhar in a fashion that is consistent
> > > > with the Malhar project and repository. In the upcoming days, in a
> > > > gradual fashion, we will have more details on the individual
> > > > operators that we would like to contribute. Also, if you are
> > > > interested in helping with this effort, please raise your hand.
> > > >
> > > > https://github.com/DataTorrent/Megh/
> > > >
> > > > Thanks
> > > >
> > >
> >
>


Re: Megh operator library

2016-09-08 Thread Dongming Liang
+1 for me.

Thanks,
- Dongming

Dongming LIANG

dongming.li...@gmail.com

On Thu, Sep 8, 2016 at 9:02 PM, Devendra Tagare 
wrote:

> Hi,
>
> Count me in for one operator as well.
>
> Thanks,
> Dev
>
> On Fri, Sep 9, 2016 at 8:33 AM, Yogi Devendra <
> devendra.vyavah...@gmail.com>
> wrote:
>
> > Hi,
> >
> > I would like to contribute to this effort for one operator. Please count
> > me in.
> >
> > ~ Yogi
> >
> > On 9 September 2016 at 03:10, Pramod Immaneni 
> > wrote:
> >
> > > Hi,
> > >
> > > DataTorrent, the initial contributor to Apex and the company I work
> > > for, recently opened up a library of operators called Megh to the
> > > public and made the repository available under the Apache License.
> > > The link to the repository is below. These operators, for the most
> > > part, contain functionality that is complementary to what the Malhar
> > > library provides and were developed to solve business use cases that
> > > arose over time. Also, some operators in Malhar were inspired by
> > > early implementations in the Megh library and were built upon
> > > knowledge gained in doing the original implementations.
> > >
> > > Our goal is not to keep Megh as a separate library but rather to
> > > bring these operators into Malhar in a fashion that is consistent
> > > with the Malhar project and repository. In the upcoming days, in a
> > > gradual fashion, we will have more details on the individual
> > > operators that we would like to contribute. Also, if you are
> > > interested in helping with this effort, please raise your hand.
> > >
> > > https://github.com/DataTorrent/Megh/
> > >
> > > Thanks
> > >
> >
>


[jira] [Commented] (APEXMALHAR-2130) implement scalable windowed storage

2016-09-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/APEXMALHAR-2130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15475007#comment-15475007
 ] 

ASF GitHub Bot commented on APEXMALHAR-2130:


GitHub user davidyan74 opened a pull request:

https://github.com/apache/apex-malhar/pull/405

APEXMALHAR-2130 #resolve Added a spillable map that takes two keys with 
support of iterating through all entries with a given first key

@siyuanh @tweise @ilooner Please review. This is mostly for the scalable 
implementation of a WindowedKeyedStorage.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/davidyan74/apex-malhar APEXMALHAR-2130

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/apex-malhar/pull/405.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #405


commit 183c9ff01e3f197c9088cdcf460827023d414ce1
Author: David Yan 
Date:   2016-09-08T20:59:50Z

APEXMALHAR-2130 #resolve Added a spillable map that takes two keys with 
support of iterating through all entries with a given first key




> implement scalable windowed storage
> ---
>
> Key: APEXMALHAR-2130
> URL: https://issues.apache.org/jira/browse/APEXMALHAR-2130
> Project: Apache Apex Malhar
>  Issue Type: Task
>Reporter: bright chen
>Assignee: David Yan
>
> This feature is used for supporting windowing.
> The storage needs to have the following features:
> 1. Spillable key value storage (integrate with APEXMALHAR-2026)
> 2. Upon checkpoint, it saves a snapshot for the entire data set with the 
> checkpointing window id.  This should be done incrementally (ManagedState) to 
> avoid wasting space with unchanged data
> 3. When recovering, it takes the recovery window id and restores to that 
> snapshot
> 4. When a window is committed, all windows with a lower ID should be purged 
> from the store.
> 5. It should implement the WindowedStorage and WindowedKeyedStorage 
> interfaces, and because of 2 and 3, we may want to add methods to the 
> WindowedStorage interface so that the implementation of WindowedOperator can 
> notify the storage of checkpointing, recovering and committing of a window.
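
Requirements 2 to 4 above can be sketched in a few lines of Python. This is an illustration only, not the Malhar implementation: `SnapshotStore` and its method names are hypothetical, everything is kept in memory, and each checkpoint takes a full copy, whereas the issue asks for incremental snapshots (ManagedState) on spillable storage. It also assumes that "purged" in item 4 refers to the snapshots saved for lower window IDs.

```python
import copy


class SnapshotStore:
    """Hypothetical in-memory key/value store with per-window snapshots."""

    def __init__(self):
        self.data = {}
        self._snapshots = {}  # checkpoint window id -> full copy of data

    def put(self, key, value):
        self.data[key] = value

    def checkpoint(self, window_id):
        # Full copy for simplicity; the issue asks for incremental saves.
        self._snapshots[window_id] = copy.deepcopy(self.data)

    def recover(self, window_id):
        # Restore the live data to the snapshot taken at window_id.
        self.data = copy.deepcopy(self._snapshots[window_id])

    def committed(self, window_id):
        # Drop every snapshot whose window id is lower than the committed one.
        for stale in [w for w in self._snapshots if w < window_id]:
            del self._snapshots[stale]

    def snapshot_ids(self):
        return sorted(self._snapshots)
```

A windowed operator would call `checkpoint`, `recover` and `committed` from its own lifecycle callbacks, which is the kind of notification hook item 5 suggests adding to the storage interface.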



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] apex-malhar pull request #405: APEXMALHAR-2130 #resolve Added a spillable ma...

2016-09-08 Thread davidyan74
GitHub user davidyan74 opened a pull request:

https://github.com/apache/apex-malhar/pull/405

APEXMALHAR-2130 #resolve Added a spillable map that takes two keys with 
support of iterating through all entries with a given first key

@siyuanh @tweise @ilooner Please review. This is mostly for the scalable 
implementation of a WindowedKeyedStorage.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/davidyan74/apex-malhar APEXMALHAR-2130

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/apex-malhar/pull/405.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #405


commit 183c9ff01e3f197c9088cdcf460827023d414ce1
Author: David Yan 
Date:   2016-09-08T20:59:50Z

APEXMALHAR-2130 #resolve Added a spillable map that takes two keys with 
support of iterating through all entries with a given first key




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (APEXMALHAR-2230) Intermittent test failure in Kafka module

2016-09-08 Thread Siyuan Hua (JIRA)

[ 
https://issues.apache.org/jira/browse/APEXMALHAR-2230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15474990#comment-15474990
 ] 

Siyuan Hua commented on APEXMALHAR-2230:


Not necessarily related to Kafka. This is caused by another exception:

2016-09-07 20:13:14,825 [6/Kafka inputtesttopic6.outputPort#unifier:DefaultUnifier] ERROR engine.StreamingContainer run - Operator set [OperatorDeployInfo.UnifierDeployInfo[id=6,name=Kafka inputtesttopic6.outputPort#unifier,type=UNIFIER,checkpoint={57d074d30001, 0, 0},inputs=[OperatorDeployInfo.InputDeployInfo[portName=(1.outputPort),streamId=Kafka messagetesttopic6,sourceNodeId=1,sourcePortName=outputPort,locality=,partitionMask=0,partitionKeys=], OperatorDeployInfo.InputDeployInfo[portName=(2.outputPort),streamId=Kafka messagetesttopic6,sourceNodeId=2,sourcePortName=outputPort,locality=,partitionMask=0,partitionKeys=], OperatorDeployInfo.InputDeployInfo[portName=(3.outputPort),streamId=Kafka messagetesttopic6,sourceNodeId=3,sourcePortName=outputPort,locality=,partitionMask=0,partitionKeys=], OperatorDeployInfo.InputDeployInfo[portName=(4.outputPort),streamId=Kafka messagetesttopic6,sourceNodeId=4,sourcePortName=outputPort,locality=,partitionMask=0,partitionKeys=]],outputs=[OperatorDeployInfo.OutputDeployInfo[portName=outputPort,streamId=Kafka messagetesttopic6,bufferServer=testing-worker-linux-docker-c25f223a-3424-linux-7 stopped running due to an exception.
java.lang.NullPointerException
    at com.datatorrent.bufferserver.packet.Tuple.getTuple(Tuple.java:54)
    at com.datatorrent.stram.stream.BufferServerSubscriber$BufferReservoir.sweep(BufferServerSubscriber.java:309)
    at com.datatorrent.stram.engine.GenericNode.run(GenericNode.java:259)
    at com.datatorrent.stram.engine.StreamingContainer$2.run(StreamingContainer.java:1407)
2016-09-07 20:13:14,826 [6/Kafka inputtesttopic6.outputPort#unifier:DefaultUnifier] INFO  stram.StramLocalCluster log - container-1 msg: Stopped running due to an exception. java.lang.NullPointerException
    at com.datatorrent.bufferserver.packet.Tuple.getTuple(Tuple.java:54)
    at com.datatorrent.stram.stream.BufferServerSubscriber$BufferReservoir.sweep(BufferServerSubscriber.java:309)
    at com.datatorrent.stram.engine.GenericNode.run(GenericNode.java:259)
    at com.datatorrent.stram.engine.StreamingContainer$2.run(StreamingContainer.java:1407)

This is related to some 

Basically on the bufferserver side it already received some unexpected data.  

> Intermittent test failure in Kafka module
> -
>
> Key: APEXMALHAR-2230
> URL: https://issues.apache.org/jira/browse/APEXMALHAR-2230
> Project: Apache Apex Malhar
>  Issue Type: Bug
>Affects Versions: 3.5.0
>Reporter: Thomas Weise
>Assignee: Siyuan Hua
>
> Test fails intermittently in Travis CI. Could be a race condition in the test?





[jira] [Assigned] (APEXMALHAR-2230) Intermittent test failure in Kafka module

2016-09-08 Thread Siyuan Hua (JIRA)

 [ 
https://issues.apache.org/jira/browse/APEXMALHAR-2230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siyuan Hua reassigned APEXMALHAR-2230:
--

Assignee: Siyuan Hua

> Intermittent test failure in Kafka module
> -
>
> Key: APEXMALHAR-2230
> URL: https://issues.apache.org/jira/browse/APEXMALHAR-2230
> Project: Apache Apex Malhar
>  Issue Type: Bug
>Affects Versions: 3.5.0
>Reporter: Thomas Weise
>Assignee: Siyuan Hua
>
> Test fails intermittently in Travis CI. Could be a race condition in the test?





[jira] [Created] (APEXMALHAR-2231) Implement a Spillable map that takes two keys

2016-09-08 Thread David Yan (JIRA)
David Yan created APEXMALHAR-2231:
-

 Summary: Implement a Spillable map that takes two keys
 Key: APEXMALHAR-2231
 URL: https://issues.apache.org/jira/browse/APEXMALHAR-2231
 Project: Apache Apex Malhar
  Issue Type: Sub-task
Reporter: David Yan
Assignee: David Yan


This is similar to Map<K1, Map<K2, V>> with the ability to get all K2's given a 
K1, and remove all entries with a given K1.
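
As a rough illustration of the behaviour described above, here is a small Python sketch. It is an in-memory stand-in only: `TwoKeyMap` and its method names are hypothetical, nothing is spilled to disk, and it is not the Malhar Spillable API.

```python
class TwoKeyMap:
    """Hypothetical in-memory map addressed by a pair of keys (k1, k2)."""

    def __init__(self):
        self._data = {}  # k1 -> {k2 -> value}

    def put(self, k1, k2, value):
        self._data.setdefault(k1, {})[k2] = value

    def get(self, k1, k2, default=None):
        return self._data.get(k1, {}).get(k2, default)

    def second_keys(self, k1):
        # All K2's stored under the given K1, in insertion order.
        return list(self._data.get(k1, {}))

    def entries(self, k1):
        # All (k2, value) pairs stored under the given K1.
        return list(self._data.get(k1, {}).items())

    def remove_all(self, k1):
        # Remove every entry with the given K1; returns how many were removed.
        return len(self._data.pop(k1, {}))
```

For windowed storage, K1 would typically be the window and K2 the tuple key, so purging a window becomes a single `remove_all` call.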





[jira] [Resolved] (APEXMALHAR-2228) Implement SpillableByteLinkedListMultimap<K, V>

2016-09-08 Thread David Yan (JIRA)

 [ 
https://issues.apache.org/jira/browse/APEXMALHAR-2228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Yan resolved APEXMALHAR-2228.
---
Resolution: Invalid

Not valid any more

> Implement SpillableByteLinkedListMultimap<K, V>
> ---
>
> Key: APEXMALHAR-2228
> URL: https://issues.apache.org/jira/browse/APEXMALHAR-2228
> Project: Apache Apex Malhar
>  Issue Type: Sub-task
>Reporter: David Yan
>Assignee: Siyuan Hua
>
> We need the support of the following functions:
> - Iterator on the values given a key, including Iterator.remove
> - removing an entry given a key and a value
> This will probably need a new SpillableLinkedListImpl class.





[jira] [Commented] (APEXCORE-522) Promote singleton usage pattern for String2String, Long2String and other StringCodecs

2016-09-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/APEXCORE-522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15474522#comment-15474522
 ] 

ASF GitHub Bot commented on APEXCORE-522:
-

Github user vrozov closed the pull request at:

https://github.com/apache/apex-core/pull/385


> Promote singleton usage pattern for String2String, Long2String and other 
> StringCodecs
> -
>
> Key: APEXCORE-522
> URL: https://issues.apache.org/jira/browse/APEXCORE-522
> Project: Apache Apex Core
>  Issue Type: Improvement
>Reporter: Vlad Rozov
>Assignee: Vlad Rozov
>Priority: Minor
>
> String2String codec does not hold state, a single instance of the same 
> String2String class may be reused.
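
The idea can be sketched in Python as below. This is only an illustration of sharing one instance of a stateless codec; the class and method names loosely mirror the issue title, `get_instance` is a hypothetical accessor, and none of this is the apex-core API.

```python
class String2String:
    """Stateless codec sketch: string in, same string out."""

    _instance = None

    @classmethod
    def get_instance(cls):
        # Create the shared instance lazily and hand out the same one each time.
        if cls._instance is None:
            cls._instance = cls()
        return cls._instance

    def from_string(self, text):
        return text

    def to_string(self, value):
        return value


class Long2String:
    """Stateless codec sketch: converts between an integer and its string form."""

    _instance = None

    @classmethod
    def get_instance(cls):
        if cls._instance is None:
            cls._instance = cls()
        return cls._instance

    def from_string(self, text):
        return int(text)

    def to_string(self, value):
        return str(value)
```

Because neither class keeps per-call state, every caller can use `String2String.get_instance()` instead of constructing a new codec.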





[jira] [Commented] (APEXCORE-522) Promote singleton usage pattern for String2String, Long2String and other StringCodecs

2016-09-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/APEXCORE-522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15474523#comment-15474523
 ] 

ASF GitHub Bot commented on APEXCORE-522:
-

GitHub user vrozov reopened a pull request:

https://github.com/apache/apex-core/pull/385

APEXCORE-522 - Promote singleton usage pattern for String2String, 
Long2String and other StringCodecs



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/vrozov/apex-core APEXCORE-522

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/apex-core/pull/385.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #385


commit 9a9a34e825c2204ffd5edae9fd8641892dfe2a3f
Author: Vlad Rozov 
Date:   2016-09-07T16:12:54Z

APEXCORE-522 - Promote singleton usage pattern for String2String, 
Long2String and other StringCodecs




> Promote singleton usage pattern for String2String, Long2String and other 
> StringCodecs
> -
>
> Key: APEXCORE-522
> URL: https://issues.apache.org/jira/browse/APEXCORE-522
> Project: Apache Apex Core
>  Issue Type: Improvement
>Reporter: Vlad Rozov
>Assignee: Vlad Rozov
>Priority: Minor
>
> String2String codec does not hold state, a single instance of the same 
> String2String class may be reused.





[GitHub] apex-core pull request #385: APEXCORE-522 - Promote singleton usage pattern ...

2016-09-08 Thread vrozov
GitHub user vrozov reopened a pull request:

https://github.com/apache/apex-core/pull/385

APEXCORE-522 - Promote singleton usage pattern for String2String, 
Long2String and other StringCodecs



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/vrozov/apex-core APEXCORE-522

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/apex-core/pull/385.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #385


commit 9a9a34e825c2204ffd5edae9fd8641892dfe2a3f
Author: Vlad Rozov 
Date:   2016-09-07T16:12:54Z

APEXCORE-522 - Promote singleton usage pattern for String2String, 
Long2String and other StringCodecs






[jira] [Resolved] (APEXCORE-524) Add support for custom maven repository to ClassPathResolverTest.testManifestClassPathResolver

2016-09-08 Thread Thomas Weise (JIRA)

 [ 
https://issues.apache.org/jira/browse/APEXCORE-524?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Weise resolved APEXCORE-524.
---
   Resolution: Fixed
Fix Version/s: 3.5.0

> Add support for custom maven repository to 
> ClassPathResolverTest.testManifestClassPathResolver 
> ---
>
> Key: APEXCORE-524
> URL: https://issues.apache.org/jira/browse/APEXCORE-524
> Project: Apache Apex Core
>  Issue Type: Improvement
>Reporter: Vlad Rozov
>Assignee: Vlad Rozov
>Priority: Minor
> Fix For: 3.5.0
>
>
> ClassPathResolverTest.testManifestClassPathResolver fails when maven is 
> configured with non-default repository location (-Dmaven.repo.local). 





[GitHub] apex-core pull request #386: APEXCORE-524 - Add support for custom maven rep...

2016-09-08 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/apex-core/pull/386




[jira] [Commented] (APEXCORE-524) Add support for custom maven repository to ClassPathResolverTest.testManifestClassPathResolver

2016-09-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/APEXCORE-524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15474316#comment-15474316
 ] 

ASF GitHub Bot commented on APEXCORE-524:
-

Github user asfgit closed the pull request at:

https://github.com/apache/apex-core/pull/386


> Add support for custom maven repository to 
> ClassPathResolverTest.testManifestClassPathResolver 
> ---
>
> Key: APEXCORE-524
> URL: https://issues.apache.org/jira/browse/APEXCORE-524
> Project: Apache Apex Core
>  Issue Type: Improvement
>Reporter: Vlad Rozov
>Assignee: Vlad Rozov
>Priority: Minor
>
> ClassPathResolverTest.testManifestClassPathResolver fails when maven is 
> configured with non-default repository location (-Dmaven.repo.local). 





[jira] [Commented] (APEXCORE-524) Add support for custom maven repository to ClassPathResolverTest.testManifestClassPathResolver

2016-09-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/APEXCORE-524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15474186#comment-15474186
 ] 

ASF GitHub Bot commented on APEXCORE-524:
-

GitHub user vrozov opened a pull request:

https://github.com/apache/apex-core/pull/386

APEXCORE-524 - Add support for custom maven repository to 
ClassPathResolverTest.testManifestClassPathResolver

@tweise Please review

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/vrozov/apex-core APEXCORE-524

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/apex-core/pull/386.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #386


commit 989536773e2f0c54b66b34a1680cb199c0e159b4
Author: Vlad Rozov 
Date:   2016-09-08T15:34:47Z

APEXCORE-524 - Add support for custom maven repository to 
ClassPathResolverTest.testManifestClassPathResolver




> Add support for custom maven repository to 
> ClassPathResolverTest.testManifestClassPathResolver 
> ---
>
> Key: APEXCORE-524
> URL: https://issues.apache.org/jira/browse/APEXCORE-524
> Project: Apache Apex Core
>  Issue Type: Improvement
>Reporter: Vlad Rozov
>Assignee: Vlad Rozov
>Priority: Minor
>
> ClassPathResolverTest.testManifestClassPathResolver fails when maven is 
> configured with non-default repository location (-Dmaven.repo.local). 





[GitHub] apex-core pull request #386: APEXCORE-524 - Add support for custom maven rep...

2016-09-08 Thread vrozov
GitHub user vrozov opened a pull request:

https://github.com/apache/apex-core/pull/386

APEXCORE-524 - Add support for custom maven repository to 
ClassPathResolverTest.testManifestClassPathResolver

@tweise Please review

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/vrozov/apex-core APEXCORE-524

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/apex-core/pull/386.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #386


commit 989536773e2f0c54b66b34a1680cb199c0e159b4
Author: Vlad Rozov 
Date:   2016-09-08T15:34:47Z

APEXCORE-524 - Add support for custom maven repository to 
ClassPathResolverTest.testManifestClassPathResolver






[jira] [Created] (APEXCORE-524) Add support for custom maven repository to ClassPathResolverTest.testManifestClassPathResolver

2016-09-08 Thread Vlad Rozov (JIRA)
Vlad Rozov created APEXCORE-524:
---

 Summary: Add support for custom maven repository to 
ClassPathResolverTest.testManifestClassPathResolver 
 Key: APEXCORE-524
 URL: https://issues.apache.org/jira/browse/APEXCORE-524
 Project: Apache Apex Core
  Issue Type: Improvement
Reporter: Vlad Rozov
Assignee: Vlad Rozov
Priority: Minor


ClassPathResolverTest.testManifestClassPathResolver fails when maven is 
configured with non-default repository location (-Dmaven.repo.local). 





[jira] [Resolved] (APEXMALHAR-2152) Enricher - Add fixed length file format support to FSLoader

2016-09-08 Thread Chinmay Kolhatkar (JIRA)

 [ 
https://issues.apache.org/jira/browse/APEXMALHAR-2152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chinmay Kolhatkar resolved APEXMALHAR-2152.
---
   Resolution: Done
Fix Version/s: 3.6.0

> Enricher - Add fixed length file format support to FSLoader
> ---
>
> Key: APEXMALHAR-2152
> URL: https://issues.apache.org/jira/browse/APEXMALHAR-2152
> Project: Apache Apex Malhar
>  Issue Type: New Feature
>  Components: algorithms
>Reporter: Chinmay Kolhatkar
>Assignee: shubham pathak
> Fix For: 3.6.0
>
>
> Enricher - Add fixed length file format support to FSLoader
> NOTE: Details will follow in discussion on dev@apex mailing list.





[GitHub] apex-malhar pull request #376: APEXMALHAR-2152 - FSLoader fixed length suppo...

2016-09-08 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/apex-malhar/pull/376




[jira] [Commented] (APEXMALHAR-2152) Enricher - Add fixed length file format support to FSLoader

2016-09-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/APEXMALHAR-2152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15473687#comment-15473687
 ] 

ASF GitHub Bot commented on APEXMALHAR-2152:


Github user asfgit closed the pull request at:

https://github.com/apache/apex-malhar/pull/376


> Enricher - Add fixed length file format support to FSLoader
> ---
>
> Key: APEXMALHAR-2152
> URL: https://issues.apache.org/jira/browse/APEXMALHAR-2152
> Project: Apache Apex Malhar
>  Issue Type: New Feature
>  Components: algorithms
>Reporter: Chinmay Kolhatkar
>Assignee: shubham pathak
>
> Enricher - Add fixed length file format support to FSLoader
> NOTE: Details will follow in discussion on dev@apex mailing list.





[jira] [Resolved] (APEXMALHAR-2176) expressionFunctions for FilterOperator throws IndexOutOfBounds

2016-09-08 Thread Chinmay Kolhatkar (JIRA)

 [ 
https://issues.apache.org/jira/browse/APEXMALHAR-2176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chinmay Kolhatkar resolved APEXMALHAR-2176.
---
   Resolution: Fixed
Fix Version/s: 3.6.0

> expressionFunctions for FilterOperator throws IndexOutOfBounds
> --
>
> Key: APEXMALHAR-2176
> URL: https://issues.apache.org/jira/browse/APEXMALHAR-2176
> Project: Apache Apex Malhar
>  Issue Type: Bug
>Reporter: Yogi Devendra
>Assignee: Yogi Devendra
>Priority: Minor
> Fix For: 3.6.0
>
>
> If expressionFunctions are added through xml conf file as follows
> {code}
> <property>
>   <name>dt.application.FilterExample.operator.filterOperator.prop.expressionFunctions[5]</name>
>   <value>org.apache.commons.lang3.BooleanUtils.*</value>
> </property>
> {code}
> This gives IndexOutOfBoundsException 





[jira] [Commented] (APEXMALHAR-2176) expressionFunctions for FilterOperator throws IndexOutOfBounds

2016-09-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/APEXMALHAR-2176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15473546#comment-15473546
 ] 

ASF GitHub Bot commented on APEXMALHAR-2176:


Github user asfgit closed the pull request at:

https://github.com/apache/apex-malhar/pull/361


> expressionFunctions for FilterOperator throws IndexOutOfBounds
> --
>
> Key: APEXMALHAR-2176
> URL: https://issues.apache.org/jira/browse/APEXMALHAR-2176
> Project: Apache Apex Malhar
>  Issue Type: Bug
>Reporter: Yogi Devendra
>Assignee: Yogi Devendra
>Priority: Minor
>
> If expressionFunctions are added through xml conf file as follows
> {code}
> <property>
>   <name>dt.application.FilterExample.operator.filterOperator.prop.expressionFunctions[5]</name>
>   <value>org.apache.commons.lang3.BooleanUtils.*</value>
> </property>
> {code}
> This gives IndexOutOfBoundsException 





[GitHub] apex-malhar pull request #361: APEXMALHAR-2176 expressionFunctions for Filte...

2016-09-08 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/apex-malhar/pull/361




Re: Fixed Width Record Parser

2016-09-08 Thread Yogi Devendra
If we specify the order of the fields and the length of each field, then the
start and end positions can be computed.
Why do we need the end user to specify the start position for each field?
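
A quick Python sketch of what I mean, under the assumption that fields sit back to back with no filler columns between them (if records can contain unused gaps, explicit positions would still be needed). The function names and the sample layout are made up for illustration.

```python
from itertools import accumulate


def field_offsets(lengths):
    """Return (start, end) pairs, 0-based with exclusive end, for adjacent fields."""
    ends = list(accumulate(lengths))
    starts = [0] + ends[:-1]
    return list(zip(starts, ends))


def split_record(record, fields):
    """Slice a fixed width record; fields is an ordered list of (name, length)."""
    offsets = field_offsets([length for _, length in fields])
    return {name: record[start:end]
            for (name, _), (start, end) in zip(fields, offsets)}


# Hypothetical layout: only the order and the lengths are specified.
fields = [("adId", 4), ("adName", 10), ("clicks", 3)]
record = "0042" + "Summer".ljust(10) + "017"
parsed = split_record(record, fields)
```

Here `field_offsets([4, 10, 3])` gives `[(0, 4), (4, 14), (14, 17)]`, so the user never has to type a start or end position.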

~ Yogi

On 8 September 2016 at 12:48, Chinmay Kolhatkar 
wrote:

> A few points/questions:
> 1. Agree with Yogi. Approach 2 does not look clean.
> 2. Do we need "recordwidthlength"?
> 3. "recordseparator" should be "\n" and not "/n".
> 4. In general, providing the schema as JSON is tedious from the user's
> perspective. I suggest we find a simpler format for specifying the schema.
> For example:
> ,,,
> 5. I suggest we first provide a basic parser to Malhar that does only
> parsing and type checking. Constraints, IMO, are not part of the parsing
> module, or if needed can be added as a phase 2 improvement to this parser.
> 6. I would suggest using an existing library for parsing. There is no
> point in reinventing the wheel, and trying to make something robust can be
> time consuming.
>
> -Chinmay.
>
>
> On Wed, Sep 7, 2016 at 4:33 PM, Yogi Devendra <
> devendra.vyavah...@gmail.com>
> wrote:
>
> > Approach 2 does not look like a clean solution.
> >
> > -1 for Approach 2.
> >
> > ~ Yogi
> >
> > On 7 September 2016 at 15:25, Hitesh Kapoor 
> > wrote:
> >
> > > Hi All,
> > >
> > > An operator for parsing fixed width records has to be implemented.
> > > This operator shall be used to parse fixed width byte arrays/tuples
> > > based on a JSON schema and emit the parsed byte array on one port, the
> > > converted POJO object on another port, and the failed byte
> > > arrays/tuples on an error port.
> > >
> > >
> > > The user will provide a JSON schema based on the schema definition
> > > given below.
> > >
> > > {
> > >   "recordwidthlength": "Integer",
> > >   "recordseparator": "/n", // this would be blank if there is no record separator, default - a newline character
> > >   "fields": [
> > >     {
> > >       "name": "",
> > >       "type": "",
> > >       "startCharNum": "",
> > >       "endCharNum": "",
> > >       "constraints": {
> > >       }
> > >     },
> > >     {
> > >       "name": "adName",
> > >       "type": "String",
> > >       "startCharNum": "Integer",
> > >       "endCharNum": "Integer",
> > >       "constraints": {
> > >         "required": "true",
> > >         "pattern": "[a-z].*[a-z]$"
> > >       }
> > >     }
> > >   ]
> > > }
> > >
> > >
> > > Below are the options to implement this operator.
> > >
> > > 1) Write a new custom library for parsing fixed width records, as
> > > existing libraries for the same (e.g. flatworm, jffp, etc.) do not have
> > > a mechanism for constraint checking.
> > > The challenge in this approach will be to write a robust library from
> > > scratch to handle all our requirements.
> > >
> > > 2) Extend our already written CsvParser to handle fixed width records.
> > > In this approach we will have to add a delimiter character after every
> > > field in the record of the incoming tuple.
> > > The challenge in this approach would be to select a delimiter
> > > character and then, if the character appears in the stream, to escape
> > > that character.
> > > This approach will increase the memory overhead (as extra characters
> > > are inserted as delimiters) but will be comparatively easier to
> > > maintain and operate.
> > >
> > > Please let me know your thoughts and votes on above approaches.
> > >
> > > Regards,
> > > Hitesh
> > >
> >
>


Re: Fixed Width Record Parser

2016-09-08 Thread Chinmay Kolhatkar
A few points/questions:
1. Agree with Yogi. Approach 2 does not look clean.
2. Do we need "recordwidthlength"?
3. "recordseparator" should be "\n" and not "/n".
4. In general, providing the schema as JSON is tedious from the user's
perspective. I suggest we find a simpler format for specifying the schema.
For example:
,,,
5. I suggest we first provide a basic parser to Malhar that does only
parsing and type checking. Constraints, IMO, are not part of the parsing
module, or if needed can be added as a phase 2 improvement to this parser.
6. I would suggest using an existing library for parsing. There is no
point in reinventing the wheel, and trying to make something robust can be
time consuming.

-Chinmay.


On Wed, Sep 7, 2016 at 4:33 PM, Yogi Devendra 
wrote:

> Approach 2 does not look like a clean solution.
>
> -1 for Approach 2.
>
> ~ Yogi
>
> On 7 September 2016 at 15:25, Hitesh Kapoor 
> wrote:
>
> > Hi All,
> >
> > An operator for parsing fixed width records has to be implemented.
> > This operator shall be used to parse fixed width byte arrays/tuples based
> > on a JSON schema and emit the parsed byte array on one port, the converted
> > POJO object on another port, and the failed byte arrays/tuples on an
> > error port.
> >
> >
> > The user will provide a JSON schema based on the schema definition
> > given below.
> >
> > {
> >   "recordwidthlength": "Integer",
> >   "recordseparator": "/n", // this would be blank if there is no record separator, default - a newline character
> >   "fields": [
> >     {
> >       "name": "",
> >       "type": "",
> >       "startCharNum": "",
> >       "endCharNum": "",
> >       "constraints": {
> >       }
> >     },
> >     {
> >       "name": "adName",
> >       "type": "String",
> >       "startCharNum": "Integer",
> >       "endCharNum": "Integer",
> >       "constraints": {
> >         "required": "true",
> >         "pattern": "[a-z].*[a-z]$"
> >       }
> >     }
> >   ]
> > }
> >
> >
> > Below are the options to implement this operator.
> >
> > 1) Write a new custom library for parsing fixed width records, as existing
> > libraries for the same (e.g. flatworm, jffp, etc.) do not have a mechanism
> > for constraint checking.
> > The challenge in this approach will be to write a robust library from
> > scratch to handle all our requirements.
> >
> > 2) Extend our already written CsvParser to handle fixed width records. In
> > this approach we will have to add a delimiter character after every field
> > in the record of the incoming tuple.
> > The challenge in this approach would be to select a delimiter character
> > and then, if the character appears in the stream, to escape that
> > character.
> > This approach will increase the memory overhead (as extra characters are
> > inserted as delimiters) but will be comparatively easier to maintain and
> > operate.
> >
> > Please let me know your thoughts and votes on above approaches.
> >
> > Regards,
> > Hitesh
> >
>
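
For reference, Approach 2 from the quoted proposal (delimiting each field and escaping any delimiter that already occurs in the data) might look roughly like the Python sketch below. The function name, the comma delimiter and the backslash escape are all assumptions for illustration; whether the existing CsvParser accepts this escaping convention is not something this sketch establishes.

```python
def to_delimited(record, lengths, delimiter=",", escape="\\"):
    """Turn a fixed width record into a delimited line, escaping as needed."""
    out = []
    pos = 0
    for length in lengths:
        field = record[pos:pos + length]
        pos += length
        # Escape the escape character first, then any delimiter inside the field.
        field = field.replace(escape, escape + escape)
        field = field.replace(delimiter, escape + delimiter)
        out.append(field)
    return delimiter.join(out)


# Hypothetical two-field record: 5 characters, then 3 characters.
line = to_delimited("ab,cd123", [5, 3])
```

The 8-character input becomes a 10-character line, which shows both the extra memory the proposal mentions and the escaping burden the replies object to.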