Re: [KUDOS] Contributed runner: Apache Apex!

2016-10-18 Thread Chinmay Kolhatkar
Very nice ..:)
I would like to contribute as well.

Thanks,
Chinmay.

On Tue, Oct 18, 2016 at 2:19 AM, David Yan  wrote:

> I would also like to contribute. Thanks!
>
> David
>
> On Mon, Oct 17, 2016 at 10:01 AM, Thomas Weise  wrote:
>
> > FYI, the first cut of Apex runner for Beam is in a feature branch.
> >
> > https://github.com/apache/incubator-beam/tree/apex-runner
> >
> > The next step will be to detail the work needed to make it a top-level
> > runner and lay it out in JIRA.
> >
> > We will also be looking for folks that are interested to contribute, so
> if
> > you have time and interest, please raise your hand!
> >
> > Thanks,
> > Thomas
> >
> > -- Forwarded message --
> > From: Kenneth Knowles 
> > Date: Mon, Oct 17, 2016 at 9:51 AM
> > Subject: [KUDOS] Contributed runner: Apache Apex!
> > To: "d...@beam.incubator.apache.org" 
> >
> >
> > Hi all,
> >
> > I would to, once again, call attention to a great addition to Beam: a
> > runner for Apache Apex.
> >
> > After lots of review and much thoughtful revision, pull request #540 has
> > been merged to the apex-runner feature branch today. Please do take a
> look,
> > and help us put the finishing touches on it to get it ready for the
> master
> > branch.
> >
> > And please also congratulate and thank Thomas Weise for this large
> > endeavor, Vlad Rosov who helped get the integration tests working, and
> > Guarav Gupta who contributed review comments.
> >
> > Kenn
> >
>


[jira] [Created] (APEXMALHAR-2308) BlockReader must consider fixed width length in emitted block alignment.

2016-10-18 Thread Deepak Narkhede (JIRA)
Deepak Narkhede created APEXMALHAR-2308:
---

 Summary: BlockReader must consider fixed width length in emitted 
block alignment.
 Key: APEXMALHAR-2308
 URL: https://issues.apache.org/jira/browse/APEXMALHAR-2308
 Project: Apache Apex Malhar
  Issue Type: Bug
Reporter: Deepak Narkhede
Assignee: Deepak Narkhede


Current BlockReader doesn't consider the fixed length mode while emitting 
blocks, as single tuple may be split across two blocks hence leading to data 
inconsistency.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (APEXMALHAR-2190) Use reusable buffer to serial spillable data structure

2016-10-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/APEXMALHAR-2190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15587668#comment-15587668
 ] 

ASF GitHub Bot commented on APEXMALHAR-2190:


GitHub user brightchen reopened a pull request:

https://github.com/apache/apex-malhar/pull/404

APEXMALHAR-2190 #resolve #comment Use reusable buffer to serial spill…

…able data structure

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/brightchen/apex-malhar APEXMALHAR-2190-PR

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/apex-malhar/pull/404.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #404


commit 66d8defb767ced52c6563c9710a832395269625e
Author: brightchen 
Date:   2016-08-16T00:46:27Z

APEXMALHAR-2190 #resolve #comment Use reusable buffer to serial spillable 
data structure




> Use reusable buffer to serial spillable data structure
> --
>
> Key: APEXMALHAR-2190
> URL: https://issues.apache.org/jira/browse/APEXMALHAR-2190
> Project: Apache Apex Malhar
>  Issue Type: Task
>Reporter: bright chen
>Assignee: bright chen
>   Original Estimate: 240h
>  Remaining Estimate: 240h
>
> Spillable Data Structure created lots of temporary memory to serial data lot 
> of of memory copy( see SliceUtils.concatenate(byte[], byte[]). Which used up 
> memory very quickly. See APEXMALHAR-2182.
> Use a shared memory to avoid allocate temporary memory and memory copy
> some basic ideas
> - SerToLVBuffer interface provides a method serTo(T object, LengthValueBuffer 
> buffer): instead of create a memory and then return the serialized data, this 
> method let the caller pass in the buffer. So different objects or object with 
> embed objects can share the same LengthValueBuffer
> - LengthValueBuffer: It is a buffer which manage the memory as length and 
> value(which is the generic format of serialized data). which provide length 
> placeholder mechanism to avoid temporary memory and data copy when the length 
> can be know after data serialized
> - memory management classes: includes interface ByteStream and it's 
> implementations: Block, FixedBlock, BlocksStream. Which provides a mechanism 
> to dynamic allocate and manage memory. Which basically provides following 
> function. I tried other some other stream mechamism such as 
> ByteArrayInputStream, but it can meet 3rd criteria, and don't have good 
> performance(50% loss) 
>   - dynamic allocate memory
>   - reset memory for reuse
>   - BlocksStream make sure the output slices will not be changed when need 
> extra memory; Block can change the reference of output slices buffer is data 
> was moved due to reallocate of memory(BlocksStream is better solution).
>   - WindowableBlocksStream extends from BlocksStream and provides function to 
> reset memory window by window instead of reset all memory. It provides 
> certain amount of cache( as bytes ) in memory



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (APEXMALHAR-2190) Use reusable buffer to serial spillable data structure

2016-10-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/APEXMALHAR-2190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15587667#comment-15587667
 ] 

ASF GitHub Bot commented on APEXMALHAR-2190:


Github user brightchen closed the pull request at:

https://github.com/apache/apex-malhar/pull/404


> Use reusable buffer to serial spillable data structure
> --
>
> Key: APEXMALHAR-2190
> URL: https://issues.apache.org/jira/browse/APEXMALHAR-2190
> Project: Apache Apex Malhar
>  Issue Type: Task
>Reporter: bright chen
>Assignee: bright chen
>   Original Estimate: 240h
>  Remaining Estimate: 240h
>
> Spillable Data Structure created lots of temporary memory to serial data lot 
> of of memory copy( see SliceUtils.concatenate(byte[], byte[]). Which used up 
> memory very quickly. See APEXMALHAR-2182.
> Use a shared memory to avoid allocate temporary memory and memory copy
> some basic ideas
> - SerToLVBuffer interface provides a method serTo(T object, LengthValueBuffer 
> buffer): instead of create a memory and then return the serialized data, this 
> method let the caller pass in the buffer. So different objects or object with 
> embed objects can share the same LengthValueBuffer
> - LengthValueBuffer: It is a buffer which manage the memory as length and 
> value(which is the generic format of serialized data). which provide length 
> placeholder mechanism to avoid temporary memory and data copy when the length 
> can be know after data serialized
> - memory management classes: includes interface ByteStream and it's 
> implementations: Block, FixedBlock, BlocksStream. Which provides a mechanism 
> to dynamic allocate and manage memory. Which basically provides following 
> function. I tried other some other stream mechamism such as 
> ByteArrayInputStream, but it can meet 3rd criteria, and don't have good 
> performance(50% loss) 
>   - dynamic allocate memory
>   - reset memory for reuse
>   - BlocksStream make sure the output slices will not be changed when need 
> extra memory; Block can change the reference of output slices buffer is data 
> was moved due to reallocate of memory(BlocksStream is better solution).
>   - WindowableBlocksStream extends from BlocksStream and provides function to 
> reset memory window by window instead of reset all memory. It provides 
> certain amount of cache( as bytes ) in memory



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] apex-malhar pull request #404: APEXMALHAR-2190 #resolve #comment Use reusabl...

2016-10-18 Thread brightchen
Github user brightchen closed the pull request at:

https://github.com/apache/apex-malhar/pull/404


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] apex-malhar pull request #404: APEXMALHAR-2190 #resolve #comment Use reusabl...

2016-10-18 Thread brightchen
GitHub user brightchen reopened a pull request:

https://github.com/apache/apex-malhar/pull/404

APEXMALHAR-2190 #resolve #comment Use reusable buffer to serial spill…

…able data structure

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/brightchen/apex-malhar APEXMALHAR-2190-PR

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/apex-malhar/pull/404.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #404


commit 66d8defb767ced52c6563c9710a832395269625e
Author: brightchen 
Date:   2016-08-16T00:46:27Z

APEXMALHAR-2190 #resolve #comment Use reusable buffer to serial spillable 
data structure




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Assigned] (APEXMALHAR-2303) S3 Line By Line Module

2016-10-18 Thread Ajay Gupta (JIRA)

 [ 
https://issues.apache.org/jira/browse/APEXMALHAR-2303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajay Gupta reassigned APEXMALHAR-2303:
--

Assignee: Ajay Gupta

> S3 Line By Line Module
> --
>
> Key: APEXMALHAR-2303
> URL: https://issues.apache.org/jira/browse/APEXMALHAR-2303
> Project: Apache Apex Malhar
>  Issue Type: Bug
>Reporter: Ajay Gupta
>Assignee: Ajay Gupta
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>
> This is a new module which will consist of 2 operators
> 1) File Splitter -- Already existing in Malhar library
> 2) S3RecordReader -- Read a file from S3 and output the records (delimited or 
> fixed width) 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (APEXMALHAR-2306) Tests should allow for additions to OperatorContext interface

2016-10-18 Thread Thomas Weise (JIRA)

 [ 
https://issues.apache.org/jira/browse/APEXMALHAR-2306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Weise updated APEXMALHAR-2306:
-
Summary: Tests should allow for additions to OperatorContext interface  
(was: malhar unit test(s) have a dependency on OperatorContext in apex-core)

> Tests should allow for additions to OperatorContext interface
> -
>
> Key: APEXMALHAR-2306
> URL: https://issues.apache.org/jira/browse/APEXMALHAR-2306
> Project: Apache Apex Malhar
>  Issue Type: Bug
>  Components: test benches
>Reporter: Sanjay M Pujare
>
> While trying to build/test malhar against the latest apex-core, the following 
> failure was noticed
> .../apex-malhar/library/src/test/java/com/datatorrent/lib/helper/OperatorContextTestHelper.java:[47,17]
>  com.datatorrent.lib.helper.OperatorContextTestHelper.TestIdOperatorContext 
> is not abstract and does not override abstract method getName() in 
> com.datatorrent.api.Context.OperatorContext
> Ideally, malhar should mock up the OperatorContext interface or we should 
> find other ways to eliminate this kind of dependency.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (APEXMALHAR-2306) malhar unit test(s) have a dependency on OperatorContext in apex-core

2016-10-18 Thread Thomas Weise (JIRA)

[ 
https://issues.apache.org/jira/browse/APEXMALHAR-2306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15587351#comment-15587351
 ] 

Thomas Weise commented on APEXMALHAR-2306:
--

Have a look at FSInputModuleTest, for example.

Since the tests that use the test context were written with constructor 
dependency, it will be more work than just changing a factory method, but looks 
straightforward.

> malhar unit test(s) have a dependency on OperatorContext in apex-core
> -
>
> Key: APEXMALHAR-2306
> URL: https://issues.apache.org/jira/browse/APEXMALHAR-2306
> Project: Apache Apex Malhar
>  Issue Type: Bug
>  Components: test benches
>Reporter: Sanjay M Pujare
>
> While trying to build/test malhar against the latest apex-core, the following 
> failure was noticed
> .../apex-malhar/library/src/test/java/com/datatorrent/lib/helper/OperatorContextTestHelper.java:[47,17]
>  com.datatorrent.lib.helper.OperatorContextTestHelper.TestIdOperatorContext 
> is not abstract and does not override abstract method getName() in 
> com.datatorrent.api.Context.OperatorContext
> Ideally, malhar should mock up the OperatorContext interface or we should 
> find other ways to eliminate this kind of dependency.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (APEXMALHAR-2307) Session windows are not deleted properly after merge or extend in InMemorySessionWindowedStorage

2016-10-18 Thread David Yan (JIRA)
David Yan created APEXMALHAR-2307:
-

 Summary: Session windows are not deleted properly after merge or 
extend in InMemorySessionWindowedStorage
 Key: APEXMALHAR-2307
 URL: https://issues.apache.org/jira/browse/APEXMALHAR-2307
 Project: Apache Apex Malhar
  Issue Type: Bug
Reporter: David Yan
Assignee: David Yan






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (APEXMALHAR-2306) malhar unit test(s) have a dependency on OperatorContext in apex-core

2016-10-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/APEXMALHAR-2306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15587143#comment-15587143
 ] 

ASF GitHub Bot commented on APEXMALHAR-2306:


Github user sanjaypujare closed the pull request at:

https://github.com/apache/apex-malhar/pull/459


> malhar unit test(s) have a dependency on OperatorContext in apex-core
> -
>
> Key: APEXMALHAR-2306
> URL: https://issues.apache.org/jira/browse/APEXMALHAR-2306
> Project: Apache Apex Malhar
>  Issue Type: Bug
>  Components: test benches
>Reporter: Sanjay M Pujare
>
> While trying to build/test malhar against the latest apex-core, the following 
> failure was noticed
> .../apex-malhar/library/src/test/java/com/datatorrent/lib/helper/OperatorContextTestHelper.java:[47,17]
>  com.datatorrent.lib.helper.OperatorContextTestHelper.TestIdOperatorContext 
> is not abstract and does not override abstract method getName() in 
> com.datatorrent.api.Context.OperatorContext
> Ideally, malhar should mock up the OperatorContext interface or we should 
> find other ways to eliminate this kind of dependency.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] apex-malhar pull request #459: APEXMALHAR-2306 fix OperatorContext implement...

2016-10-18 Thread sanjaypujare
Github user sanjaypujare closed the pull request at:

https://github.com/apache/apex-malhar/pull/459


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (APEXMALHAR-2306) malhar unit test(s) have a dependency on OperatorContext in apex-core

2016-10-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/APEXMALHAR-2306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15587040#comment-15587040
 ] 

ASF GitHub Bot commented on APEXMALHAR-2306:


GitHub user sanjaypujare opened a pull request:

https://github.com/apache/apex-malhar/pull/459

APEXMALHAR-2306  fix OperatorContext implementations to add the new 
required method

@vrozov   these changes were required to run malhar successfully against 
latest apex-core. This is just for review for now

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/sanjaypujare/apex-malhar 
apex-core-3.5.0-changes

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/apex-malhar/pull/459.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #459


commit 7d106b4abd42dcf4e9e7bf54a63eb042bcdc32bb
Author: Sanjay Pujare 
Date:   2016-10-18T23:35:35Z

fix OperatorContext implementations to add the new required method




> malhar unit test(s) have a dependency on OperatorContext in apex-core
> -
>
> Key: APEXMALHAR-2306
> URL: https://issues.apache.org/jira/browse/APEXMALHAR-2306
> Project: Apache Apex Malhar
>  Issue Type: Bug
>  Components: test benches
>Reporter: Sanjay M Pujare
>
> While trying to build/test malhar against the latest apex-core, the following 
> failure was noticed
> .../apex-malhar/library/src/test/java/com/datatorrent/lib/helper/OperatorContextTestHelper.java:[47,17]
>  com.datatorrent.lib.helper.OperatorContextTestHelper.TestIdOperatorContext 
> is not abstract and does not override abstract method getName() in 
> com.datatorrent.api.Context.OperatorContext
> Ideally malhar should mock up the OperstorContext interface or we should find 
> other ways to eliminate this kind of dependency.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (APEXMALHAR-2305) Change implementation of session window to reflect what is described in streaming 102 blog

2016-10-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/APEXMALHAR-2305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15586908#comment-15586908
 ] 

ASF GitHub Bot commented on APEXMALHAR-2305:


GitHub user davidyan74 opened a pull request:

https://github.com/apache/apex-malhar/pull/458

APEXMALHAR-2305 #resolve Mirror the proto-session window behavior des…

…cribed in the streaming 102 blog

@siyuanh  please review and merge

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/davidyan74/apex-malhar APEXMALHAR-2305

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/apex-malhar/pull/458.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #458


commit 0e598eba46fc39e706eda2ff9a7eedd1c8b0f2de
Author: David Yan 
Date:   2016-10-18T22:41:28Z

APEXMALHAR-2305 #resolve Mirror the proto-session window behavior described 
in the streaming 102 blog




> Change implementation of session window to reflect what is described in 
> streaming 102 blog
> --
>
> Key: APEXMALHAR-2305
> URL: https://issues.apache.org/jira/browse/APEXMALHAR-2305
> Project: Apache Apex Malhar
>  Issue Type: Bug
>Reporter: David Yan
>Assignee: David Yan
>
> The proto-session windows described in the streaming 102 blog have a minimum 
> duration that is equal to the session gap. We should do the same in our 
> session window implementation. 
> https://www.oreilly.com/ideas/the-world-beyond-batch-streaming-102



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (APEXMALHAR-2305) Change implementation of session window to reflect what is described in streaming 102 blog

2016-10-18 Thread David Yan (JIRA)
David Yan created APEXMALHAR-2305:
-

 Summary: Change implementation of session window to reflect what 
is described in streaming 102 blog
 Key: APEXMALHAR-2305
 URL: https://issues.apache.org/jira/browse/APEXMALHAR-2305
 Project: Apache Apex Malhar
  Issue Type: Bug
Reporter: David Yan
Assignee: David Yan


The proto-session windows described in the streaming 102 blog have a minimum 
duration that is equal to the session gap. We should do the same in our session 
window implementation. 

https://www.oreilly.com/ideas/the-world-beyond-batch-streaming-102



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (APEXCORE-561) Container info needs to be persisted even after application has been killed

2016-10-18 Thread David Yan (JIRA)

[ 
https://issues.apache.org/jira/browse/APEXCORE-561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15586503#comment-15586503
 ] 

David Yan commented on APEXCORE-561:


This is a good feature to have to diagnostics.

> Container info needs to be persisted even after application has been killed
> ---
>
> Key: APEXCORE-561
> URL: https://issues.apache.org/jira/browse/APEXCORE-561
> Project: Apache Apex Core
>  Issue Type: Improvement
>Reporter: Sanjay M Pujare
>
> This is required so that info can be displayed (by Apex cli for example) even 
> for killed apps.
> At least one piece of info missing, which is the finishedTime. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (APEXMALHAR-2299) TimeBasedDedupOperator throws exception during time bucket assignment in certain edge cases

2016-10-18 Thread Chandni Singh (JIRA)

 [ 
https://issues.apache.org/jira/browse/APEXMALHAR-2299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chandni Singh resolved APEXMALHAR-2299.
---
Resolution: Fixed

> TimeBasedDedupOperator throws exception during time bucket assignment in 
> certain edge cases
> ---
>
> Key: APEXMALHAR-2299
> URL: https://issues.apache.org/jira/browse/APEXMALHAR-2299
> Project: Apache Apex Malhar
>  Issue Type: Bug
>Affects Versions: 3.5.0
>Reporter: Francis Fernandes
>Assignee: Francis Fernandes
> Fix For: 3.6.0
>
>   Original Estimate: 96h
>  Remaining Estimate: 96h
>
> The following exception is thrown under certain edge cases:
> {noformat}
> Stopped running due to an exception. java.lang.IllegalArgumentException: new 
> time bucket should have a value greater than the old time bucket
>   at 
> com.google.common.base.Preconditions.checkArgument(Preconditions.java:88)
>   at 
> org.apache.apex.malhar.lib.state.managed.ManagedTimeUnifiedStateImpl.handleBucketConflict(ManagedTimeUnifiedStateImpl.java:126)
>   at 
> org.apache.apex.malhar.lib.state.managed.AbstractManagedStateImpl.prepareBucket(AbstractManagedStateImpl.java:265)
>   at 
> org.apache.apex.malhar.lib.state.managed.AbstractManagedStateImpl.putInBucket(AbstractManagedStateImpl.java:276)
>   at 
> org.apache.apex.malhar.lib.state.managed.ManagedTimeUnifiedStateImpl.put(ManagedTimeUnifiedStateImpl.java:71)
>   at 
> org.apache.apex.malhar.lib.dedup.TimeBasedDedupOperator.putManagedState(TimeBasedDedupOperator.java:191)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] apex-core pull request #411: APEXCORE-510 Implement check in DefaultOutputPo...

2016-10-18 Thread sanjaypujare
GitHub user sanjaypujare opened a pull request:

https://github.com/apache/apex-core/pull/411

APEXCORE-510  Implement check in DefaultOutputPort for thread affinity 
between setup and emit

@vrozov  pls review and merge as appropriate

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/sanjaypujare/apex-core 
malhar-510.thread_affinity

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/apex-core/pull/411.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #411


commit 41f0c94409891f37478ca5e3f72daefaddc1fcaf
Author: Sanjay Pujare 
Date:   2016-09-29T00:23:06Z

ALEXCORE-510 thread affinity check
also implement a property to suppress thread affinity checks:




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] apex-malhar pull request #454: APEXMALHAR-2299 Modified to move window when ...

2016-10-18 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/apex-malhar/pull/454


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Created] (APEXMALHAR-2304) Apex SQL: Add examples for SQL in Apex in demos folder

2016-10-18 Thread Chinmay Kolhatkar (JIRA)
Chinmay Kolhatkar created APEXMALHAR-2304:
-

 Summary: Apex SQL: Add examples for SQL in Apex in demos folder
 Key: APEXMALHAR-2304
 URL: https://issues.apache.org/jira/browse/APEXMALHAR-2304
 Project: Apache Apex Malhar
  Issue Type: New Feature
Reporter: Chinmay Kolhatkar
Assignee: Chinmay Kolhatkar






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (APEXMALHAR-2284) POJOInnerJoinOperatorTest fails in Travis CI

2016-10-18 Thread Chandni Singh (JIRA)

[ 
https://issues.apache.org/jira/browse/APEXMALHAR-2284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15585751#comment-15585751
 ] 

Chandni Singh edited comment on APEXMALHAR-2284 at 10/18/16 3:40 PM:
-

One concept that I think I should point is how getAsync improves performance is 
that if the value is not available immediately, then the client can cache the 
future and work on other keys. Look at the implementations of AbstractDeduper. 

1. If SpillableArrayListMultimap only lacks asynchronous get, then that can be 
added to it. That should not be the only reason to duplicate the functionality.

2. ManagedTimeMultiValueState has no tests and is full of warnings.

We should not create a complete separate implementation which duplicates a 
larger part of functionality provided by something existing otherwise we will 
have too many slightly different variations of the same thing and that will be 
very difficult to manage. 

IMO this is a good opportunity to improve SpillableArrayListMultimapImpl and 
use that.



was (Author: csingh):
One concept that I think I should point is that getAsync improves performance 
is that if the value is not available immediately, then the client can cache 
the future and work on other keys. Look at the implementations of 
AbstractDeduper. 

1. If SpillableArrayListMultimap only lacks asynchronous get, then that can be 
added to it. That should not be the only reason to duplicate the functionality.

2. ManagedTimeMultiValueState has no tests and is full of warnings.

We should not create a complete separate implementation which duplicates a 
larger part of functionality provided by something existing otherwise we will 
have too many slightly different variations of the same thing and that will be 
very difficult to manage. 

IMO this is a good opportunity to improve SpillableArrayListMultimapImpl and 
use that.


> POJOInnerJoinOperatorTest fails in Travis CI
> 
>
> Key: APEXMALHAR-2284
> URL: https://issues.apache.org/jira/browse/APEXMALHAR-2284
> Project: Apache Apex Malhar
>  Issue Type: Bug
>Reporter: Thomas Weise
>Assignee: Chaitanya
>Priority: Blocker
> Fix For: 3.6.0
>
>
> https://s3.amazonaws.com/archive.travis-ci.org/jobs/166322754/log.txt
> {code}
> Failed tests: 
>   POJOInnerJoinOperatorTest.testEmitMultipleTuplesFromStream2:337 Number of 
> tuple emitted  expected:<2> but was:<4>
>   POJOInnerJoinOperatorTest.testInnerJoinOperator:184 Number of tuple emitted 
>  expected:<1> but was:<2>
>   POJOInnerJoinOperatorTest.testMultipleValues:236 Number of tuple emitted  
> expected:<2> but was:<3>
>   POJOInnerJoinOperatorTest.testUpdateStream1Values:292 Number of tuple 
> emitted  expected:<1> but was:<2>
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (APEXMALHAR-2284) POJOInnerJoinOperatorTest fails in Travis CI

2016-10-18 Thread Chandni Singh (JIRA)

[ 
https://issues.apache.org/jira/browse/APEXMALHAR-2284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15585751#comment-15585751
 ] 

Chandni Singh commented on APEXMALHAR-2284:
---

One concept that I think I should point is that getAsync improves performance 
is that if the value is not available immediately, then the client can cache 
the future and work on other keys. Look at the implementations of 
AbstractDeduper. 

1. If SpillableArrayListMultimap only lacks asynchronous get, then that can be 
added to it. That should not be the only reason to duplicate the functionality.

2. ManagedTimeMultiValueState has no tests and is full of warnings.

We should not create a complete separate implementation which duplicates a 
larger part of functionality provided by something existing otherwise we will 
have too many slightly different variations of the same thing and that will be 
very difficult to manage. 

IMO this is a good opportunity to improve SpillableArrayListMultimapImpl and 
use that.


> POJOInnerJoinOperatorTest fails in Travis CI
> 
>
> Key: APEXMALHAR-2284
> URL: https://issues.apache.org/jira/browse/APEXMALHAR-2284
> Project: Apache Apex Malhar
>  Issue Type: Bug
>Reporter: Thomas Weise
>Assignee: Chaitanya
>Priority: Blocker
> Fix For: 3.6.0
>
>
> https://s3.amazonaws.com/archive.travis-ci.org/jobs/166322754/log.txt
> {code}
> Failed tests: 
>   POJOInnerJoinOperatorTest.testEmitMultipleTuplesFromStream2:337 Number of 
> tuple emitted  expected:<2> but was:<4>
>   POJOInnerJoinOperatorTest.testInnerJoinOperator:184 Number of tuple emitted 
>  expected:<1> but was:<2>
>   POJOInnerJoinOperatorTest.testMultipleValues:236 Number of tuple emitted  
> expected:<2> but was:<3>
>   POJOInnerJoinOperatorTest.testUpdateStream1Values:292 Number of tuple 
> emitted  expected:<1> but was:<2>
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] apex-malhar pull request #439: APEXMALHAR-2272 : Fixed sequentialFileRead on...

2016-10-18 Thread yogidevendra
Github user yogidevendra closed the pull request at:

https://github.com/apache/apex-malhar/pull/439


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (APEXMALHAR-2272) sequentialFileRead property on FSInputModule not functioning as expected

2016-10-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/APEXMALHAR-2272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15585644#comment-15585644
 ] 

ASF GitHub Bot commented on APEXMALHAR-2272:


Github user yogidevendra closed the pull request at:

https://github.com/apache/apex-malhar/pull/439


> sequentialFileRead property on FSInputModule not functioning as expected
> 
>
> Key: APEXMALHAR-2272
> URL: https://issues.apache.org/jira/browse/APEXMALHAR-2272
> Project: Apache Apex Malhar
>  Issue Type: Bug
>Reporter: Yogi Devendra
>Assignee: Yogi Devendra
>Priority: Minor
> Fix For: 3.6.0
>
>
> When there is single large file in the input directory, and we have multiple 
> partitions for BlockReader and sequencialFileRead is set to true.
> Only one BlockReader instance should be active; other BlockReader instances 
> should remain idle.
> This is because sequencialFileRead makes sure that blocks of the file are 
> read serially by same BlockReader instance. 
> Observed behavior is all BlockReader instances are reading data which means 
> sequencialFileRead property is not functioning as expected.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (APEXMALHAR-2284) POJOInnerJoinOperatorTest fails in Travis CI

2016-10-18 Thread Bhupesh Chawda (JIRA)

[ 
https://issues.apache.org/jira/browse/APEXMALHAR-2284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15585604#comment-15585604
 ] 

Bhupesh Chawda commented on APEXMALHAR-2284:


As far as I understand, the issue is with race conditions involved in 
simultaneous get() and put() on a store based on managed state.
For the join use case, asynchronous get() is important for efficiency reasons.

Upon some discussion with Chaitanya, here are the possible options:
1. Make all get() and put() calls blocking. That way the issue is avoided at 
the cost of decreased throughput.
2. Allow an option to specify the source (Bucket.ReadSource).
3  All data inserted by the put() call is stored in a separate space and 
synched with managed state at end window. This ensures that put() does not 
interfere with asynchronous get() calls. This is similar to the way 
SpillableArrayListMultimap works, but which cannot be used directly as it lacks 
some functionality like asynchronous get.
4. Override the getAsync() method to achieve the desired functionality.


> POJOInnerJoinOperatorTest fails in Travis CI
> 
>
> Key: APEXMALHAR-2284
> URL: https://issues.apache.org/jira/browse/APEXMALHAR-2284
> Project: Apache Apex Malhar
>  Issue Type: Bug
>Reporter: Thomas Weise
>Assignee: Chaitanya
>Priority: Blocker
> Fix For: 3.6.0
>
>
> https://s3.amazonaws.com/archive.travis-ci.org/jobs/166322754/log.txt
> {code}
> Failed tests: 
>   POJOInnerJoinOperatorTest.testEmitMultipleTuplesFromStream2:337 Number of 
> tuple emitted  expected:<2> but was:<4>
>   POJOInnerJoinOperatorTest.testInnerJoinOperator:184 Number of tuple emitted 
>  expected:<1> but was:<2>
>   POJOInnerJoinOperatorTest.testMultipleValues:236 Number of tuple emitted  
> expected:<2> but was:<3>
>   POJOInnerJoinOperatorTest.testUpdateStream1Values:292 Number of tuple 
> emitted  expected:<1> but was:<2>
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] apex-malhar pull request #457: APEXMALHAR 2302 Add blockSize property to FSR...

2016-10-18 Thread deepak-narkhede
GitHub user deepak-narkhede reopened a pull request:

https://github.com/apache/apex-malhar/pull/457

APEXMALHAR 2302 Add blockSize property to FSRecordReaderModule

This change adds blockSize property from FileSplitter to 
FSRecordReaderModule.
Tested with RecordReader Application.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/deepak-narkhede/apex-malhar APEXMALHAR-2302

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/apex-malhar/pull/457.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #457


commit 4d52444f40f8e5bd1eb99bb54ca49574439202d4
Author: deepak-narkhede 
Date:   2016-10-17T16:19:27Z

APEXMALHAR-2302 Add blockSize property to FSRecordReaderModule.




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] apex-malhar pull request #457: APEXMALHAR 2302 Add blockSize property to FSR...

2016-10-18 Thread deepak-narkhede
Github user deepak-narkhede closed the pull request at:

https://github.com/apache/apex-malhar/pull/457


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Created] (APEXMALHAR-2303) S3 Line By Line Module

2016-10-18 Thread Ajay Gupta (JIRA)
Ajay Gupta created APEXMALHAR-2303:
--

 Summary: S3 Line By Line Module
 Key: APEXMALHAR-2303
 URL: https://issues.apache.org/jira/browse/APEXMALHAR-2303
 Project: Apache Apex Malhar
  Issue Type: Bug
Reporter: Ajay Gupta


This is a new module which will consist of 2 operators
1) File Splitter -- Already existing in Malhar library
2) S3RecordReader -- Read a file from S3 and output the records (delimited or 
fixed width) 




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (APEXMALHAR-2190) Use reusable buffer to serial spillable data structure

2016-10-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/APEXMALHAR-2190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15584614#comment-15584614
 ] 

ASF GitHub Bot commented on APEXMALHAR-2190:


Github user brightchen closed the pull request at:

https://github.com/apache/apex-malhar/pull/404


> Use reusable buffer to serial spillable data structure
> --
>
> Key: APEXMALHAR-2190
> URL: https://issues.apache.org/jira/browse/APEXMALHAR-2190
> Project: Apache Apex Malhar
>  Issue Type: Task
>Reporter: bright chen
>Assignee: bright chen
>   Original Estimate: 240h
>  Remaining Estimate: 240h
>
> Spillable Data Structure created lots of temporary memory to serial data lot 
> of of memory copy( see SliceUtils.concatenate(byte[], byte[]). Which used up 
> memory very quickly. See APEXMALHAR-2182.
> Use a shared memory to avoid allocate temporary memory and memory copy
> some basic ideas
> - SerToLVBuffer interface provides a method serTo(T object, LengthValueBuffer 
> buffer): instead of create a memory and then return the serialized data, this 
> method let the caller pass in the buffer. So different objects or object with 
> embed objects can share the same LengthValueBuffer
> - LengthValueBuffer: It is a buffer which manage the memory as length and 
> value(which is the generic format of serialized data). which provide length 
> placeholder mechanism to avoid temporary memory and data copy when the length 
> can be know after data serialized
> - memory management classes: includes interface ByteStream and it's 
> implementations: Block, FixedBlock, BlocksStream. Which provides a mechanism 
> to dynamic allocate and manage memory. Which basically provides following 
> function. I tried other some other stream mechamism such as 
> ByteArrayInputStream, but it can meet 3rd criteria, and don't have good 
> performance(50% loss) 
>   - dynamic allocate memory
>   - reset memory for reuse
>   - BlocksStream make sure the output slices will not be changed when need 
> extra memory; Block can change the reference of output slices buffer is data 
> was moved due to reallocate of memory(BlocksStream is better solution).
>   - WindowableBlocksStream extends from BlocksStream and provides function to 
> reset memory window by window instead of reset all memory. It provides 
> certain amount of cache( as bytes ) in memory



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] apex-malhar pull request #404: APEXMALHAR-2190 #resolve #comment Use reusabl...

2016-10-18 Thread brightchen
Github user brightchen closed the pull request at:

https://github.com/apache/apex-malhar/pull/404


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (APEXMALHAR-2190) Use reusable buffer to serial spillable data structure

2016-10-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/APEXMALHAR-2190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15584615#comment-15584615
 ] 

ASF GitHub Bot commented on APEXMALHAR-2190:


GitHub user brightchen reopened a pull request:

https://github.com/apache/apex-malhar/pull/404

APEXMALHAR-2190 #resolve #comment Use reusable buffer to serial spill…

…able data structure

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/brightchen/apex-malhar APEXMALHAR-2190-PR

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/apex-malhar/pull/404.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #404


commit 5e5e968086a774ab8067ae7e6d29e31f3b68af54
Author: brightchen 
Date:   2016-08-16T00:46:27Z

APEXMALHAR-2190 #resolve #comment Use reusable buffer to serial spillable 
data structure




> Use reusable buffer to serial spillable data structure
> --
>
> Key: APEXMALHAR-2190
> URL: https://issues.apache.org/jira/browse/APEXMALHAR-2190
> Project: Apache Apex Malhar
>  Issue Type: Task
>Reporter: bright chen
>Assignee: bright chen
>   Original Estimate: 240h
>  Remaining Estimate: 240h
>
> Spillable Data Structure created lots of temporary memory to serial data lot 
> of of memory copy( see SliceUtils.concatenate(byte[], byte[]). Which used up 
> memory very quickly. See APEXMALHAR-2182.
> Use a shared memory to avoid allocate temporary memory and memory copy
> some basic ideas
> - SerToLVBuffer interface provides a method serTo(T object, LengthValueBuffer 
> buffer): instead of create a memory and then return the serialized data, this 
> method let the caller pass in the buffer. So different objects or object with 
> embed objects can share the same LengthValueBuffer
> - LengthValueBuffer: It is a buffer which manage the memory as length and 
> value(which is the generic format of serialized data). which provide length 
> placeholder mechanism to avoid temporary memory and data copy when the length 
> can be know after data serialized
> - memory management classes: includes interface ByteStream and it's 
> implementations: Block, FixedBlock, BlocksStream. Which provides a mechanism 
> to dynamic allocate and manage memory. Which basically provides following 
> function. I tried other some other stream mechamism such as 
> ByteArrayInputStream, but it can meet 3rd criteria, and don't have good 
> performance(50% loss) 
>   - dynamic allocate memory
>   - reset memory for reuse
>   - BlocksStream make sure the output slices will not be changed when need 
> extra memory; Block can change the reference of output slices buffer is data 
> was moved due to reallocate of memory(BlocksStream is better solution).
>   - WindowableBlocksStream extends from BlocksStream and provides function to 
> reset memory window by window instead of reset all memory. It provides 
> certain amount of cache( as bytes ) in memory



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] apex-malhar pull request #404: APEXMALHAR-2190 #resolve #comment Use reusabl...

2016-10-18 Thread brightchen
GitHub user brightchen reopened a pull request:

https://github.com/apache/apex-malhar/pull/404

APEXMALHAR-2190 #resolve #comment Use reusable buffer to serial spill…

…able data structure

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/brightchen/apex-malhar APEXMALHAR-2190-PR

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/apex-malhar/pull/404.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #404


commit 5e5e968086a774ab8067ae7e6d29e31f3b68af54
Author: brightchen 
Date:   2016-08-16T00:46:27Z

APEXMALHAR-2190 #resolve #comment Use reusable buffer to serial spillable 
data structure




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (APEXCORE-561) Container info needs to be persisted even after application has been killed

2016-10-18 Thread Sanjay M Pujare (JIRA)

[ 
https://issues.apache.org/jira/browse/APEXCORE-561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15584603#comment-15584603
 ] 

Sanjay M Pujare commented on APEXCORE-561:
--

[~davidyan] pls add your comments

> Container info needs to be persisted even after application has been killed
> ---
>
> Key: APEXCORE-561
> URL: https://issues.apache.org/jira/browse/APEXCORE-561
> Project: Apache Apex Core
>  Issue Type: Improvement
>Reporter: Sanjay M Pujare
>
> This is required so that info can be displayed (by Apex cli for example) even 
> for killed apps.
> At least one piece of info missing, which is the finishedTime. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)