[jira] [Commented] (APEXMALHAR-2327) BucketsFileSystem.writeBucketData() call Slice.toByteArray() cause allocate unnecessary memory

2016-11-03 Thread ASF GitHub Bot (JIRA)
[ https://issues.apache.org/jira/browse/APEXMALHAR-2327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15634555#comment-15634555 ] ASF GitHub Bot commented on APEXMALHAR-2327: GitHub user brightchen opened a pull

[GitHub] apex-malhar pull request #482: APEXMALHAR-2327 #resolve #comment BucketsFile...

2016-11-03 Thread brightchen
GitHub user brightchen opened a pull request: https://github.com/apache/apex-malhar/pull/482 APEXMALHAR-2327 #resolve #comment BucketsFileSystem.writeBucketData()… … call Slice.toByteArray() cause allocate unnecessary memory You can merge this pull request into a Git repository
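
For context on the issue title, here is a minimal sketch of why copying a slice before writing it allocates memory that an offset/length write avoids. `SliceSketch` and `SliceWriter` are hypothetical stand-ins written for this illustration; they are not the Apex `Slice` class or the change made in the pull request, whose details are not shown in this listing.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.util.Arrays;

/** Hypothetical stand-in for a slice: a view over part of a shared byte array. */
class SliceSketch {
  final byte[] buffer;
  final int offset;
  final int length;

  SliceSketch(byte[] buffer, int offset, int length) {
    this.buffer = buffer;
    this.offset = offset;
    this.length = length;
  }

  /** Allocates a new array of size length and copies the viewed bytes into it. */
  byte[] toByteArray() {
    return Arrays.copyOfRange(buffer, offset, offset + length);
  }
}

/** Hypothetical writer showing the two ways to send a slice to a stream. */
class SliceWriter {
  /** Copies first, then writes: one extra allocation per call. */
  static void writeWithCopy(OutputStream out, SliceSketch s) throws IOException {
    out.write(s.toByteArray());
  }

  /** Writes straight from the backing array: no intermediate array. */
  static void writeInPlace(OutputStream out, SliceSketch s) throws IOException {
    out.write(s.buffer, s.offset, s.length);
  }
}
```

Both methods put the same bytes on the stream; only the second avoids the temporary array.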

[jira] [Updated] (APEXMALHAR-2327) BucketsFileSystem.writeBucketData() call Slice.toByteArray() cause allocate unnecessary memory

2016-11-03 Thread bright chen (JIRA)
[ https://issues.apache.org/jira/browse/APEXMALHAR-2327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] bright chen updated APEXMALHAR-2327: Remaining Estimate: 2m (was: 120h) Original Estimate: 2m (was: 120h)

[jira] [Commented] (APEXMALHAR-2327) BucketsFileSystem.writeBucketData() call Slice.toByteArray() cause allocate unnecessary memory

2016-11-03 Thread bright chen (JIRA)
[ https://issues.apache.org/jira/browse/APEXMALHAR-2327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15634460#comment-15634460 ] bright chen commented on APEXMALHAR-2327: - Discussed with David, instead add new method

[jira] [Commented] (APEXMALHAR-2320) FSWindowDataManager.toSlice() can cause lots of garbage collection

2016-11-03 Thread ASF GitHub Bot (JIRA)
[ https://issues.apache.org/jira/browse/APEXMALHAR-2320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15634367#comment-15634367 ] ASF GitHub Bot commented on APEXMALHAR-2320: Github user brightchen closed the pull

[jira] [Commented] (APEXMALHAR-2320) FSWindowDataManager.toSlice() can cause lots of garbage collection

2016-11-03 Thread ASF GitHub Bot (JIRA)
[ https://issues.apache.org/jira/browse/APEXMALHAR-2320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15634368#comment-15634368 ] ASF GitHub Bot commented on APEXMALHAR-2320: GitHub user brightchen reopened a pull

[GitHub] apex-malhar pull request #477: APEXMALHAR-2320 #resolve #comment use Seriali...

2016-11-03 Thread brightchen
Github user brightchen closed the pull request at: https://github.com/apache/apex-malhar/pull/477 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the

[GitHub] apex-malhar pull request #477: APEXMALHAR-2320 #resolve #comment use Seriali...

2016-11-03 Thread brightchen
GitHub user brightchen reopened a pull request: https://github.com/apache/apex-malhar/pull/477 APEXMALHAR-2320 #resolve #comment use SerializationBuffer implement F… …SWindowDataManager.toSlice() You can merge this pull request into a Git repository by running: $ git pull

[jira] [Commented] (APEXMALHAR-2327) BucketsFileSystem.writeBucketData() call Slice.toByteArray() cause allocate unnecessary memory

2016-11-03 Thread bright chen (JIRA)
[ https://issues.apache.org/jira/browse/APEXMALHAR-2327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15634322#comment-15634322 ] bright chen commented on APEXMALHAR-2327: - [~csingh][~davidyan] Please comment if ok to add a

[jira] [Commented] (APEXCORE-561) Container info needs to be persisted even after application has been killed

2016-11-03 Thread David Yan (JIRA)
[ https://issues.apache.org/jira/browse/APEXCORE-561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15633806#comment-15633806 ] David Yan commented on APEXCORE-561: The finish time is supplied by Yarn's ContainerReport.

Re: [DISCUSSION] Custom Control Tuples

2016-11-03 Thread David Yan
Hi Bhupesh, Since each input port has its own incoming control tuple, I would imagine there would be an additional DefaultInputPort.processControl method that operator developers can override. If we go for option 1, my thinking is that the control tuples would always be delivered at the next

[GitHub] apex-malhar pull request #481: APEXMALHAR-2321 #resolve #comment Improve Buc...

2016-11-03 Thread brightchen
GitHub user brightchen reopened a pull request: https://github.com/apache/apex-malhar/pull/481 APEXMALHAR-2321 #resolve #comment Improve Buckets memory management You can merge this pull request into a Git repository by running: $ git pull

[jira] [Commented] (APEXMALHAR-2321) Improve Buckets memory management

2016-11-03 Thread ASF GitHub Bot (JIRA)
[ https://issues.apache.org/jira/browse/APEXMALHAR-2321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15633516#comment-15633516 ] ASF GitHub Bot commented on APEXMALHAR-2321: GitHub user brightchen reopened a pull

[jira] [Commented] (APEXMALHAR-2321) Improve Buckets memory management

2016-11-03 Thread ASF GitHub Bot (JIRA)
[ https://issues.apache.org/jira/browse/APEXMALHAR-2321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15633514#comment-15633514 ] ASF GitHub Bot commented on APEXMALHAR-2321: Github user brightchen closed the pull

[GitHub] apex-malhar pull request #481: APEXMALHAR-2321 #resolve #comment Improve Buc...

2016-11-03 Thread brightchen
Github user brightchen closed the pull request at: https://github.com/apache/apex-malhar/pull/481

Re: [DISCUSSION] Custom Control Tuples

2016-11-03 Thread Pramod Immaneni
The control tuple could be delivered to the operator only after it is received from all upstream partitions, while still allowing other data from an upstream partition after its control tuple is received; we don't necessarily have to block and do complete synchronization like in end window. You are
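
The delivery rule described in this message can be sketched roughly as follows. `ControlTupleBarrier` and its method are hypothetical names invented for this illustration and are not part of the Apex API; the sketch only tracks which upstream partitions have sent a given control tuple and reports when the last one arrives, without holding back any data tuples.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical sketch: signals delivery of a control tuple only once every
 * upstream partition has sent it. Data tuples are not tracked or blocked here.
 */
class ControlTupleBarrier {
  private final int upstreamPartitions;
  // control tuple id -> ids of the partitions that have sent it so far
  private final Map<String, Set<Integer>> seen = new HashMap<>();

  ControlTupleBarrier(int upstreamPartitions) {
    this.upstreamPartitions = upstreamPartitions;
  }

  /** Returns true exactly when the tuple has now arrived from all partitions. */
  boolean onControlTuple(String tupleId, int partitionId) {
    Set<Integer> partitions = seen.computeIfAbsent(tupleId, k -> new HashSet<>());
    partitions.add(partitionId); // a repeat from the same partition is a no-op
    if (partitions.size() == upstreamPartitions) {
      seen.remove(tupleId); // reset so the same id can be reused later
      return true;
    }
    return false;
  }
}
```

With three upstream partitions, calls for partitions 0 and 1 return false and the call for partition 2 returns true, at which point the operator callback would be invoked.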

Re: [DISCUSSION] Custom Control Tuples

2016-11-03 Thread Bhupesh Chawda
I have a question regarding the callback for a control tuple. Will it be similar to the InputPort::process() method? Something like InputPort::processControlTuple(t)? Or will it be a method of the operator similar to beginWindow()? When we say that the control tuple will be delivered at window

Re: [DISCUSSION] Custom Control Tuples

2016-11-03 Thread Thomas Weise
I don't see how that would work. Suppose you have a file splitter and multiple partitions of block readers. The "end of file" event cannot be processed downstream until all block readers are done. I also think that this is related to the batch demarcation discussion and there should be a single

Re: [DISCUSSION] Custom Control Tuples

2016-11-03 Thread Thomas Weise
There is no guarantee about the ordering of events within a streaming window with multiple upstream partitions. This would require synchronization logic similar to what the streaming window provides, hence I would expect it to be best supported as part of the same window synchronization. On

Re: Enhance batch support - batch demarcation

2016-11-03 Thread Thomas Weise
I don't think it is necessary to do special things for "batch" here. Input can be bounded or unbounded. When input is bounded, then the input operator should raise the shutdown exception, which will gracefully terminate the downstream operators. Downstream operators may want to know about the end
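
The bounded-input behaviour mentioned in this message can be illustrated with a small stand-alone sketch. `ShutdownSignal` and `BoundedInputSketch` are hypothetical stand-ins for this example only; they model an input operator that signals shutdown once its source is exhausted and do not use the actual Apex operator interfaces.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Hypothetical stand-in for the engine's shutdown exception. */
class ShutdownSignal extends RuntimeException {
}

/** Hypothetical input operator that reads from a bounded source. */
class BoundedInputSketch {
  private final Iterator<String> source;
  // Collected here in place of an output port, to keep the sketch self-contained.
  final List<String> emitted = new ArrayList<>();

  BoundedInputSketch(Iterator<String> source) {
    this.source = source;
  }

  /** Called repeatedly by the (assumed) engine loop; emits one tuple per call. */
  void emitTuples() {
    if (!source.hasNext()) {
      // Input is exhausted: signal the engine to begin a graceful shutdown.
      throw new ShutdownSignal();
    }
    emitted.add(source.next());
  }
}
```

An engine loop would keep calling `emitTuples()` until the signal is thrown and then let downstream operators finish their in-flight work.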

Re: Enhance batch support - batch demarcation

2016-11-03 Thread Bhupesh Chawda
Hi All, Starting with the implementation, we are planning to take care of a single batch job first. We will take up the scheduling aspect later. The first requirement is the following: A batch job is an Apex application which picks up data from the source, and processes it. Once the data is