Re: [KUDOS] Contributed runner: Apache Apex!
Very nice ..:) I would like to contribute as well. Thanks, Chinmay. On Tue, Oct 18, 2016 at 2:19 AM, David Yanwrote: > I would also like to contribute. Thanks! > > David > > On Mon, Oct 17, 2016 at 10:01 AM, Thomas Weise wrote: > > > FYI, the first cut of Apex runner for Beam is in a feature branch. > > > > https://github.com/apache/incubator-beam/tree/apex-runner > > > > The next step will be to detail the work needed to make it a top-level > > runner and lay it out in JIRA. > > > > We will also be looking for folks that are interested to contribute, so > if > > you have time and interest, please raise your hand! > > > > Thanks, > > Thomas > > > > -- Forwarded message -- > > From: Kenneth Knowles > > Date: Mon, Oct 17, 2016 at 9:51 AM > > Subject: [KUDOS] Contributed runner: Apache Apex! > > To: "d...@beam.incubator.apache.org" > > > > > > Hi all, > > > > I would to, once again, call attention to a great addition to Beam: a > > runner for Apache Apex. > > > > After lots of review and much thoughtful revision, pull request #540 has > > been merged to the apex-runner feature branch today. Please do take a > look, > > and help us put the finishing touches on it to get it ready for the > master > > branch. > > > > And please also congratulate and thank Thomas Weise for this large > > endeavor, Vlad Rosov who helped get the integration tests working, and > > Guarav Gupta who contributed review comments. > > > > Kenn > > >
[jira] [Created] (APEXMALHAR-2308) BlockReader must consider fixed width length in emitted block alignment.
Deepak Narkhede created APEXMALHAR-2308: --- Summary: BlockReader must consider fixed width length in emitted block alignment. Key: APEXMALHAR-2308 URL: https://issues.apache.org/jira/browse/APEXMALHAR-2308 Project: Apache Apex Malhar Issue Type: Bug Reporter: Deepak Narkhede Assignee: Deepak Narkhede Current BlockReader doesn't consider the fixed length mode while emitting blocks, as single tuple may be split across two blocks hence leading to data inconsistency. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (APEXMALHAR-2190) Use reusable buffer to serial spillable data structure
[ https://issues.apache.org/jira/browse/APEXMALHAR-2190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15587668#comment-15587668 ] ASF GitHub Bot commented on APEXMALHAR-2190: GitHub user brightchen reopened a pull request: https://github.com/apache/apex-malhar/pull/404 APEXMALHAR-2190 #resolve #comment Use reusable buffer to serial spill… …able data structure You can merge this pull request into a Git repository by running: $ git pull https://github.com/brightchen/apex-malhar APEXMALHAR-2190-PR Alternatively you can review and apply these changes as the patch at: https://github.com/apache/apex-malhar/pull/404.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #404 commit 66d8defb767ced52c6563c9710a832395269625e Author: brightchenDate: 2016-08-16T00:46:27Z APEXMALHAR-2190 #resolve #comment Use reusable buffer to serial spillable data structure > Use reusable buffer to serial spillable data structure > -- > > Key: APEXMALHAR-2190 > URL: https://issues.apache.org/jira/browse/APEXMALHAR-2190 > Project: Apache Apex Malhar > Issue Type: Task >Reporter: bright chen >Assignee: bright chen > Original Estimate: 240h > Remaining Estimate: 240h > > Spillable Data Structure created lots of temporary memory to serial data lot > of of memory copy( see SliceUtils.concatenate(byte[], byte[]). Which used up > memory very quickly. See APEXMALHAR-2182. > Use a shared memory to avoid allocate temporary memory and memory copy > some basic ideas > - SerToLVBuffer interface provides a method serTo(T object, LengthValueBuffer > buffer): instead of create a memory and then return the serialized data, this > method let the caller pass in the buffer. So different objects or object with > embed objects can share the same LengthValueBuffer > - LengthValueBuffer: It is a buffer which manage the memory as length and > value(which is the generic format of serialized data). which provide length > placeholder mechanism to avoid temporary memory and data copy when the length > can be know after data serialized > - memory management classes: includes interface ByteStream and it's > implementations: Block, FixedBlock, BlocksStream. Which provides a mechanism > to dynamic allocate and manage memory. Which basically provides following > function. I tried other some other stream mechamism such as > ByteArrayInputStream, but it can meet 3rd criteria, and don't have good > performance(50% loss) > - dynamic allocate memory > - reset memory for reuse > - BlocksStream make sure the output slices will not be changed when need > extra memory; Block can change the reference of output slices buffer is data > was moved due to reallocate of memory(BlocksStream is better solution). > - WindowableBlocksStream extends from BlocksStream and provides function to > reset memory window by window instead of reset all memory. It provides > certain amount of cache( as bytes ) in memory -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (APEXMALHAR-2190) Use reusable buffer to serial spillable data structure
[ https://issues.apache.org/jira/browse/APEXMALHAR-2190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15587667#comment-15587667 ] ASF GitHub Bot commented on APEXMALHAR-2190: Github user brightchen closed the pull request at: https://github.com/apache/apex-malhar/pull/404 > Use reusable buffer to serial spillable data structure > -- > > Key: APEXMALHAR-2190 > URL: https://issues.apache.org/jira/browse/APEXMALHAR-2190 > Project: Apache Apex Malhar > Issue Type: Task >Reporter: bright chen >Assignee: bright chen > Original Estimate: 240h > Remaining Estimate: 240h > > Spillable Data Structure created lots of temporary memory to serial data lot > of of memory copy( see SliceUtils.concatenate(byte[], byte[]). Which used up > memory very quickly. See APEXMALHAR-2182. > Use a shared memory to avoid allocate temporary memory and memory copy > some basic ideas > - SerToLVBuffer interface provides a method serTo(T object, LengthValueBuffer > buffer): instead of create a memory and then return the serialized data, this > method let the caller pass in the buffer. So different objects or object with > embed objects can share the same LengthValueBuffer > - LengthValueBuffer: It is a buffer which manage the memory as length and > value(which is the generic format of serialized data). which provide length > placeholder mechanism to avoid temporary memory and data copy when the length > can be know after data serialized > - memory management classes: includes interface ByteStream and it's > implementations: Block, FixedBlock, BlocksStream. Which provides a mechanism > to dynamic allocate and manage memory. Which basically provides following > function. I tried other some other stream mechamism such as > ByteArrayInputStream, but it can meet 3rd criteria, and don't have good > performance(50% loss) > - dynamic allocate memory > - reset memory for reuse > - BlocksStream make sure the output slices will not be changed when need > extra memory; Block can change the reference of output slices buffer is data > was moved due to reallocate of memory(BlocksStream is better solution). > - WindowableBlocksStream extends from BlocksStream and provides function to > reset memory window by window instead of reset all memory. It provides > certain amount of cache( as bytes ) in memory -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] apex-malhar pull request #404: APEXMALHAR-2190 #resolve #comment Use reusabl...
Github user brightchen closed the pull request at: https://github.com/apache/apex-malhar/pull/404 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] apex-malhar pull request #404: APEXMALHAR-2190 #resolve #comment Use reusabl...
GitHub user brightchen reopened a pull request: https://github.com/apache/apex-malhar/pull/404 APEXMALHAR-2190 #resolve #comment Use reusable buffer to serial spill⦠â¦able data structure You can merge this pull request into a Git repository by running: $ git pull https://github.com/brightchen/apex-malhar APEXMALHAR-2190-PR Alternatively you can review and apply these changes as the patch at: https://github.com/apache/apex-malhar/pull/404.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #404 commit 66d8defb767ced52c6563c9710a832395269625e Author: brightchenDate: 2016-08-16T00:46:27Z APEXMALHAR-2190 #resolve #comment Use reusable buffer to serial spillable data structure --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Assigned] (APEXMALHAR-2303) S3 Line By Line Module
[ https://issues.apache.org/jira/browse/APEXMALHAR-2303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ajay Gupta reassigned APEXMALHAR-2303: -- Assignee: Ajay Gupta > S3 Line By Line Module > -- > > Key: APEXMALHAR-2303 > URL: https://issues.apache.org/jira/browse/APEXMALHAR-2303 > Project: Apache Apex Malhar > Issue Type: Bug >Reporter: Ajay Gupta >Assignee: Ajay Gupta > Original Estimate: 336h > Remaining Estimate: 336h > > This is a new module which will consist of 2 operators > 1) File Splitter -- Already existing in Malhar library > 2) S3RecordReader -- Read a file from S3 and output the records (delimited or > fixed width) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (APEXMALHAR-2306) Tests should allow for additions to OperatorContext interface
[ https://issues.apache.org/jira/browse/APEXMALHAR-2306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Weise updated APEXMALHAR-2306: - Summary: Tests should allow for additions to OperatorContext interface (was: malhar unit test(s) have a dependency on OperatorContext in apex-core) > Tests should allow for additions to OperatorContext interface > - > > Key: APEXMALHAR-2306 > URL: https://issues.apache.org/jira/browse/APEXMALHAR-2306 > Project: Apache Apex Malhar > Issue Type: Bug > Components: test benches >Reporter: Sanjay M Pujare > > While trying to build/test malhar against the latest apex-core, the following > failure was noticed > .../apex-malhar/library/src/test/java/com/datatorrent/lib/helper/OperatorContextTestHelper.java:[47,17] > com.datatorrent.lib.helper.OperatorContextTestHelper.TestIdOperatorContext > is not abstract and does not override abstract method getName() in > com.datatorrent.api.Context.OperatorContext > Ideally, malhar should mock up the OperatorContext interface or we should > find other ways to eliminate this kind of dependency. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (APEXMALHAR-2306) malhar unit test(s) have a dependency on OperatorContext in apex-core
[ https://issues.apache.org/jira/browse/APEXMALHAR-2306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15587351#comment-15587351 ] Thomas Weise commented on APEXMALHAR-2306: -- Have a look at FSInputModuleTest, for example. Since the tests that use the test context were written with constructor dependency, it will be more work than just changing a factory method, but looks straightforward. > malhar unit test(s) have a dependency on OperatorContext in apex-core > - > > Key: APEXMALHAR-2306 > URL: https://issues.apache.org/jira/browse/APEXMALHAR-2306 > Project: Apache Apex Malhar > Issue Type: Bug > Components: test benches >Reporter: Sanjay M Pujare > > While trying to build/test malhar against the latest apex-core, the following > failure was noticed > .../apex-malhar/library/src/test/java/com/datatorrent/lib/helper/OperatorContextTestHelper.java:[47,17] > com.datatorrent.lib.helper.OperatorContextTestHelper.TestIdOperatorContext > is not abstract and does not override abstract method getName() in > com.datatorrent.api.Context.OperatorContext > Ideally, malhar should mock up the OperatorContext interface or we should > find other ways to eliminate this kind of dependency. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (APEXMALHAR-2307) Session windows are not deleted properly after merge or extend in InMemorySessionWindowedStorage
David Yan created APEXMALHAR-2307: - Summary: Session windows are not deleted properly after merge or extend in InMemorySessionWindowedStorage Key: APEXMALHAR-2307 URL: https://issues.apache.org/jira/browse/APEXMALHAR-2307 Project: Apache Apex Malhar Issue Type: Bug Reporter: David Yan Assignee: David Yan -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (APEXMALHAR-2306) malhar unit test(s) have a dependency on OperatorContext in apex-core
[ https://issues.apache.org/jira/browse/APEXMALHAR-2306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15587143#comment-15587143 ] ASF GitHub Bot commented on APEXMALHAR-2306: Github user sanjaypujare closed the pull request at: https://github.com/apache/apex-malhar/pull/459 > malhar unit test(s) have a dependency on OperatorContext in apex-core > - > > Key: APEXMALHAR-2306 > URL: https://issues.apache.org/jira/browse/APEXMALHAR-2306 > Project: Apache Apex Malhar > Issue Type: Bug > Components: test benches >Reporter: Sanjay M Pujare > > While trying to build/test malhar against the latest apex-core, the following > failure was noticed > .../apex-malhar/library/src/test/java/com/datatorrent/lib/helper/OperatorContextTestHelper.java:[47,17] > com.datatorrent.lib.helper.OperatorContextTestHelper.TestIdOperatorContext > is not abstract and does not override abstract method getName() in > com.datatorrent.api.Context.OperatorContext > Ideally, malhar should mock up the OperatorContext interface or we should > find other ways to eliminate this kind of dependency. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] apex-malhar pull request #459: APEXMALHAR-2306 fix OperatorContext implement...
Github user sanjaypujare closed the pull request at: https://github.com/apache/apex-malhar/pull/459 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Commented] (APEXMALHAR-2306) malhar unit test(s) have a dependency on OperatorContext in apex-core
[ https://issues.apache.org/jira/browse/APEXMALHAR-2306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15587040#comment-15587040 ] ASF GitHub Bot commented on APEXMALHAR-2306: GitHub user sanjaypujare opened a pull request: https://github.com/apache/apex-malhar/pull/459 APEXMALHAR-2306 fix OperatorContext implementations to add the new required method @vrozov these changes were required to run malhar successfully against latest apex-core. This is just for review for now You can merge this pull request into a Git repository by running: $ git pull https://github.com/sanjaypujare/apex-malhar apex-core-3.5.0-changes Alternatively you can review and apply these changes as the patch at: https://github.com/apache/apex-malhar/pull/459.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #459 commit 7d106b4abd42dcf4e9e7bf54a63eb042bcdc32bb Author: Sanjay PujareDate: 2016-10-18T23:35:35Z fix OperatorContext implementations to add the new required method > malhar unit test(s) have a dependency on OperatorContext in apex-core > - > > Key: APEXMALHAR-2306 > URL: https://issues.apache.org/jira/browse/APEXMALHAR-2306 > Project: Apache Apex Malhar > Issue Type: Bug > Components: test benches >Reporter: Sanjay M Pujare > > While trying to build/test malhar against the latest apex-core, the following > failure was noticed > .../apex-malhar/library/src/test/java/com/datatorrent/lib/helper/OperatorContextTestHelper.java:[47,17] > com.datatorrent.lib.helper.OperatorContextTestHelper.TestIdOperatorContext > is not abstract and does not override abstract method getName() in > com.datatorrent.api.Context.OperatorContext > Ideally malhar should mock up the OperstorContext interface or we should find > other ways to eliminate this kind of dependency. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (APEXMALHAR-2305) Change implementation of session window to reflect what is described in streaming 102 blog
[ https://issues.apache.org/jira/browse/APEXMALHAR-2305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15586908#comment-15586908 ] ASF GitHub Bot commented on APEXMALHAR-2305: GitHub user davidyan74 opened a pull request: https://github.com/apache/apex-malhar/pull/458 APEXMALHAR-2305 #resolve Mirror the proto-session window behavior des… …cribed in the streaming 102 blog @siyuanh please review and merge You can merge this pull request into a Git repository by running: $ git pull https://github.com/davidyan74/apex-malhar APEXMALHAR-2305 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/apex-malhar/pull/458.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #458 commit 0e598eba46fc39e706eda2ff9a7eedd1c8b0f2de Author: David YanDate: 2016-10-18T22:41:28Z APEXMALHAR-2305 #resolve Mirror the proto-session window behavior described in the streaming 102 blog > Change implementation of session window to reflect what is described in > streaming 102 blog > -- > > Key: APEXMALHAR-2305 > URL: https://issues.apache.org/jira/browse/APEXMALHAR-2305 > Project: Apache Apex Malhar > Issue Type: Bug >Reporter: David Yan >Assignee: David Yan > > The proto-session windows described in the streaming 102 blog have a minimum > duration that is equal to the session gap. We should do the same in our > session window implementation. > https://www.oreilly.com/ideas/the-world-beyond-batch-streaming-102 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (APEXMALHAR-2305) Change implementation of session window to reflect what is described in streaming 102 blog
David Yan created APEXMALHAR-2305: - Summary: Change implementation of session window to reflect what is described in streaming 102 blog Key: APEXMALHAR-2305 URL: https://issues.apache.org/jira/browse/APEXMALHAR-2305 Project: Apache Apex Malhar Issue Type: Bug Reporter: David Yan Assignee: David Yan The proto-session windows described in the streaming 102 blog have a minimum duration that is equal to the session gap. We should do the same in our session window implementation. https://www.oreilly.com/ideas/the-world-beyond-batch-streaming-102 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (APEXCORE-561) Container info needs to be persisted even after application has been killed
[ https://issues.apache.org/jira/browse/APEXCORE-561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15586503#comment-15586503 ] David Yan commented on APEXCORE-561: This is a good feature to have to diagnostics. > Container info needs to be persisted even after application has been killed > --- > > Key: APEXCORE-561 > URL: https://issues.apache.org/jira/browse/APEXCORE-561 > Project: Apache Apex Core > Issue Type: Improvement >Reporter: Sanjay M Pujare > > This is required so that info can be displayed (by Apex cli for example) even > for killed apps. > At least one piece of info missing, which is the finishedTime. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (APEXMALHAR-2299) TimeBasedDedupOperator throws exception during time bucket assignment in certain edge cases
[ https://issues.apache.org/jira/browse/APEXMALHAR-2299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chandni Singh resolved APEXMALHAR-2299. --- Resolution: Fixed > TimeBasedDedupOperator throws exception during time bucket assignment in > certain edge cases > --- > > Key: APEXMALHAR-2299 > URL: https://issues.apache.org/jira/browse/APEXMALHAR-2299 > Project: Apache Apex Malhar > Issue Type: Bug >Affects Versions: 3.5.0 >Reporter: Francis Fernandes >Assignee: Francis Fernandes > Fix For: 3.6.0 > > Original Estimate: 96h > Remaining Estimate: 96h > > The following exception is thrown under certain edge cases: > {noformat} > Stopped running due to an exception. java.lang.IllegalArgumentException: new > time bucket should have a value greater than the old time bucket > at > com.google.common.base.Preconditions.checkArgument(Preconditions.java:88) > at > org.apache.apex.malhar.lib.state.managed.ManagedTimeUnifiedStateImpl.handleBucketConflict(ManagedTimeUnifiedStateImpl.java:126) > at > org.apache.apex.malhar.lib.state.managed.AbstractManagedStateImpl.prepareBucket(AbstractManagedStateImpl.java:265) > at > org.apache.apex.malhar.lib.state.managed.AbstractManagedStateImpl.putInBucket(AbstractManagedStateImpl.java:276) > at > org.apache.apex.malhar.lib.state.managed.ManagedTimeUnifiedStateImpl.put(ManagedTimeUnifiedStateImpl.java:71) > at > org.apache.apex.malhar.lib.dedup.TimeBasedDedupOperator.putManagedState(TimeBasedDedupOperator.java:191) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] apex-core pull request #411: APEXCORE-510 Implement check in DefaultOutputPo...
GitHub user sanjaypujare opened a pull request: https://github.com/apache/apex-core/pull/411 APEXCORE-510 Implement check in DefaultOutputPort for thread affinity between setup and emit @vrozov pls review and merge as appropriate You can merge this pull request into a Git repository by running: $ git pull https://github.com/sanjaypujare/apex-core malhar-510.thread_affinity Alternatively you can review and apply these changes as the patch at: https://github.com/apache/apex-core/pull/411.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #411 commit 41f0c94409891f37478ca5e3f72daefaddc1fcaf Author: Sanjay PujareDate: 2016-09-29T00:23:06Z ALEXCORE-510 thread affinity check also implement a property to suppress thread affinity checks: --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] apex-malhar pull request #454: APEXMALHAR-2299 Modified to move window when ...
Github user asfgit closed the pull request at: https://github.com/apache/apex-malhar/pull/454 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Created] (APEXMALHAR-2304) Apex SQL: Add examples for SQL in Apex in demos folder
Chinmay Kolhatkar created APEXMALHAR-2304: - Summary: Apex SQL: Add examples for SQL in Apex in demos folder Key: APEXMALHAR-2304 URL: https://issues.apache.org/jira/browse/APEXMALHAR-2304 Project: Apache Apex Malhar Issue Type: New Feature Reporter: Chinmay Kolhatkar Assignee: Chinmay Kolhatkar -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (APEXMALHAR-2284) POJOInnerJoinOperatorTest fails in Travis CI
[ https://issues.apache.org/jira/browse/APEXMALHAR-2284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15585751#comment-15585751 ] Chandni Singh edited comment on APEXMALHAR-2284 at 10/18/16 3:40 PM: - One concept that I think I should point is how getAsync improves performance is that if the value is not available immediately, then the client can cache the future and work on other keys. Look at the implementations of AbstractDeduper. 1. If SpillableArrayListMultimap only lacks asynchronous get, then that can be added to it. That should not be the only reason to duplicate the functionality. 2. ManagedTimeMultiValueState has no tests and is full of warnings. We should not create a complete separate implementation which duplicates a larger part of functionality provided by something existing otherwise we will have too many slightly different variations of the same thing and that will be very difficult to manage. IMO this is a good opportunity to improve SpillableArrayListMultimapImpl and use that. was (Author: csingh): One concept that I think I should point is that getAsync improves performance is that if the value is not available immediately, then the client can cache the future and work on other keys. Look at the implementations of AbstractDeduper. 1. If SpillableArrayListMultimap only lacks asynchronous get, then that can be added to it. That should not be the only reason to duplicate the functionality. 2. ManagedTimeMultiValueState has no tests and is full of warnings. We should not create a complete separate implementation which duplicates a larger part of functionality provided by something existing otherwise we will have too many slightly different variations of the same thing and that will be very difficult to manage. IMO this is a good opportunity to improve SpillableArrayListMultimapImpl and use that. > POJOInnerJoinOperatorTest fails in Travis CI > > > Key: APEXMALHAR-2284 > URL: https://issues.apache.org/jira/browse/APEXMALHAR-2284 > Project: Apache Apex Malhar > Issue Type: Bug >Reporter: Thomas Weise >Assignee: Chaitanya >Priority: Blocker > Fix For: 3.6.0 > > > https://s3.amazonaws.com/archive.travis-ci.org/jobs/166322754/log.txt > {code} > Failed tests: > POJOInnerJoinOperatorTest.testEmitMultipleTuplesFromStream2:337 Number of > tuple emitted expected:<2> but was:<4> > POJOInnerJoinOperatorTest.testInnerJoinOperator:184 Number of tuple emitted > expected:<1> but was:<2> > POJOInnerJoinOperatorTest.testMultipleValues:236 Number of tuple emitted > expected:<2> but was:<3> > POJOInnerJoinOperatorTest.testUpdateStream1Values:292 Number of tuple > emitted expected:<1> but was:<2> > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (APEXMALHAR-2284) POJOInnerJoinOperatorTest fails in Travis CI
[ https://issues.apache.org/jira/browse/APEXMALHAR-2284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15585751#comment-15585751 ] Chandni Singh commented on APEXMALHAR-2284: --- One concept that I think I should point is that getAsync improves performance is that if the value is not available immediately, then the client can cache the future and work on other keys. Look at the implementations of AbstractDeduper. 1. If SpillableArrayListMultimap only lacks asynchronous get, then that can be added to it. That should not be the only reason to duplicate the functionality. 2. ManagedTimeMultiValueState has no tests and is full of warnings. We should not create a complete separate implementation which duplicates a larger part of functionality provided by something existing otherwise we will have too many slightly different variations of the same thing and that will be very difficult to manage. IMO this is a good opportunity to improve SpillableArrayListMultimapImpl and use that. > POJOInnerJoinOperatorTest fails in Travis CI > > > Key: APEXMALHAR-2284 > URL: https://issues.apache.org/jira/browse/APEXMALHAR-2284 > Project: Apache Apex Malhar > Issue Type: Bug >Reporter: Thomas Weise >Assignee: Chaitanya >Priority: Blocker > Fix For: 3.6.0 > > > https://s3.amazonaws.com/archive.travis-ci.org/jobs/166322754/log.txt > {code} > Failed tests: > POJOInnerJoinOperatorTest.testEmitMultipleTuplesFromStream2:337 Number of > tuple emitted expected:<2> but was:<4> > POJOInnerJoinOperatorTest.testInnerJoinOperator:184 Number of tuple emitted > expected:<1> but was:<2> > POJOInnerJoinOperatorTest.testMultipleValues:236 Number of tuple emitted > expected:<2> but was:<3> > POJOInnerJoinOperatorTest.testUpdateStream1Values:292 Number of tuple > emitted expected:<1> but was:<2> > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] apex-malhar pull request #439: APEXMALHAR-2272 : Fixed sequentialFileRead on...
Github user yogidevendra closed the pull request at: https://github.com/apache/apex-malhar/pull/439 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Commented] (APEXMALHAR-2272) sequentialFileRead property on FSInputModule not functioning as expected
[ https://issues.apache.org/jira/browse/APEXMALHAR-2272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15585644#comment-15585644 ] ASF GitHub Bot commented on APEXMALHAR-2272: Github user yogidevendra closed the pull request at: https://github.com/apache/apex-malhar/pull/439 > sequentialFileRead property on FSInputModule not functioning as expected > > > Key: APEXMALHAR-2272 > URL: https://issues.apache.org/jira/browse/APEXMALHAR-2272 > Project: Apache Apex Malhar > Issue Type: Bug >Reporter: Yogi Devendra >Assignee: Yogi Devendra >Priority: Minor > Fix For: 3.6.0 > > > When there is single large file in the input directory, and we have multiple > partitions for BlockReader and sequencialFileRead is set to true. > Only one BlockReader instance should be active; other BlockReader instances > should remain idle. > This is because sequencialFileRead makes sure that blocks of the file are > read serially by same BlockReader instance. > Observed behavior is all BlockReader instances are reading data which means > sequencialFileRead property is not functioning as expected. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (APEXMALHAR-2284) POJOInnerJoinOperatorTest fails in Travis CI
[ https://issues.apache.org/jira/browse/APEXMALHAR-2284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15585604#comment-15585604 ] Bhupesh Chawda commented on APEXMALHAR-2284: As far as I understand, the issue is with race conditions involved in simultaneous get() and put() on a store based on managed state. For the join use case, asynchronous get() is important for efficiency reasons. Upon some discussion with Chaitanya, here are the possible options: 1. Make all get() and put() calls blocking. That way the issue is avoided at the cost of decreased throughput. 2. Allow an option to specify the source (Bucket.ReadSource). 3 All data inserted by the put() call is stored in a separate space and synched with managed state at end window. This ensures that put() does not interfere with asynchronous get() calls. This is similar to the way SpillableArrayListMultimap works, but which cannot be used directly as it lacks some functionality like asynchronous get. 4. Override the getAsync() method to achieve the desired functionality. > POJOInnerJoinOperatorTest fails in Travis CI > > > Key: APEXMALHAR-2284 > URL: https://issues.apache.org/jira/browse/APEXMALHAR-2284 > Project: Apache Apex Malhar > Issue Type: Bug >Reporter: Thomas Weise >Assignee: Chaitanya >Priority: Blocker > Fix For: 3.6.0 > > > https://s3.amazonaws.com/archive.travis-ci.org/jobs/166322754/log.txt > {code} > Failed tests: > POJOInnerJoinOperatorTest.testEmitMultipleTuplesFromStream2:337 Number of > tuple emitted expected:<2> but was:<4> > POJOInnerJoinOperatorTest.testInnerJoinOperator:184 Number of tuple emitted > expected:<1> but was:<2> > POJOInnerJoinOperatorTest.testMultipleValues:236 Number of tuple emitted > expected:<2> but was:<3> > POJOInnerJoinOperatorTest.testUpdateStream1Values:292 Number of tuple > emitted expected:<1> but was:<2> > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] apex-malhar pull request #457: APEXMALHAR 2302 Add blockSize property to FSR...
GitHub user deepak-narkhede reopened a pull request: https://github.com/apache/apex-malhar/pull/457 APEXMALHAR 2302 Add blockSize property to FSRecordReaderModule This change adds blockSize property from FileSplitter to FSRecordReaderModule. Tested with RecordReader Application. You can merge this pull request into a Git repository by running: $ git pull https://github.com/deepak-narkhede/apex-malhar APEXMALHAR-2302 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/apex-malhar/pull/457.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #457 commit 4d52444f40f8e5bd1eb99bb54ca49574439202d4 Author: deepak-narkhedeDate: 2016-10-17T16:19:27Z APEXMALHAR-2302 Add blockSize property to FSRecordReaderModule. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] apex-malhar pull request #457: APEXMALHAR 2302 Add blockSize property to FSR...
Github user deepak-narkhede closed the pull request at: https://github.com/apache/apex-malhar/pull/457 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Created] (APEXMALHAR-2303) S3 Line By Line Module
Ajay Gupta created APEXMALHAR-2303: -- Summary: S3 Line By Line Module Key: APEXMALHAR-2303 URL: https://issues.apache.org/jira/browse/APEXMALHAR-2303 Project: Apache Apex Malhar Issue Type: Bug Reporter: Ajay Gupta This is a new module which will consist of 2 operators 1) File Splitter -- Already existing in Malhar library 2) S3RecordReader -- Read a file from S3 and output the records (delimited or fixed width) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (APEXMALHAR-2190) Use reusable buffer to serial spillable data structure
[ https://issues.apache.org/jira/browse/APEXMALHAR-2190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15584614#comment-15584614 ] ASF GitHub Bot commented on APEXMALHAR-2190: Github user brightchen closed the pull request at: https://github.com/apache/apex-malhar/pull/404 > Use reusable buffer to serial spillable data structure > -- > > Key: APEXMALHAR-2190 > URL: https://issues.apache.org/jira/browse/APEXMALHAR-2190 > Project: Apache Apex Malhar > Issue Type: Task >Reporter: bright chen >Assignee: bright chen > Original Estimate: 240h > Remaining Estimate: 240h > > Spillable Data Structure created lots of temporary memory to serial data lot > of of memory copy( see SliceUtils.concatenate(byte[], byte[]). Which used up > memory very quickly. See APEXMALHAR-2182. > Use a shared memory to avoid allocate temporary memory and memory copy > some basic ideas > - SerToLVBuffer interface provides a method serTo(T object, LengthValueBuffer > buffer): instead of create a memory and then return the serialized data, this > method let the caller pass in the buffer. So different objects or object with > embed objects can share the same LengthValueBuffer > - LengthValueBuffer: It is a buffer which manage the memory as length and > value(which is the generic format of serialized data). which provide length > placeholder mechanism to avoid temporary memory and data copy when the length > can be know after data serialized > - memory management classes: includes interface ByteStream and it's > implementations: Block, FixedBlock, BlocksStream. Which provides a mechanism > to dynamic allocate and manage memory. Which basically provides following > function. I tried other some other stream mechamism such as > ByteArrayInputStream, but it can meet 3rd criteria, and don't have good > performance(50% loss) > - dynamic allocate memory > - reset memory for reuse > - BlocksStream make sure the output slices will not be changed when need > extra memory; Block can change the reference of output slices buffer is data > was moved due to reallocate of memory(BlocksStream is better solution). > - WindowableBlocksStream extends from BlocksStream and provides function to > reset memory window by window instead of reset all memory. It provides > certain amount of cache( as bytes ) in memory -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] apex-malhar pull request #404: APEXMALHAR-2190 #resolve #comment Use reusabl...
Github user brightchen closed the pull request at: https://github.com/apache/apex-malhar/pull/404 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Commented] (APEXMALHAR-2190) Use reusable buffer to serial spillable data structure
[ https://issues.apache.org/jira/browse/APEXMALHAR-2190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15584615#comment-15584615 ] ASF GitHub Bot commented on APEXMALHAR-2190: GitHub user brightchen reopened a pull request: https://github.com/apache/apex-malhar/pull/404 APEXMALHAR-2190 #resolve #comment Use reusable buffer to serial spill… …able data structure You can merge this pull request into a Git repository by running: $ git pull https://github.com/brightchen/apex-malhar APEXMALHAR-2190-PR Alternatively you can review and apply these changes as the patch at: https://github.com/apache/apex-malhar/pull/404.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #404 commit 5e5e968086a774ab8067ae7e6d29e31f3b68af54 Author: brightchenDate: 2016-08-16T00:46:27Z APEXMALHAR-2190 #resolve #comment Use reusable buffer to serial spillable data structure > Use reusable buffer to serial spillable data structure > -- > > Key: APEXMALHAR-2190 > URL: https://issues.apache.org/jira/browse/APEXMALHAR-2190 > Project: Apache Apex Malhar > Issue Type: Task >Reporter: bright chen >Assignee: bright chen > Original Estimate: 240h > Remaining Estimate: 240h > > Spillable Data Structure created lots of temporary memory to serial data lot > of of memory copy( see SliceUtils.concatenate(byte[], byte[]). Which used up > memory very quickly. See APEXMALHAR-2182. > Use a shared memory to avoid allocate temporary memory and memory copy > some basic ideas > - SerToLVBuffer interface provides a method serTo(T object, LengthValueBuffer > buffer): instead of create a memory and then return the serialized data, this > method let the caller pass in the buffer. So different objects or object with > embed objects can share the same LengthValueBuffer > - LengthValueBuffer: It is a buffer which manage the memory as length and > value(which is the generic format of serialized data). which provide length > placeholder mechanism to avoid temporary memory and data copy when the length > can be know after data serialized > - memory management classes: includes interface ByteStream and it's > implementations: Block, FixedBlock, BlocksStream. Which provides a mechanism > to dynamic allocate and manage memory. Which basically provides following > function. I tried other some other stream mechamism such as > ByteArrayInputStream, but it can meet 3rd criteria, and don't have good > performance(50% loss) > - dynamic allocate memory > - reset memory for reuse > - BlocksStream make sure the output slices will not be changed when need > extra memory; Block can change the reference of output slices buffer is data > was moved due to reallocate of memory(BlocksStream is better solution). > - WindowableBlocksStream extends from BlocksStream and provides function to > reset memory window by window instead of reset all memory. It provides > certain amount of cache( as bytes ) in memory -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] apex-malhar pull request #404: APEXMALHAR-2190 #resolve #comment Use reusabl...
GitHub user brightchen reopened a pull request: https://github.com/apache/apex-malhar/pull/404 APEXMALHAR-2190 #resolve #comment Use reusable buffer to serial spill⦠â¦able data structure You can merge this pull request into a Git repository by running: $ git pull https://github.com/brightchen/apex-malhar APEXMALHAR-2190-PR Alternatively you can review and apply these changes as the patch at: https://github.com/apache/apex-malhar/pull/404.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #404 commit 5e5e968086a774ab8067ae7e6d29e31f3b68af54 Author: brightchenDate: 2016-08-16T00:46:27Z APEXMALHAR-2190 #resolve #comment Use reusable buffer to serial spillable data structure --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Commented] (APEXCORE-561) Container info needs to be persisted even after application has been killed
[ https://issues.apache.org/jira/browse/APEXCORE-561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15584603#comment-15584603 ] Sanjay M Pujare commented on APEXCORE-561: -- [~davidyan] pls add your comments > Container info needs to be persisted even after application has been killed > --- > > Key: APEXCORE-561 > URL: https://issues.apache.org/jira/browse/APEXCORE-561 > Project: Apache Apex Core > Issue Type: Improvement >Reporter: Sanjay M Pujare > > This is required so that info can be displayed (by Apex cli for example) even > for killed apps. > At least one piece of info missing, which is the finishedTime. -- This message was sent by Atlassian JIRA (v6.3.4#6332)