[jira] [Commented] (BEAM-4016) Direct runner incorrect lifecycle, @SplitRestriction should execute after @Setup on SplittableDoFn
[ https://issues.apache.org/jira/browse/BEAM-4016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16443244#comment-16443244 ]

Ismaël Mejía commented on BEAM-4016:

For SDFWithLifecycle in SplittableDoFnTest, I printed the thread id and the method name on every call. Take a look at this commit:
https://github.com/iemejia/beam/commit/75d8279858f2f4f0cd35af7a117ddb7ae7eb1e3a

This produces the following output:

[code]
18 GetInitialRestriction c
19 GetInitialRestriction b
15 GetInitialRestriction a
13 Setup
14 Setup
18 Setup
18 SplitRestriction a - [0, 1)
14 SplitRestriction c - [0, 1)
13 SplitRestriction b - [0, 1)
18 Setup
17 Setup
13 Setup
17 ProcessElement b - OffsetRangeTracker{range=[0, 1), lastClaimedOffset=null, lastAttemptedOffset=null}
18 ProcessElement c - OffsetRangeTracker{range=[0, 1), lastClaimedOffset=null, lastAttemptedOffset=null}
13 ProcessElement a - OffsetRangeTracker{range=[0, 1), lastClaimedOffset=null, lastAttemptedOffset=null}
17 Teardown
17 Teardown
17 Teardown
[code]

> Direct runner incorrect lifecycle, @SplitRestriction should execute after @Setup on SplittableDoFn
> --
>
> Key: BEAM-4016
> URL: https://issues.apache.org/jira/browse/BEAM-4016
> Project: Beam
> Issue Type: Bug
> Components: runner-direct
> Affects Versions: 2.4.0
> Reporter: Ismaël Mejía
> Assignee: Thomas Groh
> Priority: Major
> Attachments: sdf-splitrestriction-lifeycle-test.patch
>
> The method annotated with @SplitRestriction is the method where we can define
> the RestrictionTrackers (splits) in advance in a SDF. It makes sense to
> execute this after the @Setup method given that usually connections are
> established at Setup and can be used to ask the different data stores about
> the partitioning strategy. I added a test for this in the
> SplittableDoFnTest.SDFWithLifecycle test.

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (BEAM-4016) Direct runner incorrect lifecycle, @SplitRestriction should execute after @Setup on SplittableDoFn
[ https://issues.apache.org/jira/browse/BEAM-4016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16443139#comment-16443139 ]

Eugene Kirpichov commented on BEAM-4016:

Can you clarify about calling it more times than expected? Do you mean that it gets called more than once on the particular in-memory Java instance of the SDF on which SplitRestriction is called?
[jira] [Commented] (BEAM-4016) Direct runner incorrect lifecycle, @SplitRestriction should execute after @Setup on SplittableDoFn
[ https://issues.apache.org/jira/browse/BEAM-4016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16439429#comment-16439429 ]

Ismaël Mejía commented on BEAM-4016:

I tried that workaround, but it ends up calling the setup method more times than expected. Any other ideas? Can you take a look, or help me reassign this to someone who can? (I have the impression [~tgroh] is not available, and I cannot take a serious look into this for now.)
[jira] [Commented] (BEAM-4016) Direct runner incorrect lifecycle, @SplitRestriction should execute after @Setup on SplittableDoFn
[ https://issues.apache.org/jira/browse/BEAM-4016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16439412#comment-16439412 ]

Romain Manni-Bucau commented on BEAM-4016:

PS: don't forget the teardown mapping for any instance (or use the same caching hack the direct runner has).
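
The two concerns raised in this part of the thread (setup being called more times than expected, and teardown being forgotten for some instance) can be illustrated with a small bookkeeping sketch. It is plain Java with no Beam dependency: `CountingFn` and `LifecycleRegistry` are hypothetical names invented for this illustration, and this is not how the direct runner's cache is implemented. The sketch is also not thread-safe.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

/** Hypothetical function with setup/teardown hooks; not a Beam type. */
class CountingFn {
  int setups = 0;
  int teardowns = 0;

  void setup() {
    setups++;
  }

  void teardown() {
    teardowns++;
  }
}

/**
 * Hypothetical registry: sets up each instance at most once and tears down
 * every instance it has set up.
 */
class LifecycleRegistry {
  // Identity-based set, so distinct instances are always tracked separately.
  private final Set<CountingFn> live =
      Collections.newSetFromMap(new IdentityHashMap<>());

  /** Calls setup() only the first time a given instance is seen. */
  void ensureSetup(CountingFn fn) {
    if (live.add(fn)) {
      fn.setup();
    }
  }

  /** Tears down every tracked instance exactly once, then forgets them. */
  void teardownAll() {
    List<CountingFn> snapshot = new ArrayList<>(live);
    live.clear();
    for (CountingFn fn : snapshot) {
      fn.teardown();
    }
  }
}
```

Tracking instances by identity is what pairs each setup with exactly one teardown here; whether a real runner should key on instances in this way is a design question this sketch does not answer.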
[jira] [Commented] (BEAM-4016) Direct runner incorrect lifecycle, @SplitRestriction should execute after @Setup on SplittableDoFn
[ https://issues.apache.org/jira/browse/BEAM-4016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16437961#comment-16437961 ]

Eugene Kirpichov commented on BEAM-4016:

Yeah it's the desired order. The fix is to add a call to invoker.invokeSetup() to https://github.com/apache/beam/blob/6107c314e7a1af0d29b3cd865c0dee1013c2261c/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/SplittableParDo.java#L357 .
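
The ordering confirmed in this comment (setup first, then split) can be sketched without Beam on the classpath. Everything below is a hypothetical stand-in: `PartitionedSource` and `SplitInvoker` are not Beam classes, the partition names are made up, and the code does not mirror what SplittableParDo.java actually contains. It only shows why a split that relies on state created in setup fails when setup has not run on that instance.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-in for an SDF-like function; not a Beam class. */
class PartitionedSource {
  // Stands in for a connection to a data store; null until setup() has run.
  private List<String> partitions;

  // Records the order of lifecycle calls so it can be inspected afterwards.
  final List<String> calls = new ArrayList<>();

  void setup() {
    calls.add("Setup");
    partitions = Arrays.asList("p0", "p1", "p2"); // made-up partition names
  }

  List<String> splitRestriction() {
    calls.add("SplitRestriction");
    if (partitions == null) {
      throw new IllegalStateException("splitRestriction called before setup");
    }
    return new ArrayList<>(partitions);
  }

  void teardown() {
    calls.add("Teardown");
    partitions = null;
  }
}

/** Hypothetical wrapper enforcing Setup, then SplitRestriction, then Teardown. */
class SplitInvoker {
  static List<String> splitWithLifecycle(PartitionedSource fn) {
    fn.setup();
    try {
      return fn.splitRestriction();
    } finally {
      fn.teardown(); // runs even if the split throws
    }
  }
}
```

Calling `splitRestriction()` directly on a fresh `PartitionedSource` throws `IllegalStateException`, which is the failure mode the issue describes for functions that open their connections in setup.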
[jira] [Commented] (BEAM-4016) Direct runner incorrect lifecycle, @SplitRestriction should execute after @Setup on SplittableDoFn
[ https://issues.apache.org/jira/browse/BEAM-4016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16435262#comment-16435262 ]

Ismaël Mejía commented on BEAM-4016:

Since it seems you are back, [~jkff], can you confirm whether this is the desired order, so I can open the corresponding issues for Spark and Dataflow too?
[jira] [Commented] (BEAM-4016) Direct runner incorrect lifecycle, @SplitRestriction should execute after @Setup on SplittableDoFn
[ https://issues.apache.org/jira/browse/BEAM-4016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16426988#comment-16426988 ]

Ismaël Mejía commented on BEAM-4016:

[~jkff] Can you confirm if the lifecycle is as I mention?

> Direct runner incorrect lifecycle, @SplitRestriction should execute after @Setup on SplittableDoFn
> --
>
> SplitRestriction is the method where we can split in advance a SDF. It makes
> sense to execute this after the @Setup method given that usually connections
> are established at Setup and can be used to ask the different data stores
> about the partitioning strategy. I added a test for this in the
> SplittableDoFnTest.SDFWithLifecycle test.