[jira] [Commented] (BEAM-4016) Direct runner incorrect lifecycle, @SplitRestriction should execute after @Setup on SplittableDoFn

2018-04-18 Thread Ismaël Mejía (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-4016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16443244#comment-16443244
 ] 

Ismaël Mejía commented on BEAM-4016:


For SDFWithLifecycle in SplittableDoFnTest, I printed the thread id and the 
method name on every lifecycle call. Take a look at this commit:
https://github.com/iemejia/beam/commit/75d8279858f2f4f0cd35af7a117ddb7ae7eb1e3a

This produces this output:

[code]
18 GetInitialRestriction c
19 GetInitialRestriction b
15 GetInitialRestriction a
13 Setup
14 Setup
18 Setup
18 SplitRestriction a - [0, 1)
14 SplitRestriction c - [0, 1)
13 SplitRestriction b - [0, 1)
18 Setup
17 Setup
13 Setup
17 ProcessElement b - OffsetRangeTracker{range=[0, 1), lastClaimedOffset=null, lastAttemptedOffset=null}
18 ProcessElement c - OffsetRangeTracker{range=[0, 1), lastClaimedOffset=null, lastAttemptedOffset=null}
13 ProcessElement a - OffsetRangeTracker{range=[0, 1), lastClaimedOffset=null, lastAttemptedOffset=null}
17 Teardown
17 Teardown
17 Teardown
[code]
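
A minimal sketch, not part of the linked commit, for tallying the calls in a trace like the one above. The class name LifecycleTrace and its count helper are made up for illustration, and the ProcessElement lines are shortened to the element name. Note that the first column is a thread id, not an instance id, so a tally like this shows how often each method ran but cannot show whether one particular SDF instance was set up twice.

```java
/** Tallies lifecycle calls in a trace whose lines look like "<threadId> <Method> ...". */
final class LifecycleTrace {

  // Trace copied from the output above; ProcessElement lines are cut after the element name.
  static final String SAMPLE = String.join("\n",
      "18 GetInitialRestriction c",
      "19 GetInitialRestriction b",
      "15 GetInitialRestriction a",
      "13 Setup",
      "14 Setup",
      "18 Setup",
      "18 SplitRestriction a - [0, 1)",
      "14 SplitRestriction c - [0, 1)",
      "13 SplitRestriction b - [0, 1)",
      "18 Setup",
      "17 Setup",
      "13 Setup",
      "17 ProcessElement b",
      "18 ProcessElement c",
      "13 ProcessElement a",
      "17 Teardown",
      "17 Teardown",
      "17 Teardown");

  /** Returns how many trace lines name the given method in their second field. */
  static int count(String trace, String method) {
    int n = 0;
    for (String line : trace.split("\n")) {
      String[] fields = line.trim().split("\\s+");
      if (fields.length >= 2 && fields[1].equals(method)) {
        n++;
      }
    }
    return n;
  }

  public static void main(String[] args) {
    String[] methods = {
      "GetInitialRestriction", "Setup", "SplitRestriction", "ProcessElement", "Teardown"
    };
    for (String method : methods) {
      System.out.println(method + ": " + count(SAMPLE, method));
    }
  }
}
```

On this trace the tally is 6 Setup calls against 3 Teardown calls, with 3 calls each for the other methods.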



> Direct runner incorrect lifecycle, @SplitRestriction should execute after 
> @Setup on SplittableDoFn
> --
>
> Key: BEAM-4016
> URL: https://issues.apache.org/jira/browse/BEAM-4016
> Project: Beam
> Issue Type: Bug
> Components: runner-direct
> Affects Versions: 2.4.0
> Reporter: Ismaël Mejía
> Assignee: Thomas Groh
> Priority: Major
> Attachments: sdf-splitrestriction-lifeycle-test.patch
>
>
> The method annotated with @SplitRestriction is where we can define the 
> RestrictionTrackers (splits) of an SDF in advance. It makes sense to execute 
> it after the @Setup method, given that connections are usually established 
> in @Setup and can be used to ask the different data stores about their 
> partitioning strategy. I added a check for this to the 
> SplittableDoFnTest.SDFWithLifecycle test.
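
To illustrate the motivation in the description, here is a plain-Java sketch with no Beam types. Everything in it is hypothetical: StoreClient stands in for a data store connection that knows its partition count, and Fn mirrors the @Setup and @SplitRestriction methods by name only. The split depends on the client created in setup, so calling it first fails.

```java
import java.util.ArrayList;
import java.util.List;

/** Shows a split that needs a resource created during setup; not Beam code. */
final class PartitionAwareSplit {

  /** Hypothetical client for a data store that reports its own partition count. */
  static final class StoreClient {
    private final int partitions;

    StoreClient(int partitions) {
      this.partitions = partitions;
    }

    int partitionCount() {
      return partitions;
    }
  }

  /** Stand-in for an SDF; method names mirror the annotations in the issue. */
  static final class Fn {
    private StoreClient client; // stays null until setup() runs

    void setup() {
      client = new StoreClient(4); // assumed partition count, for illustration only
    }

    /** Splits [from, to) into one half-open range per partition reported by the store. */
    List<long[]> splitRestriction(long from, long to) {
      if (client == null) {
        throw new IllegalStateException("setup() has not run");
      }
      int n = client.partitionCount();
      long size = to - from;
      List<long[]> ranges = new ArrayList<>();
      for (int i = 0; i < n; i++) {
        long start = from + size * i / n;
        long end = from + size * (i + 1) / n;
        if (end > start) {
          ranges.add(new long[] {start, end});
        }
      }
      return ranges;
    }
  }

  /** Runs setup, then split, and renders the resulting ranges as text. */
  static String describe(long from, long to) {
    Fn fn = new Fn();
    fn.setup();
    StringBuilder sb = new StringBuilder();
    for (long[] r : fn.splitRestriction(from, to)) {
      sb.append("[").append(r[0]).append(", ").append(r[1]).append(") ");
    }
    return sb.toString().trim();
  }
}
```

With the assumed four partitions, describe(0, 100) yields "[0, 25) [25, 50) [50, 75) [75, 100)", while calling splitRestriction on a fresh Fn throws IllegalStateException.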



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-4016) Direct runner incorrect lifecycle, @SplitRestriction should execute after @Setup on SplittableDoFn

2018-04-18 Thread Eugene Kirpichov (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-4016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16443139#comment-16443139
 ] 

Eugene Kirpichov commented on BEAM-4016:


Can you clarify what you mean by calling it more times than expected? Do you 
mean that Setup gets called more than once on the particular in-memory Java 
instance of the SDF on which SplitRestriction is called?



[jira] [Commented] (BEAM-4016) Direct runner incorrect lifecycle, @SplitRestriction should execute after @Setup on SplittableDoFn

2018-04-16 Thread Ismaël Mejía (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-4016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16439429#comment-16439429
 ] 

Ismaël Mejía commented on BEAM-4016:


I tried that workaround, but it ends up calling the setup method more times 
than expected. Any other ideas? Can you take a look, or help me reassign this 
to someone who can? (I have the impression [~tgroh] is not available, and I 
cannot take a serious look into this for now.)



[jira] [Commented] (BEAM-4016) Direct runner incorrect lifecycle, @SplitRestriction should execute after @Setup on SplittableDoFn

2018-04-16 Thread Romain Manni-Bucau (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-4016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16439412#comment-16439412
 ] 

Romain Manni-Bucau commented on BEAM-4016:

PS: don't forget the teardown mapping for any instance (or use the same caching 
hack the direct runner has).
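
One way to read the teardown remark, as a sketch only: remember each instance that was set up, set it up at most once, and tear every remembered instance down later. The Lifecycle interface and Registry class below are hypothetical, this is not the caching the direct runner uses, and the registry is not thread-safe.

```java
import java.util.ArrayList;
import java.util.IdentityHashMap;
import java.util.Map;

/** Pairs every setup with exactly one teardown per instance; not Beam code. */
final class TeardownTracking {

  /** Hypothetical lifecycle surface of a function instance. */
  interface Lifecycle {
    void setup();

    void teardown();
  }

  /** Counts its own lifecycle calls so the pairing can be checked. */
  static final class Counting implements Lifecycle {
    int setups;
    int teardowns;

    @Override
    public void setup() {
      setups++;
    }

    @Override
    public void teardown() {
      teardowns++;
    }
  }

  /** Tracks live instances by identity, so equal-looking instances stay distinct. */
  static final class Registry {
    private final Map<Lifecycle, Boolean> live = new IdentityHashMap<>();

    /** Calls setup() only the first time a given instance is seen. */
    void ensureSetup(Lifecycle fn) {
      if (live.putIfAbsent(fn, Boolean.TRUE) == null) {
        fn.setup();
      }
    }

    /** Tears down every live instance, forgets it, and returns how many were torn down. */
    int teardownAll() {
      int n = 0;
      for (Lifecycle fn : new ArrayList<>(live.keySet())) {
        fn.teardown();
        n++;
      }
      live.clear();
      return n;
    }
  }

  /** Sets up instance a twice and b once, then reports "a.setups,a.teardowns,b.setups,b.teardowns,torn". */
  static String demo() {
    Registry registry = new Registry();
    Counting a = new Counting();
    Counting b = new Counting();
    registry.ensureSetup(a);
    registry.ensureSetup(a); // second request for the same instance is a no-op
    registry.ensureSetup(b);
    int torn = registry.teardownAll();
    return a.setups + "," + a.teardowns + "," + b.setups + "," + b.teardowns + "," + torn;
  }
}
```

Here demo() returns "1,1,1,1,2": each instance sees one setup and one teardown even though setup was requested twice for the first one.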



[jira] [Commented] (BEAM-4016) Direct runner incorrect lifecycle, @SplitRestriction should execute after @Setup on SplittableDoFn

2018-04-13 Thread Eugene Kirpichov (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-4016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16437961#comment-16437961
 ] 

Eugene Kirpichov commented on BEAM-4016:


Yeah, it's the desired order. The fix is to add a call to invoker.invokeSetup() 
at this line:
https://github.com/apache/beam/blob/6107c314e7a1af0d29b3cd865c0dee1013c2261c/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/SplittableParDo.java#L357
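
The ordering this fix aims for can be sketched with a toy model. The Invoker interface below is a hypothetical two-method stand-in written for this example, not Beam's invoker class, and setupThenSplit is an invented helper rather than the code at the linked line. It only shows the sequence: setup first, then split.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of calling setup before split; none of these types are Beam classes. */
final class SetupBeforeSplit {

  /** Hypothetical stand-in for the invoker mentioned above, reduced to two calls. */
  interface Invoker {
    void invokeSetup();

    List<String> invokeSplitRestriction(String element);
  }

  /** Records the order in which its methods are called. */
  static final class RecordingInvoker implements Invoker {
    final List<String> calls = new ArrayList<>();

    @Override
    public void invokeSetup() {
      calls.add("Setup");
    }

    @Override
    public List<String> invokeSplitRestriction(String element) {
      calls.add("SplitRestriction " + element);
      List<String> parts = new ArrayList<>();
      parts.add(element + " - [0, 1)"); // a single unsplit range, as in the test trace
      return parts;
    }
  }

  /** Runner-side step: set the function up, then ask it to split the restriction. */
  static List<String> setupThenSplit(Invoker invoker, String element) {
    invoker.invokeSetup();
    return invoker.invokeSplitRestriction(element);
  }

  /** Runs the sequence on a fresh recording invoker and returns the observed call order. */
  static String callOrder(String element) {
    RecordingInvoker invoker = new RecordingInvoker();
    setupThenSplit(invoker, element);
    return String.join(" > ", invoker.calls);
  }
}
```

callOrder("a") returns "Setup > SplitRestriction a". If the same invoker were reused for several elements, this naive version would call setup once per element, so a real change would presumably need to avoid repeating it.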



[jira] [Commented] (BEAM-4016) Direct runner incorrect lifecycle, @SplitRestriction should execute after @Setup on SplittableDoFn

2018-04-12 Thread Ismaël Mejía (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-4016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16435262#comment-16435262
 ] 

Ismaël Mejía commented on BEAM-4016:


Since it seems you are back, [~jkff], can you confirm that this is the desired 
order, so I can open the corresponding issues for Spark and Dataflow too?



[jira] [Commented] (BEAM-4016) Direct runner incorrect lifecycle, @SplitRestriction should execute after @Setup on SplittableDoFn

2018-04-05 Thread Ismaël Mejía (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-4016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16426988#comment-16426988
 ] 

Ismaël Mejía commented on BEAM-4016:


[~jkff] Can you confirm if the lifecycle is as I mention?

> Direct runner incorrect lifecycle, @SplitRestriction should execute after 
> @Setup on SplittableDoFn
> --
>
> Key: BEAM-4016
> URL: https://issues.apache.org/jira/browse/BEAM-4016
> Project: Beam
> Issue Type: Bug
> Components: runner-direct
> Affects Versions: 2.4.0
> Reporter: Ismaël Mejía
> Assignee: Thomas Groh
> Priority: Major
> Attachments: sdf-splitrestriction-lifeycle-test.patch
>
>
> SplitRestriction is the method where we can split an SDF in advance. It makes 
> sense to execute it after the @Setup method, given that connections are 
> usually established in @Setup and can be used to ask the different data 
> stores about their partitioning strategy. I added a check for this to the 
> SplittableDoFnTest.SDFWithLifecycle test.


