[jira] [Commented] (BEAM-741) Values transform does not use the correct output coder when values is an Iterable
[ https://issues.apache.org/jira/browse/BEAM-741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15581093#comment-15581093 ]

Kenneth Knowles commented on BEAM-741:

Great investigation. I actually think the SDK should also always prefer the transform's coder. But, also, for input of type {{KV}}, the expected behavior is for the registry to associate the type {{V}} with the value coder and thus in this context provide exactly the same coder. So I'm going to reopen and see about both of these.

> Values transform does not use the correct output coder when values is an Iterable
>
> Key: BEAM-741
> URL: https://issues.apache.org/jira/browse/BEAM-741
> Project: Beam
> Issue Type: Bug
> Components: sdk-java-core
> Reporter: Andrew Martin
> Assignee: Davor Bonaci
> Fix For: Not applicable

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (BEAM-741) Values transform does not use the correct output coder when values is an Iterable
[ https://issues.apache.org/jira/browse/BEAM-741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15575937#comment-15575937 ]

Andrew Martin commented on BEAM-741:

[~kenn] I dug into this more today and found the specific reason for this failure: the inference process in Beam first checks the coder registry, and if it doesn't find a coder there, it tries a fallback coder provider. Only if that also fails does it try to obtain the coder from the producing transform. In Scio we set our own fallback coder provider, so Beam never ends up using the output coder from the producing transform. So, in Scio we probably need to prefer the default output coder of the producing transform, and use our fallback only as a last resort. I will close this because it is an issue in Scio, not in Beam. Thanks!
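The lookup order described in this comment can be sketched in plain Java. This is only a simplified model under stated assumptions, not Beam's or Scio's actual implementation: the class `CoderLookupSketch` and its methods are hypothetical, coders are represented by plain strings, and the position of the registry in the "transform first" variant is a guess, since the comment only says the fallback should come last.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.function.Supplier;

// Simplified model of a coder lookup chain. All names are hypothetical and
// coders are plain strings rather than Beam Coder objects.
final class CoderLookupSketch {

  // Returns the first coder produced by the given sources, tried in order.
  static Optional<String> firstAvailable(List<Supplier<Optional<String>>> sources) {
    for (Supplier<Optional<String>> source : sources) {
      Optional<String> coder = source.get();
      if (coder.isPresent()) {
        return coder;
      }
    }
    return Optional.empty();
  }

  // Order described in the comment: registry, then fallback provider,
  // then the producing transform's default output coder.
  static Optional<String> describedOrder(
      Supplier<Optional<String>> registry,
      Supplier<Optional<String>> fallbackProvider,
      Supplier<Optional<String>> producingTransform) {
    return firstAvailable(Arrays.asList(registry, fallbackProvider, producingTransform));
  }

  // Order proposed for Scio: producing transform preferred, fallback last.
  // Placing the registry second is an assumption made for this sketch.
  static Optional<String> transformFirstOrder(
      Supplier<Optional<String>> registry,
      Supplier<Optional<String>> fallbackProvider,
      Supplier<Optional<String>> producingTransform) {
    return firstAvailable(Arrays.asList(producingTransform, registry, fallbackProvider));
  }
}
```

With an empty registry and a fallback provider that always answers, `describedOrder` never reaches the producing transform, which matches the behavior reported above; `transformFirstOrder` picks the transform's coder whenever one exists.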
[jira] [Commented] (BEAM-741) Values transform does not use the correct output coder when values is an Iterable
[ https://issues.apache.org/jira/browse/BEAM-741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15572443#comment-15572443 ]

Andrew Martin commented on BEAM-741:

[~kenn] After investigating further, it seems that the coder for the output of the `Values` transform is not inferred correctly because of what appears to be a loss of type information in the type descriptor: the output of the `Values` transform should be of type Iterable, but the raw type is just Object during the inference process, so the default coder provider (which we set in our own code) is used. I'm part of a team at Spotify developing Scio (https://github.com/spotify/scio). We have a work-in-progress branch for porting to Beam, and some tests in that branch fail. I'd like to write a failing test in the pure Beam API so you can take a look. That being said, is it possible to invoke the @RunnableOnService tests locally using the direct runner?
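For readers unfamiliar with how a raw type can end up as Object, the following plain-Java sketch shows the general mechanism: a method whose return type is an unresolved type variable has Object as its erased return type. The classes `ErasureSketch` and `ValuesLike` are hypothetical stand-ins, not Beam classes, and the thread does not establish that this is exactly what happens inside Beam's type descriptor.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Type;

// Hypothetical illustration of type erasure; none of this is Beam code.
final class ErasureSketch {

  // A generic stand-in whose output type is the type variable V.
  static class ValuesLike<V> {
    public V output() {
      return null;
    }
  }

  // Looks up output() reflectively, wrapping the checked exception.
  private static Method outputMethod() {
    try {
      return ValuesLike.class.getMethod("output");
    } catch (NoSuchMethodException e) {
      throw new IllegalStateException(e);
    }
  }

  // Erased (raw) return type of output(): Object for an unbounded V.
  static Class<?> rawOutputType() {
    return outputMethod().getReturnType();
  }

  // Generic return type of output(): the unresolved type variable V.
  static Type genericOutputType() {
    return outputMethod().getGenericReturnType();
  }
}
```

Unless something binds `V` to a concrete type such as an Iterable, a lookup keyed on the raw type only sees Object, which is the kind of situation in which a catch-all default coder provider would be consulted.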
[jira] [Commented] (BEAM-741) Values transform does not use the correct output coder when values is an Iterable
[ https://issues.apache.org/jira/browse/BEAM-741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15566814#comment-15566814 ]

Kenneth Knowles commented on BEAM-741:

Actually, what you say makes me want to investigate further. The registry and the coder inference process are expected to propagate coders in a case like that. I'll leave it up to you whether you want to pursue it, but if you do feel like offering a snippet (or a pull request with a failing test :-) we'd definitely look into it.
[jira] [Commented] (BEAM-741) Values transform does not use the correct output coder when values is an Iterable
[ https://issues.apache.org/jira/browse/BEAM-741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1554#comment-1554 ]

Andrew Martin commented on BEAM-741:

I see, so explicitly setting the coder on the output of Values is probably a duct-tape solution for a more fundamental problem on our side, perhaps something to do with the coder registry (we have some custom coders, so the problem is likely there). I will close this.
[jira] [Commented] (BEAM-741) Values transform does not use the correct output coder when values is an Iterable
[ https://issues.apache.org/jira/browse/BEAM-741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15566259#comment-15566259 ]

Andrew Martin commented on BEAM-741:

I found this issue when using the Direct Runner for a test, which does dynamic re-sharding as part of the Write transform. It does a GroupByKey -> Values -> Flatten, and Flatten failed because its input coder was not an 'IterableLikeCoder', which it should be after calling Values on the output of a GBK.
[jira] [Commented] (BEAM-741) Values transform does not use the correct output coder when values is an Iterable
[ https://issues.apache.org/jira/browse/BEAM-741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15566252#comment-15566252 ]

Kenneth Knowles commented on BEAM-741:

Can you provide a reproduction? This is fairly surprising, since the {{Values}} transform does not manipulate the coder. But it does infer it based on static types, so perhaps this is causing an unexpected coder to be inferred?