[jira] [Commented] (CRUNCH-668) From.avroFile do not support globbing patterns (GenericData based overloads)
[ https://issues.apache.org/jira/browse/CRUNCH-668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16418653#comment-16418653 ]

Clément MATHIEU commented on CRUNCH-668:

Patch updated. It restores the ability to pass a single file as the path. The new logic mimics what {{SourceTargetHelper#getPathSize}} does. My understanding is that this is what Crunch aims to support, but a careful review is welcome as it seems easy to get wrong.

I also spotted a few other places where globs are not supported. For example, passing a glob to a Source and materializing the resulting PCollection fails, while adding an intermediate identity DoFn makes it work. Unfortunately, I don't have time to fix them as they are not on my critical path.

> From.avroFile do not support globbing patterns (GenericData based overloads)
>
> Key: CRUNCH-668
> URL: https://issues.apache.org/jira/browse/CRUNCH-668
> Project: Crunch
> Issue Type: Improvement
> Components: Core
> Affects Versions: 0.15.0
> Reporter: Clément MATHIEU
> Assignee: Josh Wills
> Priority: Major
> Attachments:
>   0001-CRUNCH-668-Support-globbing-patterns-in-From-avroFil-v2.patch,
>   0001-CRUNCH-668-Support-globbing-patterns-in-From-avroFil.patch
>
> The GenericData based overloads of {{From.avroFile}} throw a RuntimeException
> when a globbing pattern is provided. I see no reason not to support globbing
> patterns here, as they work fine with {{textFile}} and the SpecificData based
> overloads.
> The issue is that the code extracting the Avro schema from the first file uses
> {{listStatus}} rather than {{globStatus}}.

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (CRUNCH-668) From.avroFile do not support globbing patterns (GenericData based overloads)
[ https://issues.apache.org/jira/browse/CRUNCH-668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16418544#comment-16418544 ]

Clément MATHIEU commented on CRUNCH-668:

Hi Josh. Good catch, I forgot to run the integration tests. I will update the patch.
[jira] [Commented] (CRUNCH-668) From.avroFile do not support globbing patterns (GenericData based overloads)
[ https://issues.apache.org/jira/browse/CRUNCH-668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16418475#comment-16418475 ]

Josh Wills commented on CRUNCH-668:

Hey [~clem...@unportant.info]! Thanks for this; I applied it to my master branch and got a few failures in tests in crunch-core and was wondering if you saw the same thing:

Tests in error:
  AvroParquetFileSourceTargetIT.testCustomReadSchemaGeneric_FieldSuperset:258 » Runtime
  AvroParquetFileSourceTargetIT.testCustomReadSchemaWithProjection:297 » Runtime
  AvroParquetFileSourceTargetIT.testCustomReadSchema_FieldSubset:221 » Runtime E...
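
The root cause named in the issue description ({{listStatus}} being used where {{globStatus}} was needed) can be illustrated without a Hadoop cluster. The sketch below is only an analogy built on {{java.nio.file}}: it is not Crunch's or Hadoop's code, and the names {{GlobVsList}}, {{listLiteral}} and {{expandGlob}} are hypothetical. It assumes a typical Linux file system, where a path containing {{*}} is a legal but non-existent literal name.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration class; not part of Crunch or Hadoop.
final class GlobVsList {

    // Rough analogue of a listStatus-style call: the argument is treated as
    // the literal name of a directory, so a glob pattern is never expanded.
    static List<Path> listLiteral(Path dir) throws IOException {
        List<Path> out = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir)) {
            for (Path p : stream) {
                out.add(p);
            }
        }
        Collections.sort(out);
        return out;
    }

    // Rough analogue of a globStatus-style call: the pattern is matched
    // against the entries of the parent directory.
    static List<Path> expandGlob(Path parent, String pattern) throws IOException {
        List<Path> out = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(parent, pattern)) {
            for (Path p : stream) {
                out.add(p);
            }
        }
        Collections.sort(out);
        return out;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("glob-demo");
        Files.createFile(dir.resolve("part-0.avro"));
        Files.createFile(dir.resolve("part-1.avro"));
        Files.createFile(dir.resolve("_SUCCESS"));

        // Pattern expansion finds the two .avro files.
        System.out.println("glob expansion: " + expandGlob(dir, "*.avro"));

        // Literal listing looks for a directory actually named "*.avro".
        try {
            listLiteral(dir.resolve("*.avro"));
        } catch (IOException e) {
            System.out.println("literal listing failed: " + e);
        }
    }
}
```

A schema-sniffing helper that needs "the first file" of an input would take the first element of the expanded list. If the literal listing is used instead, the lookup fails before any file is opened, which is in the same spirit as the RuntimeException reported above.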