[jira] [Commented] (CRUNCH-668) From.avroFile do not support globbing patterns (GenericData based overloads)

2018-03-29 Thread JIRA

[ 
https://issues.apache.org/jira/browse/CRUNCH-668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16418653#comment-16418653
 ] 

Clément MATHIEU commented on CRUNCH-668:


Patch updated.

It restores the ability to pass a single file as the path. The new logic 
mimics what {{SourceTargetHelper#getPathSize}} does. My understanding is that 
this is the behavior Crunch aims to support, but a careful review is welcome, 
as it seems easy to get wrong.

I also spotted a few other places where globs are not supported. For example, 
passing a glob to a Source and materializing the resulting PCollection fails, 
whereas adding an intermediate identity DoFn makes it work. Unfortunately, I 
don't have time to fix these, as they are not on my critical path.

> From.avroFile do not support globbing patterns (GenericData based overloads)
> 
>
> Key: CRUNCH-668
> URL: https://issues.apache.org/jira/browse/CRUNCH-668
> Project: Crunch
> Issue Type: Improvement
> Components: Core
> Affects Versions: 0.15.0
> Reporter: Clément MATHIEU
> Assignee: Josh Wills
> Priority: Major
> Attachments:
>   0001-CRUNCH-668-Support-globbing-patterns-in-From-avroFil-v2.patch,
>   0001-CRUNCH-668-Support-globbing-patterns-in-From-avroFil.patch
>
>
> The GenericData based overloads of {{From.avroFile}} throw a RuntimeException 
> when a globbing pattern is provided. I see no reason not to support globbing 
> patterns here, as they work fine with {{textFile}} and the SpecificData based 
> overloads.
> The issue is that the code extracting the Avro schema from the first file 
> uses {{listStatus}} rather than {{globStatus}}.
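
The difference described in the issue can be illustrated without a Hadoop 
cluster. The sketch below is only an analogy built on java.nio, not the Crunch 
or Hadoop code: the class {{GlobVsList}} and its methods {{listLiteral}} and 
{{expandGlob}} are hypothetical names made up for this example. It assumes 
that a listStatus-style call treats the pattern as a literal path, while a 
globStatus-style call expands it against the directory entries.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration class; not part of Crunch or Hadoop.
final class GlobVsList {

    // Analogue of a listStatus-style call: the name is taken literally,
    // so "part-*.avro" is looked up as a directory with that exact name.
    static List<Path> listLiteral(Path parent, String name) throws IOException {
        return collect(Files.newDirectoryStream(parent.resolve(name)));
    }

    // Analogue of a globStatus-style call: the pattern is matched against
    // the file names of the entries in the parent directory.
    static List<Path> expandGlob(Path parent, String pattern) throws IOException {
        return collect(Files.newDirectoryStream(parent, pattern));
    }

    // Drains the stream into a sorted list and closes it.
    private static List<Path> collect(DirectoryStream<Path> stream) throws IOException {
        List<Path> out = new ArrayList<>();
        try (DirectoryStream<Path> ds = stream) {
            for (Path p : ds) {
                out.add(p);
            }
        }
        Collections.sort(out);
        return out;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("crunch-668-demo");
        Files.createFile(dir.resolve("part-0.avro"));
        Files.createFile(dir.resolve("part-1.avro"));
        Files.createFile(dir.resolve("_SUCCESS"));

        try {
            listLiteral(dir, "part-*.avro");
        } catch (IOException e) {
            // No directory is literally named "part-*.avro".
            System.out.println("literal listing failed: " + e.getClass().getSimpleName());
        }

        List<Path> matches = expandGlob(dir, "part-*.avro");
        System.out.println("glob matched " + matches.size() + " files; first = "
                + matches.get(0).getFileName());
    }
}
```

A schema-extraction step that needs "the first file" would take the first 
element of the expanded list rather than listing the pattern as if it were a 
directory.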



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (CRUNCH-668) From.avroFile do not support globbing patterns (GenericData based overloads)

2018-03-29 Thread JIRA

[ 
https://issues.apache.org/jira/browse/CRUNCH-668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16418544#comment-16418544
 ] 

Clément MATHIEU commented on CRUNCH-668:


Hi Josh. Good catch, I forgot to run the integration tests. I will update the 
patch.



[jira] [Commented] (CRUNCH-668) From.avroFile do not support globbing patterns (GenericData based overloads)

2018-03-28 Thread Josh Wills (JIRA)

[ 
https://issues.apache.org/jira/browse/CRUNCH-668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16418475#comment-16418475
 ] 

Josh Wills commented on CRUNCH-668:
---

Hey [~clem...@unportant.info]! Thanks for this. I applied it to my master 
branch and got a few test failures in crunch-core. Did you see the same 
thing?

 

Tests in error:
  AvroParquetFileSourceTargetIT.testCustomReadSchemaGeneric_FieldSuperset:258 » Runtime
  AvroParquetFileSourceTargetIT.testCustomReadSchemaWithProjection:297 » Runtime
  AvroParquetFileSourceTargetIT.testCustomReadSchema_FieldSubset:221 » Runtime E...
