[
https://issues.apache.org/jira/browse/PIG-252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Daniel Dai updated PIG-252:
---------------------------
Attachment: localglobbing.patch
Pig will use Hadoop's default mode as its local-mode execution engine, so there
should be no difference in globbing support between local mode and mapreduce
mode. Pig passes the unfiltered glob string to Hadoop
("org.apache.hadoop.fs.FileSystem.globPaths"), so once
[HADOOP-3498|https://issues.apache.org/jira/browse/HADOOP-3498] is fixed, Pig
should benefit from it automatically. The only remaining issue is that there is
still some file-existence checking code specific to local mode, which we need
to remove. I have attached a patch for reference (it targets branches/types).
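
To illustrate the multiple-path case from the issue description, here is a
minimal sketch of splitting a load string on the commas that separate paths
while leaving the commas inside {...} groups alone. It is not the attached
patch and not how Pig or Hadoop implement this; the function name
split_load_paths is hypothetical, and it assumes each resulting pattern would
then be handed to the Hadoop glob call unchanged.

```python
def split_load_paths(spec):
    """Split a load string on commas that are outside {...} groups.

    Hypothetical helper for illustration only; not part of Pig or Hadoop.
    """
    paths = []
    current = []
    depth = 0  # current nesting level of curly braces
    for ch in spec:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth = max(depth - 1, 0)
        if ch == "," and depth == 0:
            # A top-level comma ends the current path.
            paths.append("".join(current))
            current = []
        else:
            current.append(ch)
    paths.append("".join(current))
    # Drop empty entries and surrounding whitespace.
    return [p.strip() for p in paths if p.strip()]


if __name__ == "__main__":
    for pattern in split_load_paths("2008/05/{26,27,28,29,30,31},2008/06/{1,2}"):
        print(pattern)
```

Each pattern printed above is still an unexpanded glob, so the expansion
itself would remain the file system's job.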
> Allow multiple paths in the load statement
> ------------------------------------------
>
> Key: PIG-252
> URL: https://issues.apache.org/jira/browse/PIG-252
> Project: Pig
> Issue Type: Improvement
> Reporter: Olga Natkovich
> Attachments: localglobbing.patch
>
>
> From Tom White:
> I'm having a problem loading data from multiple paths in Pig. What I'm trying
> to do is to load data from a range of dates, so I would like to specify an
> input of two globbed paths:
> x = LOAD '2008/05/{26,27,28,29,30,31},2008/06/{1,2}'
> Pig doesn't seem to like this though as it's trying to interpret it as a
> single path. The best I can do is to use UNION:
> x1 = LOAD '2008/05/{26,27,28,29,30,31}';
> x2 = LOAD '2008/06/{1,2}';
> x = UNION x1, x2;
> The downside to this is that I want to parameterize my paths, and having
> a separate script for each number of paths in the input is cumbersome.
> Is there a better way of doing this? Are there any plans to support multiple
> paths, and/or PathFilters?