[jira] Updated: (PIG-252) Allow multiple paths in the load statement

Daniel Dai (JIRA) Tue, 01 Jul 2008 11:45:36 -0700

     [ 
https://issues.apache.org/jira/browse/PIG-252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Daniel Dai updated PIG-252:
---------------------------

    Attachment: localglobbing.patch

Pig will use Hadoop's default mode as its local mode execution engine, so 
supporting globbing should be no different in local mode than in mapreduce 
mode. Pig will pass the unfiltered globbing string to Hadoop 
("org.apache.hadoop.fs.FileSystem.globPaths"), so once 
[HADOOP-3498|https://issues.apache.org/jira/browse/HADOOP-3498] is fixed, Pig 
should automatically benefit from it. The only remaining issue is that there 
is still some file existence checking code specific to local mode, which we 
need to remove. I have attached a patch for reference (it targets branches/types).

> Allow multiple paths in the load statement
> ------------------------------------------
>
>                 Key: PIG-252
>                 URL: https://issues.apache.org/jira/browse/PIG-252
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Olga Natkovich
>         Attachments: localglobbing.patch
>
>
> From Tom White:
> I'm having a problem loading data from multiple paths in Pig. What I'm trying 
> to do is to load data from a range of dates, so I would like to specify an 
> input of two globbed paths:
> x = LOAD '2008/05/{26,27,28,29,30,31},2008/06/{1,2}'
> Pig doesn't seem to like this though as it's trying to interpret it as a 
> single path. The best I can do is to use UNION:
> x1 = LOAD '2008/05/{26,27,28,29,30,31}'
> x2 = LOAD '2008/06/{1,2}'
> x = UNION x1, x2
> The downside to this is that I want to parameterize my paths, and having a 
> separate script for each number of paths in the input is cumbersome.
> Is there a better way of doing this? Are there any plans to support multiple 
> paths, and/or PathFilters?
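
To make the request in the description concrete: one way a load string with 
several globbed paths could be handled is to split it on commas that fall 
outside any {...} group, then hand each piece to the glob expansion separately. 
The following is only a minimal sketch in Python under that assumption; the 
helper name split_load_paths is hypothetical and is not part of Pig or Hadoop.

```python
def split_load_paths(spec):
    """Split a comma-separated list of paths, keeping commas inside {...} intact.

    Hypothetical helper for illustration only; not Pig's or Hadoop's code.
    """
    paths = []      # completed path strings
    current = []    # characters of the path being built
    depth = 0       # nesting depth of {...} groups

    for ch in spec:
        if ch == "{":
            depth += 1
        elif ch == "}" and depth > 0:
            depth -= 1

        if ch == "," and depth == 0:
            # A top-level comma separates two paths.
            paths.append("".join(current))
            current = []
        else:
            current.append(ch)

    paths.append("".join(current))
    # Drop empty entries caused by leading, trailing, or doubled commas.
    return [p for p in paths if p]


if __name__ == "__main__":
    for path in split_load_paths("2008/05/{26,27,28,29,30,31},2008/06/{1,2}"):
        print(path)
```

With a split like this, the single-string form from the description would 
yield the same two globs that the UNION workaround loads separately.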

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
