Let's say I have my input data from the past 12 months organized into subdirs
by date:
/data/2012-06-10
/data/2012-06-11
...
/data/2013-06-09
Now say that I want to run a Pig script to process data from a range of
dates within the last 12 months, say 2012-11-07 through 2013-05-26. A regex
(or glob) that matches exactly this date range would get quite complicated.
Is there a way that I can get my Pig script to load data from such a range
without a regex?
I could load all the data in /data/*, and then FILTER by the date field in each
record, but that reads every directory, which is not desirable if the range of
dates is small compared to the entire dataset.
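
One possible workaround, sketched here under assumptions rather than as a confirmed solution, is to generate the list of directories outside of Pig and pass it in as a single path expression. The sketch assumes that LOAD accepts Hadoop-style brace globs of the form /data/{a,b,c}; the function name date_range_glob is hypothetical.

```python
from datetime import date, timedelta


def date_range_glob(base, start, end):
    """Build a brace glob covering every day from start to end, inclusive.

    Hypothetical helper: assumes one subdirectory per day named YYYY-MM-DD
    under `base`, and that the loader understands {a,b,c} alternation.
    """
    if end < start:
        raise ValueError("end date must not precede start date")
    days = (end - start).days + 1
    # isoformat() yields YYYY-MM-DD, matching the directory names above.
    names = [(start + timedelta(days=i)).isoformat() for i in range(days)]
    return "%s/{%s}" % (base.rstrip("/"), ",".join(names))


if __name__ == "__main__":
    # Emit one path expression for 2012-11-07 through 2013-05-26.
    print(date_range_glob("/data", date(2012, 11, 7), date(2013, 5, 26)))
```

The printed string could then be handed to the script through parameter substitution, for example with pig -param input="..." and LOAD '$input' inside the script (the parameter name input is an assumption). Whether every listed directory must actually exist for the load to succeed is something I have not verified.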