Thanks Ankur, I just opened a JIRA:
https://issues.apache.org/jira/browse/PIG-1174

On Thu, Dec 24, 2009 at 2:18 AM, Ankur C. Goel <gan...@yahoo-inc.com> wrote:

>  (Coming in late on this)
>
> Bill,
>      Please feel free to open a JIRA to report the issue. The problem is
> that by the time MultiStorage gets into action, Pig has already created the
> output file based on the path (after INTO) and assumes that the UDF will
> start writing to it. Ideally the decision to create files or directories
> should be left completely to the custom STORE UDF. Also, MultiStorage should
> take care of getting the output path from the UDF context so that the user
> does not need to pass it again. Lastly, passing a field name instead of a
> field number to use as the dynamic key would be clearer in the script.
>
> I am hoping the changes would be a lot easier to do after the current
> Load/Store redesign is implemented.
>
> -...@nkur
>
>
> On 12/16/09 10:24 PM, "Bill Graham" <billgra...@gmail.com> wrote:
>
> Thanks Dmitriy, this is exactly what I need.
>
> FYI, I did run into one bug. It occurs when making a request like this, as
> documented in the Javadocs:
>
> STORE A INTO '/my/home/output' USING MultiStorage('/my/home/output','0',
> 'none', '\t');
>
> Pig would create a file '/my/home/output' and then an exception would be
> thrown when MultiStorage tried to make a directory under '/my/home/output'.
> The workaround that worked for me was to specify a dummy location as the
> first path instead, like so:
>
> STORE A INTO '/my/home/output/temp' USING
> MultiStorage('/my/home/output','0', 'none', '\t');
>
>
> On Tue, Dec 15, 2009 at 1:06 PM, Dmitriy Ryaboy <dvrya...@gmail.com>
> wrote:
>
> > Bill,
> > A custom storefunc should do the trick. See
> > https://issues.apache.org/jira/browse/PIG-958  (aka
> > piggybank.storage.MultiStorage) for a jumping-off point.
> >
> > -D
> >
> > On Tue, Dec 15, 2009 at 1:59 PM, Bill Graham <billgra...@gmail.com>
> wrote:
> > > Hi,
> > >
> > > I'm pretty sure the answer to my question is no, but I have to ask. Is it
> > > possible within Pig to store different groups of data into different
> > > output files where the grouping is dynamic (i.e. not known ahead of
> > > time)? Here's what I'm trying to do...
> > >
> > > I've got a script that reads log files of URLs and generates counts for a
> > > given time period. The URLs might have a 'tag' query string parameter,
> > > though, and in that case I want the most popular URLs for each tag
> > > written to its own file.
> > >
> > > My data looks like this and is ordered by tag asc, count desc:
> > >
> > > [tag] [timeinterval] [url] [count]
> > >
> > > I need to do something like so:
> > >
> > > for each tag group found
> > >  store all records in file foo_[tag].txt
> > >
> > > I ultimately need these files on local disk, and I'm looking for a better
> > > way to do so than generating a file of N unique tags in HDFS, reading it
> > > from Java, submitting N jobs with the tag name substituted into a script
> > > file, and then making N copyToLocal calls.
> > >
> > > At least two possible solutions come to mind, but I'm curious whether
> > > there's another that I'm overlooking:
> > > 1. In Java, submit dynamically generated Pig commands to an instance of
> > > PigServer. I'd still need a unique tag file for this case.
> > > 2. Maybe with a custom store function?
> > >
> > > thanks,
> > > Bill
> > >
> >
>
>
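For anyone reading this thread later, here is a rough sketch of the "for each tag group, store all records in foo_[tag].txt" idea from the original question. It is plain Python against the local filesystem, not Pig and not how MultiStorage is implemented. The function name split_by_key, the SAMPLE rows, and the example.com URLs are made up for illustration, and it assumes tag values are safe to use in file names.

```python
import os
import tempfile
from contextlib import ExitStack


def split_by_key(records, out_dir, key_index=0, delimiter="\t"):
    """Write each record to out_dir/foo_<key>.txt, keyed on record[key_index].

    Returns a dict mapping each key to the path of the file written for it.
    Assumes key values contain no path separators (hypothetical constraint).
    """
    os.makedirs(out_dir, exist_ok=True)
    handles = {}
    paths = {}
    # ExitStack closes every per-key file on exit, even if a write fails.
    with ExitStack() as stack:
        for record in records:
            key = str(record[key_index])
            if key not in handles:
                # First record for this key: open its output file lazily.
                paths[key] = os.path.join(out_dir, f"foo_{key}.txt")
                handles[key] = stack.enter_context(open(paths[key], "w"))
            line = delimiter.join(str(field) for field in record)
            handles[key].write(line + "\n")
    return paths


# Made-up rows in the [tag] [timeinterval] [url] [count] layout from the thread.
SAMPLE = [
    ("news", "2009-12-15", "http://example.com/a", 42),
    ("news", "2009-12-15", "http://example.com/b", 17),
    ("sports", "2009-12-15", "http://example.com/c", 8),
]

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for tag, path in sorted(split_by_key(SAMPLE, tmp).items()):
            print(tag, os.path.basename(path))
```

Because each file is opened the first time its key appears, the set of tags does not need to be known ahead of time, which is the property the question was after. Records keep their input order within each file, so input already sorted by count stays sorted.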
