Re: load union store

Dmitriy Ryaboy Thu, 27 Jan 2011 17:58:03 -0800

Kris,
As logs accumulate over time, the union will keep getting slower, since every
run has to read all of the existing data off disk and write it back to disk.
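
To put a rough number on that, assume (purely for illustration) that every run adds the same amount b of cleaned data. Run k of the load/union/store approach rewrites all k batches accumulated so far, while storing each batch in its own directory writes only the new batch:

```latex
W_{\text{union}}(n) = \sum_{k=1}^{n} k\,b = \frac{n(n+1)}{2}\,b,
\qquad
W_{\text{per-dir}}(n) = n\,b,
\qquad
\frac{W_{\text{union}}(n)}{W_{\text{per-dir}}(n)} = \frac{n+1}{2}.
```

With hourly runs over 30 days (n = 720), the union approach writes (720 + 1)/2 = 360.5 times as much data, and its reads grow the same way, since run k must first read back the (k-1)b already stored.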


Why not just have a hierarchy in your cleaned log directory? You can do
something like:

%declare newdir `date +%s`

store cleanfile into 'cleaned_files/$newdir/';


Then, to load all the logs, you can just load 'cleaned_files'.

You can also format the date output differently (e.g. `date +%Y/%m/%d/%H`)
and wind up with your cleaned files nicely organized by year/month/day/hour/...
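
A minimal sketch of the year/month/day/hour variant, assuming a hypothetical script `clean_logs.pig` whose STORE writes to 'cleaned_files/$newdir': rather than running `date` from inside the script, a small Python driver can compute the path and hand it to Pig with the `-param name=value` command-line option, so the script only refers to `$newdir`. The helper names `batch_dir` and `build_pig_command` are illustrative, not part of Pig.

```python
from datetime import datetime, timezone
from typing import List


def batch_dir(when: datetime) -> str:
    """Return a year/month/day/hour path fragment such as '2011/01/27/17'."""
    # Same layout that `date +%Y/%m/%d/%H` would produce in a shell.
    return when.strftime("%Y/%m/%d/%H")


def build_pig_command(script: str, when: datetime) -> List[str]:
    """Build an argv list that hands the batch directory to Pig as $newdir."""
    # `pig -param name=value script.pig` substitutes $name inside the script.
    return ["pig", "-param", f"newdir={batch_dir(when)}", script]


if __name__ == "__main__":
    # clean_logs.pig is a hypothetical script name; pass the list to
    # subprocess.run(cmd, check=True) to actually launch Pig.
    cmd = build_pig_command("clean_logs.pig", datetime.now(timezone.utc))
    print(" ".join(cmd))
```

Passing an explicit `datetime` instead of `now()` lets you backfill a particular hour. Note that STORE fails if the output directory already exists, so two runs within the same hour would collide with this layout, whereas `date +%s` gives every run its own directory.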

D

On Thu, Jan 27, 2011 at 4:40 PM, Kris Coward <[email protected]> wrote:

> Hi all,
>
> I'm writing a bit of code to grab some logfiles, parse them, and run some
> sanity checks on them (before subjecting them to further analysis).
> Naturally, logfiles being logfiles, they accumulate, and I was wondering
> how efficiently pig would handle a request to add recently accumulated
> log data to a bit of logfile that's already been started.
>
> In particular, two approaches that I'm contemplating are:
>
> raw = LOAD 'logfile' ...
> -- snipped parsing/cleaning steps producing a relation with alias "cleanfile"
> oldclean = LOAD 'existing_log';
> newclean = UNION oldclean, cleanfile;
> STORE newclean INTO 'tmp_log';
> rm existing_log;
> mv tmp_log existing_log;
>
> ...ALTERNATELY...
>
> raw = LOAD 'logfile' ...
> -- snipped parsing/cleaning steps producing a relation with alias "cleanfile"
> STORE cleanfile INTO 'tmp_log';
>
> followed by renumbering all the part files in tmp_log and copying them
> to existing_log.
>
> Is pig clever enough to handle the first set of instructions reasonably
> efficiently? And if not, are there any gotchas I'd have to watch out for
> with the second approach (e.g. a catalogue file that'd have to be edited
> when the new parts are added)?
>
> Thanks,
> Kris
>
> --
> Kris Coward                                     http://unripe.melon.org/
> GPG Fingerprint: 2BF3 957D 310A FEEC 4733  830E 21A4 05C7 1FEB 12B3
>
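
For the second approach, the main thing to get right is that the new part files get non-colliding names once they sit next to the old ones. The Python sketch below only plans the renames; it assumes Hadoop-style names such as part-00000 or part-m-00000 (optionally with a suffix like .bz2), and a plain PigStorage-style directory with no catalogue file. Hadoop's FileInputFormat ignores names starting with `_` or `.` (such as `_logs`), so those are skipped rather than moved. Each (source, target) pair would then be applied with something like `hadoop fs -mv tmp_log/<source> existing_log/<target>`; `plan_renames` is a hypothetical helper, not a Pig or Hadoop API.

```python
import re
from typing import Iterable, List, Tuple

# Assumed naming: part-00000, part-m-00000 or part-r-00000, plus an optional
# suffix such as a compression extension (.bz2, .gz).
PART_RE = re.compile(r"^(part-(?:[mr]-)?)(\d+)(.*)$")


def is_hidden(name: str) -> bool:
    # Hadoop's FileInputFormat skips names starting with "_" or "."
    # (e.g. _logs), so they are not data and are not moved.
    return name.startswith(("_", "."))


def plan_renames(existing: Iterable[str], new: Iterable[str]) -> List[Tuple[str, str]]:
    """Map each new part file to a name that does not clash with existing ones."""
    # Continue numbering after the highest index already in existing_log.
    used = [int(m.group(2)) for name in existing if (m := PART_RE.match(name))]
    next_index = max(used, default=-1) + 1
    plan = []
    for name in sorted(new):
        if is_hidden(name):
            continue
        m = PART_RE.match(name)
        if m is None:
            raise ValueError(f"unexpected file in new output: {name}")
        prefix, digits, suffix = m.groups()
        # Keep the original prefix, zero-padding width and suffix.
        target = f"{prefix}{next_index:0{len(digits)}d}{suffix}"
        plan.append((name, target))
        next_index += 1
    return plan


if __name__ == "__main__":
    old = ["part-m-00000", "part-m-00001", "_logs"]
    fresh = ["part-m-00000", "part-m-00001", "_logs"]
    for src, dst in plan_renames(old, fresh):
        print(f"hadoop fs -mv tmp_log/{src} existing_log/{dst}")
```

Continuing the numbering past the highest existing index, instead of only avoiding exact name matches, keeps the directory listing in batch order.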