Can you filter and load at the same time?

Jonathan Coveney Wed, 01 Dec 2010 07:58:21 -0800

To make my loading more robust, I have two questions.

1) I know that you can use some wildcards when loading. For example, if you
have two files, dog1.txt and dog2.txt, you can load dog*.txt and it will load
both. Is there any way to use regular expressions, or anything more powerful,
in the load itself? For example, if I want to load 10 different files with
similar names but identically structured data, what's the easiest and fastest
way to load them all into the same relation?
2) Can you filter as you load? Doing a load and then a filter right after it
seems wasteful (unless Pig/Hadoop are smart enough to realize that they don't
have to load all of the data up front).


I appreciate your help,
Jon