To make my loading more robust, I have two questions.

1) I know that you can use some wildcards when loading. For example, if you have two files, dog1.txt and dog2.txt, you can load dog*.txt and it will load both. Is there any way to use regular expressions or anything more powerful in the actual load? For example, if I want to load 10 different files with a generally similar name structure and identically structured data, what's the easiest and fastest way to load them all into the same table?

2) Can you filter as you load? Doing a load and then a filter right after it seems wasteful (unless Pig/Hadoop are smart enough to realize that they don't have to load all the data right off the bat).

I appreciate your help.
Jon
