You basically need to write a new loader function in Java that extends the RegExLoader class and overrides its getPattern() method. That method returns the java.util.regex.Pattern that each input line is matched against, and each capture group in the pattern becomes one field. The CommonLogLoader class in Piggybank is an example of how to do this. Its source code is here:
http://svn.apache.org/repos/asf/pig/trunk/contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/apachelog/CommonLogLoader.java

You may also be able to do this without writing any Java. Load the raw lines with the built-in TextLoader function, then split them with the built-in REGEX_EXTRACT_ALL function:

http://pig.apache.org/docs/r0.11.0/func.html#regex-extract-all

------------------------------
Brent Black
Sr. Data Warehouse Engineer
------------------------------
Demand Media
5808 Lake Washington Blvd. Suite 300
Kirkland, WA. 98034
www.demandmedia.com
425-298-2376
---------------------------------

-----Original Message-----
From: Terry Healy
Sent: Wednesday, March 06, 2013 2:07 PM
Subject: Using CommonLogLoader for Bluecoat logs

Hello-

I'm using Pig version 0.10.0 to import BlueCoat proxy logs, and I'm having trouble with the field delimiters. Fields are separated by multiple spaces, and the browser description string contains embedded spaces. This seems like it would be a common Pig application.

I found an old example that used ...piggybank.evaluation.string.EXTRACT and seemed to fit the bill, but alas, EXTRACT has been deprecated. EXTRACT allowed regex-style parsing, so I looked at ...piggybank.storage.RegExLoader and then ...piggybank.storage.apachelog.CommonLogLoader. I can't work out what their Usage comments mean. Basically, I don't see where to set the regex pattern string for my needs.

Can anyone explain how to use either of these classes, or point me to an example?

Thanks,
Terry
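Here is a rough sketch of the getPattern() approach described in the reply above. It is shaped for lines where fields are separated by runs of spaces and one quoted field contains spaces.

Several parts of it are hypothetical:
- The log layout: date, time, client IP, three-digit status, and a double-quoted user agent. This is not the real BlueCoat field list.
- The class name ProxyLogLoader.
- The parse() helper. It is only a simplified stand-in for how a regex-based loader turns capture groups into fields. It is not RegExLoader's actual code.

The class is standalone so it compiles without the Pig jars. In a real loader you would declare it as "extends org.apache.pig.piggybank.storage.RegExLoader" and keep only getPattern().

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * HYPOTHETICAL sketch. In Pig you would declare this as
 *   public class ProxyLogLoader extends org.apache.pig.piggybank.storage.RegExLoader
 * so that getPattern() overrides RegExLoader's method. It is standalone here
 * only so it compiles and runs without the Pig/piggybank jars.
 */
public class ProxyLogLoader {

    // Assumed (hypothetical) layout: date time client-ip status "user agent".
    // \\s+ absorbs runs of one or more spaces between fields, and the quoted
    // user agent is captured as "([^\"]*)" so its embedded spaces are kept.
    // Each capture group corresponds to one output field.
    private static final Pattern PROXY_LOG_PATTERN = Pattern.compile(
            "^(\\S+)\\s+(\\S+)\\s+(\\S+)\\s+(\\d{3})\\s+\"([^\"]*)\"\\s*$");

    /** The method a RegExLoader subclass supplies: the regex to apply to each line. */
    public Pattern getPattern() {
        return PROXY_LOG_PATTERN;
    }

    /**
     * Illustrative helper only (not part of RegExLoader): returns one string
     * per capture group, or null if the line does not match the pattern.
     */
    public List<String> parse(String line) {
        Matcher m = getPattern().matcher(line);
        if (!m.matches()) {
            return null;
        }
        List<String> fields = new ArrayList<>();
        for (int i = 1; i <= m.groupCount(); i++) {
            fields.add(m.group(i));
        }
        return fields;
    }

    public static void main(String[] args) {
        ProxyLogLoader loader = new ProxyLogLoader();
        // Note the doubled and tripled spaces between fields.
        String line = "2013-03-06  14:07:00 10.0.0.5   200 \"Mozilla/5.0 (Windows NT 6.1; WOW64)\"";
        // prints [2013-03-06, 14:07:00, 10.0.0.5, 200, Mozilla/5.0 (Windows NT 6.1; WOW64)]
        System.out.println(loader.parse(line));
    }
}
```

To use the real subclass from Pig:
1. Compile it against the Pig and piggybank jars.
2. REGISTER the resulting jar and piggybank.jar in your script.
3. Load the logs with something like LOAD 'proxy.log' USING ProxyLogLoader(); (use the fully qualified class name if it lives in a package).

Keeping the whole line layout in one anchored regex means a line that doesn't fit the layout simply fails to match, instead of producing misaligned fields.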
