You need to write a new loader class in Java that extends RegExLoader and 
overrides the getPattern method so that it returns the compiled regex Pattern 
object you want to use.  The CommonLogLoader class in Piggybank is an example 
of how to do this.  The source code for that class is here:

http://svn.apache.org/repos/asf/pig/trunk/contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/apachelog/CommonLogLoader.java
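
As a rough sketch of the regex side of this (not tested against real BlueCoat 
logs), the class below builds a Pattern for a made-up line layout with runs of 
spaces between fields and a quoted browser string that contains spaces.  The 
class name, the field layout and the sample line are all assumptions for 
illustration.  It has no Pig dependency so it can be run on its own; in a real 
loader the class would extend RegExLoader and only getPattern would be needed, 
with each capture group presumably becoming one field.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical name. In a real Pig loader this would be declared as
// "public class ... extends RegExLoader" with piggybank on the classpath.
class ProxyLogPatternSketch {

    // Assumed layout (NOT the real BlueCoat format):
    //   date time client-ip status method url "user agent"
    // \s+ absorbs runs of spaces; the quoted group keeps embedded spaces.
    private static final Pattern LINE = Pattern.compile(
        "^(\\S+)\\s+(\\S+)\\s+(\\S+)\\s+(\\d{3})\\s+(\\S+)\\s+(\\S+)\\s+\"([^\"]*)\"$");

    // Same shape as the method a RegExLoader subclass overrides.
    public Pattern getPattern() {
        return LINE;
    }

    public static void main(String[] args) {
        // Made-up sample line with a double space and a quoted agent string.
        String line = "2013-03-06  14:07:00 10.0.0.1 200 GET "
            + "http://example.com/index.html \"Mozilla/5.0 (Windows NT 6.1)\"";
        Matcher m = new ProxyLogPatternSketch().getPattern().matcher(line);
        if (m.matches()) {
            // Print each captured field on its own line.
            for (int i = 1; i <= m.groupCount(); i++) {
                System.out.println(i + ": " + m.group(i));
            }
        } else {
            System.out.println("no match");
        }
    }
}
```

Checking the pattern against a few sample lines in plain Java like this, before 
wiring it into a loader, makes it easier to see which field is failing to match.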


Alternatively, you may be able to load each line with TextLoader and then 
split it into fields with Pig's built-in REGEX_EXTRACT_ALL function.

http://pig.apache.org/docs/r0.11.0/func.html#regex-extract-all


------------------------------
Brent Black
Sr. Data Warehouse Engineer
------------------------------
Demand Media
5808 Lake Washington Blvd.
Suite 300
Kirkland, WA. 98034
www.demandmedia.com
[email protected]
425-298-2376
---------------------------------


-----Original Message-----
From: Terry Healy [mailto:[email protected]]
Sent: Wednesday, March 06, 2013 2:07 PM
To: [email protected]
Subject: Using CommonLogLoader for Bluecoat logs


Hello-

Using Pig Version  0.10.0

I'm trying to import BlueCoat proxy logs using Pig, and having difficulty with 
field delimiters (multiple spaces, spaces embedded within the browser 
description string, etc.). This seems like it would be a common Pig application.

I found an old example using ...piggybank.evaluation.string.EXTRACT that seemed 
to fit the bill, but alas, EXTRACT has been deprecated.

EXTRACT allowed regex-like parsing, so I checked out 
...piggybank.storage.RegExLoader, and then 
...piggybank.storage.apachelog.CommonLogLoader.

I am unable to translate what the Usage comments mean. Basically, I don't see 
where I set the REGEX pattern string for my needs.

Can anyone explain the usage of either of these classes or point me to an 
example?

Thanks,

Terry

