Actually, I take it back, as it is not what I want.

Given a line like:
127.0.0.1 -  -  [24/05/2010:01:19:22 +0000] "GET /solr/select?q=import 
statement&start=1(MORE HERE) HTTP/1.1" 200 37571

I want to be able to grab the q parameter (i.e. "import statement") and use that 
as the item for that document.  The value of q needs to be split as well, so 
the items should be "import" and "statement".

I guess what I was proposing is that, instead of splitting on the parameter, I 
would include only those items that match a capturing group (all groups by 
default, with an optional list of groups passed in).
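
Roughly, the idea could be sketched like this. This is only an illustration in Python, not the FPG code: extract_items, Q_PATTERN, MULTI_PATTERN and the groups/splitter arguments are hypothetical names, and the sample line is the one above joined back onto a single line.

```python
import re
from urllib.parse import unquote_plus

# Sample line from above, joined onto one line (assumption: the wrap fell on the space).
LINE = ('127.0.0.1 -  -  [24/05/2010:01:19:22 +0000] '
        '"GET /solr/select?q=import statement&start=1(MORE HERE) HTTP/1.1" 200 37571')

# Hypothetical pattern: one capturing group holding the value of the q parameter.
# Assumes q is followed by another parameter, as in the sample line.
Q_PATTERN = re.compile(r'[?&]q=([^&]*)')

# Hypothetical pattern with two groups: the HTTP method and the q value.
MULTI_PATTERN = re.compile(r'"(GET|POST) \S*[?&]q=([^&]*)')


def extract_items(line, pattern, groups=None, splitter=r'\s+'):
    """Return the items for one line: match instead of split.

    groups lists the capturing groups to keep (all of them by default).
    Each kept group is URL-decoded and then split into individual items.
    """
    match = pattern.search(line)
    if match is None:
        return []  # the line produces no transaction
    if groups is None:
        groups = range(1, pattern.groups + 1)
    items = []
    for group in groups:
        value = match.group(group)
        if value is None:  # optional group that did not participate
            continue
        items.extend(tok for tok in re.split(splitter, unquote_plus(value)) if tok)
    return items


print(extract_items(LINE, Q_PATTERN))                  # ['import', 'statement']
print(extract_items(LINE, MULTI_PATTERN, groups=[2]))  # ['import', 'statement']
```

The items from each line would then be joined with whatever separator FPG expects, one line of output per input line.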

On May 27, 2010, at 5:25 PM, Grant Ingersoll wrote:

> Cool, glad I asked.  It's almost what I want and good enough for now.  
> However, what if I have multiple matching groups in my regex?  I was thinking 
> it would be nice to take in a list of the matching groups to include and then 
> iterate over them and append by the separator.
> 
> On May 27, 2010, at 5:14 PM, Robin Anil wrote:
> 
>> fpg uses regex to split. Just add another option for using the regex
>> to match instead of splitting. Less work I guess
>> 
>> 
>> 
>> On Fri, May 28, 2010 at 2:42 AM, Grant Ingersoll <[email protected]> wrote:
>>> I'd like to take a bunch of logs and extract a bit of each line and then 
>>> put them into format for FPG.  Was thinking a simple M/R job that took in a 
>>> regex would suffice and then output in the format for FPG.  Is that 
>>> generally useful or am I missing something obvious?  I want to do FPG on my 
>>> query logs and it seems like a generally useful conversion.  I suppose, in 
>>> fact, it isn't even log specific.
>>> 
>>> -Grant
> 

