Actually, I take it back, as it is not what I want. Given a line like:

127.0.0.1 - - [24/05/2010:01:19:22 +0000] "GET /solr/select?q=import statement&start=1(MORE HERE) HTTP/1.1" 200 37571

I want to be able to grab the q parameter (i.e. "import statement") and use that as my item for that document. The value of q needs to be split as well, so the items should be "import" and "statement". I guess I was proposing that, instead of splitting on the parameter, I would include only those items that match a capturing group (all groups by default, but optionally a list passed in).

On May 27, 2010, at 5:25 PM, Grant Ingersoll wrote:

> Cool, glad I asked. It's almost what I want and good enough for now.
> However, what if I have multiple matching groups in my regex? I was thinking
> it would be nice to take in a list of the matching groups to include and then
> iterate over them and append by the separator.
>
> On May 27, 2010, at 5:14 PM, Robin Anil wrote:
>
>> fpg uses regex to split. Just add another option for using the regex
>> to match instead of splitting. Less work I guess
>>
>> On Fri, May 28, 2010 at 2:42 AM, Grant Ingersoll <[email protected]> wrote:
>>> I'd like to take a bunch of logs and extract a bit of each line and then
>>> put them into format for FPG. Was thinking a simple M/R job that took in a
>>> regex would suffice and then output in the format for FPG. Is that
>>> generally useful or am I missing something obvious? I want to do FPG on my
>>> query logs and it seems like a generally useful conversion. I suppose, in
>>> fact, it isn't even log specific.
>>>
>>> -Grant
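
To make the capturing-group idea above concrete, here is a rough Python sketch of the per-line logic only, not the M/R job and not anything that exists in FPG today. The names extract_items, Q_PATTERN, groups and split_pattern are made up for illustration, the "(MORE HERE)" part of the sample line is left out, and URL-decoding the value is my own assumption about what real query logs would need.

```python
import re
from urllib.parse import unquote_plus

# Hypothetical pattern: capture the value of the q parameter, stopping at the
# next '&' or at the trailing " HTTP/..." of the request line.
Q_PATTERN = r'[?&]q=(.*?)(?:&| HTTP/)'

LOG_LINE = ('127.0.0.1 - - [24/05/2010:01:19:22 +0000] '
            '"GET /solr/select?q=import statement&start=1 HTTP/1.1" 200 37571')


def extract_items(line, pattern, groups=None, split_pattern=r'\s+'):
    """Return the items found in the selected capturing groups of `pattern`.

    groups: 1-based group numbers to include; None means every group.
    split_pattern: regex used to split each captured value into items.
    """
    match = re.search(pattern, line)
    if match is None:
        return []  # line does not match, so it contributes no items
    if groups is None:
        groups = range(1, match.re.groups + 1)
    items = []
    for group in groups:
        value = match.group(group)
        if value is None:
            continue  # optional group that did not participate in the match
        # Assumption: values may be URL-encoded ('+' or '%20' for spaces).
        value = unquote_plus(value)
        items.extend(tok for tok in re.split(split_pattern, value) if tok)
    return items


if __name__ == '__main__':
    # One output line per log line, items appended by a separator.
    print(','.join(extract_items(LOG_LINE, Q_PATTERN)))
```

Passing groups=[2] with a pattern that has two capturing groups would include only the second group, which is the "list of matching groups to include" option from the earlier message.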
