Sorry, that wasn't a link. It's my input to Pig, basically what's inside params.dat. When I run those 3 Pig lines I get empty output. What I want is something like this:
http://abc.com/?a=v1&b=v2 broken down into a map, while still preserving abc.com. If that turns out to be complex, I can write UDFs instead.

On Mon, Jun 18, 2012 at 1:04 PM, Subir S wrote:
> I think the link Mohit mentioned was his input. Not sure if I understood
> correctly.
>
> I suspect it's something related to the schema.
>
> http://pig.apache.org/docs/r0.9.1/basic.html#map-schema
>
> http://stackoverflow.com/a/8238591
>
> So when you load with delimiter '&', what will happen to the first field?
> And how will the second field automatically become a map? I mean, in your
> schema you mention only one field, not two fields (URL & QUERY).
>
> Thanks, Subir
>
> On Tue, Jun 19, 2012 at 12:20 AM, Jonathan Coveney wrote:
>
> > Your link does not work; I recommend using pastebin.
> >
> > 2012/6/18 Mohit Anchlia
> >
> > > I am trying to parse a URL using Pig's map type. My query string is:
> > >
> > > https://mail.google.com/mail/?tab=wm#drafts/13800c4ea3d11511&mail=123
> > >
> > > My very simple script for testing is this, but when I look at the part
> > > file it returns null.
> > >
> > > A = LOAD '/examples/map/input/params.dat' USING PigStorage('&') AS (M:map[]);
> > >
> > > rmf '/examples/map/output/';
> > >
> > > STORE B INTO '/examples/map/output/';
> > >
> > > I am working on analyzing clickstream data. For this I need to first parse
> > > these strings into files representing dimensions and also do sessionization
> > > on them before loading them into an RDBMS.
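A minimal sketch of the parsing a UDF would need, assuming each line of params.dat holds one full URL such as http://abc.com/?a=v1&b=v2. It is plain Python 3 using the standard urllib.parse module; parse_url is a hypothetical helper name, not a Pig built-in, and wiring it into Pig (as a Python or Java UDF) is not shown. It keeps the host (abc.com) and turns the query string into a dict, which is the shape a Pig map would hold. For context, PigStorage only casts text written in Pig's own map syntax (e.g. [a#v1,b#v2]) to a map, so fields like a=v1 most likely come back as null.

```python
from urllib.parse import urlsplit, parse_qsl


def parse_url(url):
    """Split a URL into (host, {param: value}).

    Hypothetical helper for illustration: the logic a URL-parsing UDF
    would need, not an existing Pig function.
    """
    parts = urlsplit(url)
    # keep_blank_values=True so "?flag=" still yields a key with an empty value
    params = dict(parse_qsl(parts.query, keep_blank_values=True))
    return parts.netloc, params


if __name__ == "__main__":
    host, params = parse_url("http://abc.com/?a=v1&b=v2")
    print(host)    # abc.com
    print(params)  # {'a': 'v1', 'b': 'v2'}
```

For the Gmail URL quoted above, everything after # is the fragment, so a standard parser sees only tab=wm as the query string; mail=123 would have to be pulled out of the fragment separately.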
