Trouble with REGEX in PIG

2013-12-04 Thread Watrous, Daniel
Hi, I'm trying to use regular expressions in PIG, but it's failing. Based on the documentation http://pig.apache.org/docs/r0.12.0/func.html#regex-extract I am trying this: [watrous@c0003913 ~]$ pig -x local which: no hadoop in

Re: Trouble with REGEX in PIG

2013-12-04 Thread Ankit Bhatnagar
R u planning to use org.apache.pig.builtin.REGEX_EXTRACT ? On 12/4/13 9:28 AM, Watrous, Daniel daniel.t.watr...@hp.com wrote: Hi, I'm trying to use regular expressions in PIG, but it's failing. Based on the documentation http://pig.apache.org/docs/r0.12.0/func.html#regex-extract I am trying

Re: Trouble with REGEX in PIG

2013-12-04 Thread Pradeep Gollakota
It's not valid PigLatin... The Grunt shell doesn't let you try out functions and UDFs the way you're trying to use them. A = LOAD 'data' USING PigStorage() as (ip: chararray); B = FOREACH A GENERATE REGEX_EXTRACT(ip, '(.*):(.*)', 1); DUMP B; You always have to load a dataset and work

RE: Trouble with REGEX in PIG

2013-12-04 Thread Watrous, Daniel
That's what I was trying first, but then I tried defining it too. -Original Message- From: Ankit Bhatnagar [mailto:ank...@yahoo-inc.com] Sent: Wednesday, December 04, 2013 11:15 AM To: user@pig.apache.org; Watrous, Daniel Subject: Re: Trouble with REGEX in PIG R u planning to use

RE: Trouble with REGEX in PIG

2013-12-04 Thread Watrous, Daniel
Pradeep, Does the documentation here need to be updated: http://pig.apache.org/docs/r0.12.0/func.html#regex-extract It suggests that the function can run against a string and should return the expected value. I did confirm that I can use REGEX_EXTRACT on values loaded from a file. Thank
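A minimal sketch of how the documentation's literal-string example can be tried in Grunt, given the point made earlier in the thread that a function has to be applied to a loaded relation. The file name 'one_row.txt' and the aliases are hypothetical; any file with at least one line would do, since the loaded value itself is ignored.

```pig
-- Assumption: 'one_row.txt' is a hypothetical local file containing one line.
-- It exists only to give FOREACH a row to iterate over.
one = LOAD 'one_row.txt' USING PigStorage() AS (ignored: chararray);

-- Apply REGEX_EXTRACT to a constant string instead of a loaded field.
-- Index 1 selects the first capture group, the part before the colon.
extracted = FOREACH one GENERATE REGEX_EXTRACT('192.168.1.5:8020', '(.*):(.*)', 1);

DUMP extracted;
```

The Pig documentation states that this call returns '192.168.1.5'; the sketch only wraps it in the LOAD/FOREACH form that Grunt accepts.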

CROSS/Self-Join Bug - Please Help :(

2013-12-04 Thread Russell Jurney
I have this bug that is killing me, where I can't self-join/cross a dataset with itself. It's blocking my work :( The script is like this: businesses = LOAD 'yelp_phoenix_academic_dataset/yelp_academic_dataset_business.json' using com.twitter.elephantbird.pig.load.JsonLoader() as json:map[]; /*

Re: CROSS/Self-Join Bug - Please Help :(

2013-12-04 Thread Russell Jurney
There was a bug in the script on the 2nd to last line. Fixed it, still have same issue. I found a workaround: if I store the CROSSED relation immediately after the CROSS, then load it... it works. Something about resetting the plan. This is a bug. I'll file a JIRA. On Wed, Dec 4, 2013 at 1:21
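The store-then-load workaround described in this message can be sketched roughly as follows. This is an illustration under assumptions, not the original script: the paths, aliases, and field names are hypothetical, and only the pattern of storing the crossed relation and loading it again before further processing comes from the message.

```pig
-- Assumption: 'locations.tsv' is a hypothetical tab-separated file of business locations.
locations = LOAD 'locations.tsv' AS (business_id: chararray, lat: double, lng: double);

-- A second alias over the same data, so the relation can be crossed with itself.
locations_2 = FOREACH locations GENERATE business_id AS business_id_2, lat AS lat_2, lng AS lng_2;

crossed = CROSS locations, locations_2;

-- Workaround: store the crossed relation immediately after the CROSS ...
STORE crossed INTO 'crossed_tmp';

-- ... then load it back and continue from the reloaded relation.
crossed_reloaded = LOAD 'crossed_tmp' AS (business_id: chararray, lat: double, lng: double,
                                          business_id_2: chararray, lat_2: double, lng_2: double);

pairs = FOREACH crossed_reloaded GENERATE business_id, business_id_2;
STORE pairs INTO 'pairs_out';
```

The extra STORE/LOAD costs a round trip to disk, which is the price of keeping the later FOREACH out of the plan that contains the CROSS.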

Re: CROSS/Self-Join Bug - Please Help :(

2013-12-04 Thread Pradeep Gollakota
I tried the following script (not exactly the same) and it worked correctly for me. businesses = LOAD 'dataset' using PigStorage(',') AS (a, b, c, business_id: chararray, lat: double, lng: double); locations = FOREACH businesses GENERATE business_id, lat, lng; STORE locations INTO 'locations.tsv';

weird classpath problem

2013-12-04 Thread Yigitbasi, Nezih
Hi everyone, I am having some weird classpath issues with a UDF that returns a custom tuple. My custom tuple has an ArrayList of custom objects. It looks like: class MyTuple private ArrayList<MyClass> list; When the UDF is called, everything works fine: the tuples are created and the UDF

Re: CROSS/Self-Join Bug - Please Help :(

2013-12-04 Thread Russell Jurney
If you store immediately after the CROSS, it works. If you do another FOREACH/GENERATE, etc. it does not. On Wed, Dec 4, 2013 at 1:41 PM, Pradeep Gollakota pradeep...@gmail.com wrote: I tried the following script (not exactly the same) and it worked correctly for me. businesses = LOAD