[ https://issues.apache.org/jira/browse/PIG-5018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rohini Palaniswamy resolved PIG-5018.
-------------------------------------
    Resolution: Invalid

> Mohan.V
> -------
>
>                 Key: PIG-5018
>                 URL: https://issues.apache.org/jira/browse/PIG-5018
>             Project: Pig
>          Issue Type: Bug
>            Reporter: mohan
>
> I am trying to write a Hadoop Pig script that takes two files and filters
> the tweets in one file based on the strings listed in the other, i.e.:
> words.txt:
> google 
> facebook 
> twitter 
> linkedin
> tweets.json:
> {"created_time": "18:47:31 ", "text": "RT @Joey7Barton: ..give a facebook about whether the americans wins a Ryder cup. I mean surely he has slightly more important matters. #fami ...", "user_id": 450990391, "id": 252479809098223616, "created_date": "Sun Sep 30 2012"}
> SCRIPT
> twitter = LOAD 'Twitter.json' USING JsonLoader('created_time:chararray, text:chararray, user_id:chararray, id:chararray, created_date:chararray');
> filtered = FILTER twitter BY (text MATCHES '.*facebook.*');
> extracted = FOREACH filtered GENERATE 'facebook' AS pattern, id, user_id, created_time, created_date, text;
> final = GROUP extracted BY pattern;
> dump final;
> OUTPUT
> (facebook,{(facebook,252545104890449921,291041644,23:06:59 ,Sun Sep 30 2012,RT @Joey7Barton: ..give a facebook about whether the americans wins a Ryder cup. I mean surely he has slightly more important matters. #fami ...)})
> The output above comes from filtering the tweets directly on a hard-coded
> word, without loading words.txt at all.
> I need the output to be:
> (facebook)(complete text of each tweet containing the word facebook)
> i.e. the script should read words.txt and, for each word it reads, return
> all the tweets in tweets.json that contain that word.
> Any help would be appreciated.
> Mohan.V
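A minimal sketch of one way to get the requested per-word output in Pig Latin; it is not taken from the issue. It assumes words.txt holds one word per line, tweets.json holds one JSON object per line, and that your Pig version accepts a non-constant pattern on the right-hand side of MATCHES. The relation and field names (trimmed, clean_words, pairs, matched, slim, tweet_list) are hypothetical. CROSS pairs every word with every tweet, FILTER keeps the pairs whose text contains the word, and GROUP collects the matching tweets under each word.

```pig
-- Hypothetical sketch: file names follow the issue text; adjust paths as needed.
words  = LOAD 'words.txt' USING PigStorage() AS (word:chararray);
tweets = LOAD 'tweets.json' USING JsonLoader('created_time:chararray, text:chararray, user_id:chararray, id:chararray, created_date:chararray');

-- Strip stray spaces (e.g. a trailing blank after "google") and drop empty lines,
-- since an empty word would turn into the pattern '.*.*' and match every tweet.
trimmed     = FOREACH words GENERATE TRIM(word) AS word;
clean_words = FILTER trimmed BY word IS NOT NULL AND word != '';

-- Pair every word with every tweet (assumes the word list is small).
pairs = CROSS clean_words, tweets;

-- MATCHES must match the whole string, hence '.*' on both sides of the word.
-- Assumes this Pig version accepts a computed (non-constant) pattern.
matched = FILTER pairs BY (tweets::text MATCHES CONCAT('.*', CONCAT(clean_words::word, '.*')));

-- Keep only the fields of interest, under unambiguous names.
slim = FOREACH matched GENERATE clean_words::word AS word, tweets::id AS id, tweets::text AS text;

-- One output record per word, shaped as (word, {(id, text), ...}).
grouped = GROUP slim BY word;
result  = FOREACH grouped GENERATE group AS word, slim.(id, text) AS tweet_list;

DUMP result;
```

CROSS produces one pair per word per tweet, which is fine for a short word list like this one but grows quickly with larger inputs. A word containing regex metacharacters (such as `.` or `+`) would need escaping before it is used in the pattern.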



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
