Problem with my pig script

2013-10-29, Sameer Tilak
Hello Pig experts,

I have the following simple script. For simplicity, I have replaced my UDF with a dummy UDF that reproduces the problem I am having. The UDF TupleTest generates a tuple in the following manner:

    boolean randomboolean = rngen.nextBoolean();
    if(randomboolean) …

Re: count distinct on multiple columns

2013-10-29, Pradeep Gollakota
Great question. There seems to be some confusion about how DISTINCT operates. I remembered (and thankfully found) this message that explains the behavior. A

Re: ORDER BY a map value fails with a syntax error - pig bug?

2013-10-29, Ruslan Al-Fakikh
Thanks, William!

On Tue, Oct 29, 2013 at 10:41 PM, wrote:
> http://pig.apache.org/docs/r0.12.0/basic.html#order-by says
> "Pig currently supports ordering on fields with simple types or by
> tuple designator (*). You cannot order on fields with complex types or by
> expressions."
>
> I t…

RE: ORDER BY a map value fails with a syntax error - pig bug?

2013-10-29, william.dowling
http://pig.apache.org/docs/r0.12.0/basic.html#order-by says "Pig currently supports ordering on fields with simple types or by tuple designator (*). You cannot order on fields with complex types or by expressions." I think "you cannot order ... by expressions" means the behavior you see

ORDER BY a map value fails with a syntax error - pig bug?

2013-10-29, Ruslan Al-Fakikh
Hi guys,

The following script:

    A = LOAD 'input' AS (M:map []);
    sorted = ORDER A BY M#'key1';
    dump sorted;

gives:

    2013-10-29 14:31:03,611 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1200: Syntax error, unexpected symbol at or near 'M'

While this one:

    LOAD 'input' AS (M:map []);
    name…
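Because the r0.12.0 ORDER BY documentation quoted in this thread says Pig cannot order by expressions, a common workaround is to project the map lookup into a plain, named field with FOREACH and then order on that field. The sketch below assumes the same 'input' file and map key 'key1' as the failing script; the aliases `B` and `key1_val` are hypothetical, and the `chararray` cast is an assumption about the type of the map values.

```pig
-- Load the map column exactly as in the failing script.
A = LOAD 'input' AS (M:map []);

-- Move the map lookup into a named, simple-typed field.
-- ORDER BY does not accept the M#'key1' expression directly.
-- The chararray cast is an assumption about what the map values hold.
B = FOREACH A GENERATE M, (chararray) M#'key1' AS key1_val;

-- Order on the named field instead of on the expression.
sorted = ORDER B BY key1_val;
dump sorted;
```

Keeping `M` in the projection preserves the original map alongside the sort key, so downstream statements can still read the other keys.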

Error 1066 Limit data on pig 12!

2013-10-29, Aida Mashkouri Najafi
Hi all,

On Pig 0.12, my script dumps all of the data without any error when no LIMIT is set. After adding a LIMIT to my script, I keep getting ERROR 1066: Unable to open iterator for alias idb2. Could you please help me? This is my script:

    idb1 = FILTER idb BY (pixel_id == 252 and user_id != …

count distinct on multiple columns

2013-10-29, Min Zhou
Hi all,

The script below is how we count distinct combinations of the columns jid and mid:

    sjv = LOAD '/path/of/the/data' USING AvroStorage();
    jv = FOREACH sjv GENERATE TOTUPLE(jid, mid) AS jid_mid, time;
    groupv = GROUP jv ALL;
    countv = FOREACH groupv {
        unique = DISTINCT jv.jid_mid;
        GENERATE COUNT(uni…
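For comparison, a common alternative is to apply a top-level DISTINCT to the projected (jid, mid) columns before grouping. This lets the de-duplication run as its own reduce phase instead of inside the single GROUP ALL group. The sketch assumes the same input path, field names, and AvroStorage setup as the script above; the aliases `pairs`, `uniq_pairs`, `all_pairs`, and `num_distinct` are hypothetical. It uses COUNT_STAR rather than COUNT because COUNT skips tuples whose first field is null, which would drop pairs with a null jid.

```pig
-- Same input as the original script (assumes AvroStorage is already available).
sjv = LOAD '/path/of/the/data' USING AvroStorage();

-- Keep only the two columns whose combinations should be counted.
pairs = FOREACH sjv GENERATE jid, mid;

-- Top-level DISTINCT removes duplicate (jid, mid) rows in its own reduce phase.
uniq_pairs = DISTINCT pairs;

-- Put all remaining rows into one group and count them.
-- COUNT_STAR also counts rows whose first field (jid) is null.
all_pairs = GROUP uniq_pairs ALL;
countv = FOREACH all_pairs GENERATE COUNT_STAR(uniq_pairs) AS num_distinct;
dump countv;
```

The final GROUP ALL still uses one reducer, but by that point it only receives the already de-duplicated rows, and COUNT_STAR is algebraic, so Pig can apply the combiner.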