Re: Nested foreach with order by

2014-02-28 Thread Anastasis Andronidis
I found the problem. I used some private variables in my class, thinking that Pig would create a new object of my class for every tuple I receive. But of course that is not the case. Sorry for the inconvenience. Anastasis On 28 Feb 2014, at 2:07 AM, Anastasis Andronidis
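The pitfall described above can be sketched in plain Java. This is only an illustration under stated assumptions: `LeakyUdf` and `CleanUdf` are hypothetical stand-ins, not Pig classes, and the `exec(List<String>)` signature is simplified for the example. The point is that a private field outlives a single call when the same object is reused across inputs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class UdfStateSketch {

    // Hypothetical stand-in for a UDF that keeps state in a private field.
    static class LeakyUdf {
        // Lives as long as the instance, not as long as one call.
        private final List<String> seen = new ArrayList<>();

        List<String> exec(List<String> bag) {
            seen.addAll(bag);               // values from earlier calls are still present
            return new ArrayList<>(seen);
        }
    }

    // Same logic, but the working state is local to each call.
    static class CleanUdf {
        List<String> exec(List<String> bag) {
            List<String> seen = new ArrayList<>(); // fresh for every input
            seen.addAll(bag);
            return seen;
        }
    }

    public static void main(String[] args) {
        LeakyUdf leaky = new LeakyUdf();
        CleanUdf clean = new CleanUdf();

        // The same instance handles two consecutive inputs.
        System.out.println(leaky.exec(Arrays.asList("CREAM-CE")));
        System.out.println(leaky.exec(Arrays.asList("SRMv2")));   // also contains CREAM-CE
        System.out.println(clean.exec(Arrays.asList("CREAM-CE")));
        System.out.println(clean.exec(Arrays.asList("SRMv2")));   // only SRMv2
    }
}
```

If per-call state really is needed, declaring it inside the method (or resetting the field at the start of each call) avoids carrying results over from one input to the next.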

Nested foreach with order by

2014-02-27 Thread Anastasis Andronidis
Hello everyone, I have a foreach statement, and inside it I use an ORDER BY followed by a UDF. For example:
    logs = LOAD 'raw_data' USING org.apache.hcatalog.pig.HCatLoader();
    logs_g = GROUP logs BY (date, site, profile) PARALLEL 2;
    service_flavors = FOREACH logs_g {

Re: Nested foreach with order by

2014-02-27 Thread Pradeep Gollakota
Where exactly are you getting duplicates? I'm not sure I understand your question. Can you give an example please? On Thu, Feb 27, 2014 at 11:15 AM, Anastasis Andronidis andronat_...@hotmail.com wrote: Hello everyone, I have a foreach statement and inside of it, I use an order by. After the

Re: Nested foreach with order by

2014-02-27 Thread Anastasis Andronidis
Yes, of course. My output looks like this:
    (20131209,AEGIS04-KG,ch.cern.sam.ROC_CRITICAL,0.0,CREAM-CE)
    (20131209,AEGIS04-KG,ch.cern.sam.ROC_CRITICAL,0.0,CREAM-CE)
    (20131209,AEGIS04-KG,ch.cern.sam.ROC_CRITICAL,0.0,SRMv2)
    (20131209,AEGIS04-KG,ch.cern.sam.ROC_CRITICAL,0.0,SRMv2)

Re: Nested foreach with order by

2014-02-27 Thread Anastasis Andronidis
BTW, is this somehow related [1]? [1]: http://mail-archives.apache.org/mod_mbox/pig-user/201102.mbox/%3c5528d537-d05c-47d9-8bc8-cc68e236a...@yahoo-inc.com%3E On 27 Feb 2014, at 11:20 PM, Anastasis Andronidis andronat_...@hotmail.com wrote: Yes, of course, my output is like that:

Re: Nested foreach with order by

2014-02-27 Thread Pradeep Gollakota
No... that wouldn't be related since you're not doing a GROUP ALL. The `FLATTEN(MY_UDF(t))` has me a little wary. Something is possibly going wrong in your UDF. The output of your UDF is going to be a string that is some generic status, right? My uneducated guess is that there's a bug in your

Re: Nested foreach with order by

2014-02-27 Thread Anastasis Andronidis
Hi again, I added this to my UDF:
    if (!((DataBag) input.get(0)).isSorted()) {
        throw new IOException("It's not sorted");
    }
And the exception is thrown. Why? I don't understand it. I specified ORDER BY in the nested foreach. Thank you for helping me, btw! On 28 Feb 2014, at 1:12

Re: Nested foreach with order by

2014-02-27 Thread Anastasis Andronidis
I also just found out that the bag from the nested ORDER BY is an org.apache.pig.data.InternalCachedBag and not an org.apache.pig.data.SortedDataBag. Is that expected? On 28 Feb 2014, at 1:51 AM, Anastasis Andronidis andronat_...@hotmail.com wrote: Hi again, I added this in my UDF:
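A check that does not depend on the bag's class is to walk the elements and compare neighbours. The sketch below uses plain Java collections rather than Pig's `DataBag`; `isOrdered` is a hypothetical helper, and the assumption that a flag like `isSorted()` may describe the bag implementation rather than the actual element order is not confirmed in this thread.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.Iterator;

public class OrderCheckSketch {

    // Hypothetical helper: returns true if every element is <= its successor
    // under the given comparator. An empty or single-element input is ordered.
    static <T> boolean isOrdered(Iterable<T> items, Comparator<? super T> cmp) {
        Iterator<T> it = items.iterator();
        if (!it.hasNext()) {
            return true;
        }
        T prev = it.next();
        while (it.hasNext()) {
            T cur = it.next();
            if (cmp.compare(prev, cur) > 0) {
                return false; // found an adjacent pair out of order
            }
            prev = cur;
        }
        return true;
    }

    public static void main(String[] args) {
        // Values taken from the last field of the output tuples shown in the thread.
        System.out.println(isOrdered(
                Arrays.asList("CREAM-CE", "CREAM-CE", "SRMv2"),
                Comparator.<String>naturalOrder()));
        System.out.println(isOrdered(
                Arrays.asList("SRMv2", "CREAM-CE"),
                Comparator.<String>naturalOrder()));
    }
}
```

Inside a real UDF the same loop could run over the bag's iterator, with a comparator that matches the ORDER BY key.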