Thanks for being thorough! It's indeed a bug, but backporting a fix may be
hard. The parser and logical plan changed a lot between 0.8 and 0.9, so if at
all possible, I would try to use 0.10 (the latest release). We use it in
production and it is stable, and it has a lot of benefits over 0.8. I will warn
you that the parser changed, so if you have many existing jobs, it may be worth
running them on a test cluster with 0.10 first, but if you don't, it's
definitely better to make the jump now.
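
For anyone following along, here is a rough Python model of why the exec()
count reported further down the thread looks wrong. It is only a sketch: the
real UDF is the Java FilterFunc from the pastebin, and the six records and the
keywords 'foo' and 'bar' below are made up for illustration. It assumes two
records match the left-hand side of the OR and one more matches only the
right-hand side, which is consistent with the expected result length of 3 and
the observed length of 2.

```python
# Hypothetical model of:
#   FILTER rows BY CONTAINS(line, 'foo') OR CONTAINS(line, 'bar');
# The records and keywords are invented; the real input is in the pastebin.

calls = 0  # counts invocations of the stand-in for the UDF's exec()


def contains(line, keyword):
    """Stand-in for the CONTAINS FilterFunc: a plain substring test."""
    global calls
    calls += 1
    return keyword in line


records = [
    "foo one",
    "foo two",
    "bar three",
    "baz four",
    "qux five",
    "quux six",
]

# Python's `or` short-circuits: the right-hand call only runs when the
# left-hand call returned False.
kept = [r for r in records if contains(r, "foo") or contains(r, "bar")]

print(len(kept))  # 3
print(calls)      # 10
```

With short-circuiting, the left side runs for all six records and the right
side for the four that did not match, giving ten calls. If both sides were
always evaluated it would be twelve. Six calls means the right-hand side was
never evaluated at all.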

2012/5/23 Johannes Schwenk <johannes.schw...@adition.com>

> Hi Jonathan,
>
> thanks again for your help!
>
> I have cloned the current git head and created this pig script
> http://pastebin.com/Gc9C9ZPS
>
> TestCONTAINS-testFilteringCluster-input.txt contains
> http://pastebin.com/h5MC695F
>
> The adition.jar has been built against the cloudera cdh3u3 distribution
> and contains the filter function CONTAINS
> http://pastebin.com/Uwje7v1V
>
>
> Output from running my script with both versions of pig:
>
> pig 0.11.0-SNAPSHOT
> http://pastebin.com/Cr5CkHui
>
> => Correct results!!
>
>
> pig 0.8.1-cdh3u3
> http://pastebin.com/yXY17mXx
>
> => Incorrect results!!
>
>
> It seems like the new logical plan in pig 0.8.1 optimizes the OR
> operator away. So it's a bug, right?
>
>
>
> On 22.05.2012 21:26, Jonathan Coveney wrote:
> > If this is a bug, it's an annoying one, so I definitely appreciate your
> > help in getting to the bottom of it. So let's get to the bottom of it :)
> >
> > First, I would clone the trunk version of pig and run the same tests
> > against it and compare. Always good to test any bugs against trunk to see
> > if it is version specific.
> >
> > Right off the bat, I would say that you should dump the files in your test
> > to a file, make a short script that does exactly what your test does, and
> > paste the EXPLAIN plan generated for your script (ideally in both your
> > version of pig and trunk). We should be able to see if there is something
> > weird going on.
> >
> > Let me know if you need any help with any of that. If it persists I'll try
> > and recreate on my end.
> >
> > 2012/5/22 Johannes Schwenk <johannes.schw...@adition.com>
> >
> >> Thank you for your quick suggestions!
> >>
> >> - I am now using local mode - good point!
> >> - I know of the builtin matches, the CONTAINS filter was just to get into
> >> programming UDFs...
> >> - Whatever I do, the problem persists. I tried:
> >>  * turning off all optimizations (-t All) : no effect
> >>  * reordering the statements : the output still contains only the
> >> tuples matching the lhs of the OR
> >>  * using different data (just in case...) : no effect
> >>  * finally counting how many times the exec() function gets called
> >> while processing the script... : exactly *six times* - once for every record!
> >>
> >> That last observation leads me to believe that this is a bug!? The exec
> >> function should be called at least *ten times* I think.
> >>
> >> Do you have any suggestions on how to verify this?
> >>
> >> Greetings
> >>
> >> On 21.05.2012 19:11, Jonathan Coveney wrote:
> >>> Not sure why it is failing... though I will mention two things. 1) you
> >>> should use local mode if possible, especially just to test UDFs :) 2) you
> >>> could use the builtin matches function to achieve this (ie matches
> >>> '.*keyword.*')
> >>>
> >>> Besides that it is odd indeed, and I'd have to dig in more.
> >>>
> >>> 2012/5/21 Johannes Schwenk <johannes.schw...@adition.com>
> >>>
> >>>> Hello List,
> >>>>
> >>>> I am using Cloudera's distribution (cdh3u3) which comes with pig-0.8.1.
> >>>>
> >>>> I have written a UDF extending FilterFunc that checks if the provided
> >>>> string is contained within the specified column of the current tuple:
> >>>> http://pastebin.com/Uwje7v1V
> >>>>
> >>>> I have also written some TestCases:
> >>>> http://pastebin.com/uA4LHB4Q
> >>>>
> >>>> The odd thing is that only the test case testFilteringClusterWithOR1 fails,
> >>>> because the result does not have the expected length of 3 but is of length 2
> >>>> instead (line 177 in http://pastebin.com/Uwje7v1V). After a lot of
> >>>> investigating I still cannot find out why testFilteringCluster and
> >>>> testFilteringClusterWithOR2 succeed but testFilteringClusterWithOR1 does not.
> >>>> Is there a special prerequisite for making my FilterFunc usable within
> >>>> OR? Maybe I have missed something very obvious... Please help me figure
> >>>> this out!
> >>>>
> >>>> Greetings,
> >>>> Johannes Schwenk
> >>>>
> >>>> --
> >>>> Software Developer (Reporting)
> >>>> ________________________________________________________
> >>>>
> >>>> ADITION technologies AG
> >>>> Schwarzwaldstraße 78b
> >>>> 79117 Freiburg
> >>>>
> >>>> http://www.adition.com
> >>>>
> >>>> T +49 / (0)761 / 88147 - 30
> >>>> F +49 / (0)761 / 88147 - 77
> >>>> SUPPORT +49  / (0)1805 - ADITION
> >>>>
> >>>> (Landline rate 14 ct/min; mobile rates at most 42 ct/min)
> >>>>
> >>>> Registered at the Düsseldorf District Court under HRB 54076
> >>>> Executive Board: Andreas Kleiser, Jörg Klekamp, Tihomir Perkovic, Marcus Schlüter
> >>>> Chairman of the Supervisory Board: Attorney Daniel Raimer
> >>>> VAT ID: DE 218 858 434
> >>>>
> >>>>
> >>>
> >>
> >>
> >>
> >> Johannes Schwenk
> >>
> >
>
>
>
> Johannes Schwenk
>
