Can you please tell us how you call the UDF in your script and how you run
it on the cluster? Also, have you tried the script in local mode (pig -x
local)?

Do the logs say anything?
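
One thing that may be worth ruling out while you check the logs: the posted exec() compares strings with ==, which tests object identity rather than content. In a unit test, identical string literals are interned and are the same object, so == happens to match. Values that are built at runtime from stored data are separate String objects, so == can fail even when the text is equal. The sketch below is a standalone illustration of that effect, not your UDF: the class StringCompareDemo and its helpers are hypothetical names made up for the example, and fromBytes only imitates a string decoded from serialized bytes.

```java
import java.nio.charset.StandardCharsets;

public class StringCompareDemo {

    // Same windowed count as the posted loop, comparing by reference (==).
    static int countByReference(String[] words, String word1, String word2, int window) {
        int result = 0;
        for (int i = 0; i < words.length; i++) {
            if (words[i] == word1) {
                int bound = Math.min(words.length, i + window);
                for (int j = i; j < bound; j++) {
                    if (words[j] == word2) {
                        result++;
                    }
                }
            }
        }
        return result;
    }

    // Same loop, comparing by content (equals). Assumes no null entries.
    static int countByValue(String[] words, String word1, String word2, int window) {
        int result = 0;
        for (int i = 0; i < words.length; i++) {
            if (words[i].equals(word1)) {
                int bound = Math.min(words.length, i + window);
                for (int j = i; j < bound; j++) {
                    if (words[j].equals(word2)) {
                        result++;
                    }
                }
            }
        }
        return result;
    }

    // Imitates a value decoded at runtime: always a new, non-interned String.
    static String fromBytes(String s) {
        return new String(s.getBytes(StandardCharsets.UTF_8), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String[] literals = {"crowd", "game", "football"};
        String[] decoded = {fromBytes("crowd"), fromBytes("game"), fromBytes("football")};

        // Literals share one interned object each, so == matches.
        System.out.println(countByReference(literals, "crowd", "game", 10)); // prints 1

        // Runtime-built strings are distinct objects, so == never matches.
        System.out.println(countByReference(decoded, fromBytes("crowd"), fromBytes("game"), 10)); // prints 0

        // equals compares content and matches in both situations.
        System.out.println(countByValue(decoded, fromBytes("crowd"), fromBytes("game"), 10)); // prints 1
    }
}
```

If this is what is happening, replacing the == comparisons on word1 and word2 with equals() would be the change to try, and running with pig -x local should show the same 0 result as the cluster.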

-Prashant

On Sat, Mar 3, 2012 at 11:44 PM, Manu <[email protected]> wrote:

> Hi
>
> I wrote the following Eval function which should return an integer
> according to the distance of two words (word1, word2) inside a list of
> words (Groupedwords).
>
> The input to the function is:
> 1.  Groupedwords (a bag of words)
> 2.  Word1
> 3.  Word2
> 4.  Distance (integer)
>
> I created a test with the following input and the result was 1, as
> expected, but when I ran the same code on the cluster I got 0. On the
> cluster I ran many different inputs. Some should return 0 and some
> should return 1. The problem is that on the cluster ALL the inputs
> returned 0.
>
> HOW CAN THAT HAPPEN?
>
> InputDescription:
> GroupedWords::wordvalue: {wordvalue: chararray},
> wordPairs::words::w1: chararray, wordPairs::words::w2: chararray,
> vectorLength: int
>
> InputValues: ({(crowd),(game),(football),(money),(football),(ball),(tennis),(pool),(hall),(team),(manager),(olympics)},crowd,game,10)
>
> The code:
>
> public class AssociationEvaluator extends EvalFunc<Integer>{
>
>     private static int vectorLength = 10; // should be read from configuration
>
>     @Override
>     public Integer exec(Tuple input) throws IOException {
>
>         int result = 0;
>
>         if (input == null || input.size() == 0)
>             return 0;
>
>         try{
>             //1. type validation
>             Object tmp = input.get(0);
>             if (!(tmp instanceof DataBag)) {
>                 throw new IOException("Expected input to be a bag of words, but got " + tmp.getClass().getName());
>             }
>             DataBag words = (DataBag)input.get(0);
>             String word1 = (String)input.get(1);
>             String word2 = (String)input.get(2);
>             vectorLength = (int)input.get(3);
>
>             //2. convert the bag to an array
>             int index=0;
>             String[] wordsarray = new String[(int)words.size()];
>             for (Iterator<Tuple> iterator = words.iterator(); iterator.hasNext();) {
>                 Tuple word = iterator.next(); //word = {type:string, value:string}
>                 wordsarray[index] = word.get(0).toString();
>                 index++;
>             }
>
>             //run on the array and check for associations
>             for (int i = 0; i < wordsarray.length; i++) {
>                 if(wordsarray[i]==null)
>                     break;
>
>                 if (wordsarray[i] == word1)
>                 {
>                     int searchBound = (int) words.size();
>                     if (i + vectorLength <= words.size())
>                         searchBound = i + vectorLength;
>                     for (int j = i; j < searchBound; j++)
>                     {
>                         if (wordsarray[j] == word2)
>                             result++;
>                     }
>                 }
>             }
>
>             return result;
>
>         }catch(Exception e){
>             throw new IOException(e);
>         }
>     }
>
> The test:
>
>     @Test
>     public void testExecTuple() {
>
>         Integer result;
>
>         TupleFactory mTupleFactory = TupleFactory.getInstance();
>         BagFactory mBagFactory = BagFactory.getInstance();
>         Tuple input = mTupleFactory.newTuple(4);
>
>         DataBag wordvalue = mBagFactory.newDefaultBag();
>         wordvalue.add(mTupleFactory.newTuple("crowd"));
>         wordvalue.add(mTupleFactory.newTuple("game"));
>         wordvalue.add(mTupleFactory.newTuple("football"));
>         wordvalue.add(mTupleFactory.newTuple("money"));
>         wordvalue.add(mTupleFactory.newTuple("football"));
>         wordvalue.add(mTupleFactory.newTuple("ball"));
>         wordvalue.add(mTupleFactory.newTuple("tennis"));
>         wordvalue.add(mTupleFactory.newTuple("pool"));
>         wordvalue.add(mTupleFactory.newTuple("hall"));
>         wordvalue.add(mTupleFactory.newTuple("team"));
>         wordvalue.add(mTupleFactory.newTuple("manager"));
>         wordvalue.add(mTupleFactory.newTuple("olympics"));
>
>         try {
>             input.set(0, wordvalue);
>             input.set(1,"crowd");
>             input.set(2, "game");
>             input.set(3, 10);
>         } catch (ExecException e1) {
>             // TODO Auto-generated catch block
>             e1.printStackTrace();
>         }
>
>         AssociationEvaluator eval = new AssociationEvaluator();
>
>         try {
>             result = eval.exec(input);
>             assertNotSame(0, result);
>         }
>         catch (IOException e) {
>             // TODO Auto-generated catch block
>             e.printStackTrace();
>         }
>     }
>
> Manu Cohen-Yashar
> Senior Architect, Cloud Computing and Application Security
> Sela Group
>
> Phone: 972-4-9881203
> Mobile: 972-52-5574551
>
