Can you please tell us how you call the UDF in your script and how you run it on the cluster? Also, have you tried the script in local mode (pig -x local)?
Do the logs say anything?

-Prashant

On Sat, Mar 3, 2012 at 11:44 PM, Manu <[email protected]> wrote:

> Hi
>
> I wrote the following Eval function which should return an integer
> according to the distance of two words (word1, word2) inside a list of
> words (Groupedwords).
>
> The input to the function is:
> 1. Groupedwords (a bag of words)
> 2. Word1
> 3. Word2
> 4. Distance (integer)
>
> I created a test with the following input and the result was 1 as
> expected but when I run the same code on the cluster I got 0. On the
> cluster I run many different inputs. Some should return 0 and some
> should return 1. The problem is that on the cluster ALL the inputs
> returned 0.
>
> HOW CAN THAT HAPPEN?
>
> InputDescription:
> GroupedWords::wordvalue: {wordvalue: chararray},
> wordPairs::words::w1: chararray,wordPairs::words::w2: chararray,
> vectorLength: int
>
> InputValues:{(crowd),(game),(football),(money),(football),(ball),
> (tennis),(pool),(hall),(team),(manager),(olympics)},crowd,game,10)
>
> The code:
>
> public class AssociationEvaluator extends EvalFunc<Integer>{
>
>     private static int vectorLength = 10; // should be read from configuration
>
>     @Override
>     public Integer exec(Tuple input) throws IOException {
>
>         int result = 0;
>
>         if (input == null || input.size() == 0)
>             return 0;
>         try{
>
>             //1. type validation
>             Object tmp = input.get(0);
>             if (!(tmp instanceof DataBag)) {
>                 throw new IOException("Expected input to be a bag of words, but got " + tmp.getClass().getName());
>             }
>             DataBag words = (DataBag)input.get(0);
>             String word1 = (String)input.get(1);
>             String word2 = (String)input.get(2);
>             vectorLength = (int)input.get(3);
>
>             //2. convert the bag to an array
>             int index=0;
>             String[] wordsarray = new String[(int)words.size()];
>             for (Iterator<Tuple> iterator = words.iterator(); iterator.hasNext();) {
>                 Tuple word = iterator.next(); //word = {type:string, value:string}
>                 wordsarray[index] = word.get(0).toString();
>                 index++;
>             }
>
>             //run on the array and check for associations
>             for (int i = 0; i < wordsarray.length; i++) {
>                 if(wordsarray[i]==null)
>                     break;
>
>                 if (wordsarray[i] == word1)
>                 {
>                     int searchBound = (int) words.size();
>                     if (i + vectorLength <= words.size())
>                         searchBound = i + vectorLength;
>                     for (int j = i; j < searchBound; j++)
>                     {
>                         if (wordsarray[j] == word2)
>                             result++;
>                     }
>                 }
>             }
>
>             return result;
>
>         }catch(Exception e){
>             throw new IOException(e);
>         }
>     }
>
> The test:
>
>     @Test
>     public void testExecTuple() {
>
>         Integer result;
>
>         TupleFactory mTupleFactory = TupleFactory.getInstance();
>         BagFactory mBagFactory = BagFactory.getInstance();
>         Tuple input = mTupleFactory.newTuple(4);
>
>         DataBag wordvalue = mBagFactory.newDefaultBag();
>         wordvalue.add(mTupleFactory.newTuple("crowd"));
>         wordvalue.add(mTupleFactory.newTuple("game"));
>         wordvalue.add(mTupleFactory.newTuple("football"));
>         wordvalue.add(mTupleFactory.newTuple("money"));
>         wordvalue.add(mTupleFactory.newTuple("football"));
>         wordvalue.add(mTupleFactory.newTuple("ball"));
>         wordvalue.add(mTupleFactory.newTuple("tennis"));
>         wordvalue.add(mTupleFactory.newTuple("pool"));
>         wordvalue.add(mTupleFactory.newTuple("hall"));
>         wordvalue.add(mTupleFactory.newTuple("team"));
>         wordvalue.add(mTupleFactory.newTuple("manager"));
>         wordvalue.add(mTupleFactory.newTuple("olympics"));
>
>         try {
>             input.set(0, wordvalue);
>             input.set(1,"crowd");
>             input.set(2, "game");
>             input.set(3, 10);
>         } catch (ExecException e1) {
>             // TODO Auto-generated catch block
>             e1.printStackTrace();
>         }
>
>         AssociationEvaluator eval = new AssociationEvaluator();
>
>         try {
>             result = eval.exec(input);
>             assertNotSame(0, result);
>         }
>         catch (IOException e) {
>             // TODO Auto-generated catch block
>             e.printStackTrace();
>         }
>     }
>
> Manu Cohen-Yashar
> Senior Architect, Cloud Computing and Application Security
> Sela Group
>
> Phone: 972-4-9881203
> Mobile: 972-52-5574551
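
One thing that may be worth checking, offered as a guess and not a confirmed diagnosis: the quoted UDF compares words with ==, which in Java tests object identity, not string content. In the unit test every word is a string literal, and Java interns literals, so == happens to succeed there. Strings built at runtime (for example, ones deserialized from input data) are distinct objects, so the same comparison would be false even when the text matches. The sketch below is plain Java with no Pig dependency. The names WordDistance and countAssociations are hypothetical, and it assumes the intent is to count occurrences of word2 inside a window that starts at each occurrence of word1.

```java
import java.util.Arrays;
import java.util.List;

class WordDistance {

    // Hypothetical helper: counts occurrences of word2 in the window of
    // `window` positions starting at each occurrence of word1.
    // Uses equals() so that the comparison is by content, not identity.
    static int countAssociations(List<String> words, String word1, String word2, int window) {
        int result = 0;
        for (int i = 0; i < words.size(); i++) {
            if (!word1.equals(words.get(i))) {
                continue;
            }
            // Same bound as the quoted code: never search past the end of the list.
            int bound = Math.min(words.size(), i + window);
            for (int j = i; j < bound; j++) {
                if (word2.equals(words.get(j))) {
                    result++;
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList(
            "crowd", "game", "football", "money", "football", "ball",
            "tennis", "pool", "hall", "team", "manager", "olympics");

        // new String(...) simulates a value created at runtime (not interned).
        String runtimeWord1 = new String("crowd");
        String runtimeWord2 = new String("game");

        System.out.println(runtimeWord1 == "crowd");      // prints false
        System.out.println(runtimeWord1.equals("crowd")); // prints true
        System.out.println(countAssociations(words, runtimeWord1, runtimeWord2, 10)); // prints 1
    }
}
```

If this is the cause, replacing the two == comparisons in exec with equals() should make the cluster results agree with the unit test. Building the test inputs with new String("crowd") instead of a bare literal would also let the test catch this kind of problem locally.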
