Hi,
I wrote the following Eval function, which should return an integer
based on the distance between two words (word1, word2) inside a list of
words (GroupedWords).
The input to the function is:
1. Groupedwords (a bag of words)
2. Word1
3. Word2
4. Distance (integer)
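As a reference for the intended result, here is a minimal plain-Java sketch with no Pig types. It assumes the function should count how often Word2 appears in the window of Distance positions starting at each occurrence of Word1, which is how the loop in the posted code reads. The class name WindowCountSketch and the method countWithin are illustrative only and are not part of the original code.

```java
import java.util.Arrays;
import java.util.List;

public class WindowCountSketch {

    // Counts occurrences of word2 inside the window of `distance` positions
    // that starts at each occurrence of word1 (window is clipped to the list).
    static int countWithin(List<String> words, String word1, String word2, int distance) {
        int result = 0;
        for (int i = 0; i < words.size(); i++) {
            if (!word1.equals(words.get(i))) {
                continue;
            }
            int bound = Math.min(i + distance, words.size());
            for (int j = i; j < bound; j++) {
                if (word2.equals(words.get(j))) {
                    result++;
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList(
                "crowd", "game", "football", "money", "football", "ball",
                "tennis", "pool", "hall", "team", "manager", "olympics");
        System.out.println(countWithin(words, "crowd", "game", 10)); // prints 1
    }
}
```

With the sample input from this post, "game" sits one position after "crowd", so the expected result for a distance of 10 is 1.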
I created a test with the following input and the result was 1, as
expected, but when I ran the same code on the cluster I got 0. On the
cluster I ran many different inputs. Some should return 0 and some
should return 1. The problem is that on the cluster ALL the inputs
returned 0.
How can that happen?
InputDescription:
GroupedWords::wordvalue: {wordvalue: chararray},
wordPairs::words::w1: chararray, wordPairs::words::w2: chararray,
vectorLength: int
InputValues:
({(crowd),(game),(football),(money),(football),(ball),(tennis),(pool),(hall),(team),(manager),(olympics)},crowd,game,10)
The code:
import java.io.IOException;
import java.util.Iterator;

import org.apache.pig.EvalFunc;
import org.apache.pig.data.DataBag;
import org.apache.pig.data.Tuple;

public class AssociationEvaluator extends EvalFunc<Integer> {

    private static int vectorLength = 10; // should be read from configuration

    @Override
    public Integer exec(Tuple input) throws IOException {
        int result = 0;
        if (input == null || input.size() == 0)
            return 0;
        try {
            // 1. type validation
            Object tmp = input.get(0);
            if (!(tmp instanceof DataBag)) {
                throw new IOException("Expected input to be a bag of words, but got "
                        + tmp.getClass().getName());
            }
            DataBag words = (DataBag) input.get(0);
            String word1 = (String) input.get(1);
            String word2 = (String) input.get(2);
            vectorLength = (int) input.get(3);

            // 2. convert the bag to an array
            int index = 0;
            String[] wordsarray = new String[(int) words.size()];
            for (Iterator<Tuple> iterator = words.iterator(); iterator.hasNext();) {
                Tuple word = iterator.next(); // word = {type:string, value:string}
                wordsarray[index] = word.get(0).toString();
                index++;
            }

            // 3. run on the array and check for associations
            for (int i = 0; i < wordsarray.length; i++) {
                if (wordsarray[i] == null)
                    break;
                if (wordsarray[i] == word1) {
                    // search window: vectorLength positions from i, clipped to the bag size
                    int searchBound = (int) words.size();
                    if (i + vectorLength <= words.size())
                        searchBound = i + vectorLength;
                    for (int j = i; j < searchBound; j++) {
                        if (wordsarray[j] == word2)
                            result++;
                    }
                }
            }
            return result;
        } catch (Exception e) {
            throw new IOException(e);
        }
    }
}
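One thing worth ruling out in the code above: the loop compares strings with ==, which in Java checks whether two references point to the same object, not whether the characters match. In a unit test where every word is a compile-time literal, the JVM interns the literals, so == can succeed by coincidence. Strings built at runtime, for example values decoded from input data, are normally distinct objects even when their contents are equal. The sketch below is plain Java with no Pig dependency, and the class and method names are made up for illustration. It shows the language behaviour only and is not a confirmed diagnosis of the cluster run.

```java
public class StringCompareSketch {

    // Reference comparison, the same kind of check the UDF loop performs.
    static boolean sameReference(String a, String b) {
        return a == b;
    }

    // Content comparison, null-safe on the first argument.
    static boolean sameContent(String a, String b) {
        return a != null && a.equals(b);
    }

    public static void main(String[] args) {
        String literal = "crowd";
        String sameLiteral = "crowd";
        // Built at runtime, standing in for a value decoded from input data.
        String runtime = new String("crowd".toCharArray());

        System.out.println(sameReference(literal, sameLiteral)); // true: both are the interned literal
        System.out.println(sameReference(literal, runtime));     // false: distinct objects
        System.out.println(sameContent(literal, runtime));       // true: same characters
    }
}
```

If this is the cause, comparing with word1.equals(wordsarray[i]) and word2.equals(wordsarray[j]) would behave the same way in the test and on the cluster.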
The test:
@Test
public void testExecTuple() throws IOException {
    TupleFactory mTupleFactory = TupleFactory.getInstance();
    BagFactory mBagFactory = BagFactory.getInstance();

    // Build the bag of words
    DataBag wordvalue = mBagFactory.newDefaultBag();
    wordvalue.add(mTupleFactory.newTuple("crowd"));
    wordvalue.add(mTupleFactory.newTuple("game"));
    wordvalue.add(mTupleFactory.newTuple("football"));
    wordvalue.add(mTupleFactory.newTuple("money"));
    wordvalue.add(mTupleFactory.newTuple("football"));
    wordvalue.add(mTupleFactory.newTuple("ball"));
    wordvalue.add(mTupleFactory.newTuple("tennis"));
    wordvalue.add(mTupleFactory.newTuple("pool"));
    wordvalue.add(mTupleFactory.newTuple("hall"));
    wordvalue.add(mTupleFactory.newTuple("team"));
    wordvalue.add(mTupleFactory.newTuple("manager"));
    wordvalue.add(mTupleFactory.newTuple("olympics"));

    // Input tuple: (bag, word1, word2, distance)
    Tuple input = mTupleFactory.newTuple(4);
    input.set(0, wordvalue);
    input.set(1, "crowd");
    input.set(2, "game");
    input.set(3, 10);

    // Let exceptions fail the test instead of swallowing them,
    // and check for the expected value rather than "not 0".
    AssociationEvaluator eval = new AssociationEvaluator();
    Integer result = eval.exec(input);
    assertEquals(Integer.valueOf(1), result);
}
Manu Cohen-Yashar
Senior Architect, Cloud Computing and Application Security
Sela Group
Phone: 972-4-9881203
Mobile: 972-52-5574551