Hi 

 

I wrote the following EvalFunc, which should return an integer based on
the distance between two words (word1, word2) in a list of words
(GroupedWords): for each occurrence of word1, it counts the occurrences
of word2 within the next vectorLength positions.

 

The inputs to the function are:
1. GroupedWords (a bag of words)
2. Word1
3. Word2
4. Distance (an integer, vectorLength)
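For reference, the intended counting logic, stripped of the Pig types, can be sketched in plain Java as below. This is only a sketch: `Associations.count` is a hypothetical helper name, and it assumes the bag has already been turned into an ordered `List<String>`. It uses the same window as the UDF (positions `i` to `i + distance - 1`, clipped to the end of the list) but compares words with `equals`.

```java
import java.util.Arrays;
import java.util.List;

public class Associations {

    // Hypothetical helper: for every occurrence of word1, count how many
    // times word2 appears in the window [i, i + distance), clipped to the list end.
    public static int count(List<String> words, String word1, String word2, int distance) {
        int result = 0;
        for (int i = 0; i < words.size(); i++) {
            // equals() compares string contents, not object identity
            if (!word1.equals(words.get(i))) {
                continue;
            }
            int searchBound = Math.min(words.size(), i + distance);
            for (int j = i; j < searchBound; j++) {
                if (word2.equals(words.get(j))) {
                    result++;
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("crowd", "game", "football", "money",
                "football", "ball", "tennis", "pool", "hall", "team", "manager", "olympics");
        System.out.println(count(words, "crowd", "game", 10)); // prints 1
    }
}
```

With the sample input, "crowd" is at position 0 and "game" at position 1, which lies inside the window [0, 10), so the count is 1.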

 

I created a unit test with the input below, and the result was 1, as
expected. But when I ran the same code on the cluster, I got 0. On the
cluster I ran many different inputs; some should return 0 and some
should return 1, yet ALL of them returned 0.

 

HOW CAN THAT HAPPEN?  

 

Input description:
GroupedWords::wordvalue: {wordvalue: chararray},
wordPairs::words::w1: chararray, wordPairs::words::w2: chararray,
vectorLength: int

Input values:
({(crowd),(game),(football),(money),(football),(ball),(tennis),(pool),(hall),(team),(manager),(olympics)},crowd,game,10)

 

The code:

 

import java.io.IOException;
import java.util.Iterator;

import org.apache.pig.EvalFunc;
import org.apache.pig.data.DataBag;
import org.apache.pig.data.Tuple;

public class AssociationEvaluator extends EvalFunc<Integer> {

    private static int vectorLength = 10; // should be read from configuration

    @Override
    public Integer exec(Tuple input) throws IOException {

        int result = 0;

        if (input == null || input.size() == 0)
            return 0;

        try {
            // 1. type validation
            Object tmp = input.get(0);
            if (!(tmp instanceof DataBag)) {
                throw new IOException("Expected input to be a bag of words, but got "
                        + tmp.getClass().getName());
            }
            DataBag words = (DataBag) input.get(0);
            String word1 = (String) input.get(1);
            String word2 = (String) input.get(2);
            vectorLength = (int) input.get(3);

            // 2. convert the bag to an array
            int index = 0;
            String[] wordsarray = new String[(int) words.size()];
            for (Iterator<Tuple> iterator = words.iterator(); iterator.hasNext();) {
                Tuple word = iterator.next(); // word = {type:string, value:string}
                wordsarray[index] = word.get(0).toString();
                index++;
            }

            // run on the array and check for associations
            for (int i = 0; i < wordsarray.length; i++) {
                if (wordsarray[i] == null)
                    break;

                if (wordsarray[i] == word1) {
                    int searchBound = (int) words.size();
                    if (i + vectorLength <= words.size())
                        searchBound = i + vectorLength;
                    for (int j = i; j < searchBound; j++) {
                        if (wordsarray[j] == word2)
                            result++;
                    }
                }
            }

            return result;

        } catch (Exception e) {
            throw new IOException(e);
        }
    }
}

 

The test:

 

@Test
public void testExecTuple() {

    Integer result;

    TupleFactory mTupleFactory = TupleFactory.getInstance();
    BagFactory mBagFactory = BagFactory.getInstance();
    Tuple input = mTupleFactory.newTuple(4);

    DataBag wordvalue = mBagFactory.newDefaultBag();
    wordvalue.add(mTupleFactory.newTuple("crowd"));
    wordvalue.add(mTupleFactory.newTuple("game"));
    wordvalue.add(mTupleFactory.newTuple("football"));
    wordvalue.add(mTupleFactory.newTuple("money"));
    wordvalue.add(mTupleFactory.newTuple("football"));
    wordvalue.add(mTupleFactory.newTuple("ball"));
    wordvalue.add(mTupleFactory.newTuple("tennis"));
    wordvalue.add(mTupleFactory.newTuple("pool"));
    wordvalue.add(mTupleFactory.newTuple("hall"));
    wordvalue.add(mTupleFactory.newTuple("team"));
    wordvalue.add(mTupleFactory.newTuple("manager"));
    wordvalue.add(mTupleFactory.newTuple("olympics"));

    try {
        input.set(0, wordvalue);
        input.set(1, "crowd");
        input.set(2, "game");
        input.set(3, 10);
    } catch (ExecException e1) {
        // TODO Auto-generated catch block
        e1.printStackTrace();
    }

    AssociationEvaluator eval = new AssociationEvaluator();

    try {
        result = eval.exec(input);
        assertNotSame(0, result);
    } catch (IOException e) {
        // TODO Auto-generated catch block
        e.printStackTrace();
    }
}
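A possible explanation, offered as a hypothesis rather than a confirmed diagnosis: the UDF compares words with `==`, which in Java tests whether two references point to the same `String` object, not whether their contents match. In the unit test, "crowd" in the bag and "crowd" passed as word1 are both string literals, and Java interns identical literals, so they are the same object and `==` succeeds. On the cluster, the words are presumably deserialized into fresh `String` objects, so `==` would be false for every pair and the result would always be 0. The snippet below simulates that difference; `new String(...)` stands in for a deserialized value.

```java
public class StringIdentityDemo {
    public static void main(String[] args) {
        String literal = "crowd";
        String sameLiteral = "crowd";                       // same interned object as 'literal'
        String rebuilt = new String("crowd".toCharArray()); // equal contents, distinct object

        System.out.println(literal == sameLiteral);  // true: both refer to the interned literal
        System.out.println(literal == rebuilt);      // false: different objects
        System.out.println(literal.equals(rebuilt)); // true: same characters
    }
}
```

If this is the cause, replacing `wordsarray[i] == word1` with `word1.equals(wordsarray[i])` (and likewise for word2) compares contents and should behave the same locally and on the cluster.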

 

 

 

 

 

Manu Cohen-Yashar

Senior Architect, Cloud Computing and Application Security

Sela Group

 

Phone: 972-4-9881203

Mobile: 972-52-5574551

 
