Hi, I am developing a custom mapper that is somewhat similar to the multithreaded mapper that comes with Hadoop, and I am getting weird errors when running it with multiple threads that process multiple input key/value pairs simultaneously. Here is the stack trace. I looked into OpenIntDoubleHashMap, and the error seems to stem from null values stored in its tables:
attempt_201212190955_0004_m_000000_0: java.lang.ArrayIndexOutOfBoundsException: 24
attempt_201212190955_0004_m_000000_0: 	at org.apache.mahout.math.map.OpenIntDoubleHashMap.indexOfKey(OpenIntDoubleHashMap.java:278)
attempt_201212190955_0004_m_000000_0: 	at org.apache.mahout.math.map.OpenIntDoubleHashMap.get(OpenIntDoubleHashMap.java:198)
attempt_201212190955_0004_m_000000_0: 	at org.apache.mahout.math.RandomAccessSparseVector.getQuick(RandomAccessSparseVector.java:130)
attempt_201212190955_0004_m_000000_0: 	at org.apache.mahout.math.AbstractVector.assign(AbstractVector.java:738)
attempt_201212190955_0004_m_000000_0: 	at org.apache.mahout.clustering.AbstractCluster.observe(AbstractCluster.java:263)
attempt_201212190955_0004_m_000000_0: 	at org.apache.mahout.clustering.AbstractCluster.observe(AbstractCluster.java:234)
attempt_201212190955_0004_m_000000_0: 	at org.apache.mahout.clustering.AbstractCluster.observe(AbstractCluster.java:229)
attempt_201212190955_0004_m_000000_0: 	at org.apache.mahout.clustering.AbstractCluster.observe(AbstractCluster.java:37)
attempt_201212190955_0004_m_000000_0: 	at org.apache.mahout.clustering.classify.ClusterClassifier.train(ClusterClassifier.java:158)
attempt_201212190955_0004_m_000000_0: 	at org.apache.mahout.clustering.iterator.CIMapper.map(CIMapper.java:46)
attempt_201212190955_0004_m_000000_0: 	at org.apache.mahout.clustering.iterator.CIMapper.map(CIMapper.java:18)

Does anyone know whether it is inherently thread-safe to pass multiple input key/value pairs to the same mapper simultaneously?

Thanks,
Yunming
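If all threads share one CIMapper instance (and therefore one ClusterClassifier), every concurrent map() call updates the same cluster statistics, which are sparse vectors backed by OpenIntDoubleHashMap. As far as I can tell that map has no internal synchronization, so an ArrayIndexOutOfBoundsException in indexOfKey is consistent with one thread reading its tables while another is resizing them. One common workaround is to give each thread its own accumulator and merge the partial results on a single thread afterwards. The sketch below shows only that pattern: it uses java.util.HashMap as a stand-in for OpenIntDoubleHashMap (neither is thread-safe), and the class and method names (PerThreadAccumulation, addInto, sumInParallel, sampleVectors) are hypothetical, not Mahout or Hadoop API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: per-thread accumulation followed by a single-threaded merge.
// HashMap stands in for OpenIntDoubleHashMap; neither may be mutated by two threads at once.
public class PerThreadAccumulation {

  // Adds every entry of a sparse vector into the running sums (NOT thread-safe by itself).
  static void addInto(Map<Integer, Double> sums, Map<Integer, Double> vector) {
    for (Map.Entry<Integer, Double> e : vector.entrySet()) {
      sums.merge(e.getKey(), e.getValue(), Double::sum);
    }
  }

  // Splits the input round-robin across nThreads workers. Each worker owns a private map,
  // so no map is ever touched by two threads concurrently.
  static Map<Integer, Double> sumInParallel(List<Map<Integer, Double>> vectors, int nThreads)
      throws Exception {
    ExecutorService pool = Executors.newFixedThreadPool(nThreads);
    try {
      List<Future<Map<Integer, Double>>> partials = new ArrayList<>();
      for (int t = 0; t < nThreads; t++) {
        final int start = t;
        partials.add(pool.submit(() -> {
          Map<Integer, Double> local = new HashMap<>();
          for (int i = start; i < vectors.size(); i += nThreads) {
            addInto(local, vectors.get(i));
          }
          return local;
        }));
      }
      // Merge on the calling thread only; Future.get() waits for each worker to finish.
      Map<Integer, Double> total = new HashMap<>();
      for (Future<Map<Integer, Double>> f : partials) {
        addInto(total, f.get());
      }
      return total;
    } finally {
      pool.shutdown();
    }
  }

  // Hypothetical input: 1000 sparse vectors, each with a 1.0 at index i % 24 and a 0.5 at index 100.
  static List<Map<Integer, Double>> sampleVectors() {
    List<Map<Integer, Double>> vectors = new ArrayList<>();
    for (int i = 0; i < 1000; i++) {
      Map<Integer, Double> v = new HashMap<>();
      v.put(i % 24, 1.0);
      v.put(100, 0.5);
      vectors.add(v);
    }
    return vectors;
  }

  public static void main(String[] args) throws Exception {
    Map<Integer, Double> sums = sumInParallel(sampleVectors(), 4);
    System.out.println(sums.get(100)); // prints 500.0
    System.out.println(sums.get(0));   // prints 42.0
  }
}
```

The alternatives are to synchronize around the shared classifier, which serializes the expensive part of map(), or to create one mapper instance per thread so that each thread trains its own classifier. Either way, the per-thread results still have to be combined before they are written out.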
