Hadoop calls each Mapper or Reducer instance from only one thread. Unless you are somehow spawning threads on your own, concurrency should not be an issue. I don't know if this behavior is guaranteed, but it seems to be how it always works.

On Dec 19, 2012 4:03 PM, "Yunming Zhang" wrote:
> Hi,
>
> I am developing a custom mapper that is somewhat similar to the
> multithreaded mapper that comes with Hadoop, and I am getting weird errors
> when running with multiple threads that process multiple input key/value
> pairs simultaneously. Here is the stack trace. I looked into
> OpenIntDoubleHashMap, and the problem seems to stem from null values stored
> in the tables.
>
> attempt_201212190955_0004_m_000000_0: java.lang.ArrayIndexOutOfBoundsException: 24
> attempt_201212190955_0004_m_000000_0: at org.apache.mahout.math.map.OpenIntDoubleHashMap.indexOfKey(OpenIntDoubleHashMap.java:278)
> attempt_201212190955_0004_m_000000_0: at org.apache.mahout.math.map.OpenIntDoubleHashMap.get(OpenIntDoubleHashMap.java:198)
> attempt_201212190955_0004_m_000000_0: at org.apache.mahout.math.RandomAccessSparseVector.getQuick(RandomAccessSparseVector.java:130)
> attempt_201212190955_0004_m_000000_0: at org.apache.mahout.math.AbstractVector.assign(AbstractVector.java:738)
> attempt_201212190955_0004_m_000000_0: at org.apache.mahout.clustering.AbstractCluster.observe(AbstractCluster.java:263)
> attempt_201212190955_0004_m_000000_0: at org.apache.mahout.clustering.AbstractCluster.observe(AbstractCluster.java:234)
> attempt_201212190955_0004_m_000000_0: at org.apache.mahout.clustering.AbstractCluster.observe(AbstractCluster.java:229)
> attempt_201212190955_0004_m_000000_0: at org.apache.mahout.clustering.AbstractCluster.observe(AbstractCluster.java:37)
> attempt_201212190955_0004_m_000000_0: at org.apache.mahout.clustering.classify.ClusterClassifier.train(ClusterClassifier.java:158)
> attempt_201212190955_0004_m_000000_0: at org.apache.mahout.clustering.iterator.CIMapper.map(CIMapper.java:46)
> attempt_201212190955_0004_m_000000_0: at org.apache.mahout.clustering.iterator.CIMapper.map(CIMapper.java:18)
>
> Does anyone know whether it is inherently thread-safe to pass multiple
> input key/value pairs to the mapper simultaneously?
>
> Thanks
>
> Yunming
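If you do spawn your own threads inside one mapper, one way to stay within the "one thread per instance" assumption is to give every thread (or task) its own copy of any unsynchronized state and merge the results on a single thread at the end. Below is a minimal sketch in plain Java with no Hadoop or Mahout dependency. The names HypotheticalMapperState, run and sampleSplits are illustrative only, not Hadoop or Mahout APIs; the HashMap stands in for non-thread-safe state such as the OpenIntDoubleHashMap in the trace, and the "bucket by record % 4" logic is a made-up placeholder for real map work.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class PerThreadMapperSketch {

    // Hypothetical stand-in for mapper-side state backed by a non-thread-safe map
    // (the role OpenIntDoubleHashMap plays in the stack trace). Deliberately
    // unsynchronized, so it must never be shared between threads.
    static final class HypotheticalMapperState {
        private final Map<Integer, Double> sums = new HashMap<>();

        void observe(int key, double value) {
            sums.merge(key, value, Double::sum);
        }

        Map<Integer, Double> sums() {
            return sums;
        }
    }

    // Processes each input split as a separate task. Every task creates its own
    // state object, so no two threads ever touch the same unsynchronized map.
    static Map<Integer, Double> run(List<int[]> splits, int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<HypotheticalMapperState>> futures = new ArrayList<>();
            for (int[] split : splits) {
                Future<HypotheticalMapperState> future = pool.submit(() -> {
                    HypotheticalMapperState state = new HypotheticalMapperState(); // confined to this task
                    for (int record : split) {
                        state.observe(record % 4, 1.0); // hypothetical map logic: bucket records by key
                    }
                    return state;
                });
                futures.add(future);
            }
            // Merge on the calling thread only. Future.get() waits for the task and
            // makes its writes visible here, so reading each task's map is safe.
            Map<Integer, Double> merged = new HashMap<>();
            for (Future<HypotheticalMapperState> future : futures) {
                for (Map.Entry<Integer, Double> e : future.get().sums().entrySet()) {
                    merged.merge(e.getKey(), e.getValue(), Double::sum);
                }
            }
            return merged;
        } finally {
            pool.shutdown();
        }
    }

    // Builds `count` splits of consecutive integers, `size` records each (sample input only).
    static List<int[]> sampleSplits(int count, int size) {
        List<int[]> splits = new ArrayList<>();
        for (int s = 0; s < count; s++) {
            int[] split = new int[size];
            for (int i = 0; i < size; i++) {
                split[i] = s * size + i;
            }
            splits.add(split);
        }
        return splits;
    }

    public static void main(String[] args) throws Exception {
        Map<Integer, Double> result = run(sampleSplits(8, 1000), 4);
        System.out.println(result);
    }
}
```

With 8 splits of 1000 records, each of the four keys ends up with a total of 2000.0. The alternative, wrapping one shared map in locks, would also avoid the corruption but serializes the hot path; confining state per task keeps the workers independent and pays for a single merge at the end.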
