Sean is right: most MR code is not, and does not need to be, thread safe.

Why are you writing a multi-threaded mapper?

On 12/19/2012 07:50 PM, Sean Owen wrote:
Hadoop will only use one thread with one Mapper or Reducer instance. Unless
you are somehow spawning threads on your own, concurrency should not be an
issue. I don't know if this behavior is guaranteed, but it seems to be how it
always works.
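To make the single-thread lifecycle concrete, here is a simplified, hypothetical sketch. It roughly follows the shape of Mapper.run() in Hadoop's new API: setup, one map() call per record, then cleanup, all on the calling thread. RecordSource and WordCountingMapper are made-up stand-ins, not Hadoop classes, so the sketch runs without Hadoop on the classpath. Because a single thread drives the whole loop, the mapper's plain HashMap needs no synchronization.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified stand-in for the Hadoop Mapper lifecycle.
// RecordSource and WordCountingMapper are illustrative names, NOT the Hadoop API.
public class SingleThreadedMapperSketch {

    /** Plays the role of the task's input: hands out one record at a time. */
    static final class RecordSource {
        private final Iterator<String> it;
        private String current;

        RecordSource(List<String> records) {
            this.it = records.iterator();
        }

        boolean nextKeyValue() {
            if (!it.hasNext()) {
                return false;
            }
            current = it.next();
            return true;
        }

        String getCurrentValue() {
            return current;
        }
    }

    /** A mapper with plain, unsynchronized instance state. */
    static class WordCountingMapper {
        // Safe without locks: only the thread that calls run() ever touches it.
        private final Map<String, Integer> counts = new HashMap<>();

        void setup() {
            counts.clear();
        }

        void map(String value) {
            for (String word : value.split("\\s+")) {
                if (!word.isEmpty()) {
                    counts.merge(word, 1, Integer::sum);
                }
            }
        }

        void cleanup() {
            // A real mapper would emit its results here.
        }

        // Roughly the shape of Mapper.run(): setup, one map() per record, cleanup,
        // all executed sequentially on the calling thread.
        void run(RecordSource source) {
            setup();
            while (source.nextKeyValue()) {
                map(source.getCurrentValue());
            }
            cleanup();
        }

        Map<String, Integer> getCounts() {
            return counts;
        }
    }

    public static void main(String[] args) {
        WordCountingMapper mapper = new WordCountingMapper();
        mapper.run(new RecordSource(List.of("a b a", "b c")));
        System.out.println(mapper.getCounts().get("a")); // prints 2
    }
}
```

The point of the sketch is the design, not the word counting: as long as nothing else calls map() concurrently, per-instance state like this is safe without locks.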
On Dec 19, 2012 4:03 PM, "Yunming Zhang" wrote:

Hi,

I am developing a custom mapper that is somewhat similar to the
multithreaded mapper that ships with Hadoop, and I am getting strange errors
when running with multiple threads that process multiple input key/value
pairs simultaneously. Here is the stack trace. I looked into
OpenIntDoubleHashMap, and the problem seems to stem from null values stored
in its tables:

attempt_201212190955_0004_m_000000_0: java.lang.ArrayIndexOutOfBoundsException: 24
attempt_201212190955_0004_m_000000_0:   at org.apache.mahout.math.map.OpenIntDoubleHashMap.indexOfKey(OpenIntDoubleHashMap.java:278)
attempt_201212190955_0004_m_000000_0:   at org.apache.mahout.math.map.OpenIntDoubleHashMap.get(OpenIntDoubleHashMap.java:198)
attempt_201212190955_0004_m_000000_0:   at org.apache.mahout.math.RandomAccessSparseVector.getQuick(RandomAccessSparseVector.java:130)
attempt_201212190955_0004_m_000000_0:   at org.apache.mahout.math.AbstractVector.assign(AbstractVector.java:738)
attempt_201212190955_0004_m_000000_0:   at org.apache.mahout.clustering.AbstractCluster.observe(AbstractCluster.java:263)
attempt_201212190955_0004_m_000000_0:   at org.apache.mahout.clustering.AbstractCluster.observe(AbstractCluster.java:234)
attempt_201212190955_0004_m_000000_0:   at org.apache.mahout.clustering.AbstractCluster.observe(AbstractCluster.java:229)
attempt_201212190955_0004_m_000000_0:   at org.apache.mahout.clustering.AbstractCluster.observe(AbstractCluster.java:37)
attempt_201212190955_0004_m_000000_0:   at org.apache.mahout.clustering.classify.ClusterClassifier.train(ClusterClassifier.java:158)
attempt_201212190955_0004_m_000000_0:   at org.apache.mahout.clustering.iterator.CIMapper.map(CIMapper.java:46)
attempt_201212190955_0004_m_000000_0:   at org.apache.mahout.clustering.iterator.CIMapper.map(CIMapper.java:18)

Does anyone know whether it is inherently thread safe to process multiple
input key/value pairs in the mapper simultaneously?

Thanks

Yunming
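On the question of processing several key/value pairs at once: a common way to make this safe is thread confinement, where each worker thread owns its own copy of the non-thread-safe state and the partial results are merged once all workers have finished. As far as I know, Hadoop's multithreaded mapper takes a similar approach, creating a separate instance of the wrapped mapper per thread and synchronizing only the reading of input and the writing of output. The sketch below is plain Java, not Hadoop or Mahout code. SparseAccumulator is a hypothetical stand-in for per-mapper state such as the cluster sums that CIMapper updates. Whether the real ClusterClassifier state can be split per thread and merged this way is an assumption that depends on Mahout's internals.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentLinkedQueue;

// Sketch of thread confinement: each worker owns a private, non-thread-safe accumulator.
// SparseAccumulator is a HYPOTHETICAL stand-in for per-mapper state; it is not Mahout's API.
public class PerThreadStateSketch {

    /** Non-thread-safe sparse sum, loosely analogous to an int -> double hash map. */
    static final class SparseAccumulator {
        final Map<Integer, Double> sums = new HashMap<>();

        void observe(Map<Integer, Double> vector) {
            vector.forEach((index, value) -> sums.merge(index, value, Double::sum));
        }

        void mergeFrom(SparseAccumulator other) {
            observe(other.sums);
        }
    }

    static Map<Integer, Double> process(List<Map<Integer, Double>> records, int numThreads)
            throws InterruptedException {
        // The only object shared between workers is a thread-safe queue of input records.
        ConcurrentLinkedQueue<Map<Integer, Double>> input = new ConcurrentLinkedQueue<>(records);
        List<SparseAccumulator> partials = new ArrayList<>();
        List<Thread> workers = new ArrayList<>();

        for (int t = 0; t < numThreads; t++) {
            SparseAccumulator local = new SparseAccumulator(); // one instance per thread
            partials.add(local);
            Thread worker = new Thread(() -> {
                Map<Integer, Double> record;
                while ((record = input.poll()) != null) {
                    local.observe(record); // touches only this thread's accumulator
                }
            });
            workers.add(worker);
            worker.start();
        }
        for (Thread worker : workers) {
            worker.join(); // join() makes each worker's writes visible to this thread
        }

        // Merge the partial results on a single thread, after all workers have finished.
        SparseAccumulator total = new SparseAccumulator();
        for (SparseAccumulator partial : partials) {
            total.mergeFrom(partial);
        }
        return total.sums;
    }

    public static void main(String[] args) throws InterruptedException {
        List<Map<Integer, Double>> records = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            records.add(Map.of(i % 24, 1.0));
        }
        Map<Integer, Double> sums = process(records, 4);
        System.out.println(sums.size() + " " + sums.get(0)); // prints "24 42.0"
    }
}
```

The simpler alternative is to share one state object and wrap every access to it in a single lock. That avoids the merge step but serializes the map work, which defeats much of the purpose of running multiple threads.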
