Hi Yunming,

I took a look at the source and concluded that it is not thread-safe code. CIMapper itself seemed okay, but it uses an instance variable of type ClusterClassifier, which is not thread-safe. Maybe that's why ClusterClassifier is in the stack trace :-)
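
To illustrate the point about the shared instance variable, here is a minimal sketch in plain Java. It does not use Mahout. UnsafeAccumulator is a hypothetical stand-in for a stateful object like ClusterClassifier, whose train() mutates fields without locking. The sketch shows one possible fix: every call to the single shared instance goes through a synchronized wrapper, so concurrent map threads cannot interleave inside train(). All class and method names here are illustrative assumptions, not Mahout's API.

```java
import java.util.ArrayList;
import java.util.List;

public class SharedStateDemo {

    // Hypothetical stand-in for a stateful, non-thread-safe object such as
    // ClusterClassifier: train() mutates internal fields without any locking.
    public static class UnsafeAccumulator {
        private long count;
        private double sum;

        void train(double value) {
            count++;       // read-modify-write, not atomic
            sum += value;  // concurrent callers can interleave here and lose updates
        }

        long getCount() { return count; }
        double getSum() { return sum; }
    }

    // One possible fix: serialize every access to the shared instance.
    public static class SynchronizedAccumulator {
        private final UnsafeAccumulator delegate = new UnsafeAccumulator();

        public synchronized void train(double value) { delegate.train(value); }
        public synchronized long getCount() { return delegate.getCount(); }
        public synchronized double getSum() { return delegate.getSum(); }
    }

    // Starts several threads that all train the SAME shared instance,
    // the way several map threads would share one mapper's classifier.
    public static SynchronizedAccumulator run(int threads, int recordsPerThread)
            throws InterruptedException {
        SynchronizedAccumulator shared = new SynchronizedAccumulator();
        List<Thread> workers = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            Thread worker = new Thread(() -> {
                for (int i = 0; i < recordsPerThread; i++) {
                    shared.train(1.0);
                }
            });
            workers.add(worker);
            worker.start();
        }
        for (Thread worker : workers) {
            worker.join(); // wait for every worker before reading the totals
        }
        return shared;
    }

    public static void main(String[] args) throws InterruptedException {
        SynchronizedAccumulator acc = run(8, 100_000);
        System.out.println(acc.getCount() + " " + acc.getSum()); // prints "800000 800000.0"
    }
}
```

Note that this serializes all work that touches the shared object, so if most of map() is spent in train(), the threads gain little over a single-threaded mapper.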

Marty

On 12/19/2012 11:02 AM, Yunming Zhang wrote:
Hi,

I am developing a custom mapper, somewhat similar to the multithreaded mapper 
(MultithreadedMapper) that ships with Hadoop, and I get strange errors when 
multiple threads process multiple input key/value pairs simultaneously. Here is 
the stack trace. I looked into OpenIntDoubleHashMap, and the error seems to stem 
from null values stored in its tables:

attempt_201212190955_0004_m_000000_0: java.lang.ArrayIndexOutOfBoundsException: 24
attempt_201212190955_0004_m_000000_0:   at org.apache.mahout.math.map.OpenIntDoubleHashMap.indexOfKey(OpenIntDoubleHashMap.java:278)
attempt_201212190955_0004_m_000000_0:   at org.apache.mahout.math.map.OpenIntDoubleHashMap.get(OpenIntDoubleHashMap.java:198)
attempt_201212190955_0004_m_000000_0:   at org.apache.mahout.math.RandomAccessSparseVector.getQuick(RandomAccessSparseVector.java:130)
attempt_201212190955_0004_m_000000_0:   at org.apache.mahout.math.AbstractVector.assign(AbstractVector.java:738)
attempt_201212190955_0004_m_000000_0:   at org.apache.mahout.clustering.AbstractCluster.observe(AbstractCluster.java:263)
attempt_201212190955_0004_m_000000_0:   at org.apache.mahout.clustering.AbstractCluster.observe(AbstractCluster.java:234)
attempt_201212190955_0004_m_000000_0:   at org.apache.mahout.clustering.AbstractCluster.observe(AbstractCluster.java:229)
attempt_201212190955_0004_m_000000_0:   at org.apache.mahout.clustering.AbstractCluster.observe(AbstractCluster.java:37)
attempt_201212190955_0004_m_000000_0:   at org.apache.mahout.clustering.classify.ClusterClassifier.train(ClusterClassifier.java:158)
attempt_201212190955_0004_m_000000_0:   at org.apache.mahout.clustering.iterator.CIMapper.map(CIMapper.java:46)
attempt_201212190955_0004_m_000000_0:   at org.apache.mahout.clustering.iterator.CIMapper.map(CIMapper.java:18)

Does anyone know whether it is inherently thread-safe to have the mapper process 
multiple input key/value pairs simultaneously?
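
One common way around this is to avoid sharing mapper state at all: give each worker thread its own mapper instance and combine the partial results afterwards. As far as I know, Hadoop's MultithreadedMapper does something similar by creating a separate instance of the wrapped mapper class for each of its threads. The sketch below shows the pattern in plain Java under that assumption. StatefulMapper and runPartitioned are hypothetical names, and the "splits" are just in-memory arrays rather than Hadoop input.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class PerThreadMapperDemo {

    // Hypothetical mapper that owns mutable state (as CIMapper owns a
    // ClusterClassifier). It is only safe when a single thread uses it.
    public static class StatefulMapper {
        private long recordsSeen;
        private double total;

        void map(double value) {
            recordsSeen++;
            total += value;
        }

        public long getRecordsSeen() { return recordsSeen; }
        public double getTotal() { return total; }
    }

    // Each task builds its own mapper, so no mutable state is shared between
    // threads; the partial results are merged on the calling thread.
    public static StatefulMapper runPartitioned(List<double[]> splits, int threads)
            throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<StatefulMapper>> futures = new ArrayList<>();
            for (double[] split : splits) {
                futures.add(pool.submit(() -> {
                    StatefulMapper mapper = new StatefulMapper(); // one instance per task
                    for (double v : split) {
                        mapper.map(v);
                    }
                    return mapper;
                }));
            }
            StatefulMapper combined = new StatefulMapper();
            for (Future<StatefulMapper> future : futures) {
                // get() waits for the task; its writes are visible after get() returns
                StatefulMapper partial = future.get();
                combined.recordsSeen += partial.recordsSeen;
                combined.total += partial.total;
            }
            return combined;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<double[]> splits = List.of(
                new double[] {1.0, 2.0, 3.0},
                new double[] {4.0, 5.0},
                new double[] {6.0});
        StatefulMapper result = runPartitioned(splits, 3);
        System.out.println(result.getRecordsSeen() + " records, total " + result.getTotal());
    }
}
```

The trade-off is that each thread ends up with its own partial state, so something downstream (a merge step like the loop above, or a reducer) has to combine the per-thread results.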

Thanks

Yunming

