Hi Yunming,
I took a look at the source and concluded that the code is not thread
safe. CIMapper itself seemed okay, but it uses an instance variable of type
ClusterClassifier, which is not thread safe. Maybe that's why
ClusterClassifier is in the stack trace :-)
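If the shared classifier instance is the culprit, one possible workaround is
to serialize every call that mutates it. Below is a minimal sketch of that
idea, not Mahout code: UnsafeModel is a hypothetical stand-in for a
non-thread-safe object like ClusterClassifier, and trainWithLock is a name I
made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class SharedStateSketch {

    // Hypothetical stand-in for a non-thread-safe model (not Mahout's API):
    // train() mutates a plain, unsynchronized HashMap.
    static class UnsafeModel {
        private final Map<Integer, Double> weights = new HashMap<>();

        void train(int key, double value) {
            weights.merge(key, value, Double::sum);
        }

        double total() {
            double sum = 0.0;
            for (double w : weights.values()) {
                sum += w;
            }
            return sum;
        }
    }

    // Starts `threads` workers that each call train() `perThread` times.
    // Every call is guarded by the same lock, so only one thread touches
    // the shared model at a time.
    static double trainWithLock(int threads, int perThread) {
        final UnsafeModel model = new UnsafeModel();
        final Object lock = new Object();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.submit(() -> {
                for (int i = 0; i < perThread; i++) {
                    synchronized (lock) {
                        model.train(i % 16, 1.0);
                    }
                }
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        // Read under the same lock so the workers' writes are visible here.
        synchronized (lock) {
            return model.total();
        }
    }

    public static void main(String[] args) {
        System.out.println(trainWithLock(4, 10_000));
    }
}
```

The lock removes the race but also serializes the training step, so it may
cancel out much of the benefit of multiple threads. Giving each thread its own
model instance and merging the results afterwards is another option, assuming
the model supports merging.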
Marty
On 12/19/2012 11:02 AM, Yunming Zhang wrote:
Hi,
I am developing a custom mapper that is somewhat similar to the multithreaded
mapper that comes with Hadoop, and I am getting weird errors when multiple
threads process multiple input key/value pairs simultaneously.
Here is the stack trace. I looked into OpenIntDoubleHashMap, and the error seems
to stem from null values stored in the tables:
attempt_201212190955_0004_m_000000_0: java.lang.ArrayIndexOutOfBoundsException: 24
attempt_201212190955_0004_m_000000_0: at org.apache.mahout.math.map.OpenIntDoubleHashMap.indexOfKey(OpenIntDoubleHashMap.java:278)
attempt_201212190955_0004_m_000000_0: at org.apache.mahout.math.map.OpenIntDoubleHashMap.get(OpenIntDoubleHashMap.java:198)
attempt_201212190955_0004_m_000000_0: at org.apache.mahout.math.RandomAccessSparseVector.getQuick(RandomAccessSparseVector.java:130)
attempt_201212190955_0004_m_000000_0: at org.apache.mahout.math.AbstractVector.assign(AbstractVector.java:738)
attempt_201212190955_0004_m_000000_0: at org.apache.mahout.clustering.AbstractCluster.observe(AbstractCluster.java:263)
attempt_201212190955_0004_m_000000_0: at org.apache.mahout.clustering.AbstractCluster.observe(AbstractCluster.java:234)
attempt_201212190955_0004_m_000000_0: at org.apache.mahout.clustering.AbstractCluster.observe(AbstractCluster.java:229)
attempt_201212190955_0004_m_000000_0: at org.apache.mahout.clustering.AbstractCluster.observe(AbstractCluster.java:37)
attempt_201212190955_0004_m_000000_0: at org.apache.mahout.clustering.classify.ClusterClassifier.train(ClusterClassifier.java:158)
attempt_201212190955_0004_m_000000_0: at org.apache.mahout.clustering.iterator.CIMapper.map(CIMapper.java:46)
attempt_201212190955_0004_m_000000_0: at org.apache.mahout.clustering.iterator.CIMapper.map(CIMapper.java:18)
Does anyone know whether it is inherently thread safe to pass multiple
input key/value pairs to the mapper simultaneously?
Thanks
Yunming