Thanks Marty, Sean, 

Yeah, I took a look at the source code yesterday and realized that it is not
thread safe either.
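Here is one plausible way an unsynchronized open-addressing map can throw ArrayIndexOutOfBoundsException. This is a guess and has not been checked against Mahout's OpenIntDoubleHashMap source. Suppose the table keeps keys and slot states in parallel arrays, and a writer grows them one field at a time with no lock. A reader on another thread can then see the new, longer key array paired with the old, shorter state array. The toy `TornResizeDemo` below is hypothetical. It replays that interleaving deterministically in a single thread by passing the mismatched arrays in directly.

```java
public class TornResizeDemo {
    static final byte FULL = 1;

    // Toy open-addressing lookup (hypothetical, not Mahout's code). Keys and
    // slot states live in two parallel arrays that must have the same length.
    static boolean contains(int[] keys, byte[] state, int key) {
        int slot = (key & 0x7FFFFFFF) % keys.length;      // slot derived from keys.length
        return state[slot] == FULL && keys[slot] == key;  // assumes state.length == keys.length
    }

    public static void main(String[] args) {
        int[] keys = new int[13];
        byte[] state = new byte[13];
        System.out.println("consistent arrays: " + contains(keys, state, 24));

        // Simulated interleaving: a writer growing the table has published the
        // new keys array but not yet the new state array when an unlocked
        // reader runs the lookup, so it pairs a length-29 array with a length-13 one.
        int[] grownKeys = new int[29];
        try {
            contains(grownKeys, state, 24);
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("torn read: " + e);
        }
    }
}
```

On a real multi-core run the same mismatch would appear only intermittently, which fits errors that show up only when several threads share one map.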

I am working on a high performance mapper that requires making the mapper thread
safe, so that I can exploit the data parallelism that comes from feeding
multiple input <key, val> pairs to a single mapper.
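For this kind of data parallelism, the usual alternative to a thread-safe mapper is thread confinement. Each thread gets its own mapper instance, only the input source is shared, and the per-thread results are merged at the end. As far as I know, Hadoop's MultithreadedMapper works this way. It creates a separate instance of the wrapped mapper class for each thread and synchronizes only reads from the input and writes to the output. The minimal sketch below uses plain Java threads, with a hypothetical `CountingMapper` standing in for a real Hadoop Mapper. It is not Hadoop or Mahout code.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentLinkedQueue;

public class ConfinedMappers {
    // Hypothetical mapper with mutable, non-thread-safe state. Each thread
    // gets its own instance, so this state is never shared between threads.
    static class CountingMapper {
        final Map<String, Integer> counts = new HashMap<>();

        void map(String word) {
            counts.merge(word, 1, Integer::sum);
        }
    }

    public static Map<String, Integer> run(List<String> input, int threads)
            throws InterruptedException {
        // Shared input: a concurrent queue, so handing out records is safe.
        ConcurrentLinkedQueue<String> queue = new ConcurrentLinkedQueue<>(input);
        List<CountingMapper> mappers = new ArrayList<>();
        List<Thread> workers = new ArrayList<>();
        for (int i = 0; i < threads; i++) {
            CountingMapper mapper = new CountingMapper(); // one instance per thread
            mappers.add(mapper);
            Thread t = new Thread(() -> {
                String record;
                while ((record = queue.poll()) != null) {
                    mapper.map(record);
                }
            });
            workers.add(t);
            t.start();
        }
        for (Thread t : workers) {
            t.join(); // join() also makes each worker's writes visible here
        }
        // Merge the per-thread results only after every worker has finished.
        Map<String, Integer> total = new HashMap<>();
        for (CountingMapper m : mappers) {
            m.counts.forEach((k, v) -> total.merge(k, v, Integer::sum));
        }
        return total;
    }

    public static void main(String[] args) throws InterruptedException {
        List<String> input = new ArrayList<>();
        for (int i = 0; i < 10000; i++) {
            input.add(i % 2 == 0 ? "a" : "b");
        }
        System.out.println(run(input, 4));
    }
}
```

The merge step is where algorithm-specific logic would go. For clustering, that could mean combining per-thread sufficient statistics rather than word counts.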
 
I am currently looking into whether there is an easy way to make the
CIMapper implementation thread safe, maybe by making a few key data structures
(such as OpenIntDoubleHashMap) thread safe, hopefully without breaking the
correctness of the algorithm itself.
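There is one caveat with swapping in thread-safe data structures. A thread-safe map makes each individual get or put atomic. An observation that updates many entries and also bumps a count is a compound operation, though, so other threads can still see it half applied. The minimal sketch below uses a hypothetical `LockedAccumulator` backed by a plain `HashMap` rather than OpenIntDoubleHashMap. It guards the whole update with one lock instead.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class LockedAccumulator {
    // Hypothetical stand-in for a cluster's running sum. A plain HashMap is
    // used instead of OpenIntDoubleHashMap and is NOT thread safe on its own.
    private final Map<Integer, Double> sum = new HashMap<>();
    private long count = 0;

    // The whole observation (several read-modify-write steps plus the count
    // update) runs under one lock, so no thread ever sees it half applied.
    public synchronized void observe(Map<Integer, Double> point) {
        for (Map.Entry<Integer, Double> e : point.entrySet()) {
            sum.merge(e.getKey(), e.getValue(), Double::sum);
        }
        count++;
    }

    public synchronized double get(int index) {
        return sum.getOrDefault(index, 0.0);
    }

    public synchronized long count() {
        return count;
    }

    public static void main(String[] args) throws InterruptedException {
        LockedAccumulator acc = new LockedAccumulator();
        ExecutorService pool = Executors.newFixedThreadPool(4);
        for (int t = 0; t < 1000; t++) {
            pool.submit(() -> acc.observe(Map.of(0, 1.0, 24, 2.0)));
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        System.out.println(acc.count() + " " + acc.get(0) + " " + acc.get(24));
    }
}
```

A single lock serializes every observation. Under heavy contention, it may give back much of the speedup that motivated adding threads in the first place.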

Yunming

On Dec 20, 2012, at 9:07 AM, Marty Kube <[email protected]> 
wrote:

> Sean is right; most MR code is not, and does not need to be, thread safe.
> 
> Why are you writing a multi-threaded mapper?
> 
> On 12/19/2012 07:50 PM, Sean Owen wrote:
>> Hadoop will only use one thread with one Mapper or Reducer instance. Unless
>> you are somehow spawning threads on your own, concurrency should not be an
>> issue. I don't know if this behavior is guaranteed, but it seems to be how it
>> always works.
>> On Dec 19, 2012 4:03 PM, "Yunming Zhang" <[email protected]> wrote:
>> 
>>> Hi,
>>> 
>>> I am developing a custom mapper that is somewhat similar to the
>>> multithreaded mapper that comes with Hadoop, and I am getting weird errors
>>> when running multiple threads that process multiple input key/value pairs
>>> simultaneously. Here is the stack trace. I looked into
>>> OpenIntDoubleHashMap, and the error seems to stem from null values stored in
>>> the tables.
>>> 
>>> attempt_201212190955_0004_m_000000_0: java.lang.ArrayIndexOutOfBoundsException: 24
>>> attempt_201212190955_0004_m_000000_0:   at org.apache.mahout.math.map.OpenIntDoubleHashMap.indexOfKey(OpenIntDoubleHashMap.java:278)
>>> attempt_201212190955_0004_m_000000_0:   at org.apache.mahout.math.map.OpenIntDoubleHashMap.get(OpenIntDoubleHashMap.java:198)
>>> attempt_201212190955_0004_m_000000_0:   at org.apache.mahout.math.RandomAccessSparseVector.getQuick(RandomAccessSparseVector.java:130)
>>> attempt_201212190955_0004_m_000000_0:   at org.apache.mahout.math.AbstractVector.assign(AbstractVector.java:738)
>>> attempt_201212190955_0004_m_000000_0:   at org.apache.mahout.clustering.AbstractCluster.observe(AbstractCluster.java:263)
>>> attempt_201212190955_0004_m_000000_0:   at org.apache.mahout.clustering.AbstractCluster.observe(AbstractCluster.java:234)
>>> attempt_201212190955_0004_m_000000_0:   at org.apache.mahout.clustering.AbstractCluster.observe(AbstractCluster.java:229)
>>> attempt_201212190955_0004_m_000000_0:   at org.apache.mahout.clustering.AbstractCluster.observe(AbstractCluster.java:37)
>>> attempt_201212190955_0004_m_000000_0:   at org.apache.mahout.clustering.classify.ClusterClassifier.train(ClusterClassifier.java:158)
>>> attempt_201212190955_0004_m_000000_0:   at org.apache.mahout.clustering.iterator.CIMapper.map(CIMapper.java:46)
>>> attempt_201212190955_0004_m_000000_0:   at org.apache.mahout.clustering.iterator.CIMapper.map(CIMapper.java:18)
>>> 
>>> Does anyone know whether it is inherently thread safe to process
>>> multiple input key/value pairs in the mapper simultaneously?
>>> 
>>> Thanks
>>> 
>>> Yunming
> 
