Re: Is the implementation of CIMapper thread safe ?

Marty Kube Thu, 20 Dec 2012 06:54:53 -0800

Writing thread-safe code is hard. Don't do it unless you have to.
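
For the case where threads really are needed, here is a minimal sketch of one way to sidestep shared mutable state: give each worker thread its own private map and merge the partial results on a single thread afterwards. This is plain Java, not CIMapper code. The class and method names (ThreadConfinedSum, sumByKey) are made up for illustration, and java.util.HashMap stands in for a non-thread-safe structure such as OpenIntDoubleHashMap.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical example class; not part of Hadoop or Mahout.
final class ThreadConfinedSum {

    // Sums values per key using nThreads workers (nThreads must be >= 1).
    // Each worker writes only to its own HashMap, so no map is ever
    // touched by two threads at the same time.
    static Map<Integer, Double> sumByKey(int[] keys, double[] values, int nThreads) {
        ExecutorService pool = Executors.newFixedThreadPool(nThreads);
        try {
            List<Future<Map<Integer, Double>>> parts = new ArrayList<>();
            int chunk = (keys.length + nThreads - 1) / nThreads; // ceiling division
            for (int t = 0; t < nThreads; t++) {
                final int from = Math.min(keys.length, t * chunk);
                final int to = Math.min(keys.length, from + chunk);
                parts.add(pool.submit(() -> {
                    // Thread-confined state: visible to this task only.
                    Map<Integer, Double> local = new HashMap<>();
                    for (int i = from; i < to; i++) {
                        local.merge(keys[i], values[i], Double::sum);
                    }
                    return local;
                }));
            }
            // Merge the partial results on the calling thread only.
            Map<Integer, Double> total = new HashMap<>();
            for (Future<Map<Integer, Double>> part : parts) {
                part.get().forEach((k, v) -> total.merge(k, v, Double::sum));
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```

The trade-off is extra memory for the per-thread maps and a merge step at the end, in exchange for not having to make the underlying data structure thread-safe. Whether a clustering update can be split and merged this way depends on the algorithm, so treat this as a pattern to evaluate rather than a drop-in fix.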

On Dec 20, 2012, at 4:28 AM, Sean Owen <[email protected]> wrote:


> ... but making the implementation thread-safe won't cause it to be used
> by multiple threads. If you want more parallelism, get Hadoop to use
> more mappers by reducing the max input split size. That still does not
> require your mappers to be thread-safe.
> 
> If you mean you are creating your own parallelism in miniature by
> writing a multi-threaded mapper, I wouldn't bother. Just use more
> parallelism via Hadoop.
> 
> On Thu, Dec 20, 2012 at 3:31 AM, Yunming Zhang
> <[email protected]> wrote:
>> Thanks Marty, Sean,
>> 
>> Yeah, I took a look at the source code yesterday and realized as well
>> that it is not thread-safe.
>> 
>> I am working on a high-performance mapper that requires the mapper to be
>> thread-safe, so that I can exploit the data parallelism that comes with
>> feeding multiple input <key, val> pairs to a single mapper.
>> 
>> I am currently looking into whether there is an easy way to make the
>> CIMapper implementation thread-safe, maybe by making a few key data
>> structures thread-safe, like the OpenIntDoubleHashMap. Hopefully this
>> won't break the correctness of the algorithm itself.
>> 
>> Yunming
>> 
>> On Dec 20, 2012, at 9:07 AM, Marty Kube 
>> <[email protected]> wrote:
>> 
>>> Sean is right, most MR code is not and does not need to be thread safe.
>>> 
>>> Why are you writing a multi-threaded mapper?
>>> 
>>> On 12/19/2012 07:50 PM, Sean Owen wrote:
>>>> Hadoop will only use one thread with one Mapper or Reducer instance. Unless
>>>> you are somehow spawning threads on your own, concurrency should not be an
>>>> issue. I don't know if this behavior is guaranteed, but it seems to be how
>>>> it always works.
>>>> On Dec 19, 2012 4:03 PM, "Yunming Zhang" <[email protected]> 
>>>> wrote:
>>>> 
>>>>> Hi ,
>>>>> 
>>>>> I am developing a custom mapper that is somewhat similar to the
>>>>> multithreaded mapper that comes with Hadoop, and I am getting weird errors
>>>>> when multiple threads process multiple input key, value pairs
>>>>> simultaneously. Here is the stack trace. I looked into
>>>>> OpenIntDoubleHashMap, and the error seems to stem from null values stored
>>>>> in the tables:
>>>>> 
>>>>> attempt_201212190955_0004_m_000000_0:
>>>>> java.lang.ArrayIndexOutOfBoundsException: 24
>>>>> attempt_201212190955_0004_m_000000_0:   at
>>>>> org.apache.mahout.math.map.OpenIntDoubleHashMap.indexOfKey(OpenIntDoubleHashMap.java:278)
>>>>> attempt_201212190955_0004_m_000000_0:   at
>>>>> org.apache.mahout.math.map.OpenIntDoubleHashMap.get(OpenIntDoubleHashMap.java:198)
>>>>> attempt_201212190955_0004_m_000000_0:   at
>>>>> org.apache.mahout.math.RandomAccessSparseVector.getQuick(RandomAccessSparseVector.java:130)
>>>>> attempt_201212190955_0004_m_000000_0:   at
>>>>> org.apache.mahout.math.AbstractVector.assign(AbstractVector.java:738)
>>>>> attempt_201212190955_0004_m_000000_0:   at
>>>>> org.apache.mahout.clustering.AbstractCluster.observe(AbstractCluster.java:263)
>>>>> attempt_201212190955_0004_m_000000_0:   at
>>>>> org.apache.mahout.clustering.AbstractCluster.observe(AbstractCluster.java:234)
>>>>> attempt_201212190955_0004_m_000000_0:   at
>>>>> org.apache.mahout.clustering.AbstractCluster.observe(AbstractCluster.java:229)
>>>>> attempt_201212190955_0004_m_000000_0:   at
>>>>> org.apache.mahout.clustering.AbstractCluster.observe(AbstractCluster.java:37)
>>>>> attempt_201212190955_0004_m_000000_0:   at
>>>>> org.apache.mahout.clustering.classify.ClusterClassifier.train(ClusterClassifier.java:158)
>>>>> attempt_201212190955_0004_m_000000_0:   at
>>>>> org.apache.mahout.clustering.iterator.CIMapper.map(CIMapper.java:46)
>>>>> attempt_201212190955_0004_m_000000_0:   at
>>>>> org.apache.mahout.clustering.iterator.CIMapper.map(CIMapper.java:18)
>>>>> 
>>>>> Does anyone know whether it is inherently thread-safe to process
>>>>> multiple input key, val pairs in the mapper simultaneously?
>>>>> 
>>>>> Thanks
>>>>> 
>>>>> Yunming
>>
