After upgrading to 1.8.2, I no longer see hotspots with parallel threads.
Thanks Doug!

On Tue, Jan 23, 2018 at 8:41 PM, Nishanth S <[email protected]> wrote:

> Thanks, Doug. That sounds like it. We are using 1.7.6. I will upgrade our
> version and let everyone know. Thanks for jumping in.
>
> On Jan 23, 2018 5:19 PM, "Doug Cutting" <[email protected]> wrote:
>
>> This sounds like AVRO-1760, fixed since Avro 1.8.0.
>>
>> https://issues.apache.org/jira/browse/AVRO-1760
>>
>> What version of Avro are you using?
>>
>> Doug
>>
>> On Mon, Jan 22, 2018 at 9:45 AM, Nishanth S <[email protected]>
>> wrote:
>>
>>> Hi All,
>>>
>>> We have a process that reads data from a local file share, serializes it,
>>> and writes it to HDFS in Avro format. Currently it runs as a single-threaded
>>> process. When we converted it to a parallel process we got some
>>> performance improvement, but not as much as we wanted. Thread dumps are
>>> pasted below. I am wondering whether I am building the Avro objects
>>> correctly. For every record that is read from the binary file we create an
>>> equivalent Avro object in the format below. Our Avro schema is pretty
>>> big, around 1800 fields, and all of them have default values. After doing
>>> some profiling I could see that the most time-consuming method is
>>> org.apache.avro.generic.GenericData.getDefaultValue(). It is in fact
>>> taking more time than the actual reads/writes. Thanks for taking a
>>> look.
>>>
>>> Parent p = new Parent();
>>> LOGHDR hdr = LOGHDR.newBuilder().build();
>>> MSGHDR msg = MSGHDR.newBuilder().build();
>>> p.setHdr(hdr);
>>> p.setMsg(msg);
>>>
>>> Then all fields in p, and in the nested types that p holds such as
>>> LOGHDR and MSGHDR, are set.
>>>
>>>
>>>
>>>
>>> "pool-6-thread-5" #53 prio=5 os_prio=0 tid=0x00007fad896c7800 nid=0x4328 waiting for monitor entry [0x00007fad52833000]
>>>    java.lang.Thread.State: BLOCKED (on object monitor)
>>>         at java.util.Collections$SynchronizedMap.get(Collections.java:2584)
>>>         - waiting to lock <0x000000066a5e3460> (a java.util.Collections$SynchronizedMap)
>>>         at org.apache.avro.generic.GenericData.getDefaultValue(GenericData.java:981)
>>>         at org.apache.avro.data.RecordBuilderBase.defaultValue(RecordBuilderBase.java:135)
>>>
>>> "pool-6-thread-4" #52 prio=5 os_prio=0 tid=0x00007fad896c6000 nid=0x4327 waiting for monitor entry [0x00007fad52934000]
>>>    java.lang.Thread.State: BLOCKED (on object monitor)
>>>         at java.util.Collections$SynchronizedMap.get(Collections.java:2584)
>>>         - waiting to lock <0x000000066a5e3460> (a java.util.Collections$SynchronizedMap)
>>>         at org.apache.avro.generic.GenericData.getDefaultValue(GenericData.java:981)
>>>         at org.apache.avro.data.RecordBuilderBase.defaultValue(RecordBuilderBase.java:135)
>>>         at com.model.avro.SEGMENT1B$Builder.build(SEGMENT1B.java:4362)
>>>
>>> "pool-6-thread-2" #50 prio=5 os_prio=0 tid=0x00007fad8953a800 nid=0x4325 runnable [0x00007fad52b36000]
>>>    java.lang.Thread.State: RUNNABLE
>>>         at java.util.Collections$SynchronizedMap.get(Collections.java:2584)
>>>         - locked <0x000000066a5e3460> (a java.util.Collections$SynchronizedMap)
>>>         at org.apache.avro.generic.GenericData.getDefaultValue(GenericData.java:981)
>>>
>>>
>>> On Fri, Jan 19, 2018 at 6:04 PM, Nishanth S <[email protected]>
>>> wrote:
>>>
>>>> Hi All,
>>>>
>>>> We have a process that reads data from a local file share, serializes it,
>>>> and writes it to HDFS in Avro format. Currently it runs as a single-threaded
>>>> process. When we converted it to a parallel process we got some
>>>> performance improvement, but not as much as we wanted. Thread dumps show
>>>> that at any time only one thread has access to this method and the others
>>>> are blocked. I am wondering whether I am building the Avro objects correctly.
>>>>
>>>> "pool-6-thread-5" #53 prio=5 os_prio=0 tid=0x00007fad896c7800 nid=0x4328 waiting for monitor entry [0x00007fad52833000]
>>>>    java.lang.Thread.State: BLOCKED (on object monitor)
>>>>         at java.util.Collections$SynchronizedMap.get(Collections.java:2584)
>>>>         - waiting to lock <0x000000066a5e3460> (a java.util.Collections$SynchronizedMap)
>>>>         at org.apache.avro.generic.GenericData.getDefaultValue(GenericData.java:981)
>>>>         at org.apache.avro.data.RecordBuilderBase.defaultValue(RecordBuilderBase.java:135)
>>>>
>>>> "pool-6-thread-4" #52 prio=5 os_prio=0 tid=0x00007fad896c6000 nid=0x4327 waiting for monitor entry [0x00007fad52934000]
>>>>    java.lang.Thread.State: BLOCKED (on object monitor)
>>>>         at java.util.Collections$SynchronizedMap.get(Collections.java:2584)
>>>>         - waiting to lock <0x000000066a5e3460> (a java.util.Collections$SynchronizedMap)
>>>>         at org.apache.avro.generic.GenericData.getDefaultValue(GenericData.java:981)
>>>>         at org.apache.avro.data.RecordBuilderBase.defaultValue(RecordBuilderBase.java:135)
>>>>         at com.model.avro.SEGMENT1B$Builder.build(SEGMENT1B.java:4362)
>>>>
>>>> "pool-6-thread-2" #50 prio=5 os_prio=0 tid=0x00007fad8953a800 nid=0x4325 runnable [0x00007fad52b36000]
>>>>    java.lang.Thread.State: RUNNABLE
>>>>         at java.util.Collections$SynchronizedMap.get(Collections.java:2584)
>>>>         - locked <0x000000066a5e3460> (a java.util.Collections$SynchronizedMap)
>>>>         at org.apache.avro.generic.GenericData.getDefaultValue(GenericData.java:981)
>>>>
>>>>
>>>>
>>>
>>
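
Note for readers of this thread: in the traces above, every worker waits on
the same java.util.Collections$SynchronizedMap monitor inside
GenericData.getDefaultValue(). The sketch below is not Avro code. It is a
minimal, self-contained illustration of that kind of contention, assuming a
shared default-value cache keyed by field name. The class
DefaultCacheContention and its methods keys, fill and runLookups are
hypothetical names made up for this example. The thread does not say how
AVRO-1760 was actually fixed, so the ConcurrentHashMap variant is only an
assumed point of comparison.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.LongAdder;

public class DefaultCacheContention {

    // Hypothetical field names standing in for a wide schema.
    public static String[] keys(int fields) {
        String[] keys = new String[fields];
        for (int i = 0; i < fields; i++) {
            keys[i] = "field" + i;
        }
        return keys;
    }

    // Populate a cache with one placeholder default value per field.
    public static Map<String, Object> fill(Map<String, Object> cache, String[] keys) {
        for (String key : keys) {
            cache.put(key, "default-" + key);
        }
        return cache;
    }

    // Each worker looks up every field's default once per simulated record.
    // Returns the total number of successful lookups across all workers.
    public static long runLookups(Map<String, Object> cache, String[] keys,
                                  int threads, int records) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        LongAdder hits = new LongAdder();
        for (int t = 0; t < threads; t++) {
            pool.execute(() -> {
                long local = 0;
                for (int r = 0; r < records; r++) {
                    for (String key : keys) {
                        if (cache.get(key) != null) {
                            local++;
                        }
                    }
                }
                hits.add(local);
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(5, TimeUnit.MINUTES)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return hits.sum();
    }

    public static void main(String[] args) {
        String[] keys = keys(1800);   // roughly the schema width mentioned in the thread
        int threads = 4;              // arbitrary values chosen for the sketch
        int records = 2000;

        // Every get() on this map takes the same monitor, so readers queue up.
        Map<String, Object> locked =
                fill(Collections.synchronizedMap(new HashMap<String, Object>()), keys);
        // Reads on a ConcurrentHashMap do not take a map-wide lock.
        Map<String, Object> concurrent =
                fill(new ConcurrentHashMap<String, Object>(), keys);

        long start = System.nanoTime();
        long lockedHits = runLookups(locked, keys, threads, records);
        long lockedMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);

        start = System.nanoTime();
        long concurrentHits = runLookups(concurrent, keys, threads, records);
        long concurrentMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);

        System.out.println("synchronizedMap:   " + lockedHits + " lookups in " + lockedMs + " ms");
        System.out.println("ConcurrentHashMap: " + concurrentHits + " lookups in " + concurrentMs + " ms");
    }
}
```

Taking a thread dump while the synchronizedMap phase runs should show workers
in the BLOCKED state on a single monitor, much like the traces quoted above.
Timings depend on the hardware and the JVM, so none are quoted here.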
