Thanks Doug. That sounds like it. We are using 1.7.6. I will upgrade our
version and let everyone know. Thanks for jumping in.
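
For anyone who finds this thread later: the thread dumps below show every
worker queuing on the same java.util.Collections$SynchronizedMap monitor
inside GenericData.getDefaultValue(). Here is a minimal sketch of that kind
of contention using only the JDK. The class and method names
(DefaultCacheSketch, filled, runLookups) are hypothetical, the
ConcurrentHashMap is only a stand-in for a cache whose reads do not take a
single shared lock, and I have not checked how Avro 1.8.0 actually fixed
this, so treat it as an illustration rather than Avro's implementation.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in for a shared default-value cache; this is not Avro code.
public class DefaultCacheSketch {
    static final int FIELDS = 1800; // schema size mentioned in the thread

    // Pre-populate a cache with one entry per field.
    static Map<String, Object> filled(Map<String, Object> cache) {
        for (int i = 0; i < FIELDS; i++) {
            cache.put("field" + i, i);
        }
        return cache;
    }

    // Each worker looks up every field `rounds` times.
    // Returns the total number of successful lookups across all workers.
    static long runLookups(Map<String, Object> cache, int threads, int rounds) {
        String[] keys = new String[FIELDS];
        for (int i = 0; i < FIELDS; i++) {
            keys[i] = "field" + i;
        }
        AtomicLong hits = new AtomicLong();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.submit(() -> {
                long local = 0;
                for (int r = 0; r < rounds; r++) {
                    for (String key : keys) {
                        // With a synchronized map, every get() takes the same monitor.
                        if (cache.get(key) != null) {
                            local++;
                        }
                    }
                }
                hits.addAndGet(local);
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(5, TimeUnit.MINUTES)) {
                throw new IllegalStateException("lookups timed out");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return hits.get();
    }

    // Run the same workload against one cache and print the elapsed time.
    static void time(String label, Map<String, Object> cache) {
        long start = System.nanoTime();
        long hits = runLookups(cache, 8, 2000);
        long millis = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
        System.out.println(label + ": " + hits + " lookups in " + millis + " ms");
    }

    public static void main(String[] args) {
        time("synchronizedMap", filled(Collections.synchronizedMap(new HashMap<>())));
        time("ConcurrentHashMap", filled(new ConcurrentHashMap<>()));
    }
}
```

Both maps return the same results; the difference is only in how the
threads wait for each other. Timings depend on the machine and JVM, so I am
not quoting any numbers here. A thread dump taken while the first run is
active should look much like the ones in this thread.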

On Jan 23, 2018 5:19 PM, "Doug Cutting" <[email protected]> wrote:

> This sounds like AVRO-1760, fixed since Avro 1.8.0.
>
> https://issues.apache.org/jira/browse/AVRO-1760
>
> What version of Avro are you using?
>
> Doug
>
> On Mon, Jan 22, 2018 at 9:45 AM, Nishanth S <[email protected]>
> wrote:
>
>> Hi All,
>>
>> We have a process that reads data from a local file share, serializes it,
>> and writes it to HDFS in Avro format. Currently it runs as a
>> single-threaded process. When we converted it to a parallel process we got
>> some performance improvement, but not as much as we wanted. Thread dumps are
>> pasted below. I am just wondering if I am building the Avro objects
>> correctly. For every record that is read from the binary file we create an
>> equivalent Avro object in the format below. Our Avro schema is pretty
>> big, around 1800 fields, and all of those have default values. After doing
>> some profiling I could see that the most time-consuming method is
>> org.apache.avro.generic.GenericData.getDefaultValue(). This is in fact
>> taking more time than doing the actual reads/writes. Thanks for taking a
>> look.
>>
>> Parent p = new Parent();
>> LOGHDR hdr = LOGHDR.newBuilder().build();
>> MSGHDR msg = MSGHDR.newBuilder().build();
>> p.setHdr(hdr);
>> p.setMsg(msg);
>>
>> Then all fields in p, and in the nested types that p holds (such as
>> LOGHDR and MSGHDR), are set.
>>
>>
>>
>>
>> "pool-6-thread-5" #53 prio=5 os_prio=0 tid=0x00007fad896c7800 nid=0x4328 waiting for monitor entry [0x00007fad52833000]
>>    java.lang.Thread.State: BLOCKED (on object monitor)
>>         at java.util.Collections$SynchronizedMap.get(Collections.java:2584)
>>         - waiting to lock <0x000000066a5e3460> (a java.util.Collections$SynchronizedMap)
>>         at org.apache.avro.generic.GenericData.getDefaultValue(GenericData.java:981)
>>         at org.apache.avro.data.RecordBuilderBase.defaultValue(RecordBuilderBase.java:135)
>>
>> "pool-6-thread-4" #52 prio=5 os_prio=0 tid=0x00007fad896c6000 nid=0x4327 waiting for monitor entry [0x00007fad52934000]
>>    java.lang.Thread.State: BLOCKED (on object monitor)
>>         at java.util.Collections$SynchronizedMap.get(Collections.java:2584)
>>         - waiting to lock <0x000000066a5e3460> (a java.util.Collections$SynchronizedMap)
>>         at org.apache.avro.generic.GenericData.getDefaultValue(GenericData.java:981)
>>         at org.apache.avro.data.RecordBuilderBase.defaultValue(RecordBuilderBase.java:135)
>>         at com.model.avro.SEGMENT1B$Builder.build(SEGMENT1B.java:4362)
>>
>> "pool-6-thread-2" #50 prio=5 os_prio=0 tid=0x00007fad8953a800 nid=0x4325 runnable [0x00007fad52b36000]
>>    java.lang.Thread.State: RUNNABLE
>>         at java.util.Collections$SynchronizedMap.get(Collections.java:2584)
>>         - locked <0x000000066a5e3460> (a java.util.Collections$SynchronizedMap)
>>         at org.apache.avro.generic.GenericData.getDefaultValue(GenericData.java:981)
>>
>>
>> On Fri, Jan 19, 2018 at 6:04 PM, Nishanth S <[email protected]>
>> wrote:
>>
>>> Hi All,
>>>
>>> We have a process that reads data from a local file share, serializes it,
>>> and writes it to HDFS in Avro format. Currently it runs as a
>>> single-threaded process. When we converted it to a parallel process we got
>>> some performance improvement, but not as much as we wanted. Thread dumps
>>> show that at any time only one thread has access to this method and the
>>> others are blocked. I am just wondering if I am building the Avro objects
>>> correctly.
>>>
>>> "pool-6-thread-5" #53 prio=5 os_prio=0 tid=0x00007fad896c7800 nid=0x4328 waiting for monitor entry [0x00007fad52833000]
>>>    java.lang.Thread.State: BLOCKED (on object monitor)
>>>         at java.util.Collections$SynchronizedMap.get(Collections.java:2584)
>>>         - waiting to lock <0x000000066a5e3460> (a java.util.Collections$SynchronizedMap)
>>>         at org.apache.avro.generic.GenericData.getDefaultValue(GenericData.java:981)
>>>         at org.apache.avro.data.RecordBuilderBase.defaultValue(RecordBuilderBase.java:135)
>>>
>>> "pool-6-thread-4" #52 prio=5 os_prio=0 tid=0x00007fad896c6000 nid=0x4327 waiting for monitor entry [0x00007fad52934000]
>>>    java.lang.Thread.State: BLOCKED (on object monitor)
>>>         at java.util.Collections$SynchronizedMap.get(Collections.java:2584)
>>>         - waiting to lock <0x000000066a5e3460> (a java.util.Collections$SynchronizedMap)
>>>         at org.apache.avro.generic.GenericData.getDefaultValue(GenericData.java:981)
>>>         at org.apache.avro.data.RecordBuilderBase.defaultValue(RecordBuilderBase.java:135)
>>>         at com.model.avro.SEGMENT1B$Builder.build(SEGMENT1B.java:4362)
>>>
>>> "pool-6-thread-2" #50 prio=5 os_prio=0 tid=0x00007fad8953a800 nid=0x4325 runnable [0x00007fad52b36000]
>>>    java.lang.Thread.State: RUNNABLE
>>>         at java.util.Collections$SynchronizedMap.get(Collections.java:2584)
>>>         - locked <0x000000066a5e3460> (a java.util.Collections$SynchronizedMap)
>>>         at org.apache.avro.generic.GenericData.getDefaultValue(GenericData.java:981)
>>>
>>>
>>>
>>
>
