After upgrading to Avro 1.8.2 I no longer see the hotspots with parallel threads. Thanks, Doug!
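For anyone who finds this thread later, here is a minimal sketch of the kind of contention the quoted thread dumps show: many threads calling get() on one java.util.Collections$SynchronizedMap, which Doug identified as AVRO-1760. This is not Avro's code. The class name, the "fieldN" keys, and the thread and round counts are all made up for illustration, and the ConcurrentHashMap is only a stand-in for a cache whose reads do not share one monitor; it is not necessarily how Avro fixed the issue.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.LongAdder;

// Illustrative only: none of these names come from Avro.
class DefaultCacheSketch {

    // Fills a map with one cached "default value" per hypothetical field name.
    static Map<String, Object> fill(Map<String, Object> cache, int fields) {
        for (int i = 0; i < fields; i++) {
            cache.put("field" + i, Integer.valueOf(i));
        }
        return cache;
    }

    // Every get() on this map takes the same monitor, like the
    // SynchronizedMap in the thread dumps.
    static Map<String, Object> newSynchronizedCache(int fields) {
        return Collections.synchronizedMap(fill(new HashMap<>(), fields));
    }

    // Reads on a ConcurrentHashMap do not block each other.
    static Map<String, Object> newConcurrentCache(int fields) {
        return fill(new ConcurrentHashMap<>(), fields);
    }

    // Runs `threads` workers; each looks up every field `rounds` times.
    // Returns the total number of successful lookups.
    static long countLookups(Map<String, Object> cache, int fields, int threads, int rounds) {
        String[] keys = new String[fields];
        for (int i = 0; i < fields; i++) {
            keys[i] = "field" + i;
        }
        LongAdder hits = new LongAdder();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.execute(() -> {
                for (int r = 0; r < rounds; r++) {
                    for (String key : keys) {
                        if (cache.get(key) != null) {
                            hits.increment();
                        }
                    }
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(5, TimeUnit.MINUTES)) {
                throw new IllegalStateException("workers did not finish");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return hits.sum();
    }

    public static void main(String[] args) {
        int fields = 1800, threads = 8, rounds = 2000; // arbitrary sizes
        for (boolean concurrent : new boolean[] {false, true}) {
            Map<String, Object> cache =
                concurrent ? newConcurrentCache(fields) : newSynchronizedCache(fields);
            long start = System.nanoTime();
            long hits = countLookups(cache, fields, threads, rounds);
            long ms = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
            System.out.println((concurrent ? "concurrent" : "synchronized")
                + ": " + hits + " lookups in " + ms + " ms");
        }
    }
}
```

Running main prints one timing line per map type. The numbers depend on the machine and JVM, so treat them as a rough indication only. A thread dump taken during the synchronized run should show workers BLOCKED in SynchronizedMap.get, much like the dumps below.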
On Tue, Jan 23, 2018 at 8:41 PM, Nishanth S <[email protected]> wrote:

> Thanks Doug, that sounds like it. We are using 1.7.6. I will upgrade our
> version and let everyone know. Thanks for jumping in.
>
> On Jan 23, 2018 5:19 PM, "Doug Cutting" <[email protected]> wrote:
>
>> This sounds like AVRO-1760, fixed since Avro 1.8.0.
>>
>> https://issues.apache.org/jira/browse/AVRO-1760
>>
>> What version of Avro are you using?
>>
>> Doug
>>
>> On Mon, Jan 22, 2018 at 9:45 AM, Nishanth S <[email protected]>
>> wrote:
>>
>>> Hi All,
>>>
>>> We have a process that reads data from a local file share, serializes
>>> it, and writes it to HDFS in Avro format. Currently it runs as a
>>> single-threaded process. When we converted it to a parallel process we
>>> got some performance improvement, but not as much as desired. Thread
>>> dumps are pasted below. I am just wondering if I am building the Avro
>>> objects correctly. For every record that is read from the binary file
>>> we create an equivalent Avro object in the format below. Our Avro
>>> schema is pretty big, around 1800 fields, and all of those have default
>>> values. After doing some profiling I could see that the most
>>> time-consuming method is
>>> org.apache.avro.generic.GenericData.getDefaultValue(). This is in fact
>>> taking more time than doing the actual reads/writes. Thanks for taking
>>> a look.
>>>
>>>     Parent p = new Parent();
>>>     LOGHDR hdr = LOGHDR.newBuilder().build();
>>>     MSGHDR msg = MSGHDR.newBuilder().build();
>>>     p.setHdr(hdr);
>>>     p.setMsg(msg);
>>>
>>> Then all fields in p and all the nested types that p holds, such as
>>> LOGHDR and MSGHDR, are set.
>>>
>>> "pool-6-thread-5" #53 prio=5 os_prio=0 tid=0x00007fad896c7800 nid=0x4328 waiting for monitor entry [0x00007fad52833000]
>>>    java.lang.Thread.State: BLOCKED (on object monitor)
>>>         at java.util.Collections$SynchronizedMap.get(Collections.java:2584)
>>>         - waiting to lock <0x000000066a5e3460> (a java.util.Collections$SynchronizedMap)
>>>         at org.apache.avro.generic.GenericData.getDefaultValue(GenericData.java:981)
>>>         at org.apache.avro.data.RecordBuilderBase.defaultValue(RecordBuilderBase.java:135)
>>>
>>> "pool-6-thread-4" #52 prio=5 os_prio=0 tid=0x00007fad896c6000 nid=0x4327 waiting for monitor entry [0x00007fad52934000]
>>>    java.lang.Thread.State: BLOCKED (on object monitor)
>>>         at java.util.Collections$SynchronizedMap.get(Collections.java:2584)
>>>         - waiting to lock <0x000000066a5e3460> (a java.util.Collections$SynchronizedMap)
>>>         at org.apache.avro.generic.GenericData.getDefaultValue(GenericData.java:981)
>>>         at org.apache.avro.data.RecordBuilderBase.defaultValue(RecordBuilderBase.java:135)
>>>         at com.model.avro.SEGMENT1B$Builder.build(SEGMENT1B.java:4362)
>>>
>>> "pool-6-thread-2" #50 prio=5 os_prio=0 tid=0x00007fad8953a800 nid=0x4325 runnable [0x00007fad52b36000]
>>>    java.lang.Thread.State: RUNNABLE
>>>         at java.util.Collections$SynchronizedMap.get(Collections.java:2584)
>>>         - locked <0x000000066a5e3460> (a java.util.Collections$SynchronizedMap)
>>>         at org.apache.avro.generic.GenericData.getDefaultValue(GenericData.java:981)
>>>
>>> On Fri, Jan 19, 2018 at 6:04 PM, Nishanth S <[email protected]>
>>> wrote:
>>>
>>>> Hi All,
>>>>
>>>> We have a process that reads data from a local file share, serializes
>>>> it, and writes it to HDFS in Avro format. Currently it runs as a
>>>> single-threaded process. When we converted it to a parallel process we
>>>> got some performance improvement, but not as much as desired. Thread
>>>> dumps show that at any time only one thread has access to this method
>>>> and the others are blocked. I am just wondering if I am building the
>>>> Avro objects correctly.
>>>>
>>>> "pool-6-thread-5" #53 prio=5 os_prio=0 tid=0x00007fad896c7800 nid=0x4328 waiting for monitor entry [0x00007fad52833000]
>>>>    java.lang.Thread.State: BLOCKED (on object monitor)
>>>>         at java.util.Collections$SynchronizedMap.get(Collections.java:2584)
>>>>         - waiting to lock <0x000000066a5e3460> (a java.util.Collections$SynchronizedMap)
>>>>         at org.apache.avro.generic.GenericData.getDefaultValue(GenericData.java:981)
>>>>         at org.apache.avro.data.RecordBuilderBase.defaultValue(RecordBuilderBase.java:135)
>>>>
>>>> "pool-6-thread-4" #52 prio=5 os_prio=0 tid=0x00007fad896c6000 nid=0x4327 waiting for monitor entry [0x00007fad52934000]
>>>>    java.lang.Thread.State: BLOCKED (on object monitor)
>>>>         at java.util.Collections$SynchronizedMap.get(Collections.java:2584)
>>>>         - waiting to lock <0x000000066a5e3460> (a java.util.Collections$SynchronizedMap)
>>>>         at org.apache.avro.generic.GenericData.getDefaultValue(GenericData.java:981)
>>>>         at org.apache.avro.data.RecordBuilderBase.defaultValue(RecordBuilderBase.java:135)
>>>>         at com.model.avro.SEGMENT1B$Builder.build(SEGMENT1B.java:4362)
>>>>
>>>> "pool-6-thread-2" #50 prio=5 os_prio=0 tid=0x00007fad8953a800 nid=0x4325 runnable [0x00007fad52b36000]
>>>>    java.lang.Thread.State: RUNNABLE
>>>>         at java.util.Collections$SynchronizedMap.get(Collections.java:2584)
>>>>         - locked <0x000000066a5e3460> (a java.util.Collections$SynchronizedMap)
>>>>         at org.apache.avro.generic.GenericData.getDefaultValue(GenericData.java:981)
