Hi All,
We have a process that reads data from a local file share, serializes it, and
writes it to HDFS in Avro format. Currently it runs as a single-threaded
process. When we converted it to a parallel process we did get some
performance improvement, but not as much as we wanted. Thread dumps are pasted
below. I am wondering whether I am building the Avro objects correctly. For
every record that is read from the binary file we create an
equivalent Avro object as shown below. Our Avro schema is pretty
big, around 1800 fields, and all of them have default values. After doing
some profiling I could see that the most time-consuming method
is org.apache.avro.generic.GenericData.getDefaultValue(). This is in fact
taking more time than the actual reads and writes. Thanks for taking a
look.
Parent p = new Parent();
LOGHDR hdr = LOGHDR.newBuilder().build();
MSGHDR msg = MSGHDR.newBuilder().build();
p.setHdr(hdr);
p.setMsg(msg);
Then all fields in p, and in all the nested types that p holds such as
LOGHDR and MSGHDR, are set.

"pool-6-thread-5" #53 prio=5 os_prio=0 tid=0x00007fad896c7800 nid=0x4328 waiting for monitor entry [0x00007fad52833000]
   java.lang.Thread.State: BLOCKED (on object monitor)
        at java.util.Collections$SynchronizedMap.get(Collections.java:2584)
        - waiting to lock <0x000000066a5e3460> (a java.util.Collections$SynchronizedMap)
        at org.apache.avro.generic.GenericData.getDefaultValue(GenericData.java:981)
        at org.apache.avro.data.RecordBuilderBase.defaultValue(RecordBuilderBase.java:135)

"pool-6-thread-4" #52 prio=5 os_prio=0 tid=0x00007fad896c6000 nid=0x4327 waiting for monitor entry [0x00007fad52934000]
   java.lang.Thread.State: BLOCKED (on object monitor)
        at java.util.Collections$SynchronizedMap.get(Collections.java:2584)
        - waiting to lock <0x000000066a5e3460> (a java.util.Collections$SynchronizedMap)
        at org.apache.avro.generic.GenericData.getDefaultValue(GenericData.java:981)
        at org.apache.avro.data.RecordBuilderBase.defaultValue(RecordBuilderBase.java:135)
        at com.model.avro.SEGMENT1B$Builder.build(SEGMENT1B.java:4362)

"pool-6-thread-2" #50 prio=5 os_prio=0 tid=0x00007fad8953a800 nid=0x4325 runnable [0x00007fad52b36000]
   java.lang.Thread.State: RUNNABLE
        at java.util.Collections$SynchronizedMap.get(Collections.java:2584)
        - locked <0x000000066a5e3460> (a java.util.Collections$SynchronizedMap)
        at org.apache.avro.generic.GenericData.getDefaultValue(GenericData.java:981)
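
To make the pattern in these dumps concrete: all three threads are inside Collections$SynchronizedMap.get on the same lock address (0x000000066a5e3460), one holding it and two waiting. Below is a minimal, JDK-only sketch of that shape. It does not use Avro at all. The class name, the integer keys, and the idea of "one lookup per field per build" are my assumptions for illustration, not something taken from the Avro source.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in, NOT Avro code: models worker threads that each
// perform one synchronized map lookup per schema field for every record.
public class DefaultLookupContention {
    static final int FIELD_COUNT = 1800; // roughly the schema size from the post

    // One shared synchronized map: every get() takes the same monitor.
    static Map<Integer, Object> newSharedCache() {
        Map<Integer, Object> cache = Collections.synchronizedMap(new HashMap<>());
        for (int i = 0; i < FIELD_COUNT; i++) {
            cache.put(i, "default-" + i);
        }
        return cache;
    }

    // Simulates building one record: one lookup per field.
    // Returns how many lookups found a value.
    static long buildOneRecord(Map<Integer, Object> cache) {
        long lookups = 0;
        for (int i = 0; i < FIELD_COUNT; i++) {
            if (cache.get(i) != null) {
                lookups++;
            }
        }
        return lookups;
    }

    // Runs the simulated build on several threads against ONE shared cache
    // and returns the total number of lookups that went through its lock.
    static long runWorkers(int threads, int recordsPerThread) {
        Map<Integer, Object> cache = newSharedCache();
        AtomicLong total = new AtomicLong();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.submit(() -> {
                for (int r = 0; r < recordsPerThread; r++) {
                    total.addAndGet(buildOneRecord(cache));
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return total.get();
    }

    public static void main(String[] args) {
        // 4 threads x 100 records x 1800 fields = 720000 lookups on one lock
        System.out.println("synchronized lookups: " + runWorkers(4, 100));
    }
}
```

Taking a thread dump while this runs with a larger record count should show a similar picture: one thread RUNNABLE inside get and the others BLOCKED on the same monitor. It only reproduces the locking shape and says nothing about how expensive the real lookup is.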
On Fri, Jan 19, 2018 at 6:04 PM, Nishanth S <[email protected]> wrote:
> Hi All,
>
> We have a process that reads data from a local file share, serializes it,
> and writes it to HDFS in Avro format. Currently it runs as a single-threaded
> process. When we converted it to a parallel process we did get some
> performance improvement, but not as much as we wanted. Thread dumps show
> that at any time only one thread has access to this method and the others
> are blocked. I am wondering whether I am building the Avro objects correctly.