Re: Is AvroCoder the right coder for me?

2019-04-08 Thread Augusto Ribeiro
Hi Ryan, Thanks for the input. When I last ran my pipeline, this problem didn't seem to be a huge bottleneck; I probably had other issues that were making it worse. I still think it is weird that when you take a thread dump "snapshot", most of the methods are waiting on that lock so

Re: Is AvroCoder the right coder for me?

2019-04-04 Thread Ryan Skraba
Hello Augusto! I just took a look. The behaviour you're seeing appears to come from Avro's ReflectData: to avoid expensive reflection calls on every serialization/deserialization, it uses a per-class cache, AND access to that cache is synchronized [1]. Only one thread in your executor JVM is
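The effect Ryan describes (a per-class cache whose access is synchronized) can be modelled without Beam or Avro. The sketch below is a hypothetical, stdlib-only illustration, not Avro's ReflectData source: `expensiveSchema` stands in for reflective schema derivation, the class and method names are invented for illustration, and the 16 threads mirror the 16-core executor from the thread dump. The ConcurrentHashMap variant is a common alternative pattern, not something the thread says Avro uses.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

class CacheContention {
    // Counts how often the hypothetical expensive step runs (once per cache per type).
    static final AtomicInteger builds = new AtomicInteger();

    // Hypothetical stand-in for reflective schema derivation.
    static String expensiveSchema(Class<?> type) {
        builds.incrementAndGet();
        return "schema-of-" + type.getName();
    }

    // One lock guards the whole cache, so even cache hits from different
    // threads execute one at a time.
    static final class SynchronizedCache {
        private final Map<Class<?>, String> map = new HashMap<>();

        synchronized String get(Class<?> type) {
            return map.computeIfAbsent(type, CacheContention::expensiveSchema);
        }
    }

    // Hits are plain ConcurrentHashMap reads that take no shared lock; only a
    // miss falls through to the atomic computeIfAbsent.
    static final class ConcurrentCache {
        private final Map<Class<?>, String> map = new ConcurrentHashMap<>();

        String get(Class<?> type) {
            String cached = map.get(type);
            return cached != null ? cached : map.computeIfAbsent(type, CacheContention::expensiveSchema);
        }
    }

    // Calls the lookup from 16 threads and returns the elapsed wall-clock time in ms.
    static long hammer(Function<Class<?>, String> lookup) {
        ExecutorService pool = Executors.newFixedThreadPool(16);
        long start = System.nanoTime();
        for (int t = 0; t < 16; t++) {
            pool.submit(() -> {
                for (int i = 0; i < 200_000; i++) {
                    lookup.apply(String.class);
                }
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        SynchronizedCache syncCache = new SynchronizedCache();
        ConcurrentCache concCache = new ConcurrentCache();
        System.out.println("synchronized: " + hammer(syncCache::get) + " ms");
        System.out.println("concurrent:   " + hammer(concCache::get) + " ms");
        System.out.println("expensive builds: " + builds.get());
    }
}
```

Timings depend on the JVM and hardware, so no particular numbers are implied. The point of the comparison is that the synchronized variant serializes every lookup, including cache hits, which would be consistent with a thread dump showing most threads blocked on a single method.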

Re: Is AvroCoder the right coder for me?

2019-04-02 Thread Maximilian Michels
Hey Augusto, I haven't used @DefaultCoder, but it could be the problem here. What if you specify the coder directly for your PCollection? For example: pCol.setCoder(AvroCoder.of(YourClazz.class)); Thanks, Max On 01.04.19 17:52, Augusto Ribeiro wrote: Hi Max, I tried to run the job again

Re: Is AvroCoder the right coder for me?

2019-04-01 Thread Augusto Ribeiro
Hi Max, I tried to run the job again in a cluster. This is a thread dump from one of the Spark executors (16 cores): https://imgur.com/u2Gz0xY As you can see, almost all threads are blocked on that single Avro reflection method. Best regards, Augusto On

Re: Is AvroCoder the right coder for me?

2019-03-27 Thread Augusto Ribeiro
Hi Max, Thanks for the answer. I will give it another try after I have sorted out some other things. I will try to save more data next time (screenshots, thread dumps) so that, if it happens again, I can be more specific in my questions. Best regards, Augusto On 2019/03/26 12:31:54, Maximilian

Re: Is AvroCoder the right coder for me?

2019-03-26 Thread Maximilian Michels
Hi Augusto, Generally speaking, Avro should provide very good performance. The calls you are seeing should not be significant because Avro caches the schema information for a type: it only creates a schema via reflection the first time it sees a new type. You can optimize further by using
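Max's point, that reflection runs only the first time a type is seen, is ordinary per-type memoization. A minimal stdlib-only sketch, assuming a toy "schema" built from a class's declared fields; `LazySchemaCache`, `buildSchema` and `schemaFor` are hypothetical names, not Avro APIs.

```java
import java.lang.reflect.Field;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

class LazySchemaCache {
    // Counts how often the (hypothetical) reflective derivation actually runs.
    static final AtomicInteger reflectiveBuilds = new AtomicInteger();

    private static final Map<Class<?>, String> CACHE = new ConcurrentHashMap<>();

    // Hypothetical stand-in for reflective schema derivation: walks the declared fields.
    static String buildSchema(Class<?> type) {
        reflectiveBuilds.incrementAndGet();
        StringBuilder sb = new StringBuilder(type.getSimpleName()).append('{');
        for (Field f : type.getDeclaredFields()) {
            sb.append(f.getName()).append(':').append(f.getType().getSimpleName()).append(';');
        }
        return sb.append('}').toString();
    }

    // The first lookup for a type pays for reflection; later lookups reuse the cached result.
    static String schemaFor(Class<?> type) {
        return CACHE.computeIfAbsent(type, LazySchemaCache::buildSchema);
    }

    static class Event { String id; long count; }
    static class Session { String user; }

    public static void main(String[] args) {
        for (int i = 0; i < 1_000; i++) {
            schemaFor(Event.class);
            schemaFor(Session.class);
        }
        // Two distinct types, so reflection ran twice despite 2,000 lookups.
        System.out.println("reflective builds: " + reflectiveBuilds.get());
    }
}
```

This shows why cached lookups are cheap in principle; whether the lookup itself is contended depends on how the cache is guarded, which is the issue raised later in the thread.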

Is AvroCoder the right coder for me?

2019-03-21 Thread augusto . mcc
Hi, I am trying out Beam to do some data aggregations. Many of the inputs/outputs of my transforms are complex objects (not super complex, but sometimes containing Maps/Lists/Sets), so when I was prompted to define a coder for these objects I added the annotation @DefaultCoder(AvroCoder.class)