Hey Augusto,

I haven't used @DefaultCoder, but it could be the problem here.

What if you specify the coder directly for your PCollection? For example:

  pCol.setCoder(AvroCoder.of(YourClazz.class));


Thanks,
Max

On 01.04.19 17:52, Augusto Ribeiro wrote:
Hi Max,

I tried to run the job again in a cluster, this is a thread dump from one of the Spark executors (16 cores)

https://imgur.com/u2Gz0xY

As you can see, almost all threads are blocked on that single Avro reflection method.

Best regards,
Augusto


On 2019/03/27 07:43:17, Augusto Ribeiro <a...@gmail.com <http://gmail.com>> wrote:
 > Hi Max,>
 >
> Thanks for the answer I will give it another try after I sorted out some other things. I will try to save more data next time (screenshots, thread dumps) so that if it happens again I will be more specific in my questions.>
 >
 > Best regards,>
 > Augusto>
 >
> On 2019/03/26 12:31:54, Maximilian Michels <m....@apache.org <http://apache.org>> wrote: >
 > > Hi Augusto,> >
 > > >
> > Generally speaking Avro should provide very good performance. The calls > > > > you are seeing should not be significant because Avro caches the schema > > > > information for a type. It only creates a schema via Reflection the > >
 > > first time it sees a new type.> >
 > > >
> > You can optimize further by using your domain knowledge and create a > > > > custom coder. However, if you do not do anything fancy, I think the odds > >
 > > are low that you will see a performance increase.> >
 > > >
 > > Cheers,> >
 > > Max> >
 > > >
 > > On 26.03.19 09:35, Augusto Ribeiro wrote:> >
 > > > Hi again,> >
 > > > > >
> > > Sorry for bumping this thread but nobody really came with insight.> >
 > > > > >
> > > Should I be defining my own coders for my objects or is it common practice to use the AvroCoder or maybe some other coder?> >
 > > > > >
 > > > Best regards,> >
 > > > Augusto> >
 > > > > >
> > > On 2019/03/21 07:35:07, au...@gmail.com <http://gmail.com> <a....@gmail.com <http://gmail.com>> wrote:> >
 > > >> Hi>> >
 > > >>> >
> > >> I am trying out Beam to do some data aggregations. Many of the inputs/outputs of my transforms are complex objects (not super complex, but containing Maps/Lists/Sets sometimes) so when I was prompted to defined a coder to these objects I added the annotation @DefaultCoder(AvroCoder.class) and things worked in my development environment.>> >
 > > >>> >
> > >> Now that I am trying to run in on "real" data I notice that after I deployed it to a spark runner and looking at some thread dumps, many of the threads were blocked on the following method on the Avro library (ReflectData.getAccessorsFor). So my question is, did I do the wrong thing by using the AvroCoder or is there some other coder that easily can solve my problem?>> >
 > > >>> >
 > > >> Best regards,>> >
 > > >> Augusto>> >
 > > >>> >
 > > >>> >
 > > >>> >
 > > >

Reply via email to