Hey Augusto,
I haven't used @DefaultCoder, but it could be the problem here.
What if you specify the coder directly for your PCollection? For example:
pCol.setCoder(AvroCoder.of(YourClazz.class));
Thanks,
Max
On 01.04.19 17:52, Augusto Ribeiro wrote:
Hi Max,
I tried to run the job again in a cluster, this is a thread dump from
one of the Spark executors (16 cores)
https://imgur.com/u2Gz0xY
As you can see, almost all threads are blocked on that single Avro
reflection method.
Best regards,
Augusto
On 2019/03/27 07:43:17, Augusto Ribeiro <a...@gmail.com
<http://gmail.com>> wrote:
> Hi Max,>
>
> Thanks for the answer I will give it another try after I sorted out
some other things. I will try to save more data next time (screenshots,
thread dumps) so that if it happens again I will be more specific in my
questions.>
>
> Best regards,>
> Augusto>
>
> On 2019/03/26 12:31:54, Maximilian Michels <m....@apache.org
<http://apache.org>> wrote: >
> > Hi Augusto,> >
> > >
> > Generally speaking Avro should provide very good performance. The
calls > >
> > you are seeing should not be significant because Avro caches the
schema > >
> > information for a type. It only creates a schema via Reflection the
> >
> > first time it sees a new type.> >
> > >
> > You can optimize further by using your domain knowledge and create
a > >
> > custom coder. However, if you do not do anything fancy, I think the
odds > >
> > are low that you will see a performance increase.> >
> > >
> > Cheers,> >
> > Max> >
> > >
> > On 26.03.19 09:35, Augusto Ribeiro wrote:> >
> > > Hi again,> >
> > > > >
> > > Sorry for bumping this thread but nobody really came with
insight.> >
> > > > >
> > > Should I be defining my own coders for my objects or is it common
practice to use the AvroCoder or maybe some other coder?> >
> > > > >
> > > Best regards,> >
> > > Augusto> >
> > > > >
> > > On 2019/03/21 07:35:07, au...@gmail.com <http://gmail.com>
<a....@gmail.com <http://gmail.com>> wrote:> >
> > >> Hi>> >
> > >>> >
> > >> I am trying out Beam to do some data aggregations. Many of the
inputs/outputs of my transforms are complex objects (not super complex,
but containing Maps/Lists/Sets sometimes) so when I was prompted to
defined a coder to these objects I added the annotation
@DefaultCoder(AvroCoder.class) and things worked in my development
environment.>> >
> > >>> >
> > >> Now that I am trying to run in on "real" data I notice that
after I deployed it to a spark runner and looking at some thread dumps,
many of the threads were blocked on the following method on the Avro
library (ReflectData.getAccessorsFor). So my question is, did I do the
wrong thing by using the AvroCoder or is there some other coder that
easily can solve my problem?>> >
> > >>> >
> > >> Best regards,>> >
> > >> Augusto>> >
> > >>> >
> > >>> >
> > >>> >
> > >