Hi Max,

I ran the job again on a cluster. This is a thread dump from one of
the Spark executors (16 cores):

https://imgur.com/u2Gz0xY

As you can see, almost all threads are blocked on that single Avro reflection
method (ReflectData.getAccessorsFor).

Best regards,
Augusto


On 2019/03/27 07:43:17, Augusto Ribeiro <[email protected]> wrote: 
> Hi Max,
>
> Thanks for the answer. I will give it another try after I have sorted out
> some other things. Next time I will save more data (screenshots, thread
> dumps) so that, if it happens again, I can be more specific in my questions.
>
> Best regards,
> Augusto
> 
> On 2019/03/26 12:31:54, Maximilian Michels <[email protected]> wrote:
> > Hi Augusto,
> >
> > Generally speaking, Avro should provide very good performance. The calls
> > you are seeing should not be significant because Avro caches the schema
> > information for a type. It only creates a schema via reflection the
> > first time it sees a new type.
> >
> > You can optimize further by using your domain knowledge to create a
> > custom coder. However, if you do not do anything fancy, I think the odds
> > are low that you will see a performance increase.
> >
> > Cheers,
> > Max
> >
> > On 26.03.19 09:35, Augusto Ribeiro wrote:
> > > Hi again,
> > >
> > > Sorry for bumping this thread, but nobody has offered any insight yet.
> > >
> > > Should I be defining my own coders for my objects, or is it common
> > > practice to use the AvroCoder or maybe some other coder?
> > >
> > > Best regards,
> > > Augusto
> > >
> > > On 2019/03/21 07:35:07, [email protected] <[email protected]> wrote:
> > >> Hi
> > >>
> > >> I am trying out Beam to do some data aggregations. Many of the
> > >> inputs/outputs of my transforms are complex objects (not super complex,
> > >> but sometimes containing Maps/Lists/Sets), so when I was prompted to
> > >> define a coder for these objects I added the annotation
> > >> @DefaultCoder(AvroCoder.class) and things worked in my development
> > >> environment.
> > >>
> > >> Now that I am trying to run it on "real" data, I notice that after
> > >> deploying it to a Spark runner and looking at some thread dumps, many of
> > >> the threads are blocked on the following method in the Avro library
> > >> (ReflectData.getAccessorsFor). So my question is: did I do the wrong
> > >> thing by using the AvroCoder, or is there some other coder that can
> > >> easily solve my problem?
> > >>
> > >> Best regards,
> > >> Augusto
> > >>
