Hi Max, I tried to run the job again in a cluster, this is a thread dump from one of the Spark executors (16 cores)
https://imgur.com/u2Gz0xY <https://imgur.com/u2Gz0xY> As you can see, almost all threads are blocked on that single Avro reflection method. Best regards, Augusto On 2019/03/27 07:43:17, Augusto Ribeiro <[email protected]> wrote: > Hi Max,> > > Thanks for the answer I will give it another try after I sorted out some > other things. I will try to save more data next time (screenshots, thread > dumps) so that if it happens again I will be more specific in my questions.> > > Best regards,> > Augusto> > > On 2019/03/26 12:31:54, Maximilian Michels <[email protected]> wrote: > > > Hi Augusto,> > > > > > > Generally speaking Avro should provide very good performance. The calls > > > > you are seeing should not be significant because Avro caches the schema > > > > information for a type. It only creates a schema via Reflection the > > > > first time it sees a new type.> > > > > > > You can optimize further by using your domain knowledge and create a > > > > custom coder. However, if you do not do anything fancy, I think the odds > > > > > > are low that you will see a performance increase.> > > > > > > Cheers,> > > > Max> > > > > > > On 26.03.19 09:35, Augusto Ribeiro wrote:> > > > > Hi again,> > > > > > > > > > Sorry for bumping this thread but nobody really came with insight.> > > > > > > > > > Should I be defining my own coders for my objects or is it common > > > practice to use the AvroCoder or maybe some other coder?> > > > > > > > > > Best regards,> > > > > Augusto> > > > > > > > > > On 2019/03/21 07:35:07, [email protected] <[email protected]> wrote:> > > > >> Hi>> > > > >>> > > > >> I am trying out Beam to do some data aggregations. Many of the > > >> inputs/outputs of my transforms are complex objects (not super complex, > > >> but containing Maps/Lists/Sets sometimes) so when I was prompted to > > >> defined a coder to these objects I added the annotation > > >> @DefaultCoder(AvroCoder.class) and things worked in my development > > >> environment.>> > > > >>> > > > >> Now that I am trying to run in on "real" data I notice that after I > > >> deployed it to a spark runner and looking at some thread dumps, many of > > >> the threads were blocked on the following method on the Avro library > > >> (ReflectData.getAccessorsFor). So my question is, did I do the wrong > > >> thing by using the AvroCoder or is there some other coder that easily > > >> can solve my problem?>> > > > >>> > > > >> Best regards,>> > > > >> Augusto>> > > > >>> > > > >>> > > > >>> > > > >
