Hi Augusto,
Generally speaking, Avro should provide very good performance. The calls
you are seeing should not be significant, because Avro caches the schema
information for each type. It only creates a schema via reflection the
first time it sees a new type.
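To illustrate the "reflect once per type, then reuse" idea, here is a minimal sketch in Java. It is not Avro's actual implementation; the class ReflectionCacheSketch and the sample type Aggregate are hypothetical and exist only for this example. The reflective field scan runs once per class, and every later lookup is served from the cache:

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;

public class ReflectionCacheSketch {
    // One entry per class: the reflective scan only runs on the first request for a type.
    private static final ConcurrentMap<Class<?>, List<Field>> FIELDS_BY_TYPE =
            new ConcurrentHashMap<>();

    // Counts how many times the expensive reflective scan actually ran.
    static final AtomicInteger REFLECTION_CALLS = new AtomicInteger();

    static List<Field> fieldsFor(Class<?> type) {
        return FIELDS_BY_TYPE.computeIfAbsent(type, ReflectionCacheSketch::scanFields);
    }

    private static List<Field> scanFields(Class<?> type) {
        REFLECTION_CALLS.incrementAndGet();
        List<Field> result = new ArrayList<>();
        for (Field f : type.getDeclaredFields()) {
            // In this sketch, static and transient fields are not treated as record fields.
            int mods = f.getModifiers();
            if (!Modifier.isStatic(mods) && !Modifier.isTransient(mods)) {
                result.add(f);
            }
        }
        return Collections.unmodifiableList(result);
    }

    // Hypothetical domain class used only for this illustration.
    static class Aggregate {
        String key;
        long count;
        transient int scratch;
    }

    public static void main(String[] args) {
        for (int i = 0; i < 1000; i++) {
            fieldsFor(Aggregate.class);
        }
        // prints "2 fields, 1 reflective scan(s)"
        System.out.println(fieldsFor(Aggregate.class).size() + " fields, "
                + REFLECTION_CALLS.get() + " reflective scan(s)");
    }
}
```

After the first call, the per-element cost is a map lookup rather than a reflective scan, which is why the one-time reflection should not dominate a long-running job.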
You can optimize further by using your domain knowledge to create a
custom coder. However, unless you are doing something fancy, I think the
odds are low that you will see a performance increase.
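A custom coder mostly comes down to writing an object's fields in a fixed order and reading them back in the same order. Below is a minimal, dependency-free sketch of that logic for a hypothetical PageStats class with a list field and a map field; the class, its fields, and the helper names are made up for illustration. In Beam, the two static methods would roughly become the bodies of encode(T, OutputStream) and decode(InputStream) in a subclass of CustomCoder<T>, which could then be attached with PCollection.setCoder(...). The sketch assumes non-null fields and strings short enough for writeUTF (at most 65,535 encoded bytes).

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

public class PageStatsCodecSketch {

    // Hypothetical domain object standing in for a record with Lists/Maps inside.
    static final class PageStats {
        final String page;
        final List<String> tags;
        final Map<String, Long> countsByCountry;

        PageStats(String page, List<String> tags, Map<String, Long> countsByCountry) {
            this.page = page;
            this.tags = tags;
            this.countsByCountry = countsByCountry;
        }

        @Override public boolean equals(Object o) {
            if (!(o instanceof PageStats)) return false;
            PageStats other = (PageStats) o;
            return page.equals(other.page) && tags.equals(other.tags)
                    && countsByCountry.equals(other.countsByCountry);
        }

        @Override public int hashCode() {
            return Objects.hash(page, tags, countsByCountry);
        }
    }

    // Field order is fixed by hand, so nothing is looked up reflectively per element.
    static void encode(PageStats value, OutputStream out) throws IOException {
        DataOutputStream data = new DataOutputStream(out); // not closed: the caller owns 'out'
        data.writeUTF(value.page);
        data.writeInt(value.tags.size()); // length prefix for the list
        for (String tag : value.tags) {
            data.writeUTF(tag);
        }
        data.writeInt(value.countsByCountry.size()); // length prefix for the map
        for (Map.Entry<String, Long> e : value.countsByCountry.entrySet()) {
            data.writeUTF(e.getKey());
            data.writeLong(e.getValue());
        }
        data.flush();
    }

    static PageStats decode(InputStream in) throws IOException {
        DataInputStream data = new DataInputStream(in); // unbuffered: reads only this record's bytes
        String page = data.readUTF();
        int tagCount = data.readInt();
        List<String> tags = new ArrayList<>(tagCount);
        for (int i = 0; i < tagCount; i++) {
            tags.add(data.readUTF());
        }
        int countryCount = data.readInt();
        Map<String, Long> counts = new LinkedHashMap<>();
        for (int i = 0; i < countryCount; i++) {
            counts.put(data.readUTF(), data.readLong());
        }
        return new PageStats(page, tags, counts);
    }

    // Convenience helper: encode to an in-memory buffer and decode it back.
    static PageStats roundTrip(PageStats value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            encode(value, bytes);
            return decode(new ByteArrayInputStream(bytes.toByteArray()));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Map<String, Long> counts = new LinkedHashMap<>();
        counts.put("BR", 42L);
        counts.put("DE", 7L);
        PageStats original = new PageStats("/home", Arrays.asList("landing", "public"), counts);
        System.out.println(roundTrip(original).equals(original)); // prints "true"
    }
}
```

The trade-off is maintenance: the coder must be updated whenever the class changes. Also, if such objects are used as grouping keys, Beam expects a deterministic encoding, so a HashMap field would need its entries written in a stable order (for example, sorted by key).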
Cheers,
Max
On 26.03.19 09:35, Augusto Ribeiro wrote:
Hi again,
Sorry for bumping this thread, but nobody has really come up with any insight.
Should I be defining my own coders for my objects, or is it common practice to
use AvroCoder or maybe some other coder?
Best regards,
Augusto
On 2019/03/21 07:35:07, Augusto Ribeiro wrote:
Hi,
I am trying out Beam to do some data aggregations. Many of the inputs/outputs of
my transforms are complex objects (not super complex, but sometimes containing
Maps/Lists/Sets), so when I was prompted to define a coder for these objects I
added the annotation @DefaultCoder(AvroCoder.class), and things worked in my
development environment.
Now that I am trying to run it on "real" data, I have noticed, after deploying it to
a Spark runner and looking at some thread dumps, that many of the threads were
blocked on the following method in the Avro library: ReflectData.getAccessorsFor.
So my question is: did I do the wrong thing by using AvroCoder, or is there some
other coder that can easily solve my problem?
Best regards,
Augusto