Let us know what you find, especially if the serializer needs to be more defensive to ensure proper caching.
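
To make the identity-map concern quoted below concrete, here is a minimal sketch in plain Java. FakeSchema and registryCalls are hypothetical stand-ins I made up for illustration; this is not the actual Avro Schema or CachedSchemaRegistryClient code. It only shows that equal-but-distinct key objects always miss in an IdentityHashMap, whereas a regular HashMap hits after the first lookup.

```java
import java.util.HashMap;
import java.util.IdentityHashMap;
import java.util.Map;

public class IdentityCacheDemo {

    /** Hypothetical stand-in for a schema: equal by content, like two parses of the same schema text. */
    public static final class FakeSchema {
        final String json;

        FakeSchema(String json) {
            this.json = json;
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof FakeSchema && ((FakeSchema) o).json.equals(json);
        }

        @Override
        public int hashCode() {
            return json.hashCode();
        }
    }

    /**
     * Looks up `lookups` freshly created (equal, but not identical) schemas in the
     * given cache and returns how many times the simulated registry had to be called.
     */
    public static int registryCalls(Map<FakeSchema, Integer> cache, int lookups) {
        int calls = 0;
        for (int i = 0; i < lookups; i++) {
            // A new instance each time, as if the schema were re-parsed per tuple.
            FakeSchema schema = new FakeSchema("{\"type\":\"string\"}");
            if (!cache.containsKey(schema)) {
                calls++;              // simulated remote register() call
                cache.put(schema, 1); // assumed id, for illustration only
            }
        }
        return calls;
    }

    public static void main(String[] args) {
        System.out.println("IdentityHashMap registry calls: "
                + registryCalls(new IdentityHashMap<>(), 3));
        System.out.println("HashMap registry calls: "
                + registryCalls(new HashMap<>(), 3));
    }
}
```

If the serializer ends up handing a new Schema instance to the client for each tuple, every call would behave like the IdentityHashMap case here. Reusing a single Schema instance per schema (or keeping an equality-based map in front of the client) is one way the serializer could be more defensive, but I have not verified that this is what is happening in your topology.
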

On Tue, Sep 6, 2016 at 8:45 AM Kristopher Kane <[email protected]> wrote:

> Come to think of it, I did see RestUtils rank somewhat higher in the
> visualvm CPU profiler but did not give it the attention it deserved.
>
> On Tue, Sep 6, 2016 at 9:39 AM, Aaron Niskodé-Dossett <[email protected]>
> wrote:
>
>> Hi Kris,
>>
>> One possibility is that the Serializer isn't actually caching the schema
>> <-> id mappings and is hitting the schema registry every time. The call to
>> register() in getFingerprint() [1] in particular can be finicky since the
>> cache is ultimately in an IDENTITY hash map, not a regular old hashmap [2].
>> I'm familiar with the Avro deserializer you're using and thought it
>> accounted for this, but perhaps not.
>>
>> You could add timing information to the getFingerprint() and getSchema()
>> calls in ConfluentAvroSerializer. If the results indicate cache misses,
>> that's probably your culprit.
>>
>> Best, Aaron
>>
>> [1]
>> https://github.com/apache/storm/blob/master/external/storm-hdfs/src/main/java/org/apache/storm/hdfs/avro/ConfluentAvroSerializer.java#L66
>> [2]
>> https://github.com/confluentinc/schema-registry/blob/v1.0/client/src/main/java/io/confluent/kafka/schemaregistry/client/CachedSchemaRegistryClient.java#L79
>>
>> On Tue, Sep 6, 2016 at 7:40 AM Kristopher Kane <[email protected]>
>> wrote:
>>
>>> Hi everyone.
>>>
>>> I have a simple topology that uses the Avro serializer (
>>> https://github.com/apache/storm/blob/master/external/storm-hdfs/src/main/java/org/apache/storm/hdfs/avro/ConfluentAvroSerializer.java)
>>> and writes to Elasticsearch.
>>>
>>> The topology is like this:
>>>
>>> Kafka (raw scheme) -> Avro deserializer -> Elasticsearch
>>>
>>> This topology runs well with one worker. However, once I add one more
>>> worker (total of two) and change nothing else, the topology throughput
>>> drops and tuples start timing out.
>>>
>>> I've attached visualvm/jstatd to the workers when in multi-worker mode,
>>> and added some JMX configs to the worker opts, but I am unable to see
>>> anything glaring.
>>>
>>> I've never seen Storm act this way, but I have also never worked with a
>>> custom serializer, so I assume that it is the culprit, though I cannot
>>> explain why.
>>>
>>> Any pointers would be appreciated.
>>>
>>> Kris
