poor Perl vs. Java Thrift performance in Cassandra

Ted Zlatanov Mon, 10 May 2010 08:50:38 -0700

Apologies if this has been discussed before but I didn't see it in the
archives.


I see poor performance from any Perl code talking to Cassandra compared
to Java.  Using the raw Thrift API, Perl is typically 5-20x slower,
depending on the number of structures that need to be
serialized/deserialized.  This is with Perl 5.10 vs. the latest Sun JVM.

I maintain the Net::Cassandra::Easy Perl module that uses this interface
so I'd like to make it faster.  I think any performance improvements
would be good for all Thrift users so I am posting here in the hopes of
getting some feedback.

It seems to me that one of the problems is the large number of OO method
calls, which in Perl are slower than plain function calls.  Another is
that pack()/unpack() is probably the fastest way to serialize/deserialize
data in Perl, but the Thrift Perl code doesn't use it much.  Instead I
see step-by-step accumulation of values from the source data, which is
suboptimal.  In Java the step-by-step approach makes perfect sense, but
in Perl it drags performance down.
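
To make the contrast concrete, here is a minimal sketch of the two styles.
It is not the actual Thrift Perl code: the StepWriter class, the record
layout (i32, i16, i32) and the function names are made up for illustration,
and field headers are left out to keep it short.  It only shows that one
pack() call with a combined format produces the same bytes as a chain of
per-field method calls.

```perl
use strict;
use warnings;

# Hypothetical record: (id: i32, flags: i16, count: i32), big-endian.
# Field headers and the stop byte are omitted to keep the sketch short.

# Step-by-step style: one method call and one pack() per field.
package StepWriter;
sub new      { my $class = shift; return bless { buf => '' }, $class }
sub writeI32 { my ($self, $v) = @_; $self->{buf} .= pack('l>', $v); return 4 }
sub writeI16 { my ($self, $v) = @_; $self->{buf} .= pack('s>', $v); return 2 }
sub buffer   { return $_[0]{buf} }

package main;

sub encode_stepwise {
    my ($id, $flags, $count) = @_;
    my $w = StepWriter->new;
    $w->writeI32($id);
    $w->writeI16($flags);
    $w->writeI32($count);
    return $w->buffer;
}

# Single-call style: one pack() with a combined format string.
sub encode_packed {
    return pack('l> s> l>', @_);
}

# Matching decoder: one unpack() returns all three fields as a list.
sub decode_packed {
    return unpack('l> s> l>', $_[0]);
}

my $bytes_step   = encode_stepwise(42, -1, 100000);
my $bytes_packed = encode_packed(42, -1, 100000);
print "same bytes: ", ($bytes_step eq $bytes_packed ? "yes" : "no"), "\n";
print "decoded: ", join(',', decode_packed($bytes_packed)), "\n";
```

The '>' modifier needs Perl 5.10 or later.  How much faster the single
call is would have to be measured, e.g. with the core Benchmark module.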

Perhaps a good optimization would be to generate the pack/unpack format
strings at compilation time, combine them with static function wrappers,
and use that instead of multiple OO calls?  Although I am comfortable
with Perl, I don't know Thrift well enough to recommend the best
approach there.  I hope to be helpful with benchmarks and specific
optimizations, though.
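
As a rough sketch of that idea, and not a claim about how the Thrift
compiler or its Perl generator works: the format string for a struct could
be built once from its field types and then closed over by plain function
references.  The type table, make_codec() and the example field list below
are assumptions for illustration only, and real Thrift structs would also
need field headers and a stop byte.

```perl
use strict;
use warnings;

# Assumed mapping from Thrift-like type names to pack() templates
# (big-endian; 'N/a*' is a 32-bit length prefix followed by the bytes).
my %PACK_FOR = (
    byte   => 'c',
    i16    => 's>',
    i32    => 'l>',
    string => 'N/a*',
);

# Build the format string once per struct definition and return plain
# function references, so the hot path has no method dispatch.
sub make_codec {
    my @types = @_;
    my @parts;
    for my $type (@types) {
        die "unsupported type: $type" unless exists $PACK_FOR{$type};
        push @parts, $PACK_FOR{$type};
    }
    my $fmt = join ' ', @parts;
    return (
        sub { return pack($fmt, @_) },       # encoder
        sub { return unpack($fmt, $_[0]) },  # decoder
        $fmt,
    );
}

# Hypothetical struct with two strings and an i32.
my ($encode, $decode, $fmt) = make_codec(qw(string string i32));
my $bytes  = $encode->('name', 'value', 12345);
my @fields = $decode->($bytes);
print "format: $fmt\n";
print "fields: @fields\n";
```

In generated code the format strings could simply be emitted as constants,
so nothing would need to be computed at run time.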

Thanks
Ted
