Hi,

I am looking at Cassandra for a logging application. We currently log to a
PostgreSQL database.

I set up two Cassandra servers for testing. For a benchmark I loaded 100
hashes representing log entries from a JSON file, then looped over them to do
10,000 log inserts. I then repeated the same test against a PostgreSQL
instance running on one of the Cassandra servers. The script is attached. The
Cassandra writes appear to perform a lot worse. Is this expected?
jeff@transcoder01:~$ ruby cassandra-bm.rb
cassandra
3.170000 0.480000 3.650000 ( 12.032212)
jeff@transcoder01:~$ ruby cassandra-bm.rb
postgres
2.140000 0.330000 2.470000 ( 7.002601)
Regards,
Jeff
require 'rubygems'
require 'cassandra-cql'
require 'simple_uuid'
require 'benchmark'
require 'json'
require 'active_record'

# Toggle between the two backends before each run.
type = 'postgres'
#type = 'cassandra'
puts type
ActiveRecord::Base.establish_connection(
  #:adapter => "jdbcpostgresql",
  :adapter  => "postgresql",
  :host     => "meta01",
  :username => "postgres",
  :database => "test")

db = nil
if type == 'postgres'
  db = ActiveRecord::Base.connection
else
  db = CassandraCQL::Database.new('meta01:9160', {:keyspace => 'PlayLog'})
end
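
# Both targets are assumed to exist already: a 'playlog' columnfamily in the
# PlayLog keyspace and a 'playlog' table in the 'test' Postgres database, with
# text columns matching the keys in data.json. Roughly (placeholder columns):
#   CQL: CREATE COLUMNFAMILY playlog (KEY uuid PRIMARY KEY, host text, msg text)
#   SQL: CREATE TABLE playlog (host text, msg text)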
# Build a CQL INSERT for the given row hash, keyed by a generated UUID.
def cql_insert(table, key, key_value)
  cql = "INSERT INTO #{table} (KEY, "
  cql << key_value.keys.join(', ')
  cql << ") VALUES ('#{key}', "
  cql << (key_value.values.map {|x| "'#{x}'" }).join(', ')
  cql << ")"
  cql
end
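
# For example (column names are placeholders; the real ones come from data.json):
#   cql_insert('playlog', 'some-uuid', {'host' => 'app01', 'msg' => 'started'})
#   # => "INSERT INTO playlog (KEY, host, msg) VALUES ('some-uuid', 'app01', 'started')"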
# Quote a value for SQL, mapping nil to NULL.
def quote_value(x, type=nil)
  if x.nil?
    return 'NULL'
  else
    return "'#{x}'"
  end
end
# Build a SQL INSERT for the given row hash. Note this mutates the shared row
# hash, so 'time' stays removed on later reuses of the same row.
def sql_insert(table, key_value)
  key_value.delete('time')
  sql = "INSERT INTO #{table} ("
  sql << key_value.keys.join(', ')
  sql << ") VALUES ("
  sql << (key_value.values.map {|x| quote_value(x) }).join(', ')
  sql << ")"
  sql
end
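
# For example (same placeholder row as above; no generated key for Postgres):
#   sql_insert('playlog', {'host' => 'app01', 'msg' => 'started', 'time' => nil})
#   # => "INSERT INTO playlog (host, msg) VALUES ('app01', 'started')"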
# load 100 hashes of log details
rows = []
File.open('data.json') do |f|
  rows = JSON.load(f)
end
bm = Benchmark.measure do
  (1..10000).each do |i|
    # Cycle through the 100 sample rows for 10,000 single-row inserts.
    row = rows[i % 100]
    if type == 'postgres'
      fred = sql_insert('playlog', row)
    else
      fred = cql_insert('playlog', SimpleUUID::UUID.new.to_guid, row)
    end
    db.execute(fred)
  end
end
puts bm