<snip> If I perform tests via the system using both dd and mkfile, I see
speeds of around 50MB/s for WRITES and 60MB/s for READS. However, if a
colleague loads a 100MB csv file using READSEQ into a Unidata file, not
doing anything fancy, I see massive Average Service Times (asvc_t, using
iostat) and the device is nearly always 100% busy, with no real CPU
overhead but 15MB/s tops for WRITES. There is only ONE person using this
system (to test throughput). <snip>

I don't claim to be an expert in performance monitoring so I'm probably
setting myself up for a big fall but...

Are you comparing like with like? dd is simply throwing data out
sequentially, whereas Unidata is hashing the key for each record (how many
are there in 100MB?) and writing it to a possibly, even probably, different
area of the disk.
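
To illustrate the difference, here is a rough Python sketch rather than anything Unidata actually does. The hash (CRC32), the modulo of 1009, and the CUST-style keys are all assumptions made up for the example. It counts how often consecutive records in load order land in a different group, and compares that with the same records sorted by group first.

```python
import zlib


def group_for_key(key: str, modulo: int) -> int:
    """Map a record key to a group number.

    CRC32 is a stand-in only; it is NOT Unidata's real hashing algorithm.
    """
    return zlib.crc32(key.encode("utf-8")) % modulo


def count_group_jumps(keys, modulo: int) -> int:
    """Count how often consecutive records fall into a different group."""
    jumps = 0
    previous = None
    for key in keys:
        group = group_for_key(key, modulo)
        if previous is not None and group != previous:
            jumps += 1
        previous = group
    return jumps


# Hypothetical keys in the order they might appear in the csv file.
MODULO = 1009
keys = [f"CUST{n:06d}" for n in range(10000)]

# Load order as-is: each jump is a write aimed at a different group.
jumps = count_group_jumps(keys, MODULO)

# Same records, sorted by target group before loading.
sorted_keys = sorted(keys, key=lambda k: group_for_key(k, MODULO))
sorted_jumps = count_group_jumps(sorted_keys, MODULO)

print(f"csv order: {jumps} group changes, sorted by group: {sorted_jumps}")
```

Sorted by group, the number of group changes can never exceed the modulo minus one, whereas in csv order it can approach one per record. dd never pays that cost because it only ever appends.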

What happens if your colleague loads 100MB using READSEQ and throws the same
100MB back out into a different csv file using WRITESEQ?
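
The real test would be a small UniBasic program using READSEQ and WRITESEQ, but as a sketch of the idea, here is the same line-by-line copy in Python with a timer around it. The function names and the fsync at the end are my own choices for the example, not anything from the original program.

```python
import os
import time


def copy_sequential(src_path: str, dst_path: str):
    """Copy a file line by line and return (bytes_copied, seconds_taken).

    Purely sequential I/O: no hashing and no record lookups, so it gives a
    baseline to compare against the load into the hashed file.
    """
    total = 0
    start = time.perf_counter()
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        for line in src:
            dst.write(line)
            total += len(line)
        # Push the data to disk so the timing is not just the OS cache.
        dst.flush()
        os.fsync(dst.fileno())
    elapsed = time.perf_counter() - start
    return total, elapsed


def throughput_mb_s(total_bytes: int, seconds: float) -> float:
    """Convert a byte count and a duration into MB/s."""
    if seconds <= 0:
        return float("inf")
    return total_bytes / (1024 * 1024) / seconds
```

If the line-by-line copy runs at something close to the dd figures, the sequential read is not the bottleneck and the time is going into the writes to the hashed file.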

What sort of Unidata file is he writing to? Static or dynamic? And what
does it look like at the end?

Does your colleague's program first read to see whether the record exists?
Is there any correlation between the order of records in the csv file and
the groups they end up in the Unidata file?

Just a few thoughts

Piers


-- 
u2-users mailing list
[EMAIL PROTECTED]
http://www.oliver.com/mailman/listinfo/u2-users
