<snip> If I perform tests on the system using both dd and mkfile, I see speeds of around 50MB/s for writes and 60MB/s for reads. However, if a colleague loads a 100MB CSV file into a Unidata file using READSEQ, not doing anything fancy, I see massive average service times (asvc_t, from iostat) and the device is almost always 100% busy. There is no real CPU overhead, but writes top out at 15MB/s. There is only ONE person using this system (to test throughput). <snip>
I don't claim to be an expert in performance monitoring, so I'm probably setting myself up for a big fall, but... are you comparing like with like? dd is simply throwing data out sequentially, whereas Unidata is hashing the key for each record (how many records are in 100MB?) and writing each one to a possibly (probably) different area of the disk.

What happens if your colleague loads the 100MB using READSEQ and throws the same 100MB back out into a different CSV file using WRITESEQ?

What sort of Unidata file is he writing to? Static or dynamic? And what does it look like at the end?

Does your colleague's program first read to see whether the record exists?

Is there any correlation between the order of records in the CSV file and the groups they end up in within the Unidata file?

Just a few thoughts

Piers

--
u2-users mailing list
[EMAIL PROTECTED]
http://www.oliver.com/mailman/listinfo/u2-users
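To illustrate why a hashed file can behave so differently from dd even with a single user, here is a rough Python sketch that counts how often consecutive writes land in a non-adjacent group. It is a toy model under loudly stated assumptions: the hash function (CRC-32 of the key modulo the file's modulo), the key format `CUSTnnnnnnn`, the record count and the modulo of 4001 are all hypothetical. It does not reproduce Unidata's real hashing or group layout, and it counts logical group jumps rather than measuring actual disk seeks.

```python
import zlib


def group_for_key(key: str, modulo: int) -> int:
    # HYPOTHETICAL hashing: CRC-32 of the key reduced by the file's modulo.
    # Unidata's real hashing algorithm differs; this only shows the scatter effect.
    return zlib.crc32(key.encode("utf-8")) % modulo


def non_adjacent_jumps(groups: list[int]) -> int:
    # Count consecutive writes whose target group is neither the same group
    # nor the next one, i.e. writes that would likely need the head to move.
    return sum(1 for a, b in zip(groups, groups[1:]) if b not in (a, a + 1))


def simulate(num_records: int, modulo: int) -> tuple[int, int]:
    # Hypothetical keys, in the order they appear in the CSV file.
    keys = [f"CUST{n:07d}" for n in range(num_records)]
    hashed = [group_for_key(k, modulo) for k in keys]
    # Sequential output (dd / WRITESEQ style): records appended in order,
    # modelled as filling one group after another.
    per_group = max(1, num_records // modulo)
    sequential = [i // per_group for i in range(num_records)]
    return non_adjacent_jumps(sequential), non_adjacent_jumps(hashed)


if __name__ == "__main__":
    seq_jumps, hashed_jumps = simulate(num_records=100_000, modulo=4001)
    print(f"sequential: {seq_jumps} jumps, hashed: {hashed_jumps} jumps")
```

In this model the sequential layout never jumps, while almost every hashed write goes to an unrelated group. If the keys in the CSV file happened to arrive in an order that maps to neighbouring groups, the jump count (and, on real hardware, the service time) would drop, which is why the ordering question is worth checking.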
