Black, Michael (IS) wrote: 

> Hmmm....a 6.5X speed diff between RAM and disk?  Sounds pretty good to 
> me.  Not sure why you expect better.  

I don't expect better.  As I said, I'm "not really sure what I should 
expect here, or how to find out what I should expect".  

> 10,000/sec is crappy?  And you think this because....????  

Because I think anything besides instantaneous results is crappy.  :) 

But I'm willing for now to live with the max I should reasonably expect 
from modern hardware.  I'd like to know e.g.  how much raid5 is hurting, 
so I can know whether to recommend that the user move to a single disk.  

Jay, btw, I saw your email on stripe sizes -- thanks so much for the 
suggestion -- I'll look into it.  But see below for a recent update to 
app behavior.  

> #1 What OS are you on?  There are numerous disk speed testing programs 
> depending on your OS.  

64-bit RHEL5.4, Linux 2.6.18 for the moment.  I could probably convince 
the user to switch to FreeBSD6 if there are big wins there.  My app 
already works in both OSs.  

> #2 Are you multi-threading?  A seperate reader process could help.  

Just one process, one thread.  

I'm sure you're right -- adding a reader process would certainly help 
run time by a few percentage points, but at the (unjustified at the 
moment) expense of increased maintenance cost & dev time.  This is an 
extremely cheap project so far and I'm trying to keep it that way.  

> #3 How many records total?  

Probably a couple billion, where each record is about 300 bytes across 
about 15 columns.  

> #4 Final size of database?  

After indices I'm guessing we'll be at ~500Gb.  When the db has 0.5b 
records and all indices are defined, we're at around 130Gb.  

> #5 How fast can you read your input file?  

I can generate inputs to SQLite at a rate of at least 65k records/sec.  
I haven't measured the input generation separately from the sqlite 
calls.  

> #6 What happens if you just insert the same records the same # of times 

Haven't tested it.  

> #7 What does your CPU usage show?  

After dropping the indices (per Jay et al's suggestion) I think SQLite 
is actually spending more than half its time on the CPU under large 
record sets.  Obviously I need to measure that more carefully under the 
new DB schema.  

My initial guess that I was inserting in O(1) was wrong -- time to 
insert 2m records went up by about a minute per 100m existing records.  

And this part is interesting: I finished the initial seed and created 
my user indices.  Then I added some more records, and found that insert 
times went down from 9 minutes to 2 minutes per 2 million records.  The 
plot is *striking*.  (I'd like to show it to the forum -- is it possible 
to send emails with attachments here?  It's a 60kb jpg file.)  I'm back 
up to inserting 17k records/second and am almost entirely CPU-bound.  I 
think I'm back in RAM!  

What is going on here?  Is SQLite using index data to do inserts more 
quickly?  Do you think that's causing it to need to read fewer pages on 
a particular insert?  

I'm very interested to see how this would look if I defined the user 
indices before the initial seed.  

> I assume you're multicore (as most are now I think).  

Yes, I have multiple CPUs, but I'm in one process in one thread so I'm 
only using one CPU at a time.  

Now that I'm seeing this CPU-bound behavior after adding indices, I'm 
reconsidering the whole multi-process thing.  Still, at least 75% of CPU 
usage is in SQLite.  More testing needed.  

> Depending on what you're doing with this data are you sure you need a 
> database solution?  

No.  But let's assume for now it's the best thing available to solve 
my problem under tight time constraints -- because in any case they're 
interesting questions, right?  :) 

> I don't recall you really explaining your ultimate goal...  

I tried to state the question as generally as possible while capturing 
the relevant specifics of my problem, so that gurus' answers will be 
useful to more people (including future Eric who is writing another 
application).  

I'll try starting off with those user indices and see how we do.

Thanks again!

Eric 

-- 
Eric A. Smith

I have always wished that my computer would be as easy to use as 
my telephone. My wish has come true. I no longer know how to use 
my telephone.
    -- Bjarne Stroustrup
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Reply via email to