John Elrick wrote:
> Simon Slavin wrote:
>   
>> On 8 Jun 2009, at 8:07pm, Mohit Sindhwani wrote:
>>
>>
>>> I'm having a problem that I'm trying to find an elegant solution to.
>>> I have a database that stores real-time information - this
>>> information is replaced by new values every 5 minutes and has about
>>> 30,000 entries.  Once a new database is made available, I should
>>> start using that one.
>>>
>>> This database is used as part of an HTTP server that responds to
>>> requests based on the data stored in the database.  So, I'm running
>>> into a design issue trying to solve how to "switch" the database
>>> from the old one to the new one.  With multiple incoming requests,
>>> I'm not sure how/when to update the database.  Also, due to the
>>> large number of records (30,000) that are to be put in every 5
>>> minutes, I don't think I should just keep adding to the database
>>> since it takes quite some time - it feels better to let a separate
>>> process create the database and alert my program that a new file is
>>> ready!
>> You can do all this in one database without problems.  What you need
>> is to have a 'dataset' concept: every row in the data table belongs
>> to a particular dataset.  All you need is an integer.  And you make
>> another table in the same database that has one entry that just
>> stores which dataset is the current one.
>>
>> When a client needs to consult the table with the data in it, it
>> first reads the current dataset number (an integer) from the other
>> table.  Then it reads just the rows from the data table that have
>> the same dataset number.  That's all.
>>
>> The application that updates the database keeps at least two
>> datasets in the data table: the current one, and one previous one in
>> case applications haven't finished reading it yet.  When new data
>> comes in, it adds a new dataset, updates the current dataset table,
>> then deletes an old dataset.
>>
>> SQLite is pretty good at recovering space from deleted records.  You
>> might want to read up on VACUUM, but you shouldn't even need that.
>
> I like your idea better than mine.  I would suggest some means of
> ensuring that no process is still using a particular dataset before
> it is deleted.  Although five minutes may be adequate now, some
> future process may be introduced which takes longer, and it may be a
> "bad thing" for the data to suddenly vanish while a process is
> working with it.  That was the underlying motivation for reference
> counting, although reference counting may not be the only way of
> accomplishing this goal.

Thanks for the suggestion, Simon, and thanks for the analysis, John.
Since I started this thread, I should point out "my" requirements.

I like the idea of the dataset.  I just don't know whether inserting
30,000 records in one transaction will cause any concurrency issues.
I'm going to prototype this (since it's quick to do with SQLite +
Ruby), but in your opinions, which of these would be fastest and least
hassle in terms of speed/concurrency:
* Insert the 30,000 records into the 'dataset' directly.  The table
into which I insert will have an index on 1 field.  I could do it in 1
transaction or in multiple transactions of, say, 5,000 records each.
* Insert everything in 1 shot by feeding the sqlite3 shell a script
containing ".separator ," and ".import newestfile.csv <tablename>".
* Insert all the records (using #1 or #2) into a temporary table and
then use an INSERT INTO ... SELECT ... to copy them across (see the
sketch below).
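
To make sure I've understood Simon's scheme, here's the minimal sketch
I have in mind for the schema and for option 3 - all table and column
names (readings, staging, current_dataset) are just placeholders:

    -- data table: every row is tagged with the dataset it belongs to
    CREATE TABLE readings (
        dataset INTEGER NOT NULL,
        key     TEXT NOT NULL,
        value   TEXT
    );
    CREATE INDEX readings_idx ON readings(dataset, key);

    -- single-row table holding the current dataset number
    CREATE TABLE current_dataset (id INTEGER NOT NULL);

    -- option 3: records land in 'staging' first, then get copied
    -- across and published in one transaction
    BEGIN;
    INSERT INTO readings (dataset, key, value)
        SELECT 42, key, value FROM staging;
    UPDATE current_dataset SET id = 42;
    DELETE FROM readings WHERE dataset < 41;  -- keep one old dataset
    COMMIT;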

In terms of processing, each request is handled within 0.25 - 1.5
seconds (depending on workload) and there may be between 0 and 50
(approx.) threads reading data from a dataset at any given time.
However, requests can come at any point in the 5-minute window, so I
think I need to maintain at least 1 extra dataset, maybe 2.
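
On the reader side, I imagine each request would do something like
this (same placeholder names as above):

    -- read the current dataset number once, then query only its rows
    SELECT id FROM current_dataset;
    SELECT value FROM readings WHERE dataset = :id AND key = :key;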

As each request finishes, it doesn't need the data any more.  The data 
is really transient - meaning it loses its value quite soon after it's 
been used.
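
That said, if some future request type ends up holding on to a dataset
for longer (John's concern), I suppose his reference-counting idea
could be sketched roughly like this, again with placeholder names:

    -- one bookkeeping row per dataset, counting active readers
    CREATE TABLE datasets (
        id      INTEGER PRIMARY KEY,
        readers INTEGER NOT NULL DEFAULT 0
    );

    -- each request increments before reading, decrements when done
    UPDATE datasets SET readers = readers + 1 WHERE id = :id;
    -- ... handle the request ...
    UPDATE datasets SET readers = readers - 1 WHERE id = :id;

    -- the updater only deletes datasets nobody is still reading
    DELETE FROM readings WHERE dataset IN
        (SELECT id FROM datasets
         WHERE readers = 0 AND id <> (SELECT id FROM current_dataset));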

Thanks for your help, guys!

Cheers,
Mohit.
6/9/2009 | 11:55 PM.

