Re: [sqlite] Working with a new DB every 5 minutes

2009-06-09 Thread Mohit Sindhwani
John Elrick wrote:
> Simon Slavin wrote:
>   
>> On 8 Jun 2009, at 8:07pm, Mohit Sindhwani wrote:
>>
>>   
>> 
>>> I'm having a problem that I'm trying to find an elegant solution  
>>> to.  I
>>> have a database that stores real-time information - this information  
>>> is
>>> replaced by new values every 5 minutes and has about 30,000 entries.
>>> Once a new database is made available, I should start using that one.
>>>
>>> This database is used as part of an HTTP server that responds to
>>> requests based on the data stored in the database.  So, I'm running  
>>> into
>>> a design issue trying to solve how to "switch" the database from the  
>>> old
>>> one to the new one.  With multiple incoming requests, I'm not sure  
>>> how/
>>> when to update the database.  Also, due to the large number of records
>>> (30,000) that are to be put in every 5 minutes, I don't think I should
>>> just keep adding to the database since it takes quite some time - it
>>> feels better to let a separate process create the database and alert  
>>> my
>>> program that a new file is ready!
>>> 
>>>   
>> You can do all this in one database without problems.  What you need  
>> is to have a 'dataset' concept: every row in the data table belongs to  
>> a particular dataset.  All you need is an integer.  And you make  
>> another table in the same database that has one entry that just stores  
>> which dataset is the current one.
>>
>> When a client needs to consult the table with the data in it, it first  
>> reads the current dataset number (an integer) from the other table.   
>> Then it reads just the rows from the data table that have the same  
>> dataset number.  That's all.
>>
>> The application that updates the database keeps at least two datasets  
>> in the data table: the current one, and one previous one in case  
>> applications haven't finished reading it yet.  When new data comes in,  
>> it adds a new dataset, updates the current dataset table, then deletes  
>> an old dataset.
>>
>> SQLite is pretty good at recovering space from deleted records.  You  
>> might want to read up on VACUUM, but you shouldn't even need that.
>>   
>> 
>
> I like your idea better than mine.  I would suggest some means of 
> ensuring that some process isn't attempting to use a particular dataset, 
> although five minutes may be adequate now, some future process may be 
> introduced which takes longer.  It may be a "bad thing" for the data to 
> suddenly vanish while it was working with it.  That was the underlying 
> motivation for reference counting; although reference counting may not 
> be the only way of accomplishing this goal.
>   

Thanks for the suggestion Simon and thanks for the analysis, John.  
Since I started this thread, I should point out "my" requirements.

I like the idea of the dataset.  I just don't know if inserting 30,000 
inserts in a transaction will cause any concurrency issues.  I'm going 
to prototype this (since SQLite + Ruby is easy to do this in quickly), 
but in your opinions, which of these would be fast and least hassle in 
terms of speed/ concurrency:
* Insert 30,000 records into the 'dataset' directly.  The table into 
which I insert will have an index on 1 field.  I could do it in 1 
transaction or multiple transactions of say 5,000 records each
* Insert all in 1 shot using a .sql file which has ".separator ," 
".import newestfile.csv"
* Insert all the records (using #1 or #2) into a temporary table and 
then an insert into ... select ... to copy it across.

In terms of processing, each request is handled within 0.25 - 1.5 
seconds (depends on workload) and there may be between 0 to 50 (approx) 
threads reading data from a dataset at any given time.  However, 
requests can come at any point in the 5 minutes, so I think maintaining 
at least 1 extra dataset is needed, maybe 2.

As each request finishes, it doesn't need the data any more.  The data 
is really transient - meaning it loses its value quite soon after it's 
been used.

Thanks for your help, guys!

Cheers,
Mohit.
6/9/2009 | 11:55 PM.


___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Working with a new DB every 5 minutes

2009-06-09 Thread John Elrick
Simon Slavin wrote:
> On 8 Jun 2009, at 8:07pm, Mohit Sindhwani wrote:
>
>   
>> I'm having a problem that I'm trying to find an elegant solution  
>> to.  I
>> have a database that stores real-time information - this information  
>> is
>> replaced by new values every 5 minutes and has about 30,000 entries.
>> Once a new database is made available, I should start using that one.
>>
>> This database is used as part of an HTTP server that responds to
>> requests based on the data stored in the database.  So, I'm running  
>> into
>> a design issue trying to solve how to "switch" the database from the  
>> old
>> one to the new one.  With multiple incoming requests, I'm not sure  
>> how/
>> when to update the database.  Also, due to the large number of records
>> (30,000) that are to be put in every 5 minutes, I don't think I should
>> just keep adding to the database since it takes quite some time - it
>> feels better to let a separate process create the database and alert  
>> my
>> program that a new file is ready!
>> 
>
> You can do all this in one database without problems.  What you need  
> is to have a 'dataset' concept: every row in the data table belongs to  
> a particular dataset.  All you need is an integer.  And you make  
> another table in the same database that has one entry that just stores  
> which dataset is the current one.
>
> When a client needs to consult the table with the data in it, it first  
> reads the current dataset number (an integer) from the other table.   
> Then it reads just the rows from the data table that have the same  
> dataset number.  That's all.
>
> The application that updates the database keeps at least two datasets  
> in the data table: the current one, and one previous one in case  
> applications haven't finished reading it yet.  When new data comes in,  
> it adds a new dataset, updates the current dataset table, then deletes  
> an old dataset.
>
> SQLite is pretty good at recovering space from deleted records.  You  
> might want to read up on VACUUM, but you shouldn't even need that.
>   

I like your idea better than mine.  I would suggest some means of 
ensuring that some process isn't attempting to use a particular dataset, 
although five minutes may be adequate now, some future process may be 
introduced which takes longer.  It may be a "bad thing" for the data to 
suddenly vanish while it was working with it.  That was the underlying 
motivation for reference counting; although reference counting may not 
be the only way of accomplishing this goal.


John
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Working with a new DB every 5 minutes

2009-06-08 Thread Simon Slavin

On 8 Jun 2009, at 8:07pm, Mohit Sindhwani wrote:

> I'm having a problem that I'm trying to find an elegant solution  
> to.  I
> have a database that stores real-time information - this information  
> is
> replaced by new values every 5 minutes and has about 30,000 entries.
> Once a new database is made available, I should start using that one.
>
> This database is used as part of an HTTP server that responds to
> requests based on the data stored in the database.  So, I'm running  
> into
> a design issue trying to solve how to "switch" the database from the  
> old
> one to the new one.  With multiple incoming requests, I'm not sure  
> how/
> when to update the database.  Also, due to the large number of records
> (30,000) that are to be put in every 5 minutes, I don't think I should
> just keep adding to the database since it takes quite some time - it
> feels better to let a separate process create the database and alert  
> my
> program that a new file is ready!

You can do all this in one database without problems.  What you need  
is to have a 'dataset' concept: every row in the data table belongs to  
a particular dataset.  All you need is an integer.  And you make  
another table in the same database that has one entry that just stores  
which dataset is the current one.

When a client needs to consult the table with the data in it, it first  
reads the current dataset number (an integer) from the other table.   
Then it reads just the rows from the data table that have the same  
dataset number.  That's all.

The application that updates the database keeps at least two datasets  
in the data table: the current one, and one previous one in case  
applications haven't finished reading it yet.  When new data comes in,  
it adds a new dataset, updates the current dataset table, then deletes  
an old dataset.

SQLite is pretty good at recovering space from deleted records.  You  
might want to read up on VACUUM, but you shouldn't even need that.

Simon.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Working with a new DB every 5 minutes

2009-06-08 Thread John Elrick
Mohit Sindhwani wrote:
> Hi Everyone,
>
> I'm having a problem that I'm trying to find an elegant solution to.  I 
> have a database that stores real-time information - this information is 
> replaced by new values every 5 minutes and has about 30,000 entries.  
> Once a new database is made available, I should start using that one.
>
> This database is used as part of an HTTP server that responds to 
> requests based on the data stored in the database.  So, I'm running into 
> a design issue trying to solve how to "switch" the database from the old 
> one to the new one.  With multiple incoming requests, I'm not sure how/ 
> when to update the database.  Also, due to the large number of records 
> (30,000) that are to be put in every 5 minutes, I don't think I should 
> just keep adding to the database since it takes quite some time - it 
> feels better to let a separate process create the database and alert my 
> program that a new file is ready!
>
> Any suggestions on how I should approach this problem?
>   

Since you are using Windows, maybe:

server_metadata.db

with a single table:

create table database_links (
  db_name varchar,
  current boolean,
  user_count
);


Your load process will create a new database with a unique name 
(suggestion: use a guid mangled to a valid filename).  Upon completion 
of loading, insert a single record in a transaction (pseudocode)

begin;
execute("select db_name from database_links where current = 'F'") do |name|
  deleteFile(name);

execute "delete from database_links where user_count = 0 and current = 'F';"
execute "update database_links set current = 'F';"
execute "insert into database_links values (:db_name, 'T', 0);"
commit;

As each client process connects to the database, it retrieves the 
current database name:

select db_name from database_links where current = 'T';

It then updates the count:

update database_links set user_count = user_count + 1 where db_name = ?

When the process is done, decrement the count:

update database_links set user_count = user_count - 1 where db_name = ?

You should get SQL_BUSY if the database is being changed, and your old 
database will still exist until the last process is done using it and 
the newer database has been written.

FWIW,


John
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Working with a new DB every 5 minutes

2009-06-08 Thread Mohit Sindhwani
Pavel Ivanov wrote:
>> * Getting rid of the old files - I'm on Windows and would need to see
>> the equivalent way of your Unix suggestion.
>> 
>
> I don't think you have good options here. One option is to make
> continuous retries while remove() returns error. Another option I
> believe would be to change SQLite code to open databasses with
> FILE_SHARE_DELETE flag. In this case you will be able to work the same
> way as on Unix.
>   
Thanks again Pavel.  I'll read up more on FILE_SHARE_DELETE.  Either 
way, I could add a cron job to delete anything that's older than an 
hour, I guess.  It would require me to have a buffer space of less than 
50MB, I think - but hard disk space is cheaper.

Cheers,
Mohit.
6/9/2009 | 3:52 AM.

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Working with a new DB every 5 minutes

2009-06-08 Thread Pavel Ivanov
> * Getting rid of the old files - I'm on Windows and would need to see
> the equivalent way of your Unix suggestion.

I don't think you have good options here. One option is to make
continuous retries while remove() returns error. Another option I
believe would be to change SQLite code to open databasses with
FILE_SHARE_DELETE flag. In this case you will be able to work the same
way as on Unix.

Pavel

On Mon, Jun 8, 2009 at 3:24 PM, Mohit Sindhwani wrote:
> Pavel Ivanov wrote:
>> I'd create another special file (maybe even database) that will keep
>> information about current file that your program should be working
>> with. So separate process will create new database and then update
>> this file. And program will just read this file and then work with
>> database mentioned in the file. If you work on Unix you even will not
>> have problems deleting old databases - after updating this special
>> file just unlink old database and after last file handle to it is
>> closed it will be gone...
>>
> Hi Pavel
>
> Thanks for your (blazing fast) reply.  I did think of something like
> this though I was thinking of something like a sqlite database file to
> store the recent file.  That said, I was concerned about a couple of things:
> * I guess there could be times when I would get database busy errors -
> but that can be dealt with using a minor delay if it happens.
> * Getting rid of the old files - I'm on Windows and would need to see
> the equivalent way of your Unix suggestion.
>
> But this does seem like an interesting way to proceed!  We have used
> SQLite for other things in this system and it really seems faster
> compared to an un-optimized instance of PostgreSQL running on the server.
>
> Cheers,
> Mohit.
> 6/9/2009 | 3:24 AM.
>
>
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Working with a new DB every 5 minutes

2009-06-08 Thread Pavel Ivanov
I'd create another special file (maybe even database) that will keep
information about current file that your program should be working
with. So separate process will create new database and then update
this file. And program will just read this file and then work with
database mentioned in the file. If you work on Unix you even will not
have problems deleting old databases - after updating this special
file just unlink old database and after last file handle to it is
closed it will be gone...

Pavel

On Mon, Jun 8, 2009 at 3:07 PM, Mohit Sindhwani wrote:
> Hi Everyone,
>
> I'm having a problem that I'm trying to find an elegant solution to.  I
> have a database that stores real-time information - this information is
> replaced by new values every 5 minutes and has about 30,000 entries.
> Once a new database is made available, I should start using that one.
>
> This database is used as part of an HTTP server that responds to
> requests based on the data stored in the database.  So, I'm running into
> a design issue trying to solve how to "switch" the database from the old
> one to the new one.  With multiple incoming requests, I'm not sure how/
> when to update the database.  Also, due to the large number of records
> (30,000) that are to be put in every 5 minutes, I don't think I should
> just keep adding to the database since it takes quite some time - it
> feels better to let a separate process create the database and alert my
> program that a new file is ready!
>
> Any suggestions on how I should approach this problem?
>
> Cheers,
> Mohit.
> 6/9/2009 | 3:07 AM.
>
>
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users