Re: [sqlite] Working with a new DB every 5 minutes
John Elrick wrote:
> Simon Slavin wrote:
>> On 8 Jun 2009, at 8:07pm, Mohit Sindhwani wrote:
>>
>>> I'm having a problem that I'm trying to find an elegant solution to. I have a database that stores real-time information - this information is replaced by new values every 5 minutes and has about 30,000 entries. Once a new database is made available, I should start using that one.
>>>
>>> This database is used as part of an HTTP server that responds to requests based on the data stored in the database. So, I'm running into a design issue trying to solve how to "switch" the database from the old one to the new one. With multiple incoming requests, I'm not sure how/when to update the database. Also, due to the large number of records (30,000) that are to be put in every 5 minutes, I don't think I should just keep adding to the database since it takes quite some time - it feels better to let a separate process create the database and alert my program that a new file is ready!
>>
>> You can do all this in one database without problems. What you need is to have a 'dataset' concept: every row in the data table belongs to a particular dataset. All you need is an integer. And you make another table in the same database that has one entry that just stores which dataset is the current one.
>>
>> When a client needs to consult the table with the data in it, it first reads the current dataset number (an integer) from the other table. Then it reads just the rows from the data table that have the same dataset number. That's all.
>>
>> The application that updates the database keeps at least two datasets in the data table: the current one, and one previous one in case applications haven't finished reading it yet. When new data comes in, it adds a new dataset, updates the current dataset table, then deletes an old dataset.
>>
>> SQLite is pretty good at recovering space from deleted records. You might want to read up on VACUUM, but you shouldn't even need that.
>
> I like your idea better than mine. I would suggest some means of ensuring that no process is still attempting to use a particular dataset; although five minutes may be adequate now, some future process may be introduced which takes longer. It may be a "bad thing" for the data to suddenly vanish while a process is working with it. That was the underlying motivation for reference counting, although reference counting may not be the only way of accomplishing this goal.

Thanks for the suggestion, Simon, and thanks for the analysis, John. Since I started this thread, I should point out "my" requirements.

I like the idea of the dataset. I just don't know if inserting 30,000 records in one transaction will cause any concurrency issues. I'm going to prototype this (since SQLite + Ruby makes it easy to do quickly), but in your opinions, which of these would be fastest and least hassle in terms of speed/concurrency:

* Insert the 30,000 records into the 'dataset' directly. The table into which I insert will have an index on 1 field. I could do it in 1 transaction or in multiple transactions of, say, 5,000 records each.
* Insert all in 1 shot using a .sql file which has ".separator ," and ".import newestfile.csv".
* Insert all the records (using #1 or #2) into a temporary table and then use an insert into ... select ... to copy them across.

In terms of processing, each request is handled within 0.25 - 1.5 seconds (depending on workload) and there may be between 0 and 50 (approx.) threads reading data from a dataset at any given time. However, requests can come at any point in the 5 minutes, so I think maintaining at least 1 extra dataset is needed, maybe 2. As each request finishes, it doesn't need the data any more. The data is really transient - meaning it loses its value quite soon after it's been used.

Thanks for your help, guys!

Cheers,
Mohit.
6/9/2009 | 11:55 PM.

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
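For what it's worth, the two variants of option #1 can be compared directly with a small timing harness like the one below. This is only a sketch using Python's built-in sqlite3 module (not the Ruby bindings the thread mentions), with an illustrative schema; real numbers would have to come from the actual table and index.

```python
import sqlite3
import time

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (dataset INTEGER, key TEXT, value TEXT)")
conn.execute("CREATE INDEX data_key ON data(key)")  # one indexed field, as in the question

rows = [(1, "k%d" % i, "v%d" % i) for i in range(30000)]

# Variant A: all 30,000 rows in a single transaction.
t0 = time.perf_counter()
with conn:  # the connection context manager commits on exit
    conn.executemany("INSERT INTO data VALUES (?, ?, ?)", rows)
single = time.perf_counter() - t0

# Variant B: batches of 5,000 rows, one transaction per batch.
conn.execute("DELETE FROM data")
t0 = time.perf_counter()
for i in range(0, len(rows), 5000):
    with conn:
        conn.executemany("INSERT INTO data VALUES (?, ?, ?)", rows[i:i + 5000])
batched = time.perf_counter() - t0

print("single transaction: %.3fs, six batches: %.3fs" % (single, batched))
```

Either way the writer holds the database lock only while a transaction is open, so batching mainly trades peak lock-hold time against total overhead.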
Re: [sqlite] Working with a new DB every 5 minutes
Simon Slavin wrote:
> On 8 Jun 2009, at 8:07pm, Mohit Sindhwani wrote:
>
>> I'm having a problem that I'm trying to find an elegant solution to. I have a database that stores real-time information - this information is replaced by new values every 5 minutes and has about 30,000 entries. Once a new database is made available, I should start using that one.
>>
>> This database is used as part of an HTTP server that responds to requests based on the data stored in the database. So, I'm running into a design issue trying to solve how to "switch" the database from the old one to the new one. With multiple incoming requests, I'm not sure how/when to update the database. Also, due to the large number of records (30,000) that are to be put in every 5 minutes, I don't think I should just keep adding to the database since it takes quite some time - it feels better to let a separate process create the database and alert my program that a new file is ready!
>
> You can do all this in one database without problems. What you need is to have a 'dataset' concept: every row in the data table belongs to a particular dataset. All you need is an integer. And you make another table in the same database that has one entry that just stores which dataset is the current one.
>
> When a client needs to consult the table with the data in it, it first reads the current dataset number (an integer) from the other table. Then it reads just the rows from the data table that have the same dataset number. That's all.
>
> The application that updates the database keeps at least two datasets in the data table: the current one, and one previous one in case applications haven't finished reading it yet. When new data comes in, it adds a new dataset, updates the current dataset table, then deletes an old dataset.
>
> SQLite is pretty good at recovering space from deleted records. You might want to read up on VACUUM, but you shouldn't even need that.

I like your idea better than mine. I would suggest some means of ensuring that no process is still attempting to use a particular dataset; although five minutes may be adequate now, some future process may be introduced which takes longer. It may be a "bad thing" for the data to suddenly vanish while a process is working with it. That was the underlying motivation for reference counting, although reference counting may not be the only way of accomplishing this goal.

John
Re: [sqlite] Working with a new DB every 5 minutes
On 8 Jun 2009, at 8:07pm, Mohit Sindhwani wrote:

> I'm having a problem that I'm trying to find an elegant solution to. I have a database that stores real-time information - this information is replaced by new values every 5 minutes and has about 30,000 entries. Once a new database is made available, I should start using that one.
>
> This database is used as part of an HTTP server that responds to requests based on the data stored in the database. So, I'm running into a design issue trying to solve how to "switch" the database from the old one to the new one. With multiple incoming requests, I'm not sure how/when to update the database. Also, due to the large number of records (30,000) that are to be put in every 5 minutes, I don't think I should just keep adding to the database since it takes quite some time - it feels better to let a separate process create the database and alert my program that a new file is ready!

You can do all this in one database without problems. What you need is to have a 'dataset' concept: every row in the data table belongs to a particular dataset. All you need is an integer. And you make another table in the same database that has one entry that just stores which dataset is the current one.

When a client needs to consult the table with the data in it, it first reads the current dataset number (an integer) from the other table. Then it reads just the rows from the data table that have the same dataset number. That's all.

The application that updates the database keeps at least two datasets in the data table: the current one, and one previous one in case applications haven't finished reading it yet. When new data comes in, it adds a new dataset, updates the current dataset table, then deletes an old dataset.

SQLite is pretty good at recovering space from deleted records. You might want to read up on VACUUM, but you shouldn't even need that.

Simon.
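Simon's single-database dataset scheme can be sketched roughly as below. This is an illustrative Python/sqlite3 sketch (the thread's prototype language is Ruby); table and column names are made up, and the pruning rule keeps the current and previous datasets as he describes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE data (dataset INTEGER NOT NULL, key TEXT, value TEXT);
CREATE INDEX data_dataset ON data(dataset);
CREATE TABLE current (dataset INTEGER NOT NULL);
""")
conn.execute("INSERT INTO current VALUES (0)")

def load_new_dataset(conn, rows):
    # One transaction: add the new dataset, point 'current' at it,
    # then drop everything older than the previous dataset.
    with conn:
        (cur,) = conn.execute("SELECT dataset FROM current").fetchone()
        new = cur + 1
        conn.executemany("INSERT INTO data VALUES (?, ?, ?)",
                         ((new, k, v) for k, v in rows))
        conn.execute("UPDATE current SET dataset = ?", (new,))
        conn.execute("DELETE FROM data WHERE dataset < ?", (new - 1,))

def read_current(conn, key):
    # Clients read the current dataset number first, then only its rows.
    (cur,) = conn.execute("SELECT dataset FROM current").fetchone()
    return conn.execute(
        "SELECT value FROM data WHERE dataset = ? AND key = ?",
        (cur, key)).fetchone()

load_new_dataset(conn, [("a", "1"), ("b", "2")])
load_new_dataset(conn, [("a", "3")])
print(read_current(conn, "a"))  # reads from the newest dataset
```

Because the switch-over is a single-row UPDATE inside the same transaction as the inserts, readers always see either the old dataset or the new one, never a half-loaded mix.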
Re: [sqlite] Working with a new DB every 5 minutes
Mohit Sindhwani wrote:
> Hi Everyone,
>
> I'm having a problem that I'm trying to find an elegant solution to. I have a database that stores real-time information - this information is replaced by new values every 5 minutes and has about 30,000 entries. Once a new database is made available, I should start using that one.
>
> This database is used as part of an HTTP server that responds to requests based on the data stored in the database. So, I'm running into a design issue trying to solve how to "switch" the database from the old one to the new one. With multiple incoming requests, I'm not sure how/when to update the database. Also, due to the large number of records (30,000) that are to be put in every 5 minutes, I don't think I should just keep adding to the database since it takes quite some time - it feels better to let a separate process create the database and alert my program that a new file is ready!
>
> Any suggestions on how I should approach this problem?

Since you are using Windows, maybe: server_metadata.db with a single table:

    create table database_links (
        db_name varchar,
        current boolean,
        user_count integer
    );

Your load process will create a new database with a unique name (suggestion: use a GUID mangled into a valid filename). Upon completion of loading, insert a single record in a transaction (pseudocode):

    begin;
    execute("select db_name from database_links where current = 'F' and user_count = 0") do |name|
        deleteFile(name)
    end
    execute "delete from database_links where current = 'F' and user_count = 0;"
    execute "update database_links set current = 'F';"
    execute "insert into database_links values (:db_name, 'T', 0);"
    commit;

As each client process connects to the database, it retrieves the current database name:

    select db_name from database_links where current = 'T';

It then updates the count:

    update database_links set user_count = user_count + 1 where db_name = ?;

When the process is done, it decrements the count:

    update database_links set user_count = user_count - 1 where db_name = ?;

You should get SQLITE_BUSY if the database is being changed, and your old database will still exist until the last process is done using it and the newer database has been written.

FWIW,
John
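John's reference-counting scheme can be fleshed out as below. This is a sketch in Python's sqlite3 module with the metadata database held in memory for illustration; the `delete_file` hook stands in for the real file deletion, and all names are taken from his pseudocode rather than any actual API.

```python
import sqlite3

meta = sqlite3.connect(":memory:")  # stands in for server_metadata.db
meta.execute("""CREATE TABLE database_links (
    db_name TEXT, current TEXT, user_count INTEGER)""")

def publish(meta, new_name, delete_file=lambda name: None):
    # Register a freshly loaded database as current and remove
    # old databases that no reader is still using.
    with meta:
        unused = [n for (n,) in meta.execute(
            "SELECT db_name FROM database_links"
            " WHERE current = 'F' AND user_count = 0")]
        for name in unused:
            delete_file(name)
        meta.execute("DELETE FROM database_links"
                     " WHERE current = 'F' AND user_count = 0")
        meta.execute("UPDATE database_links SET current = 'F'")
        meta.execute("INSERT INTO database_links VALUES (?, 'T', 0)",
                     (new_name,))

def acquire(meta):
    # Each client looks up the current database and bumps its count.
    with meta:
        (name,) = meta.execute(
            "SELECT db_name FROM database_links WHERE current = 'T'").fetchone()
        meta.execute("UPDATE database_links SET user_count = user_count + 1"
                     " WHERE db_name = ?", (name,))
    return name

def release(meta, name):
    # Called when the client is done with that database.
    with meta:
        meta.execute("UPDATE database_links SET user_count = user_count - 1"
                     " WHERE db_name = ?", (name,))
```

The key property is that a database file is only deleted once it is both non-current and has a zero reader count, so a slow reader keeps its snapshot alive across any number of publishes.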
Re: [sqlite] Working with a new DB every 5 minutes
Pavel Ivanov wrote:
>> * Getting rid of the old files - I'm on Windows and would need to see the equivalent way of your Unix suggestion.
>
> I don't think you have good options here. One option is to make continuous retries while remove() returns an error. Another option, I believe, would be to change the SQLite code to open databases with the FILE_SHARE_DELETE flag. In that case you will be able to work the same way as on Unix.

Thanks again, Pavel. I'll read up more on FILE_SHARE_DELETE. Either way, I could add a cron job to delete anything that's older than an hour, I guess. It would require me to have a buffer space of less than 50MB, I think - but hard disk space is cheap.

Cheers,
Mohit.
6/9/2009 | 3:52 AM.
Re: [sqlite] Working with a new DB every 5 minutes
> * Getting rid of the old files - I'm on Windows and would need to see the equivalent way of your Unix suggestion.

I don't think you have good options here. One option is to make continuous retries while remove() returns an error. Another option, I believe, would be to change the SQLite code to open databases with the FILE_SHARE_DELETE flag. In that case you will be able to work the same way as on Unix.

Pavel

On Mon, Jun 8, 2009 at 3:24 PM, Mohit Sindhwani wrote:
> Pavel Ivanov wrote:
>> I'd create another special file (maybe even a database) that will keep information about the current file that your program should be working with. So the separate process will create the new database and then update this file, and the program will just read this file and then work with the database mentioned in the file. If you work on Unix you will not even have problems deleting old databases - after updating this special file, just unlink the old database, and after the last file handle to it is closed it will be gone...
>
> Hi Pavel
>
> Thanks for your (blazing fast) reply. I did think of something like this, though I was thinking of something like an SQLite database file to store the name of the most recent file. That said, I was concerned about a couple of things:
> * I guess there could be times when I would get database busy errors - but that can be dealt with using a minor delay if it happens.
> * Getting rid of the old files - I'm on Windows and would need to see the equivalent way of your Unix suggestion.
>
> But this does seem like an interesting way to proceed! We have used SQLite for other things in this system and it really seems faster compared to an un-optimized instance of PostgreSQL running on the server.
>
> Cheers,
> Mohit.
> 6/9/2009 | 3:24 AM.
Re: [sqlite] Working with a new DB every 5 minutes
I'd create another special file (maybe even a database) that will keep information about the current file that your program should be working with. So the separate process will create the new database and then update this file, and the program will just read this file and then work with the database mentioned in the file. If you work on Unix you will not even have problems deleting old databases - after updating this special file, just unlink the old database, and after the last file handle to it is closed it will be gone...

Pavel

On Mon, Jun 8, 2009 at 3:07 PM, Mohit Sindhwani wrote:
> Hi Everyone,
>
> I'm having a problem that I'm trying to find an elegant solution to. I have a database that stores real-time information - this information is replaced by new values every 5 minutes and has about 30,000 entries. Once a new database is made available, I should start using that one.
>
> This database is used as part of an HTTP server that responds to requests based on the data stored in the database. So, I'm running into a design issue trying to solve how to "switch" the database from the old one to the new one. With multiple incoming requests, I'm not sure how/when to update the database. Also, due to the large number of records (30,000) that are to be put in every 5 minutes, I don't think I should just keep adding to the database since it takes quite some time - it feels better to let a separate process create the database and alert my program that a new file is ready!
>
> Any suggestions on how I should approach this problem?
>
> Cheers,
> Mohit.
> 6/9/2009 | 3:07 AM.
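Pavel's pointer-file idea can be sketched as below, if the "special file" is a plain text file rather than a database. File names are illustrative; the snippet uses Python, and the rename step is atomic on POSIX systems and (via `os.replace`) on Windows as well, so readers never observe a half-written pointer.

```python
import os
import tempfile

workdir = tempfile.mkdtemp()
pointer = os.path.join(workdir, "current_db.txt")  # the "special file"

def publish(db_path):
    # Write the new database name to a scratch file, then atomically
    # rename it over the pointer file.
    fd, tmp = tempfile.mkstemp(dir=workdir)
    with os.fdopen(fd, "w") as f:
        f.write(db_path)
    os.replace(tmp, pointer)

def current_db():
    # Readers consult the pointer file before each request.
    with open(pointer) as f:
        return f.read().strip()

publish("snapshot-001.db")
print(current_db())  # snapshot-001.db
```

On Unix the loader can unlink the previous database file immediately after publishing; on Windows, as discussed later in the thread, the deletion has to wait until the last handle is closed.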