Re: [sqlite] suggestion on the database design
John Machin wrote:
> On 21/08/2009 1:29 PM, pierr wrote:
>> Hi Simon,
>> It does make a difference. With this schema there will be a rowid
>> field in the database, and a sqlite_autoindex_tblIndex_1 is built on
>> ts_start automatically, so 1,800,000 records (16 bytes each) take 62M.
>
> Please don't type from memory -- what you have above has TWO syntax
> errors, and doesn't (with those random comma placements fixed) produce
> the result that you say -- and do read my previous message.
>
> [annotated copy/paste of the actual sqlite3 session snipped; it is
> quoted in full in John's message below]

After reading your previous message, I now understand the difference
between "INT PRIMARY KEY" and "INTEGER PRIMARY KEY", which I had not
noticed before. And sorry for "typing from memory" :)
--
View this message in context: http://www.nabble.com/suggestion-on-the-database-design-tp25062134p25074149.html
Sent from the SQLite mailing list archive at Nabble.com.

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] suggestion on the database design
On 21/08/2009 1:29 PM, pierr wrote:
> Simon Slavin-2 wrote:
>> On 21 Aug 2009, at 3:26am, pierr wrote:
>>> I did not know the sequence in defining the field matters. This is
>>> what I should have done.
>> Sorry, I should have explained better. You were right: there is no
>> difference. I was just rearranging the fields in the classic way:
>> with the primary key column as the first column. It helps me think
>> about how the database works. You did nothing wrong.
>
> Hi Simon,
> It does make a difference. With this schema,
>
> CREATE TABLE IF NOT EXISTS tblIndex(
>   frame_type INTEGER,
>   pts VARCHAR(5)
>   ts_start INTEGER PRIMARY KEY,
>   ts_end INTEGER,
> )
>
> there will be a rowid field in the database, and a
> sqlite_autoindex_tblIndex_1 is built on ts_start automatically, so
> 1,800,000 records (16 bytes each) take 62M.

Please don't type from memory -- what you have above has TWO syntax
errors, and doesn't (with those random comma placements fixed) produce
the result that you say -- and do read my previous message.

Here is an (annotated) copy/paste of an actual session:

sqlite> CREATE TABLE IF NOT EXISTS tblIndex(
   ...>   frame_type INTEGER,
   ...>   pts VARCHAR(5)             <<<=== missing comma
   ...>   ts_start INTEGER PRIMARY KEY,
   ...>   ts_end INTEGER,
   ...> )
   ...> ;
SQL error: near "ts_start": syntax error
sqlite> CREATE TABLE IF NOT EXISTS tblIndex(
   ...>   frame_type INTEGER,
   ...>   pts VARCHAR(5),
   ...>   ts_start INTEGER PRIMARY KEY,
   ...>   ts_end INTEGER,            <<<=== superfluous comma
   ...> );
SQL error: near ")": syntax error
sqlite> CREATE TABLE IF NOT EXISTS tblIndex(
   ...>   frame_type INTEGER,
   ...>   pts VARCHAR(5),
   ...>   ts_start INTEGER PRIMARY KEY,
   ...>   ts_end INTEGER
   ...> );
sqlite> select * from sqlite_master;
table|tblIndex|tblIndex|2|CREATE TABLE tblIndex(
  frame_type INTEGER,
  pts VARCHAR(5),
  ts_start INTEGER PRIMARY KEY,
  ts_end INTEGER
)                                    <<<=== no index !!
sqlite> drop table tblIndex;
sqlite> CREATE TABLE IF NOT EXISTS tblIndex(
   ...>   frame_type INTEGER,
   ...>   pts VARCHAR(5),
   ...>   ts_start INT PRIMARY KEY,  <<<=== using your original INT instead of Simon's INTEGER
   ...>   ts_end INTEGER
   ...> );
sqlite> select * from sqlite_master;
table|tblIndex|tblIndex|2|CREATE TABLE tblIndex(
  frame_type INTEGER,
  pts VARCHAR(5),
  ts_start INT PRIMARY KEY,
  ts_end INTEGER
)
index|sqlite_autoindex_tblIndex_1|tblIndex|3|   <<<=== index!!
sqlite>
Re: [sqlite] suggestion on the database design
Simon Slavin-2 wrote:
> On 21 Aug 2009, at 3:26am, pierr wrote:
>> I did not know the sequence in defining the field matters. This is
>> what I should have done.
>
> Sorry, I should have explained better. You were right: there is no
> difference. I was just rearranging the fields in the classic way:
> with the primary key column as the first column. It helps me think
> about how the database works. You did nothing wrong.

Hi Simon,
It does make a difference. With this schema,

CREATE TABLE IF NOT EXISTS tblIndex(
  frame_type INTEGER,
  pts VARCHAR(5)
  ts_start INTEGER PRIMARY KEY,
  ts_end INTEGER,
)

there will be a rowid field in the database, and a
sqlite_autoindex_tblIndex_1 is built on ts_start automatically, so
1,800,000 records (16 bytes each) take 62M.

With your schema, which is much better:

CREATE TABLE IF NOT EXISTS tblIndex(
  ts_start INTEGER PRIMARY KEY,
  ts_end INTEGER,
  frame_type INTEGER,
  pts VARCHAR(5)
)

there will be NO rowid field in the database, and 1,800,000 records
(16 bytes each) take only 35M.

>>> Whichever one of these is at fault, a delete command selecting on an
>>> indexed column and deleting 90,000 records from a five column table
>>> should not take 17 seconds.
>>
>> I am sorry, I should have mentioned it (17 seconds to delete 90,000)
>> was tested on a 350M MIPS CPU. And after changing to the schema you
>> suggested, it still takes 17 seconds to delete 90,000 records.
>> On my 1.8 GHz Ubuntu desktop it took 800 ms. So is this delete
>> performance the limit we can achieve? Any other options to improve
>> this?
>>
>> BTW: I used the following options to build the library. Is there
>> anything I can explore here? (-O2 and -Os seem to make no big
>> difference in performance.)
>>   mips24k-linux-gcc -Os -fPIC -c *.c
>>   mips24k-linux-gcc -shared -o mips_libsqlite3.so.1 sqlite3.o
>
> Putting these together, your 350 MIPS CPU machine is a MIPS 24K
> machine. There are various things to consider: not only CPU speed but
> also memory bandwidth, memory speed, hard disk throughput, and other
> things I have no idea about. And I have no idea what part threading
> would play on such a platform. Sorry but I have no experience with
> these. Perhaps someone else here does.
>
> Simon.

Thanks for your insight, Simon.
--
View this message in context: http://www.nabble.com/suggestion-on-the-database-design-tp25062134p25073602.html
Sent from the SQLite mailing list archive at Nabble.com.
Re: [sqlite] suggestion on the database design
On 21/08/2009 12:59 PM, Simon Slavin wrote:
> On 21 Aug 2009, at 3:26am, pierr wrote:
>> I did not know the sequence in defining the field matters. This is
>> what I should have done.
>
> Sorry, I should have explained better. You were right: there is no
> difference. I was just rearranging the fields in the classic way:
> with the primary key column as the first column. It helps me think
> about how the database works. You did nothing wrong.

Pierr had "INT primary key". Simon suggested "INTEGER primary key",
which is *better* -- it means that the PK is also the rowid (saves
space in the table) and you don't need a separate index for the PK.
See http://sqlite.org/lang_createtable.html#rowid

Perhaps Pierr's problem is related to the "usb interface" on the hard
drive?
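[John's INT-vs-INTEGER point can be reproduced from any SQLite binding, not just the shell. A minimal sketch in Python's standard sqlite3 module -- the table names here are made up for illustration:]

```python
import sqlite3

con = sqlite3.connect(":memory:")

# INTEGER PRIMARY KEY: ts_start becomes an alias for the rowid,
# so no separate index is created to enforce the primary key.
con.execute("""CREATE TABLE t_integer(
    frame_type INTEGER, pts VARCHAR(5),
    ts_start INTEGER PRIMARY KEY, ts_end INTEGER)""")

# INT PRIMARY KEY: NOT a rowid alias; SQLite builds an automatic
# sqlite_autoindex_* index to enforce uniqueness, costing extra space.
con.execute("""CREATE TABLE t_int(
    frame_type INTEGER, pts VARCHAR(5),
    ts_start INT PRIMARY KEY, ts_end INTEGER)""")

def indexes(table):
    """Return the names of all indexes on `table` from sqlite_master."""
    rows = con.execute(
        "SELECT name FROM sqlite_master WHERE type='index' AND tbl_name=?",
        (table,)).fetchall()
    return [r[0] for r in rows]

print(indexes("t_integer"))  # -> []
print(indexes("t_int"))      # -> ['sqlite_autoindex_t_int_1']
```

The same sqlite_master query is what John's shell session shows: the INTEGER form produces no index row, the INT form produces one.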
Re: [sqlite] suggestion on the database design
On 21 Aug 2009, at 3:26am, pierr wrote:

> I did not know the sequence in defining the field matters. This is
> what I should have done.

Sorry, I should have explained better. You were right: there is no
difference. I was just rearranging the fields in the classic way:
with the primary key column as the first column. It helps me think
about how the database works. You did nothing wrong.

>> Whichever one of these is at fault, a delete command selecting on an
>> indexed column and deleting 90,000 records from a five column table
>> should not take 17 seconds.
>
> I am sorry, I should have mentioned it (17 seconds to delete 90,000)
> was tested on a 350M MIPS CPU. And after changing to the schema you
> suggested, it still takes 17 seconds to delete 90,000 records.
> On my 1.8 GHz Ubuntu desktop it took 800 ms. So is this delete
> performance the limit we can achieve? Any other options to improve
> this?

Ah. Okay, less than 1 second is reasonable for a standard computer.
If your code performs at this speed then there's no evidence of any
problem with your code.

> BTW: I used the following options to build the library. Is there
> anything I can explore here? (-O2 and -Os seem to make no big
> difference in performance.)
>   mips24k-linux-gcc -Os -fPIC -c *.c
>   mips24k-linux-gcc -shared -o mips_libsqlite3.so.1 sqlite3.o

Putting these together, your 350 MIPS CPU machine is a MIPS 24K
machine. There are various things to consider: not only CPU speed but
also memory bandwidth, memory speed, hard disk throughput, and other
things I have no idea about. And I have no idea what part threading
would play on such a platform. Sorry but I have no experience with
these. Perhaps someone else here does.

Simon.
Re: [sqlite] suggestion on the database design
Simon Slavin-2 wrote:
> If ts_start is an always increasing integer (i.e. no two records have
> the same value) then I would see it as your primary key and as your
> row id. In other words, your structure is more like
>
> CREATE TABLE IF NOT EXISTS tblIndex(
>   ts_start INTEGER PRIMARY KEY,
>   ts_end INTEGER,
>   frame_type INTEGER,
>   pts VARCHAR(5))

Thanks. I did not know the sequence in defining the field matters.
This is what I should have done.

Simon Slavin-2 wrote:
> Inserting 50 records a second shouldn't be a problem. If they're
> generated in batches rather than individually, use BEGIN ... COMMIT
> to speed up the insertion dramatically, but it should be fast enough
> either way.
>
> What interface or API to SQLite are you using ? Or are you writing C
> code and using the native library commands ? 17 seconds is far longer
> than expected if you did everything as you described above. It should
> be far faster even without multiple threads/connections.
>
> To test it, get your database ready to do one of the DELETE FROM
> commands, and have your program quit or abort. Then open the database
> in the sqlite3 command-line application, and issue the exact DELETE
> FROM command your application would execute. If the command-line
> program takes 17 seconds then the problem is in your data somewhere.
> If it doesn't, the problem is in your own application or in your use
> of the API you're using.
>
>> Any way to reduce this ? We don't care if the records are synced to
>> the hard disk immediately. We are thinking of starting a separate
>> thread to do the delete so as to make this call async. The thing I
>> am not sure about is whether the (long) delete will impact the
>> insert performance if they share the same connection? Or should I
>> use a separate connection for insert and delete? But in that way,
>> do they need to be synced at the application level?
>
> Whichever one of these is at fault, a delete command selecting on an
> indexed column and deleting 90,000 records from a five column table
> should not take 17 seconds.

I am sorry, I should have mentioned it (17 seconds to delete 90,000)
was tested on a 350M MIPS CPU. And after changing to the schema you
suggested, it still takes 17 seconds to delete 90,000 records.
On my 1.8 GHz Ubuntu desktop it took 800 ms. So is this delete
performance the limit we can achieve? Any other options to improve
this?

BTW: I used the following options to build the library. Is there
anything I can explore here? (-O2 and -Os seem to make no big
difference in performance.)
  mips24k-linux-gcc -Os -fPIC -c *.c
  mips24k-linux-gcc -shared -o mips_libsqlite3.so.1 sqlite3.o
--
View this message in context: http://www.nabble.com/suggestion-on-the-database-design-tp25062134p25073115.html
Sent from the SQLite mailing list archive at Nabble.com.
Re: [sqlite] suggestion on the database design
On 20 Aug 2009, at 2:33pm, pierr wrote:

> We have a database with a very simple schema:
>
>   CREATE TABLE IF NOT EXISTS tblIndex(
>     frame_type INT,
>     pts VARCHAR(5),
>     ts_start INT primary key,
>     ts_end INT)
>
> And the application scenario is:
> 1. Every second the user will insert 2 ~ 50 records, and the ts_start
> field of those records is always increasing.

If ts_start is an always increasing integer (i.e. no two records have
the same value) then I would see it as your primary key and as your
row id. In other words, your structure is more like

  CREATE TABLE IF NOT EXISTS tblIndex(
    ts_start INTEGER PRIMARY KEY,
    ts_end INTEGER,
    frame_type INTEGER,
    pts VARCHAR(5))

Inserting 50 records a second shouldn't be a problem. If they're
generated in batches rather than individually, use BEGIN ... COMMIT to
speed up the insertion dramatically, but it should be fast enough
either way.

> After 8 hours, there are at most 1,800,000 records. By setting the
> sync mode to off, the performance seems OK so far. And because the
> data of each record is only 16 bytes, we may use some buffer even if
> the insert speed is not too fast.

Won't do any harm, but might not be worth doing either. Test it with
minimal programming first.

> 2. After 8 hours, the user will tell me to delete the oldest data by
> giving the upper bound ts_start, so I will do DELETE FROM tblIndex
> WHERE ts_start < upper_bound_ts_start. Deleting 90,000 records (half
> an hour's worth) out of the 1,800,000 now takes 17 seconds. A little
> bit longer than expected.

What interface or API to SQLite are you using ? Or are you writing C
code and using the native library commands ? 17 seconds is far longer
than expected if you did everything as you described above. It should
be far faster even without multiple threads/connections.

To test it, get your database ready to do one of the DELETE FROM
commands, and have your program quit or abort. Then open the database
in the sqlite3 command-line application, and issue the exact DELETE
FROM command your application would execute. If the command-line
program takes 17 seconds then the problem is in your data somewhere.
If it doesn't, the problem is in your own application or in your use
of the API you're using.

> Any way to reduce this ? We don't care if the records are synced to
> the hard disk immediately. We are thinking of starting a separate
> thread to do the delete so as to make this call async. The thing I am
> not sure about is whether the (long) delete will impact the insert
> performance if they share the same connection? Or should I use a
> separate connection for insert and delete? But in that way, do they
> need to be synced at the application level?

Whichever one of these is at fault, a delete command selecting on an
indexed column and deleting 90,000 records from a five column table
should not take 17 seconds.

> 3. Search. SELECT ts_start FROM tblIndex WHERE ts_start BETWEEN ? AND ?
> As ts_start is the primary key, the performance is OK for our need
> now. Should I use a separate connection for search?

Your data requirements are not huge. The system should be fast enough
no matter whether you use one connection or many.

Simon.
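[Simon's BEGIN ... COMMIT advice looks like this in practice. A minimal sketch using Python's standard sqlite3 module; the batch contents and size are invented for illustration:]

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE IF NOT EXISTS tblIndex(
    ts_start INTEGER PRIMARY KEY,
    ts_end INTEGER,
    frame_type INTEGER,
    pts VARCHAR(5))""")

# Accumulate a batch of records, then insert them inside ONE
# transaction: the journal/sync cost is paid once per COMMIT instead
# of once per INSERT, which is where the dramatic speedup comes from.
batch = [(ts, ts + 10, 1, "00000") for ts in range(0, 5000, 10)]

with con:  # the context manager wraps the batch in a single transaction
    con.executemany(
        "INSERT INTO tblIndex(ts_start, ts_end, frame_type, pts) "
        "VALUES (?, ?, ?, ?)",
        batch)

print(con.execute("SELECT COUNT(*) FROM tblIndex").fetchone()[0])  # -> 500
```

In C with the native library the equivalent is executing "BEGIN" before the loop of sqlite3_step() calls and "COMMIT" after it.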
Re: [sqlite] suggestion on the database design
On Thu, Aug 20, 2009 at 8:33 AM, pierr wrote:
> We have a database with a very simple schema:
>
>   CREATE TABLE IF NOT EXISTS tblIndex(
>     frame_type INT,
>     pts VARCHAR(5),
>     ts_start INT primary key,
>     ts_end INT)
> [...]
> 2. After 8 hours, the user will tell me to delete the oldest data by
> giving the upper bound ts_start, so I will do DELETE FROM tblIndex
> WHERE ts_start < upper_bound_ts_start. Deleting 90,000 records (half
> an hour's worth) out of the 1,800,000 now takes 17 seconds. A little
> bit longer than expected. Any way to reduce this ?

With a simple schema as below (no indexes other than the stock INTEGER
PRIMARY KEY on ts_start)

  CREATE TABLE tblIndex (
    ts_start INTEGER PRIMARY KEY,
    frame_type INTEGER,
    pts VARCHAR(5),
    ts_end INTEGER
  )

and using Perl DBD::SQLite on my Macbook, I get the following times:

  65 seconds to insert 1,800,000 records with a commit every 100,000
  1 second to delete the first 90,000 records

--
Puneet Kishor http://www.punkish.org
Carbon Model http://carbonmodel.org
Charter Member, Open Source Geospatial Foundation http://www.osgeo.org
Science Commons Fellow, http://sciencecommons.org/about/whoweare/kishor
Nelson Institute, UW-Madison http://www.nelson.wisc.edu
---
Assertions are politics; backing up assertions with evidence is science
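[Puneet's measurement can be reproduced in any binding. A scaled-down sketch of the same benchmark -- row counts reduced by 10x so it finishes quickly, Python's sqlite3 standing in for Perl DBD::SQLite, and the row values invented:]

```python
import sqlite3
import time

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE tblIndex(
    ts_start INTEGER PRIMARY KEY,
    frame_type INTEGER,
    pts VARCHAR(5),
    ts_end INTEGER)""")

N, BATCH = 180_000, 10_000  # scaled down from 1,800,000 / 100,000

t0 = time.time()
for start in range(0, N, BATCH):
    with con:  # commit every BATCH rows, as in the Perl version
        con.executemany(
            "INSERT INTO tblIndex VALUES (?, ?, ?, ?)",
            ((ts, 1, "00000", ts + 1) for ts in range(start, start + BATCH)))
insert_s = time.time() - t0

# Delete the oldest 5% of rows via the rowid-aliased primary key.
t0 = time.time()
with con:
    con.execute("DELETE FROM tblIndex WHERE ts_start < ?", (N // 20,))
delete_s = time.time() - t0

print(f"insert: {insert_s:.2f}s, delete of {N // 20} rows: {delete_s:.2f}s")
```

Because ts_start is the rowid here, the DELETE walks a contiguous range of the table b-tree with no secondary index to update, which is why it is so much faster than on the original INT PRIMARY KEY schema.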
[sqlite] suggestion on the database design
Hi,
We have a database with a very simple schema:

  CREATE TABLE IF NOT EXISTS tblIndex(
    frame_type INT,
    pts VARCHAR(5),
    ts_start INT primary key,
    ts_end INT)

And the application scenario is:

1. Every second the user will insert 2 ~ 50 records, and the ts_start
field of those records is always increasing. After 8 hours, there are
at most 1,800,000 records. By setting the sync mode to off, the
performance seems OK so far. And because the data of each record is
only 16 bytes, we may use some buffer even if the insert speed is not
too fast.

2. After 8 hours, the user will tell me to delete the oldest data by
giving the upper bound ts_start, so I will do DELETE FROM tblIndex
WHERE ts_start < upper_bound_ts_start. Deleting 90,000 records (half
an hour's worth) out of the 1,800,000 now takes 17 seconds. A little
bit longer than expected. Any way to reduce this? We don't care if the
records are synced to the hard disk immediately. We are thinking of
starting a separate thread to do the delete so as to make this call
async. The thing I am not sure about is whether the (long) delete will
impact the insert performance if they share the same connection? Or
should I use a separate connection for insert and delete? But in that
way, do they need to be synced at the application level?

3. Search. SELECT ts_start FROM tblIndex WHERE ts_start BETWEEN ? AND ?
As ts_start is the primary key, the performance is OK for our need
now. Should I use a separate connection for search?

Configuration of sqlite:
  hard disk database (usb interface)
  cache size is 2000
  page size is 1024
  sync mode is 0 (off)
  journal_mode is truncate

Thanks for any suggestion to improve the delete performance or about
the overall design.
--
View this message in context: http://www.nabble.com/suggestion-on-the-database-design-tp25062134p25062134.html
Sent from the SQLite mailing list archive at Nabble.com.
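[One common answer to point 2 -- not from this thread, but a standard pattern -- is to avoid both the extra thread and the long write lock by purging in small chunks, so each DELETE transaction is short and inserts on the same connection are never blocked for long. A sketch in Python's sqlite3, with the chunk size an arbitrary choice:]

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE IF NOT EXISTS tblIndex(
    ts_start INTEGER PRIMARY KEY,
    ts_end INTEGER,
    frame_type INTEGER,
    pts VARCHAR(5))""")
con.executemany("INSERT INTO tblIndex VALUES (?, ?, ?, ?)",
                ((ts, ts + 1, 1, "00000") for ts in range(100_000)))
con.commit()

def purge_below(con, upper_bound, chunk=5_000):
    """Delete rows with ts_start < upper_bound, at most `chunk` rows
    per transaction, so the write lock is released between chunks."""
    while True:
        with con:
            cur = con.execute(
                "DELETE FROM tblIndex WHERE ts_start IN ("
                "  SELECT ts_start FROM tblIndex"
                "  WHERE ts_start < ? ORDER BY ts_start LIMIT ?)",
                (upper_bound, chunk))
        if cur.rowcount < chunk:  # last (partial or empty) chunk done
            break

purge_below(con, 90_000)
print(con.execute("SELECT COUNT(*) FROM tblIndex").fetchone()[0])  # -> 10000
```

With ts_start as an INTEGER PRIMARY KEY (the rowid), each chunk is a cheap contiguous range delete, and the loop can be interleaved with normal inserts between iterations.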