Re: [sqlite] suggestion on the database design

2009-08-20 Thread pierr


John Machin wrote:
> 
> On 21/08/2009 1:29 PM, pierr wrote:
>> 
>> Simon Slavin-2 wrote:
>>>
>>> On 21 Aug 2009, at 3:26am, pierr wrote:
>>>
 I did not know that the order in which the fields are defined matters. This
 is what I should have done.
>>> Sorry, I should have explained better.  You were right: there is no  
>>> difference.  I was just rearranging the fields in the classic way:  
>>> with the primary key column as the first column.  It helps me think  
>>> about how the database works.  You did nothing wrong.
>>>
>> Hi Simon,
>>   It does make a difference.
>>   With this schema, 
>>   CREATE TABLE IF NOT EXISTS tblIndex(
>>  frame_type INTEGER,
>>  pts VARCHAR(5)
>>  ts_start INTEGER PRIMARY KEY,
>>  ts_end INTEGER,
>>  ) 
>>  There will be a rowid field in the database; and a
>> sqlite_autoindex_tblIndex_1 is built on ts_start automatically; so 1,800,000
>> records (16 bytes each) take 62 MB.
> 
> Please don't type from memory -- what you have above has TWO syntax 
> errors, and doesn't (with those random comma placements fixed) produce 
> the result that you say -- and do read my previous message.
> 
> Here is an (annotated) copy/paste of an actual session:
> 
> sqlite>   CREATE TABLE IF NOT EXISTS tblIndex(
> ...>  frame_type INTEGER,
> ...>  pts VARCHAR(5)<<<=== missing comma
> ...>  ts_start INTEGER PRIMARY KEY,
> ...>  ts_end INTEGER,
> ...>  )
> ...> ;
> SQL error: near "ts_start": syntax error
> sqlite>   CREATE TABLE IF NOT EXISTS tblIndex(
> ...>  frame_type INTEGER,
> ...>  pts VARCHAR(5),
> ...>  ts_start INTEGER PRIMARY KEY,
...>  ts_end INTEGER, <<<=== superfluous comma
> ...>  );
> SQL error: near ")": syntax error
> sqlite>   CREATE TABLE IF NOT EXISTS tblIndex(
> ...>  frame_type INTEGER,
> ...>  pts VARCHAR(5),
> ...>  ts_start INTEGER PRIMARY KEY,
> ...>  ts_end INTEGER
> ...>  );
> sqlite> select * from sqlite_master;
> table|tblIndex|tblIndex|2|CREATE TABLE tblIndex(
>   frame_type INTEGER,
>   pts VARCHAR(5),
>   ts_start INTEGER PRIMARY KEY,
>   ts_end INTEGER
>   ) <<<=== no index !!
> sqlite> drop table tblIndex;
> sqlite>   CREATE TABLE IF NOT EXISTS tblIndex(
> ...>  frame_type INTEGER,
> ...>  pts VARCHAR(5),
> ...>  ts_start INT PRIMARY KEY,  <<<=== using your original INT 
> instead of Simon's INTEGER
> ...>  ts_end INTEGER
> ...>  );
> sqlite> select * from sqlite_master;
> table|tblIndex|tblIndex|2|CREATE TABLE tblIndex(
>   frame_type INTEGER,
>   pts VARCHAR(5),
>   ts_start INT PRIMARY KEY,
>   ts_end INTEGER
>   )
> index|sqlite_autoindex_tblIndex_1|tblIndex|3| <<<=== index!!
> sqlite>
> 
> 
After reading your previous message, I now understand the difference
between "INT PRIMARY KEY" and "INTEGER PRIMARY KEY", which I had not noticed
before. And sorry for "typing from memory" :)
 

-- 
View this message in context: 
http://www.nabble.com/suggestion-on-the-database-design-tp25062134p25074149.html
Sent from the SQLite mailing list archive at Nabble.com.

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] suggestion on the database design

2009-08-20 Thread John Machin
On 21/08/2009 1:29 PM, pierr wrote:
> 
> Simon Slavin-2 wrote:
>>
>> On 21 Aug 2009, at 3:26am, pierr wrote:
>>
>>> I did not know that the order in which the fields are defined matters.
>>> This is what I should have done.
>> Sorry, I should have explained better.  You were right: there is no  
>> difference.  I was just rearranging the fields in the classic way:  
>> with the primary key column as the first column.  It helps me think  
>> about how the database works.  You did nothing wrong.
>>
> Hi Simon,
>   It does make a difference.
>   With this schema, 
>   CREATE TABLE IF NOT EXISTS tblIndex(
>  frame_type INTEGER,
>  pts VARCHAR(5)
>  ts_start INTEGER PRIMARY KEY,
>  ts_end INTEGER,
>  ) 
>  There will be a rowid field in the database; and a
> sqlite_autoindex_tblIndex_1 is built on ts_start automatically; so 1,800,000
> records (16 bytes each) take 62 MB.

Please don't type from memory -- what you have above has TWO syntax 
errors, and doesn't (with those random comma placements fixed) produce 
the result that you say -- and do read my previous message.

Here is an (annotated) copy/paste of an actual session:

sqlite>   CREATE TABLE IF NOT EXISTS tblIndex(
...>  frame_type INTEGER,
...>  pts VARCHAR(5)<<<=== missing comma
...>  ts_start INTEGER PRIMARY KEY,
...>  ts_end INTEGER,
...>  )
...> ;
SQL error: near "ts_start": syntax error
sqlite>   CREATE TABLE IF NOT EXISTS tblIndex(
...>  frame_type INTEGER,
...>  pts VARCHAR(5),
...>  ts_start INTEGER PRIMARY KEY,
...>  ts_end INTEGER, <<<=== superfluous comma
...>  );
SQL error: near ")": syntax error
sqlite>   CREATE TABLE IF NOT EXISTS tblIndex(
...>  frame_type INTEGER,
...>  pts VARCHAR(5),
...>  ts_start INTEGER PRIMARY KEY,
...>  ts_end INTEGER
...>  );
sqlite> select * from sqlite_master;
table|tblIndex|tblIndex|2|CREATE TABLE tblIndex(
  frame_type INTEGER,
  pts VARCHAR(5),
  ts_start INTEGER PRIMARY KEY,
  ts_end INTEGER
  ) <<<=== no index !!
sqlite> drop table tblIndex;
sqlite>   CREATE TABLE IF NOT EXISTS tblIndex(
...>  frame_type INTEGER,
...>  pts VARCHAR(5),
...>  ts_start INT PRIMARY KEY,  <<<=== using your original INT 
instead of Simon's INTEGER
...>  ts_end INTEGER
...>  );
sqlite> select * from sqlite_master;
table|tblIndex|tblIndex|2|CREATE TABLE tblIndex(
  frame_type INTEGER,
  pts VARCHAR(5),
  ts_start INT PRIMARY KEY,
  ts_end INTEGER
  )
index|sqlite_autoindex_tblIndex_1|tblIndex|3| <<<=== index!!
sqlite>




Re: [sqlite] suggestion on the database design

2009-08-20 Thread pierr


Simon Slavin-2 wrote:
> 
> 
> On 21 Aug 2009, at 3:26am, pierr wrote:
> 
>> I did not know that the order in which the fields are defined matters.
>> This is what I should have done.
> 
> Sorry, I should have explained better.  You were right: there is no  
> difference.  I was just rearranging the fields in the classic way:  
> with the primary key column as the first column.  It helps me think  
> about how the database works.  You did nothing wrong.
> 
Hi Simon,
  It does make a difference.
  With this schema, 
  CREATE TABLE IF NOT EXISTS tblIndex(
 frame_type INTEGER,
 pts VARCHAR(5)
 ts_start INTEGER PRIMARY KEY,
 ts_end INTEGER,
 ) 
 There will be a rowid field in the database; and a
sqlite_autoindex_tblIndex_1 is built on ts_start automatically; so 1,800,000
records (16 bytes each) take 62 MB.

  With your schema, which is much better:
  CREATE TABLE IF NOT EXISTS tblIndex(
 ts_start INTEGER PRIMARY KEY,
 ts_end INTEGER,
 frame_type INTEGER,
 pts VARCHAR(5)
 ) 
 There will be NO separate rowid field in the database; and 1,800,000 records
(16 bytes each) take only 35 MB.




>>> Whichever one of these is at fault, a delete command selecting on an
>>> indexed column and deleting 90,000 records from a five-column table
>>> should not take 17 seconds.
>>
>> I am sorry, I should have mentioned it (17 seconds to delete 90,000) was
>> tested on a 350M MIPS CPU. And after changing to the schema you suggested,
>> it still takes 17 seconds to delete 90,000 records.
>> On my 1.8 GHz Ubuntu desktop it took 800 ms. So is this delete performance
>> the limit we can achieve? Any other options to improve this?
> 
> 
>> BTW: I used the following options to build the library. Is there anything I
>> can explore here? (-O2 and -Os seem to make no big difference on performance.)
>> mips24k-linux-gcc -Os -fPIC -c *.c
>> mips24k-linux-gcc -shared -o mips_libsqlite3.so.1 sqlite3.o
> 
> 
> Putting these together, your 350 MIPS CPU machine is a MIPS 24K  
> machine.  There are various things to consider: not only CPU speed but  
> also memory bandwidth, memory speed, hard disk throughput, and other  
> things I have no idea about.  And I have no idea what part threading  
> would play on such a platform.  Sorry but I have no experience with  
> these.  Perhaps someone else here does.
> 
> Simon.
> 
Thanks for your insight, Simon.

-- 
View this message in context: 
http://www.nabble.com/suggestion-on-the-database-design-tp25062134p25073602.html
Sent from the SQLite mailing list archive at Nabble.com.



Re: [sqlite] suggestion on the database design

2009-08-20 Thread John Machin
On 21/08/2009 12:59 PM, Simon Slavin wrote:
> On 21 Aug 2009, at 3:26am, pierr wrote:
> 
>> I did not know that the order in which the fields are defined matters.
>> This is what I should have done.
> 
> Sorry, I should have explained better.  You were right: there is no  
> difference.  I was just rearranging the fields in the classic way:  
> with the primary key column as the first column.  It helps me think  
> about how the database works.  You did nothing wrong.

Pierr had "INT primary key". Simon suggested "INTEGER primary key" which 
is *better* -- it means that the PK is also the rowid (saves space in 
the table) and you don't need a separate index for the PK. See 
http://sqlite.org/lang_createtable.html#rowid
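This difference can be demonstrated with a short script. The sketch below uses Python's built-in sqlite3 module; the table names t_int and t_integer are made up for illustration. An "INT PRIMARY KEY" column gets an automatic index, while an "INTEGER PRIMARY KEY" column does not, because it aliases the rowid:

```python
import sqlite3

con = sqlite3.connect(":memory:")

# "INT PRIMARY KEY": ts_start is an ordinary column, so SQLite must
# build a separate automatic index to enforce the PRIMARY KEY.
con.execute("""CREATE TABLE t_int(
    ts_start INT PRIMARY KEY, ts_end INTEGER,
    frame_type INTEGER, pts VARCHAR(5))""")

# "INTEGER PRIMARY KEY": ts_start becomes an alias for the rowid,
# so the table's own B-tree already serves as the index.
con.execute("""CREATE TABLE t_integer(
    ts_start INTEGER PRIMARY KEY, ts_end INTEGER,
    frame_type INTEGER, pts VARCHAR(5))""")

def autoindexes(table):
    """Names of automatic indexes SQLite created for the given table."""
    rows = con.execute(
        "SELECT name FROM sqlite_master WHERE type = 'index' AND tbl_name = ?",
        (table,))
    return [name for (name,) in rows]

print(autoindexes("t_int"))      # ['sqlite_autoindex_t_int_1']
print(autoindexes("t_integer"))  # []
```

The extra index is exactly where Pierr's additional 27 MB went.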

Perhaps Pierr's problem is related to the "usb interface" on the hard drive?




Re: [sqlite] suggestion on the database design

2009-08-20 Thread Simon Slavin

On 21 Aug 2009, at 3:26am, pierr wrote:

> I did not know that the order in which the fields are defined matters.
> This is what I should have done.

Sorry, I should have explained better.  You were right: there is no  
difference.  I was just rearranging the fields in the classic way:  
with the primary key column as the first column.  It helps me think  
about how the database works.  You did nothing wrong.

>> Whichever one of these is at fault, a delete command selecting on an
>> indexed column and deleting 90,000 records from a five-column table
>> should not take 17 seconds.
>
> I am sorry, I should have mentioned it (17 seconds to delete 90,000) was
> tested on a 350M MIPS CPU. And after changing to the schema you suggested,
> it still takes 17 seconds to delete 90,000 records.
> On my 1.8 GHz Ubuntu desktop it took 800 ms. So is this delete performance
> the limit we can achieve? Any other options to improve this?

Ah.  Okay, less than 1 second is reasonable for a standard computer.   
If your code performs at this speed then there's no evidence of any  
problem with your code.

> BTW: I used the following options to build the library. Is there anything I
> can explore here? (-O2 and -Os seem to make no big difference on performance.)
> mips24k-linux-gcc -Os -fPIC -c *.c
> mips24k-linux-gcc -shared -o mips_libsqlite3.so.1 sqlite3.o

Putting these together, your 350 MIPS CPU machine is a MIPS 24K  
machine.  There are various things to consider: not only CPU speed but  
also memory bandwidth, memory speed, hard disk throughput, and other  
things I have no idea about.  And I have no idea what part threading  
would play on such a platform.  Sorry but I have no experience with  
these.  Perhaps someone else here does.

Simon.


Re: [sqlite] suggestion on the database design

2009-08-20 Thread pierr


Simon Slavin-2 wrote:
> 
> If ts_start is an always increasing integer (i.e. no two records have  
> the same value) then I would see it as your primary key and as your  
> row id.  In other words, your structure is more like
> 
>   CREATE TABLE IF NOT EXISTS tblIndex(
>  ts_start INTEGER PRIMARY KEY,
>  ts_end   INTEGER,
>  frame_type   INTEGER,
>  pts  VARCHAR(5))
> 
> 
Thanks. I did not know that the order in which the fields are defined
matters. This is what I should have done.


Simon Slavin-2 wrote:
> 
> inserting 50 records a second shouldn't be a problem.  If they're  
> generated in batches rather than individually, use BEGIN ... COMMIT to  
> speed up the insertion dramatically but it should be fast enough  
> either way.
> 
> What interface or API to SQLite are you using ?  Or are you writing C  
> code and using the native library commands ?  17 seconds is far longer  
> than expected if you did everything as you described above.  It should  
> be far faster even without multiple threads/connections.
> 
> To test it, get your database ready to do one of the DELETE FROM  
> commands, and have your program quit or abort.  Then open the database  
> in the sqlite3 command-line application, and issue the exact DELETE  
> FROM command your application would execute.  If the command-line  
> program takes 17 seconds then the problem is in your data somewhere.   
> If it doesn't, the problem is in your own application or in your use  
> of the API you're using.
> 
>> Any way to reduce this? We don't care if the records are synced to the hard
>> disk immediately. We are thinking of starting a separate thread to do the
>> delete so as to make this call async. The thing I am not sure about is
>> whether the (long-time) delete will impact the insert performance if they
>> share the same connection? Or should I use a separate connection for insert
>> and delete? But in this way, do they need to be synced at the application
>> level?
> 
> Whichever one of these is at fault, a delete command selecting on an  
> indexed column and deleting 90,000 records from a five-column table  
> should not take 17 seconds.
> 
> 

I am sorry, I should have mentioned it (17 seconds to delete 90,000) was
tested on a 350M MIPS CPU. And after changing to the schema you suggested,
it still takes 17 seconds to delete 90,000 records.
On my 1.8 GHz Ubuntu desktop it took 800 ms. So is this delete performance
the limit we can achieve? Any other options to improve this?

BTW: I used the following options to build the library. Is there anything I
can explore here? (-O2 and -Os seem to make no big difference on performance.)
mips24k-linux-gcc -Os -fPIC -c *.c
mips24k-linux-gcc -shared -o mips_libsqlite3.so.1 sqlite3.o

-- 
View this message in context: 
http://www.nabble.com/suggestion-on-the-database-design-tp25062134p25073115.html
Sent from the SQLite mailing list archive at Nabble.com.



Re: [sqlite] suggestion on the database design

2009-08-20 Thread Simon Slavin

On 20 Aug 2009, at 2:33pm, pierr wrote:

>  We have a database with a very simple schema:
>CREATE TABLE IF NOT EXISTS tblIndex(
> frame_type  INT,
> pts  VARCHAR(5),
> ts_start  INT primary key,
> ts_end  INT)
>
> And the application scenario is :
> 1. Every second the user will insert 2~50 records, and the ts_start fields
> of those records are always increasing.

If ts_start is an always increasing integer (i.e. no two records have  
the same value) then I would see it as your primary key and as your  
row id.  In other words, your structure is more like

  CREATE TABLE IF NOT EXISTS tblIndex(
 ts_start   INTEGER PRIMARY KEY,
 ts_end INTEGER,
 frame_type INTEGER,
 ptsVARCHAR(5))

Inserting 50 records a second shouldn't be a problem.  If they're  
generated in batches rather than individually, use BEGIN ... COMMIT to  
speed up the insertion dramatically but it should be fast enough  
either way.
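As a sketch of the batching idea (using Python's built-in sqlite3 module here; the column values are invented), wrapping the whole batch in one explicit transaction means one journal sync per COMMIT rather than one per INSERT:

```python
import sqlite3

# isolation_level=None puts the connection in autocommit mode,
# so we control the transaction boundaries ourselves.
con = sqlite3.connect(":memory:", isolation_level=None)
con.execute("""CREATE TABLE IF NOT EXISTS tblIndex(
    ts_start INTEGER PRIMARY KEY,
    ts_end INTEGER,
    frame_type INTEGER,
    pts VARCHAR(5))""")

# A batch of invented records; ts_start is strictly increasing.
batch = [(i, i + 40, 1, "p") for i in range(1000)]

con.execute("BEGIN")  # one sync per COMMIT, not one per INSERT
con.executemany(
    "INSERT INTO tblIndex(ts_start, ts_end, frame_type, pts) VALUES (?, ?, ?, ?)",
    batch)
con.execute("COMMIT")

print(con.execute("SELECT COUNT(*) FROM tblIndex").fetchone()[0])  # 1000
```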

> After 8 hours, there are at most
> 1_800_000 records. By setting the sync mode to off, the performance seems OK
> so far. And because each record is only 16 bytes, we may use
> some buffer even if the insert speed is not too fast.


Won't do any harm, but might not be worth doing either.  Test it with  
minimal programming first.

> 2. After 8 hours, the user will tell me to delete the oldest data by giving
> the upper bound ts_start, so I will do DELETE FROM tblIndex WHERE ts_start <
> upper_bound_ts_start. Deleting 90_000 (which is the records for half an
> hour) out of the 1_800_000 now takes 17 seconds. A little bit longer than
> expected.

What interface or API to SQLite are you using ?  Or are you writing C  
code and using the native library commands ?  17 seconds is far longer  
than expected if you did everything as you described above.  It should  
be far faster even without multiple threads/connections.

To test it, get your database ready to do one of the DELETE FROM  
commands, and have your program quit or abort.  Then open the database  
in the sqlite3 command-line application, and issue the exact DELETE  
FROM command your application would execute.  If the command-line  
program takes 17 seconds then the problem is in your data somewhere.   
If it doesn't, the problem is in your own application or in your use  
of the API you're using.
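That isolation test can also be scripted. The sketch below (Python's built-in sqlite3, with invented data and a smaller row count, not the real database) times the exact DELETE statement on its own:

```python
import sqlite3
import time

con = sqlite3.connect(":memory:", isolation_level=None)
con.execute("""CREATE TABLE tblIndex(
    ts_start INTEGER PRIMARY KEY,
    ts_end INTEGER,
    frame_type INTEGER,
    pts VARCHAR(5))""")

# Populate with invented rows in one transaction.
con.execute("BEGIN")
con.executemany(
    "INSERT INTO tblIndex VALUES (?, ?, ?, ?)",
    ((i, i + 40, 1, "p") for i in range(100_000)))
con.execute("COMMIT")

# Time exactly the statement the application would issue.
t0 = time.perf_counter()
cur = con.execute("DELETE FROM tblIndex WHERE ts_start < ?", (5_000,))
elapsed = time.perf_counter() - t0
print(f"deleted {cur.rowcount} rows in {elapsed:.3f} s")
```

If the standalone timing matches the 17 seconds, the data or storage is the bottleneck; if not, the application code is.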

> Any way to reduce this? We don't care if the records are synced to the hard
> disk immediately. We are thinking of starting a separate thread to do the
> delete so as to make this call async. The thing I am not sure about is
> whether the (long-time) delete will impact the insert performance if they
> share the same connection? Or should I use a separate connection for insert
> and delete? But in this way, do they need to be synced at the application
> level?

Whichever one of these is at fault, a delete command selecting on an  
indexed column and deleting 90,000 records from a five-column table  
should not take 17 seconds.

> 3. Search: SELECT ts_start FROM tblIndex WHERE ts_start BETWEEN ? AND ?
> As ts_start is the primary key, the performance is OK for our needs now. I
> am thinking I should use a separate connection for search, right?

Your data requirements are not huge.  The system should be fast enough  
no matter whether you use one connection or many.

Simon.



Re: [sqlite] suggestion on the database design

2009-08-20 Thread P Kishor
On Thu, Aug 20, 2009 at 8:33 AM, pierr wrote:
>
> Hi,
>  We have a database with a very simple schema:
>    CREATE TABLE IF NOT EXISTS tblIndex(
>     frame_type  INT,
>     pts  VARCHAR(5),
>     ts_start  INT primary key,
>     ts_end  INT)
>
> And the application scenario is :
> 1. Every second the user will insert 2~50 records, and the ts_start fields of
> those records are always increasing. After 8 hours, there are at most
> 1_800_000 records. By setting the sync mode to off, the performance seems OK
> so far. And because each record is only 16 bytes, we may use
> some buffer even if the insert speed is not too fast.
>
> 2. After 8 hours, the user will tell me to delete the oldest data by giving the
> upper bound ts_start, so I will do DELETE FROM tblIndex WHERE ts_start <
> upper_bound_ts_start. Deleting 90_000 (which is the records for half an hour)
> out of the 1_800_000 now takes 17 seconds. A little bit longer than expected.
> Any way to reduce this?


With a simple schema as below (no indexes other than the stock INTEGER
PRIMARY KEY on ts_start)

CREATE TABLE tblIndex (
ts_start INTEGER PRIMARY KEY,
frame_type INTEGER,
pts VARCHAR(5),
ts_end INTEGER
)

and using Perl DBD::SQLite on my Macbook, I get the following times

65 seconds to insert 1,800,000 records with a commit every 100,000

1 second to delete the first 90,000 records
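A scaled-down re-run of the same benchmark can be sketched with Python's built-in sqlite3 instead of Perl DBD::SQLite (180,000 rows and a commit every 10,000, one tenth of the original; absolute timings will of course differ by machine, but the row counts will not):

```python
import sqlite3

con = sqlite3.connect(":memory:", isolation_level=None)
con.execute("""CREATE TABLE tblIndex(
    ts_start INTEGER PRIMARY KEY, frame_type INTEGER,
    pts VARCHAR(5), ts_end INTEGER)""")

# Insert in chunks, committing once per chunk.
N, CHUNK = 180_000, 10_000
for start in range(0, N, CHUNK):
    con.execute("BEGIN")
    con.executemany(
        "INSERT INTO tblIndex VALUES (?, ?, ?, ?)",
        ((i, 1, "x", i + 40) for i in range(start, start + CHUNK)))
    con.execute("COMMIT")

# Delete the oldest 5% -- the analogue of 90,000 out of 1,800,000.
deleted = con.execute(
    "DELETE FROM tblIndex WHERE ts_start < ?", (9_000,)).rowcount

print(deleted)  # 9000
print(con.execute("SELECT COUNT(*) FROM tblIndex").fetchone()[0])  # 171000
```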


> We don't care if the records are synced to the hard
> disk immediately. We are thinking of starting a separate thread to do the
> delete so as to make this call async. The thing I am not sure about is
> whether the (long-time) delete will impact the insert performance if they
> share the same connection? Or should I use a separate connection for insert
> and delete? But in this way, do they need to be synced at the application
> level?
>
> 3. Search: SELECT ts_start FROM tblIndex WHERE ts_start BETWEEN ? AND ?
> As ts_start is the primary key, the performance is OK for our needs now. I
> am thinking I should use a separate connection for search, right?
>
> Configuration of sqlite :
> hard disk database (usb interface)
> cache size is 2000
> page size is 1024
> sync mode is 0 (off)
> journal_mode mode is truncate
>
>
> Thanks for any suggestion to improve the delete performance or about the
> overall design.
> --
> View this message in context: 
> http://www.nabble.com/suggestion-on-the-database-design-tp25062134p25062134.html
> Sent from the SQLite mailing list archive at Nabble.com.
>
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>



-- 
Puneet Kishor http://www.punkish.org
Carbon Model http://carbonmodel.org
Charter Member, Open Source Geospatial Foundation http://www.osgeo.org
Science Commons Fellow, http://sciencecommons.org/about/whoweare/kishor
Nelson Institute, UW-Madison http://www.nelson.wisc.edu
---
Assertions are politics; backing up assertions with evidence is science
===


[sqlite] suggestion on the database design

2009-08-20 Thread pierr

Hi, 
  We have a database with a very simple schema:
CREATE TABLE IF NOT EXISTS tblIndex(
 frame_type  INT,
 pts  VARCHAR(5),
 ts_start  INT primary key,
 ts_end  INT)

And the application scenario is :
1. Every second the user will insert 2~50 records, and the ts_start fields of
those records are always increasing. After 8 hours, there are at most
1_800_000 records. By setting the sync mode to off, the performance seems OK
so far. And because each record is only 16 bytes, we may use
some buffer even if the insert speed is not too fast.

2. After 8 hours, the user will tell me to delete the oldest data by giving the
upper bound ts_start, so I will do DELETE FROM tblIndex WHERE ts_start <
upper_bound_ts_start. Deleting 90_000 (which is the records for half an hour)
out of the 1_800_000 now takes 17 seconds. A little bit longer than expected.
Any way to reduce this? We don't care if the records are synced to the hard
disk immediately. We are thinking of starting a separate thread to do the
delete so as to make this call async. The thing I am not sure about is
whether the (long-time) delete will impact the insert performance if they
share the same connection? Or should I use a separate connection for insert
and delete? But in this way, do they need to be synced at the application
level?

3. Search: SELECT ts_start FROM tblIndex WHERE ts_start BETWEEN ? AND ?
As ts_start is the primary key, the performance is OK for our needs now. I
am thinking I should use a separate connection for search, right?
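The range query above, as a runnable sketch (Python's built-in sqlite3, with invented data): because ts_start is the INTEGER PRIMARY KEY, BETWEEN on it is a B-tree range scan on the table itself, with no separate index lookup.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE tblIndex(
    ts_start INTEGER PRIMARY KEY,
    ts_end INTEGER,
    frame_type INTEGER,
    pts VARCHAR(5))""")
con.executemany(
    "INSERT INTO tblIndex VALUES (?, ?, ?, ?)",
    ((i, i + 40, 1, "p") for i in range(1000)))
con.commit()

# Parameterized range query on the rowid-aliased primary key.
rows = con.execute(
    "SELECT ts_start FROM tblIndex WHERE ts_start BETWEEN ? AND ?",
    (100, 104)).fetchall()
print([r[0] for r in rows])  # [100, 101, 102, 103, 104]
```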

Configuration of sqlite:
hard disk database (USB interface)
cache size is 2000
page size is 1024
sync mode is 0 (off)
journal_mode is truncate


Thanks for any suggestion to improve the delete performance or about the
overall design.
-- 
View this message in context: 
http://www.nabble.com/suggestion-on-the-database-design-tp25062134p25062134.html
Sent from the SQLite mailing list archive at Nabble.com.
