Re: [sqlite] Strange performance behavior

2007-03-19 Thread Dennis Cote

Hubertus wrote:

Well now, if this isn't something!
fist I wann thank you for your quick reply. Finally I have to wait for my result
satisfying 0.6 sec. This is great. Now I can add lots more data.
What I did:
- I added a second index for the column "campId" and did the analyze trick
  (I had this column in the index before, but that time it was quicker without 
  that column).

  This already improved the waiting time from the former best 3 sec to 1 sec
- I increased the page_size to 4096, this decresed the size of the
  database from 650 Mb to 450.
- And at last I replaced the -.99 values with NULL (don't think about it. I 
  was asked to do this as missing value, now I found the .nullvalue...)
  This again decreased the size to stunning 165 Mb!! and improved the
  query time to even better 0.6 sec.

To Dennis:
I'm afraid I haven't quite understood the quote thing. 
First how can I do a query like

select "42" from data where campId='stream94' and "14">-;
from my shell?
Secondondly usually I use python or R to access the data where I do somthing 
like

INSERT = 'SELECT "%i" FROM data where campId="%s"' % col, campId

query <- paste('SELECT "34" AS "N2O_TDL", "29" AS "O3"',
  'FROM data where campId="polstar97"')
rs <- dbSendQuery(con, statement = query)

How is this done correctly?

Thanks a lot

Hubertus

-
To unsubscribe, send email to [EMAIL PROTECTED]
-


  

Hubertus,

There are a couple of issues involved here. First, standard SQL syntax 
uses single quote to delimit literal values and double quotes to delimit 
quoted identifiers (usually used for identifiers that are keywords or 
contain special characters like space). Second you are using both types 
of quoting in your SQL query. Third, the bash shell uses both single and 
double quotes to delimit strings with or without variable substitution 
applied. And finally, Python uses both types of quotes (and several 
other) for string literals.


In python you can use a triple quote to delimit a string that contains 
other quotes to treat them as literal quotes. Your query can be done 
like this.


>>> print '''select "%i" from data where campId='%s';''' % (14, 'polestar')
select "14" from data where campId='polestar';

or you can use a backslash to escape the quotes used to delimit the 
string like this.


>>> print 'select "%i" from data where campId=\'%s\';' % (14, 'polestar')
select "14" from data where campId='polestar';

In the bash shell single quotes are used where command and variable 
substitution are not desired, and single quotes can not appear in a 
single quoted string (even with a backslash escape). Double quotes allow 
command and variable substitution and escaping of literal characters. To 
prepare a command that includes both types of quotes you need to use 
double quotes as the outer delimiters and then escape any double quotes 
in the string using a backslash.


sqlite3 mydb "select \"14\" from data where campId='polestar';"

HTH
Dennis Cote

-
To unsubscribe, send email to [EMAIL PROTECTED]
-



Re: [sqlite] Strange performance behavior

2007-03-19 Thread John Stanton

Hubertus wrote:

Dear John,

You might also look at using the Sqlite date format rather than seperate 
columns for year, month etc.


That was what I considered first. The problem was, that other people
are also supposed to use this datbase. Some of them use Fortran and they
said that it's easier to compile the data in this format but in the
prefered current_date. I couldn't find a way to do the splitting in day,
month, year with sqlite, so I choose this, admittedly not very nice,
implementation. I think even Fortran should know something like subset or split,
but...
Can this splitting be done by sqlite?

Thank

Hubertus

Sqlite has a set of built in date/time functions which will transform 
the internal format according to your requirement.  It uses an offset 
from an epoch, the standard way to handle dates and times.


The big advantage of the single date/time value is with searching. 
There is just a single comparison of a REAL instead of a complex expression.


You could look at using an ISO standard time format like 8601 to 
communicate with other users.  A VIEW would output you date and time in 
that format.


-
To unsubscribe, send email to [EMAIL PROTECTED]
-



Re: [sqlite] Strange performance behavior

2007-03-19 Thread Ken
Did you ever determine the cardinality of the campID field? I'm guessing its 
pretty good since your query is now .6 seconds.
 
 Lets say your cardinality was low, ie say less than .3 (arbitrary number). 
Then using the index to perform you data lookups would probably be slower than 
just reading the entire file. In this case the index would actually hurt rather 
than help. 
 
 

Hubertus <[EMAIL PROTECTED]> wrote: Well now, if this isn't something!
fist I wann thank you for your quick reply. Finally I have to wait for my result
satisfying 0.6 sec. This is great. Now I can add lots more data.
What I did:
- I added a second index for the column "campId" and did the analyze trick
  (I had this column in the index before, but that time it was quicker without 
  that column).
  This already improved the waiting time from the former best 3 sec to 1 sec
- I increased the page_size to 4096, this decresed the size of the
  database from 650 Mb to 450.
- And at last I replaced the -.99 values with NULL (don't think about it. I 
was asked to do this as missing value, now I found the .nullvalue...)
  This again decreased the size to stunning 165 Mb!! and improved the
  query time to even better 0.6 sec.

To Dennis:
I'm afraid I haven't quite understood the quote thing. 
First how can I do a query like
select "42" from data where campId='stream94' and "14">-;
from my shell?
Secondondly usually I use python or R to access the data where I do somthing 
like
INSERT = 'SELECT "%i" FROM data where campId="%s"' % col, campId

query <- paste('SELECT "34" AS "N2O_TDL", "29" AS "O3"',
  'FROM data where campId="polstar97"')
rs <- dbSendQuery(con, statement = query)

How is this done correctly?

Thanks a lot

Hubertus

-
To unsubscribe, send email to [EMAIL PROTECTED]
-




Re: [sqlite] Strange performance behavior

2007-03-19 Thread Hubertus
Dear John,
> You might also look at using the Sqlite date format rather than seperate 
> columns for year, month etc.
That was what I considered first. The problem was, that other people
are also supposed to use this datbase. Some of them use Fortran and they
said that it's easier to compile the data in this format but in the
prefered current_date. I couldn't find a way to do the splitting in day,
month, year with sqlite, so I choose this, admittedly not very nice,
implementation. I think even Fortran should know something like subset or split,
but...
Can this splitting be done by sqlite?

Thank

Hubertus

-
To unsubscribe, send email to [EMAIL PROTECTED]
-



Re: [sqlite] Strange performance behavior

2007-03-19 Thread Hubertus
Well now, if this isn't something!
fist I wann thank you for your quick reply. Finally I have to wait for my result
satisfying 0.6 sec. This is great. Now I can add lots more data.
What I did:
- I added a second index for the column "campId" and did the analyze trick
  (I had this column in the index before, but that time it was quicker without 
  that column).
  This already improved the waiting time from the former best 3 sec to 1 sec
- I increased the page_size to 4096, this decresed the size of the
  database from 650 Mb to 450.
- And at last I replaced the -.99 values with NULL (don't think about it. I 
  was asked to do this as missing value, now I found the .nullvalue...)
  This again decreased the size to stunning 165 Mb!! and improved the
  query time to even better 0.6 sec.

To Dennis:
I'm afraid I haven't quite understood the quote thing. 
First how can I do a query like
select "42" from data where campId='stream94' and "14">-;
from my shell?
Secondondly usually I use python or R to access the data where I do somthing 
like
INSERT = 'SELECT "%i" FROM data where campId="%s"' % col, campId

query <- paste('SELECT "34" AS "N2O_TDL", "29" AS "O3"',
  'FROM data where campId="polstar97"')
rs <- dbSendQuery(con, statement = query)

How is this done correctly?

Thanks a lot

Hubertus

-
To unsubscribe, send email to [EMAIL PROTECTED]
-



Re: [sqlite] Strange performance behavior

2007-03-19 Thread P Kishor

On 3/19/07, Hubertus <[EMAIL PROTECTED]> wrote:

Dear list,
sorry to just come up with another performance question. I build a yet small
database with one table. It has about 650.000 rows, 75 columns and
has at the moment about 650 Mb. It runs on a Intel Pentium M with 2 GHz. The
Laptop runs Suse 10.2 and does basicly nothing but this database.
sqlite3 -version is 3.3.8

This is the schema:
CREATE TABLE 'data'('nr' INTEGER PRIMARY KEY,
  'year' INTEGER,
  'month' INTEGER,
  'day' INTEGER,
  'sec' REAL,
  'campId' TEXT,
  'flightNr' INTEGER,
  '1' REAL,
  ...
  '71' REAL
  );
CREATE INDEX sec on data(year,month,day,sec);

I experience a big variability of time a query needs:


The index on the psuedo time fields is not being used at all. How
about an index on campId as well as on "14" (that is, if "14" is
always going to be in your query)?

--
Puneet Kishor http://punkish.eidesis.org/
Nelson Inst. for Env. Studies, UW-Madison http://www.nelson.wisc.edu/
Open Source Geospatial Foundation http://www.osgeo.org/education/
-
collaborate, communicate, compete
=

-
To unsubscribe, send email to [EMAIL PROTECTED]
-



Re: [sqlite] Strange performance behavior

2007-03-19 Thread John Stanton
First, you have your double and single quotes mixed up.  SQL uses single 
quotes for literals.


Second, you are performing row scans and not using any of your indices. 
 You will do better if you have an index on the column you specify in 
your search.


You might also look at using the Sqlite date format rather than seperate 
columns for year, month etc.


Hubertus wrote:

Dear list,
sorry to just come up with another performance question. I build a yet small 
database with one table. It has about 650.000 rows, 75 columns and
has at the moment about 650 Mb. It runs on a Intel Pentium M with 2 GHz. The 
Laptop runs Suse 10.2 and does basicly nothing but this database.

sqlite3 -version is 3.3.8

This is the schema:
CREATE TABLE 'data'('nr' INTEGER PRIMARY KEY,
  'year' INTEGER,
  'month' INTEGER,
  'day' INTEGER,
  'sec' REAL,
  'campId' TEXT,
  'flightNr' INTEGER,
  '1' REAL,
  ...
  '71' REAL
  );
CREATE INDEX sec on data(year,month,day,sec);
  
I experience a big variability of time a query needs:

~database> time sqlite3  data.db 'select "14" from data where campId="polstar98" and 
"14">-;' >/dev/null
real0m3.115s
user0m1.748s
sys 0m1.368s
~/database> time sqlite3  data.db 'select "14" from data where campId="polstar98" and 
"14">-;' >/dev/null
real0m3.139s
user0m1.756s
sys 0m1.380s
~/database> time sqlite3  data.db 'select "42" from data where campId="stream94" and 
"14">-;' >/dev/null
real0m50.227s
user0m4.692s
sys 0m3.028s

I've tried it with and without index and at different times. Most of the
time such simple queries take about 35 sec. Why is that so and what can be done?
I can live with 3 sec of response but not up to one minute and the
database is still not complete. Would pytables with hdf5 be an alternative? 
Knowing that this is probably not the right place to ask...


Tips, suggestions, recommendation are gratefuly appreciated! 
Thanks in advance


Hubertus

-
To unsubscribe, send email to [EMAIL PROTECTED]
-




-
To unsubscribe, send email to [EMAIL PROTECTED]
-



Re: [sqlite] Strange performance behavior

2007-03-19 Thread Dennis Cote

Hubertus wrote:


This is the schema:
CREATE TABLE 'data'('nr' INTEGER PRIMARY KEY,
  'year' INTEGER,
  'month' INTEGER,
  'day' INTEGER,
  'sec' REAL,
  'campId' TEXT,
  'flightNr' INTEGER,
  '1' REAL,
  ...
  '71' REAL
  );
You should use double quotes around column and table names. Single 
quotes delimit literal strings in SQL.



CREATE INDEX sec on data(year,month,day,sec);

This index will not help with the queries you are testing.

  
I experience a big variability of time a query needs:

~database> time sqlite3  data.db 'select "14" from data where campId="polstar98" and 
"14">-;' >/dev/null
real0m3.115s
user0m1.748s
sys 0m1.368s
~/database> time sqlite3  data.db 'select "14" from data where campId="polstar98" and 
"14">-;' >/dev/null
real0m3.139s
user0m1.756s
sys 0m1.380s
~/database> time sqlite3  data.db 'select "42" from data where campId="stream94" and 
"14">-;' >/dev/null
real0m50.227s
user0m4.692s
sys 0m3.028s

I've tried it with and without index and at different times. Most of the
time such simple queries take about 35 sec. Why is that so and what can be done?
I can live with 3 sec of response but not up to one minute and the
database is still not complete. Would pytables with hdf5 be an alternative? 
Knowing that this is probably not the right place to ask...




You have your quotes mixed up in your queries as well.

   select "42" from data where campId="stream94" and "14">-;

should be

   select "42" from data where campId='stream94' and "14">-;

The value 'stream94' is a literal string and should be delimited with 
single quotes. You will need to escape those single quotes on the bash 
command line to get the correct query passed to sqlite.


Your query is scanning the entire table looking for records that match 
your conditions. I suspect that a lot of your variability is due to the 
relative number of records matching your first condition. To speed these 
searches you need an index on the columns you are searching.


   Create index dataCampId on data (campId);

Note that sqlite will only use a single index per table per query. In 
your case you are searching based on two columns. You can either create 
a compound index on both columns


   Create index dataCompound1 on data (campId, "14");

or create multiple indices on the individual columns and then use the 
analyze command to let sqlite gather the statistics that it needs to 
select the best index to use for a given query. The compound index 
approach will be the fastest if you use the same pair of columns as 
search criterion in most of your queries. If your search criteria vary 
you are probably best using single indexes and the analyze command.


HTH
Dennis Cote


-
To unsubscribe, send email to [EMAIL PROTECTED]
-



Re: [sqlite] Strange performance behavior

2007-03-19 Thread Christian Smith

Hubertus uttered:


Dear list,
sorry to just come up with another performance question. I build a yet small
database with one table. It has about 650.000 rows, 75 columns and
has at the moment about 650 Mb. It runs on a Intel Pentium M with 2 GHz. The
Laptop runs Suse 10.2 and does basicly nothing but this database.
sqlite3 -version is 3.3.8

This is the schema:
CREATE TABLE 'data'('nr' INTEGER PRIMARY KEY,
 'year' INTEGER,
 'month' INTEGER,
 'day' INTEGER,
 'sec' REAL,
 'campId' TEXT,
 'flightNr' INTEGER,
 '1' REAL,
 ...
 '71' REAL
 );
CREATE INDEX sec on data(year,month,day,sec);



What a nasty schema! What exactly do the '1'...'71' fields represent? Are 
they all used in each row? If not, you might be better off putting the 
data in a seperate table and joining the data.





I experience a big variability of time a query needs:
~database> time sqlite3  data.db 'select "14" from data where campId="polstar98" and 
"14">-;' >/dev/null
real0m3.115s
user0m1.748s
sys 0m1.368s
~/database> time sqlite3  data.db 'select "14" from data where campId="polstar98" and 
"14">-;' >/dev/null
real0m3.139s
user0m1.756s
sys 0m1.380s
~/database> time sqlite3  data.db 'select "42" from data where campId="stream94" and 
"14">-;' >/dev/null
real0m50.227s
user0m4.692s
sys 0m3.028s

I've tried it with and without index and at different times. Most of the 
time such simple queries take about 35 sec. Why is that so and what can 
be done? I can live with 3 sec of response but not up to one minute and 
the database is still not complete. Would pytables with hdf5 be an 
alternative? Knowing that this is probably not the right place to ask...



The rows are probably quite big (each real value is 8 bytes), and would 
not fit in a single low level SQLite BTree cell, but instead overflow 
using an overflow page per row. As well as being inefficient for access of 
columns in the overflow page, it is also massively space inefficient, as 
the overflow page is not shared and most of it's space is probably wasted.





Tips, suggestions, recommendation are gratefuly appreciated!



If you can't change the schema, your best bet is to increase the page size 
of the database, which will hopefully allow you to keep entire rows 
together without using overflow pages. Create a new database, and use:

PRAGMA page_size=4096;

then import your existing data from your old database. Something like:

$ rm new.db
$ sqilte3 new.db
sqlite> PRAGMA page_size=4096;
sqlite> ATTACH 'old.db' AS old;
sqlite> CREATE TABLE data AS SELECT * FROM old.data;



Thanks in advance

Hubertus



Christian


--
/"\
\ /ASCII RIBBON CAMPAIGN - AGAINST HTML MAIL
 X   - AGAINST MS ATTACHMENTS
/ \

-
To unsubscribe, send email to [EMAIL PROTECTED]
-



Re: [sqlite] Strange performance behavior

2007-03-19 Thread Ken
Looks like it is going to do a full scan of the entire database to complete 
that querry based upon your where clause.
   
  Are you always accessing the data by campID? What is the cardinality of 
campId data?
  Depending upon that it might be worth while putting and index on CampID.
  
Hubertus <[EMAIL PROTECTED]> wrote:
  Dear list,
sorry to just come up with another performance question. I build a yet small 
database with one table. It has about 650.000 rows, 75 columns and
has at the moment about 650 Mb. It runs on a Intel Pentium M with 2 GHz. The 
Laptop runs Suse 10.2 and does basicly nothing but this database.
sqlite3 -version is 3.3.8

This is the schema:
CREATE TABLE 'data'('nr' INTEGER PRIMARY KEY,
'year' INTEGER,
'month' INTEGER,
'day' INTEGER,
'sec' REAL,
'campId' TEXT,
'flightNr' INTEGER,
'1' REAL,
...
'71' REAL
);
CREATE INDEX sec on data(year,month,day,sec);

I experience a big variability of time a query needs:
~database> time sqlite3 data.db 'select "14" from data where campId="polstar98" 
and "14">-;' >/dev/null
real 0m3.115s
user 0m1.748s
sys 0m1.368s
~/database> time sqlite3 data.db 'select "14" from data where 
campId="polstar98" and "14">-;' >/dev/null
real 0m3.139s
user 0m1.756s
sys 0m1.380s
~/database> time sqlite3 data.db 'select "42" from data where campId="stream94" 
and "14">-;' >/dev/null
real 0m50.227s
user 0m4.692s
sys 0m3.028s

I've tried it with and without index and at different times. Most of the
time such simple queries take about 35 sec. Why is that so and what can be done?
I can live with 3 sec of response but not up to one minute and the
database is still not complete. Would pytables with hdf5 be an alternative? 
Knowing that this is probably not the right place to ask...

Tips, suggestions, recommendation are gratefuly appreciated! 
Thanks in advance

Hubertus

-
To unsubscribe, send email to [EMAIL PROTECTED]
-




[sqlite] Strange performance behavior

2007-03-19 Thread Hubertus
Dear list,
sorry to just come up with another performance question. I build a yet small 
database with one table. It has about 650.000 rows, 75 columns and
has at the moment about 650 Mb. It runs on a Intel Pentium M with 2 GHz. The 
Laptop runs Suse 10.2 and does basicly nothing but this database.
sqlite3 -version is 3.3.8

This is the schema:
CREATE TABLE 'data'('nr' INTEGER PRIMARY KEY,
  'year' INTEGER,
  'month' INTEGER,
  'day' INTEGER,
  'sec' REAL,
  'campId' TEXT,
  'flightNr' INTEGER,
  '1' REAL,
  ...
  '71' REAL
  );
CREATE INDEX sec on data(year,month,day,sec);
  
I experience a big variability of time a query needs:
~database> time sqlite3  data.db 'select "14" from data where 
campId="polstar98" and "14">-;' >/dev/null
real0m3.115s
user0m1.748s
sys 0m1.368s
~/database> time sqlite3  data.db 'select "14" from data where 
campId="polstar98" and "14">-;' >/dev/null
real0m3.139s
user0m1.756s
sys 0m1.380s
~/database> time sqlite3  data.db 'select "42" from data where 
campId="stream94" and "14">-;' >/dev/null
real0m50.227s
user0m4.692s
sys 0m3.028s

I've tried it with and without index and at different times. Most of the
time such simple queries take about 35 sec. Why is that so and what can be done?
I can live with 3 sec of response but not up to one minute and the
database is still not complete. Would pytables with hdf5 be an alternative? 
Knowing that this is probably not the right place to ask...

Tips, suggestions, recommendation are gratefuly appreciated! 
Thanks in advance

Hubertus

-
To unsubscribe, send email to [EMAIL PROTECTED]
-