Re: [sqlite] Strange performance behavior
Hubertus wrote: Well now, if this isn't something! fist I wann thank you for your quick reply. Finally I have to wait for my result satisfying 0.6 sec. This is great. Now I can add lots more data. What I did: - I added a second index for the column "campId" and did the analyze trick (I had this column in the index before, but that time it was quicker without that column). This already improved the waiting time from the former best 3 sec to 1 sec - I increased the page_size to 4096, this decresed the size of the database from 650 Mb to 450. - And at last I replaced the -.99 values with NULL (don't think about it. I was asked to do this as missing value, now I found the .nullvalue...) This again decreased the size to stunning 165 Mb!! and improved the query time to even better 0.6 sec. To Dennis: I'm afraid I haven't quite understood the quote thing. First how can I do a query like select "42" from data where campId='stream94' and "14">-; from my shell? Secondondly usually I use python or R to access the data where I do somthing like INSERT = 'SELECT "%i" FROM data where campId="%s"' % col, campId query <- paste('SELECT "34" AS "N2O_TDL", "29" AS "O3"', 'FROM data where campId="polstar97"') rs <- dbSendQuery(con, statement = query) How is this done correctly? Thanks a lot Hubertus - To unsubscribe, send email to [EMAIL PROTECTED] - Hubertus, There are a couple of issues involved here. First, standard SQL syntax uses single quote to delimit literal values and double quotes to delimit quoted identifiers (usually used for identifiers that are keywords or contain special characters like space). Second you are using both types of quoting in your SQL query. Third, the bash shell uses both single and double quotes to delimit strings with or without variable substitution applied. And finally, Python uses both types of quotes (and several other) for string literals. In python you can use a triple quote to delimit a string that contains other quotes to treat them as literal quotes. Your query can be done like this. >>> print '''select "%i" from data where campId='%s';''' % (14, 'polestar') select "14" from data where campId='polestar'; or you can use a backslash to escape the quotes used to delimit the string like this. >>> print 'select "%i" from data where campId=\'%s\';' % (14, 'polestar') select "14" from data where campId='polestar'; In the bash shell single quotes are used where command and variable substitution are not desired, and single quotes can not appear in a single quoted string (even with a backslash escape). Double quotes allow command and variable substitution and escaping of literal characters. To prepare a command that includes both types of quotes you need to use double quotes as the outer delimiters and then escape any double quotes in the string using a backslash. sqlite3 mydb "select \"14\" from data where campId='polestar';" HTH Dennis Cote - To unsubscribe, send email to [EMAIL PROTECTED] -
Re: [sqlite] Strange performance behavior
Hubertus wrote: Dear John, You might also look at using the Sqlite date format rather than seperate columns for year, month etc. That was what I considered first. The problem was, that other people are also supposed to use this datbase. Some of them use Fortran and they said that it's easier to compile the data in this format but in the prefered current_date. I couldn't find a way to do the splitting in day, month, year with sqlite, so I choose this, admittedly not very nice, implementation. I think even Fortran should know something like subset or split, but... Can this splitting be done by sqlite? Thank Hubertus Sqlite has a set of built in date/time functions which will transform the internal format according to your requirement. It uses an offset from an epoch, the standard way to handle dates and times. The big advantage of the single date/time value is with searching. There is just a single comparison of a REAL instead of a complex expression. You could look at using an ISO standard time format like 8601 to communicate with other users. A VIEW would output you date and time in that format. - To unsubscribe, send email to [EMAIL PROTECTED] -
Re: [sqlite] Strange performance behavior
Did you ever determine the cardinality of the campID field? I'm guessing its pretty good since your query is now .6 seconds. Lets say your cardinality was low, ie say less than .3 (arbitrary number). Then using the index to perform you data lookups would probably be slower than just reading the entire file. In this case the index would actually hurt rather than help. Hubertus <[EMAIL PROTECTED]> wrote: Well now, if this isn't something! fist I wann thank you for your quick reply. Finally I have to wait for my result satisfying 0.6 sec. This is great. Now I can add lots more data. What I did: - I added a second index for the column "campId" and did the analyze trick (I had this column in the index before, but that time it was quicker without that column). This already improved the waiting time from the former best 3 sec to 1 sec - I increased the page_size to 4096, this decresed the size of the database from 650 Mb to 450. - And at last I replaced the -.99 values with NULL (don't think about it. I was asked to do this as missing value, now I found the .nullvalue...) This again decreased the size to stunning 165 Mb!! and improved the query time to even better 0.6 sec. To Dennis: I'm afraid I haven't quite understood the quote thing. First how can I do a query like select "42" from data where campId='stream94' and "14">-; from my shell? Secondondly usually I use python or R to access the data where I do somthing like INSERT = 'SELECT "%i" FROM data where campId="%s"' % col, campId query <- paste('SELECT "34" AS "N2O_TDL", "29" AS "O3"', 'FROM data where campId="polstar97"') rs <- dbSendQuery(con, statement = query) How is this done correctly? Thanks a lot Hubertus - To unsubscribe, send email to [EMAIL PROTECTED] -
Re: [sqlite] Strange performance behavior
Dear John, > You might also look at using the Sqlite date format rather than seperate > columns for year, month etc. That was what I considered first. The problem was, that other people are also supposed to use this datbase. Some of them use Fortran and they said that it's easier to compile the data in this format but in the prefered current_date. I couldn't find a way to do the splitting in day, month, year with sqlite, so I choose this, admittedly not very nice, implementation. I think even Fortran should know something like subset or split, but... Can this splitting be done by sqlite? Thank Hubertus - To unsubscribe, send email to [EMAIL PROTECTED] -
Re: [sqlite] Strange performance behavior
Well now, if this isn't something! fist I wann thank you for your quick reply. Finally I have to wait for my result satisfying 0.6 sec. This is great. Now I can add lots more data. What I did: - I added a second index for the column "campId" and did the analyze trick (I had this column in the index before, but that time it was quicker without that column). This already improved the waiting time from the former best 3 sec to 1 sec - I increased the page_size to 4096, this decresed the size of the database from 650 Mb to 450. - And at last I replaced the -.99 values with NULL (don't think about it. I was asked to do this as missing value, now I found the .nullvalue...) This again decreased the size to stunning 165 Mb!! and improved the query time to even better 0.6 sec. To Dennis: I'm afraid I haven't quite understood the quote thing. First how can I do a query like select "42" from data where campId='stream94' and "14">-; from my shell? Secondondly usually I use python or R to access the data where I do somthing like INSERT = 'SELECT "%i" FROM data where campId="%s"' % col, campId query <- paste('SELECT "34" AS "N2O_TDL", "29" AS "O3"', 'FROM data where campId="polstar97"') rs <- dbSendQuery(con, statement = query) How is this done correctly? Thanks a lot Hubertus - To unsubscribe, send email to [EMAIL PROTECTED] -
Re: [sqlite] Strange performance behavior
On 3/19/07, Hubertus <[EMAIL PROTECTED]> wrote: Dear list, sorry to just come up with another performance question. I build a yet small database with one table. It has about 650.000 rows, 75 columns and has at the moment about 650 Mb. It runs on a Intel Pentium M with 2 GHz. The Laptop runs Suse 10.2 and does basicly nothing but this database. sqlite3 -version is 3.3.8 This is the schema: CREATE TABLE 'data'('nr' INTEGER PRIMARY KEY, 'year' INTEGER, 'month' INTEGER, 'day' INTEGER, 'sec' REAL, 'campId' TEXT, 'flightNr' INTEGER, '1' REAL, ... '71' REAL ); CREATE INDEX sec on data(year,month,day,sec); I experience a big variability of time a query needs: The index on the psuedo time fields is not being used at all. How about an index on campId as well as on "14" (that is, if "14" is always going to be in your query)? -- Puneet Kishor http://punkish.eidesis.org/ Nelson Inst. for Env. Studies, UW-Madison http://www.nelson.wisc.edu/ Open Source Geospatial Foundation http://www.osgeo.org/education/ - collaborate, communicate, compete = - To unsubscribe, send email to [EMAIL PROTECTED] -
Re: [sqlite] Strange performance behavior
First, you have your double and single quotes mixed up. SQL uses single quotes for literals. Second, you are performing row scans and not using any of your indices. You will do better if you have an index on the column you specify in your search. You might also look at using the Sqlite date format rather than seperate columns for year, month etc. Hubertus wrote: Dear list, sorry to just come up with another performance question. I build a yet small database with one table. It has about 650.000 rows, 75 columns and has at the moment about 650 Mb. It runs on a Intel Pentium M with 2 GHz. The Laptop runs Suse 10.2 and does basicly nothing but this database. sqlite3 -version is 3.3.8 This is the schema: CREATE TABLE 'data'('nr' INTEGER PRIMARY KEY, 'year' INTEGER, 'month' INTEGER, 'day' INTEGER, 'sec' REAL, 'campId' TEXT, 'flightNr' INTEGER, '1' REAL, ... '71' REAL ); CREATE INDEX sec on data(year,month,day,sec); I experience a big variability of time a query needs: ~database> time sqlite3 data.db 'select "14" from data where campId="polstar98" and "14">-;' >/dev/null real0m3.115s user0m1.748s sys 0m1.368s ~/database> time sqlite3 data.db 'select "14" from data where campId="polstar98" and "14">-;' >/dev/null real0m3.139s user0m1.756s sys 0m1.380s ~/database> time sqlite3 data.db 'select "42" from data where campId="stream94" and "14">-;' >/dev/null real0m50.227s user0m4.692s sys 0m3.028s I've tried it with and without index and at different times. Most of the time such simple queries take about 35 sec. Why is that so and what can be done? I can live with 3 sec of response but not up to one minute and the database is still not complete. Would pytables with hdf5 be an alternative? Knowing that this is probably not the right place to ask... Tips, suggestions, recommendation are gratefuly appreciated! Thanks in advance Hubertus - To unsubscribe, send email to [EMAIL PROTECTED] - - To unsubscribe, send email to [EMAIL PROTECTED] -
Re: [sqlite] Strange performance behavior
Hubertus wrote: This is the schema: CREATE TABLE 'data'('nr' INTEGER PRIMARY KEY, 'year' INTEGER, 'month' INTEGER, 'day' INTEGER, 'sec' REAL, 'campId' TEXT, 'flightNr' INTEGER, '1' REAL, ... '71' REAL ); You should use double quotes around column and table names. Single quotes delimit literal strings in SQL. CREATE INDEX sec on data(year,month,day,sec); This index will not help with the queries you are testing. I experience a big variability of time a query needs: ~database> time sqlite3 data.db 'select "14" from data where campId="polstar98" and "14">-;' >/dev/null real0m3.115s user0m1.748s sys 0m1.368s ~/database> time sqlite3 data.db 'select "14" from data where campId="polstar98" and "14">-;' >/dev/null real0m3.139s user0m1.756s sys 0m1.380s ~/database> time sqlite3 data.db 'select "42" from data where campId="stream94" and "14">-;' >/dev/null real0m50.227s user0m4.692s sys 0m3.028s I've tried it with and without index and at different times. Most of the time such simple queries take about 35 sec. Why is that so and what can be done? I can live with 3 sec of response but not up to one minute and the database is still not complete. Would pytables with hdf5 be an alternative? Knowing that this is probably not the right place to ask... You have your quotes mixed up in your queries as well. select "42" from data where campId="stream94" and "14">-; should be select "42" from data where campId='stream94' and "14">-; The value 'stream94' is a literal string and should be delimited with single quotes. You will need to escape those single quotes on the bash command line to get the correct query passed to sqlite. Your query is scanning the entire table looking for records that match your conditions. I suspect that a lot of your variability is due to the relative number of records matching your first condition. To speed these searches you need an index on the columns you are searching. Create index dataCampId on data (campId); Note that sqlite will only use a single index per table per query. In your case you are searching based on two columns. You can either create a compound index on both columns Create index dataCompound1 on data (campId, "14"); or create multiple indices on the individual columns and then use the analyze command to let sqlite gather the statistics that it needs to select the best index to use for a given query. The compound index approach will be the fastest if you use the same pair of columns as search criterion in most of your queries. If your search criteria vary you are probably best using single indexes and the analyze command. HTH Dennis Cote - To unsubscribe, send email to [EMAIL PROTECTED] -
Re: [sqlite] Strange performance behavior
Hubertus uttered: Dear list, sorry to just come up with another performance question. I build a yet small database with one table. It has about 650.000 rows, 75 columns and has at the moment about 650 Mb. It runs on a Intel Pentium M with 2 GHz. The Laptop runs Suse 10.2 and does basicly nothing but this database. sqlite3 -version is 3.3.8 This is the schema: CREATE TABLE 'data'('nr' INTEGER PRIMARY KEY, 'year' INTEGER, 'month' INTEGER, 'day' INTEGER, 'sec' REAL, 'campId' TEXT, 'flightNr' INTEGER, '1' REAL, ... '71' REAL ); CREATE INDEX sec on data(year,month,day,sec); What a nasty schema! What exactly do the '1'...'71' fields represent? Are they all used in each row? If not, you might be better off putting the data in a seperate table and joining the data. I experience a big variability of time a query needs: ~database> time sqlite3 data.db 'select "14" from data where campId="polstar98" and "14">-;' >/dev/null real0m3.115s user0m1.748s sys 0m1.368s ~/database> time sqlite3 data.db 'select "14" from data where campId="polstar98" and "14">-;' >/dev/null real0m3.139s user0m1.756s sys 0m1.380s ~/database> time sqlite3 data.db 'select "42" from data where campId="stream94" and "14">-;' >/dev/null real0m50.227s user0m4.692s sys 0m3.028s I've tried it with and without index and at different times. Most of the time such simple queries take about 35 sec. Why is that so and what can be done? I can live with 3 sec of response but not up to one minute and the database is still not complete. Would pytables with hdf5 be an alternative? Knowing that this is probably not the right place to ask... The rows are probably quite big (each real value is 8 bytes), and would not fit in a single low level SQLite BTree cell, but instead overflow using an overflow page per row. As well as being inefficient for access of columns in the overflow page, it is also massively space inefficient, as the overflow page is not shared and most of it's space is probably wasted. Tips, suggestions, recommendation are gratefuly appreciated! If you can't change the schema, your best bet is to increase the page size of the database, which will hopefully allow you to keep entire rows together without using overflow pages. Create a new database, and use: PRAGMA page_size=4096; then import your existing data from your old database. Something like: $ rm new.db $ sqilte3 new.db sqlite> PRAGMA page_size=4096; sqlite> ATTACH 'old.db' AS old; sqlite> CREATE TABLE data AS SELECT * FROM old.data; Thanks in advance Hubertus Christian -- /"\ \ /ASCII RIBBON CAMPAIGN - AGAINST HTML MAIL X - AGAINST MS ATTACHMENTS / \ - To unsubscribe, send email to [EMAIL PROTECTED] -
Re: [sqlite] Strange performance behavior
Looks like it is going to do a full scan of the entire database to complete that querry based upon your where clause. Are you always accessing the data by campID? What is the cardinality of campId data? Depending upon that it might be worth while putting and index on CampID. Hubertus <[EMAIL PROTECTED]> wrote: Dear list, sorry to just come up with another performance question. I build a yet small database with one table. It has about 650.000 rows, 75 columns and has at the moment about 650 Mb. It runs on a Intel Pentium M with 2 GHz. The Laptop runs Suse 10.2 and does basicly nothing but this database. sqlite3 -version is 3.3.8 This is the schema: CREATE TABLE 'data'('nr' INTEGER PRIMARY KEY, 'year' INTEGER, 'month' INTEGER, 'day' INTEGER, 'sec' REAL, 'campId' TEXT, 'flightNr' INTEGER, '1' REAL, ... '71' REAL ); CREATE INDEX sec on data(year,month,day,sec); I experience a big variability of time a query needs: ~database> time sqlite3 data.db 'select "14" from data where campId="polstar98" and "14">-;' >/dev/null real 0m3.115s user 0m1.748s sys 0m1.368s ~/database> time sqlite3 data.db 'select "14" from data where campId="polstar98" and "14">-;' >/dev/null real 0m3.139s user 0m1.756s sys 0m1.380s ~/database> time sqlite3 data.db 'select "42" from data where campId="stream94" and "14">-;' >/dev/null real 0m50.227s user 0m4.692s sys 0m3.028s I've tried it with and without index and at different times. Most of the time such simple queries take about 35 sec. Why is that so and what can be done? I can live with 3 sec of response but not up to one minute and the database is still not complete. Would pytables with hdf5 be an alternative? Knowing that this is probably not the right place to ask... Tips, suggestions, recommendation are gratefuly appreciated! Thanks in advance Hubertus - To unsubscribe, send email to [EMAIL PROTECTED] -
[sqlite] Strange performance behavior
Dear list, sorry to just come up with another performance question. I build a yet small database with one table. It has about 650.000 rows, 75 columns and has at the moment about 650 Mb. It runs on a Intel Pentium M with 2 GHz. The Laptop runs Suse 10.2 and does basicly nothing but this database. sqlite3 -version is 3.3.8 This is the schema: CREATE TABLE 'data'('nr' INTEGER PRIMARY KEY, 'year' INTEGER, 'month' INTEGER, 'day' INTEGER, 'sec' REAL, 'campId' TEXT, 'flightNr' INTEGER, '1' REAL, ... '71' REAL ); CREATE INDEX sec on data(year,month,day,sec); I experience a big variability of time a query needs: ~database> time sqlite3 data.db 'select "14" from data where campId="polstar98" and "14">-;' >/dev/null real0m3.115s user0m1.748s sys 0m1.368s ~/database> time sqlite3 data.db 'select "14" from data where campId="polstar98" and "14">-;' >/dev/null real0m3.139s user0m1.756s sys 0m1.380s ~/database> time sqlite3 data.db 'select "42" from data where campId="stream94" and "14">-;' >/dev/null real0m50.227s user0m4.692s sys 0m3.028s I've tried it with and without index and at different times. Most of the time such simple queries take about 35 sec. Why is that so and what can be done? I can live with 3 sec of response but not up to one minute and the database is still not complete. Would pytables with hdf5 be an alternative? Knowing that this is probably not the right place to ask... Tips, suggestions, recommendation are gratefuly appreciated! Thanks in advance Hubertus - To unsubscribe, send email to [EMAIL PROTECTED] -