[sqlite] Performance issue
On 2015-03-30 11:46 AM, Jeff Roux wrote: > Hi everyone, > > I have a daemon that collects information and stores it in a SQLite > database. The table has 1 million rows. > > This daemon is running on a HP server with 12 cores, 32 GB of RAM, > and a SSD drive. I have performance issues with some requests. For > instance, the following request takes more than 5 seconds to > accomplish with SQlite3 (in that particular case, the WHERE clause > selects all the data in the database, i.e. 100 rows): > > SELECT SUM(nbBytesDecompOut + nbBytesCompIn) as vol, portLan as item > FROM flows > WHERE timestamp>=1383770600 AND timestamp<=1384770600 AND protocol IN (17, 6) > GROUP BY portLan ORDER BY vol DESC LIMIT 6; > > I have done some tests with or without "INDEXED BY" clauses and got > nearly the same results. > > I compared the performance with a mySQL and the same request takes > less than 1 second to accomplish. > > Could you give me some directions to optimize this kind of request > with SQlite3 when there is a big amount of data in the table ? I > need to increase 3 times the number of rows and the performance will > become unacceptable for my application. I think something else is wrong, we might need your Schema to better judge. The reason I say something else is wrong, is that I made this script quick to reproduce what I could establish from your writings might be what the schema looks like, and then added a single index and tried the query. I deliberately added a primary key and then did not use it at all to avoid those advantages as it may not suit your needs. You will see I used 3 million rows in stead of 1 million (because you said your data may grow three times bigger) and some random values in the appropriate ranges to fake the data (where I had an idea, and guessed the rest). The query takes less than 50 milliseconds, even over 3 million rows. In fact the entire script creating the DB, inserting 3 million rows, creating an index, and then doing 3 queries all took only 18 seconds in total (without running analyze). This will even go a bit faster if I turn stats off, but I left it in for clarity. If this does not enlighten your quest, kindly provide your full schema and example data shapes which might give us a clue. NOTE: In my data it works out that about ~12K rows satisfies the where clause - if this is significantly more in your case it might change the timing a bit, but nothing like your suggested times. See results below: -- Processing Script for File: D:\Documents\TestScript3.txt -- Script Items: 7 Parameter Count: 0 -- 2015-03-30 16:16:11.795 | [Success]Script Started... -- DROP TABLE IF EXISTS flows; --Item Stats: Item No: 1 Query Size (Chars): 29 -- VM Work Steps: 92Rows Modified: 0 -- Full Query Time: 0d 00h 00m and 00.485s -- Query Result: Success. -- CREATE TABLE flows( id INTEGER PRIMARY KEY, ipLan TEXT, ipWan TEXT, portLan INT, portWan INT, protocol INT, nbBytesDecompOut INT, nbBytesCompIn INT, tos INT, timestamp INT ); --Item Stats: Item No: 2 Query Size (Chars): 207 -- VM Work Steps: 31Rows Modified: 0 -- Full Query Time: -- --- --- --- --. -- Query Result: Success. -- /* POPULATE the table with random + scaled data to emulate an actual dataset of 3M rows */ WITH acc(x,mx8,dx8,mxK,dxK,rK) AS ( SELECT 0,0,0,0,0,100 UNION ALL SELECT x+1, (x%8), CAST(round(x/8,0) AS INT), (x%1024), CAST(round(x/1024,0) AS INT), CAST(abs(round(random()/10240,0)) AS INT) FROM acc WHERE (x<300) -- Testing 3 million rows ) INSERT INTO flows SELECT x, '192.168.1.'||mx8, '55.66.77.'||(dx8%256), -- ipLan, ipWan 1000+mx8, 5000+mx8, (x%18),-- portlan, portWan, protocol 64+(rk%1024000), -- nbBytesDecompOut 1024+(rk%1024000), -- nbBytesDecompIn (dx8%3), (138000+(x*10)) -- tos, timestamp FROM acc; --Item Stats: Item No: 3 Query Size (Chars): 691 -- Result Columns:0 Result Rows: 0 -- VM Work Steps: 18087 Rows Modified: 301 -- Full Query Time: 0d 00h 00m and 12.849s -- Query Result: Success. --
[sqlite] Performance issue
Simon, here is the list of the indexes that were already defined on the table: CREATE INDEX idxLanWan ON flows(ipLan, ipWan, portWan, portLan); CREATE INDEX idxProto ON flows(protocol); CREATE INDEX idxTos ON flows(tos); CREATE INDEX idxTsLanWan ON flows(timestamp, ipLan, ipWan, portWan, portLan); CREATE INDEX tsindex ON flows(timestamp); I added the ones you gave me: CREATE INDEX ts1 ON flows (portLan, protocol, timestamp); CREATE INDEX ts2 ON flows (portLan, timestamp); CREATE INDEX ts3 ON flows (protocol, portLan, timestamp); CREATE INDEX ts4 ON flows (protocol, timestamp); and it appears it now uses the ts4 index, but the time spent is still ~ 5s sqlite> EXPLAIN QUERY PLAN SELECT SUM(nbBytesDecompOut+nbBytesCompIn) as vol, portLan as item FROM flows WHERE timestamp>=1383770600 AND timestamp<=1384770600 AND protocol IN (17, 6) GROUP BY portLan ORDER BY vol DESC LIMIT 6; 0|0|0|SEARCH TABLE flows USING INDEX ts4 (protocol=? AND timestamp>? AND timestamp: > > On 30 Mar 2015, at 10:46am, Jeff Roux wrote: > > > This daemon is running on a HP server with 12 cores, 32 GB of RAM, > > and a SSD drive. I have performance issues with some requests. For > > instance, the following request takes more than 5 seconds to > > accomplish with SQlite3 (in that particular case, the WHERE clause > > selects all the data in the database, i.e. 100 rows): > > > > SELECT SUM(nbBytesDecompOut + nbBytesCompIn) as vol, portLan as item > > FROM flows > > WHERE timestamp>=1383770600 AND timestamp<=1384770600 AND protocol IN > (17, 6) > > GROUP BY portLan ORDER BY vol DESC LIMIT 6; > > What indexes do you have on that table ? I would expect to get best > results from one of the following indexes > > (portLan, protocol, timestamp) > (portLan, timestamp) > (protocol, portLan, timestamp) > (protocol, timestamp) > > For best results, CREATE all of those indexes and any others that strike > your fancy, run ANALYZE, then execute your SELECT. If this gives > satisfactory results, use > > EXPLAIN QUERY PLAN [your SELECT statement] > > and see which index SQLite decided to use. You can then delete the other > unused indexes and run ANALYZE one final time for a little more > optimization. > > It's worth noting that SQLite, since it has no server, has no server-level > caching. If you're doing repeated queries using mySQL, after the first one > most of the index is cached so the times you're seeing are faster than > SQLite can ever produce. Two disadvantages of mySQL are the memory usage > and the work involved in administration. > > Simon. > ___ > sqlite-users mailing list > sqlite-users at mailinglists.sqlite.org > http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users >
[sqlite] Why is empty string not equal to empty blob?
> You explicitly put strings into both columns. The two values are the same > even though you declared the columns differently. As I told you before, > if you want to see what type something is, use typeof(thing). > > ~ > ~ > ~ > ~ > > Simon. Thank you guys. I follow this list so I may keep informed to the happenins with SQLite for my project. I have been having an issue on how to handle a defined DATE column. Be it NUMBER, or TEXT with the sqlite-jdbc. If its value is TEXT and I try to retrieve with getDate() a parsing error will occur if its format is not correct, but if I handle it with getString() no problem, don't care. These responses helped to clarify what is going on and how to deal with mixed type column content. typeof(thing). :) danap.
[sqlite] Performance issue
Hi everyone, I have a daemon that collects information and stores it in a SQLite database. The table has 1 million rows. This daemon is running on a HP server with 12 cores, 32 GB of RAM, and a SSD drive. I have performance issues with some requests. For instance, the following request takes more than 5 seconds to accomplish with SQlite3 (in that particular case, the WHERE clause selects all the data in the database, i.e. 100 rows): SELECT SUM(nbBytesDecompOut + nbBytesCompIn) as vol, portLan as item FROM flows WHERE timestamp>=1383770600 AND timestamp<=1384770600 AND protocol IN (17, 6) GROUP BY portLan ORDER BY vol DESC LIMIT 6; I have done some tests with or without "INDEXED BY" clauses and got nearly the same results. I compared the performance with a mySQL and the same request takes less than 1 second to accomplish. Could you give me some directions to optimize this kind of request with SQlite3 when there is a big amount of data in the table ? I need to increase 3 times the number of rows and the performance will become unacceptable for my application. Thanks in advance.
[sqlite] Performance issue
On 30 Mar 2015, at 10:46am, Jeff Roux wrote: > This daemon is running on a HP server with 12 cores, 32 GB of RAM, > and a SSD drive. I have performance issues with some requests. For > instance, the following request takes more than 5 seconds to > accomplish with SQlite3 (in that particular case, the WHERE clause > selects all the data in the database, i.e. 100 rows): > > SELECT SUM(nbBytesDecompOut + nbBytesCompIn) as vol, portLan as item > FROM flows > WHERE timestamp>=1383770600 AND timestamp<=1384770600 AND protocol IN (17, 6) > GROUP BY portLan ORDER BY vol DESC LIMIT 6; What indexes do you have on that table ? I would expect to get best results from one of the following indexes (portLan, protocol, timestamp) (portLan, timestamp) (protocol, portLan, timestamp) (protocol, timestamp) For best results, CREATE all of those indexes and any others that strike your fancy, run ANALYZE, then execute your SELECT. If this gives satisfactory results, use EXPLAIN QUERY PLAN [your SELECT statement] and see which index SQLite decided to use. You can then delete the other unused indexes and run ANALYZE one final time for a little more optimization. It's worth noting that SQLite, since it has no server, has no server-level caching. If you're doing repeated queries using mySQL, after the first one most of the index is cached so the times you're seeing are faster than SQLite can ever produce. Two disadvantages of mySQL are the memory usage and the work involved in administration. Simon.
[sqlite] SQL Syntax To Copy A Table
On Thu, Mar 26, 2015 at 4:29 PM, Nigel Verity wrote: > My requirement is to take periodic snapshots of a names and addresses > table, to be stored in the same database as the master. > Perhaps also look into https://www.sqlite.org/backup.html, which doesn't qualify for "the same database", but seems closer to the spirit of your requirement. --DD
[sqlite] Performance issue
On Mon, Mar 30, 2015 at 6:44 AM, Jeff Roux wrote: > Simon, > > here is the list of the indexes that were already defined on the table: > CREATE INDEX idxLanWan ON flows(ipLan, ipWan, portWan, portLan); > CREATE INDEX idxProto ON flows(protocol); > CREATE INDEX idxTos ON flows(tos); > CREATE INDEX idxTsLanWan ON flows(timestamp, ipLan, ipWan, portWan, > portLan); > CREATE INDEX tsindex ON flows(timestamp); > Recommend that you drop tsindex. idxTsLanWan will work just as well. > > I added the ones you gave me: > CREATE INDEX ts1 ON flows (portLan, protocol, timestamp); > CREATE INDEX ts2 ON flows (portLan, timestamp); > CREATE INDEX ts3 ON flows (protocol, portLan, timestamp); > CREATE INDEX ts4 ON flows (protocol, timestamp); > If you decide to keep ts3 and/or ts4, then recommend that you drop idxProto. > > and it appears it now uses the ts4 index, but the time spent is still ~ 5s > How many rows satisfy the WHERE clause? You might get some relief using a covering index: CREATE INDEX ts5 ON flows(protocol,timestamp,portLan,nbBytesDecompOut,nbBytesCompln); > > > sqlite> EXPLAIN QUERY PLAN SELECT SUM(nbBytesDecompOut+nbBytesCompIn) as > vol, portLan as item FROM flows WHERE timestamp>=1383770600 AND > timestamp<=1384770600 AND protocol IN (17, 6) GROUP BY portLan ORDER BY vol > DESC LIMIT 6; > 0|0|0|SEARCH TABLE flows USING INDEX ts4 (protocol=? AND timestamp>? AND > timestamp 0|0|0|EXECUTE LIST SUBQUERY 1 > 0|0|0|USE TEMP B-TREE FOR GROUP BY > 0|0|0|USE TEMP B-TREE FOR ORDER BY > > > > 2015-03-30 12:20 GMT+02:00 Simon Slavin : > > > > > On 30 Mar 2015, at 10:46am, Jeff Roux wrote: > > > > > This daemon is running on a HP server with 12 cores, 32 GB of RAM, > > > and a SSD drive. I have performance issues with some requests. For > > > instance, the following request takes more than 5 seconds to > > > accomplish with SQlite3 (in that particular case, the WHERE clause > > > selects all the data in the database, i.e. 100 rows): > > > > > > SELECT SUM(nbBytesDecompOut + nbBytesCompIn) as vol, portLan as item > > > FROM flows > > > WHERE timestamp>=1383770600 AND timestamp<=1384770600 AND protocol IN > > (17, 6) > > > GROUP BY portLan ORDER BY vol DESC LIMIT 6; > > > > What indexes do you have on that table ? I would expect to get best > > results from one of the following indexes > > > > (portLan, protocol, timestamp) > > (portLan, timestamp) > > (protocol, portLan, timestamp) > > (protocol, timestamp) > > > > For best results, CREATE all of those indexes and any others that strike > > your fancy, run ANALYZE, then execute your SELECT. If this gives > > satisfactory results, use > > > > EXPLAIN QUERY PLAN [your SELECT statement] > > > > and see which index SQLite decided to use. You can then delete the other > > unused indexes and run ANALYZE one final time for a little more > > optimization. > > > > It's worth noting that SQLite, since it has no server, has no > server-level > > caching. If you're doing repeated queries using mySQL, after the first > one > > most of the index is cached so the times you're seeing are faster than > > SQLite can ever produce. Two disadvantages of mySQL are the memory usage > > and the work involved in administration. > > > > Simon. > > ___ > > sqlite-users mailing list > > sqlite-users at mailinglists.sqlite.org > > http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users > > > ___ > sqlite-users mailing list > sqlite-users at mailinglists.sqlite.org > http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users > -- D. Richard Hipp drh at sqlite.org
[sqlite] Performance issue
What cache_size have you specified? --- Theory is when you know everything but nothing works. Practice is when everything works but no one knows why. Sometimes theory and practice are combined: nothing works and no one knows why. >-Original Message- >From: sqlite-users-bounces at mailinglists.sqlite.org [mailto:sqlite-users- >bounces at mailinglists.sqlite.org] On Behalf Of Jeff Roux >Sent: Monday, 30 March, 2015 03:46 >To: sqlite-users at mailinglists.sqlite.org >Subject: [sqlite] Performance issue > >Hi everyone, > >I have a daemon that collects information and stores it in a SQLite >database. The table has 1 million rows. > >This daemon is running on a HP server with 12 cores, 32 GB of RAM, >and a SSD drive. I have performance issues with some requests. For >instance, the following request takes more than 5 seconds to >accomplish with SQlite3 (in that particular case, the WHERE clause >selects all the data in the database, i.e. 100 rows): > >SELECT SUM(nbBytesDecompOut + nbBytesCompIn) as vol, portLan as item >FROM flows >WHERE timestamp>=1383770600 AND timestamp<=1384770600 AND protocol IN >(17, 6) >GROUP BY portLan ORDER BY vol DESC LIMIT 6; > >I have done some tests with or without "INDEXED BY" clauses and got >nearly the same results. > >I compared the performance with a mySQL and the same request takes >less than 1 second to accomplish. > >Could you give me some directions to optimize this kind of request >with SQlite3 when there is a big amount of data in the table ? I >need to increase 3 times the number of rows and the performance will >become unacceptable for my application. > >Thanks in advance. >___ >sqlite-users mailing list >sqlite-users at mailinglists.sqlite.org >http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users