[sqlite] Performance issue

2015-03-30 Thread R.Smith


On 2015-03-30 11:46 AM, Jeff Roux wrote:
> Hi everyone,
>
> I have a daemon that collects information and stores it in a SQLite
> database. The table has 1 million rows.
>
> This daemon is running on an HP server with 12 cores, 32 GB of RAM,
> and an SSD drive. I have performance issues with some requests. For
> instance, the following request takes more than 5 seconds to
> complete with SQLite3 (in that particular case, the WHERE clause
> selects all the data in the database, i.e. 1 million rows):
>
> SELECT SUM(nbBytesDecompOut + nbBytesCompIn) as vol, portLan as item
> FROM  flows
> WHERE timestamp>=1383770600 AND timestamp<=1384770600 AND protocol IN (17, 6)
> GROUP BY portLan ORDER BY vol DESC LIMIT 6;
>
> I have done some tests with or without "INDEXED BY" clauses and got
> nearly the same results.
>
> I compared the performance with MySQL; the same request takes
> less than 1 second to complete.
>
> Could you give me some directions for optimizing this kind of request
> with SQLite3 when there is a large amount of data in the table? I
> need to triple the number of rows, at which point the performance will
> become unacceptable for my application.

I think something else is wrong; we might need your schema to judge 
better. The reason I say so is that I quickly made this script to 
reproduce what, from your description, the schema might look like, then 
added a single index and tried the query.

I deliberately added a primary key and then did not use it at all, to 
avoid its advantages, since it may not suit your needs.

You will see I used 3 million rows instead of 1 million (because you 
said your data may grow three times bigger) and some random values in 
the appropriate ranges to fake the data (where I had an idea, and 
guessed the rest). The query takes less than 50 milliseconds, even over 
3 million rows. In fact, the entire script, which creates the DB, inserts 
3 million rows, creates an index, and then does 3 queries, took only 
18 seconds in total (without running ANALYZE). This would go a bit 
faster if I turned stats off, but I left them on for clarity.

If this does not enlighten your quest, kindly provide your full schema 
and example data shapes which might give us a clue.

NOTE: In my data, about ~12K rows satisfy the WHERE clause. If this is 
significantly more in your case it might change the timing a bit, but 
nothing like your reported times.
See results below:

   -- Processing Script for File: D:\Documents\TestScript3.txt
   -- Script Items: 7  Parameter Count: 0

   -- 2015-03-30 16:16:11.795  |  [Success]Script Started...
   -- 


DROP TABLE IF EXISTS flows;

   -- Item Stats: Item No: 1  Query Size (Chars): 29
   -- VM Work Steps: 92  Rows Modified: 0
   -- Full Query Time:   0d 00h 00m and 00.485s
   -- Query Result:  Success.
   -- 


CREATE TABLE flows(
   id INTEGER PRIMARY KEY,
   ipLan TEXT,
   ipWan TEXT,
   portLan INT,
   portWan INT,
   protocol INT,
   nbBytesDecompOut INT,
   nbBytesCompIn INT,
   tos INT,
   timestamp INT
);

   -- Item Stats: Item No: 2  Query Size (Chars): 207
   -- VM Work Steps: 31  Rows Modified: 0
   -- Full Query Time:   -- --- --- --- --.
   -- Query Result:  Success.
   -- 


/* POPULATE the table with random + scaled data to emulate an actual 
dataset of 3M rows */

WITH acc(x,mx8,dx8,mxK,dxK,rK) AS (
 SELECT 0,0,0,0,0,100
   UNION ALL
 SELECT x+1, (x%8), CAST(round(x/8,0) AS INT),
        (x%1024), CAST(round(x/1024,0) AS INT),
        CAST(abs(round(random()/10240,0)) AS INT)
   FROM acc
  WHERE (x<3000000)  -- Testing 3 million rows
)
INSERT INTO flows SELECT
   x, '192.168.1.'||mx8, '55.66.77.'||(dx8%256),  -- ipLan, ipWan
   1000+mx8, 5000+mx8, (x%18),  -- portLan, portWan, protocol
   64+(rK%1024000),             -- nbBytesDecompOut
   1024+(rK%1024000),           -- nbBytesCompIn
   (dx8%3), (1380000000+(x*10)) -- tos, timestamp
   FROM acc;


   -- Item Stats: Item No: 3  Query Size (Chars): 691
   -- Result Columns: 0  Result Rows: 0
   -- VM Work Steps: 18087  Rows Modified: 3000001
   -- Full Query Time:   0d 00h 00m and 12.849s
   -- Query Result:  Success.
   -- 

[sqlite] Performance issue

2015-03-30 Thread Jeff Roux
Simon,

here is the list of the indexes that were already defined on the table:
CREATE INDEX idxLanWan ON flows(ipLan, ipWan, portWan, portLan);
CREATE INDEX idxProto ON flows(protocol);
CREATE INDEX idxTos ON flows(tos);
CREATE INDEX idxTsLanWan ON flows(timestamp, ipLan, ipWan, portWan,
portLan);
CREATE INDEX tsindex ON flows(timestamp);

I added the ones you gave me:
CREATE INDEX ts1 ON flows (portLan, protocol, timestamp);
CREATE INDEX ts2 ON flows (portLan, timestamp);
CREATE INDEX ts3 ON flows (protocol, portLan, timestamp);
CREATE INDEX ts4 ON flows (protocol, timestamp);

and it appears it now uses the ts4 index, but the time spent is still ~5 s.


sqlite> EXPLAIN QUERY PLAN SELECT SUM(nbBytesDecompOut+nbBytesCompIn) as
vol, portLan as item FROM  flows  WHERE timestamp>=1383770600 AND
timestamp<=1384770600 AND protocol IN (17, 6) GROUP BY portLan ORDER BY vol
DESC LIMIT 6;
0|0|0|SEARCH TABLE flows USING INDEX ts4 (protocol=? AND timestamp>? AND timestamp<?)
0|0|0|EXECUTE LIST SUBQUERY 1
0|0|0|USE TEMP B-TREE FOR GROUP BY
0|0|0|USE TEMP B-TREE FOR ORDER BY

2015-03-30 12:20 GMT+02:00 Simon Slavin :

>
> On 30 Mar 2015, at 10:46am, Jeff Roux  wrote:
>
> > This daemon is running on a HP server with 12 cores, 32 GB of RAM,
> > and a SSD drive. I have performance issues with some requests. For
> > instance, the following request takes more than 5 seconds to
> > complete with SQLite3 (in that particular case, the WHERE clause
> > selects all the data in the database, i.e. 1 million rows):
> >
> > SELECT SUM(nbBytesDecompOut + nbBytesCompIn) as vol, portLan as item
> > FROM  flows
> > WHERE timestamp>=1383770600 AND timestamp<=1384770600 AND protocol IN
> (17, 6)
> > GROUP BY portLan ORDER BY vol DESC LIMIT 6;
>
> What indexes do you have on that table ?  I would expect to get best
> results from one of the following indexes
>
> (portLan, protocol, timestamp)
> (portLan, timestamp)
> (protocol, portLan, timestamp)
> (protocol, timestamp)
>
> For best results, CREATE all of those indexes and any others that strike
> your fancy, run ANALYZE, then execute your SELECT.  If this gives
> satisfactory results, use
>
> EXPLAIN QUERY PLAN [your SELECT statement]
>
> and see which index SQLite decided to use.  You can then delete the other
> unused indexes and run ANALYZE one final time for a little more
> optimization.
>
> It's worth noting that SQLite, since it has no server, has no server-level
> caching.  If you're doing repeated queries using mySQL, after the first one
> most of the index is cached so the times you're seeing are faster than
> SQLite can ever produce.  Two disadvantages of mySQL are the memory usage
> and the work involved in administration.
>
> Simon.
> ___
> sqlite-users mailing list
> sqlite-users at mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>


[sqlite] Why is empty string not equal to empty blob?

2015-03-30 Thread dmp
 > You explicitly put strings into both columns.  The two values are the same
 > even though you declared the columns differently.  As I told you before,
 > if you want to see what type something is, use typeof(thing).
 >
 > ~
 > ~
 > ~
 > ~
 >
 > Simon.

Thank you guys. I follow this list to keep informed about the
happenings with SQLite for my project.

I have been having an issue with how to handle a defined DATE column,
be it NUMBER or TEXT, with sqlite-jdbc. If its value is TEXT
and I try to retrieve it with getDate(), a parsing error will occur
if the format is not correct; but if I handle it with getString(),
there is no problem.

These responses helped to clarify what is going on and how to deal
with mixed type column content.

typeof(thing). :)

danap.
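The typeof() advice in this thread is easy to demonstrate. A minimal sketch via Python's stdlib sqlite3 module (not sqlite-jdbc, but the SQL behavior is the same): an empty string is TEXT, an empty blob is BLOB, and a text/blob comparison is never equal.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# typeof() reveals the storage class; '' and x'' have different classes,
# and values of different storage classes never compare equal.
row = conn.execute("SELECT typeof(''), typeof(x''), '' = x''").fetchone()
print(row)  # → ('text', 'blob', 0)
```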


[sqlite] Performance issue

2015-03-30 Thread Jeff Roux
Hi everyone,

I have a daemon that collects information and stores it in a SQLite
database. The table has 1 million rows.

This daemon is running on an HP server with 12 cores, 32 GB of RAM,
and an SSD drive. I have performance issues with some requests. For
instance, the following request takes more than 5 seconds to
complete with SQLite3 (in that particular case, the WHERE clause
selects all the data in the database, i.e. 1 million rows):

SELECT SUM(nbBytesDecompOut + nbBytesCompIn) as vol, portLan as item
FROM  flows
WHERE timestamp>=1383770600 AND timestamp<=1384770600 AND protocol IN (17, 6)
GROUP BY portLan ORDER BY vol DESC LIMIT 6;

I have done some tests with or without "INDEXED BY" clauses and got
nearly the same results.

I compared the performance with MySQL; the same request takes
less than 1 second to complete.

Could you give me some directions for optimizing this kind of request
with SQLite3 when there is a large amount of data in the table? I
need to triple the number of rows, at which point the performance will
become unacceptable for my application.

Thanks in advance.


[sqlite] Performance issue

2015-03-30 Thread Simon Slavin

On 30 Mar 2015, at 10:46am, Jeff Roux  wrote:

> This daemon is running on a HP server with 12 cores, 32 GB of RAM,
> and a SSD drive. I have performance issues with some requests. For
> instance, the following request takes more than 5 seconds to
> complete with SQLite3 (in that particular case, the WHERE clause
> selects all the data in the database, i.e. 1 million rows):
> 
> SELECT SUM(nbBytesDecompOut + nbBytesCompIn) as vol, portLan as item
> FROM  flows
> WHERE timestamp>=1383770600 AND timestamp<=1384770600 AND protocol IN (17, 6)
> GROUP BY portLan ORDER BY vol DESC LIMIT 6;

What indexes do you have on that table ?  I would expect to get best results 
from one of the following indexes

(portLan, protocol, timestamp)
(portLan, timestamp)
(protocol, portLan, timestamp)
(protocol, timestamp)

For best results, CREATE all of those indexes and any others that strike your 
fancy, run ANALYZE, then execute your SELECT.  If this gives satisfactory 
results, use

EXPLAIN QUERY PLAN [your SELECT statement]

and see which index SQLite decided to use.  You can then delete the other 
unused indexes and run ANALYZE one final time for a little more optimization.
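That create-then-inspect workflow can be sketched programmatically. This is a hypothetical illustration using Python's stdlib sqlite3 module (not Simon's exact commands): create the candidate indexes, run ANALYZE, then read the EXPLAIN QUERY PLAN detail to see which index SQLite chose.

```python
import sqlite3

def chosen_plan(conn: sqlite3.Connection, sql: str) -> str:
    """Join the 'detail' column of EXPLAIN QUERY PLAN output into one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return "; ".join(r[-1] for r in rows)  # last column is the detail text

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE flows(portLan INT, protocol INT, timestamp INT,
                                   nbBytesDecompOut INT, nbBytesCompIn INT)""")
# The four candidate indexes suggested above:
candidates = ["portLan, protocol, timestamp", "portLan, timestamp",
              "protocol, portLan, timestamp", "protocol, timestamp"]
for i, cols in enumerate(candidates, start=1):
    conn.execute(f"CREATE INDEX ts{i} ON flows({cols})")
conn.execute("ANALYZE")

plan = chosen_plan(conn, """
    SELECT SUM(nbBytesDecompOut + nbBytesCompIn) AS vol, portLan AS item
    FROM flows
    WHERE timestamp >= 1383770600 AND timestamp <= 1384770600
      AND protocol IN (17, 6)
    GROUP BY portLan ORDER BY vol DESC LIMIT 6""")
print(plan)  # the detail text names the index SQLite decided to use
```

The unused candidates can then be dropped and ANALYZE re-run, exactly as described above.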

It's worth noting that SQLite, since it has no server, has no server-level 
caching.  If you're doing repeated queries using mySQL, after the first one 
most of the index is cached so the times you're seeing are faster than SQLite 
can ever produce.  Two disadvantages of mySQL are the memory usage and the work 
involved in administration.

Simon.


[sqlite] SQL Syntax To Copy A Table

2015-03-30 Thread Dominique Devienne
On Thu, Mar 26, 2015 at 4:29 PM, Nigel Verity wrote:

> My requirement is to take periodic snapshots of a names and addresses
> table, to be stored in the same database as the master.
>

Perhaps also look into https://www.sqlite.org/backup.html, which doesn't
qualify for "the same database", but seems closer to the spirit of your
requirement. --DD
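For what it's worth, the online backup API linked above is exposed in Python's stdlib sqlite3 module as Connection.backup. A minimal sketch (in-memory databases here to keep it self-contained; the destination would normally be a file):

```python
import sqlite3

# Source database with a names-and-addresses table (sample data, made up here).
src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE names(id INTEGER PRIMARY KEY, name TEXT, address TEXT)")
src.executemany("INSERT INTO names(name, address) VALUES (?, ?)",
                [("Ada", "1 Byron Rd"), ("Grace", "2 Navy Way")])
src.commit()

# Snapshot the whole database; pass e.g. sqlite3.connect("snapshot.db")
# as the destination to write an on-disk copy instead.
dst = sqlite3.connect(":memory:")
src.backup(dst)
print(dst.execute("SELECT COUNT(*) FROM names").fetchone()[0])  # → 2
```

Unlike a plain file copy, the backup API takes a consistent snapshot even while the source database is in use.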


[sqlite] Performance issue

2015-03-30 Thread Richard Hipp
On Mon, Mar 30, 2015 at 6:44 AM, Jeff Roux  wrote:

> Simon,
>
> here is the list of the indexes that were already defined on the table:
> CREATE INDEX idxLanWan ON flows(ipLan, ipWan, portWan, portLan);
> CREATE INDEX idxProto ON flows(protocol);
> CREATE INDEX idxTos ON flows(tos);
> CREATE INDEX idxTsLanWan ON flows(timestamp, ipLan, ipWan, portWan,
> portLan);
> CREATE INDEX tsindex ON flows(timestamp);
>

Recommend that you drop tsindex.  idxTsLanWan will work just as well.


>
> I added the ones you gave me:
> CREATE INDEX ts1 ON flows (portLan, protocol, timestamp);
> CREATE INDEX ts2 ON flows (portLan, timestamp);
> CREATE INDEX ts3 ON flows (protocol, portLan, timestamp);
> CREATE INDEX ts4 ON flows (protocol, timestamp);
>

If you decide to keep ts3 and/or ts4, then recommend that you drop idxProto.


>
>  and it appears it now uses the ts4 index, but the time spent is still ~ 5s
>

How many rows satisfy the WHERE clause?

You might get some relief using a covering index:

CREATE INDEX ts5 ON
flows(protocol, timestamp, portLan, nbBytesDecompOut, nbBytesCompIn);
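The point of such a covering index is that every column the query touches lives in the index itself, so SQLite never has to visit the table rows at all. A small sketch (Python stdlib sqlite3, empty illustrative table) makes the difference visible in EXPLAIN QUERY PLAN, which reports "COVERING INDEX" when this applies:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE flows(portLan INT, protocol INT, timestamp INT,
                                   nbBytesDecompOut INT, nbBytesCompIn INT)""")
# ts5 holds every column the query needs, so the table itself is never read:
conn.execute("""CREATE INDEX ts5 ON
    flows(protocol, timestamp, portLan, nbBytesDecompOut, nbBytesCompIn)""")

plan = conn.execute("""EXPLAIN QUERY PLAN
    SELECT SUM(nbBytesDecompOut + nbBytesCompIn) AS vol, portLan AS item
    FROM flows
    WHERE timestamp >= 1383770600 AND timestamp <= 1384770600
      AND protocol IN (17, 6)
    GROUP BY portLan ORDER BY vol DESC LIMIT 6""").fetchall()
detail = "; ".join(row[-1] for row in plan)
print(detail)  # mentions 'COVERING INDEX ts5'
```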


>
>
> sqlite> EXPLAIN QUERY PLAN SELECT SUM(nbBytesDecompOut+nbBytesCompIn) as
> vol, portLan as item FROM  flows  WHERE timestamp>=1383770600 AND
> timestamp<=1384770600 AND protocol IN (17, 6) GROUP BY portLan ORDER BY vol
> DESC LIMIT 6;
> 0|0|0|SEARCH TABLE flows USING INDEX ts4 (protocol=? AND timestamp>? AND
> timestamp<?)
> 0|0|0|EXECUTE LIST SUBQUERY 1
> 0|0|0|USE TEMP B-TREE FOR GROUP BY
> 0|0|0|USE TEMP B-TREE FOR ORDER BY
>
>
>
> 2015-03-30 12:20 GMT+02:00 Simon Slavin :
>
> >
> > On 30 Mar 2015, at 10:46am, Jeff Roux  wrote:
> >
> > > This daemon is running on a HP server with 12 cores, 32 GB of RAM,
> > > and a SSD drive. I have performance issues with some requests. For
> > > instance, the following request takes more than 5 seconds to
> > > complete with SQLite3 (in that particular case, the WHERE clause
> > > selects all the data in the database, i.e. 1 million rows):
> > >
> > > SELECT SUM(nbBytesDecompOut + nbBytesCompIn) as vol, portLan as item
> > > FROM  flows
> > > WHERE timestamp>=1383770600 AND timestamp<=1384770600 AND protocol IN
> > (17, 6)
> > > GROUP BY portLan ORDER BY vol DESC LIMIT 6;
> >
> > What indexes do you have on that table ?  I would expect to get best
> > results from one of the following indexes
> >
> > (portLan, protocol, timestamp)
> > (portLan, timestamp)
> > (protocol, portLan, timestamp)
> > (protocol, timestamp)
> >
> > For best results, CREATE all of those indexes and any others that strike
> > your fancy, run ANALYZE, then execute your SELECT.  If this gives
> > satisfactory results, use
> >
> > EXPLAIN QUERY PLAN [your SELECT statement]
> >
> > and see which index SQLite decided to use.  You can then delete the other
> > unused indexes and run ANALYZE one final time for a little more
> > optimization.
> >
> > It's worth noting that SQLite, since it has no server, has no
> server-level
> > caching.  If you're doing repeated queries using mySQL, after the first
> one
> > most of the index is cached so the times you're seeing are faster than
> > SQLite can ever produce.  Two disadvantages of mySQL are the memory usage
> > and the work involved in administration.
> >
> > Simon.
> >
>



-- 
D. Richard Hipp
drh at sqlite.org


[sqlite] Performance issue

2015-03-30 Thread Keith Medcalf

What cache_size have you specified?

---
Theory is when you know everything but nothing works.  Practice is when 
everything works but no one knows why.  Sometimes theory and practice are 
combined:  nothing works and no one knows why.
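Keith's question can be explored directly: cache_size is read and set with a PRAGMA. A hedged sketch via Python's stdlib sqlite3 module (the -262144 figure, roughly 256 MiB, is an arbitrary example for a 32 GB machine, not a recommendation):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Current setting: positive values are pages, negative values are KiB.
print(conn.execute("PRAGMA cache_size").fetchone()[0])

# Ask for roughly 256 MiB of page cache (256 * 1024 KiB = 262144).
conn.execute("PRAGMA cache_size = -262144")
print(conn.execute("PRAGMA cache_size").fetchone()[0])  # → -262144
```

A too-small cache forces repeated page reads during large index scans, which is one plausible source of the multi-second timings reported above.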

>-Original Message-
>From: sqlite-users-bounces at mailinglists.sqlite.org [mailto:sqlite-users-
>bounces at mailinglists.sqlite.org] On Behalf Of Jeff Roux
>Sent: Monday, 30 March, 2015 03:46
>To: sqlite-users at mailinglists.sqlite.org
>Subject: [sqlite] Performance issue
>
>Hi everyone,
>
>I have a daemon that collects information and stores it in a SQLite
>database. The table has 1 million rows.
>
>This daemon is running on a HP server with 12 cores, 32 GB of RAM,
>and a SSD drive. I have performance issues with some requests. For
>instance, the following request takes more than 5 seconds to
>complete with SQLite3 (in that particular case, the WHERE clause
>selects all the data in the database, i.e. 1 million rows):
>
>SELECT SUM(nbBytesDecompOut + nbBytesCompIn) as vol, portLan as item
>FROM  flows
>WHERE timestamp>=1383770600 AND timestamp<=1384770600 AND protocol IN
>(17, 6)
>GROUP BY portLan ORDER BY vol DESC LIMIT 6;
>
>I have done some tests with or without "INDEXED BY" clauses and got
>nearly the same results.
>
>I compared the performance with MySQL; the same request takes
>less than 1 second to complete.
>
>Could you give me some directions for optimizing this kind of request
>with SQLite3 when there is a large amount of data in the table? I
>need to triple the number of rows, at which point the performance will
>become unacceptable for my application.
>
>Thanks in advance.