Re: [sqlite] sqlite3 performance on select count very slow for 16 GB file

2015-01-25 Thread Donald Griggs
Hi Navin,

Excuse me if some of the points below repeat things you already know.

1. Dr. Hipp's advice not to create redundant indexes was *not* intended to
give you very quick row counts -- Simon Slavin et al. had already given
advice to speed up row counts -- and just now Stefan Keller even provided
trigger SQL statements for you.  The trigger solution will result in
almost instantaneous row counts (at the expense of slightly slower inserts
and deletes).


2. Alternatively, since you write that you only do deletes every 5 days: if
by chance the time for running VACUUM just after these periodic batch
deletes is tolerable, then
   SELECT max(ROWID) FROM hp_table1;
should give you instant satisfaction without using the TRIGGERs.  (If you
choose this solution you would NOT specify WITHOUT ROWID, of course.)
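
   A minimal sketch of that sequence (the WHERE clause is only illustrative,
and as above, this assumes hp_table1 is an ordinary rowid table whose rowids
VACUUM renumbers contiguously from 1):

   -- after the periodic batch delete:
   DELETE FROM hp_table1 WHERE CollectionTime < 1420070400; -- illustrative cutoff
   VACUUM;                           -- rewrites the file; rowids become 1..N again
   SELECT max(ROWID) FROM hp_table1; -- now equals the row count, returns instantly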

3. We're all assuming that your application truly NEEDS to know the row
count (and that you're not just using "select count(*) from hp_table1" as
some "arbitrary test sql" for sqlite.)

4. The fact that
  select count(*) from very_large_table;
can be slow is *not* evidence that sqlite can't handle tables as large as
yours.  Sqlite can update its b-trees till the cows come home.  It was a
design trade-off.  Sqlite could have been designed to always keep a current
count of table sizes, but inserts and deletes would be slightly slower for
all tables for all users, and in a great many applications it's not needed.
On the other hand, applications that *do* need quick access to the current
row count can have it using triggers.  Other databases might have been
designed differently.  (There *are*, however, many applications ill-suited
to sqlite -- often because of high concurrency needs or the need for fancy
features, but row counting is not a veto item.
https://www.sqlite.org/whentouse.html)

5. IMPORTANTLY -- I see you tried different page sizes, but did you
also set a large CACHE size?  E.g., "PRAGMA cache_size = -1000000;"
would set the cache to about 1 GByte.  Did I see that your postgres test
was using 4GB?  You'd want to compare using about the same cache, I would
think.  (A large cache alone does NOT substitute for one of the "fast
row count" solutions above, but it might make a big difference in your
other operations.)
https://www.sqlite.org/pragma.html#pragma_cache_size
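
   For example (a sketch; note cache_size is per-connection and is not
persisted, so it must be issued each time the database is opened):

   PRAGMA page_size = 4096;      -- takes effect only before the file is created (or via VACUUM)
   PRAGMA cache_size = -1000000; -- negative N = approx. abs(N) KiB of cache, i.e. ~1 GByte
   PRAGMA cache_size;            -- read back to verify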

Regards,
Donald Griggs


Re: [sqlite] sqlite3 performance on select count very slow for 16 GB file

2015-01-25 Thread Stefan Keller
Hi Navin

I've compared with PostgreSQL. It's about twice as "fast" as SQLite with
100 million records on my old laptop - but count() is still too slow.

So, as Eduardo suggested, you have to solve this problem with a
separate table and triggers, as shown below.

Yours, S.

-- Create test table
CREATE TABLE test (id INTEGER PRIMARY KEY, r REAL, t TEXT );

-- Fill test table with test data...!
INSERT INTO test (r,t)
  WITH RECURSIVE
    cte(r,t) AS (
      SELECT random(), hex(random()*1000)
      UNION ALL
      SELECT random(), hex(random()*1000)
      FROM cte
      LIMIT 10000000 -- 10 million rows
    )
  SELECT * FROM cte;

-- Create auxiliary table and seed it with a row for 'test'
CREATE TABLE count_statistics (
  table_name TEXT NOT NULL,
  row_count INTEGER
);
INSERT INTO count_statistics (table_name, row_count) VALUES ('test', 0);

CREATE TRIGGER count_statistics_delete_test
BEFORE DELETE ON 'test'
FOR EACH ROW BEGIN
  UPDATE count_statistics SET row_count=row_count-1 WHERE table_name='test';
END;

CREATE TRIGGER count_statistics_insert_test
BEFORE INSERT ON 'test'
FOR EACH ROW BEGIN
  UPDATE count_statistics SET row_count=row_count+1 WHERE table_name='test';
END;

-- initialize/update/reset count_statistics
UPDATE count_statistics SET row_count=(
  SELECT count(*) FROM test
) WHERE table_name='test';


.timer on
SELECT max(id) FROM test;
SELECT count(*) FROM test; -- slow!

-- This is the count replacement:
SELECT row_count FROM count_statistics WHERE table_name='test'; -- fast!

-- Done.

2015-01-25 20:05 GMT+01:00 Navin S Parakkal :
> [snip: full message reposted below in this thread]

Re: [sqlite] sqlite3 performance on select count very slow for 16 GB file

2015-01-25 Thread Navin S Parakkal


On Saturday 24 January 2015 03:15 PM, Stefan Keller wrote:

I think it's time for a serious, simple benchmark with SQLite and, say, PostgreSQL.
PostgreSQL also had performance problems some time ago but this has been resolved.
Can you describe the hp_table1 schema (CREATE TABLE statement...) and
some data (INSERTs)?


Yes, though an avid fan of sqlite, I have to reconsider it for scalability
issues. Everything about the table is present in the archives and thread.




On Sunday 25 January 2015 06:17 AM, Eduardo Morras wrote:

You can use PostgreSQL, using part of the Oracle licence cost for better
hardware and a dinner for the team.

Yes, that is an option we have to consider seriously. I thought we were
doing something wrong, like how we created the index or tuned the page size.



I'm still stuck.




I may be completely wrong, but I'm guessing your B-tree index is static.
Something in it doesn't support dynamic updates, or some bug is causing
it to have linear update behavior.



http://pastebin.com/davqKcF8



Reposting below in case you missed it.


My process.csv is around 27G. I've gzipped it and put it
at ftp://navinps:sqlit...@h2.usa.hp.com as process.csv.gz

There is only 1 file there.
md5sum process.csv.gz
e77a322744a26d4c8a1ad4d61a84ee72  process.csv.gz

 [root@centosnavin sqlite-autoconf-3080801]# cat sqlite3commands.txt
CREATE TABLE [hp_table1] ( InstanceId INTEGER, LogTime INTEGER,
ArrivalTime INTEGER, CollectionTime INTEGER, [dml_PROC_TIME] TIME,
[dml_PROC_INTERVAL] INTEGER, [dml_PROC_INTEREST] TEXT,
[dml_PROC_TOP_CPU_INDEX] INTEGER, [dml_PROC_TOP_DISK_INDEX] INTEGER,
[dml_PROC_STATE_FLAG] INTEGER, [dml_PROC_RUN_TIME] REAL,
[dml_PROC_STOP_REASON_FLAG] INTEGER, [dml_PROC_INTERVAL_ALIVE] INTEGER,
[dml_PROC_STOP_REASON] TEXT, [dml_PROC_STATE] TEXT,
[dml_PROC_PRI] INTEGER, [dml_PROC_NICE_PRI] INTEGER,
[dml_PROC_CPU_LAST_USED] INTEGER, [dml_PROC_CPU_SWITCHES] INTEGER,
[dml_PROC_IO_BYTE] REAL, [dml_PROC_VOLUNTARY_CSWITCH] INTEGER,
[dml_PROC_FORCED_CSWITCH] INTEGER, [dml_PROC_IO_BYTE_RATE] REAL,
[dml_PROC_CPU_TOTAL_UTIL] REAL, [dml_PROC_CPU_TOTAL_TIME] REAL,
[dml_PROC_CPU_SYS_MODE_UTIL] REAL, [dml_PROC_CPU_SYS_MODE_TIME] REAL,
[dml_PROC_CPU_USER_MODE_UTIL] REAL, [dml_PROC_CPU_USER_MODE_TIME] REAL,
[dml_PROC_THREAD_COUNT] INTEGER, [dml_PROC_CPU_ALIVE_TOTAL_UTIL] REAL,
[dml_PROC_CPU_ALIVE_USER_MODE_UTIL] REAL,
[dml_PROC_CPU_ALIVE_SYS_MODE_UTIL] REAL, [dml_PROC_CPU_SYSCALL_UTIL] REAL,
[dml_PROC_CPU_SYSCALL_TIME] REAL, [dml_PROC_CHILD_CPU_USER_MODE_UTIL] REAL,
[dml_PROC_CHILD_CPU_SYS_MODE_UTIL] REAL, [dml_PROC_CHILD_CPU_TOTAL_UTIL] REAL,
[dml_PROC_DISK_PHYS_READ] INTEGER, [dml_PROC_DISK_PHYS_READ_RATE] REAL,
[dml_PROC_DISK_PHYS_WRITE] INTEGER, [dml_PROC_DISK_PHYS_WRITE_RATE] REAL,
[dml_PROC_DISK_PHYS_IO_RATE] REAL, [dml_PROC_MEM_RES] REAL,
[dml_PROC_MEM_SHARED_RES] REAL, [dml_PROC_MEM_VIRT] REAL,
[dml_PROC_MEM_DATA_VIRT] REAL, [dml_PROC_MEM_STACK_VIRT] REAL,
[dml_PROC_PAGEFAULT] INTEGER, [dml_PROC_PAGEFAULT_RATE] REAL,
[dml_PROC_MINOR_FAULT] INTEGER, [dml_PROC_MAJOR_FAULT] INTEGER,
[dml_PROC_MEM_LOCKED] REAL, [dml_PROC_DISK_SUBSYSTEM_WAIT_PCT] REAL,
[dml_PROC_DISK_SUBSYSTEM_WAIT_TIME] REAL, [dml_PROC_PRI_WAIT_PCT] REAL,
[dml_PROC_PRI_WAIT_TIME] REAL,
PRIMARY KEY (InstanceId, CollectionTime)) WITHOUT ROWID;



Also pasted with details at :

http://pastebin.com/davqKcF8


Thanks,
Navin




Re: [sqlite] sqlite3 performance on select count very slow for 16 GB file

2015-01-25 Thread Navin S Parakkal

Hi,

On Monday 26 January 2015 12:35 AM, Navin S Parakkal wrote:


> [snip: message repeated verbatim above]


Re: [sqlite] sqlite3 performance on select count very slow for 16 GB file

2015-01-24 Thread Eduardo Morras
On Tue, 20 Jan 2015 12:12:00 +
"Parakkal, Navin S (Software Engineer)"  wrote:

> Hello,
> 
> I've a few questions about sqlite3 and the database it creates.
> Actually I'm finding a lot of differences in performance.
> 
>  My story:
> I have this sqlite3 database called hp.db which has like 100+
> million records in table1. The size of hp.db on Linux x64 (CentOS
> 7) is like 16 GB. When I do a select count(*) on hp_table1 it takes
> more than 5 mins, which is quite a huge time. This file is static for
> now, i.e. we copied it from a production server for analysis. Now I
> create an index by create index nvnhpindex on hp_table1
> (column1,column2). The primary key of the table hp_table1 is
> (column1,column2). It takes some time (maybe 20 minutes or
> less, I went for lunch and came back, really didn't note the time).
> Now I do select count(*) on hp_table1, and it takes around 15 secs. This
> is our objective: we want it to be fast. The create
> index has increased the size of hp.db to 18 GB. This is OK with us
> and the customers.
> 
> The problem is this is not a static database. We keep inserting data
> (insert rows every 10 secs or at least every 1 minute) and
> occasionally delete rows (like once in 5 days). This is a 24x7
> system.
> 
> 
> So to identify the problem, I created an empty_database similar to
> hp.db with no rows. I created an index on column1,column2 on
> empty_table1 inside empty_database.
> 
> Now I inserted the rows from csv (this CSV was created by .mode csv,
> .output myhp.csv, select * from hp_table1).
> 
> The size of the database (empty_database) is around 18GB with rows. Now I
> do a select count(*) on empty_table1 (actually it contains lots of
> rows, like 100M+ records) and it takes more than 5 mins. 5 mins is
> too much of a time for us to bear. The customer wants the information
> within a minute.
> 
> 
> Can you please help in resolving this problem? We are planning to
> deploy this across 1+ nodes on Linux x64 at one customer, and many
> other customers are going in a similar direction.
> 
> How do we go about resolving this, i.e. what should we do to create a
> table with sub-minute access for 100-500 million rows? How do we create
> the indexes? Any other performance incentives?

Use a trigger on insert and a trigger on delete that maintain a value in
another table with the current count(*) number. The table can be temporary
if you want, and stay in memory, but then you should do a count(*) once at
application startup.
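
A rough sketch of that variant (all names illustrative; TEMP objects live in
memory and disappear with the connection, hence the count(*) at startup):

-- at application startup:
CREATE TEMP TABLE row_counts (table_name TEXT PRIMARY KEY, n INTEGER);
INSERT INTO row_counts
  VALUES ('hp_table1', (SELECT count(*) FROM hp_table1)); -- slow, but only once

CREATE TEMP TRIGGER hp_ins AFTER INSERT ON hp_table1
BEGIN UPDATE row_counts SET n = n + 1 WHERE table_name = 'hp_table1'; END;

CREATE TEMP TRIGGER hp_del AFTER DELETE ON hp_table1
BEGIN UPDATE row_counts SET n = n - 1 WHERE table_name = 'hp_table1'; END;

-- thereafter, the count is one indexed lookup:
SELECT n FROM row_counts WHERE table_name = 'hp_table1';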

> Some say we should buy/use Oracle, but I'm holding onto sqlite3
> assuming it will help us solve our problem.

You can use PostgreSQL, using part of the Oracle licence cost for better
hardware and a dinner for the team.

> Regards,
> Navin
> 


---   ---
Eduardo Morras 


Re: [sqlite] sqlite3 performance on select count very slow for 16 GB file

2015-01-24 Thread Stefan Keller
Hi,

Relying on the sequence will not work (and is the wrong hack) since the use
case includes deleting rows explicitly.

I think it's time for a serious, simple benchmark with SQLite and, say, PostgreSQL.
PostgreSQL also had performance problems some time ago but this has been resolved.
Can you describe the hp_table1 schema (CREATE TABLE statement...) and
some data (INSERTs)?

Yours, S.


2015-01-24 10:33 GMT+01:00 Clemens Ladisch :
> Jim Wilcoxson wrote:
>> select seq from sqlite_sequence where name = 'tablename'
>
> This works only for an AUTOINCREMENT column.
> [snip: full message below in this thread]


Re: [sqlite] sqlite3 performance on select count very slow for 16 GB file

2015-01-24 Thread Clemens Ladisch
Jim Wilcoxson wrote:
> If you have a table where rows are inserted but never deleted, and you
> have a rowid column, you can use this:
>
> select seq from sqlite_sequence where name = 'tablename'

This works only for an AUTOINCREMENT column.

> This will return instantly, without scanning any rows or indexes, and
> is much faster than max(rowid) for huge tables.

Max(rowid) has a special optimization and looks only at the last entry
in the index.  It is what SQLite uses internally for tables without
AUTOINCREMENT, and is actually faster than looking up the sequence value
in a separate table.
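
A small illustration of both points (a sketch; table name arbitrary):

CREATE TABLE t1 (id INTEGER PRIMARY KEY AUTOINCREMENT, v TEXT);
INSERT INTO t1 (v) VALUES ('a'), ('b'), ('c');

SELECT seq FROM sqlite_sequence WHERE name = 't1'; -- 3 (works only with AUTOINCREMENT)
SELECT max(rowid) FROM t1;                         -- 3 (one b-tree lookup, AUTOINCREMENT or not)

DELETE FROM t1 WHERE id = 3;
SELECT seq FROM sqlite_sequence WHERE name = 't1'; -- still 3: deletes are never reflected
SELECT max(rowid) FROM t1;                         -- 2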


Regards,
Clemens


Re: [sqlite] sqlite3 performance on select count very slow for 16 GB file

2015-01-23 Thread Jim Wilcoxson
If you have a table where rows are inserted but never deleted, and you
have a rowid column, you can use this:

select seq from sqlite_sequence where name = 'tablename'

This will return instantly, without scanning any rows or indexes, and
is much faster than max(rowid) for huge tables.

If no rows have been inserted, you will get NULL.  If rows have been
inserted, you will get back the last rowid inserted.

Jim


Simon wrote:

If this is a table for which rows are inserted but never deleted, then
you will find that

SELECT max(rowid) FROM hp_table1

returns the same value almost immediately.  Perhaps value-1, but
whatever it is it'll be consistent.

-- 
HashBackup: easy onsite and offsite Unix backup
http://www.hashbackup.com


Re: [sqlite] sqlite3 performance on select count very slow for 16 GB file

2015-01-20 Thread Simon Slavin

On 20 Jan 2015, at 12:12pm, Parakkal, Navin S (Software Engineer) 
 wrote:

> When I do a select count(*) on hp_table1 it takes more than 5 mins, which
> is quite a huge time.

If this is a table for which rows are inserted but never deleted, then you will 
find that

SELECT max(rowid) FROM hp_table1

returns the same value almost immediately.  Perhaps value-1, but whatever it is 
it'll be consistent.

> Now I create an index [snip]

> Now I do select count(*) on hp_table1, it takes around 15 secs.

> [...]

> The size of the database (empty_database) is around 18GB with rows. Now I do a
> select count(*) on empty_table1 (actually it contains lots of rows, like 100M+
> records) and it takes more than 5 mins. 5 mins is too much of a time for us
> to bear. The customer wants the information within a minute.

What information?  The number of rows in a table?  That's rarely important
information for a user.  It looks more like the kind of information a database
admin would want.

The problem is that, due to an unusual aspect of how SQLite works, the total
number of rows in a table is not stored anywhere.  To calculate count(*),
SQLite has to look at every row in the table and count the entries, which is
what it did originally, taking 5 minutes.

But a full index on the table has the same number of entries while involving
less data, and SQLite knows that counting the entries in the index you created
would be faster.  So once you had created the index, SQLite answered count(*)
by going through all the entries in that index instead, which took just 15
seconds.
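
You can watch SQLite make this choice with EXPLAIN QUERY PLAN (a sketch; the
exact output wording varies between SQLite versions):

EXPLAIN QUERY PLAN SELECT count(*) FROM hp_table1;
-- with the extra index present, the plan is a scan USING COVERING INDEX
-- nvnhpindex rather than a scan of the table's own b-tree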

> How do we go about resolving this, i.e. what should we do to create a table
> with sub-minute access for 100-500 million rows?

Access is not the problem here.  SQLite can access any row in a table that big
in a few milliseconds.  The problem here is the specific function of counting
every row, which is something SQLite does not do quickly.

Take a look at the 'max(rowid)' trick I showed above.

If that's no good for you, use TRIGGERs which add 1 for each INSERT and
subtract 1 for each DELETE to keep track of the number of rows.  You would
create another table to keep the total count in.  This would decrease the time
taken to return the count at the cost of increasing the time taken to insert
and delete rows.

Simon.


[sqlite] sqlite3 performance on select count very slow for 16 GB file

2015-01-20 Thread Parakkal, Navin S (Software Engineer)
Hello,

   I've a few questions about sqlite3 and the database it creates. Actually I'm
finding a lot of differences in performance.

 My story:
I have this sqlite3 database called hp.db which has like 100+ million
records in table1. The size of hp.db on Linux x64 (CentOS 7) is like 16 GB.
When I do a select count(*) on hp_table1 it takes more than 5 mins, which is
quite a huge time.
This file is static for now, i.e. we copied it from a production server for
analysis.
Now I create an index by create index nvnhpindex on hp_table1
(column1,column2). The primary key of the table hp_table1 is (column1,column2).
It takes some time (maybe 20 minutes or less; I went for lunch and
came back, really didn't note the time).

Now I do select count(*) on hp_table1, and it takes around 15 secs. This is
our objective: we want it to be fast. The create index has increased
the size of hp.db to 18 GB. This is OK with us and the customers.

The problem is this is not a static database. We keep inserting data (insert
rows every 10 secs or at least every 1 minute) and occasionally delete rows
(like once in 5 days). This is a 24x7 system.


So to identify the problem, I created an empty_database similar to hp.db with
no rows.
I created an index on column1,column2 on empty_table1 inside empty_database.

Now I inserted the rows from csv (this CSV was created by .mode csv, .output
myhp.csv, select * from hp_table1).

The size of the database (empty_database) is around 18GB with rows. Now I do a
select count(*) on empty_table1 (actually it contains lots of rows, like 100M+
records) and it takes more than 5 mins. 5 mins is too much of a time for us to
bear. The customer wants the information within a minute.


Can you please help in resolving this problem? We are planning to deploy this
across 1+ nodes on Linux x64 at one customer, and many other customers are
going in a similar direction.

How do we go about resolving this, i.e. what should we do to create a table
with sub-minute access for 100-500 million rows? How do we create the indexes?
Any other performance incentives?

Some say we should buy/use Oracle, but I'm holding onto sqlite3 assuming it
will help us solve our problem.


Regards,
Navin



Re: [sqlite] sqlite3 performance on select count very slow for 16 GB file

2015-01-20 Thread Richard Hipp
On 1/20/15, Parakkal, Navin S (Software Engineer)  wrote:
> Hello,
>
> I've a few questions about sqlite3 and the database it creates. Actually I'm
> finding a lot of differences in performance.
>
>  My story:
> I have this sqlite3 database called hp.db which has like 100+ million
> records in table1. The size of hp.db on Linux x64 (CentOS 7) is like 16
> GB.
> When I do a select count(*) on hp_table1 it takes more than 5 mins, which
> is quite a huge time.
> This file is static for now, i.e. we copied it from a production server for
> analysis.
> Now I create an index by create index nvnhpindex on hp_table1
> (column1,column2). The primary key of the table hp_table1 is
> (column1,column2).

Are you saying that you have a table like this:

CREATE TABLE hp_table1(
 column1 TEXT,
 column2 TEXT,
 other_columns MISC,
 PRIMARY KEY(column1,column2)
);

And then you are doing:

CREATE INDEX idx1 ON hp_table1(column1,column2);

Don't do that!!!  The index is redundant.  You already have a primary
key on those two columns.  The primary key is sufficient.  The index
just makes your database bigger and slower.

For a multi-column primary key, your best performance will (probably)
result if you say:

CREATE TABLE hp_table1(
 column1 TEXT,
 column2 TEXT,
 other_columns MISC,
 PRIMARY KEY(column1,column2)
) WITHOUT ROWID;

Note the use of WITHOUT ROWID at the end of the table declaration.
Note also the absence of any indexes.  This will likely make a big
size and performance difference for you.

I say "probably" because there are cases where it might be better to
omit the WITHOUT ROWID - specifically if "other_columns" contain very
large strings and/or blobs - larger than about 1/5th of your page
size.  You can run experiments using and omitting WITHOUT ROWID to see
which form works best on your system.
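
A sketch of such an experiment (table and column names illustrative; load the
same data into both variants, then compare time and file size):

CREATE TABLE t_rowid (column1 TEXT, column2 TEXT, payload TEXT,
 PRIMARY KEY(column1,column2));
CREATE TABLE t_norowid (column1 TEXT, column2 TEXT, payload TEXT,
 PRIMARY KEY(column1,column2)) WITHOUT ROWID;

-- after loading t_rowid, copy the identical rows across:
INSERT INTO t_norowid SELECT * FROM t_rowid;

.timer on
SELECT count(*) FROM t_rowid;
SELECT count(*) FROM t_norowid;
-- then repeat your real insert/select workload against each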


> It takes some time (maybe 20 minutes or less, I went for lunch
> and came back, really didn't note the time).
>
> Now I do select count(*) on hp_table1, and it takes around 15 secs. This is
> our objective: we want it to be fast. The create index has
> increased the size of hp.db to 18 GB. This is OK with us and the customers.
>
> The problem is this is not a static database. We keep inserting data (insert
> rows every 10 secs or at least every 1 minute) and occasionally delete rows
> (like once in 5 days). This is a 24x7 system.
>
>
> So to identify the problem, I created an empty_database similar to hp.db
> with no rows.
> I created an index on column1,column2 on empty_table1 inside empty_database.
>
> Now I inserted the rows from csv (this CSV was created by .mode csv, .output
> myhp.csv, select * from hp_table1).
>
> The size of the database (empty_database) is around 18GB with rows. Now I do
> a select count(*) on empty_table1 (actually it contains lots of rows, like
> 100M+ records) and it takes more than 5 mins. 5 mins is too much of a time
> for us to bear. The customer wants the information within a minute.
>
>
> Can you please help in resolving this problem? We are planning to deploy
> this across 1+ nodes on Linux x64 at one customer, and many other
> customers are going in a similar direction.
>
> How do we go about resolving this, i.e. what should we do to create a table
> with sub-minute access for 100-500 million rows? How do we create the
> indexes? Any other performance incentives?
>
> Some say we should buy/use Oracle, but I'm holding onto sqlite3
> assuming it will help us solve our problem.
>
>
> Regards,
> Navin
>
>


-- 
D. Richard Hipp
d...@sqlite.org