Repost: this didn't make it into the archives or onto the mailing list the first time; sorry about that. My replies to Simon follow after [Repost End].


Hello,


[Repost Begin]

My Process.csv is around 27G. I've gzipped it and put at 
ftp://navinps:sqlit...@h2.usa.hp.com as process.csv.gz

There is only 1 file there. 
md5sum process.csv.gz
e77a322744a26d4c8a1ad4d61a84ee72  process.csv.gz

 [root@centosnavin sqlite-autoconf-3080801]# cat sqlite3commands.txt 
CREATE TABLE [hp_table1] ( InstanceId INTEGER, LogTime INTEGER, ArrivalTime 
INTEGER, CollectionTime INTEGER, [dml_PROC_TIME] TIME, [dml_PROC_INTERVAL] 
INTEGER, [dml_PROC_INTEREST] TEXT, [dml_PROC_TOP_CPU_INDEX] INTEGER, 
[dml_PROC_TOP_DISK_INDEX] INTEGER, [dml_PROC_STATE_FLAG] INTEGER, 
[dml_PROC_RUN_TIME] REAL, [dml_PROC_STOP_REASON_FLAG] INTEGER, 
[dml_PROC_INTERVAL_ALIVE] INTEGER, [dml_PROC_STOP_REASON] TEXT, 
[dml_PROC_STATE] TEXT, [dml_PROC_PRI] INTEGER, [dml_PROC_NICE_PRI] INTEGER, 
[dml_PROC_CPU_LAST_USED] INTEGER, [dml_PROC_CPU_SWITCHES] INTEGER, 
[dml_PROC_IO_BYTE] REAL, [dml_PROC_VOLUNTARY_CSWITCH] INTEGER, 
[dml_PROC_FORCED_CSWITCH] INTEGER, [dml_PROC_IO_BYTE_RATE] REAL, 
[dml_PROC_CPU_TOTAL_UTIL] REAL, [dml_PROC_CPU_TOTAL_TIME] REAL, 
[dml_PROC_CPU_SYS_MODE_UTIL] REAL, [dml_PROC_CPU_SYS_MODE_TIME] REAL, 
[dml_PROC_CPU_USER_MODE_UTIL] REAL, [dml_PROC_CPU_USER_MODE_TIME] REAL, 
[dml_PROC_THREAD_COUNT] INTEGER, [dml_PROC_CPU_ALIVE_TOTAL_UTIL] REAL , 
[dml_PROC_CPU_ALIVE_USER_MODE_UTIL] 
 REAL, [dml_PROC_CPU_ALIVE_SYS_MODE_UTIL] REAL, [dml_PROC_CPU_SYSCALL_UTIL] 
REAL, [dml_PROC_CPU_SYSCALL_TIME] REAL, [dml_PROC_CHILD_CPU_USER_MODE_UTIL] 
REAL, [dml_PROC_CHILD_CPU_SYS_MODE_UTIL] REAL, [dml_PROC_CHILD_CPU_TOTAL_UTIL] 
REAL, [dml_PROC_DISK_PHYS_READ] INTEGER, [dml_PROC_DISK_PHYS_READ_RATE] REAL, 
[dml_PROC_DISK_PHYS_WRITE] INTEGER, [dml_PROC_DISK_PHYS_WRITE_RATE] REAL, 
[dml_PROC_DISK_PHYS_IO_RATE] REAL, [dml_PROC_MEM_RES] REAL, 
[dml_PROC_MEM_SHARED_RES] REAL, [dml_PROC_MEM_VIRT] REAL, 
[dml_PROC_MEM_DATA_VIRT] REAL, [dml_PROC_MEM_STACK_VIRT] REAL, 
[dml_PROC_PAGEFAULT] INTEGER, [dml_PROC_PAGEFAULT_RATE] REAL, 
[dml_PROC_MINOR_FAULT] INTEGER, [dml_PROC_MAJOR_FAULT] INTEGER, 
[dml_PROC_MEM_LOCKED] REAL, [dml_PROC_DISK_SUBSYSTEM_WAIT_PCT] REAL, 
[dml_PROC_DISK_SUBSYSTEM_WAIT_TIME] REAL, [dml_PROC_PRI_WAIT_PCT] REAL, 
[dml_PROC_PRI_WAIT_TIME] REAL, PRIMARY KEY (InstanceId, CollectionTime)) 
WITHOUT ROWID ; 
.timer on
.mode csv
.import /home/navin/oa_nvn/process.csv hp_table1

[root@centosnavin sqlite-autoconf-3080801]# sync 
[root@centosnavin sqlite-autoconf-3080801]# ./sqlite3 hptest.db < 
sqlite3commands.txt 
[root@centosnavin sqlite-autoconf-3080801]# du -sh hptest.db
14G     hptest.db
[root@centosnavin sqlite-autoconf-3080801]# time ./sqlite3 hptest.db "select 
count(*) from hp_table1; "


115349845

real    26m56.435s
user    0m1.591s
sys     0m21.262s
 [root@centosnavin sqlite-autoconf-3080801]# echo "pragma page_size; " | 
./sqlite3
65536  [ tried with default page size also ]

 [root@centosnavin sqlite-autoconf-3080801]#
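For anyone who wants to reproduce the shape of this workflow on a small scale, here is a minimal sketch (the `demo` table, CSV contents, and /tmp paths are all illustrative, and it assumes a `sqlite3` binary on the PATH). Note that PRAGMA page_size only takes effect on a fresh database, so it has to run before the first table is created, and each dot-command must sit on its own line:

```shell
#!/bin/sh
# Minimal re-creation of the import-and-count workflow above.
rm -f /tmp/demo.db
printf '1,100,1.5\n2,200,2.5\n3,300,3.5\n' > /tmp/demo.csv

sqlite3 /tmp/demo.db <<'EOF'
PRAGMA page_size = 65536;   -- must precede the first CREATE TABLE
CREATE TABLE demo (InstanceId INTEGER, CollectionTime INTEGER, val REAL,
                   PRIMARY KEY (InstanceId, CollectionTime)) WITHOUT ROWID;
.timer on
.mode csv
.import /tmp/demo.csv demo
SELECT count(*) FROM demo;
EOF
```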



POSTGRES: [ Same machine, same FS disk , similar load, RAM 4G] 

DB size is around 34.2 GB.

bash-4.2$ time psql -c "select count(*) from dml_Scope__Process; "
   count   
-----------
 115349845
(1 row)


real    4m28.946s
user    0m0.001s
sys     0m0.004s
-bash-4.2$

The table is named differently, but the data is the same. The Postgres database is around 2.4 times the size of the SQLite one (34.2 GB vs 14 GB), but as you can see it is more than 6 times faster.

I also ran another experiment: I created this table and did a VACUUM, and then the select count(*) in sqlite3 took around 2 minutes.

When I create an index manually after the table is loaded (imported from the CSV), select count(*) in sqlite3 finishes within 30 to 40 seconds.
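That speed-up is consistent with SQLite answering a bare count(*) by scanning the smallest covering B-tree it can find: a narrow secondary index has far fewer pages than a wide WITHOUT ROWID table whose primary-key tree carries every column. A minimal sketch of the effect (the `wide` table and `idx_wide_a` index names are made up for illustration):

```shell
#!/bin/sh
rm -f /tmp/countidx.db
sqlite3 /tmp/countidx.db <<'EOF'
CREATE TABLE wide (a INTEGER, b TEXT, c REAL,
                   PRIMARY KEY (a)) WITHOUT ROWID;
INSERT INTO wide VALUES (1,'x',1.0),(2,'y',2.0),(3,'z',3.0);
-- A one-column index gives count(*) a much smaller tree to walk.
CREATE INDEX idx_wide_a ON wide(a);
-- The plan should show a covering-index scan rather than a full table scan.
EXPLAIN QUERY PLAN SELECT count(*) FROM wide;
SELECT count(*) FROM wide;
EOF
```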

I'm stuck here: how can I achieve better speeds without always creating an index after inserting the data?

[Repost end]


Quoting Simon:

> I also did another experiment. I created this table and did a vaccum and then 
> the select count(*) in sqlite3 was around 2 mins. 
> 
> When I create an index manually after the table is loaded (imported from 
> csv), select count(*) in sqlite3 was within 30 to 40 secs.

>In the second case, to calculate count(*) SQLite was able to use the index you
>had created.  Since this index was smaller than the table, SQLite was able to
>count the entries in it faster.  The result would have been the same,
>whenever the index had been created, if you had done:

>CREATE TABLE
>.import
>CREATE INDEX
>time the 'select count(*) from hp_table1' command here


>should yield pretty-much the same result as

>CREATE TABLE
>CREATE INDEX
>.import
>time the 'select count(*) from hp_table1' command here

Actually this didn't give the expected result: it also took more than 20 minutes, twice. I'll rerun it if you insist. That is the reason I uploaded the file and the schema to the FTP server.

Also, I saw that auto-indexes were present for the table (for the primary keys).


>If you are using a table for which rows are INSERTed but never DELETEd, then 
>you will get the same result almost instantly using

>       select max(rowid) from hp_table1

> instead of counting the rows.

We purge data once a week automatically, and the interval is configurable, so we can't always use the max(rowid) trick. Yes, it works as long as you never DELETE.
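For completeness, here is a small sketch of why deletes break the trick (illustrative table name; note also that hp_table1 was declared WITHOUT ROWID, so it has no rowid column to query in the first place and this only applies to ordinary rowid tables):

```shell
#!/bin/sh
rm -f /tmp/rowid.db
sqlite3 /tmp/rowid.db <<'EOF'
CREATE TABLE t(x INTEGER);        -- an ordinary rowid table
INSERT INTO t VALUES (1),(2),(3);
DELETE FROM t WHERE x = 2;
-- max(rowid) still reports 3, but only 2 rows remain.
SELECT max(rowid), count(*) FROM t;
EOF
```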

I'm doing all this on CentOS 7 x64. I built SQLite myself from the latest sqlite-autoconf-3080801 tarball.

Regards,
Navin
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
