On 1/20/15, Parakkal, Navin S (Software Engineer) <navin.parak...@hp.com> wrote:
> Hello,
>
>    I've few questions about sqlite3 , the database it creates. Actually I'm
> finding lot of differences in performance.
>
>  My story:
>     I have this sqlite3 database called  hp.db which is like 100+ million
> records for table1. The size of hp.db on Linux x64 (CentOS 7) is like 16
> GB.
>     When I do a select count(*) on hp_table1 it takes more than 5 mins which
> is quite a huge time.
>     This file is static for now ie we copied it from a production server for
> analysis.
>     Now I create a index by create index nvnhpindex on hp_table1
> (column1,column2) . The primary key of the table hp_table1 is
> (column1,column2).

Are you saying that you have a table like this:

    CREATE TABLE hp_table1(
         column1 TEXT,
         column2 TEXT,
         other_columns MISC,
         PRIMARY KEY(column1,column2)
    );

And then you are doing:

    CREATE INDEX idx1 ON hp_table1(column1,column2);

Don't do that!!!  The index is redundant.  You already have a primary
key on those two columns.  The primary key is sufficient.  The index
just makes your database bigger and slower.

For a multi-column primary key, your best performance will (probably)
result if you say:

    CREATE TABLE hp_table1(
         column1 TEXT,
         column2 TEXT,
         other_columns MISC,
         PRIMARY KEY(column1,column2)
    ) WITHOUT ROWID;

Note the use of WITHOUT ROWID at the end of the table declaration.
Note also the absence of any indexes.  This will likely make a big
size and performance difference for you.

I say "probably" because there are cases where it might be better to
omit the WITHOUT ROWID - specifically if "other_columns" contain very
large strings and/or blobs - larger than about 1/5th of your page
size.  You can run experiments using and omitting WITHOUT ROWID to see
which form works best on your system.


>     It takes around some time (maybe 20 minutes or less , I went for lunch
> and came back , really didn't note the time).
>
> Now I do select count(*) on hp_table1 , it takes around 15 secs. This is
> what we want our objective. We want it to be fast. The create index has
> increased the size of hp.db to 18 GB. This is OK with us and the customers.
>
> The problem is this is not a static database. We keep inserting data (insert
> rows every 10 secs or like atleast 1 minute ) and occassionally delete rows
> (like once in 5 days). This is a 24x7 system.
>
>
> So to identify the problem , I created a empty_database similar to hp.db
> with no rows.
> I created a index on column1,column2 on empty_table1 inside empty_database.
>
> Now I inserted the rows from csv (this CSV was created by .mode csv, output
> myhp.csv, select * from hp_table1).
>
> The size of database is around 18GB (empty_database) with rows. Now I do a
> select count(*) on empty_table1 (actually it contains lots of rows like
> 100M+ records ) and it takes more than 5 mins. 5 mins is too much of a time
> for us to bear. The customer wants the information within a minute.
>
>
> Can you please help in resolving this problem ? We are planning to deploy
> this across 10000+ nodes on Linux x64 on one customer and many other
> customers are going in the similar direction.
>
> How do we go about resolving this ie what should we do to create a table
> with sub minute access for 100-500 million . How do we create the indexes ?
> Any other performance incentives.
>
> Some say we should buy/use Oracle but I just am holding onto sqlite3
> assuming it would help me solve our problem.
>
>
> Regards,
> Navin
>
>


-- 
D. Richard Hipp
d...@sqlite.org
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Reply via email to