Re: [sqlite] How long time to index this table

2008-02-28 Thread Derrell Lipman
On Thu, Feb 28, 2008 at 11:48 AM, Lars Aronsson <[EMAIL PROTECTED]> wrote:
> Gilles Ganault wrote:
>
>  > But then, how many people use SQLite to handle 68 million rows
>  > in a table?

I've got 60 million rows in one table, and this is with an SQLite 2
database.  Works just fine, BTW.

Derrell
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] How long time to index this table

2008-02-28 Thread Lars Aronsson
Gilles Ganault wrote:

> But then, how many people use SQLite to handle 68 million rows 
> in a table?

That's a good question.  I don't know.  And I don't know if there 
is a recommended size for SQLite databases.  But I was able to 
create the index in 12 minutes after I set the right cache_size. 
The usage I foresee for SQLite is that structured data is provided 
for download and personal analysis, rather than as XML dumps or Excel 
spreadsheets.  And today I can easily download a 3 gigabyte 
database file, which is only 600 megabytes compressed.
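(For anyone reproducing this: the cache setting can be inspected and raised at
runtime.  A minimal sketch using Python's built-in sqlite3 module; the 1 GB
budget below is an arbitrary example, not the value I actually used.)

```python
import sqlite3

# In-memory database for illustration; the same PRAGMAs apply to a file.
conn = sqlite3.connect(":memory:")

# cache_size is counted in pages; page_size is in bytes.
page_size = conn.execute("PRAGMA page_size").fetchone()[0]

# Raise the cache to roughly 1 GB worth of pages (arbitrary example budget).
target_bytes = 1024 ** 3
conn.execute(f"PRAGMA cache_size = {target_bytes // page_size}")
print(conn.execute("PRAGMA cache_size").fetchone()[0])
```

Note that cache_size is per connection and is not persisted in the database
file, so it has to be set again on every new connection.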

Over on the Unison mailing list (Unison is a tool for two-way file 
synchronization, similar to rsync) we're constantly amazed that people 
complain they are unable to transfer terabytes in a single 
transfer.  Who could have imagined that?  Maybe Gordon Moore.

My 68 million rows come from the Swedish Wikipedia, which is my 
small experimentation base before I try this on the full size 
German or English Wikipedia.  But I might have to back down to the 
Estonian or Faroese Wikipedia to get an even smaller dataset.


-- 
  Lars Aronsson ([EMAIL PROTECTED])
  Aronsson Datateknik - http://aronsson.se


Re: [sqlite] How long time to index this table

2008-02-25 Thread Gilles Ganault
On Mon, 25 Feb 2008 13:34:04 +0100 (CET), Lars Aronsson
<[EMAIL PROTECTED]> wrote:
>You might say that it's my fault if I specify an insufficient 
>cache_size, but I think that (1) I shouldn't really need to 
>specify this in a "zero configuration" database engine, and (2) 
>the behaviour is exceptional.

But then, how many people use SQLite to handle 68 million rows in a
table?



Re: [sqlite] How long time to index this table

2008-02-24 Thread khansen
I asked a very similar question a few weeks ago and got a very precise
answer.  You should search the list archives for it.

Kasper

> Lars Aronsson wrote:
>> I'm using SQLite 3.4.2, which is the standard package in
>> Ubuntu Linux 7.10.  I'm running this on an otherwise idle
>> Intel Core 2 Duo CPU with 4 GB of RAM.  The operation has taken
>> several hours and is still running.  Performance graphs indicate
>> lots of iowait, with many thousand blocks being written to
>> disk each second.  But very little CPU power is being used.
>> Is that reasonable?
>
> It sounds to me like your operating system is doing a really bad job of
> caching. What is the output of "cat /proc/meminfo", "cat
> /proc/sys/vm/swappiness", and "cat /proc/sys/vm/dirty_ratio"?
>
> How much free memory do you have when this is happening? What is your
> vm.swappiness setting? If nothing else is running, then the operating
> system should be able to cache the files in the buffer cache.
>>
>> Do I need to use PRAGMA cache_size?
>
> I would try setting the cache size to ~3 GB (PRAGMA cache_size=X, where
> X = 3 GB divided by the page size). The default cache_size is only 2000
> pages, which is thousands of times smaller than the memory you have
> available. Or, try setting it to roughly your total virtual memory
> (4 GB + swap size - 1 GB) and let the operating system handle all the
> disk I/O via swapping. Just make sure that other processes aren't
> actively using memory.
>
> If that doesn't help, then try decreasing the cache size and
> increasing your /proc/sys/vm/dirty_ratio and
> /proc/sys/vm/dirty_background_ratio to 95 each.
>
> - Brian
>



Re: [sqlite] How long time to index this table

2008-02-24 Thread Brian Smith
Lars Aronsson wrote:
> I'm using SQLite 3.4.2, which is the standard package in 
> Ubuntu Linux 7.10.  I'm running this on an otherwise idle 
> Intel Core 2 Duo CPU with 4 GB of RAM.  The operation has taken 
> several hours and is still running.  Performance graphs indicate 
> lots of iowait, with many thousand blocks being written to 
> disk each second.  But very little CPU power is being used.  
> Is that reasonable?

It sounds to me like your operating system is doing a really bad job of
caching. What is the output of "cat /proc/meminfo", "cat
/proc/sys/vm/swappiness", and "cat /proc/sys/vm/dirty_ratio"?

How much free memory do you have when this is happening? What is your
vm.swappiness setting? If nothing else is running, then the operating
system should be able to cache the files in the buffer cache.
> 
> Do I need to use PRAGMA cache_size?

I would try setting the cache size to ~3 GB (PRAGMA cache_size=X, where
X = 3 GB divided by the page size). The default cache_size is only 2000
pages, which is thousands of times smaller than the memory you have
available. Or, try setting it to roughly your total virtual memory
(4 GB + swap size - 1 GB) and let the operating system handle all the
disk I/O via swapping. Just make sure that other processes aren't
actively using memory.
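The arithmetic above can be sketched as follows; a hedged example assuming
the 1024-byte default page size of that era (check PRAGMA page_size on the
actual database before using these numbers):

```python
# cache_size is counted in pages, so a byte budget is divided by the page size.
def cache_pages(budget_bytes: int, page_size: int = 1024) -> int:
    return budget_bytes // page_size

three_gb_pages = cache_pages(3 * 1024 ** 3)  # pages for a ~3 GiB budget
default_bytes = 2000 * 1024                  # the old 2000-page default, in bytes
print(three_gb_pages, default_bytes)
```

With these assumptions the default cache is about 2 MiB, roughly 1500 times
smaller than the 3 GiB budget suggested above.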

If that doesn't help, then try decreasing the cache size and
increasing your /proc/sys/vm/dirty_ratio and
/proc/sys/vm/dirty_background_ratio to 95 each. 

- Brian



Re: [sqlite] How long time to index this table

2008-02-24 Thread John Stanton
I think you have hit a situation where SQLite's sort-and-index-build 
algorithm becomes the bottleneck, limited by disk head seeking.
Others know more about this than I do, but I don't think there is much 
you can do to improve the speed.

Lars Aronsson wrote:
> I have a database table with 68 million rows and 4 columns, of 
> which 3 are integers and one is a short text string.  I'm 
> now trying to create an index on one of the integer columns that 
> I forgot to create before I populated the table.  But it takes 
> forever and doesn't seem to finish.
> 
> I estimate that each row takes 20 bytes, so the 68 million rows 
> should be a little over a gigabyte.  Together with some other 
> data, the database file is 3.1 gigabytes.  You can download a 
> compressed version (638 MB using bzip2) from
> 
> http://mirabell.runeberg.lysator.liu.se/sv-counts-20080219.db.bz2
> 
> The schema (767 bytes) is available at
> 
> http://mirabell.runeberg.lysator.liu.se/sv-counts-20080219.schema
> 
> The operation I'm trying is
> 
>   create index counts_2 on counts(name);
> 
> I'm using SQLite 3.4.2, which is the standard package in Ubuntu 
> Linux 7.10.  I'm running this on an otherwise idle Intel Core 2 
> Duo CPU with 4 GB of RAM.  The operation has taken several hours and 
> is still running.  Performance graphs indicate lots of iowait, with 
> many thousand blocks being written to disk each second.  But very 
> little CPU power is being used.  Is that reasonable?
> 
> Do I need to use PRAGMA cache_size?
> 
> Is there a good way to find out what sqlite3 is doing?  I know the 
> "explain" command lists the byte code, but can it predict how long 
> an operation will take?
> 
> 
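The scenario above can be reproduced in miniature; a hypothetical sketch
using Python's sqlite3 module.  The column names are invented stand-ins
(only the linked schema file has the real ones); the sketch just follows
the description of 3 integer columns plus one short text column.

```python
import sqlite3

# Toy stand-in for the 3.1 GB sv-counts database (column names invented).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE counts (ns INTEGER, name TEXT, week INTEGER, hits INTEGER)"
)
conn.executemany(
    "INSERT INTO counts VALUES (?, ?, ?, ?)",
    [(0, f"Page_{i % 1000}", i % 52, i) for i in range(10_000)],
)

# Enlarge the page cache before building the index, as suggested upthread.
conn.execute("PRAGMA cache_size = 100000")

# The operation from the original post:
conn.execute("CREATE INDEX counts_2 ON counts(name)")

# EXPLAIN QUERY PLAN (unlike plain EXPLAIN, which dumps byte code) shows
# whether the planner uses the new index for lookups on name.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM counts WHERE name = ?", ("Page_1",)
).fetchall()
print(plan)
```

Neither EXPLAIN variant predicts how long an operation will take; they only
describe what SQLite will do.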
