Re: [sqlite] How long time to index this table
On Thu, Feb 28, 2008 at 11:48 AM, Lars Aronsson <[EMAIL PROTECTED]> wrote:

> Gilles Ganault wrote:
> > But then, how many people use SQLite to handle 68 million rows
> > in a table?

I've got 60 million in one table, and this is with an sqlite2 database. Works just fine, BTW.

Derrell

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] How long time to index this table
Gilles Ganault wrote:

> But then, how many people use SQLite to handle 68 million rows
> in a table?

That's a good question. I don't know. And I don't know if there is a recommended size for SQLite databases. But I was able to create the index in 12 minutes after I set the right cache_size.

The usage I foresee for SQLite is that structured data is provided for download and personal analysis, rather than as XML dumps or Excel spreadsheets. And today I can easily download a 3 gigabyte database file, which is only 600 megabytes compressed. Over on the unison mailing list (Unison is a tool for two-way, rsync-style file transfers) we're constantly amazed that people complain they are unable to transfer terabytes in a single transfer. Who could have imagined. Maybe Gordon Moore.

My 68 million rows come from the Swedish Wikipedia, which is my small experimentation base before I try this on the full-size German or English Wikipedia. But I might have to back down to the Estonian or Faroese Wikipedia to get an even smaller dataset.

--
Lars Aronsson ([EMAIL PROTECTED])
Aronsson Datateknik - http://aronsson.se
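The workflow Lars describes, bulk-loading first and enlarging the cache before building the index, can be sketched with Python's built-in sqlite3 module. This is a minimal sketch: the table and index names come from the thread, but the column names, the toy data, and the 100000-page cache figure are assumptions, not values Lars actually used.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Assumed layout: three integer columns plus a short text string,
# as described in the original mail.
conn.execute("CREATE TABLE counts (a INTEGER, b INTEGER, c INTEGER, name TEXT)")
conn.executemany("INSERT INTO counts VALUES (?, ?, ?, ?)",
                 [(i, 0, 2 * i, "page_%d" % i) for i in range(1000)])

# Raise the page cache before the expensive step; the default is only
# 2000 pages, i.e. a few megabytes, regardless of how much RAM you have.
conn.execute("PRAGMA cache_size = 100000")
conn.execute("CREATE INDEX counts_2 ON counts(name)")
```

The point of the ordering is that CREATE INDEX on a populated table does one big sort, and that sort is what benefits from a cache sized to the machine rather than to the 2000-page default.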
Re: [sqlite] How long time to index this table
On Mon, 25 Feb 2008 13:34:04 +0100 (CET), Lars Aronsson <[EMAIL PROTECTED]> wrote:

> You might say that it's my fault if I specify an insufficient
> cache_size, but I think that (1) I shouldn't really need to
> specify this in a "zero configuration" database engine, and (2)
> the behaviour is exceptional.

But then, how many people use SQLite to handle 68 million rows in a table?
Re: [sqlite] How long time to index this table
I asked a very similar question a few weeks ago and got a very precise answer. You should search for that.

Kasper

> Lars Aronsson wrote:
>> I'm using SQLite 3.4.2, which is the standard package in
>> Ubuntu Linux 7.10. I'm running this on an otherwise idle
>> Intel Core 2 Duo CPU with 4 GB of RAM. The operation has been
>> running for several hours. Performance graphs indicate
>> lots of iowait, with many thousand blocks being written to
>> disk each second. But very little CPU power is being used.
>> Is that reasonable?
>
> It sounds to me like your operating system is doing a really bad job of
> caching. What is the output of "cat /proc/meminfo", "cat
> /proc/sys/vm/swappiness", and "cat /proc/sys/vm/dirty_ratio"?
>
> How much free memory do you have when this is happening? What is your
> vm.swappiness? If nothing else is running then the operating system
> should be able to cache the files in the buffer cache.
>
>> Do I need to use PRAGMA cache_size?
>
> I would try setting the cache size to ~3 GB (PRAGMA cache_size=X, where
> X = 3,000,000,000 / the page size). The default cache_size is only
> 2000 pages, which is thousands of times smaller than what you have
> available. Or, try setting it to whatever virtual memory you have
> (4 GB + swap size - 1 GB) and let the operating system handle all the
> disk I/O via swapping. Just make sure that other processes aren't
> actively using memory.
>
> If that doesn't help, then try decreasing the cache size and
> increasing your /proc/sys/vm/dirty_ratio and
> /proc/sys/vm/dirty_background_ratio to 95 each.
>
> - Brian
Re: [sqlite] How long time to index this table
Lars Aronsson wrote:

> I'm using SQLite 3.4.2, which is the standard package in
> Ubuntu Linux 7.10. I'm running this on an otherwise idle
> Intel Core 2 Duo CPU with 4 GB of RAM. The operation has been
> running for several hours. Performance graphs indicate
> lots of iowait, with many thousand blocks being written to
> disk each second. But very little CPU power is being used.
> Is that reasonable?

It sounds to me like your operating system is doing a really bad job of caching. What is the output of "cat /proc/meminfo", "cat /proc/sys/vm/swappiness", and "cat /proc/sys/vm/dirty_ratio"?

How much free memory do you have when this is happening? What is your vm.swappiness? If nothing else is running then the operating system should be able to cache the files in the buffer cache.

> Do I need to use PRAGMA cache_size?

I would try setting the cache size to ~3 GB (PRAGMA cache_size=X, where X = 3,000,000,000 / the page size). The default cache_size is only 2000 pages, which is thousands of times smaller than what you have available. Or, try setting it to whatever virtual memory you have (4 GB + swap size - 1 GB) and let the operating system handle all the disk I/O via swapping. Just make sure that other processes aren't actively using memory.

If that doesn't help, then try decreasing the cache size and increasing your /proc/sys/vm/dirty_ratio and /proc/sys/vm/dirty_background_ratio to 95 each.

- Brian
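Brian's arithmetic hinges on cache_size being counted in pages, not bytes, so the pragma value is the byte budget divided by the database's page size. A short sketch of that calculation (Python's built-in sqlite3 module; the ~3 GB target is the figure from the mail, everything else is generic):

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# cache_size is measured in pages, so divide the byte budget by the
# page size (1024 bytes was the default in SQLite 3.4.x; newer builds
# default to 4096, which is why we read it rather than hard-code it).
page_size = conn.execute("PRAGMA page_size").fetchone()[0]
target_bytes = 3_000_000_000          # the ~3 GB Brian suggests
pages = target_bytes // page_size

conn.execute("PRAGMA cache_size = %d" % pages)
```

Reading PRAGMA cache_size back afterwards confirms the setting took effect; it applies per connection and resets when the connection closes.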
Re: [sqlite] How long time to index this table
I think you have hit a situation where the sort-and-index-build algorithm in SQLite becomes the bottleneck and gets limited by disk head seeking. Others know more about this situation, but I don't think there is much you can do to improve the speed.

Lars Aronsson wrote:

> I have a database table with 68 million rows and 4 columns, of
> which 3 are integers and one is a short text string. I'm
> now trying to create an index on one of the integer columns, that
> I forgot to create before I populated the table. But it takes
> forever and doesn't seem to stop.
>
> I estimate that each row takes 20 bytes, so the 68 million rows
> should be a little over a gigabyte. Together with some other
> data, the database file is 3.1 gigabytes. You can download a
> compressed version (638 MB using bzip2) from
>
> http://mirabell.runeberg.lysator.liu.se/sv-counts-20080219.db.bz2
>
> The schema (767 bytes) is available at
>
> http://mirabell.runeberg.lysator.liu.se/sv-counts-20080219.schema
>
> The operation I'm trying is
>
> create index counts_2 on counts(name);
>
> I'm using SQLite 3.4.2, which is the standard package in Ubuntu
> Linux 7.10. I'm running this on an otherwise idle Intel Core 2
> Duo CPU with 4 GB of RAM. The operation has been running for
> several hours. Performance graphs indicate lots of iowait, with
> many thousand blocks being written to disk each second. But very
> little CPU power is being used. Is that reasonable?
>
> Do I need to use PRAGMA cache_size?
>
> Is there a good way to find out what sqlite3 is doing? I know the
> "explain" command lists the byte code, but can it predict how long
> an operation will take?
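On the last question: EXPLAIN only dumps the VDBE bytecode and carries no cost or time estimate, so it cannot predict how long the CREATE INDEX will run. A sketch of what it does show (Python's sqlite3 module; the column names are assumed, since only the indexed column appears in the mail):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE counts (a INTEGER, b INTEGER, c INTEGER, name TEXT)")

# EXPLAIN returns one row per VDBE opcode: (addr, opcode, p1, p2, p3,
# p4, p5, comment). It lists what the engine will execute -- a full
# table scan feeding the index build -- but says nothing about duration.
# Note that EXPLAIN only describes the statement; the index is not created.
opcodes = [row[1] for row in
           conn.execute("EXPLAIN CREATE INDEX counts_2 ON counts(name)")]
```

For progress monitoring the mail's own approach (watching iowait and block writes) is about as good as it gets with this SQLite version.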