Re: [sqlite] Optimizing an insert/update

2008-02-29 Thread Lars Aronsson
Michael Miller wrote:

> "If the string doesn't exist in the table, create a new row with 
> the string in the first column and 1 in the second column. If 
> the string does exist in the table, increment the second column 
> by 1"

One way you can do it is to insert everything into a temporary 
table which doesn't require the string to be unique, and then do a 
select string, sum(count) from mytemp group by string;


create table mystrings (string text primary key, count integer);
create table mytemp (string text, count integer);
.import mydata.txt mytemp
insert into mystrings (string, count)
  select string, sum(count) from mytemp group by string;


As a complement to "insert or replace", it would be nice to 
have an "insert or accumulate" for this kind of job.  However, 
that would go outside the SQL standard.
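For illustration, here is a minimal sketch of doing the "accumulate" from application code, using Python's standard sqlite3 module. The table matches the mystrings example above, but the accumulate helper is my own invention for this sketch, not an SQLite feature: it tries an UPDATE first and falls back to an INSERT when no row matched. (SQLite versions 3.24 and later, long after this thread, added insert ... on conflict do update, which does the same in a single statement.)

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mystrings (string TEXT PRIMARY KEY, count INTEGER)")

def accumulate(conn, s):
    # Try to bump an existing row first; rowcount tells us if one matched.
    cur = conn.execute(
        "UPDATE mystrings SET count = count + 1 WHERE string = ?", (s,))
    if cur.rowcount == 0:
        # No existing row: start the counter at 1.
        conn.execute(
            "INSERT INTO mystrings (string, count) VALUES (?, 1)", (s,))

for word in ["apple", "pear", "apple", "apple"]:
    accumulate(conn, word)

print(conn.execute(
    "SELECT string, count FROM mystrings ORDER BY string").fetchall())
# [('apple', 3), ('pear', 1)]
```

Note this is two statements per string, so the temp-table approach above is still the better bulk-load strategy; this shape fits incremental updates.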


-- 
  Lars Aronsson ([EMAIL PROTECTED])
  Aronsson Datateknik - http://aronsson.se
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] How long time to index this table

2008-02-28 Thread Lars Aronsson
Gilles Ganault wrote:

> But then, how many people use SQLite to handle 68 million rows 
> in a table?

That's a good question.  I don't know.  And I don't know if there 
is a recommended size for SQLite databases.  But I was able to 
create the index in 12 minutes after I set the right cache_size. 
The usage I foresee for SQLite is that structured data is provided 
for download and personal analysis, rather than as XML dumps or 
Excel spreadsheets.  And today I can easily download a 3 gigabyte 
database file, which is only 600 megabytes compressed.

Over on the unison mailing list (Unison is a tool for two-way 
file synchronization, a bit like a bidirectional rsync) we're 
constantly amazed that people complain they are unable to 
transfer terabytes in a single transfer.  Who could have imagined 
that?  Maybe Gordon Moore.

My 68 million rows come from the Swedish Wikipedia, which is my 
small experimentation base before I try this on the full-size 
German or English Wikipedia.  But I might have to back down to the 
Estonian or Faroese Wikipedia to get an even smaller dataset.


-- 
  Lars Aronsson ([EMAIL PROTECTED])
  Aronsson Datateknik - http://aronsson.se


[sqlite] How long time to index this table

2008-02-24 Thread Lars Aronsson

I have a database table with 68 million rows and 4 columns, of 
which 3 are integers and one is a short text string.  I'm now 
trying to create an index on one of the integer columns, an index 
I forgot to create before I populated the table.  But it takes 
forever and doesn't seem to finish.

I estimate that each row takes 20 bytes, so the 68 million rows 
should be a little over a gigabyte.  Together with some other 
data, the database file is 3.1 gigabytes.  You can download a 
compressed version (638 MB using bzip2) from

http://mirabell.runeberg.lysator.liu.se/sv-counts-20080219.db.bz2
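The back-of-the-envelope estimate above works out as follows (a trivial check, but it confirms "a little over a gigabyte"):

```python
rows = 68_000_000
bytes_per_row = 20  # rough estimate from the paragraph above
total_gb = rows * bytes_per_row / 1e9
print(total_gb)  # 1.36
```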

The schema (767 bytes) is available at

http://mirabell.runeberg.lysator.liu.se/sv-counts-20080219.schema

The operation I'm trying is

  create index counts_2 on counts(name);

I'm using SQLite 3.4.2, which is the standard package in Ubuntu 
Linux 7.10.  I'm running this on an otherwise idle Intel Core 2 
Duo CPU with 4 GB of RAM.  The operation has taken several hours 
and is still running.  Performance graphs indicate lots of iowait, 
with many thousands of blocks being written to disk each second.  
But very little CPU power is being used.  Is that reasonable?

Do I need to use PRAGMA cache_size?
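For reference, a minimal sketch of setting that pragma before the index build, from Python's standard sqlite3 module. The counts columns here are made-up stand-ins (the real schema is at the link above), and 200000 pages is an arbitrary example value; the key point is that the PRAGMA is per-connection and must be issued on the same connection, before the create index:

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # a file path in real use
# Raise the page cache well above the old 2000-page default, so the
# b-tree construction for the index does less disk I/O.
conn.execute("PRAGMA cache_size = 200000")

# Hypothetical stand-in for the real counts table (3 integers, 1 string):
conn.execute("CREATE TABLE counts (a INTEGER, b INTEGER, c INTEGER, name TEXT)")
conn.executemany("INSERT INTO counts VALUES (?, ?, ?, ?)",
                 [(i, 2 * i, i % 7, "page%d" % i) for i in range(1000)])

conn.execute("CREATE INDEX counts_2 ON counts(name)")
print(conn.execute("PRAGMA cache_size").fetchone()[0])  # 200000
```

Since the setting does not persist in the database file, it has to be repeated each time the database is opened.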

Is there a good way to find out what sqlite3 is doing?  I know the 
"explain" command lists the byte code, but can it predict how long 
an operation will take?


-- 
  Lars Aronsson ([EMAIL PROTECTED])
  Aronsson Datateknik - http://aronsson.se


[sqlite] Innovative examples / user stories

2008-02-11 Thread Lars Aronsson

Is there any documentation of how people use SQLite in odd ways in 
their everyday activities?  For example, do you e-mail SQLite DB 
files among yourselves, as if they were Excel spreadsheets?  Or do 
you distribute SQLite database files via BitTorrent?  Even with 
multi-table databases?  That would be a kind of database use that 
was unheard of in the Oracle/DB2 era, but it certainly should be 
possible with SQLite.  SQLite database files could be used in a 
"SETI@home" kind of application, where a screensaver downloads 
an existing DB file, processes it for some hours, and then uploads 
the resulting DB file again.  But are any such applications or 
user stories known?  Is there a list already?

I've read the Wikipedia article and its list of Google Gears, 
Android, Mac OS X Tiger, etc.  I've also seen the "well-known 
users of SQLite" page on sqlite.org/famous.html and the "When to 
use SQLite" page.

I've used Oracle since version 7 and MySQL since 3.23 (which isn't 
very long at all, but anyway).  The switch from Oracle to MySQL 
was made possible for many people because their database was 
hidden behind a web application anyway (look, no DBA!), so full 
transaction handling wasn't really needed.  That shift in usage 
pattern opened the way to a simpler and more affordable solution.  
Many other such technology shifts are described in an old book 
titled "The Innovator's Dilemma".


-- 
  Lars Aronsson ([EMAIL PROTECTED])
  Aronsson Datateknik - http://aronsson.se