Re: [sqlite] Speeding up index creation?
On Fri, May 23, 2008 at 4:20 PM, Stefan Arentz <[EMAIL PROTECTED]> wrote:
> I have an interesting problem. I need to generate a large table
> periodically. The table contains a unique SHA1 hash code and 6 integer
> values and has about 6 million rows. Generating this table is fast. I
> can set it up in less than 90 seconds on a slow iMac (2.16 GHz and
> slow disk). The thing that takes a (relatively) long time is the index
> creation on the unique hash code .. 720 seconds.
>
> (I'm importing within transactions with chunks of 25,000 records)
>
> The question is, is there any way to speed up the indexing process?
>
> Couple of things on my mind:
>
> Doing this all on an in-memory database takes about 150 seconds in
> total. Is it possible to build a database in memory and then dump it
> to disk?
>
> Would it make sense to sort the records that I import? Could that
> result in a quicker index operation? This certainly helps for Berkeley
> DB. But it is lower level and you can use its internal hashing
> functions on your data to pre-sort it. Maybe SQLite has something
> similar?

It does make sense to presort records before inserting into the on-disk
db. You should insert into a memory db first and then insert sorted
records into the disk db from there. You can batch this into chunks of
25K rows as you're doing now, but if you have enough memory it will
probably be better to increase the number of rows per chunk. You should
also experiment with larger cache_size and page_size values.

Pseudo code:

    OPEN :MEMORY:
    ATTACH DISK_DB
    DO
        BEGIN;
        INSERT INTO MEMORY_TABLE VALUES(...);  -- x 25,000
        INSERT INTO DISK_TABLE
            SELECT * FROM MEMORY_TABLE ORDER BY sha1_hash;
        COMMIT;
    WHILE has_more_rows

Search this list's archives for "locality of reference" for more helpful
tips on the subject of building large indexes in SQLite.

> Are there any other tunable options that can improve indexing speed?
>
> S.
--
Nemanja Čorlija <[EMAIL PROTECTED]>
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
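The pseudo code above can be turned into a runnable sketch with Python's sqlite3 module. The table layout, column names, and the `DELETE` that resets the staging table between chunks are illustrative assumptions, not from the original post:

```python
import sqlite3

def load_sorted(disk_path, rows, chunk_size=25000):
    """Stage rows in a :memory: db, then flush each chunk to the
    attached disk db ordered by the to-be-indexed column."""
    # isolation_level=None gives autocommit, so BEGIN/COMMIT are explicit.
    mem = sqlite3.connect(":memory:", isolation_level=None)
    mem.execute("ATTACH DATABASE ? AS disk", (disk_path,))
    mem.execute("CREATE TABLE staging (sha1_hash TEXT, v1 INTEGER)")
    mem.execute("CREATE TABLE IF NOT EXISTS disk.target "
                "(sha1_hash TEXT, v1 INTEGER)")
    it = iter(rows)
    while True:
        chunk = [row for _, row in zip(range(chunk_size), it)]
        if not chunk:
            break
        mem.execute("BEGIN")
        mem.executemany("INSERT INTO staging VALUES (?, ?)", chunk)
        # Writing in index order improves locality when the index is built.
        mem.execute("INSERT INTO disk.target "
                    "SELECT * FROM staging ORDER BY sha1_hash")
        mem.execute("DELETE FROM staging")
        mem.execute("COMMIT")
    mem.close()
```

Note this only presorts within each chunk, as the pseudo code does; a single large chunk sorts the whole data set at the cost of holding it all in memory.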
Re: [sqlite] Speeding up index creation?
On Fri, May 23, 2008 at 10:20:45AM -0400, Stefan Arentz scratched on the wall:
> I have an interesting problem. I need to generate a large table
> periodically. The table contains a unique SHA1 hash code and 6 integer
> values and has about 6 million rows. Generating this table is fast. I
> can set it up in less than 90 seconds on a slow iMac (2.16 GHz and
> slow disk). The thing that takes a (relatively) long time is the index
> creation on the unique hash code .. 720 seconds.

> Doing this all on an in-memory database takes about 150 seconds in
> total. Is it possible to build a database in memory and then dump it
> to disk?

Not at this time.

> Are there any other tunable options that can improve indexing speed?

The biggest thing you can do to improve indexing performance is to
increase the size of the page cache. Assuming you're using the default
1K page size, each page takes up about 1.5K of RAM in the cache. The
default cache size is 2000 pages (3MB), but you should crank this up as
high as you can while having a reasonable chance of still keeping the
whole thing in physical RAM... say 75% of your machine's total RAM, if
you've got a gig or two. Just be aware that the value is the number of
pages, not the number of KB.

The other thing that helps a little is to turn synchronous off while
you're creating the index. This is normally a dangerous thing to do,
but if you're just dumping data into a database, chances are you can
restart from scratch if things go wrong. You may also see some gain
from setting temp_store to memory. The biggest single win is the page
cache, however.

http://www.sqlite.org/pragma.html

    PRAGMA cache_size
    PRAGMA synchronous
    PRAGMA temp_store

-j
--
Jay A. Kreibich < J A Y @ K R E I B I.C H >
"'People who live in bamboo houses should not throw pandas.' Jesus said that."
- "The Ninja", www.AskANinja.com, "Special Delivery 10: Pop!Tech 2006"
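The PRAGMA recipe above can be sketched in Python; the table name, column name, and cache value below are placeholders, and `synchronous = OFF` is only acceptable here because the whole load could be redone from scratch if it fails:

```python
import sqlite3

def build_index_fast(db_path):
    con = sqlite3.connect(db_path)
    # The value is a page count, not KB: at ~1.5K RAM per default 1K page,
    # 200000 pages is roughly 300MB of cache.
    con.execute("PRAGMA cache_size = 200000")
    con.execute("PRAGMA synchronous = OFF")   # dangerous outside bulk loads
    con.execute("PRAGMA temp_store = MEMORY")
    con.execute("CREATE UNIQUE INDEX IF NOT EXISTS hashes_sha1_ix "
                "ON hashes(sha1)")
    con.commit()
    con.close()
```

The pragmas only affect this connection, so normal durability settings return the next time the database is opened.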
Re: [sqlite] schema optimization question
On Fri, May 23, 2008 at 3:01 PM, Jeff Gibson <[EMAIL PROTECTED]> wrote:
> I'm sorry if this is an obvious question - I'm new to databases. I have
> an application where the database is used to log a large number of
> simulation events. The database is written once and read many times
> (i.e., there are never any inserts or updates after database creation).
> The three most interesting tables I have are:
>
> CREATE TABLE events (eid INTEGER PRIMARY KEY, time INTEGER, aid INTEGER,
> subtype INTEGER);
>
> CREATE TABLE actions (aid INTEGER PRIMARY KEY, type INTEGER, seqnum
> INTEGER, tid INTEGER, instid INTEGER);
>
> CREATE TABLE subtypes (type INTEGER, subtype INTEGER, name TEXT, verbose
> INTEGER, PRIMARY KEY(type,subtype) );
>
> The column names are such that columns in different tables with the same
> name act as foreign keys. The largest (and most often queried) table is
> events, and it can have many millions of entries. The actions table is
> also large (about a fifth as big as events) and subtypes is very small
> (dozens of entries). My application involves querying events many
> times, but very common queries include events that match a particular
> verbose value and/or a particular type value. This leads to queries
> that have one or two joins, and such queries are substantially slower
> than a query on just the events table.
>
> The question is, what can I do to speed up those queries? The
> obvious answer would be to put type and verbose as columns in the events
> table, but they would be redundant. Is that par for the course, or is
> there some best practice I'm overlooking?
>
> Thanks,
> Jeff

Well, I'd like to note that in general, reporting databases are
denormalized in order to improve performance. However, you shouldn't
need to do that for a case this simple. Let's start by creating your
database and having fun with EXPLAIN QUERY PLAN.
I created the tables as you provided and did this:

    create view eventdetail as
    select e.eid as eid, e.time as time,
           a.aid as aid, a.seqnum as seqnum, a.tid as tid,
           a.instid as instid,
           s.type as type, s.subtype as subtype, s.name as name,
           s.verbose as verbose
    from events e
    join actions a on a.aid = e.aid
    join subtypes s on s.subtype = e.subtype;

This makes life a lot easier. Besides, SQLite is extremely well-written
and will handle this stuff beautifully. (I noticed that you have a
'type' on both the 'actions' table and the 'subtypes' table. I assume
that they are the same thing and used the version from subtypes.)

    sqlite> explain query plan select * from eventdetail where type=123;
    order  from  detail
    -----  ----  ------
    0      0     TABLE events AS e
    1      1     TABLE actions AS a USING PRIMARY KEY
    2      2     TABLE subtypes AS s WITH INDEX sqlite_autoindex_subtypes_1

You can see here that we are table-scanning 'events'. This is bad. The
solution is to add an index so events can be searched by subtype:

    create index events_subtype_ix on events(subtype);

    sqlite> explain query plan select * from eventdetail where type=123;
    order  from  detail
    -----  ----  ------
    0      2     TABLE subtypes AS s WITH INDEX sqlite_autoindex_subtypes_1
    1      0     TABLE events AS e WITH INDEX events_subtype_ix
    2      1     TABLE actions AS a USING PRIMARY KEY

As you can see, SQLite now figures out which subtypes have type=123,
looks up that subset of the 'events' table using the newly created
index, then joins to 'actions' based on the 'aid' column. If there are
very many different top-level types, the first step will only find a
few subtypes (the term is 'high selectivity').

In general, EXPLAIN QUERY PLAN will give you a good idea of what SQLite
is doing to perform the requested actions. Index tweaking and ANALYZE
(http://www.sqlite.org/lang_analyze.html) will let you cut down the
amount of data SQLite has to consider when returning a result set.
--
Stevie-O
Real programmers use COPY CON PROGRAM.EXE
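The same experiment can be scripted end to end. The schema and view are taken from the thread; note that the exact EXPLAIN QUERY PLAN text varies between SQLite versions, so the plan should be inspected loosely rather than matched verbatim:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE events (eid INTEGER PRIMARY KEY, time INTEGER,
                     aid INTEGER, subtype INTEGER);
CREATE TABLE actions (aid INTEGER PRIMARY KEY, type INTEGER,
                      seqnum INTEGER, tid INTEGER, instid INTEGER);
CREATE TABLE subtypes (type INTEGER, subtype INTEGER, name TEXT,
                       verbose INTEGER, PRIMARY KEY(type, subtype));
CREATE VIEW eventdetail AS
  SELECT e.eid, e.time, a.aid, a.seqnum, a.tid, a.instid,
         s.type, s.subtype, s.name, s.verbose
  FROM events e
  JOIN actions a ON a.aid = e.aid
  JOIN subtypes s ON s.subtype = e.subtype;
CREATE INDEX events_subtype_ix ON events(subtype);
""")

def plan(sql):
    # The last column of each EXPLAIN QUERY PLAN row is the readable step.
    return [row[-1] for row in con.execute("EXPLAIN QUERY PLAN " + sql)]

steps = plan("SELECT * FROM eventdetail WHERE type = 123")
```

With the index in place, the plan should search `events` via `events_subtype_ix` instead of scanning it.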
Re: [sqlite] schema optimization question
On Fri, May 23, 2008 at 12:01 PM, Jeff Gibson <[EMAIL PROTECTED]> wrote:
> I'm sorry if this is an obvious question - I'm new to databases. I have
> an application where the database is used to log a large number of
> simulation events. The database is written once and read many times
> (i.e., there are never any inserts or updates after database creation).
> The three most interesting tables I have are:
>
> CREATE TABLE events (eid INTEGER PRIMARY KEY, time INTEGER, aid INTEGER,
> subtype INTEGER);
>
> CREATE TABLE actions (aid INTEGER PRIMARY KEY, type INTEGER, seqnum
> INTEGER, tid INTEGER, instid INTEGER);
>
> CREATE TABLE subtypes (type INTEGER, subtype INTEGER, name TEXT, verbose
> INTEGER, PRIMARY KEY(type,subtype) );
>
> The column names are such that columns in different tables with the same
> name act as foreign keys. The largest (and most often queried) table is
> events, and it can have many millions of entries. The actions table is
> also large (about a fifth as big as events) and subtypes is very small
> (dozens of entries). My application involves querying events many
> times, but very common queries include events that match a particular
> verbose value and/or a particular type value. This leads to queries
> that have one or two joins, and such queries are substantially slower
> than a query on just the events table.
>
> The question is, what can I do to speed up those queries? The
> obvious answer would be to put type and verbose as columns in the events
> table, but they would be redundant. Is that par for the course, or is
> there some best practice I'm overlooking?

I'm new to databases too, but since no one has answered I'll take a
stab at it: make an index for each table on the column that you join.
That should speed up the joins. But I don't think it will be as fast as
putting everything in one table and making an index on the columns you
use in WHERE clauses.
Re: [sqlite] Autocommit .read inserts are so slow it must be a bug
On Fri, 23 May 2008 15:21:42 +0000 (GMT), Morten wrote:
> Hi
>
> I've been puzzled for some days. I believed I had a hang in my system,
> but it turns out using .read is horribly slow. I believed it to be an
> improper trigger, but removing it doesn't help a bit. 1min36sec to
> read 1000 lines with 6 simple columns into one table.
>
> I know insert with autocommit was slow, but not that slow. Is this
> filesystem-IO based? It works very hard on the filesystem. I also
> tried simplifying the primary key, but timing is the same. Memory
> caching, etc? I can emulate file inserts faster in a PalmPilot
> simulator.
>
> Nice if someone could shed some light on this.
>
> I'm easily going around this bug with commit, so it is not much of a
> problem once you know.
>
> SQLite 3.3.8 on Linux 32-bit.
>
> % wc -l test_autocommit.sql test_commit.sql
>   1024 test_autocommit.sql
>   1026 test_commit.sql
>   2050 total
> % time echo ".read test_autocommit.sql" | sqlite3 test.db
>
> real    1m36.116s
> user    0m0.408s
> sys     0m0.988s
>
> % rm -rf test.db
>
> If I add BEGIN TRANSACTION and COMMIT it is blazingly fast:
> % time echo ".read test_commit.sql" | sqlite3 test.db
>
> real    0m0.309s
> user    0m0.168s
> sys     0m0.040s
>
> sqlite> .schema
> CREATE TABLE secanaconfig (
>   name    TEXT not null,
>   value   TEXT not null,
>   type    TEXT not null,
>   grp     TEXT,
>   desc    TEXT,
>   updated DATE,
>   PRIMARY KEY (name, type)
> );
> CREATE TABLE subtypes (
>   type TEXT PRIMARY KEY,
>   desc TEXT
> );
> CREATE TRIGGER fki_secanaconfig
> before insert on secanaconfig
> FOR EACH ROW BEGIN
>   SELECT RAISE(ABORT, 'insert into secanaconfig foreign-key type does
>   not exist in subtypes table')
>   WHERE (SELECT type FROM subtypes WHERE type = NEW.type) IS NULL;
> END;

It is expected behaviour, not a bug. In autocommit mode every statement
is its own transaction, and for every transaction there are several
fsync() calls to the operating system to flush the buffers of both the
journal and the database to disk. See the documentation for more
details.
http://www.sqlite.org/atomiccommit.html

If you don't care about SQLite's ACID features you can avoid some
delays by applying the right PRAGMAs:

http://www.sqlite.org/pragma.html#pragma_journal_mode
http://www.sqlite.org/pragma.html#pragma_synchronous

> Thanks
> --
> Morten Bjoernsvik, Oslo, Norway

--
( Kees Nuyt )
c[_]
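The effect described above is easy to reproduce: the same inserts with per-statement autocommit versus one enclosing transaction. This is a small, hypothetical benchmark (table and column names are made up); absolute numbers depend entirely on the filesystem's fsync cost, but the batched version should come out far ahead on any real disk:

```python
import sqlite3
import time

def timed_inserts(path, rows, batch):
    # isolation_level=None: no implicit transactions, so each INSERT
    # autocommits (and fsyncs) unless we wrap them in BEGIN/COMMIT.
    con = sqlite3.connect(path, isolation_level=None)
    con.execute("CREATE TABLE t (name TEXT, value TEXT)")
    start = time.perf_counter()
    if batch:
        con.execute("BEGIN")
    for r in rows:
        con.execute("INSERT INTO t VALUES (?, ?)", r)
    if batch:
        con.execute("COMMIT")
    elapsed = time.perf_counter() - start
    con.close()
    return elapsed
```

Calling `timed_inserts(path, rows, batch=False)` on a fresh file and then `batch=True` on another makes the gap visible directly.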
[sqlite] schema optimization question
I'm sorry if this is an obvious question - I'm new to databases. I have
an application where the database is used to log a large number of
simulation events. The database is written once and read many times
(i.e., there are never any inserts or updates after database creation).
The three most interesting tables I have are:

CREATE TABLE events (eid INTEGER PRIMARY KEY, time INTEGER, aid INTEGER,
subtype INTEGER);

CREATE TABLE actions (aid INTEGER PRIMARY KEY, type INTEGER, seqnum
INTEGER, tid INTEGER, instid INTEGER);

CREATE TABLE subtypes (type INTEGER, subtype INTEGER, name TEXT, verbose
INTEGER, PRIMARY KEY(type,subtype) );

The column names are such that columns in different tables with the same
name act as foreign keys. The largest (and most often queried) table is
events, and it can have many millions of entries. The actions table is
also large (about a fifth as big as events) and subtypes is very small
(dozens of entries). My application involves querying events many times,
but very common queries include events that match a particular verbose
value and/or a particular type value. This leads to queries that have
one or two joins, and such queries are substantially slower than a query
on just the events table.

The question is, what can I do to speed up those queries? The obvious
answer would be to put type and verbose as columns in the events table,
but they would be redundant. Is that par for the course, or is there
some best practice I'm overlooking?

Thanks,
Jeff
Re: [sqlite] Why is SQLite so slow across a network (afp Mac OS X)?
On Fri, May 23, 2008 at 4:31 AM, Christian Smith <[EMAIL PROTECTED]> wrote:
> On Tue, May 13, 2008 at 11:15:51PM -0600, Peter K. Stys wrote:
> > Hi Folks:
> > I'm new to this list and just came across a major issue so I thought
> > I'd post here: I'm using SQLite (from REALbasic, which uses the
> > SQLite DB engine for SQL support) to store image files, and while
> > R/W to local files is excellent, the same operations on files
> > residing on a remote volume mounted via afp or smb (all Mac OS X)
> > suffer a 20-fold performance hit.
> >
> > Why is this, and is there a workaround?
> >
> > To be clear, while the remote files are on a server, this is still
> > single-user access to a single file at a time, just remotely.
> >
> > Any input greatly appreciated!
>
> A local disk is on a >1Gb/s connection, probably at the end of a wire
> <0.5m long. The remote volume will have that, but also a <1Gb/s
> connection, on top of a >10m length of cabling to implement the
> network.
>
> Laws of physics, such as the speed of light, limit the turn-around of
> synchronous writes across a network. Your hard disk probably has an
> order of magnitude quicker synchronous write than your network share.
>
> Try playing with synchronous writes turned off. The reduced
> synchronous requirements may allow you to make more optimal use of the
> network file protocols, which operate best asynchronously.
>
> Try, in order:
> PRAGMA synchronous = NORMAL;
> PRAGMA synchronous = OFF;
>
> And measure the performance of each. But SQLite is simply not designed
> to work efficiently across a network file system, so manage your
> expectations.

Thanks Christian; however, the laws of physics say that a packet will
travel along a 100ft run of cable in about 100 ns, give or take.
Also laws of physics (actually laws of well-coded copy routines, I
suspect) allow Finder copies to run at 50 MB/sec over our Gbit LAN
(this includes reading from the drive at one end AND writing to the
drive at the other), vs. SQLite, which can barely manage 2 MB/sec only
reading BLOBs from one end into RAM at the other. So the other
explanations people have advanced (namely huge protocol overheads for
SQL access/networked volume access, etc.) are more likely the cause.
With much effort I suspect this can be overcome ('cause raw binary file
copies can do it), but it's unlikely to happen.

BTW, those PRAGMAs made little difference. I resorted to caching the
remote file to the local drive via a fast OS-level file copy, then
doing the SQL R/W, then copying back to the remote in a background
thread. A programming headache to keep everything in sync, but very
acceptable performance.

Thanks for responding anyway,
Peter.

--
Peter K. Stys, MD
Dept. of Clinical Neurosciences
Hotchkiss Brain Institute
University of Calgary
tel (403) 210-8646
[sqlite] Autocommit .read inserts are so slow it must be a bug
Hi

I've been puzzled for some days. I believed I had a hang in my system,
but it turns out using .read file.sql is horribly slow. I believed it
to be an improper trigger, but removing it doesn't help a bit.
1min36sec to read 1000 lines with 6 simple columns into one table.

I know insert with autocommit was slow, but not that slow. Is this
filesystem-IO based? It works very hard on the filesystem. I also tried
simplifying the primary key, but timing is the same. Memory caching,
etc? I can emulate file inserts faster in a PalmPilot simulator.

Nice if someone could shed some light on this.

I'm easily going around this bug with commit, so it is not much of a
problem once you know.

SQLite 3.3.8 on Linux 32-bit.

% wc -l test_autocommit.sql test_commit.sql
  1024 test_autocommit.sql
  1026 test_commit.sql
  2050 total
% time echo ".read test_autocommit.sql" | sqlite3 test.db

real    1m36.116s
user    0m0.408s
sys     0m0.988s

% rm -rf test.db

If I add BEGIN TRANSACTION and COMMIT it is blazingly fast:

% time echo ".read test_commit.sql" | sqlite3 test.db

real    0m0.309s
user    0m0.168s
sys     0m0.040s

sqlite> .schema
CREATE TABLE secanaconfig (
  name    TEXT not null,
  value   TEXT not null,
  type    TEXT not null,
  grp     TEXT,
  desc    TEXT,
  updated DATE,
  PRIMARY KEY (name, type)
);
CREATE TABLE subtypes (
  type TEXT PRIMARY KEY,
  desc TEXT
);
CREATE TRIGGER fki_secanaconfig
before insert on secanaconfig
FOR EACH ROW BEGIN
  SELECT RAISE(ABORT, 'insert into secanaconfig foreign-key type does
  not exist in subtypes table')
  WHERE (SELECT type FROM subtypes WHERE type = NEW.type) IS NULL;
END;

Thanks
--
Morten Bjoernsvik, Oslo, Norway
Re: [sqlite] baffling performance decrease across network (specific case)
Thank you, that was immensely helpful - as were Roger Binns' many
informative posts about OpLocks. It looks like there is nothing to be
done about the slowdown (in fact I should be grateful there is even
such a thing as opportunistic locking to make the single-client
situation faster), but it is a relief to have an explanation of what is
happening!

Serena.

On Thu, May 22, 2008 at 5:59 PM, Dennis Cote <[EMAIL PROTECTED]> wrote:
> This slowdown is almost certainly due to the use of opportunistic
> locking in the SMB protocols. With a single client it uses exclusive
> oplocks and can cache remote file data locally. When a second client
> connects it can no longer do this, and subsequently slows down to the
> real speed of remote file access.
>
> See http://en.wikipedia.org/wiki/Opportunistic_Locking for additional
> info.
>
> HTH
> Dennis Cote
[sqlite] Speeding up index creation?
I have an interesting problem. I need to generate a large table
periodically. The table contains a unique SHA1 hash code and 6 integer
values and has about 6 million rows. Generating this table is fast. I
can set it up in less than 90 seconds on a slow iMac (2.16 GHz and slow
disk). The thing that takes a (relatively) long time is the index
creation on the unique hash code .. 720 seconds.

(I'm importing within transactions with chunks of 25,000 records)

The question is, is there any way to speed up the indexing process?

Couple of things on my mind:

Doing this all on an in-memory database takes about 150 seconds in
total. Is it possible to build a database in memory and then dump it to
disk?

Would it make sense to sort the records that I import? Could that
result in a quicker index operation? This certainly helps for Berkeley
DB. But it is lower level and you can use its internal hashing
functions on your data to pre-sort it. Maybe SQLite has something
similar?

Are there any other tunable options that can improve indexing speed?

S.
Re: [sqlite] Moving Backwards in SQLite
On Fri, May 23, 2008 at 12:13:39AM -0700, Roger Binns scratched on the wall:
> sqlite wrote:
> > our workaround to move backwards is optimistic. if some one could
> > help me in this regard, if could share some code for moving
> > backwards it would be more helpful to me.
>
> http://www.sqlite.org/cvstrac/wiki?p=ScrollingCursor

This should be in the FAQ. Variations of this question seem to show up
on the mailing list about once a week.

-j
--
Jay A. Kreibich < J A Y @ K R E I B I.C H >
"'People who live in bamboo houses should not throw pandas.' Jesus said that."
- "The Ninja", www.AskANinja.com, "Special Delivery 10: Pop!Tech 2006"
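The linked ScrollingCursor page boils down to: remember the boundary key of the current page and move by WHERE comparisons on that key rather than by resetting and skipping rows, flipping both the comparison and the sort direction to go backwards. A sketch with a hypothetical `messages` table:

```python
import sqlite3

def page_forward(con, last_id, n):
    """Return the n rows after the last id currently displayed."""
    return con.execute(
        "SELECT id, msg FROM messages WHERE id > ? "
        "ORDER BY id LIMIT ?", (last_id, n)).fetchall()

def page_backward(con, first_id, n):
    """Return the n rows before the first id currently displayed."""
    rows = con.execute(
        "SELECT id, msg FROM messages WHERE id < ? "
        "ORDER BY id DESC LIMIT ?", (first_id, n)).fetchall()
    return rows[::-1]  # restore ascending display order
```

Both directions are single indexed lookups, so moving backwards costs the same as moving forwards regardless of position in the table.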
Re: [sqlite] Equivalent of mysql_real_escape_string() ?
Hi all,

Scott Baker wrote:
> As mentioned above the BEST way to do it is with prepared statements
> and bound variables. If you have to use raw SQL then just use the
> PDO::quote method:
>
> http://php.web-ster.com/manual/en/pdo.quote.php
>
> $conn = new PDO('sqlite:/home/lynn/music.sql3');
> $string = 'Nice';
> print "Quoted string: " . $conn->quote($string) . "\n";
>
> I'm open to discussion about whether or not this is still vulnerable
> to SQL injection.

The article Jay Kreibich linked to demonstrates a bug in MySQL's C API
that was fixed, and I suppose the fix was automatically reflected back
in PHP's MySQL extension when it was updated. SQLite's C API has no
such escaping function, so it depends on what PHP's SQLite PDO driver
does to implement the quoting.

I certainly hope that it is not vulnerable to SQL injection: if it is,
how do we handle the fact that we cannot bind a list of values to a
single parameter, and thus have to manually build the SQL statement if
we want to use IN ('x', 'y', 'z', ...), where the length and content of
the list is determined by the user? At the moment I would simply apply
PDO::quote to x, y, z (and the rest) individually.

Regards,
Eugene Wee
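For the IN-list case raised above, another option is to generate one placeholder per value so the whole list stays bound rather than quoted: only the placeholder count is interpolated into the SQL, never the values. A sketch in Python's sqlite3 rather than PHP's PDO, with an illustrative table name:

```python
import sqlite3

def select_in(con, titles):
    """Return matching titles, binding every list element."""
    if not titles:
        return []
    # One "?" per value; user data never touches the SQL text.
    placeholders = ",".join("?" for _ in titles)
    sql = "SELECT title FROM tracks WHERE title IN (%s)" % placeholders
    return sorted(r[0] for r in con.execute(sql, titles))
```

Since the values travel through the binding API, quote characters in them need no escaping at all.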
Re: [sqlite] Moving Backwards in SQLite
sqlite wrote:
> our workaround to move backwards is optimistic. if some one could help
> me in this regard, if could share some code for moving backwards it
> would be more helpful to me.

http://www.sqlite.org/cvstrac/wiki?p=ScrollingCursor

Roger
[sqlite] Moving Backwards in SQLite
Dear All,

I have developed an application using SQLite which runs on a Pocket PC
device. My application has to show some messages in a list box (custom
built). While moving forward I'm able to move through the records one
by one in the recordset, but when I move backwards I'm not able to move
to the corresponding records: I have to reset the recordset pointer to
the initial position and then skip records until I reach the desired
one. Is there any way to skip directly to the desired record? I'm not
even sure whether our workaround to move backwards is optimistic. If
someone could share some code for moving backwards it would be very
helpful to me.

Regards,
Karthi