Re: [sqlite] Seeking advice on improving batch sql update performance on zipvfs
Hello, thanks for your advice! It works pretty well now that I have dropped the rowid and made the other field the primary key. I also enabled WAL mode, at the cost of a little read performance. Overall it works really well. Really appreciate your help.

-- Sent from: http://sqlite.1065341.n5.nabble.com/

___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Seeking advice on improving batch sql update performance on zipvfs
Thanks for the advice! I would never have thought about this. Really learned a lot. It's a customized Android-phone-type device.
Re: [sqlite] Seeking advice on improving batch sql update performance on zipvfs
> I've also attached results of "EXPLAIN UPDATE ITEM SET FIELD4 = 10 WHERE DS = 15;".

Not zipvfs-specific, but that kind of update can be quite inefficient if the record is large (in terms of bytes, or records per database page), as your table declaration hints. This will be especially prominent with compression: a single byte changed early in the uncompressed stream can change pretty much every byte of the compressed data, and so is bound to defeat any optimizations the storage may have for handling unmodified pages and data.

Depending on your update patterns and frequencies, it could thus be beneficial to split your table into two tables (or more) with a common primary key:

- one table for the rarely changed fields (even better if they are immutable)
- one table (or more) for the fields that change often

This will increase the database size and reduce insert performance, and it can complicate querying when you need everything (extra inner joins), but it can reduce the amount of raw data affected by updates quite drastically. I have been able to use the above strategy successfully, though that was under Windows compressed folders, so YMMV.

On Thu, Sep 7, 2017 at 9:13 PM, Yue Wu wrote:
> Hello,
> As mentioned in the subject, our goal is to improve the performance of batch SQL updates.
> The update SQL is as follows:
>
>> UPDATE ITEM SET FIELD4 =? WHERE DS=?
>
> We run 100,000 updates in a single transaction. The zipvfs version takes
> about 20 min while the uncompressed version takes about 7 min.
> I've also attached results of "EXPLAIN UPDATE ITEM SET FIELD4 = 10 WHERE DS = 15;".
>
> Any suggestions are very helpful.
>
> Some background:
> We have an app running on Android 4.2 using zipvfs.
> The page size is 4096
> The cache size is -2000
>
> Table schema for ITEM:
>
>> CREATE TABLE ITEM (
>> FIELD0 NUMERIC,
>> FIELD1 NUMERIC,
>> DS TEXT,
>> FIELD2 TEXT,
>> FIELD3 TEXT,
>> FIELD4 NUMERIC,
>> FIELD5 NUMERIC,
>> FIELD6 NUMERIC,
>> FIELD7 NUMERIC,
>> FIELD8 NUMERIC,
>> FIELD9 NUMERIC,
>> FIELD10 NUMERIC,
>> FIELD11 TEXT);
>
> The third column, "DS", is what we query by almost all the time. We also
> created an index:
>
>> CREATE INDEX DS_INDEX ON ITEM(DS);
>
> There are 30 million records. Zipvfs size of 471 MB vs uncompressed db
> (the one shipped with Android 4.2) size of 1.39 GB.
> The zipvfs db uses zlib and aes128, which are the defaults.
>
> Thanks
> --
> Yue Wu | Android Developer
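The hot/cold split suggested above can be sketched with the stock sqlite3 module (no zipvfs, which is proprietary). The FIELD*/DS names follow the original schema, but treating FIELD4 as the only "hot" field, and using DS as the common primary key, are assumptions for illustration:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    -- rarely changed (cold) fields; most FIELDn columns elided for brevity
    CREATE TABLE ITEM_STATIC (
        DS      TEXT PRIMARY KEY,
        FIELD0  NUMERIC,
        FIELD1  NUMERIC,
        FIELD11 TEXT
    ) WITHOUT ROWID;

    -- frequently updated (hot) fields
    CREATE TABLE ITEM_HOT (
        DS     TEXT PRIMARY KEY,
        FIELD4 NUMERIC
    ) WITHOUT ROWID;
""")
con.execute("INSERT INTO ITEM_STATIC VALUES ('15', 1, 2, 'x')")
con.execute("INSERT INTO ITEM_HOT VALUES ('15', 0)")

# The batch update now rewrites only the small hot-table records.
con.execute("UPDATE ITEM_HOT SET FIELD4 = ? WHERE DS = ?", (10, "15"))

# When everything is needed, join the two halves back together.
row = con.execute("""
    SELECT s.FIELD0, h.FIELD4
    FROM ITEM_STATIC AS s JOIN ITEM_HOT AS h USING (DS)
    WHERE DS = '15'
""").fetchone()
print(row)  # (1, 10)
```

Under a compressing VFS the win comes from the UPDATE touching only the narrow ITEM_HOT pages, so far fewer compressed bytes change per updated row.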
Re: [sqlite] Seeking advice on improving batch sql update performance on zipvfs
On 9 Sep 2017, at 9:11pm, Teg wrote:

> Is it a phone or tablet? I'd wonder if the CPU is going into thermal
> throttle mode. As I understand it most phones and tablets can't keep
> the CPUs running full speed without eventually throttling the CPU.

I would never have thought of that unless I had the device in my hands. Well done. Yes. You're trying to process 100,000 updates, and your device is working flat out for 7 minutes, or for 20 minutes. How hot is it? If you do it unplugged, how much does that drain the battery?

I don't have experience with Android, but I can tell you that iOS will have to throttle any process that attempts to process data continuously for 7 minutes.

Simon.
Re: [sqlite] Seeking advice on improving batch sql update performance on zipvfs
Hello Yue,

Is it a phone or tablet? I'd wonder if the CPU is going into thermal throttle mode. As I understand it, most phones and tablets can't keep the CPUs running at full speed without eventually throttling. To me, the fact that the uncompressed version, while moving much more data, is more than 2 times faster suggests the bottleneck isn't the file I/O.

Thursday, September 7, 2017, 3:13:04 PM, you wrote:

YW> Hello,
YW> As mentioned in the subject, our goal is to improve the performance of
YW> batch SQL updates.
YW> The update SQL is as follows:
>> UPDATE ITEM SET FIELD4 =? WHERE DS=?
YW> We run 100,000 updates in a single transaction. The zipvfs version takes
YW> about 20 min while the uncompressed version takes about 7 min.
YW> I've also attached results of "EXPLAIN UPDATE ITEM SET FIELD4 = 10 WHERE DS = 15;".
YW> Any suggestions are very helpful.
YW> Some background:
YW> We have an app running on Android 4.2 using zipvfs.
YW> The page size is 4096
YW> The cache size is -2000
YW> Table schema for ITEM
>> CREATE TABLE ITEM (
>> FIELD0 NUMERIC,
>> FIELD1 NUMERIC,
>> DS TEXT,
>> FIELD2 TEXT,
>> FIELD3 TEXT,
>> FIELD4 NUMERIC,
>> FIELD5 NUMERIC,
>> FIELD6 NUMERIC,
>> FIELD7 NUMERIC,
>> FIELD8 NUMERIC,
>> FIELD9 NUMERIC,
>> FIELD10 NUMERIC,
>> FIELD11 TEXT);
YW> The third column, "DS", is what we query by almost all the time. We also
YW> created an index:
>> CREATE INDEX DS_INDEX ON ITEM(DS);
YW> There are 30 million records. Zipvfs size of 471 MB vs uncompressed db
YW> (the one shipped with Android 4.2) size of 1.39 GB.
YW> The zipvfs db uses zlib and aes128, which are the defaults.
YW> Thanks

--
Teg
mailto:t...@djii.com
Re: [sqlite] Seeking advice on improving batch sql update performance on zipvfs
Thanks for the advice. I'll try them now.
Re: [sqlite] Seeking advice on improving batch sql update performance on zipvfs
Are your updates sorted by DS? If they are, then sequential queries are more likely to hit the same db pages while searching the index, resulting in higher cache usage and fewer decompression operations. This has less benefit if your 100k DS values are randomly distributed through the 30 million available, and more of an effect if they're tightly clustered in some range.

If you mainly query by DS, is that the same as saying it is your primary key? If so, have you tried declaring it so, making the table WITHOUT ROWID, and dropping the index? It may help, since you wouldn't have to decompress both the index pages and the table pages.

> On 8 Sep 2017, at 12:33 pm, Dominique Pellé wrote:
>
> Yue Wu wrote:
>
>> Hello,
>> As mentioned in the subject, our goal is to improve the performance of
>> batch SQL updates.
>> The update SQL is as follows:
>>
>>> UPDATE ITEM SET FIELD4 =? WHERE DS=?
>>
>> We run 100,000 updates in a single transaction. The zipvfs version takes
>> about 20 min while the uncompressed version takes about 7 min.
>
> Which compression algorithm do you use with zipvfs?
> Try LZ4; it's much faster than zlib at compression and
> decompression, at the cost of compressing a bit less:
>
> https://github.com/lz4/lz4
>
> Or try zstd, which can compress more than zlib and faster (especially
> on 64-bit architectures), but it's not as fast as LZ4:
>
> http://facebook.github.io/zstd/
>
> Also, what is your SQLite page size?
> Reducing the SQLite page size probably helps to speed up
> updates, since zipvfs compresses by pages.
>
> Dominique
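A minimal sketch of the WITHOUT ROWID suggestion above, using plain sqlite3. It assumes DS really is unique (the thread doesn't confirm this), and keeps only FIELD4, the other columns being elided. It also shows the sorted-updates idea: sorting the batch by DS keeps consecutive updates on nearby B-tree pages, so the page cache is hit more often (and, under zipvfs, fewer pages need decompressing):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""
    CREATE TABLE ITEM (
        DS     TEXT PRIMARY KEY,
        FIELD4 NUMERIC
        -- remaining FIELDn columns elided for brevity
    ) WITHOUT ROWID
""")
con.executemany("INSERT INTO ITEM VALUES (?, ?)",
                [(str(i), 0) for i in range(1000)])

# Sort the batch by DS before applying it, then run it in one transaction.
updates = sorted(((10, str(i)) for i in range(0, 1000, 7)),
                 key=lambda u: u[1])
with con:  # the connection context manager wraps this in a transaction
    con.executemany("UPDATE ITEM SET FIELD4 = ? WHERE DS = ?", updates)

n = con.execute("SELECT count(*) FROM ITEM WHERE FIELD4 = 10").fetchone()[0]
print(n)  # 143
```

With DS as the WITHOUT ROWID primary key there is no separate DS_INDEX to maintain, so each update touches only table pages.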
Re: [sqlite] Seeking advice on improving batch sql update performance on zipvfs
Yue Wu wrote:
> Hello,
> As mentioned in the subject, our goal is to improve the performance of
> batch SQL updates.
> The update SQL is as follows:
>
>> UPDATE ITEM SET FIELD4 =? WHERE DS=?
>
> We run 100,000 updates in a single transaction. The zipvfs version takes
> about 20 min while the uncompressed version takes about 7 min.

Which compression algorithm do you use with zipvfs? Try LZ4; it's much faster than zlib at compression and decompression, at the cost of compressing a bit less:

https://github.com/lz4/lz4

Or try zstd, which can compress more than zlib and faster (especially on 64-bit architectures), but it's not as fast as LZ4:

http://facebook.github.io/zstd/

Also, what is your SQLite page size? Reducing the page size probably helps to speed up updates, since zipvfs compresses by pages.

Dominique
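The page-size advice can be tried with the stock sqlite3 module (no zipvfs involved here); the key detail is that the pragma must be issued before anything is written to the database, since the page size is fixed once pages exist:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA page_size = 1024")  # must precede the first write
con.execute("CREATE TABLE t (x)")       # page size is now locked in
size = con.execute("PRAGMA page_size").fetchone()[0]
print(size)  # 1024
```

For an existing database, changing the page size additionally requires a VACUUM to rebuild the file.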
[sqlite] Seeking advice on improving batch sql update performance on zipvfs
Hello,

As mentioned in the subject, our goal is to improve the performance of batch SQL updates. The update SQL is as follows:

> UPDATE ITEM SET FIELD4 =? WHERE DS=?

We run 100,000 updates in a single transaction. The zipvfs version takes about 20 min while the uncompressed version takes about 7 min. I've also attached results of "EXPLAIN UPDATE ITEM SET FIELD4 = 10 WHERE DS = 15;".

Any suggestions are very helpful.

Some background: We have an app running on Android 4.2 using zipvfs. The page size is 4096, and the cache size is -2000.

Table schema for ITEM:

> CREATE TABLE ITEM (
> FIELD0 NUMERIC,
> FIELD1 NUMERIC,
> DS TEXT,
> FIELD2 TEXT,
> FIELD3 TEXT,
> FIELD4 NUMERIC,
> FIELD5 NUMERIC,
> FIELD6 NUMERIC,
> FIELD7 NUMERIC,
> FIELD8 NUMERIC,
> FIELD9 NUMERIC,
> FIELD10 NUMERIC,
> FIELD11 TEXT);

The third column, "DS", is what we query by almost all the time. We also created an index:

> CREATE INDEX DS_INDEX ON ITEM(DS);

There are 30 million records: a zipvfs db of 471 MB vs an uncompressed db (the one shipped with Android 4.2) of 1.39 GB. The zipvfs db uses zlib and aes128, which are the defaults.
Thanks
--
Yue Wu | Android Developer

sqlite> EXPLAIN UPDATE ITEM SET FIELD4 = 10 WHERE DS = 15;
addr  opcode        p1   p2   p3   p4         p5  comment
----  ------------  ---  ---  ---  ---------  --  ---------------------------
0     Init          0    28   0               00  Start at 28
1     Null          0    1    2               00  r[1..2]=NULL
2     OpenWrite     0    3    0    13         08  root=3 iDb=0; ITEM
3     OpenWrite     1    4    0    k(2,,)     02  root=4 iDb=0; ds_index
4     Integer       15   16   0               00  r[16]=15
5     Affinity      16   1    0    B          00  affinity(r[16])
6     SeekGE        1    27   16   1          00  key=r[16]
7     IdxGT         1    27   16   1          00  key=r[16]
8     DeferredSeek  1    0    0               00  Move 0 to 1.rowid if needed
9     Rowid         0    2    0               00  r[2]=rowid
10    IsNull        2    27   0               00  if r[2]==NULL goto 27
11    Column        0    0    3               00  r[3]=ITEM.FIELD0
12    Column        0    1    4               00  r[4]=ITEM.FIELD1
13    Column        0    2    5               00  r[5]=ITEM.DS
14    Column        0    3    6               00  r[6]=ITEM.FIELD2
15    Column        0    4    7               00  r[7]=ITEM.FIELD3
16    Integer       10   8    0               00  r[8]=10
17    Column        0    6    9               00  r[9]=ITEM.FIELD5
18    Column        0    7    10              00  r[10]=ITEM.FIELD6
19    Column        0    8    11              00  r[11]=ITEM.FIELD7
20    Column        0    9    12              00  r[12]=ITEM.FIELD8
21    Column        0    10   13              00  r[13]=ITEM.FIELD9
22    Column        0    11   14              00  r[14]=ITEM.FIELD10
23    Column        0    12   15              00  r[15]=ITEM.FIELD11
24    MakeRecord    3    13   17   CCBBBCCCB  00  r[17]=mkrec(r[3..15])
25    Insert        0    17   2    ITEM       07  intkey=r[2] data=r[17]
26    Next          1    7    1               00
27    Halt          0    0    0               00
28    Transaction   0    1    4    0          01  usesStmtJournal=0
29    Goto          0    1    0               00
Re: [sqlite] seeking advice
Hello, Rene

> Hi there,
>
> I have to store and retrieve up to 2000 parameters.
> The parameters can have real and integer values.
> The max. change rate is 100ms and the max. duration is up to some hours.
>
> The simple solution would be to use plain binary files. It's fast but not
> flexible.
>
> So I came to sqlite. Before starting I would like to ask if someone could
> give me some advice which way to go. Here are my current versions.
>
> Version 1: One big table with time stamp and one column for each parameter
> - I can have up to SQLITE_MAX_COLUMNS=2000 but how can I insert all values
>   with only SQLITE_MAX_VARIABLE_NUMBER=999?
> - What about space consumption if NULL values are given?

You can always recompile SQLite with the desired parameters. Or you can make your own implementation of a virtual table. In my opinion, a virtual table would be the most efficient way to inflate data into SQLite. Example:

INSERT INTO my_table(timestamp, param_1, ..., param_n)
SELECT NOW(), _1, _2, ..., _n FROM my_virtual_table;

Though implementing a virtual table correctly will take time that you could spend elsewhere. The rule of thumb: do not bother with a virtual table if CPU is not a bottleneck. Use this approach if you always need to query all of the parameters.

> Version 2: One table for each parameter with time stamp and value
> - Does this scale for up to 2000 parameters?
> - Can sqlite handle so many tables?

First of all, it depends on your usage. If at any given point in time you may need only one or a few parameters and don't care about the others, you should group them in separate tables, or even make a personal table for each of them. In this case, querying individual parameters or groups of parameters will be faster, since SQLite will not pick up extra data from the disk. But you must realise that each table will have its own copy of the timestamp and potentially a ROWID (for non-WITHOUT ROWID tables) that will eat up extra space. Also, inserts/updates will be much more expensive because data in the database will be fragmented: where inserting one row into a 2000-column table would write a couple of pages to disk, inserting into 2000 tables would write at least 2000 pages. Consider also that in this case it would be a nightmare to make sure every timestamp value is associated with a value of every parameter (if you have to provide this guarantee). Yes, SQLite can handle this many tables.

> Version 3: One table with time stamp, parameter id and parameter value
> - Is it working when all values change?
> - Is retrieving values for one parameter fast?

For me, this looks like the most efficient way of implementing what you want. Retrieval speed will depend on the indexes you have. If, when querying parameters, you always specify a timestamp or time range, you can get away with a compound index (timestamp, parameter_id). Then selects like:

SELECT value FROM table WHERE timestamp = X;
SELECT value FROM table WHERE timestamp = X AND ...;
SELECT value FROM table WHERE timestamp > X;
SELECT value FROM table WHERE timestamp < X;
SELECT value FROM table WHERE timestamp > X AND timestamp < Y;
SELECT value FROM table WHERE timestamp IN (X1, ..., Xn);

will always be efficient. Do not create a separate index like this; make it the PRIMARY KEY, and experiment with how WITHOUT ROWID performs.

> I will write and read the data at the same time. But writing should have
> priority. I could live with a potential loss of data because of a power
> failure. So I will try the settings of the options "synchronous" and
> "journal_mode". Anything else I should consider?

You may also play around with the temp_store and cache_size pragmas.

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
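The "Version 3" layout with the compound primary key suggested above can be sketched as follows; the table and column names are illustrative, and the per-tick transaction addresses the guarantee that every timestamp carries a full set of parameter values:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""
    CREATE TABLE samples (
        timestamp    INTEGER NOT NULL,
        parameter_id INTEGER NOT NULL,
        value        NUMERIC,
        PRIMARY KEY (timestamp, parameter_id)
    ) WITHOUT ROWID
""")

# One row per parameter per tick; a single transaction per tick keeps
# all 2000 values associated with the same timestamp atomically.
with con:
    con.executemany(
        "INSERT INTO samples VALUES (?, ?, ?)",
        [(100, pid, pid * 0.5) for pid in range(2000)])

# Range/equality scans on the leading timestamp column use the primary
# key directly, with no separate index to maintain.
rows = con.execute(
    "SELECT count(*) FROM samples WHERE timestamp = 100").fetchone()[0]
print(rows)  # 2000
```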
Re: [sqlite] seeking advice
On 2014/12/09 22:41, Rene Zaumseil wrote:
> Hi there,
> I have to store and retrieve up to 2000 parameters. The parameters can
> have real and integer values. The max. change rate is 100ms and the max.
> duration is up to some hours.
>
> The simple solution would be to use plain binary files. It's fast but
> not flexible. So I came to sqlite. Before starting I would like to ask
> if someone could give me some advice which way to go. Here are my
> current versions.
>
> Version 1: One big table with time stamp and one column for each parameter
> - I can have up to SQLITE_MAX_COLUMNS=2000 but how can I insert all
>   values with only SQLITE_MAX_VARIABLE_NUMBER=999?
> - What about space consumption if NULL values are given?
>
> Version 2: One table for each parameter with time stamp and value
> - Does this scale for up to 2000 parameters?
> - Can sqlite handle so many tables?
>
> Version 3: One table with time stamp, parameter id and parameter value
> - Is it working when all values change?
> - Is retrieving values for one parameter fast?

Definitely No. 3, always. SQLite will retrieve a value from among 2000 items on an indexed column in under 1 ms (as will any other localized DB). Even I/O lag will be masked by the cache at that size. Assuming the parameter names do not change all the time and the time-stamp column is not indexed, writing will be similarly fast using a suitable journal mode.

On this point, you might simply use an in-memory DB, since you do not care about data loss on power failure (use ":memory:" as the file name). That will be even faster, but with an on-disk DB a snapshot of the current parameter set will remain if your application fails, which might assist debugging.

Also, making that DB, be sure to use "WITHOUT ROWID" and specify the parameter name as the primary key. Use numeric affinity for the time-stamp and value columns.

Cheers,
Ryan
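The snapshot-style table described above can be sketched like this; "params" and "motor_rpm" are illustrative names, and ":memory:" follows the suggestion of trading durability for speed:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""
    CREATE TABLE params (
        name      TEXT PRIMARY KEY,  -- parameter name as the key
        timestamp NUMERIC,           -- numeric affinity, per the advice
        value     NUMERIC
    ) WITHOUT ROWID
""")
con.execute("INSERT INTO params VALUES ('motor_rpm', 0, 0)")

# Each 100 ms tick overwrites the current value in place, so the table
# always holds exactly one row per parameter (a live snapshot).
con.execute("UPDATE params SET timestamp = ?, value = ? WHERE name = ?",
            (1.3, 1480, "motor_rpm"))
val = con.execute(
    "SELECT value FROM params WHERE name = 'motor_rpm'").fetchone()[0]
print(val)  # 1480
```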
Re: [sqlite] seeking advice
On 9 Dec 2014, at 8:41pm, Rene Zaumseil wrote:

> Version 3: One table with time stamp, parameter id and parameter value
> - Is it working when all values change?
> - Is retrieving values for one parameter fast?

That one. Versions 1 and 2 will both, technically, work, but they're an abuse of how SQL should be used and will result in horrible code. The speed of retrieving all parameters will be bound by your programming language; SQLite will do its side of the job very quickly. And since columns have just affinity, not type, having some values INTEGER and others REAL will work fine.

> I will write and read the data at the same time. But writing should have
> priority.

Multithread? Multiprocess? Neither is needed, but those are the things you need to decide on next. Also, do not forget to pick a journal mode and to set a busy_timeout. Ah, I see you already mentioned journal_mode. Good.

Simon.
[sqlite] seeking advice
Hi there,

I have to store and retrieve up to 2000 parameters. The parameters can have real and integer values. The max. change rate is 100 ms and the max. duration is up to some hours.

The simple solution would be to use plain binary files. It's fast but not flexible. So I came to sqlite. Before starting, I would like to ask if someone could give me some advice on which way to go. Here are my current versions.

Version 1: One big table with a time stamp and one column for each parameter
- I can have up to SQLITE_MAX_COLUMNS=2000, but how can I insert all values with only SQLITE_MAX_VARIABLE_NUMBER=999?
- What about space consumption if NULL values are given?

Version 2: One table for each parameter, with time stamp and value
- Does this scale to 2000 parameters?
- Can sqlite handle so many tables?

Version 3: One table with time stamp, parameter id and parameter value
- Does it work when all values change?
- Is retrieving values for one parameter fast?

I will write and read the data at the same time, but writing should have priority. I could live with a potential loss of data because of a power failure, so I will try the settings of the "synchronous" and "journal_mode" options. Anything else I should consider?

Thank you
rene
RE: [sqlite] Seeking advice on On Demand Loading
I've found that the best trade-off in performance and memory for on-demand loading is to first run a query that retrieves all the ids of the items you want in the list, and store that list in memory. Then you can use it to run a second query for the full data, using a where clause with "ID IN (...)". This is much faster than using LIMIT/OFFSET, but it has the overhead of maintaining an array (or whatever) of ids (fine for a few thousand; millions could be problematic, but then again, a list with millions of records is problematic no matter how you do it).

HTH,
Sam

---
We're hiring! Seeking a passionate developer to join our team building products. Position is in the Washington D.C. metro area. If interested contact [EMAIL PROTECTED]

-Original Message-
From: Olaf Beckman Lapré [mailto:[EMAIL PROTECTED]
Sent: Friday, September 28, 2007 3:48 PM
To: sqlite-users@sqlite.org
Subject: [sqlite] Seeking advice on On Demand Loading

Hello, I'm looking for some advice on implementing 'On Demand Loading' in my program. Some GUI controls such as a Listview have a 'virtual mode' where the control only needs to be loaded with the items currently shown. The Listview will call a callback function where it indicates which items need to be loaded. BTW I'm using wxWidgets for my GUI library. My question centers on how to implement this using SQLite as I'm not really sure how to retrieve 'record numbers 445 to 456' using SQL statements efficiently. Any thoughts on this?

Kind regards,
Olaf

-
To unsubscribe, send email to [EMAIL PROTECTED]
-
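The two-query pattern described above can be sketched like this (illustrative table name and window size; a real listview would call load_window from its virtual-mode callback, and may reorder the returned rows to match the id list):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
con.executemany("INSERT INTO items VALUES (?, ?)",
                [(i, f"item {i}") for i in range(1, 1001)])

# Query 1: cheap id-only scan, kept in memory in display order.
ids = [r[0] for r in con.execute("SELECT id FROM items ORDER BY name")]

def load_window(first, last):
    """Load full rows for the visible display positions [first, last]."""
    window = ids[first:last + 1]
    placeholders = ",".join("?" * len(window))
    # Query 2: fetch full data only for the ids currently on screen.
    return con.execute(
        f"SELECT id, name FROM items WHERE id IN ({placeholders})",
        window).fetchall()

rows = load_window(445, 456)
print(len(rows))  # 12
```

Because the second query is keyed on the primary key, it stays fast regardless of where the window falls, unlike LIMIT/OFFSET, which must skip over all preceding rows.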
[sqlite] Seeking advice on On Demand Loading
Hello, I'm looking for some advice on implementing 'On Demand Loading' in my program. Some GUI controls such as a Listview have a 'virtual mode' where the control only needs to be loaded with the items currently shown. The Listview will call a callback function where it indicates which items need to be loaded. BTW I'm using wxWidgets for my GUI library. My question centers on how to implement this using SQLite as I'm not really sure how to retrieve 'record numbers 445 to 456' using SQL statements efficiently. Any thoughts on this? Kind regards, Olaf