Re: [sqlite] Efficiency of concatenation on UPDATE
On 6 February 2012 05:20, Dan Kennedy wrote:

> I think when two doclists are merged both are loaded into memory.
> Some types of queries load the entire doclist for a term into
> memory too.

Hmm, sounds like I definitely need to stick with rows representing
document pages rather than whole documents. Thank you for the advice!

Best wishes,
Hamish

_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
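Given Dan's description, the page-per-row design the thread converges on can be sketched as follows. This is a minimal illustration using Python's sqlite3 module, assuming FTS4 is compiled into the linked SQLite (it is in most distributed builds); the table and column names are made up for the example. The point is that appending content becomes an INSERT of a small new row, so FTS4 tokenizes only the new page instead of deleting and re-adding the doclists for the entire document.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# One FTS4 row per document page, not per whole document.
conn.execute("CREATE VIRTUAL TABLE doc_pages USING fts4(content)")

# Page 1 of the document.
conn.execute("INSERT INTO doc_pages(content) VALUES ('Hello, world!')")

# Appending text is a new-row INSERT: FTS4 tokenizes only this page,
# rather than removing and re-inserting index entries for every word
# of the existing content (as a concatenating UPDATE would).
conn.execute("INSERT INTO doc_pages(content) VALUES ('How are you?')")

# Full-text queries still work across all pages.
rows = conn.execute(
    "SELECT rowid FROM doc_pages WHERE doc_pages MATCH 'world'"
).fetchall()
print(rows)  # [(1,)]
```

The trade-off, as noted in the thread, is that queries spanning a whole document now have to deal with multiple rows.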
Re: [sqlite] Efficiency of concatenation on UPDATE
On 02/06/2012 05:54 AM, Hamish Allan wrote:
> On 5 February 2012 21:20, Roger Binns wrote:
>> SQLite doesn't work on rows - it works on pages. A row will be
>> contained within one or more pages.
>
> FWIW, I inspected the source for OP_Concat and found that it can
> sometimes avoid a memcpy (but presumably not if there isn't enough
> space in contiguous free pages).
>
>> What are you already doing with regards to size and what are your
>> constraints? For example the default values will result in a page
>> cache of 2MB.
>
> I don't have hard constraints. This is for iOS, which monitors user
> processes and kills them for what it deems unruly memory usage, where
> "unruly" is not publicly defined and is subject to change between OS
> versions!
>
>> There are numerous existing APIs in SQLite for having fine grained
>> control over memory allocation.
>
> Thanks for the tips! memory_highwater() agrees with my profiler that
> the transient memory requirements are largely flat under standard
> concatenation :)
>
> However, FTS4 does appear to take the naive approach of removing and
> re-adding everything from its index every time. Inspecting the source
> in sqlite3Fts3UpdateMethod(), an UPDATE acts just like a DELETE in
> removing the old record. Top-level comments state that FTS segments
> are built from the bottom up and are not updatable. However, the
> doclist format seems to be written in such a way that positions could
> be copied wholesale from the old record to the new, prior to
> appending new positions. Please, can anyone familiar with the code
> comment on this?

I think when two doclists are merged both are loaded into memory.
Some types of queries load the entire doclist for a term into
memory too.

Dan.
Re: [sqlite] Efficiency of concatenation on UPDATE
On 5 February 2012 21:20, Roger Binns wrote:

> SQLite doesn't work on rows - it works on pages. A row will be
> contained within one or more pages.

FWIW, I inspected the source for OP_Concat and found that it can
sometimes avoid a memcpy (but presumably not if there isn't enough
space in contiguous free pages).

> What are you already doing with regards to size and what are your
> constraints? For example the default values will result in a page
> cache of 2MB.

I don't have hard constraints. This is for iOS, which monitors user
processes and kills them for what it deems unruly memory usage, where
"unruly" is not publicly defined and is subject to change between OS
versions!

> There are numerous existing APIs in SQLite for having fine grained
> control over memory allocation.

Thanks for the tips! memory_highwater() agrees with my profiler that
the transient memory requirements are largely flat under standard
concatenation :)

However, FTS4 does appear to take the naive approach of removing and
re-adding everything from its index every time. Inspecting the source
in sqlite3Fts3UpdateMethod(), an UPDATE acts just like a DELETE in
removing the old record. Top-level comments state that FTS segments
are built from the bottom up and are not updatable. However, the
doclist format seems to be written in such a way that positions could
be copied wholesale from the old record to the new, prior to appending
new positions. Please, can anyone familiar with the code comment on
this?

Many thanks,
Hamish
Re: [sqlite] Efficiency of concatenation on UPDATE
On 05/02/12 12:09, Hamish Allan wrote:

> Sure, but a row can be re-written by copying each byte to memory and
> back out again,

SQLite doesn't work on rows - it works on pages. A row will be
contained within one or more pages.

> WRT premature optimisation, I just need to keep the high memory
> watermark as low as I can.

You should have said that! SQLite has been used in some very low
memory devices. You can purchase support if you need to get lower than
stock SQLite. (For example they have a way you can omit the parser by
having the prepared queries stored in the database.)

What are you already doing with regards to size and what are your
constraints? For example the default values will result in a page
cache of 2MB.

There are numerous existing APIs in SQLite for having fine grained
control over memory allocation:

http://www.sqlite.org/c3ref/c_config_getmalloc.html
http://www.sqlite.org/c3ref/soft_heap_limit64.html
http://www.sqlite.org/c3ref/memory_highwater.html

Run your representative workload and see what numbers you get and tune
accordingly.

Roger
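The links above are C-level APIs, but the page-cache figure Roger mentions can also be inspected and tuned from SQL. A small sketch using Python's sqlite3 module (the ~2MB default comes from the cache_size page count multiplied by the page_size in bytes; the exact defaults vary between SQLite versions, so the first print is informational only):

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# The default page cache is roughly cache_size pages * page_size bytes;
# both values are visible from SQL.
(page_size,) = conn.execute("PRAGMA page_size").fetchone()
(cache_size,) = conn.execute("PRAGMA cache_size").fetchone()
print(page_size, cache_size)

# Cap the page cache at 100 pages to lower the memory high-water mark,
# at the cost of more I/O.
conn.execute("PRAGMA cache_size = 100")
(capped,) = conn.execute("PRAGMA cache_size").fetchone()
print(capped)  # 100
```

The finer-grained controls (sqlite3_soft_heap_limit64() and the SQLITE_CONFIG_* allocator hooks) are only reachable through the C API linked above.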
Re: [sqlite] Efficiency of concatenation on UPDATE
On 5 February 2012 19:11, Roger Binns wrote:

> The values for a row are stored sequentially. Changing the size of a
> value will at least require rewriting the row.

Sure, but a row can be re-written by copying each byte to memory and
back out again, or by copying the whole row into memory and then back
out again. The former uses O(1) space, the latter O(n). Reading the
file format documentation doesn't help me differentiate between these
cases.

WRT premature optimisation, I just need to keep the high memory
watermark as low as I can. Consider a document X split into pages
X1, X2, ..., Xn, where n can be arbitrarily large. I don't have enough
memory to hold all the text of X, so I need to work out whether I need
to make my rows the Xi, which will make the kind of query I want to do
much less efficient, or whether I can store the whole of X in a single
row. The profiler might be able to answer this question, but so might
someone on this mailing list. I don't need the profiler to know which
question to *ask*.

Also, if

    UPDATE fts4table SET content = content || 'new content'

effectively removes and re-inserts entries for every word in the
existing content, that is prima facie unacceptable. Just because it's
a bad idea to prematurely optimise doesn't mean it's a good idea to
code as if you have no idea of algorithmic complexity.

If there is a more suitable mailing list on which to ask my question,
please let me know.

Thanks,
Hamish
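The row-per-page alternative described above (rows Xi instead of one row X) can be sketched concretely. This is an illustrative schema, not anything from the thread; table and column names are invented, using Python's sqlite3 module:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# One row per document page (the Xi), so no single row ever has to
# hold all of document X.
conn.execute(
    "CREATE TABLE doc_pages (doc_id INTEGER, page_no INTEGER, content TEXT)"
)
conn.executemany(
    "INSERT INTO doc_pages VALUES (?, ?, ?)",
    [(1, i, f"page {i} text. ") for i in range(1, 4)],
)

# Appending a page is an O(page)-sized INSERT rather than an
# O(document)-sized concatenating UPDATE of one big TEXT value.
conn.execute("INSERT INTO doc_pages VALUES (1, 4, 'page 4 text. ')")

# The full document can still be reassembled on demand. (SQLite does
# not formally guarantee group_concat() ordering, but an ordered
# subquery preserves it in practice.)
(doc,) = conn.execute(
    "SELECT group_concat(content, '') FROM "
    "(SELECT content FROM doc_pages WHERE doc_id = 1 ORDER BY page_no)"
).fetchone()
print(doc)
```

The cost, as Hamish notes, is that queries that naturally target whole documents now have to aggregate across page rows.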
Re: [sqlite] Efficiency of concatenation on UPDATE
On 05/02/12 10:13, Hamish Allan wrote:

> 1) Is concatenation on UPDATE performed in-place?

The values for a row are stored sequentially. Changing the size of a
value will at least require rewriting the row. I suggest reading the
documentation on the file format, which will make it clear how
everything is stored:

http://www.sqlite.org/fileformat2.html

I also suggest you use a profiler to work out where your bottlenecks
actually are rather than worrying about theoretical "efficiency" like
this. (Put another way, if your code spends 3% of its time in the
critical path doing these updates then look in the other 97% for
improvements.)

If you are trying to reduce the amount of I/O then you'll need
something repeatable while you tweak and tune. blktrace on Linux can
help see what I/O happens. If you want deeper instrumentation in your
program then you can make a new VFS that calls the existing one but in
the xRead and xWrite routines also grabs the current query being
executed so you can assign blame.

TLDR: Only worry about "efficiency" when you have accurately measured
issues, and then make sure you have instrumentation as you fine tune
to improve it. This is rarely a good use of your time.

Roger
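A full VFS shim requires the C API, but the same "assign blame to the current query" idea can be sketched at statement granularity with the trace callback exposed by Python's sqlite3 module. This is a stand-in for the xRead/xWrite instrumentation Roger describes, not a byte-accurate I/O trace:

```python
import sqlite3

executed = []

# isolation_level=None puts the connection in autocommit mode, so the
# module does not inject implicit BEGIN statements into the trace.
conn = sqlite3.connect(":memory:", isolation_level=None)

# The callback fires once per statement SQLite runs, letting us record
# which query was active when work happened.
conn.set_trace_callback(executed.append)

conn.execute("CREATE TABLE t (content TEXT)")
conn.execute("INSERT INTO t VALUES ('hello')")
conn.execute("UPDATE t SET content = content || ' world'")

for stmt in executed:
    print(stmt)
```

To attribute actual disk reads and writes per query, the VFS approach in Roger's message (or blktrace, externally) is still the right tool.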
[sqlite] Efficiency of concatenation on UPDATE
1) Is concatenation on UPDATE performed in-place? For example, with
the following:

    CREATE TABLE example (content TEXT);
    INSERT INTO example(content) VALUES ('Hello, world!');
    UPDATE example SET content = content || ' How are you?' WHERE rowid = 1;

Is the old value copied out to memory before being concatenated with
the new value and re-inserted, or is SQLite able to avoid that step?
(This is for an incremental index of documents on a low-memory
device.)

2) For the same operation on an FTS4 table:

    CREATE VIRTUAL TABLE example USING fts4(content TEXT);

Is FTS4 smart enough to know not to remove and re-add the 'hello' and
'world' entries in its shadow tables before adding the 'how', 'are'
and 'you'?

Thanks,
Hamish
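The statements in question 1 can be run end-to-end. A minimal reproduction using Python's sqlite3 module, with an in-memory database standing in for the on-disk one:

```python
import sqlite3

# In-memory database stands in for the on-disk one in the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE example (content TEXT)")
conn.execute("INSERT INTO example(content) VALUES ('Hello, world!')")

# The concatenation UPDATE from the question: SQLite evaluates
# content || ' How are you?' and rewrites the row with the new value.
conn.execute(
    "UPDATE example SET content = content || ' How are you?' WHERE rowid = 1"
)

(row,) = conn.execute(
    "SELECT content FROM example WHERE rowid = 1"
).fetchone()
print(row)  # Hello, world! How are you?
```

Whether that rewrite happens byte-by-byte or via a full in-memory copy of the value is exactly what the question asks, and the snippet cannot distinguish the two; the replies later in the thread (OP_Concat, memory_highwater()) address it.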