Re: [sqlite] Efficiency of concatenation on UPDATE

2012-02-06 Thread Hamish Allan
On 6 February 2012 05:20, Dan Kennedy  wrote:

> I think when two doclists are merged both are loaded into memory.
> Some types of queries load the entire doclist for a term into
> memory too.

Hmm, sounds like I definitely need to stick with rows representing
document pages rather than whole documents. Thank you for the advice!
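
For concreteness, here is a rough sketch of what that per-page layout might
look like. The table and column names (and the C wrapper around the SQL) are
made up purely for illustration, not taken from anything in this thread:

#include <sqlite3.h>

static int add_page(sqlite3 *db, int doc_id, int page_no, const char *text)
{
    /* Appending a page is a plain INSERT, so FTS4 only tokenises the new
       text instead of re-indexing the whole document. */
    sqlite3_stmt *stmt;
    int rc = sqlite3_prepare_v2(db,
        "INSERT INTO pages(doc_id, page_no, body) VALUES (?1, ?2, ?3)",
        -1, &stmt, 0);
    if (rc != SQLITE_OK) return rc;
    sqlite3_bind_int(stmt, 1, doc_id);
    sqlite3_bind_int(stmt, 2, page_no);
    sqlite3_bind_text(stmt, 3, text, -1, SQLITE_TRANSIENT);
    rc = sqlite3_step(stmt);
    sqlite3_finalize(stmt);
    return rc == SQLITE_DONE ? SQLITE_OK : rc;
}

int main(void)
{
    sqlite3 *db;
    if (sqlite3_open("index.db", &db) != SQLITE_OK) return 1;

    /* One FTS4 row per document page; doc_id/page_no are kept as ordinary
       FTS columns here purely to keep the example self-contained. */
    sqlite3_exec(db,
        "CREATE VIRTUAL TABLE pages USING fts4(doc_id, page_no, body)",
        0, 0, 0);

    add_page(db, 1, 1, "Hello, world!");
    add_page(db, 1, 2, "How are you?");

    /* Searching still works across pages, e.g.:
       SELECT doc_id, page_no FROM pages WHERE pages MATCH 'hello'; */
    sqlite3_close(db);
    return 0;
}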

Best wishes,
Hamish


Re: [sqlite] Efficiency of concatenation on UPDATE

2012-02-05 Thread Dan Kennedy

On 02/06/2012 05:54 AM, Hamish Allan wrote:

> On 5 February 2012 21:20, Roger Binns  wrote:
>
>> SQLite doesn't work on rows - it works on pages.  A row will be contained
>> within one or more pages.
>
> FWIW, I inspected the source for OP_Concat and found that it can
> sometimes avoid a memcpy (but presumably not if there isn't enough
> space in contiguous free pages).
>
>> What are you already doing with regards to size and what are your
>> constraints?  For example the default values will result in a page cache
>> of 2MB.
>
> I don't have hard constraints. This is for iOS, which monitors user
> processes and kills them for what it deems unruly memory usage, where
> "unruly" is not publicly defined and is subject to change between OS
> versions!
>
>> There are numerous existing APIs in SQLite for having fine grained control
>> over memory allocation.
>
> Thanks for the tips! memory_highwater() agrees with my profiler that
> the transient memory requirements are largely flat under standard
> concatenation :)
>
> However, FTS4 does appear to take the naive approach of removing and
> re-adding everything from its index every time.
>
> Inspecting the source in sqlite3Fts3UpdateMethod(), I see that an UPDATE
> acts just like a DELETE in removing the old record. Top-level comments
> state that FTS segments are built from the bottom up and are not
> updatable.
>
> However, the doclist format seems to be written in such a way that
> positions could be copied wholesale from the old record to the new,
> prior to appending new positions. Please, can anyone familiar with the
> code comment on this?


I think when two doclists are merged both are loaded into memory.
Some types of queries load the entire doclist for a term into
memory too.

Dan.


Re: [sqlite] Efficiency of concatenation on UPDATE

2012-02-05 Thread Hamish Allan
On 5 February 2012 21:20, Roger Binns  wrote:

> SQLite doesn't work on rows - it works on pages.  A row will be contained
> within one or more pages.

FWIW, I inspected the source for OP_Concat and found that it can
sometimes avoid a memcpy (but presumably not if there isn't enough
space in contiguous free pages).

> What are you already doing with regards to size and what are your
> constraints?  For example the default values will result in a page cache
> of 2MB.

I don't have hard constraints. This is for iOS, which monitors user
processes and kills them for what it deems unruly memory usage, where
"unruly" is not publicly defined and is subject to change between OS
versions!

> There are numerous existing APIs in SQLite for having fine grained control
> over memory allocation.

Thanks for the tips! memory_highwater() agrees with my profiler that
the transient memory requirements are largely flat under standard
concatenation :)
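
Roughly what that check looks like, as a sketch (assuming an open connection
"db" and the "example" table from my original message; the exact SQL and
output format are illustrative):

#include <sqlite3.h>
#include <stdio.h>

static void measure_update(sqlite3 *db)
{
    sqlite3_memory_highwater(1);   /* reset the peak to the current usage */

    sqlite3_exec(db,
        "UPDATE example SET content = content || ' How are you?' "
        "WHERE rowid = 1",
        0, 0, 0);

    printf("currently allocated: %lld bytes\n",
           (long long)sqlite3_memory_used());
    printf("peak during UPDATE:  %lld bytes\n",
           (long long)sqlite3_memory_highwater(0));
}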

However, FTS4 does appear to take the naive approach of removing and
re-adding everything from its index every time.

Inspecting the source in sqlite3Fts3UpdateMethod(), I see that an UPDATE
acts just like a DELETE in removing the old record. Top-level comments
state that FTS segments are built from the bottom up and are not
updatable.

However, the doclist format seems to be written in such a way that
positions could be copied wholesale from the old record to the new,
prior to appending new positions. Please, can anyone familiar with the
code comment on this?

Many thanks,
Hamish


Re: [sqlite] Efficiency of concatenation on UPDATE

2012-02-05 Thread Roger Binns

On 05/02/12 12:09, Hamish Allan wrote:
> Sure, but a row can be re-written by copying each byte to memory and 
> back out again,

SQLite doesn't work on rows - it works on pages.  A row will be contained
within one or more pages.

> WRT premature optimisation, I just need to keep the high memory 
> watermark as low as I can.

You should have said that!  SQLite has been used in some very low memory
devices.  You can purchase support if you need to get lower than stock
SQLite.  (For example they have a way you can omit the parser by having
the prepared queries stored in the database.)

What are you already doing with regards to size and what are your
constraints?  For example the default values will result in a page cache
of 2MB.

There are numerous existing APIs in SQLite for having fine grained control
over memory allocation.

  http://www.sqlite.org/c3ref/c_config_getmalloc.html
  http://www.sqlite.org/c3ref/soft_heap_limit64.html
  http://www.sqlite.org/c3ref/memory_highwater.html

Run your representative workload and see what numbers you get and tune
accordingly.
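
For example, a minimal sketch of wiring a couple of these up; the 4MB soft
heap limit and 200-page cache below are arbitrary illustrations, not
recommended values:

#include <sqlite3.h>

int open_small(sqlite3 **pDb, const char *path)
{
    int rc = sqlite3_open(path, pDb);
    if (rc != SQLITE_OK) return rc;

    /* Ask SQLite to try to keep total heap usage under ~4MB. */
    sqlite3_soft_heap_limit64(4 * 1024 * 1024);

    /* Shrink this connection's page cache well below the ~2MB default. */
    sqlite3_exec(*pDb, "PRAGMA cache_size = 200", 0, 0, 0);

    return SQLITE_OK;
}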

Roger


Re: [sqlite] Efficiency of concatenation on UPDATE

2012-02-05 Thread Hamish Allan
On 5 February 2012 19:11, Roger Binns  wrote:

> The values for a row are stored sequentially.  Changing the size of a
> value will at least require rewriting the row.

Sure, but a row can be re-written by copying each byte to memory and
back out again, or by copying the whole row into memory and then back
out again. The former case uses O(1) space, the latter O(n). Reading
the file format doesn't help me differentiate between these cases.

WRT premature optimisation, I just need to keep the high memory
watermark as low as I can. Consider a document X split into pages X1,
X2... Xn, where n can be arbitrarily large. I don't have enough memory
to contain all the text in X, so I need to work out whether I need to
make my rows Xi, which will make the kind of query I want to do much
less efficient, or whether I can store the whole of X in a single row.
The profiler might be able to answer this question, but so might
someone on this mailing list. I don't need the profiler to know which
question to *ask*.

Also, if UPDATE fts4table SET content = content || 'new content'
effectively removes and re-inserts entries for every word in the
existing content, this is prima facie unacceptable. Just because it's
a bad idea to prematurely optimise doesn't mean it's a good idea to
code as if you have no idea of algorithmic complexity.

If there is a more suitable mailing list on which to ask my question,
please let me know.

Thanks,
Hamish


Re: [sqlite] Efficiency of concatenation on UPDATE

2012-02-05 Thread Roger Binns

On 05/02/12 10:13, Hamish Allan wrote:
> 1) Is concatenation on UPDATE performed in-place?

The values for a row are stored sequentially.  Changing the size of a
value will at least require rewriting the row.

I suggest reading the documentation on the file format which will make it
clear how everything is stored:

  http://www.sqlite.org/fileformat2.html

I also suggest you use a profiler to work out where your bottlenecks
actually are rather than worrying about theoretical "efficiency" like
this.  (Put another way, if your code spends 3% of its time in the
critical path doing these updates then look in the other 97% for
improvements.)

If you are trying to reduce the amount of I/O then you'll need something
repeatable while you tweak and tune.  blktrace on Linux can help you see
what I/O happens.  If you want deeper instrumentation in your program, you
can make a new VFS that wraps the existing one but, in the xRead and
xWrite routines, also grabs the current query being executed so you can
assign blame.
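
A full wrapper VFS takes a fair amount of boilerplate; as a lighter-weight
sketch of the same "assign blame per query" idea, the per-connection pager
counters (sqlite3_db_status with SQLITE_DBSTATUS_CACHE_HIT/MISS, available
since 3.7.9) can be sampled around each statement. Note this is a different,
simpler technique than the VFS shim described above:

#include <sqlite3.h>
#include <stdio.h>

static void run_and_blame(sqlite3 *db, const char *sql)
{
    int hit = 0, miss = 0, hiwtr = 0;

    /* Zero the counters before running the statement. */
    sqlite3_db_status(db, SQLITE_DBSTATUS_CACHE_HIT,  &hit,  &hiwtr, 1);
    sqlite3_db_status(db, SQLITE_DBSTATUS_CACHE_MISS, &miss, &hiwtr, 1);

    sqlite3_exec(db, sql, 0, 0, 0);

    sqlite3_db_status(db, SQLITE_DBSTATUS_CACHE_HIT,  &hit,  &hiwtr, 0);
    sqlite3_db_status(db, SQLITE_DBSTATUS_CACHE_MISS, &miss, &hiwtr, 0);

    /* A cache miss means a page had to be read from disk. */
    printf("%s\n  pages served from cache: %d, pages read from disk: %d\n",
           sql, hit, miss);
}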

TLDR: Only worry about "efficiency" when you have accurately measured
issues, and then make sure you have instrumentation as you fine tune to
improve it.  This is rarely a good use of your time.

Roger


[sqlite] Efficiency of concatenation on UPDATE

2012-02-05 Thread Hamish Allan
1) Is concatenation on UPDATE performed in-place?

For example, with the following:

CREATE TABLE example (content TEXT);
INSERT INTO example(content) VALUES ('Hello, world!');
UPDATE example SET content = content || ' How are you?' WHERE rowid = 1;

Is the old value copied out to memory before being concatenated with
the new value and re-inserted, or is SQLite able to avoid that step?
(This is for an incremental index of documents on a low-memory
device.)

2) For the same operation on an FTS4 table:

CREATE VIRTUAL TABLE example USING fts4(content TEXT);

Is FTS4 smart enough to know not to remove and re-add the 'hello' and
'world' from its shadow tables before adding the 'how', 'are' and
'you'?

Thanks,
Hamish