Re: [HACKERS] TOAST usage setting

2007-06-08 Thread Zeugswetter Andreas ADI SD
My next suggestion would be to leave EXTERN_TUPLES_PER_PAGE as is, but: Split data wider than a page into page sized chunks as long as they fill whole pages. Split the rest with EXTERN_TUPLES_PER_PAGE (4) as now. This would not waste more space than currently, but improve performance

Re: [HACKERS] TOAST usage setting

2007-06-07 Thread Bruce Momjian
Gregory Stark wrote: Bruce Momjian [EMAIL PROTECTED] writes: Well, it is summarized here: http://momjian.us/expire/TOAST/SUMMARY.html It made non-TOAST access 2x faster, but TOAST 7x slower, and that seemed like a good compromise. Is this still testing with all data fitting

Re: [HACKERS] TOAST usage setting

2007-06-07 Thread Gregory Stark
Bruce Momjian [EMAIL PROTECTED] writes: Gregory Stark wrote: Is this still testing with all data fitting in RAM? Yes. Having things out of RAM is going to make access even slower, but it is going to allow the heap to be in RAM more often. It would let us measure the actual impact of

Re: [HACKERS] TOAST usage setting

2007-06-06 Thread Zeugswetter Andreas ADI SD
No, you misunderstood. Bruce was suggesting changing the target to 512. That means if a row is wider than ~2k, toaster will try to toast until the base row is ~512 bytes. I would not do that part for 8.3. OK, what do you suggest for 8.3? Attached are my suggestion to use 512 and a

Re: [HACKERS] TOAST usage setting

2007-06-06 Thread Bruce Momjian
Zeugswetter Andreas ADI SD wrote: No, you misunderstood. Bruce was suggesting changing the target to 512. That means if a row is wider than ~2k, toaster will try to toast until the base row is ~512 bytes. I would not do that part for 8.3. OK, what do you suggest for 8.3?

Re: [HACKERS] TOAST usage setting

2007-06-06 Thread Gregory Stark
Bruce Momjian [EMAIL PROTECTED] writes: Well, it is summarized here: http://momjian.us/expire/TOAST/SUMMARY.html It made non-TOAST access 2x faster, but TOAST 7x slower, and that seemed like a good compromise. Is this still testing with all data fitting in RAM? -- Gregory Stark

Re: [HACKERS] TOAST usage setting

2007-06-06 Thread Zeugswetter Andreas ADI SD
While I agree, that 2 might be a good compromise with low risc for now, I think that toasting all rows down to ~512 bytes is too narrowly targeted at not reading wider columns. Well, it is summarized here: http://momjian.us/expire/TOAST/SUMMARY.html It made non-TOAST access

Re: [HACKERS] TOAST usage setting

2007-06-05 Thread Zeugswetter Andreas ADI SD
The big question is do we want to drop the target tuple size down to 512, and increase the chunk size to 8k for 8.3? Dropping the tuple size down to 512 is going to give us some smaller TOAST values to fill in free space created by the 8k chuck size, assuming you have both types

Re: [HACKERS] TOAST usage setting

2007-06-05 Thread Bruce Momjian
Zeugswetter Andreas ADI SD wrote: The big question is do we want to drop the target tuple size down to 512, and increase the chunk size to 8k for 8.3? Dropping the tuple size down to 512 is going to give us some smaller TOAST values to fill in free space created by the 8k

Re: [HACKERS] TOAST usage setting

2007-06-04 Thread Jim C. Nasby
On Fri, Jun 01, 2007 at 01:50:12PM -0400, Bruce Momjian wrote: I think the long-term solution is to go to a 2k/8k fragment/block model, but that isn't going to happen for 8.3. There might well have been lessons learned since UFS (anyone know what ZFS does in this regard?), but I agree that we

Re: [HACKERS] TOAST usage setting

2007-06-04 Thread Tom Lane
Jim C. Nasby [EMAIL PROTECTED] writes: On Fri, Jun 01, 2007 at 01:50:12PM -0400, Bruce Momjian wrote: The big question is do we want to drop the target tuple size down to 512, and increase the chunk size to 8k for 8.3? If we do that people could see their disk space usage increase by up to

Re: [HACKERS] TOAST usage setting

2007-06-01 Thread Bruce Momjian
Gregory Stark wrote: Gregory Stark [EMAIL PROTECTED] writes: Bruce Momjian [EMAIL PROTECTED] writes: shared_buffers again was 32MB so all the data was in memory. The case where all the data is in memory is simply not interesting. The cost of TOAST is the random access seeks it

Re: [HACKERS] TOAST usage setting

2007-06-01 Thread Bruce Momjian
Zeugswetter Andreas ADI SD wrote: It is good. It shows, that we even see a small advantage in the everything cached case. What we don't have yet is numbers for whether EXTERN_TUPLES_PER_PAGE=1 substantially increases the toast table size for real life scenarios, what happens in the worst

Re: [HACKERS] TOAST usage setting

2007-05-31 Thread Gregory Stark
Bruce Momjian [EMAIL PROTECTED] writes: shared_buffers again was 32MB so all the data was in memory. The case where all the data is in memory is simply not interesting. The cost of TOAST is the random access seeks it causes. You seem to be intentionally avoiding testing the precise thing we're

Re: [HACKERS] TOAST usage setting

2007-05-31 Thread Gregory Stark
Gregory Stark [EMAIL PROTECTED] writes: Bruce Momjian [EMAIL PROTECTED] writes: shared_buffers again was 32MB so all the data was in memory. The case where all the data is in memory is simply not interesting. The cost of TOAST is the random access seeks it causes. You seem to be

Re: [HACKERS] TOAST usage setting

2007-05-31 Thread Zeugswetter Andreas ADI SD
I tested EXTERN_TUPLES_PER_PAGE for values 4(default), 2, and 1: 4 15.596 2 15.197 1 14.6 which is basically a 3% decrease from 4-2 and 2-1. The test script and result are here: http://momjian.us/expire/TOAST2/ shared_buffers again was

Re: [HACKERS] TOAST usage setting

2007-05-30 Thread Zeugswetter Andreas ADI SD
I reran the tests with hashtext(), and created a SUMMARY.HTML chart: http://momjian.us/expire/TOAST/ What you will see is that pushing TEXT to a TOAST column allows quick access to non-TOAST values and single-row TOAST values, but accessing all TOAST columns is slower than

Re: [HACKERS] TOAST usage setting

2007-05-30 Thread Tom Lane
Zeugswetter Andreas ADI SD [EMAIL PROTECTED] writes: My expectation would be, that we want to allow a toast tuple to fill a whole page (TOAST_TUPLES_PER_PAGE = 1), I've been wondering about that too. It certainly needs to be experimented with, independently of TOAST_TUPLE_THRESHOLD, now that

Re: [HACKERS] TOAST usage setting

2007-05-30 Thread Gregory Stark
Tom Lane [EMAIL PROTECTED] writes: Zeugswetter Andreas ADI SD [EMAIL PROTECTED] writes: My expectation would be, that we want to allow a toast tuple to fill a whole page (TOAST_TUPLES_PER_PAGE = 1), I thought that previously but then I started thinking maybe that's not true. Picture

Re: [HACKERS] TOAST usage setting

2007-05-30 Thread Tom Lane
Gregory Stark [EMAIL PROTECTED] writes: Whereas if you set toast_tuples_per_page to 8k then the only option for Postgres will be to put each datum in its own page and waste 1-3k on every page. No, because actually the code is designed to make the toast chunk size just enough less than 8K that

Re: [HACKERS] TOAST usage setting

2007-05-30 Thread Gregory Stark
Tom Lane [EMAIL PROTECTED] writes: The shorter-than-normal tuples carrying the last chunk of any particular datum are going to result in wasted space to the extent that we can't pack them together on a page, but that's true now. Right now, if you have a large toasted datum, it mostly will

Re: [HACKERS] TOAST usage setting

2007-05-30 Thread Zeugswetter Andreas ADI SD
Whereas if you set toast_tuples_per_page to 8k then the only option for Postgres will be to put each datum in its own page and waste 1-3k on every page. No, because actually the code is designed to make the toast chunk size just enough less than 8K that the tuples fit. He

Re: [HACKERS] TOAST usage setting

2007-05-30 Thread Bruce Momjian
Tom Lane wrote: Gregory Stark [EMAIL PROTECTED] writes: Whereas if you set toast_tuples_per_page to 8k then the only option for Postgres will be to put each datum in its own page and waste 1-3k on every page. No, because actually the code is designed to make the toast chunk size just

Re: [HACKERS] TOAST usage setting

2007-05-30 Thread Heikki Linnakangas
Bruce Momjian wrote: What you will see is that pushing TEXT to a TOAST column allows quick access to non-TOAST values and single-row TOAST values, but accessing all TOAST columns is slower than accessing them in the heap, by a factor of 3-18x. Looking at the chart, it seems 512 is the proper

Re: [HACKERS] TOAST usage setting

2007-05-30 Thread Gregory Stark
Bruce Momjian [EMAIL PROTECTED] writes: Uh, am I supposed to be running more TOAST tests? Would someone explain what they want tested? If you want my opinion I would say we need two tests: 1) For TOAST_TUPLE_TARGET: We need to run the test scripts you have already for sizes that cause

Re: [HACKERS] TOAST usage setting

2007-05-30 Thread Bruce Momjian
I tested EXTERN_TUPLES_PER_PAGE for values 4(default), 2, and 1: 4 15.596 2 15.197 1 14.6 which is basically a 3% decrease from 4-2 and 2-1. The test script and result are here: http://momjian.us/expire/TOAST2/ shared_buffers again was 32MB

Re: [HACKERS] TOAST usage setting

2007-05-29 Thread Gregory Stark
Bruce Momjian [EMAIL PROTECTED] writes: I tested TOAST using a method similar to the above method against CVS HEAD, with default shared_buffers = 32MB and no assert()s. I created backends with power-of-2 seetings for TOAST_TUPLES_PER_PAGE (4(default), 8, 16, 32, 64) which gives

Re: [HACKERS] TOAST usage setting

2007-05-29 Thread Bruce Momjian
Gregory Stark wrote: Bruce Momjian [EMAIL PROTECTED] writes: I tested TOAST using a method similar to the above method against CVS HEAD, with default shared_buffers = 32MB and no assert()s. I created backends with power-of-2 seetings for TOAST_TUPLES_PER_PAGE (4(default), 8, 16, 32,

Re: [HACKERS] TOAST usage setting

2007-05-29 Thread Gregory Stark
Bruce Momjian [EMAIL PROTECTED] writes: Gregory Stark wrote: Bruce Momjian [EMAIL PROTECTED] writes: I tested TOAST using a method similar to the above method against CVS HEAD, with default shared_buffers = 32MB and no assert()s. I created backends with power-of-2 seetings for

Re: [HACKERS] TOAST usage setting

2007-05-29 Thread Gregory Stark
Gregory Stark [EMAIL PROTECTED] writes: Is your database initialized with C locale? If so then length(text) is optimized to not have to detoast: if (pg_database_encoding_max_length() == 1) PG_RETURN_INT32(toast_raw_datum_size(str) - VARHDRSZ); Of course I got that wrong.

Re: [HACKERS] TOAST usage setting

2007-05-29 Thread Bruce Momjian
Gregory Stark wrote: Bruce Momjian [EMAIL PROTECTED] writes: Gregory Stark wrote: Bruce Momjian [EMAIL PROTECTED] writes: I tested TOAST using a method similar to the above method against CVS HEAD, with default shared_buffers = 32MB and no assert()s. I created backends with

Re: [HACKERS] TOAST usage setting

2007-05-29 Thread Alvaro Herrera
Bruce Momjian wrote: Gregory Stark wrote: Bruce Momjian [EMAIL PROTECTED] writes: Gregory Stark wrote: Bruce Momjian [EMAIL PROTECTED] writes: I tested TOAST using a method similar to the above method against CVS HEAD, with default shared_buffers = 32MB and no assert()s. I

Re: [HACKERS] TOAST usage setting

2007-05-29 Thread Bruce Momjian
Alvaro Herrera wrote: Strangely, 128 bytes seems to be the break-even point for TOAST and non-TOAST, even for sequential scans of the entire heap touching all long row values. I am somewhat confused why TOAST has faster access than inline heap data. Is your database

Re: [HACKERS] TOAST usage setting

2007-05-29 Thread Gregory Stark
Bruce Momjian [EMAIL PROTECTED] writes: Gregory Stark wrote: Is your database initialized with C locale? If so then length(text) is optimized to not have to detoast: if (pg_database_encoding_max_length() == 1) PG_RETURN_INT32(toast_raw_datum_size(str) - VARHDRSZ); Wow,

Re: [HACKERS] TOAST usage setting

2007-05-29 Thread Alvaro Herrera
Bruce Momjian wrote: My test uses random data, which I figured was a close to real-world as I could get, and I have a test that makes sure the data was pushed to the TOAST table. Should I still try EXTERNAL? My point is that you probably want to measure separately the effect of compression

Re: [HACKERS] TOAST usage setting

2007-05-29 Thread Bruce Momjian
Alvaro Herrera wrote: Bruce Momjian wrote: My test uses random data, which I figured was a close to real-world as I could get, and I have a test that makes sure the data was pushed to the TOAST table. Should I still try EXTERNAL? My point is that you probably want to measure

Re: [HACKERS] TOAST usage setting

2007-05-29 Thread Bruce Momjian
Gregory Stark wrote: Bruce Momjian [EMAIL PROTECTED] writes: Gregory Stark wrote: Is your database initialized with C locale? If so then length(text) is optimized to not have to detoast: if (pg_database_encoding_max_length() == 1)

Re: [HACKERS] TOAST usage setting

2007-05-29 Thread Gregory Stark
Bruce Momjian [EMAIL PROTECTED] writes: No, we did substring() too :) Uh, I looked at text_substring(), and while there is an optimization to do character counting for encoding length == 1, it is still accessing the data. Sure but it'll only access the first chunk. There are two chunks in

Re: [HACKERS] TOAST usage setting

2007-05-29 Thread Bruce Momjian
Gregory Stark wrote: Bruce Momjian [EMAIL PROTECTED] writes: No, we did substring() too :) Uh, I looked at text_substring(), and while there is an optimization to do character counting for encoding length == 1, it is still accessing the data. Sure but it'll only access the first

Re: [HACKERS] TOAST usage setting

2007-05-29 Thread Bruce Momjian
Bruce Momjian wrote: Gregory Stark wrote: Bruce Momjian [EMAIL PROTECTED] writes: No, we did substring() too :) Uh, I looked at text_substring(), and while there is an optimization to do character counting for encoding length == 1, it is still accessing the data. Sure but

Re: [HACKERS] TOAST usage setting

2007-05-29 Thread Tom Lane
Bruce Momjian [EMAIL PROTECTED] writes: I reran the tests with hashtext(), and created a SUMMARY.HTML chart: http://momjian.us/expire/TOAST/ I don't understand what the numbers in this chart are? regards, tom lane ---(end of

Re: [HACKERS] TOAST usage setting

2007-05-29 Thread Bruce Momjian
Tom Lane wrote: Bruce Momjian [EMAIL PROTECTED] writes: I reran the tests with hashtext(), and created a SUMMARY.HTML chart: http://momjian.us/expire/TOAST/ I don't understand what the numbers in this chart are? They are taken from the test script and output files that are also in

Re: [HACKERS] TOAST usage setting

2007-05-28 Thread Chris Browne
[EMAIL PROTECTED] (Bruce Momjian) writes: The results are here: http://momjian.us/expire/TOAST/ I'll take a look and see if there's anything further it makes sense for me to try testing. Thanks for following up so quickly; what with the cold I have had, I haven't yet gotten back to the