recyclable empty pages.
I think piggybacking of I/Os is very useful. The buffer manager helps us
fold up some of the I/Os, but explicit ordering is more effective.
Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center
, in the case we can retrieve several pages in one disk read, etc.
This is also an optimizing issue.
Any ideas?
Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center
---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster
the patch. It must be fixed, but I want to measure the advantage
before that.
I'm interested in which parameter is useful for each environment.
Any comments and testing reports will be appreciated.
Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center
statements, and so on.
Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center
Jochem van Dieten [EMAIL PROTECTED] wrote:
On 12/28/06, ITAGAKI Takahiro wrote:
| [TODO item] Allow data to be pulled directly from indexes
| Another idea is to maintain a bitmap of heap pages where all rows are
| visible to all backends, and allow index lookups to reference that bitmap
is not in both bitmaps. Otherwise, put it into (2).
We can use (1) for the index only scanning, because all of the tuples not
recorded in (1) are visible to all backends, regardless of whether they
are also recorded in (2).
Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center
, we had better set those values as the
default. I assume we can derive them from the existing checkpoint_timeout.
Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center
It is not done yet, but we can use DSM for this purpose. If the corresponding
bit in DSM is '0', all tuples in the page are frozen and visible to all
backends. We don't have to look up frozen pages just for visibility checking.
Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center
    time_t start = time(NULL);
    while (time(NULL) - start < 30)     // (2) sleep -- 30s
    {
        pg_usleep(BgWriterDelay * 1000L);
        BgBufferSync();
        AbsorbFsyncRequests();
    }
    smgrsync();                         // (3) fsync -- less than 200ms
}
Regards,
---
ITAGAKI
#define WIFSIGNALED(w) (((w) & 0x4000) != 0)
#define WTERMSIG(w) (w) // or ((w) & 0x3FFF)
However, it comes from reverse engineering of the headers of Windows.
I cannot find any official documentation.
Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center
...
Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center
O_SYNC or O_DIRECT, but very poor performance.
4. We may settle for single fsync(), but not many fsync()s in a short time.
I just suggested 4.
Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center
checkpoint_segments, or else new WAL files would be
created unboundedly, as Bruce pointed out.
Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center
a difficult combination
of tuning. If you have *ideas for improvement*, please suggest them.
I think we've already understood the *problem itself*.
Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center
the same way to fsync? We call fsync() on all modified
files without any pause in mdsync(), but it's not difficult at all to insert
sleeps between the fsync()s. Do you think it would help us? One issue is that
we have to sleep per file, which may be too coarse a granularity.
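The idea of inserting sleeps between fsync()s can be sketched in plain POSIX C. This is only an illustration under stated assumptions, not the mdsync() code: the function name fsync_with_pauses and the pause_usec parameter are hypothetical, and the pause is applied once per file, which is exactly the coarse granularity mentioned above.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical helper (not PostgreSQL code): fsync each descriptor in turn,
 * pausing pause_usec microseconds between consecutive files.
 * Returns the number of descriptors that were synced successfully.
 */
static size_t
fsync_with_pauses(const int *fds, size_t nfds, long pause_usec)
{
    size_t synced = 0;

    assert(fds != NULL || nfds == 0);

    for (size_t i = 0; i < nfds; i++)
    {
        if (fsync(fds[i]) == 0)
            synced++;
        else
            perror("fsync");

        /* Sleep only between files: the granularity is one whole file. */
        if (i + 1 < nfds && pause_usec > 0)
        {
            struct timespec ts;

            ts.tv_sec = pause_usec / 1000000L;
            ts.tv_nsec = (pause_usec % 1000000L) * 1000L;
            nanosleep(&ts, NULL);
        }
    }
    return synced;
}
```

A large file still produces one uninterrupted burst of writes in this sketch, so the sleeps smooth the load only across files, not within one.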
Regards,
---
ITAGAKI Takahiro
NTT
ones in the tail of LRU? We might need activity control of bgwriter. Buffers
are reused rapidly in VACUUM or bulk insert, so bgwriter is not sufficient
if its settings are the same as usual.
Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center
the result will be between (3) and (5).
Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center
files by renaming the old ones to higher numbers (we can't rename
them until the checkpoint is complete)?
Checkpoints should be done by the next one, so we need WAL files for two
checkpoints. It is the same as now.
Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center
entries generated in the index-vacuuming or heap-vacuuming
phase are not so serious. However, entries for FREEZE are generated in
the heap-scanning phase, which comes before index-vacuuming.
Are there any better fixes? Comments welcome.
Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center
basis, but fsync() only on a file basis.
Also, the database has its own access-frequency information for its buffers,
so I think the first approach behaves better in handling re-dirtied buffers.
Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center
if you could give us some more background.
- discuss various other approaches to the problem, and why we are now
proposing one specific approach and receive "why don't we..." feedback
and additional ideas (Simon)
Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center
.
The above ideas probably do not work well.
Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center
goal to shrink the headers is 16 bytes. The headers
become 23 bytes using phantom cids and we are limited by alignment, so we will
gain nothing more unless we remove an extra 7 bytes from the headers.
...and it seems to be very difficult.
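The alignment limit can be checked with a little arithmetic. The sketch below assumes an 8-byte maximum alignment (the constant ALIGNOF_MAX and the helper max_align are illustrative names, not the PostgreSQL macros): any header length from 17 to 24 bytes rounds up to 24, so nothing is gained until the header reaches 16.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed alignment boundary; 8 bytes is an assumption for this example. */
#define ALIGNOF_MAX 8

/* Round len up to the next multiple of ALIGNOF_MAX (a power of two). */
static size_t
max_align(size_t len)
{
    return (len + (ALIGNOF_MAX - 1)) & ~((size_t) (ALIGNOF_MAX - 1));
}
```

Under that assumption, max_align(23) is 24 and max_align(16) is 16, so the remaining 7 bytes are what separates the two aligned sizes.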
Regards,
---
ITAGAKI Takahiro
NTT Open Source Software
database size ?
(the latter is a bit too radical interpretation, though.)
So I think it is not so odd to give a unit to max_fsm_pages.
Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center
GUC_UNIT_BLOCKS and
GUC_UNIT_XLOG_BLCKSZ unit? I feel inconsistency in them.
Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center
, &rfds, NULL, NULL, &sel_timeout) < 0)
| { ...
After stuck, select() always results in time-out. LOG: pgstat select is
repeated every 2 seconds (maybe PGSTAT_SELECT_TIMEOUT).
Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center
to work very well!
I ran the same workload on the HEAD, and I did not see any
pgstat.stat related logs now.
Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center
and Windows?
Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center
the patch.
Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center
snapshot_subtrans.patch
Description: Binary data
of superuser_reserved_connections.
Thank you!
It might be good to add the same note to the description of
superuser_reserved_connections.
| Determines the number of connection slots that are reserved for connections
| by PostgreSQL superusers, *including autovacuum*.
Regards,
---
ITAGAKI Takahiro
few people have seen it. Autovacuum can always start in normal use.
(I found it when I tried to make multi-processed autovacuum.)
Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center
(bid) references branches (bid);
...
Are you interested in this idea?
Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center
.
This is an oversimplified policy, but we probably need documentation for
the linkage between autovacuum and fillfactors.
Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center
my, *NTT*'s presentation :D
But sorry, there are my company-specific troubles to open the source :-(
I missed 8.2 feature freeze, so gave up proposing it.
Please wait for a moment, or invent more effective checkpoint methods!
Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center
. Is this in-line with what others were thinking?
I agree. We can use autovacuum thresholds and cost-delay parameters to
control the frequency and priority of vacuum. I don't think it is good
to control vacuums by changing naptime.
Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center
? and "How to set the fillfactor?".
I hope you will write how to interpret the fragmentation (and other) info
in the README. In my understanding, I would write "You'd better do REINDEX when
you see the fragmentation is greater than 50%" under the present
calculation method.
Regards,
---
ITAGAKI Takahiro
NTT Open
of the indexes are the same.
I worry that users will misunderstand the 50% of fragmentation -- if the
report says 100%, they'll consider doing REINDEX. But at 50%, the necessity
is unclear.
Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center
?
Comments welcome.
Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center
autovacuum_adjust_naptime-0817.patch
Description: Binary data
which method is better? Or do you have other ideas?
Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center
of buffers. A sequential scan
is used in SLRU, so it will not work well with many buffers.
I hope for some care in the upper layers (snapshot, hint bits or something),
as being discussed in the recent thread.
Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center
and DTrace methods. I assume we want to gather the
statistics per resource (represented by LWLockKind in my patch), not per
LWLockId.
Even if we use DTrace, do we need some supports for coloring of lwlocks?
Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center
SELECT abalance FROM accounts WHERE aid::int8 = :aid; -- cast to force seqscan
-- cs_indexscan.sql
\set naccounts 10 * :tps
\setrandom aid 1 :naccounts
SELECT abalance FROM accounts WHERE aid = :aid;
Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center
-to-stderr method is hard to use because it will increase syslogs
and requires re-parsing efforts.
Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center
| 0 | 0 | 0 |
0 |
(28 rows)
Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center
a little
!* free space is inefficient.
!*/
! if (ndeletable >= 2)
_bt_delitems(rel, buffer, deletable, ndeletable);
/*
* Note: if we didn't find any LP_DELETE items, then the page's
Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center
| dead_tuple_count
-+--
900 |0
Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center
Tom Lane [EMAIL PROTECTED] wrote:
ITAGAKI Takahiro [EMAIL PROTECTED] writes:
I've applied this but I'm now having some second thoughts about it,
because I'm seeing an actual *decrease* in pgbench numbers from the
immediately prior CVS HEAD code.
Had you done any performance testing
at-a-time, so the problem is resolved, isn't it?
http://archives.postgresql.org/pgsql-patches/2006-05/msg8.php
I think this feature is independent from the SITC project and useful for
heavily-updated indexes. If it is worthwhile, I'll revise the patch to
catch up on HEAD.
Regards,
---
ITAGAKI
,
---
ITAGAKI Takahiro
NTT Open Source Software Center
on fixed size integers.
Comments and suggestions are welcome.
Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center
resolve this by adding a T_String handler to defGetBoolean().
Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center
pseudo-code. (I don't know whether it works in practice...)
my_sync_file_range(fd, offset, nbytes, ...)
{
    void *p = mmap(NULL, nbytes, ..., fd, offset);
    msync(p, nbytes, MS_ASYNC);
    munmap(p, nbytes);
}
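The pseudo-code can be filled out into a compilable sketch under some assumptions that the original leaves open: the mapping uses PROT_READ and MAP_SHARED, the offset is rounded down to a page boundary because mmap() requires that, and errors are reported with a -1 return. Whether MS_ASYNC really starts write-back depends on the operating system (recent Linux kernels document it as a no-op), so this shows the call sequence only and is not evidence that the approach is effective.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical range flush: map the range, request asynchronous write-back
 * with MS_ASYNC, then unmap. The mapping is never read or written through.
 * Returns 0 on success and -1 on failure.
 */
static int
my_sync_file_range(int fd, off_t offset, size_t nbytes)
{
    long    pagesize = sysconf(_SC_PAGESIZE);
    off_t   start;
    size_t  length;
    void   *p;
    int     rc;

    assert(offset >= 0);
    if (nbytes == 0)
        return 0;
    if (pagesize <= 0)
        return -1;

    /* mmap() needs a page-aligned file offset, so round down. */
    start = offset - (offset % pagesize);
    length = nbytes + (size_t) (offset - start);

    p = mmap(NULL, length, PROT_READ, MAP_SHARED, fd, start);
    if (p == MAP_FAILED)
    {
        perror("mmap");
        return -1;
    }

    rc = msync(p, length, MS_ASYNC);
    if (munmap(p, length) != 0)
        rc = -1;
    return rc;
}
```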
Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center
system call, but we could use
the combination of mmap() and msync() instead of it; I mean we can use
mmap only to flush dirty pages, not to read or write pages.
---
ITAGAKI Takahiro
NTT Open Source Software Center
the saving of a few bytes in particular for indexes
on VLDBs, but my patch is still incomplete and needs more work.
---
ITAGAKI Takahiro
NTT Open Source Software Center
, I'll add an options array to pg_class instead of the fixed field for
fillfactor, referring to the aclitem.
---
ITAGAKI Takahiro
NTT Open Source Software Center
the implementation should be carefully designed to avoid lock contention.
Comments are welcome.
---
ITAGAKI Takahiro
NTT OSS Center
.
- ALTER TABLE/INDEX name SET (...)
I appreciate any comments.
---
ITAGAKI Takahiro
NTT OSS Center
Space Map (Heikki Linnakangas)
http://archives.postgresql.org/pgsql-hackers/2006-02/msg01125.php
| vacuuming pages one by one as they're written by bgwriter
Thank you for reading to the end.
I'd like to hear your comments.
---
ITAGAKI Takahiro
NTT Cyber Space Laboratories
room for discussion on this idea.
Comments are welcome.
---
ITAGAKI Takahiro
NTT Cyber Space Laboratories
access methods (btree, hash and gist) have a concept of
fillfactor, but a static bitmap index or something similar may not have it.
I see that we should give priority to the design.
---
ITAGAKI Takahiro
NTT Cyber Space Laboratories
* leaf_free_percent)
When leaf_free_percent is 10%, node_free_percent is 30%. They are the same
values as in the current implementation.
---
ITAGAKI Takahiro
NTT Cyber Space Laboratories
remember their fillfactors when they are created?
The last fillfactors will be used on next reindex.
- Is fillfactor useful for hash and gist indexes?
I think hash does not need it, but gist might need it.
Look forward to your comments.
Thanks,
---
ITAGAKI Takahiro
NTT Cyber Space Laboratories
.
In fact, on my machine, the queue became full twice in a checkpoint and the
length of the queue decreased from 65536 to *32* by duplicate elimination.
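Duplicate elimination on a full request queue can be sketched as follows. This is an illustration under assumptions, not the patch itself: the SyncRequest layout and the function names are hypothetical, and the sketch sorts the queue before compacting it, which presumes that the order of fsync requests does not matter.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical request entry: one file segment that needs fsync. */
typedef struct
{
    unsigned rel;    /* relation identifier (illustrative) */
    unsigned segno;  /* segment number within the relation */
} SyncRequest;

/* Order requests by relation, then by segment number. */
static int
request_cmp(const void *a, const void *b)
{
    const SyncRequest *x = a;
    const SyncRequest *y = b;

    if (x->rel != y->rel)
        return (x->rel < y->rel) ? -1 : 1;
    if (x->segno != y->segno)
        return (x->segno < y->segno) ? -1 : 1;
    return 0;
}

/* Sort the queue, then keep one copy of each request. Returns the new length. */
static size_t
compact_requests(SyncRequest *queue, size_t n)
{
    size_t kept = 0;

    assert(queue != NULL || n == 0);
    if (n == 0)
        return 0;

    qsort(queue, n, sizeof(SyncRequest), request_cmp);
    for (size_t i = 1; i < n; i++)
    {
        /* Copy forward only entries that differ from the last kept one. */
        if (request_cmp(&queue[kept], &queue[i]) != 0)
            queue[++kept] = queue[i];
    }
    return kept + 1;
}
```

When many buffers belong to a few segments, the compacted length is the number of distinct segments, which is why a full queue can shrink so drastically.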
---
ITAGAKI Takahiro
NTT Cyber Space Laboratories
hope this problem will be solved by some methods.
---
ITAGAKI Takahiro
NTT Cyber Space Laboratories
bgwriter-requests-queue-overflow.patch
Description: Binary data
+ data | 7 + n | Compressed |
E | 0--- + 12 bytes| 13| External|
F | 1--- + 16 bytes| 17| External+Compressed |
('*' bits are used for length, '-' are unused.)
Comments welcome,
---
ITAGAKI Takahiro
NTT Cyber Space Laboratories
[ 5] (padding)
[ 1] c char
[ 3] (padding)
[ 4] i int4
the size of tuple (patched) is 32 bytes
[27] HeapTupleHeader
[ 1] c char
[ 4] i int4
Is this effective? Or are there some problems?
I'd appreciate any comments.
Thanks,
---
ITAGAKI Takahiro
NTT Cyber Space
to a separately
created header.
Thanks, I didn't consider it.
I'll check the cases and whether they can be resolved.
---
ITAGAKI Takahiro
NTT Cyber Space Laboratories
).
I'd appreciate any comments.
Thanks,
---
ITAGAKI Takahiro
NTT Cyber Space Laboratories
| 36-- 28 + 2 + 2 + 2 + 2
...
---
ITAGAKI Takahiro
NTT Cyber Space Laboratories
varlena2.patch
Description: Binary data
several text types and we will have to maintain them.
Or were you planning this to handle VARCHAR(6) and the like?
If the new text type beats VARCHAR in many respects,
I'd like to propose replacing VARCHAR with it.
---
ITAGAKI Takahiro
NTT Cyber Space Laboratories
) /
MAXALIGN(offsetof(HeapTupleHeaderData, t_bits) + sizeof(ItemIdData)) + 1)
---
ITAGAKI Takahiro
NTT Cyber Space Laboratories
)))
Also, is this something that should be in a common header file? If so
which one? BLCKSZ, HeapTupleHeaderData, and ItemIdData are all defined
in different places ...
Considering include-hierarchy, I think bufpage.h is a good place.
---
ITAGAKI Takahiro
NTT Cyber Space Laboratories
| 320
tuple_percent | 87.78
dead_tuple_count | 0
dead_tuple_len | 0
dead_tuple_percent | 0
free_space | 823628 -- + 8byte * 10 (whole tuples)
free_percent | 22.59
---
ITAGAKI Takahiro
NTT Cyber Space Laboratories
xmincut.patch
Description: Binary data
-load.
---
ITAGAKI Takahiro
NTT Cyber Space Laboratories
+ HEAP_XMIN_INVALID
currently has no meaning, right? If so, HEAP_FROZEN can be assigned here.
Also, t_natts is currently 16 bits, but it can be cut to 11 bits
because MaxTupleAttributeNumber is 1664 < 2^11.
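The bit count can be verified with a small helper. The function name bits_needed is hypothetical; it simply counts how many bits are required to represent a given maximum value.

```c
#include <assert.h>

/* Number of bits needed to store any value in the range 0..max_value. */
static unsigned
bits_needed(unsigned max_value)
{
    unsigned bits = 0;

    while (max_value > 0)
    {
        bits++;
        max_value >>= 1;
    }
    return bits;
}
```

Since 1024 <= 1664 < 2048, bits_needed(1664) is 11, which would leave 5 bits of a 16-bit field free for other uses.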
---
ITAGAKI Takahiro
NTT Cyber Space Laboratories
in
http://archives.postgresql.org/pgsql-hackers/2005-03/msg00518.php
I'll try to mark tuples with LP_DELETE on visibility checking and
have bgwriter recycle the pages.
...However, it is still at the idea stage.
---
ITAGAKI Takahiro
NTT Cyber Space Laboratories
tests.
Why do index access methods use LP_DELETE?
Does this change cause any trouble?
(However, I guess there is no advantage in the change,
because unused items are not recycled until next vacuum.)
---
ITAGAKI Takahiro
NTT Cyber Space Laboratories
on.
---
ITAGAKI Takahiro
NTT Cyber Space Laboratories
xlog.dio.diff
Description: Binary data
xlog.gw.diff
Description: Binary data
files. writeback-cache is always on.
---
ITAGAKI Takahiro
NTT Cyber Space Laboratories
is saving memory. O_DIRECT gives
a hint that the OS should not cache WAL files. Without direct I/O, the OS might
make an effort to cache WAL files, which will never be used, and might discard
the data file cache.
---
ITAGAKI Takahiro
NTT Cyber Space Laboratories
reuse them until next VACUUM.
But I think it is a problem that FSMs keep being scanned after they are almost
empty.
In such a case, most stored pages are touched whenever new pages
are requested. I intended to cut FSMs earlier in order to omit the scans.
---
ITAGAKI Takahiro
NTT Cyber Space
.
Is this useful?
---
ITAGAKI Takahiro [EMAIL PROTECTED]
NTT Cyber Space Laboratories
Nippon Telegraph and Telephone Corporation.
freespace.diff
Description: Binary data
and open_direct set:
Throughput: 3489.69
---
ITAGAKI Takahiro [EMAIL PROTECTED]
NTT Cyber Space Laboratories
and it worked properly, but I don't have IA64...
BTW, I found a memory leak in BootStrapXLOG(). The buffer allocated by malloc()
is not free()ed. ISSUE_BOOTSTRAP_MEMORYLEAK in this patch points it out.
(But this leak is not serious, because this function is called only once.)
ITAGAKI Takahiro
, ILP64, or LLP64?
If you used LLP64, I think the cause is the buffer alignment routine,
because sizeof(long) != sizeof(void*) there.
I'll fix it soon...
ITAGAKI Takahiro
backends can write the same contents later
even if the backend in XLogWrite has crashed.
Sincerely,
ITAGAKI Takahiro
.
Sincerely,
ITAGAKI Takahiro
-- pgbench result --
$ ./pgbench -s 100 -c 50 -t 400
- 8.0.0 default + fsync:
tps = 20.630632 (including connections establishing)
tps = 20.636768 (excluding connections establishing)
- multipage-writer + open_direct:
tps = 33.761917 (including connections
Excuse me.
I am resending the patch with diff -c.
On Tue, 25 Jan 2005 10:30:01 +0100
Michael Paesold [EMAIL PROTECTED] wrote:
ITAGAKI Takahiro wrote:
I think that there is room for improvement in WAL.
Here is a patch for it.
I think you should resend your patch as a context diff (diff -c