Re: [HACKERS] Perfomance degradation 9.3 (vs 9.2) for FreeBSD

2014-04-22 Thread YAMAMOTO Takashi
 - NetBSD: crashes under load; this could have been fixed but when I ran the
   benchmarks in 2012 none of the developers seemed to care.

do you mean this?
https://mail-index.netbsd.org/tech-kern/2012/08/29/msg013918.html

YAMAMOTO Takashi


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Securing make check (CVE-2014-0067)

2014-04-04 Thread YAMAMOTO Takashi
 On Fri, Apr 04, 2014 at 02:36:05AM +, YAMAMOTO Takashi wrote:
  Thanks.  To avoid socket path length limitations, I lean toward placing the
  socket temporary directory under /tmp rather than placing under the CWD:
  
  http://www.postgresql.org/message-id/flat/20121129223632.ga15...@tornado.leadboat.com
 
 openvswitch has some tricks to overcome the socket path length
 limitation using symlink.  (or procfs where available)
 iirc these were introduced for debian builds which use deep CWD.
 
 That's another reasonable approach.  Does it have a notable advantage over
 placing the socket in a subdirectory of /tmp?  Offhand, the security and
 compatibility consequences look similar.

an advantage is that the socket can be placed under the CWD
and thus automatically obeys its directory permissions, etc.

YAMAMOTO Takashi

 
 -- 
 Noah Misch
 EnterpriseDB http://www.enterprisedb.com
 
 




Re: [HACKERS] Securing make check (CVE-2014-0067)

2014-04-03 Thread YAMAMOTO Takashi
 Thanks.  To avoid socket path length limitations, I lean toward placing the
 socket temporary directory under /tmp rather than placing under the CWD:
 
 http://www.postgresql.org/message-id/flat/20121129223632.ga15...@tornado.leadboat.com

openvswitch has some tricks to overcome the socket path length
limitation using symlinks (or procfs where available).
iirc these were introduced for debian builds, which use a deep CWD.

http://git.openvswitch.org/cgi-bin/gitweb.cgi?p=openvswitch;a=blob;f=lib/socket-util.c;h=aa0c7196da9926de38b7388b8e28ead12e12913e;hb=HEAD

look at shorten_name_via_symlink and shorten_name_via_proc.
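for illustration only, here is a rough python sketch of the symlink trick
(not the actual openvswitch code, which is C; the names and the 104-byte
limit here are my assumptions): make a short symlink in /tmp to the deep
directory and bind through it.

```python
import os
import socket
import tempfile

SUN_PATH_MAX = 104  # typical sun_path limit; the exact value varies by OS

def bind_via_symlink(deep_dir, sock_name):
    """Bind a unix socket that lives in deep_dir by going through a
    short symlink in /tmp, keeping the path passed to bind() short."""
    link = os.path.join(tempfile.mkdtemp(prefix="pg", dir="/tmp"), "d")
    os.symlink(deep_dir, link)
    short_path = os.path.join(link, sock_name)
    assert len(short_path) <= SUN_PATH_MAX
    s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    s.bind(short_path)  # the kernel resolves the symlink for us
    return s, link

# a directory nested deep enough that the direct path would overflow
deep = tempfile.mkdtemp()
for i in range(30):
    deep = os.path.join(deep, "level%02d" % i)
os.makedirs(deep)

s, link = bind_via_symlink(deep, "s.PGSQL.5432")
print("direct path would be", len(os.path.join(deep, "s.PGSQL.5432")), "bytes")
s.close()
os.unlink(link)
```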

YAMAMOTO Takashi




Re: [HACKERS] SYSV shared memory vs mmap performance

2013-02-05 Thread YAMAMOTO Takashi
hi,

 Hi,
 
 On Mon, Jan 28, 2013 at 03:27:28PM +, YAMAMOTO Takashi wrote:
 
 can you give me a pointer?
 
 This bug report for a start:
 http://gnats.netbsd.org/cgi-bin/query-pr-single.pl?number=46833

thanks!

YAMAMOTO Takashi

 
 This is the only one I've filed; I also remember having irc discussions
 with some netbsd developers about the system getting unresponsive under
 load.
 
 -- 
 Francois Tigeot
 
 




Re: [HACKERS] SYSV shared memory vs mmap performance

2013-01-28 Thread YAMAMOTO Takashi
hi,

 I'm less optimistic on the NetBSD front: even though I reported major
 show-stopper bugs (system died under load and was unable to complete
 a pgbench run), no one seemed to care.

can you give me a pointer?

YAMAMOTO Takashi




Re: [HACKERS] huge tlb support

2012-07-09 Thread YAMAMOTO Takashi
 Robert Haas robertmh...@gmail.com writes:
 On Fri, Jun 29, 2012 at 3:52 PM, Andres Freund and...@2ndquadrant.com wrote:
 In a *very* quick patch I tested using huge pages/MAP_HUGETLB for the mmap'ed memory.
 
 So, considering that there is required setup, it seems that the
 obvious thing to do here is add a GUC: huge_tlb_pages (boolean).
 
 The other alternative is to try with MAP_HUGETLB and, if it fails, try
 again without MAP_HUGETLB.
 
 +1 for not making people configure this manually.
 
 Also, I was under the impression that recent Linux kernels use hugepages
 automatically if they can, so I wonder exactly what Andres was testing
 on ...

if you mean the transparent hugepage feature, iirc it doesn't affect
MAP_SHARED mappings like this.
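fwiw, the try-with-MAP_HUGETLB-then-retry idea quoted above can be
sketched like this (a toy python illustration, not PostgreSQL code; the
fallback value 0x40000 is the Linux MAP_HUGETLB constant, hardcoded here
as an assumption for pythons that don't expose it):

```python
import mmap

# Linux's MAP_HUGETLB flag; the hardcoded fallback value is an assumption
MAP_HUGETLB = getattr(mmap, "MAP_HUGETLB", 0x40000)

def map_shared(length):
    """Try an anonymous shared mapping with huge pages first; if the
    kernel refuses (no huge pages reserved, flag unsupported), retry
    without MAP_HUGETLB."""
    flags = mmap.MAP_SHARED | mmap.MAP_ANONYMOUS
    try:
        return mmap.mmap(-1, length, flags=flags | MAP_HUGETLB), True
    except (OSError, ValueError):
        return mmap.mmap(-1, length, flags=flags), False

m, huge = map_shared(4 * 1024 * 1024)
print("mapped 4MB using", "huge pages" if huge else "regular pages")
m.close()
```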

YAMAMOTO Takashi

 
   regards, tom lane
 



Re: [HACKERS] huge tlb support

2012-07-09 Thread YAMAMOTO Takashi
 y...@mwd.biglobe.ne.jp (YAMAMOTO Takashi) writes:
 Also, I was under the impression that recent Linux kernels use hugepages
 automatically if they can, so I wonder exactly what Andres was testing
 on ...
 
 if you mean the transparent hugepage feature, iirc it doesn't affect
 MAP_SHARED mappings like this.
 
 Oh!  That would explain some things.  It seems like a pretty nasty
 restriction though ... do you know why they did that?

i don't know.  simply because it wasn't trivial, i guess.
the feature was implemented for kvm's guest memory, which is
non-shared anonymous memory from the POV of the host kernel.

YAMAMOTO Takashi

 
   regards, tom lane
 



Re: [HACKERS] gistVacuumUpdate

2012-01-18 Thread YAMAMOTO Takashi
hi,

 On 13.01.2012 06:24, YAMAMOTO Takashi wrote:
 hi,

 gistVacuumUpdate was removed when old-style VACUUM FULL was removed.
 i wonder why.
 it was not practical and REINDEX is preferred?

 anyway, the removal seems incomplete and there are some leftovers:
  F_TUPLES_DELETED
  F_DELETED
  XLOG_GIST_PAGE_DELETE
 
 Hmm, in theory we might bring back support for deleting pages in the 
 future, I'm guessing F_DELETED and the WAL record type were left in 
 place because of that. Either that, or it was an oversight. It's also 
 good to have the F_DELETED/F_TUPLES_DELETED around, so that new versions 
 don't get confused if they see those set in GiST indexes that originate 
 from an old cluster, upgraded to new version with pg_upgrade. For that 
 purpose, a comment explaining what those used to be would've been 
 enough, though.

the loop in gistvacuumcleanup to search for F_DELETED pages seems too
expensive for the pg_upgrade purpose alone.
(it also checks PageIsNew, but is that by itself worth the loop?)

i'm wondering because what gistVacuumUpdate used to do does not seem
necessarily tied to the old-style VACUUM FULL.
currently, no one re-unions keys after tuple removals, right?

YAMAMOTO Takashi

 
 -- 
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] VACUUM in SP-GiST

2012-01-17 Thread YAMAMOTO Takashi
hi,

 I started reading the spgist vacuum code last night, and didn't like it
 at all.  I found a number of smaller issues, but it seems to me that
 the design is just fundamentally wrong.  Unless I'm misunderstanding it,
 the approach is to recursively traverse the tree in sort of the same way
 that a full-index search would do, except that it continues to hold lock
 on the immediate parent page of whichever page is currently being
 visited, so that it can update the downlink if it has to delete the
 first leaf tuple of a chain.  I've got several complaints about that:
 
 1. Holding exclusive locks on upper pages while we visit
 possibly-hundreds of leaf pages seems pretty bad from a concurrency
 standpoint.  It doesn't hold locks all the way up to the root, so it's
 not a complete loss, but even a first-level inner page can have a lot of
 children.
 
 2. I do not trust this code to visit every leaf page (which, if it
 failed to do so and thereby missed deleting a heap reference, would
 result in a silently corrupt index).  Unlike a regular index search,
 which makes a list of lower-level targets while examining an inner tuple
 just once, this code depends on the assumption that it can reacquire
 lock on an upper page and then resychronize itself with whatever's been
 done meanwhile to the inner tuple it was looking at.  I don't trust
 that.  The mechanism that's being used relies on page LSN having been
 changed if anything was changed on the page, which does not work in
 unlogged tables.  Furthermore, if the inner tuple did change, it has to
 rescan *everything* that the inner tuple leads to.  That's horrendously
 inefficient, and it's not even clear that it would ever terminate in the
 face of a constant stream of concurrent updates.
 
 3. Even if it all worked, it would be slow as can be, because it
 requires a whole lot of nonsequential accesses to the index.  For
 instance, the same leaf page might be visited many times (once for
 each leaf chain on the page), and not necessarily close together either.
 
 Also, the code doesn't seem to perform any cleanup at all on inner
 pages.  I am not expecting it to try to get rid of childless inner
 tuples, but surely it ought to at least convert old redirects to dead
 and get rid of dead tuples at the end of the page, much as for leaf
 pages?
 
 What I'm envisioning doing instead is making a single sequential scan
 over the entire index.  On both leaf and inner pages we could clean
 redirects and remove end dead tuples.  On leaf pages we'd also check for
 tuples to be deleted.  There's no real difficulty in removing deleted
 tuples as long as they are not at the heads of their chains.  (The code
 would have to reverse-engineer which tuples are at the heads of their
 chains, but that's easily done while scanning the page; we just make a
 reverse-lookup map for the nextOffset pointers.)  If we have to delete a
 tuple at the head of its chain, but there's still at least one live
 tuple in its chain, we could reshuffle the tuples to bring another live
 tuple to that same offset number, thereby not invalidating the parent
 downlink.  The only hard part is what to do when we delete all members
 of a chain: we can't reset the parent downlink because we do not know
 where it is.  What I propose to do in that case is install a variant
 form of dead tuple that just indicates this chain is empty.  One way
 to represent that would be a redirect tuple pointing nowhere, but I
 think it would be cleaner to repurpose the isDead and isRedirect bits as
 a two-bit field with four states.  We'd have LIVE, DEAD, REDIRECT, and
 this fourth state for a dead tuple that is not recyclable because we
 know there's a link to it somewhere.  A scan would ignore such a tuple.
 An insertion that arrives at such a tuple could either replace it (if
 there's enough room on the page) or convert it to a redirect (if not).
 
 Last night I was thinking the fourth state could be named TOMBSTONE,
 but maybe it'd be better to use DEAD for this case and rename the
 existing removable dead tuple state to PLACEHOLDER, since that case
 only exists to preserve the offset numbers of other tuples on the
 page.
 
 Comments?

with a concurrent split moving leaf tuples between pages, can't it make
bulkdelete fail to report some TIDs, and thus make CREATE INDEX CONCURRENTLY
create duplicates?

YAMAMOTO Takashi

 
   regards, tom lane
 



[HACKERS] gistVacuumUpdate

2012-01-12 Thread YAMAMOTO Takashi
hi,

gistVacuumUpdate was removed when old-style VACUUM FULL was removed.
i wonder why.
it was not practical and REINDEX is preferred?

anyway, the removal seems incomplete and there are some leftovers:
F_TUPLES_DELETED
F_DELETED
XLOG_GIST_PAGE_DELETE

YAMAMOTO Takashi



[HACKERS] ordering op for WHERE

2011-12-28 Thread YAMAMOTO Takashi
hi,

does it make sense to teach the planner (and the executor?) to use an
ordering op to optimize queries like the following?

select * from t where a <-> 1000 < 10

YAMAMOTO Takashi



Re: [HACKERS] synchronous commit vs. hint bits

2011-12-01 Thread YAMAMOTO Takashi
hi,

 On Wed, Nov 30, 2011 at 1:37 AM, YAMAMOTO Takashi
 y...@mwd.biglobe.ne.jp wrote:
 Yes, I would expect that.  What kind of increase are you seeing?  Is
 it causing a problem for you, or are you just making an observation?

 i was curious because my application uses async commits mainly to
 avoid frequent fsync.  i have no numbers right now.
 
 Oh, that's interesting.  Why do you want to avoid frequent fsyncs?

simply because it was expensive in my environment.

 I
 thought the point of synchronous_commit=off was to move the fsyncs to
 the background, but not necessarily to decrease the frequency.

it makes sense.
but it's normal for users to abuse features. :)

YAMAMOTO Takashi

 
 -- 
 Robert Haas
 EnterpriseDB: http://www.enterprisedb.com
 The Enterprise PostgreSQL Company
 



Re: [HACKERS] synchronous commit vs. hint bits

2011-11-30 Thread YAMAMOTO Takashi
hi,

 On Tue, Nov 29, 2011 at 1:42 AM, YAMAMOTO Takashi
 y...@mwd.biglobe.ne.jp wrote:
 On Mon, Nov 7, 2011 at 5:26 PM, Simon Riggs si...@2ndquadrant.com wrote:

 5. Make the WAL writer more responsive, maybe using latches, so that
 it doesn't take as long for the commit record to make it out to disk.

 I'm working on this already as part of the update for power
 reduction/group commit/replication performance.

 I extracted this from my current patch for you to test.

 is it expected to produce more frequent fsync?
 
 Yes, I would expect that.  What kind of increase are you seeing?  Is
 it causing a problem for you, or are you just making an observation?

i was curious because my application uses async commits mainly to
avoid frequent fsync.  i have no numbers right now.

YAMAMOTO Takashi

 
 -- 
 Robert Haas
 EnterpriseDB: http://www.enterprisedb.com
 The Enterprise PostgreSQL Company



Re: [HACKERS] reduce null bitmap size

2011-11-11 Thread YAMAMOTO Takashi
hi,

 y...@mwd.biglobe.ne.jp (YAMAMOTO Takashi) writes:
 how about making heap_form_tuple and its variants use smaller natts for
 tuples whose trailing columns are NULL?
 
 This idea has been proposed before, and rejected on the basis that it's
 unlikely to save enough to be worth the cycles needed to check for the
 case.  Keep in mind that you aren't going to save anything at all unless
 the bitmap length decreases across a MAXALIGN boundary.

but it also means that, if a one-bit decrease happens to cross the boundary,
you can save a MAXALIGN, right?
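to put numbers on it (a sketch assuming the usual 23-byte heap tuple
header and MAXALIGN of 8; these are typical values, not taken from any
particular build):

```python
HEADER = 23            # typical offset of t_bits in the heap tuple header
MAXIMUM_ALIGNOF = 8    # typical MAXALIGN

def maxalign(n):
    return (n + MAXIMUM_ALIGNOF - 1) & ~(MAXIMUM_ALIGNOF - 1)

def bitmaplen(natts):
    return (natts + 7) // 8

def header_size(natts):
    """Aligned size of the tuple header plus null bitmap for natts."""
    return maxalign(HEADER + bitmaplen(natts))

# Truncating trailing NULL columns only saves space when the bitmap
# shrinks across a MAXALIGN boundary: going from 9 attributes to 8
# saves 8 bytes, while going from 16 to 9 saves nothing.
for natts in (8, 9, 16):
    print(natts, "attributes ->", header_size(natts), "bytes")
```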

btw, while playing with this, i noticed that pg_column_size+rowtype+aggregate
produces unstable results.  i guess it's because when nodeAgg handles
grp_firstTuple, the tuple is converted to a short varlena.  i don't know
if it's a real problem or just weird-looking.

YAMAMOTO Takashi

test=# create table t (i int);
CREATE TABLE
test=# insert into t values (1),(2),(3);
INSERT 0 3
test=# select pg_column_size(t.*) from t;
 pg_column_size

 28
 28
 28
(3 rows)

test=# select string_agg(pg_column_size(t.*)::text,',') from t;
 string_agg

 25,28,28
(1 row)

test=#

 
   regards, tom lane
 



[HACKERS] reduce null bitmap size

2011-11-09 Thread YAMAMOTO Takashi
hi,

how about making heap_form_tuple and its variants use a smaller natts for
tuples whose trailing columns are NULL?  depending on the schema, it can
save considerable space.
most of the code is ready to deal with such tuples anyway, for
ALTER TABLE ADD COLUMN.  (except system catalogs?)

YAMAMOTO Takashi



Re: [HACKERS] ProcArrayLock contention

2011-11-09 Thread YAMAMOTO Takashi
hi,

 I've been playing with the attached patch, which adds an additional
 light-weight lock mode, LW_SHARED2.  LW_SHARED2 conflicts with
 LW_SHARED and LW_EXCLUSIVE, but not with itself.  The patch changes
 ProcArrayEndTransaction() to use this new mode.  IOW, multiple
 processes can commit at the same time, and multiple processes can take
 snapshots at the same time, but nobody can take a snapshot while
 someone else is committing.
 
 Needless to say, I don't we'd really want to apply this, because
 adding a LW_SHARED2 mode that's probably only useful for ProcArrayLock
 would be a pretty ugly wart.  But the results are interesting.
 pgbench, scale factor 100, unlogged tables, Nate Boley's 32-core AMD
 box, shared_buffers = 8GB, maintenance_work_mem = 1GB,
 synchronous_commit = off, checkpoint_segments = 300,
 checkpoint_timeout = 15min, checkpoint_completion_target = 0.9,
 wal_writer_delay = 20ms, results are median of three five-minute runs:
 
 #clients tps(master) tps(lwshared2)
 1 657.984859 683.251582
 8 4748.906750 4946.069238
 32 10695.160555 17530.390578
 80 7727.563437 16099.549506
 
 That's a pretty impressive speedup, but there's trouble in paradise.
 With 80 clients (but not 32 or fewer), I occasionally get the
 following error:
 
 ERROR:  t_xmin is uncommitted in tuple to be updated
 
 So it seems that there's some way in which this locking is actually
 incorrect, though I'm not seeing what it is at the moment.  Either
 that, or there's some bug in the existing code that happens to be
 exposed by this change.
 
 The patch also produces a (much smaller) speedup with regular tables,
 but it's hard to know how seriously to take that until the locking
 issue is debugged.
 
 Any ideas?

did latestCompletedXid go backward due to concurrent updates
and fool TransactionIdIsInProgress?
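the interleaving i have in mind, as a toy model (just an illustration of
the ordering hazard, not the actual PostgreSQL code):

```python
def follows(a, b):
    # stand-in for TransactionIdFollows; plain integers, no wraparound
    return a > b

# two committers' stores landing out of order: xid 101 commits first,
# but the store for the older xid 100 lands last
stores = [101, 100]

# unconditional assignment: the shared value goes backward, so xid 101
# can look "still in progress" to a later TransactionIdIsInProgress
latest = 0
for xid in stores:
    latest = xid
print("unconditional:", latest)   # 100 -- went backward

# guarded assignment: only ever advance, never retreat
latest = 0
for xid in stores:
    if follows(xid, latest):
        latest = xid
print("guarded:", latest)         # 101
```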

YAMAMOTO Takashi

 
 -- 
 Robert Haas
 EnterpriseDB: http://www.enterprisedb.com
 The Enterprise PostgreSQL Company



Re: [HACKERS] tab stop in README

2011-08-31 Thread YAMAMOTO Takashi
 On Sun, Aug 28, 2011 at 8:28 PM, YAMAMOTO Takashi
 y...@mwd.biglobe.ne.jp wrote:
 On men, 2011-08-22 at 04:09 +, YAMAMOTO Takashi wrote:
 i know that postgresql uses ts=4 for C source code.
 but how about documatation?

 I'd say ideally don't use any tabs at all.

 i agree.

 It appears to be geared for ts=4.  Could you send a patch or other
 indication for what you think needs changing?

 attached.
 
 I'm confused by this patch, because it doesn't seem to get rid of all
 the tabs in the file.  Nor does it seem to replace tabs with spaces.
 It looks like it's just randomly removing and adding tabs in various
 places.

the patch just fixes the indentation for ts=4, keeping tabs.

should i run expand -t4 and send the result?
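(the conversion i mean is plain tab expansion at 4-column tab stops,
which `expand -t4` does; python's str.expandtabs shows the same thing:)

```python
# replace each tab with spaces up to the next 4-column tab stop,
# which is exactly what `expand -t4` would do to the README
line = "1)\tProcessUtility;\tBEGIN"
print(repr(line.expandtabs(4)))
```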

YAMAMOTO Takashi

 
 --
 Robert Haas
 EnterpriseDB: http://www.enterprisedb.com
 The Enterprise PostgreSQL Company



Re: [HACKERS] tab stop in README

2011-08-29 Thread YAMAMOTO Takashi
hi,

 On men, 2011-08-22 at 04:09 +, YAMAMOTO Takashi wrote:
 i know that postgresql uses ts=4 for C source code.
 but how about documentation?
 
 I'd say ideally don't use any tabs at all.

i agree.

 
 src/backend/access/transam/README seems to have both of
 ts=4 and ts=8 mixed.
 
 It appears to be geared for ts=4.  Could you send a patch or other
 indication for what you think needs changing?

attached.

YAMAMOTO Takashi

 
 
diff --git a/src/backend/access/transam/README b/src/backend/access/transam/README
index eaac139..e866d9e 100644
--- a/src/backend/access/transam/README
+++ b/src/backend/access/transam/README
@@ -61,23 +61,23 @@ sequence:
 
 /  StartTransactionCommand;
/   StartTransaction;
-1)ProcessUtility;  BEGIN
+1)ProcessUtility;  BEGIN
\   BeginTransactionBlock;
 \  CommitTransactionCommand;
 
/   StartTransactionCommand;
-2) /   ProcessQuery;SELECT ...
-   \   CommitTransactionCommand;
+2) /   ProcessQuery;SELECT ...
+   \   CommitTransactionCommand;
\   CommandCounterIncrement;
 
/   StartTransactionCommand;
-3) /   ProcessQuery;INSERT ...
-   \   CommitTransactionCommand;
+3) /   ProcessQuery;INSERT ...
+   \   CommitTransactionCommand;
\   CommandCounterIncrement;
 
 /  StartTransactionCommand;
/   ProcessUtility;  COMMIT
-4)EndTransactionBlock;
+4)EndTransactionBlock;
\   CommitTransactionCommand;
 \  CommitTransaction;
 
@@ -100,9 +100,9 @@ Transaction aborts can occur in two ways:
 The reason we have to distinguish them is illustrated by the following two
 situations:
 
-   case 1  case 2
-   --  --
-1) user types BEGIN1) user types BEGIN
+   case 1  case 2
+   --  --
+1) user types BEGIN1) user types BEGIN
 2) user does something 2) user does something
 3) user does not like what 3) system aborts for some reason
she sees and types ABORT   (syntax error, etc)



[HACKERS] tab stop in README

2011-08-22 Thread YAMAMOTO Takashi
hi,

i know that postgresql uses ts=4 for C source code.
but how about documentation?

src/backend/access/transam/README seems to have both of
ts=4 and ts=8 mixed.

YAMAMOTO Takashi



Re: [HACKERS] Large Objects versus transactional behavior

2011-05-13 Thread YAMAMOTO Takashi
hi,

 On Sat, Apr 30, 2011 at 2:58 PM, Kevin Grittner
 kevin.gritt...@wicourts.gov wrote:
 This is related to the SIREAD lock versus ACCESS EXCLUSIVE lock
 thread, but seemed different enough to merit spinning off a new
 thread.

 Our shop hasn't used large objects so far because of the lack of
 security (until 9.1), so I never noticed the rather unusual
 transactional semantics of large objects.  From the devel
 documentation:

 http://developer.postgresql.org/pgdocs/postgres/lo-interfaces.html#LO-OPEN

 | [...] with INV_READ you cannot write on the descriptor, and the
 | data read from it will reflect the contents of the large object at
 | the time of the transaction snapshot that was active when lo_open
 | was executed, regardless of later writes by this or other
 | transactions. Reading from a descriptor opened with INV_WRITE
 | returns data that reflects all writes of other committed
 | transactions as well as writes of the current transaction. This is
 | similar to the behavior of REPEATABLE READ versus READ COMMITTED
 | transaction modes for ordinary SQL SELECT commands.

as a novice user who has been annoyed by them, i'm curious about
the rationale for the unusual semantics.
is there any chance of just making large objects obey the normal semantics
in the future?

YAMAMOTO Takashi



Re: [HACKERS] SSI bug?

2011-04-11 Thread YAMAMOTO Takashi
hi,

 hi,
 
 I think I see what is going on now. We are sometimes failing to set the
 commitSeqNo correctly on the lock. In particular, if a lock assigned to
 OldCommittedSxact is marked with InvalidSerCommitNo, it will never be
 cleared.
 
 The attached patch corrects this:
  TransferPredicateLocksToNewTarget should initialize a new lock
  entry's commitSeqNo to that of the old one being transferred, or take
  the minimum commitSeqNo if it is merging two lock entries.
 
  Also, CreatePredicateLock should initialize commitSeqNo for to
  InvalidSerCommitSeqNo instead of to 0. (I don't think using 0 would
  actually affect anything, but we should be consistent.)
 
  I also added a couple of assertions I used to track this down: a
  lock's commitSeqNo should never be zero, and it should be
  InvalidSerCommitSeqNo if and only if the lock is not held by
  OldCommittedSxact.
 
 Takashi, does this patch fix your problem with leaked SIReadLocks?
 
 i'm currently running bf6848bc8c82e82f857d48185554bc3e6dcf1013 with this
 patch applied.  i haven't seen the symptom yet.  i'll keep it running for
 a while.

i haven't seen the symptom since them.  so i guess it was fixed by
your patch.  thanks!

YAMAMOTO Takashi

 
 btw, i've noticed the following message in the server log.  is it normal?
 
 LOG:  could not truncate directory pg_serial: apparent wraparound
 
 YAMAMOTO Takashi
 
 
 Dan
 
 
 -- 
 Dan R. K. Ports  MIT CSAILhttp://drkp.net/
 



Re: [HACKERS] SSI bug?

2011-04-08 Thread YAMAMOTO Takashi
hi,

 YAMAMOTO Takashi y...@mwd.biglobe.ne.jp wrote:
  
 LOG:  could not truncate directory pg_serial: apparent
 wraparound
  
 Did you get a warning with this text?:
  
 memory for serializable conflict tracking is nearly exhausted

there is no such warning near the above "apparent wraparound" record.
not sure if it appeared much earlier, as i've lost the older log files.

YAMAMOTO Takashi

  
 If not, there's some sort of cleanup bug to fix in the predicate
 locking's use of SLRU. It may be benign, but we won't really know
 until we find it.  I'm investigating.
  
 -Kevin



Re: [HACKERS] SSI bug?

2011-04-07 Thread YAMAMOTO Takashi
hi,

 I think I see what is going on now. We are sometimes failing to set the
 commitSeqNo correctly on the lock. In particular, if a lock assigned to
 OldCommittedSxact is marked with InvalidSerCommitNo, it will never be
 cleared.
 
 The attached patch corrects this:
  TransferPredicateLocksToNewTarget should initialize a new lock
  entry's commitSeqNo to that of the old one being transferred, or take
  the minimum commitSeqNo if it is merging two lock entries.
 
  Also, CreatePredicateLock should initialize commitSeqNo for to
  InvalidSerCommitSeqNo instead of to 0. (I don't think using 0 would
  actually affect anything, but we should be consistent.)
 
  I also added a couple of assertions I used to track this down: a
  lock's commitSeqNo should never be zero, and it should be
  InvalidSerCommitSeqNo if and only if the lock is not held by
  OldCommittedSxact.
 
 Takashi, does this patch fix your problem with leaked SIReadLocks?

i'm currently running bf6848bc8c82e82f857d48185554bc3e6dcf1013 with this
patch applied.  i haven't seen the symptom yet.  i'll keep it running for
a while.

btw, i've noticed the following message in the server log.  is it normal?

LOG:  could not truncate directory pg_serial: apparent wraparound

YAMAMOTO Takashi

 
 Dan
 
 
 -- 
 Dan R. K. Ports  MIT CSAILhttp://drkp.net/



Re: [HACKERS] SSI bug?

2011-03-29 Thread YAMAMOTO Takashi
hi,

 [no residual SIReadLock]

i read it as there are many (7057) SIReadLocks somehow leaked.
am i wrong?

YAMAMOTO Takashi



Re: [HACKERS] SSI bug?

2011-03-28 Thread YAMAMOTO Takashi
hi,

 (6) Does the application continue to run relatively sanely, or
 does it fall over at this point?
 
 my application just exits on the error.
 
 if i re-run the application without restarting postgres, it seems
 that i will get the error sooner than on the first run. (but it might
 be just a matter of luck)
  
 If your application hits this again, could you check pg_stat_activity
 and pg_locks and see if any SIReadLock entries are lingering after
 the owning transaction and all overlapping transactions are
 completed?  If anything is lingering between runs of your
 application, it *should* show up in one or the other of these.

this is 71ac48fd9cebd3d2a873635a04df64096c981f73 with your two patches applied.
this psql session was the only activity on the server at this point.

hoge=# select * from pg_stat_activity;
-[ RECORD 1 ]+
datid| 16384
datname  | hoge
procpid  | 7336
usesysid | 10
usename  | takashi
application_name | psql
client_addr  | 
client_hostname  | 
client_port  | -1
backend_start| 2011-03-26 12:28:21.882226+09
xact_start   | 2011-03-28 11:55:19.300027+09
query_start  | 2011-03-28 11:55:19.300027+09
waiting  | f
current_query| select * from pg_stat_activity;

hoge=# select count(*) from pg_locks where mode='SIReadLock';
-[ RECORD 1 ]
count | 7057

hoge=# select locktype,count(*) from pg_locks group by locktype;
-[ RECORD 1 ]
locktype | virtualxid
count| 1
-[ RECORD 2 ]
locktype | relation
count| 1
-[ RECORD 3 ]
locktype | tuple
count| 7061

hoge=# 

  
 (7) The message hint would help pin it down, or a stack trace at
 the point of the error would help more. Is it possible to get
 either? Looking over the code, it appears that the only places
 that SSI could generate that error, it would cancel that
 transaction with the hint You might need to increase
 max_pred_locks_per_transaction. and otherwise allow normal
 processing.
 
 no message hints. these errors are not generated by SSI code,
 at least directly.
  
 Right, that's because we were using HASH_ENTER instead of
 HASH_ENTER_NULL.  I've posted a patch which should correct that.

sure, with your patch it seems that they turned into different ones.

PG_DIAG_SEVERITY: WARNING
PG_DIAG_SQLSTATE: 53200
PG_DIAG_MESSAGE_PRIMARY: out of shared memory
PG_DIAG_SOURCE_FILE: shmem.c
PG_DIAG_SOURCE_LINE: 190
PG_DIAG_SOURCE_FUNCTION: ShmemAlloc

PG_DIAG_SEVERITY: ERROR
PG_DIAG_SQLSTATE: 53200
PG_DIAG_MESSAGE_PRIMARY: out of shared memory
PG_DIAG_MESSAGE_HINT: You might need to increase max_pred_locks_per_transaction.
PG_DIAG_SOURCE_FILE: predicate.c
PG_DIAG_SOURCE_LINE: 2049
PG_DIAG_SOURCE_FUNCTION: CreatePredicateLock

 Even with the above information it may be far from clear where
 allocations are going past their maximum, since one HTAB could
 grab more than its share and starve another which is staying below
 its maximum. I'll take a look at the possibility of adding a
 warning or some such when an HTAB expands past its maximum size.
  
 I see from your later post that you are running with this patch.  Has
 that reported anything yet?

i got nothing except the following one.  (in the server log)

WARNING:  hash table "ShmemIndex" has more entries than expected
DETAIL:  The maximum was set to 32 on creation.

YAMAMOTO Takashi

  
 Thanks,
  
 -Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] SSI bug?

2011-03-25 Thread YAMAMOTO Takashi
hi,

 YAMAMOTO Takashi y...@mwd.biglobe.ne.jp wrote:
  
 thanks for quickly fixing problems.
  
 Thanks for the rigorous testing.  :-)
  
 i tested the later version
 (a2eb9e0c08ee73208b5419f5a53a6eba55809b92) and only errors i got
 was out of shared memory.  i'm not sure if it was caused by SSI
 activities or not.
  
 PG_DIAG_SEVERITY: WARNING
 PG_DIAG_SQLSTATE: 53200
 PG_DIAG_MESSAGE_PRIMARY: out of shared memory
 PG_DIAG_SOURCE_FILE: shmem.c
 PG_DIAG_SOURCE_LINE: 190
 PG_DIAG_SOURCE_FUNCTION: ShmemAlloc
 
 PG_DIAG_SEVERITY: ERROR
 PG_DIAG_SQLSTATE: 53200
 PG_DIAG_MESSAGE_PRIMARY: out of shared memory
 PG_DIAG_SOURCE_FILE: dynahash.c
 PG_DIAG_SOURCE_LINE: 925
 PG_DIAG_SOURCE_FUNCTION: hash_search_with_hash_value
  
 Nor am I.  Some additional information would help.
  
 (1)  Could you post the non-default configuration settings?

none.  it can happen with just initdb+createdb'ed database.

 (2)  How many connections are in use in your testing?

4.

 (3)  Can you give a rough categorization of how many of what types
 of transactions are in the mix?

all transactions are SERIALIZABLE.

no transactions are with READ ONLY.
(but some of them are actually select-only.)

 (4)  Are there any long-running transactions?

no.

 (5)  How many of these errors do you get in what amount of time?

once it starts happening, i see them somewhat frequently.

 (6)  Does the application continue to run relatively sanely, or does
 it fall over at this point?

my application just exits on the error.

if i re-run the application without rebooting postgres, it seems that
i will get the error sooner than the first run.  (but it might be just
a matter of luck)

 (7)  The message hint would help pin it down, or a stack trace at
 the point of the error would help more.  Is it possible to get
 either?  Looking over the code, it appears that the only places that
 SSI could generate that error, it would cancel that transaction with
 the hint "You might need to increase
 max_pred_locks_per_transaction." and otherwise allow normal
 processing.

no message hints.  these errors are not generated by SSI code,
at least directly.
(please look at PG_DIAG_SOURCE_xxx in the above log.)

YAMAMOTO Takashi

 Even with the above information it may be far from clear where
 allocations are going past their maximum, since one HTAB could grab
 more than its share and starve another which is staying below its
 maximum.  I'll take a look at the possibility of adding a warning
 or some such when an HTAB expands past its maximum size.
  
 -Kevin



Re: [HACKERS] SSI bug?

2011-03-18 Thread YAMAMOTO Takashi
hi,

thanks for quickly fixing problems.
i tested the later version (a2eb9e0c08ee73208b5419f5a53a6eba55809b92)
and only errors i got was out of shared memory.  i'm not sure if
it was caused by SSI activities or not.

YAMAMOTO Takashi


the following is a snippet from my application log:

PG_DIAG_SEVERITY: WARNING
PG_DIAG_SQLSTATE: 53200
PG_DIAG_MESSAGE_PRIMARY: out of shared memory
PG_DIAG_SOURCE_FILE: shmem.c
PG_DIAG_SOURCE_LINE: 190
PG_DIAG_SOURCE_FUNCTION: ShmemAlloc

PG_DIAG_SEVERITY: ERROR
PG_DIAG_SQLSTATE: 53200
PG_DIAG_MESSAGE_PRIMARY: out of shared memory
PG_DIAG_SOURCE_FILE: dynahash.c
PG_DIAG_SOURCE_LINE: 925
PG_DIAG_SOURCE_FUNCTION: hash_search_with_hash_value

PG_DIAG_SEVERITY: WARNING
PG_DIAG_SQLSTATE: 53200
PG_DIAG_MESSAGE_PRIMARY: out of shared memory
PG_DIAG_SOURCE_FILE: shmem.c
PG_DIAG_SOURCE_LINE: 190
PG_DIAG_SOURCE_FUNCTION: ShmemAlloc

PG_DIAG_SEVERITY: ERROR
PG_DIAG_SQLSTATE: 53200
PG_DIAG_MESSAGE_PRIMARY: out of shared memory
PG_DIAG_SOURCE_FILE: dynahash.c
PG_DIAG_SOURCE_LINE: 925
PG_DIAG_SOURCE_FUNCTION: hash_search_with_hash_value

PG_DIAG_SEVERITY: WARNING
PG_DIAG_SQLSTATE: 53200
PG_DIAG_MESSAGE_PRIMARY: out of shared memory
PG_DIAG_SOURCE_FILE: shmem.c
PG_DIAG_SOURCE_LINE: 190
PG_DIAG_SOURCE_FUNCTION: ShmemAlloc

PG_DIAG_SEVERITY: ERROR
PG_DIAG_SQLSTATE: 53200
PG_DIAG_MESSAGE_PRIMARY: out of shared memory
PG_DIAG_SOURCE_FILE: dynahash.c
PG_DIAG_SOURCE_LINE: 925
PG_DIAG_SOURCE_FUNCTION: hash_search_with_hash_value



Re: [HACKERS] SSI bug?

2011-02-22 Thread YAMAMOTO Takashi
hi,

 Kevin Grittner wrote:
  
 I'm proceeding on this basis.
  
 Result attached. I found myself passing around the tuple xmin value
 just about everywhere that the predicate lock target tag was being
 passed, so it finally dawned on me that this logically belonged as
 part of the target tag. That simplified the changes, and the net
 result of following Heikki's suggestion here is the reduction of
 total lines of code by 178 while adding coverage for missed corner
 cases and fixing bugs.
  
 Thanks again, Heikki!
  
 I will test this some more tomorrow.  So far I haven't done more than
 ensure it passes the standard regression tests and the isolation
 tests added for SSI.  The latter took awhile because the hash_any
 function was including uninitialized bytes past the length of the tag
 in its calculations.  We should probably either fix that or document
 it.  I had to add another couple bytes to the tag to get it to a four
 byte boundary to fix it.  Easy once you know that's how it works...
  
 The attached patch can also be viewed here:
  
 http://git.postgresql.org/gitweb?p=users/kgrittn/postgres.git;a=commitdiff;h=46fd5ea6728b566c521ec83048bc00a207289dd9
  
 If this stands up to further testing, the only issue I know of for
 the 9.1 release is to update the documentation of shared memory usage
 to include the new structures.
  
 -Kevin

i tested ede45e90dd1992bfd3e1e61ce87bad494b81f54d + ssi-multi-update-1.patch
with my application and got the following assertion failure.

YAMAMOTO Takashi

Core was generated by `postgres'.
Program terminated with signal 6, Aborted.
#0  0xbbba4cc7 in _lwp_kill () from /usr/lib/libc.so.12
(gdb) bt
#0  0xbbba4cc7 in _lwp_kill () from /usr/lib/libc.so.12
#1  0xbbba4c85 in raise (s=6) at /siro/nbsd/src/lib/libc/gen/raise.c:48
#2  0xbbba445a in abort () at /siro/nbsd/src/lib/libc/stdlib/abort.c:74
#3  0x0833d9f2 in ExceptionalCondition (
    conditionName=0x8493f6d "!(locallock != ((void *)0))", 
    errorType=0x8370451 "FailedAssertion", fileName=0x8493ec3 "predicate.c", 
    lineNumber=3657) at assert.c:57
#4  0x0827977e in CheckTargetForConflictsIn (targettag=0xbfbfce78)
    at predicate.c:3657
#5  0x0827bd74 in CheckForSerializableConflictIn (relation=0x99b5bad4, 
    tuple=0xbfbfcf78, buffer=234) at predicate.c:3772
#6  0x080a5be8 in heap_update (relation=0x99b5bad4, otid=0xbfbfd038, 
    newtup=0x99a0aa08, ctid=0xbfbfd03e, update_xmax=0xbfbfd044, cid=1, 
    crosscheck=0x0, wait=1 '\001') at heapam.c:2593
#7  0x081c81ca in ExecModifyTable (node=0x99a0566c) at nodeModifyTable.c:624
#8  0x081b2153 in ExecProcNode (node=0x99a0566c) at execProcnode.c:371
#9  0x081b0f82 in standard_ExecutorRun (queryDesc=0x99b378f0, 
    direction=ForwardScanDirection, count=0) at execMain.c:1247
#10 0xbbaaf352 in pgss_ExecutorRun (queryDesc=0x99b378f0, 
    direction=ForwardScanDirection, count=0) at pg_stat_statements.c:541
#11 0xbbab5ee5 in explain_ExecutorRun (queryDesc=0x99b378f0, 
    direction=ForwardScanDirection, count=0) at auto_explain.c:201
#12 0x08288ae3 in ProcessQuery (plan=0x99bb404c, 
    sourceText=0x99b3781c "UPDATE file SET ctime = $1 WHERE fileid = $2", 
    params=<value optimized out>, dest=0x84d1f38, completionTag=0xbfbfd2f0 "")
    at pquery.c:197
#13 0x08288d0a in PortalRunMulti (portal=0x99b3f01c, 
    isTopLevel=<value optimized out>, dest=dwarf2_read_address: Corrupted DWARF expression.
) at pquery.c:1268
#14 0x0828984a in PortalRun (portal=0x99b3f01c, count=2147483647, 
    isTopLevel=1 '\001', dest=0x99b07428, altdest=0x99b07428, 
    completionTag=0xbfbfd2f0 "") at pquery.c:822
#15 0x08286c22 in PostgresMain (argc=2, argv=0xbb9196a4, 
    username=0xbb9195f8 "takashi") at postgres.c:2004
#16 0x082413f6 in ServerLoop () at postmaster.c:3590
#17 0x082421a8 in PostmasterMain (argc=3, argv=0xbfbfe594) at postmaster.c:1110
#18 0x081e0d09 in main (argc=3, argv=0xbfbfe594) at main.c:199
(gdb) fr 4
#4  0x0827977e in CheckTargetForConflictsIn (targettag=0xbfbfce78)
    at predicate.c:3657
3657			Assert(locallock != NULL);
(gdb) list
3652
3653			locallock = (LOCALPREDICATELOCK *)
3654				hash_search_with_hash_value(LocalPredicateLockHash,
3655											targettag, targettaghash,
3656											HASH_FIND, NULL);
3657			Assert(locallock != NULL);
3658			Assert(locallock->held);
3659			locallock->held = false;
3660
3661			if (locallock->childLocks == 0)
(gdb) 


Re: [HACKERS] SSI bug?

2011-02-17 Thread YAMAMOTO Takashi
hi,

 YAMAMOTO Takashi  wrote:
  
 with your previous patch or not?
  
 With, thanks.

i tried.  unfortunately i can still reproduce the original loop problem.

WARNING:  [0] target 0xbb51ef18 tag 4000:4017:7e3:78:0 prior 0xbb51f148 next 0xbb51edb0
WARNING:  [1] target 0xbb51f148 tag 4000:4017:7e3:77:0 prior 0xbb51ef90 next 0xbb51ef18
WARNING:  [2] target 0xbb51ef90 tag 4000:4017:7e3:74:0 prior 0xbb51edb0 next 0xbb51f148
WARNING:  [3] target 0xbb51edb0 tag 4000:4017:7e3:71:0 prior 0xbb51ef18 next 0xbb51ef90
WARNING:  found a loop

YAMAMOTO Takashi

  
 -Kevin
 
 



Re: [HACKERS] SSI bug?

2011-02-17 Thread YAMAMOTO Takashi
hi,

 YAMAMOTO Takashi  wrote:
  
 might be unrelated to the loop problem, but...
  
 Aha!  I think it *is* related.  There were several places where data
 was uninitialized here; mostly because Dan was working on this piece
 while I was working on separate issues which added the new fields.
 I missed the interaction on integrating the two efforts.  :-(
 The uninitialized fields could lead to all the symptoms you saw.
 I've reviewed, looking for other similar issues and didn't find any.
  
 Could you try the attached patch and see if this fixes the issues
 you've seen?

with your previous patch or not?

YAMAMOTO Takashi

  
 -Kevin



Re: [HACKERS] SSI bug?

2011-02-17 Thread YAMAMOTO Takashi
hi,

might be unrelated to the loop problem, but...

i got the following SEGV when runnning vacuum on a table.
(the line numbers in predicate.c is different as i have local modifications.)
oldlocktag.myTarget was NULL.
it seems that TransferPredicateLocksToNewTarget sometimes uses stack garbage
for newpredlocktag.myTarget.  vacuum on the table succeeded with the attached
patch.  the latter part of the patch was necessary to avoid targetList
corruption, which later seems to make DeleteChildTargetLocks loop infinitely.

YAMAMOTO Takashi

#0  0x0823cf6c in PredicateLockAcquire (targettag=0xbfbfa734)
    at predicate.c:1835
#1  0x0823f18a in PredicateLockPage (relation=0x99b4dcf0, blkno=1259)
    at predicate.c:2206
#2  0x080ac978 in _bt_search (rel=0x99b4dcf0, keysz=2, scankey=0x99a05040, 
    nextkey=0 '\0', bufP=0xbfbfa894, access=1) at nbtsearch.c:97
#3  0x080a996d in _bt_pagedel (rel=0x99b4dcf0, buf=<value optimized out>, 
    stack=0x0) at nbtpage.c:1059
#4  0x080aacc2 in btvacuumscan (info=0xbfbfbcc4, stats=0x99a01328, 
    callback=0x8184d50 <lazy_tid_reaped>, callback_state=0x99a012e0, 
    cycleid=13675) at nbtree.c:981
#5  0x080ab15c in btbulkdelete (fcinfo=0xbfbfb9e0) at nbtree.c:573
#6  0x082fde74 in FunctionCall4 (flinfo=0x99b86958, arg1=3217013956, arg2=0, 
    arg3=135810384, arg4=2577404640) at fmgr.c:1437
#7  0x080a4fd0 in index_bulk_delete (info=0xbfbfbcc4, stats=0x0, 
    callback=0x8184d50 <lazy_tid_reaped>, callback_state=0x99a012e0)
    at indexam.c:738
#8  0x08184cd4 in lazy_vacuum_index (indrel=0x99b4dcf0, stats=0x99a023e0, 
    vacrelstats=0x99a012e0) at vacuumlazy.c:938
#9  0x081854b6 in lazy_vacuum_rel (onerel=0x99b47650, vacstmt=0x99b059d0, 
    bstrategy=0x99a07018, scanned_all=0xbfbfcfd8 "") at vacuumlazy.c:762
#10 0x08184265 in vacuum_rel (relid=16424, vacstmt=0x99b059d0, 
    do_toast=1 '\001', for_wraparound=0 '\0', scanned_all=0xbfbfcfd8 "")
    at vacuum.c:978
#11 0x081845ea in vacuum (vacstmt=0x99b059d0, relid=0, do_toast=1 '\001', 
    bstrategy=0x0, for_wraparound=0 '\0', isTopLevel=1 '\001') at vacuum.c:230
#12 0xbbab50c3 in pgss_ProcessUtility (parsetree=0x99b059d0, 
    queryString=0x99b05018 "vacuum (verbose,analyze) pgfs.dirent;", 
    params=0x0, isTopLevel=1 '\001', dest=0x99b05b80, 
    completionTag=0xbfbfd21a "") at pg_stat_statements.c:603
#13 0x082499ea in PortalRunUtility (portal=0x99b33018, utilityStmt=0x99b059d0, 
    isTopLevel=1 '\001', dest=0x99b05b80, completionTag=0xbfbfd21a "")
    at pquery.c:1191
#14 0x0824a79e in PortalRunMulti (portal=0x99b33018, isTopLevel=4 '\004', 
    dest=0x99b05b80, altdest=0x99b05b80, completionTag=0xbfbfd21a "")
    at pquery.c:1298
#15 0x0824b21a in PortalRun (portal=0x99b33018, count=2147483647, 
    isTopLevel=1 '\001', dest=0x99b05b80, altdest=0x99b05b80, 
    completionTag=0xbfbfd21a "") at pquery.c:822
#16 0x08247dc7 in exec_simple_query (
    query_string=0x99b05018 "vacuum (verbose,analyze) pgfs.dirent;")
    at postgres.c:1059
#17 0x08248a79 in PostgresMain (argc=2, argv=0xbb912650, 
    username=0xbb9125c0 "takashi") at postgres.c:3943
#18 0x0820e231 in ServerLoop () at postmaster.c:3590
#19 0x0820ef88 in PostmasterMain (argc=3, argv=0xbfbfe59c) at postmaster.c:1110
#20 0x081b6439 in main (argc=3, argv=0xbfbfe59c) at main.c:199
(gdb) list
1830										 offsetof(PREDICATELOCK, xactLink));
1831
1832			oldlocktag = predlock->tag;
1833			Assert(oldlocktag.myXact == sxact);
1834			oldtarget = oldlocktag.myTarget;
1835			oldtargettag = oldtarget->tag;
1836
1837			if (TargetTagIsCoveredBy(oldtargettag, *newtargettag))
1838			{
1839				uint32		oldtargettaghash;
(gdb) 
diff --git a/src/backend/storage/lmgr/predicate.c b/src/backend/storage/lmgr/predicate.c
index 722d0f8..4dde6ae 100644
--- a/src/backend/storage/lmgr/predicate.c
+++ b/src/backend/storage/lmgr/predicate.c
@@ -2537,8 +2558,8 @@ TransferPredicateLocksToNewTarget(const PREDICATELOCKTARGETTAG oldtargettag,
 		if (!found)
 		{
 			SHMQueueInit(&(newtarget->predicateLocks));
-			newpredlocktag.myTarget = newtarget;
 		}
+		newpredlocktag.myTarget = newtarget;
 
 		oldpredlock = (PREDICATELOCK *)
 			SHMQueueNext(&(oldtarget->predicateLocks),
@@ -2588,10 +2609,12 @@ TransferPredicateLocksToNewTarget(const PREDICATELOCKTARGETTAG oldtargettag,
 				outOfShmem = true;
 				goto exit;
 			}
-			SHMQueueInsertBefore(&(newtarget->predicateLocks),
-								 &(newpredlock->targetLink));
-			SHMQueueInsertBefore(&(newpredlocktag.myXact->predicateLocks),
-								 &(newpredlock

Re: [HACKERS] SSI bug?

2011-02-13 Thread YAMAMOTO Takashi
hi,

 I wrote:
  
 it seems likely that such a cycle might be related to this new
 code not properly allowing for some aspect of tuple cleanup.
  
 I found a couple places where cleanup could let these fall through
 the cracks long enough to get stale and still be around when a tuple
 ID is re-used, causing problems.  Please try the attached patch and
 see if it fixes the problem for you.
  
 If it does, then there's no need to try to track the other things I
 was asking about.

thanks.  unfortunately the problem still happens with the patch.

YAMAMOTO Takashi

  
 Thanks!
  
 -Kevin



Re: [HACKERS] SSI bug?

2011-02-13 Thread YAMAMOTO Takashi
hi,

all of the following answers are with the patch you provided in
other mail applied.

 YAMAMOTO Takashi y...@mwd.biglobe.ne.jp wrote:
  
 i have seen this actually happen.  i've confirmed the creation of
 the loop with the attached patch.  it's easily reproducible with
 my application. i can provide the full source code of my
 application if you want. (but it isn't easy to run unless you are
 familiar with the recent version of NetBSD)
 i haven't found a smaller reproducible test case yet.
  
 I've never used NetBSD, so maybe a few details will help point me in
 the right direction faster than the source code.
  
 Has your application ever triggered any of the assertions in the
 code?  (In particular, it would be interesting if it ever hit the
 one right above where you patched.)

the assertion right above is sometimes triggered.  sometimes not.

  
 How long was the loop?

see below.

 Did you notice whether the loop involved multiple tuples within a
 single page?

if i understand correctly, yes.

the following is a snippet of my debug code (dump targets when
CheckTargetForConflictsIn loops 1000 times) and its output.
the same locktag_field3 value means the same page, right?

+	for (t = target, i = 0; t != NULL; i++) {
+		elog(WARNING, "[%u] target %p tag %" PRIx32 ":%" PRIx32 ":%" PRIx32
+		    ":%" PRIx16 ":%" PRIx16 " prior %p next %p", i, t,
+		    t->tag.locktag_field1,
+		    t->tag.locktag_field2,
+		    t->tag.locktag_field3,
+		    t->tag.locktag_field4,
+		    t->tag.locktag_field5,
+		    t->priorVersionOfRow,
+		    t->nextVersionOfRow);
+		t = t->priorVersionOfRow;
+		if (t == target) {
+			elog(WARNING, "found a loop");
+			break;
+		}
+	}

WARNING:  [0] target 0xbb51f238 tag 4000:4017:53b:6c:0 prior 0xbb51f350 next 0xbb51f350
WARNING:  [1] target 0xbb51f350 tag 4000:4017:53b:69:0 prior 0xbb51f238 next 0xbb51f238
WARNING:  found a loop

another sample:

WARNING:  [0] target 0xbb51f530 tag 4000:4017:565:ae:0 prior 0xbb51f1e8 next 0xbb51f300
WARNING:  [1] target 0xbb51f1e8 tag 4000:4017:565:ad:0 prior 0xbb51f580 next 0xbb51f530
WARNING:  [2] target 0xbb51f580 tag 4000:4017:565:ac:0 prior 0xbb51f300 next 0xbb51f1e8
WARNING:  [3] target 0xbb51f300 tag 4000:4017:565:ab:0 prior 0xbb51f530 next 0xbb51f580
WARNING:  found a loop

the table seems mostly hot-updated, if it matters.

hoge=# select * from pg_stat_user_tables where relid=16407;
-[ RECORD 1 ]-+
relid | 16407
schemaname| pgfs
relname   | file
seq_scan  | 0
seq_tup_read  | 0
idx_scan  | 53681
idx_tup_fetch | 52253
n_tup_ins | 569
n_tup_upd | 12054
n_tup_del | 476
n_tup_hot_upd | 12041
n_live_tup| 93
n_dead_tup| 559
last_vacuum   | 
last_autovacuum   | 
last_analyze  | 
last_autoanalyze  | 
vacuum_count  | 0
autovacuum_count  | 0
analyze_count | 4922528128875102208
autoanalyze_count | 7598807461784802080

(values in the last two columns seem bogus.
i don't know if it's related or not.)

 Did this coincide with an autovacuum of the table?

no.
(assuming that autovacuum=off in postgresql.conf is enough to exclude
the possibility.)

  
 These last two are of interest because it seems likely that such a
 cycle might be related to this new code not properly allowing for
 some aspect of tuple cleanup.
  
 Thanks for finding this and reporting it, and thanks in advance for
 any further detail you can provide.

thanks for looking.

YAMAMOTO Takashi

  
 -Kevin
 



Re: [HACKERS] SSI bug?

2011-02-12 Thread YAMAMOTO Takashi
hi,

 YAMAMOTO Takashi y...@mwd.biglobe.ne.jp wrote:
  
 it seems that PredicateLockTupleRowVersionLink sometimes creates
 a loop of targets (it finds an existing 'newtarget' whose
 nextVersionOfRow chain points to the 'oldtarget') and it later
 causes CheckTargetForConflictsIn to loop forever.
  
 Is this a hypothetical risk based on looking at the code, or have
 you seen this actually happen?  Either way, could you provide more
 details?  (A reproducible test case would be ideal.)

i have seen this actually happen.  i've confirmed the creation of the loop
with the attached patch.  it's easily reproducible with my application.
i can provide the full source code of my application if you want.
(but it isn't easy to run unless you are familiar with the recent
version of NetBSD)
i haven't found a smaller reproducible test case yet.

YAMAMOTO Takashi

  
 This being the newest part of the code, I'll grant that it is the
 most likely to have an unidentified bug; but given that the pointers
 are from one predicate lock target structure identified by a tuple
 ID to one identified by the tuple ID of the next version of the row,
 it isn't obvious to me how a cycle could develop.
  
 -Kevin
diff --git a/src/backend/storage/lmgr/predicate.c b/src/backend/storage/lmgr/predicate.c
index 722d0f8..3e1a3e2 100644
--- a/src/backend/storage/lmgr/predicate.c
+++ b/src/backend/storage/lmgr/predicate.c
@@ -2350,7 +2350,25 @@ PredicateLockTupleRowVersionLink(const Relation relation,
 		newtarget->nextVersionOfRow = NULL;
 	}
 	else
+	{
 		Assert(newtarget->priorVersionOfRow == NULL);
+#if 0
+		Assert(newtarget->nextVersionOfRow == NULL);
+#endif
+		if (newtarget->nextVersionOfRow != NULL) {
+			PREDICATELOCKTARGET *t;
+
+			elog(WARNING, "new %p has next %p\n",
+			    newtarget, newtarget->nextVersionOfRow);
+			for (t = newtarget->nextVersionOfRow; t != NULL;
+			    t = t->nextVersionOfRow) {
+				if (oldtarget != t) {
+					elog(WARNING, "creating a loop new=%p old=%p\n",
+					    newtarget, oldtarget);
+				}
+			}
+		}
+	}
 
 	newtarget->priorVersionOfRow = oldtarget;
 	oldtarget->nextVersionOfRow = newtarget;



[HACKERS] SSI bug?

2011-02-10 Thread YAMAMOTO Takashi
hi,

it seems that PredicateLockTupleRowVersionLink sometimes creates
a loop of targets (it finds an existing 'newtarget' whose nextVersionOfRow
chain points to the 'oldtarget') and it later causes
CheckTargetForConflictsIn to loop forever.

YAMAMOTO Takashi



Re: [HACKERS] [NOVICE] systable_getnext_ordered

2011-02-01 Thread YAMAMOTO Takashi
hi,

 I wrote:
 y...@mwd.biglobe.ne.jp (YAMAMOTO Takashi) writes:
 after systable_getnext_ordered returned NULL, is it ok to call it again?
 
 I wouldn't rely on it working.
 
 i'm wondering because inv_truncate seems to do it and expecting NULL.
 
 Hmm, that may well be a bug.  Have you tested it?
 
 I looked at this a bit more closely, and basically the answer is that
 inv_truncate is accidentally failing to fail.  What will actually happen
 if systable_getnext_ordered is called another time, at least when using
 a btree index, is that it will start the same requested scan over again,
 ie deliver all the tuples it did the first time (which is none, in this
 case).  That's an implementation artifact, but since the behavior is
 undefined in the first place, it's not wrong.
 
 Now, if inv_truncate's initial call on systable_getnext_ordered returns
 NULL (ie, the truncation point is past the current EOF page), it will
 fall through to the Write a brand new page code, which will create and
 insert a partial page at the truncation point.  It then goes to the
 delete-all-remaining-pages loop.  Because that starts a fresh scan with
 the very same scan key conditions, you might expect that it would find
 and delete the page it just inserted --- causing the apparent EOF of the
 blob to be wrong afterwards.  It accidentally fails to do that because
 the new tuple postdates the snapshot it's scanning with.  So the loop
 terminates having found no matching tuples, and all is well.
 
 So this code is confusing, inefficient (performing a useless search of
 the index), only works because of an obscure consideration not explained
 in the comments, and sets a bad precedent for people to follow.  I'm
 going to go change it to explicitly not do the final loop if the initial
 search failed.  It's not a bug, exactly, but it's sure lousy coding.
 Thanks for pointing it out.

thanks for the quick investigation and fix.

the attached patch is to avoid unnecessary detoast'ing and EOF marker pages
when possible.  does it make sense?

YAMAMOTO Takashi

 
   regards, tom lane
 
diff --git a/src/backend/storage/large_object/inv_api.c b/src/backend/storage/large_object/inv_api.c
index 01e3492..60a9c69 100644
--- a/src/backend/storage/large_object/inv_api.c
+++ b/src/backend/storage/large_object/inv_api.c
@@ -178,10 +178,14 @@ myLargeObjectExists(Oid loid, Snapshot snapshot)
 static int32
 getbytealen(bytea *data)
 {
-	Assert(!VARATT_IS_EXTENDED(data));
-	if (VARSIZE(data) < VARHDRSZ)
-		elog(ERROR, "invalid VARSIZE(data)");
-	return (VARSIZE(data) - VARHDRSZ);
+	Size		size;
+
+	size = toast_raw_datum_size(PointerGetDatum(data));
+	if (size < VARHDRSZ)
+		elog(ERROR, "invalid size");
+	if (size > VARHDRSZ + LOBLKSIZE)
+		elog(ERROR, "too large page");
+	return (size - VARHDRSZ);
 }
 
 
@@ -359,22 +363,12 @@ inv_getsize(LargeObjectDesc *obj_desc)
 	{
 		Form_pg_largeobject data;
 		bytea	   *datafield;
-		bool		pfreeit;
 
 		if (HeapTupleHasNulls(tuple))	/* paranoia */
 			elog(ERROR, "null field found in pg_largeobject");
 		data = (Form_pg_largeobject) GETSTRUCT(tuple);
 		datafield = &(data->data);		/* see note at top of file */
-		pfreeit = false;
-		if (VARATT_IS_EXTENDED(datafield))
-		{
-			datafield = (bytea *)
-				heap_tuple_untoast_attr((struct varlena *) datafield);
-			pfreeit = true;
-		}
 		lastbyte = data->pageno * LOBLKSIZE + getbytealen(datafield);
-		if (pfreeit)
-			pfree(datafield);
 	}
 
 	systable_endscan_ordered(sd);
@@ -724,8 +718,9 @@ inv_write(LargeObjectDesc *obj_desc, const char *buf, int nbytes)
 void
 inv_truncate(LargeObjectDesc *obj_desc, int len)
 {
-	int32		pageno = (int32) (len / LOBLKSIZE);
-	int			off;
+	const int32 pageno = (int32) (len / LOBLKSIZE);
+	int32		reqpageno;
+	const int	off = len % LOBLKSIZE;	/* offset in the page */
 	ScanKeyData skey[2];
 	SysScanDesc sd;
 	HeapTuple	oldtuple;
@@ -741,6 +736,7 @@ inv_truncate(LargeObjectDesc *obj_desc, int len)
 	Datum		values[Natts_pg_largeobject];
 	bool		nulls[Natts_pg_largeobject];
 	bool		replace[Natts_pg_largeobject];
+	bool		prevpagefull;
 	CatalogIndexState indstate;
 
 	Assert(PointerIsValid(obj_desc));
@@ -770,10 +766,20 @@ inv_truncate(LargeObjectDesc *obj_desc, int len

Re: [HACKERS] [NOVICE] systable_getnext_ordered

2011-02-01 Thread YAMAMOTO Takashi
hi,

thanks for taking a look.

 y...@mwd.biglobe.ne.jp (YAMAMOTO Takashi) writes:
 the attached patch is to avoid unnecessary detoast'ing and EOF marker pages
 when possible.  does it make sense?
 
 The blob page size is already chosen not to allow for out-of-line
 storage, not to mention that pg_largeobject doesn't have a TOAST table.
 So I think avoiding detoasting is largely a waste of time.

doesn't detoasting involve decompression?

 I'm
 unexcited about the other consideration too --- it looks to me like it
 just makes truncation slower, more complicated, and hence more
 bug-prone, in return for a possible speedup that probably nobody will
 ever notice.

slower?  it depends, i guess.

my primary motivation for that part of the patch was to save some space for
certain workloads.  (besides that, leaving unnecessary rows isn't neat,
but it might be a matter of taste.)

YAMAMOTO Takashi

 
   regards, tom lane
 
