AW: [HACKERS] selecting from cursor

2001-07-03 Thread Zeugswetter Andreas SB

  That's gonna have to be fixed.  If you're not up for it, don't implement
  this.  Given that cursors (are supposed to) support FETCH BACKWARDS,
  I really don't see why they shouldn't be expected to handle ReScan...
 I thought only scrollable cursors can do that. What if cursor isn't
 scrollable? Should it error during the execution?

In PostgreSQL, all cursors are scrollable. The allowed grammar keyword is
simply ignored. I am actually not sure that this is optimal, since there
are a few very effective optimizations, that you can do if you know, that 
ReScan is not needed (like e.g. not storing the result temporarily).

Andreas

---(end of broadcast)---
TIP 6: Have you searched our list archives?

http://www.postgresql.org/search.mpl



Re: [HACKERS] selecting from cursor

2001-07-03 Thread Tom Lane

Alex Pilosov [EMAIL PROTECTED] writes:
 And what are you doing with the places that don't care which kind of RTE
 they are dealing with (which is most of them IIRC)?  While you haven't

 They just have things declared as RangeTblEntry *, and as long as they
 don't access type-specific fields, they are fine.

So you have four (soon to be six or seven) different structs that *must*
have the same fields?  I don't think that's cleaner than a union ...
at the very least, declare it as structs containing RangeTblEntry,
similar to the way the various Plan node types work (see plannodes.h).

 For scrollable cursors, Rescan should be implemented as 'scroll backwards
 until you can't scroll no more', correct?

No, it should be implemented as Rescan.  The portal mechanism needs to
expose the Rescan call for the contained querytree.

regards, tom lane

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/users-lounge/docs/faq.html



Re: AW: [HACKERS] selecting from cursor

2001-07-03 Thread Tom Lane

Zeugswetter Andreas SB [EMAIL PROTECTED] writes:
 this.  Given that cursors (are supposed to) support FETCH BACKWARDS,
 I really don't see why they shouldn't be expected to handle ReScan...
 I thought only scrollable cursors can do that. What if cursor isn't
 scrollable? Should it error during the execution?

 In PostgreSQL, all cursors are scrollable. The allowed grammar keyword is
 simply ignored. I am actually not sure that this is optimal, since there
 are a few very effective optimizations, that you can do if you know, that 
 ReScan is not needed (like e.g. not storing the result temporarily).

It's worse than that: we don't distinguish plans for cursors from plans
for any other query, hence *all* query plans are supposed to be able to
run backwards.  (In practice, a lot of them don't work :-(.)  Someday
that needs to be improved.  It would be good if the system understood
whether a particular plan node would ever be asked to rescan itself or
run backwards, and could optimize things on that basis.

regards, tom lane

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster



Re: [HACKERS] Re: Buffer access rules, and a probable bug

2001-07-03 Thread Tom Lane

Hiroshi Inoue [EMAIL PROTECTED] writes:
 Tom Lane wrote:
 I believe that nbtree.c's btbuild() code is currently in violation of
 these rules, because it calls HeapTupleSatisfiesNow() while holding a
 pin but no lock on the containing buffer.

 OK, we had better avoid using heapam routines in btbuild() ? 

On further thought, btbuild is not that badly broken at the moment,
because CREATE INDEX acquires ShareLock on the relation, so there can be
no concurrent writers at the page level.  Still, it seems like it'd be a
good idea to do LockBuffer(buffer, BUFFER_LOCK_SHARE) here, and
probably also to invoke HeapTupleSatisfiesNow() via the
HeapTupleSatisfies() macro so that infomask update is checked for.
Vadim, what do you think?

regards, tom lane

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]



Re: [HACKERS] selecting from cursor

2001-07-03 Thread Tom Lane

Alex Pilosov [EMAIL PROTECTED] writes:
 On Tue, 3 Jul 2001, Tom Lane wrote:
 So you have four (soon to be six or seven) different structs that *must*
 have the same fields?  I don't think that's cleaner than a union ...

 Please see my diffs. Its implemented via #define to declare all common
 fields. 
 #define RTE_COMMON_FIELDS \
 NodeTag type; \
 [etc]

I don't think that technique is cleaner than a union, either ;-).
The macro definition is a pain in the neck: you have to play games with
semicolon placement, most tools won't autoindent it nicely, etc etc.

But the main point is that I think NodeType = RangeTblEntry with
a separate subtype field is a better way to go than making a bunch of
different NodeType values.  When most of the fields are common, as in
this case, it's going to be true that many places only want to know
is it a rangetable entry or not?

regards, tom lane

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your
message can get through to the mailing list cleanly



RE: [HACKERS] Re: Buffer access rules, and a probable bug

2001-07-03 Thread Mikheev, Vadim

 On further thought, btbuild is not that badly broken at the moment,
 because CREATE INDEX acquires ShareLock on the relation, so
 there can be no concurrent writers at the page level. Still, it
 seems like it'd be a good idea to do LockBuffer(buffer,
BUFFER_LOCK_SHARE)
 here, and probably also to invoke HeapTupleSatisfiesNow() via the
 HeapTupleSatisfies() macro so that infomask update is checked for.
 Vadim, what do you think?

Looks like there is no drawback in locking buffer so let's lock it.

Vadim

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])



Re: [HACKERS] [OT] Any major users of postgresql?

2001-07-03 Thread Bill Studenmund

On Tue, 3 Jul 2001, Joe Brenner wrote:

 The reason I'm asking is that the place that I work is
 actually contemplating reverting from Oracle's expensive
 bugs to MySQL's (supposedly) cheap ones.  They'd consider
 postgresql, but they figure that with MySQL they can at
 least point to sites that pump a fair amount of data with it
 (e.g. mp3.com).
 
 Please help save me from a life without referential
 integrity... 

sourceforge for one. They were using MySQL, then changed. Also, look at
the postgres web site - there is an article there were someome did a speed
comparison between PG  MySQL. Postgres came out on top, even in places
where folks thought MySQL would win.

Also, it depends on what your application is. If there is any amount of DB
updates, PG will easily be the best choice. :-)

Take care,

Bill


---(end of broadcast)---
TIP 6: Have you searched our list archives?

http://www.postgresql.org/search.mpl



Re: [HACKERS] selecting from cursor

2001-07-03 Thread Tom Lane

Alex Pilosov [EMAIL PROTECTED] writes:
 True true. On other hand, unlike union, its automatically typechecked, you
 cannot by mistake reference a field you shouldn't be referencing.

Only true to the extent that you have cast a generic pointer to the
correct type to begin with.  However, we've probably wasted more time
arguing the point than it's really worth.

I would suggest leaving off the final semicolon in the macro definition
so that you can write

typedef struct RangeTblEntryRelation
{
RTE_COMMON_FIELDS;
/* Fields valid for a plain relation RTE */
char   *relname;/* real name of the relation */
Oid relid;  /* OID of the relation */

Without this, tools like pgindent will almost certainly mess up these
struct declarations (I know emacs' C mode will get it wrong...)

regards, tom lane

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])



Re: [HACKERS] stuck spin lock with many concurrent users

2001-07-03 Thread Tom Lane

Tatsuo Ishii [EMAIL PROTECTED] writes:
 I added some codes into HandleDeadLock to measure how long
 LockLockTable and DeadLOckCheck calls take. Followings are the result
 in running pgbench -c 1000 (it failed with stuck spin lock
 error). real time shows how long they actually run (using
 gettimeofday). user time and system time are measured by calling
 getrusage. The time unit is milli second.

  LockLockTable: real time

  min |  max   |avg
 -++---
0 | 867873 | 152874.9015151515

  LockLockTable: user time

  min | max | avg  
 -+-+--
0 |  30 | 1.2121212121

  LockLockTable: system time

  min | max  |  avg   
 -+--+
0 | 2140 | 366.5909090909


  DeadLockCheck: real time

  min |  max  |   avg   
 -+---+-
0 | 87671 | 3463.6996197719

  DeadLockCheck: user time

  min | max |  avg  
 -+-+---
0 | 330 | 14.2205323194

  DeadLockCheck: system time

  min | max | avg  
 -+-+--
0 | 100 | 2.5095057034

Hm.  It doesn't seem that DeadLockCheck is taking very much of the time.
I have to suppose that the problem is (once again) our inefficient
spinlock code.

If you think about it, on a typical platform where processes waiting for
a time delay are released at a clock tick, what's going to be happening
is that a whole lot of spinblocked processes will all be awoken in the
same clock tick interrupt.  The first one of these that gets to run will
acquire the spinlock, if it's free, and the rest will go back to sleep
and try again at the next tick.  This could be highly unfair depending
on just how the kernel's scheduler works --- for example, one could
easily believe that the waiters might be awoken in process-number order,
in which case backends with high process numbers might never get to
acquire the spinlock, or at least would have such low probability of
winning that they are prone to stuck spinlock timeout.

We really need to look at replacing the spinlock mechanism with
something more efficient.

regards, tom lane

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]



Re: [HACKERS] Re: Backup and Recovery

2001-07-03 Thread Rod Taylor

 With stock PostgreSQL... how many committed transactions can one
lose
 on a simple system crash/reboot? With Oracle or Informix, the answer
 is zero. Is that true with PostgreSQL in fsync mode? If not, does it
 lose all in the log, or just those not yet written to the DB?

With WAL the theory is that it will not lose a committed transaction.
Bugs have plagged previous versions (7.1.2 looks clean) and it none
(Oracle, Informix, Postgres) can protect against coding errors in the
certain cases but from general power failure it's fine.

This assumes adequate hardware too.  Some harddrives claim to have
written when they haven't among other things, but Postgres itself
won't lose the information -- your hardware might :do that silently
though.)


---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/users-lounge/docs/faq.html



Re: [HACKERS] New SQL Datatype RECURRINGCHAR

2001-07-03 Thread Rod Taylor

This is rather like MySQL's enum.  I still opt for the join, and if
you like make a view for those who don't want to know the data
structure.
--
Rod Taylor

Your eyes are weary from staring at the CRT. You feel sleepy. Notice
how restful it is to watch the cursor blink. Close your eyes. The
opinions stated above are yours. You cannot imagine why you ever felt
otherwise.

- Original Message -
From: [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Sent: Friday, June 29, 2001 6:05 PM
Subject: [HACKERS] New SQL Datatype RECURRINGCHAR


 Idea for a new SQL Data Type:

  RECURRINGCHAR

 The idea with RECURRINGCHAR is treated exactly like a VARCHAR in
it's
 usage.  However, it's designed for table columns that store a small
set of
 repeated values (=256 values). This allows for a great deal of
savings in
 the storage of the data.

 Example:

   Query:
 select count(*) from order
   Returns:
 100,000

   Query:
 select distinct status from order
   Returns:
 OPEN
 REWORK
 PLANNED
 RELEASED
 FINISHED
 SHIPPED

 It's apparent that there is a lot of duplicate space used in the
storage
 of this information.  The idea is if order.status was stored as a
 RECURRINGCHAR
 then the only data stored for the row would be a reference to the
value of
 the column. The actual values would be stored in a separate lookup
table.

 Advantages:

  - Storage space is optimized.

  - a query like:

 select distinct {RECURRINGCHAR} from {table}

can be radically optimized

  - Eliminates use of joins and extended knowledge of data
relationships
   for adhoc users.

 This datatype could be extended to allow for larger sets of repeated
 values:

  RECURRINGCHAR1 (8-bit)   up to 256 unique column values
  RECURRINGCHAR2 (16-bit)  up to 65536 unique column values

 Reasoning behind using 'long reference values':

 It is often an advantage to actually store an entire word
representing a
 business meaning as the value of a column (as opposed to a reference
 number or mnemonic abbreviation ).  This helps to make the system
 'self documenting' and adds value to users who are performing adhoc
 queries on the database.

 
 David Bennett
 President - Bensoft
 912 Baltimore, Suite 200
 Kansas City, MO  64105




 ---(end of
broadcast)---
 TIP 4: Don't 'kill -9' the postmaster



---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster



Re: [HACKERS] Buffer access rules, and a probable bug

2001-07-03 Thread Tom Lane

[EMAIL PROTECTED] (Nathan Myers) writes:
 On Mon, Jul 02, 2001 at 09:40:25PM -0400, Tom Lane wrote:
 4. It is considered OK to update tuple commit status bits (ie, OR the
 values HEAP_XMIN_COMMITTED, HEAP_XMIN_INVALID, HEAP_XMAX_COMMITTED, or
 HEAP_XMAX_INVALID into t_infomask) while holding only a shared lock and
 pin on a buffer.  This is OK because another backend looking at the tuple
 at about the same time would OR the same bits into the field, so there
 is little or no risk of conflicting update; what's more, if there did
 manage to be a conflict it would merely mean that one bit-update would
 be lost and need to be done again later.

 Without looking at the code, this seems mad.  Are you sure?

Yes.  Those status bits aren't ground truth, only hints.  They cache the
results of looking up transaction status in pg_log; if they get dropped,
the only consequence is the next visitor to the tuple has to do the
lookup over again.

Changing any other bits in t_infomask requires exclusive lock, however.

regards, tom lane

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])



Re: [HACKERS] funny (cache (?)) bug in postgres (7.x tested)

2001-07-03 Thread Tom Lane

RISKO Gergely [EMAIL PROTECTED] writes:
 I found a funny bug in postgres with c functions. (or feature??)

Scribbling on your input datum is verboten.  palloc a new value
to return.

regards, tom lane

---(end of broadcast)---
TIP 6: Have you searched our list archives?

http://www.postgresql.org/search.mpl



Re: [HACKERS] New SQL Datatype RECURRINGCHAR

2001-07-03 Thread Tom Lane

Rod Taylor [EMAIL PROTECTED] writes:
 This is rather like MySQL's enum.

Yes.  If we were going to do anything like this, I'd vote for stealing
the enum API, lock stock and barrel --- might as well be compatible.

regards, tom lane

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your
message can get through to the mailing list cleanly



Re: [HACKERS] Help with SI buffer overflow error

2001-07-03 Thread Tom Lane

Matthew [EMAIL PROTECTED] writes:
 NOTICE: RegisterSharedInvalid: SI buffer overflow
 NOTICE: InvalidateSharedInvalid: cache state reset

These are normal; at most they suggest that you've got another backend
sitting around doing nothing (but in an open transaction) while VACUUM
runs.

I think we finally got around to downgrading them to DEBUG messages
for 7.2.

regards, tom lane

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/users-lounge/docs/faq.html



Re: [HACKERS] Re: Buffer access rules, and a probable bug

2001-07-03 Thread Tom Lane

Okay, on to the next concern.  I've been thinking some more about the
restrictions needed to make the world safe for concurrent VACUUM.
I previously said:

 5. To physically remove a tuple or compact free space on a page, one
 must hold a pin and an exclusive lock, *and* observe while holding the
 exclusive lock that the buffer's shared reference count is one (ie,
 no other backend holds a pin).  If these conditions are met then no other
 backend can perform a page scan until the exclusive lock is dropped, and
 no other backend can be holding a reference to an existing tuple that it
 might expect to examine again.  Note that another backend might pin the
 buffer (increment the refcount) while one is performing the cleanup, but
 it won't be able to to actually examine the page until it acquires shared
 or exclusive lock.

This is OK when considering a page in isolation, but it does not get the
job done when one is deleting related index tuples and heap tuples.  It
seems to me that there *must* be some cross-page coupling to make that
work safely.  Otherwise you could have this scenario:

1. Indexscanning process visits an index tuple, decides to access the
   corresponding heap tuple, drops its lock on the index buffer page.

2. VACUUMing process visits the index buffer page and marks the index
   tuple dead.  (It won't try to delete the tuple yet, since it sees
   the pin still held on the page by process #1, but it can acquire
   exclusive lock and mark the tuple dead anyway.)

3. VACUUMing process is the first to acquire pin and lock on the heap
   buffer page.  It sees no other pin, so it deletes the tuple.

4. Indexscanning process finally acquires pin and lock on the heap page,
   tries to access what is now a gone tuple.  Ooops.  (Even if we
   made that not an error condition, it could be worse: what if a
   third process already reused the line pointer for a new tuple?)

It does not help to postpone the actual cleaning of an index or heap
page until its own pin count drops to zero --- the problem here is that
an indexscanner has acquired a reference into a heap page from the
index, but does not yet hold a pin on the heap page to ensure that
the reference stays good.  So we can't just postpone the cleaning of
the index page till it has pin count zero, we have to make the related
heap page(s)' cleanup wait for that to happen too.

I can think of two ways of guaranteeing that this problem cannot happen.

One is for an indexscanning process to retain its shared lock on the
index page until it has acquired at least a pin on the heap page.
This is very bad for concurrency --- it means that we'd typically
be holding indexpage shared locks for the time needed to read in a
randomly-accessed disk page.  And it's very complicated, since we still
need all the other rules, plus the mechanism for postponing cleanups
until pin count goes to zero.  It'd cause considerable changes to the
index access method API, too.

The other is to forget about asynchronous cleaning, and instead have the
VACUUM process directly do the wait for pin count zero, then clean the
index page.  Then when it does the same for the heap page, we know for
sure there are no indexscanners in transit to the heap page.  This would
be logically a lot simpler, it seems to me.  Another advantage is that
we need only one WAL entry per cleaned page, not two (one for the
initial mark-dead step and one for the postponable compaction step),
and there's no need for an intermediate gone but not forgotten state
for index tuples.

We could implement this in pretty nearly the same way as the mark for
cleanup facility that you partially implemented awhile back:
essentially, the cleanup callback would send a signal or semaphore
increment to the waiting process, which would then try to acquire pin
and exclusive lock on the buffer.  If it succeeded in observing pin
count 1 with exclusive lock, it could proceed with cleanup, else loop
back and try again.  Eventually it'll get the lock.  (It might take
awhile, but for a background VACUUM I think that's OK.)

What I'm wondering is if you had any other intended use for mark for
cleanup than VACUUM.  The cheapest implementation would allow only
one process to be waiting for cleanup on a given buffer, which is OK
for VACUUM because we'll only allow one VACUUM at a time on a relation
anyway.  But if you had some other uses in mind, maybe the code needs
to support multiple waiters.

Comments?

regards, tom lane

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/users-lounge/docs/faq.html



Re: [HACKERS] stuck spin lock with many concurrent users

2001-07-03 Thread Tom Lane

Hiroshi Inoue [EMAIL PROTECTED] writes:

 DeadLockCheck: real time
 
 min |  max  |   avg
 -+---+-
 0 | 87671 | 3463.6996197719
 
 DeadLockCheck: user time
 
 min | max |  avg
 -+-+---
 0 | 330 | 14.2205323194
 
 DeadLockCheck: system time
 
 min | max | avg
 -+-+--
 0 | 100 | 2.5095057034
 
 Hm.  It doesn't seem that DeadLockCheck is taking very much of the time.

 Isn't the real time big ?

Yes, it sure is, but remember that the guy getting useful work done
(DeadLockCheck) is having to share the CPU with 999 other processes
that are waking up on every clock tick for just long enough to fail
to get the spinlock.  I think it's those useless process wakeups that
are causing the problem.

If you estimate that a process dispatch cycle is ~ 10 microseconds,
then waking 999 useless processes every 10 msec is just about enough
to consume 100% of the CPU doing nothing useful... so what should be
a few-millisecond check takes a long time, which makes things worse
because the 999 wannabees are spinning for that much more time.

regards, tom lane

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])



Re: [HACKERS] stuck spin lock with many concurrent users

2001-07-03 Thread Tom Lane

Bruce Momjian [EMAIL PROTECTED] writes:
 If you estimate that a process dispatch cycle is ~ 10 microseconds,
 then waking 999 useless processes every 10 msec is just about enough
 to consume 100% of the CPU doing nothing useful...

 Don't we back off the sleeps or was that code removed?

Not enough to affect this calculation.

regards, tom lane

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/users-lounge/docs/faq.html



Re: [HACKERS] Buffer access rules, and a probable bug

2001-07-03 Thread Tom Lane

[EMAIL PROTECTED] (Nathan Myers) writes:
 Here, backend B is a good citizen and locks while it makes its change.

No, backend B wasn't a good citizen: it should have been holding
exclusive lock on the buffer.

 Also, as hints, would it be Bad(tm) if an attempt to clear one failed?

Clearing hint bits is also an exclusive-lock-only operation.  Notice
I specified that *setting* them is the only case allowed to be done
with shared lock.

regards, tom lane

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])