Re: [HACKERS] Internationalized error messages
On Thu, Mar 08, 2001 at 09:00:09PM -0500, Tom Lane wrote:
> On Thu, Mar 08, 2001 at 11:49:50PM +0100, Peter Eisentraut wrote:
> > I really feel that translated error messages need to happen soon.
> Agreed.

Yes, error codes are a *very* much wanted feature.

> ERROR: Attribute "foo" not found        -- basic message for dumb frontends
> ERRORCODE: UNREC_IDENT                  -- key for finding localized message
> PARAM1: foo                             -- something to embed in the localized message
> MESSAGE: Attribute or table name not known within context of query
> CODELOC: src/backend/parser/parse_clause.c line 345
> QUERYLOC: 22

Great idea! I agree that we need a powerful error protocol instead of the current string-based messages. For translation to other languages I am not sure about the gettext() stuff in the backend -- IMHO a better (faster) solution would be a Postgres system catalog for it. Maybe add a new command too: SET MESSAGE_LANGUAGE TO xxx, because the wanted language need not always be the same as the locale setting. Something like elog(ERROR, gettext(...)); is usable, but doesn't sound good to me.

Karel

-- Karel Zak [EMAIL PROTECTED] http://home.zf.jcu.cz/~zakkr/ C, PostgreSQL, PHP, WWW, http://docs.linux.cz, http://mape.jcu.cz

---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/users-lounge/docs/faq.html
AW: [HACKERS] WAL does not recover gracefully from out-of-disk-space
> > Even with true fdatasync it's not obviously good for performance - it takes too long to write 16Mb files and fills the OS buffer cache with trash :-(
> True. But at least the write is (hopefully) being done at a non-performance-critical time.

So you have non-critical time every five minutes? Those platforms that don't have fdatasync won't profit anyway. Even if the I/O is done by the postmaster, the write to disk has a severe impact on other concurrent disk activity. In a real 5-minute checkpoint setup we are seriously talking about at least 48 Mb, or you risk foreground log creation. On systems I know, that means keeping the disk 100% busy for at least 8 seconds.

Andreas
Re: [HACKERS] Internationalized error messages
> For translation to other languages I am not sure about the gettext() stuff in the backend -- IMHO a better (faster) solution would be a Postgres system catalog for it. Maybe add a new command too: SET MESSAGE_LANGUAGE TO xxx, because the wanted language need not always be the same as the locale setting.

In a multibyte-enabled environment, that kind of command would not be necessary except for UNICODE and MULE_INTERNAL, since those are multi-lingual encodings. For them, we might need something like: SET LANGUAGE_PREFERENCE TO 'Japanese'; For the long term, this kind of problem should be solved in the implementation of the SQL-92/99 i18n features.

-- Tatsuo Ishii
Re: [HACKERS] WAL SHM principles
> > BTW, what does "bummer" mean?
> Sorry, it means, "Oh, I am disappointed."

Thanks :)

> > But on many OSes you CAN control when to write data - you can mlock individual pages.
> mlock() controls locking in physical memory. I don't see it controlling write().

When you mmap, you don't use write()! mlock actually locks a page in memory, and as long as the page is locked the OS doesn't attempt to store the dirty page. It is intended also for security applications, to ensure that sensitive data are not written to insecure storage (hdd). That is the definition of mlock, so you can probably rely on it.

There is a way to do it without mlock (as a fallback): you definitely need some kind of page header. The header should have info on whether the page can be mmaped or is in the "dirty pool". Pages in the dirty pool are pages which are dirty but not yet written, and are waiting for the appropriate log record to be flushed. When the log is flushed, the data in the dirty pool can be copied to their regular mmap location and discarded. If the dirty pool gets too large, simply sync the log and the whole pool can be discarded.

The mmap version could be faster when loading data from disk, and will result in better utilization of memory (because you are working directly with the data in the OS's page cache instead of having duplicates in pg's buffer cache). Also, page cache expiration is handled by the OS, and it will allow pg to use as much memory as is available (no need to specify a buffer cache size).

devik
RE: [HACKERS] WAL SHM principles
> > Pros: upper layers can think that buffers are always safe/logged and there is no special handling for indices; very simple/fast redo
> > Cons: can't implement undo - but in non-overwriting it is not needed (?)
> But needed if we want to get rid of vacuum and have savepoints.

Hmm. How do you implement savepoints? When there is a rollback to a savepoint, do you use the xlog to undo all the changes that the particular transaction has done? Hmmm, it seems nice ... those records are locked by that transaction, so it can safely undo them :-) Am I right? But how can you use the xlog to get rid of vacuum? Do you treat all delete log records as candidates for free space?

regards, devik
Re: AW: [HACKERS] WAL does not recover gracefully from out-of-disk-space
Zeugswetter Andreas SB [EMAIL PROTECTED] writes:
> > > Even with true fdatasync it's not obviously good for performance - it takes too long to write 16Mb files and fills the OS buffer cache with trash :-(
> > True. But at least the write is (hopefully) being done at a non-performance-critical time.
> So you have non-critical time every five minutes? Those platforms that don't have fdatasync won't profit anyway.

Yes they will; you're forgetting the cost of updating filesystem overhead. Suppose that we do not preallocate the log files. Each WAL fsync will require a write of the added data block(s), plus a write of at least one indirect block to record the allocation of new blocks to the file, plus a write of the file's inode, plus a write of the cylinder group's free-space bitmap. It takes extremely lucky placement of the file and indirect blocks to achieve fewer than four seeks per WAL block written. Total cost to write a 16MB file: roughly eight thousand seeks, assuming 8K block size. Even if we consider it safe to use fdatasync in this scenario, it will save only one of the four seeks, since the indirect block and freespace map *must* be updated regardless.

Now consider the preallocation approach. In the preallocation phase, we write like mad and then fsync the file ONCE. This means *one* write of each affected data block, indirect block, freespace map block, and the inode, versus one write of each data block and circa two thousand writes of the others. Furthermore the kernel is free to schedule these writes in some reasonable fashion, and so we may hope that something less than two thousand seeks will be used to do it.

Then we come to the phase of actually writing the file. No indirect block or freespace bitmap updates will occur. On a machine that implements fdatasync, we write data blocks and nothing else: one seek per block written, possibly no seeks if the layout is good. Even if we don't have fdatasync, it's only two seeks per block written (the block and the inode only). So, at worst four thousand seeks in this phase, at best much less than two thousand.

Bottom line is that it should take fewer seeks overall to do it this way, even on a machine without fdatasync, and even if we don't get to count any benefit from doing a large part of the work outside the critical path of transaction commit. Also, given that modern systems *do* have fdatasync, I do not see why we should not optimize for that case. It is true that prezeroing the file will tend to fill the kernel's disk cache with entirely useless blocks. I don't know of any portable way around that, but even an unportable way might be worth #ifdefing in on platforms where it works. Does anyone know a way of suppressing caching of outgoing blocks, or flushing them from the kernel's cache right away?

regards, tom lane
Re: [HACKERS] Internationalized error messages
Nathan Myers writes:
> > elog(ERROR, "XYZ01", gettext("stuff happened"));
> Similar approaches have been tried frequently, and even enshrined in standards (e.g. POSIX catgets), but have almost always proven too cumbersome. The problem is that keeping programs that interpret the numeric code in sync with the program they monitor is hard, and trying to avoid breaking all those secondary programs hinders development on the primary program. That's why no one uses catgets and everyone uses gettext. Furthermore, assigning code numbers is a nuisance, and they add uninformative clutter.

The error codes are exactly what we want, to allow client programs (as opposed to humans) to identify the errors. The code in my example has nothing to do with the message id in the catgets interface.

> It's better to scan the program for elog() arguments, and generate a catalog by using the string itself as the index code. Those maintaining the secondary programs can compare catalogs to see what has been broken by changes and what new messages to expect. elog() itself can (optionally) invent tokens (e.g. catalog indices) to help out those programs.

That's what gettext does for you.

-- Peter Eisentraut [EMAIL PROTECTED] http://yi.org/peter-e/
Re: [HACKERS] Internationalized error messages
Tom Lane writes:
> There's a difficult tradeoff to make here, but I think we do want to distinguish between the "official error code" --- the thing that has translations into various languages --- and what the backend is actually allowed to print out. It seems to me that a fairly large fraction of the unique messages found in the backend can all be lumped under the category of "internal error", and that we need to have only one official error code and one user-level translated message for the lot of them.

That's exactly what I was trying to avoid. You'd still be allowed to choose the error message text freely, but client programs will be able to make sense of them by looking at the code only, as opposed to parsing the message text. I'm trying to avoid making the message text be computed from the error code, because that obscures the source code.

> Another thing that's bothered me for a long time is our inconsistent approach to determining where in the code a message comes from. A lot of the messages currently embed the name of the generating routine right into the error text. Again, we ought to separate the functionality: the source-code location is valuable but ought not form part of the primary error message. I would like to see elog() become a macro that invokes __FILE__ and __LINE__ to automatically make the *exact* source code location become part of the secondary error information, and then drop the convention of using the routine name in the message text.

These sorts of things have been on my mind as well, but they're really independent of my issue. We can easily have runtime options to append, or not append, additional things to the error string. I don't see this as part of my proposal.

> Another thing that I missed in Peter's proposal is how we are going to cope with messages that include parameters. Surely we do not expect gettext to start with 'Attribute "foo" not found' and distinguish fixed from variable parts of that string?

Sure we do.

> That would mean we'd have to dump all the info into a single string, which is doable but would perhaps look pretty ugly:
> ERROR: Attribute "foo" not found -- basic message for dumb frontends
> ERRORCODE: UNREC_IDENT -- key for finding localized message

There should not be a "key" to look up localized messages. Remember that the localization will also have to be done in all the front-end programs. Surely we do not wish to make a list of messages that pg_dump or psql print out. Gettext takes care of this stuff. The only reason why we need error codes is for the sake of ease of interpretation by programs.

> PARAM1: foo -- something to embed in the localized message

Not necessary.

> MESSAGE: Attribute or table name not known within context of query

How's that different from ERROR:?

> CODELOC: src/backend/parser/parse_clause.c line 345

Can be appended to ERROR (or MESSAGE) depending on a configuration setting.

> QUERYLOC: 22

Not all errors are related to a query. The general problem here is also that this would introduce a client incompatibility. Will older clients that do not expect this amount of detail print all this garbage to the screen?

-- Peter Eisentraut [EMAIL PROTECTED] http://yi.org/peter-e/
Re: [HACKERS] Internationalized error messages
Peter Eisentraut [EMAIL PROTECTED] writes:
> That's exactly what I was trying to avoid. You'd still be allowed to choose the error message text freely, but client programs will be able to make sense of them by looking at the code only, as opposed to parsing the message text. I'm trying to avoid making the message text be computed from the error code, because that obscures the source code.

I guess I don't understand what you have in mind, because this seems self-contradictory. If "client programs can look at the code only", then how can the error message text be chosen independently of the code?

> > Surely we do not expect gettext to start with 'Attribute "foo" not found' and distinguish fixed from variable parts of that string?
> Sure we do.

How does that work exactly? You're assuming an extremely intelligent localization mechanism, I guess, which I was not. I think it makes more sense to work a little harder in the backend to avoid requiring AI software in every frontend.

> > MESSAGE: Attribute or table name not known within context of query
> How's that different from ERROR:?

Sorry, I meant that as an example of the "secondary message string", but it's a pretty lame example...

> The general problem here is also that this would introduce a client incompatibility. Older clients that do not expect this amount of detail will print all this garbage to the screen?

Yes, if we send it to them. It would make sense to control the amount of detail presented via some option (a GUC variable, probably). For backwards-compatibility reasons we'd want the default to correspond to roughly the existing amount of detail.

regards, tom lane
Re: [HACKERS] 7.0.3 Bitset dumping
Christopher Kings-Lynne writes:
> In case anyone cares, there is a bug in pg_dump in 7.0.3 when using the bit fields.

There are no bit fields in 7.0, at least not officially. 7.1 does support them.

-- Peter Eisentraut [EMAIL PROTECTED] http://yi.org/peter-e/
Re: [HACKERS] Internationalized error messages
Tom Lane writes:
> I guess I don't understand what you have in mind, because this seems self-contradictory. If "client programs can look at the code only", then how can the error message text be chosen independently of the code?

Let's say "type mismatch error", code 2200G according to SQL. At one place in the source you write

    elog(ERROR, "2200G", "type mismatch in CASE expression (%s vs %s)", ...);

Elsewhere you'd write

    elog(ERROR, "2200G", "type mismatch in argument %d of function %s, expected %s, got %s", ...);

Humans can look at this and have a fairly good idea what they'd need to fix. However, a client program currently only has the option of failing or not failing. In this example it would probably be better for it to fail, but someone else already put forth the example of a constraint violation. In that case the program might want to do something else.

> > Surely we do not expect gettext to start with 'Attribute "foo" not found' and distinguish fixed from variable parts of that string?
> > Sure we do.
> How does that work exactly? You're assuming an extremely intelligent localization mechanism, I guess, which I was not. I think it makes more sense to work a little harder in the backend to avoid requiring AI software in every frontend.

Gettext takes care of this. In the source you'd write

    elog(ERROR, "2200G", gettext("type mismatch in CASE expression (%s vs %s)"), string, string);

When you run the xgettext utility program it scans the source for instances of gettext(...) and creates message catalogs for the translators. When it finds printf arguments it automatically includes marks in the message, such as

    "type mismatch in CASE expression (%1$s vs %2$s)"

which the translator had better keep in his version. This also handles the case where the arguments might have to appear in a different order in a different language.

> Sorry, I meant that as an example of the "secondary message string", but it's a pretty lame example...

I guess I'm not sold on the concept of primary and secondary message strings. If the primary message isn't good enough, you'd better fix that.

-- Peter Eisentraut [EMAIL PROTECTED] http://yi.org/peter-e/
Re: [HACKERS] Internationalized error messages
Peter Eisentraut [EMAIL PROTECTED] writes:
> Let's say "type mismatch error", code 2200G acc. to SQL. At one place in the source you write elog(ERROR, "2200G", "type mismatch in CASE expression (%s vs %s)", ...); Elsewhere you'd write elog(ERROR, "2200G", "type mismatch in argument %d of function %s, expected %s, got %s", ...);

Okay, so your notion of an error code is not a localizable entity at all, it's something for client programs to look at. Now I get it. I object to writing "2200G" however, because that has no mnemonic value whatever, and is much too easy to get wrong. How about

    elog(ERROR, ERR_TYPE_MISMATCH, "type mismatch in argument %d of function %s, expected %s, got %s", ...);

where ERR_TYPE_MISMATCH is #defined as "2200G" someplace? Or for that matter #defined as "TYPE_MISMATCH"? Content-free numeric codes are no fun to use on the client side either...

> Gettext takes care of this. In the source you'd write elog(ERROR, "2200G", gettext("type mismatch in CASE expression (%s vs %s)"), string, string);

Duh. For some reason I was envisioning the localization substitution as occurring on the client side, but of course we'd want to do it on the server side, and before parameters are substituted into the message. Sorry for the noise. I am not sure we can/should use gettext (possible license problems?), but certainly something like this could be cooked up.

> > Sorry, I meant that as an example of the "secondary message string", but it's a pretty lame example...
> I guess I'm not sold on the concept of primary and secondary message strings. If the primary message isn't good enough you better fix that.

The motivation isn't so much to improve on the primary message as to reduce the number of distinct strings that really need to be translated. Remember all those internal "can't happen" errors. If we have only one message component then the translator is faced with a huge pile of internal messages and not a lot of gain from translating them. If there's a primary and a secondary component then all the internal messages can share the same primary component ("Internal error, please file a bug report"). Now the translator translates that one message, and can ignore the many secondary-component messages with a clear conscience. (Of course, he can translate those too if he really wants to, but the point is that he doesn't *have* to do it to attain reasonably friendly behavior.)

Perhaps another way to look at it is that we have a bunch of errors that are user-oriented (ie, relate pretty directly to something the user did wrong) and another bunch that are system-oriented (relate to internal problems, such as consistency-check failures or violations of internal APIs). We want to provide localized translations of the first set, for sure. I don't think we need localized translations of the second set, so long as we have some sort of "covering message" that can be localized for them. Maybe instead of "primary" and "secondary" strings for a single error, we ought to distinguish these two categories of error and plan different localization strategies for them.

regards, tom lane
Re: [HACKERS] Query not using index, please explain.
Richard Poole [EMAIL PROTECTED] writes:
> [ snip a bunch of commentary about optimizer statistics ] Can someone who really knows this stuff (Tom?) step in if what I've just said is completely wrong?

Looked good to me.

> > select domain from history_entries group by domain;
> > To me, since there is an index on domain, it seems like this should be a rather fast thing to do? It takes a *very* long time, no matter if I turn seqscan on or off.
> The reason this is slow is that Postgres always has to look at heap tuples, even when it's been sent there by indexes. This in turn is because of the way the storage manager works (only by looking in the heap can you tell for sure whether a tuple is valid for the current transaction). So a "group by" always has to look at every heap tuple (that hasn't been eliminated by a where clause). "select distinct" has the same problem. I don't think there's a way to do what you want here with your existing schema without a sequential scan over the table.

But this last I object to. You certainly could use an index scan here; it's just that it's not very likely to be faster. The way that Postgres presently does GROUP BY and SELECT DISTINCT is to sort the tuples into order and then combine/eliminate adjacent duplicates. (If you've ever used a "sort | uniq" pipeline in Unix then you know the principle.) So the initial requirement for this query is to scan the history_entries tuples in order by domain. We can do this either by a sequential scan and explicit sort, or by an index scan using an ordered index on domain. It turns out that unless the physical order of the tuples is pretty close to their index order, the index-scan method is actually slower, because it results in a lot of random-access thrashing. But neither way is exactly speedy.

One thing that's on the to-do list is to look at reimplementing these operations using in-memory hash tables, with one hash entry per distinct value of the GROUP/DISTINCT columns. Then you can just do a sequential scan, with no sort, and as long as there aren't so many distinct values as to make the hash table overflow memory, you're going to win. However, until we have statistics that can give us some idea how many distinct values there might be, there's no way for the planner to make an intelligent choice between this way and the sort/uniq way...

regards, tom lane
[HACKERS] PQfinish(const PGconn *conn) question
Hi, I'm wondering how safe it is to pass an uninitialized conn to PQfinish. I have set a SIGALRM handler that terminates my daemon and closes the connection to Postgres if a time-out happens. However, I'm setting the alarm a few times before the connection to Postgres is established... which means that the conn pointer is not initialized yet and it will be passed to PQfinish in my SIGALRM handler. I'm not sure how Postgres will handle that; so far it doesn't seem to be causing any errors, but I'm not sure what's actually happening behind the scenes... Any input would be appreciated.

Regards, Boulat Khakimov

-- Nothing Like the Sun
AW: AW: [HACKERS] WAL does not recover gracefully from out-of-disk-space
> > > Even with true fdatasync it's not obviously good for performance - it takes too long to write 16Mb files and fills the OS buffer cache with trash :-(
> > True. But at least the write is (hopefully) being done at a non-performance-critical time.
> > So you have non-critical time every five minutes? Those platforms that don't have fdatasync won't profit anyway.
> Yes they will; you're forgetting the cost of updating filesystem overhead.

I did have that in mind, but I thought that in effect the OS would optimize sparse file allocation somehow. Doing some tests, however, showed that while your variant is really good and saves 12 seconds, the performance is *very* poor for either variant. A short test shows that opening the file with O_SYNC, and thus avoiding fsync(), would cut the effective time needed to sync-write the xlog by more than half. Of course we would need to buffer at least one xlog page before write (or commit) to gain the full advantage.

    prewrite 0 + write and fsync:      60.4 sec
    sparse file + write with O_SYNC:   37.5 sec
    no prewrite + write with O_SYNC:   36.8 sec
    prewrite 0 + write with O_SYNC:    24.0 sec

These times include the prewrite when applicable, on AIX with jfs. Test program attached. I may be overlooking something, though.

Andreas

tfsync.c
Re: AW: AW: [HACKERS] WAL does not recover gracefully from out-of-disk-space
Zeugswetter Andreas SB [EMAIL PROTECTED] writes:
> A short test shows that opening the file with O_SYNC, and thus avoiding fsync(), would cut the effective time needed to sync-write the xlog by more than half. Of course we would need to buffer at least one xlog page before write (or commit) to gain the full advantage.
> prewrite 0 + write and fsync:      60.4 sec
> sparse file + write with O_SYNC:   37.5 sec
> no prewrite + write with O_SYNC:   36.8 sec
> prewrite 0 + write with O_SYNC:    24.0 sec

This seems odd. As near as I can tell, O_SYNC is simply a command to do fsync implicitly during each write call. It cannot save any I/O unless I'm missing something significant. Where is the performance difference coming from? The reason I'm inclined to question this is that what we want is not an fsync per write but an fsync per transaction, and we can't easily buffer all of a transaction's XLOG writes...

regards, tom lane
Re: [HACKERS] PQfinish(const PGconn *conn) question
Boulat Khakimov [EMAIL PROTECTED] writes:
> I'm wondering how safe it is to pass an uninitialized conn to PQfinish

You can pass a NULL pointer to PQfinish safely, if that's what you meant. Passing a pointer to uninitialized memory seems like a bad idea.

regards, tom lane
AW: AW: AW: AW: [HACKERS] WAL does not recover gracefully from out-of-disk-space
> This seems odd. As near as I can tell, O_SYNC is simply a command to do fsync implicitly during each write call. It cannot save any I/O unless I'm missing something significant. Where is the performance difference coming from?

Yes, odd, but very reproducible here.

> I tried this on HPUX 10.20, which has not only O_SYNC but also O_DSYNC

AIX has O_DSYNC (which is _FDATASYNC) too, but I assumed O_SYNC would be more portable. Now that we have two, maybe it is more widespread than I thought.

> I attach my modified version of Andreas' program. Note I do not believe his assertion that close() implies fsync() --- on the machines I've used, it demonstrably does not sync.

Ok, I am not sure, but essentially do we need it to sync? The OS surely isn't supposed to notice, after closing the file, that it ran out of disk space.

Andreas
[HACKERS] porting question: funky uid names?
Hi pgsql-hackers, I'm currently porting 7.0.3 to the HP MPE/iX OS to join my other ports of Apache, BIND, sendmail, Perl, and others. I'm at the point where I'm trying to run the "make runcheck" regression tests, and I've just run into a problem where I need to seek the advice of pgsql-hackers.

MPE is a proprietary OS with a POSIX layer on top. The concept of POSIX uids and gids has been mapped to the concept of MPE usernames and MPE accountnames. An example MPE username would be "MGR.BIXBY", and if you do a POSIX getpwuid(getuid()), the contents of pw_name will be the same "MGR.BIXBY". The fact that pw_name contains a period on MPE has been confusing to some previous ports I've done, and it now appears PostgreSQL is being confused too. "make runcheck" is dying in the initdb phase:

    Creating global relations in /blah/blah/blah
    ERROR: pg_atoi: error in "BIXBY": can't parse "BIXBY"
    ERROR: pg_atoi: error in "BIXBY": can't parse "BIXBY"
    syntax error 25 : - .

I'm guessing that something tried to parse "MGR.BIXBY", saw the decimal point character, and passed the string to pg_atoi() thinking it's a number instead of a name. This seems like a really bad omen hinting at trouble on a fundamental level. What are my options here?

1) I'm screwed; go try porting MySQL instead. ;-)
2) Somehow modify username parsing to be tolerant of the "." character? I was able to do this when I ported sendmail. Where should I be looking in the PostgreSQL source? Is this going to require language grammar changes?
3) Always specify numeric uids instead of user names. Is this even possible?

Your advice will be greatly appreciated. MPE users are currently whining on their mailing list about the lack of standard databases for the platform, and I wanted to surprise them by releasing a PostgreSQL port. Thanks!

-- [EMAIL PROTECTED] Remainder of .sig suppressed to conserve scarce California electrons...
Re: AW: AW: AW: AW: [HACKERS] WAL does not recover gracefully from out-of-disk-space
Zeugswetter Andreas SB [EMAIL PROTECTED] writes:
> > I attach my modified version of Andreas' program. Note I do not believe his assertion that close() implies fsync() --- on the machines I've used, it demonstrably does not sync.
> Ok, I am not sure, but essentially do we need it to sync? The OS surely isn't supposed to notice, after closing the file, that it ran out of disk space.

I believe that out-of-space would be reported during the writes anyway, so that's not the issue. The point of fsync'ing after the prewrite is to ensure that the indirect blocks are down on disk. If you trust fdatasync (or O_DSYNC) to write indirect blocks then it's not necessary --- but I'm pretty sure I heard somewhere that some versions of fdatasync fail to guarantee that. In any case, the real point of the prewrite is to move work out of the transaction commit path, and so we're better off if we can sync the indirect blocks during prewrite.

> > I tried this on HPUX 10.20, which has not only O_SYNC but also O_DSYNC
> AIX has O_DSYNC (which is _FDATASYNC) too, but I assumed O_SYNC

Oh? What speeds do you get if you use that?

regards, tom lane
Re: [HACKERS] porting question: funky uid names?
Mark Bixby [EMAIL PROTECTED] writes:
> MPE is a proprietary OS with a POSIX layer on top. The concept of POSIX uids and gids has been mapped to the concept of MPE usernames and MPE accountnames. An example MPE username would be "MGR.BIXBY", and if you do a POSIX getpwuid(getuid()), the contents of pw_name will be the same "MGR.BIXBY".

Hm. And what is returned in pw_uid? I think you are getting burnt by initdb's attempt to assign the postgres superuser's numeric ID to be the same as the Unix userid number of the user running initdb. Look at the uses of pg_id in the initdb script, and experiment with running pg_id by hand to see what it produces. A quick-and-dirty experiment would be to run "initdb -i 42" (or whatever) to override the result of pg_id. If that succeeds, the real answer may be that pg_id needs a patch to behave reasonably on MPE. Let us know...

regards, tom lane
Re: [HACKERS] Internationalized error messages
Peter Eisentraut [EMAIL PROTECTED] writes: Let's say "type mismatch error", code 2200G acc. to SQL. At one place in the source you write elog(ERROR, "2200G", "type mismatch in CASE expression (%s vs %s)", ...); Tom Lane [EMAIL PROTECTED] spake: I object to writing "2200G" however, because that has no mnemonic value whatever, and is much too easy to get wrong. How about elog(ERROR, ERR_TYPE_MISMATCH, "type mismatch in argument %d of function %s, expected %s, got %s", ...); where ERR_TYPE_MISMATCH is #defined as "2200G" someplace? Or for that matter #defined as "TYPE_MISMATCH"? Content-free numeric codes are no fun to use on the client side either... This is one thing I think VMS does well. All error messages are a composite of the subsystem where they originated, the severity of the error, and the actual error itself. Internally this is stored in a 32-bit word. It's been a long time, so I don't recall how many bits they allocated for each component. The human-readable representation looks like "subsystem-severity-error". -- Andrew Evans ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
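Tom's mnemonic-macro suggestion might look like this sketch. The macro name ERR_TYPE_MISMATCH comes from his message; the `format_error` stand-in for elog() and its output format are hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical mnemonic name for a SQLSTATE code, so source code
 * never spells out an opaque string like "2200G" directly. */
#define ERR_TYPE_MISMATCH "2200G"   /* class 22: data exception */

/* Stand-in for elog(): formats "CODE: message" into buf so the code
 * travels with the message rather than replacing it. */
static char *format_error(char *buf, size_t len,
                          const char *code, const char *fmt, const char *arg)
{
    int n = snprintf(buf, len, "%s: ", code);
    snprintf(buf + n, len - n, fmt, arg);
    return buf;
}
```

Changing the #define in one place (to "TYPE_MISMATCH", say) would then change the wire code everywhere without touching any call sites.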
Re: [HACKERS] Internationalized error messages
On Fri, Mar 09, 2001 at 12:05:22PM -0500, Tom Lane wrote: Gettext takes care of this. In the source you'd write elog(ERROR, "2200G", gettext("type mismatch in CASE expression (%s vs %s)"), string, string); Duh. For some reason I was envisioning the localization substitution as occurring on the client side, but of course we'd want to do it on the server side, and before parameters are substituted into the message. Sorry for the noise. I am not sure we can/should use gettext (possible license problems?), but certainly something like this could be cooked up. I've been assuming that PG's needs are specialized enough that the project wouldn't use gettext directly, but instead something inspired by it. If you look at my last posting on the subject, by the way, you will see that it could work without a catalog underneath; integrating a catalog would just require changes in a header file (and the programs to generate the catalog, of course). That quality seems to me essential to allow the changeover to be phased in gradually, and to allow different underlying catalog implementations to be tried out. Nathan ncm ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/users-lounge/docs/faq.html
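A minimal sketch of the point being agreed on here: translation must be applied to the format string before parameters are substituted into it. `xlate` is a stub standing in for gettext(), and the German catalog entry is purely illustrative.

```c
#include <stdio.h>
#include <string.h>

/* Stub standing in for gettext(): looks up a translation for the
 * *format string*, not the finished message.  The catalog entry is
 * illustrative only. */
static const char *xlate(const char *msgid)
{
    if (strcmp(msgid, "type mismatch in CASE expression (%s vs %s)") == 0)
        return "Typkonflikt im CASE-Ausdruck (%s vs %s)";
    return msgid;               /* no translation: fall back to the msgid */
}

/* Server-side localization: translate first, then substitute the
 * parameters.  Doing it in the other order could never work, because
 * the substituted message would no longer match any catalog key. */
static void localized_elog(char *out, size_t len,
                           const char *fmt, const char *a, const char *b)
{
    snprintf(out, len, xlate(fmt), a, b);
}
```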
Re: [HACKERS] porting question: funky uid names?
Tom Lane wrote: Mark Bixby [EMAIL PROTECTED] writes: MPE is a proprietary OS with a POSIX layer on top. The concept of POSIX uids and gids has been mapped to the concept of MPE usernames and MPE accountnames. An example MPE username would be "MGR.BIXBY", and if you do a POSIX getpwuid(getuid()), the contents of pw_name will be the same "MGR.BIXBY". Hm. And what is returned in pw_uid? A valid numeric uid. I think you are getting burnt by initdb's attempt to assign the postgres superuser's numeric ID to be the same as the Unix userid number of the user running initdb. Look at the uses of pg_id in the initdb script, and experiment with running pg_id by hand to see what it produces. pg_id without parameters returns uid=484(MGR.BIXBY), which matches what I get from MPE's native id command. The pg_id -n and -u options behave as expected. A quick and dirty experiment would be to run "initdb -i 42" (or whatever) to override the result of pg_id. If that succeeds, the real answer may be that pg_id needs a patch to behave reasonably on MPE. I just hacked src/test/regress/run_check.sh to invoke initdb with --show. The user name/id is behaving "correctly" for an MPE machine: SUPERUSERNAME: MGR.BIXBY SUPERUSERID:484 The initdb -i option will only override the SUPERUSERID, but it's already correct. -- [EMAIL PROTECTED] Remainder of .sig suppressed to conserve scarce California electrons... ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://www.postgresql.org/search.mpl
RE: [HACKERS] WAL does not recover gracefully from out-of-disk-sp ace
>> Even with true fdatasync it's not obviously good for performance - it
>> takes too long to write 16Mb files and fills the OS buffer cache with
>> trash :-(
> True. But at least the write is (hopefully) being done at a
> non-performance-critical time.

There is no such hope: XLogWrite may be called from XLogFlush (at commit time and from bufmgr on replacements) *and* from XLogInsert - ie a new log file may be required at any time. Probably we need a separate process like LGWR (log writer) in Oracle.

> I think the create-ahead feature in the checkpoint maker should be on
> by default.

I'm not sure - it increases disk requirements. I considered this mostly as a hint for the OS about how the log file should be allocated (to decrease fragmentation). Not sure how OSes use such hints, but seek+write costs nothing.

> AFAIK, extant Unixes will not regard this as a hint at all; they'll
> think it is a great opportunity to not store zeroes :-(.

Yes, but if I were writing a file system I wouldn't allocate space for a file block by block - I would try to pre-allocate more than required by write(). So I hoped that seek+write is a hint for the OS: "Hey, I need a 16Mb file - try to make it as contiguous as possible". Don't know whether it works, though -:)

> One reason that I like logfile fill to be done separately is that it's
> easier to convince ourselves that failure (due to out of disk space)
> need not require elog(STOP) than if we have the same failure during
> XLogWrite. You are right that we don't have time to consider each STOP
> in the WAL code, but I think we should at least look at that case...

What's the problem with elog(STOP) in the absence of disk space? I think running out of disk is bad enough to stop DB operations.

Vadim
RE: AW: AW: [HACKERS] WAL does not recover gracefully from out-of-dis k-sp ace
The reason I'm inclined to question this is that what we want is not an fsync per write but an fsync per transaction, and we can't easily buffer all of a transaction's XLOG writes... WAL keeps records in WAL buffers (wal-buffers parameter may be used to increase # of buffers), so we can make write()-s buffered. Seems that my Solaris has fdatasync, so I'll test different approaches... Vadim ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
RE: [HACKERS] WAL SHM principles
But needed if we want to get rid of vacuum and have savepoints. Hmm. How do you implement savepoints ? When there is rollback to savepoint do you use xlog to undo all changes which the particular transaction has done ? Hmmm it seems nice ... these resords are locked by such transaction so that it can safely undo them :-) Am I right ? Yes, but there is no savepoints in 7.1 - hopefully in 7.2 But how can you use xlog to get rid of vacuum ? Do you treat all delete log records as candidates for free space ? Vaccum removes deleted records *and* records inserted by aborted transactions - last ones will be removed by UNDO. Vadim ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: AW: AW: [HACKERS] WAL does not recover gracefully from out-of -dis k-sp ace
"Mikheev, Vadim" [EMAIL PROTECTED] writes: Seems that my Solaris has fdatasync, so I'll test different approaches... A Sun guy told me that Solaris does this just the same way that HPUX does it: fsync() scans all kernel buffers for the file, but O_SYNC doesn't, because it knows it only needs to sync the blocks covered by the write(). He didn't say about fdatasync/O_DSYNC but I bet the same difference exists for those two. The Linux 2.4 kernel allegedly is set up so that fsync() is smart enough to only look at dirty buffers, not all the buffers of the file. So the performance tradeoffs would be different there. But on HPUX and probably Solaris, O_DSYNC is likely to be a big win, unless we can find a way to stop the kernel from buffering so much of the WAL files. regards, tom lane ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://www.postgresql.org/search.mpl
Re: [HACKERS] porting question: funky uid names?
Mark Bixby [EMAIL PROTECTED] writes: I just hacked src/test/regress/run_check.sh to invoke initdb with --show. The user name/id is behaving "correctly" for an MPE machine: SUPERUSERNAME: MGR.BIXBY SUPERUSERID:484 Okay, so much for that theory. Can you set a breakpoint at elog() and provide a stack backtrace so we can see where this is happening? I can't think where else in the code might be affected, but obviously the problem is somewhere else... regards, tom lane ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/users-lounge/docs/faq.html
Re: [HACKERS] WAL does not recover gracefully from out-of-disk-sp ace
"Mikheev, Vadim" [EMAIL PROTECTED] writes: True. But at least the write is (hopefully) being done at a non-performance-critical time. There is no such hope: XLogWrite may be called from XLogFlush (at commit time and from bufmgr on replacements) *and* from XLogInsert - ie new log file may be required at any time. Sure, but if we have create-ahead enabled then there's a good chance of the log files being made by the checkpoint process, rather than by working backends. In that case the prefill is not time critical. In any case, my tests so far show that prefilling and then writing with O_SYNC or better O_DSYNC is in fact faster than not prefilling; this matches pretty well the handwaving argument I gave Andreas this morning. (With fsync() or fdatasync() it seems we're at the mercy of inefficient kernel algorithms, a factor I didn't consider before.) regards, tom lane ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [HACKERS] Internationalized error messages
Tom Lane writes:
> I object to writing "2200G" however, because that has no mnemonic value
> whatever, and is much too easy to get wrong. How about
>     elog(ERROR, ERR_TYPE_MISMATCH,
>          "type mismatch in argument %d of function %s, expected %s, got %s",
>          ...);
> where ERR_TYPE_MISMATCH is #defined as "2200G" someplace? Or for that
> matter #defined as "TYPE_MISMATCH"? Content-free numeric codes are no
> fun to use on the client side either...

Well, SQL defines these. Do we want to make our own list? However, numeric codes also have the advantage that some hierarchy is possible. E.g., the "22" in "2200G" is actually the category code "data exception". Personally, I would stick to the SQL codes but make some readable macro name for backend internal use.

> I am not sure we can/should use gettext (possible license problems?),

Gettext is an open standard, invented at Sun IIRC. There is also an independent implementation for BSDs in the works. On GNU/Linux systems it's in the C library. I don't see any license problems that way. It has been used widely for free software and so far I haven't seen any real alternative.

> but certainly something like this could be cooked up.

Well, I'm trying to avoid having to do the cooking. ;-)

> Perhaps another way to look at it is that we have a bunch of errors
> that are user-oriented (ie, relate pretty directly to something the
> user did wrong) and another bunch that are system-oriented (relate to
> internal problems, such as consistency check failures or violations of
> internal APIs). We want to provide localized translations of the first
> set, for sure. I don't think we need localized translations of the
> second set, so long as we have some sort of "covering message" that
> can be localized for them.

I'm sure this can be covered in some macro way. A random idea: elog(ERROR, INTERNAL_ERROR("text"), ...) expands to elog(ERROR, gettext("Internal error: %s"), ...) OTOH, we should not yet make presumptions about what dedicated translators can be capable of. 
:-) -- Peter Eisentraut [EMAIL PROTECTED] http://yi.org/peter-e/ ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
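Peter's INTERNAL_ERROR idea could be realized with a macro that expands to two arguments: the translated covering message plus the untranslated detail text. A sketch under stated assumptions (gettext() is stubbed as the identity, and `my_elog` stands in for the real elog):

```c
#include <stdio.h>
#include <string.h>

/* gettext() stubbed as identity so the sketch is self-contained. */
#define gettext(s) (s)

/* Only the covering message gets a translatable msgid; the internal
 * detail text stays untranslated English, exactly as proposed. */
#define INTERNAL_ERROR(text) gettext("Internal error: %s"), (text)

static char msgbuf[128];

/* Stand-in for elog(): note it receives *two* arguments from a
 * single INTERNAL_ERROR(...) use, thanks to the comma in the macro. */
static void my_elog(const char *fmt, const char *detail)
{
    snprintf(msgbuf, sizeof msgbuf, fmt, detail);
}
```

A call like `my_elog(INTERNAL_ERROR("cache lookup failed"))` therefore expands to `my_elog(gettext("Internal error: %s"), "cache lookup failed")`, which is the expansion Peter sketches.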
Re: [HACKERS] Internationalized error messages
Peter Eisentraut [EMAIL PROTECTED] writes: Well, SQL defines these. Do we want to make our own list? However, numeric codes also have the advantage that some hierarchy is possible. E.g., the "22" in "2200G" is actually the category code "data exception". Personally, I would stick to the SQL codes but make some readable macro name for backend internal use. We will probably find cases where we need codes not defined by SQL (since we have non-SQL features). If there is room to invent our own codes then I have no objection to this. I am not sure we can/should use gettext (possible license problems?), Gettext is an open standard, invented at Sun IIRC. There is also an independent implementation for BSDs in the works. On GNU/Linux system it's in the C library. I don't see any license problems that way. Unless that BSD implementation is ready to go, I think we'd be talking about relying on GPL'd (not LGPL'd) code for an essential component of the system functionality. Given RMS' recent antics I am much less comfortable with that than I might once have been. regards, tom lane ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
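Since the first two characters of a five-character SQLSTATE name its class (Peter's "22" = "data exception" example), inventing project-specific codes comes down to choosing values in the ranges the standard leaves implementation-defined. A small sketch of extracting the class; the function name is hypothetical:

```c
#include <string.h>

/* A SQLSTATE is five characters: a two-character class plus a
 * three-character subclass.  "2200G" is class "22" (data exception),
 * subclass "00G".  Classes outside the standard-defined ranges are
 * implementation-defined, which is where codes for non-SQL features
 * could live. */
static void sqlstate_class(const char *sqlstate, char out[3])
{
    strncpy(out, sqlstate, 2);
    out[2] = '\0';
}
```

A client that only wants coarse error handling could then dispatch on the class alone and ignore the subclass.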
Re: [HACKERS] porting question: funky uid names?
Mark Bixby writes:
> Creating global relations in /blah/blah/blah
> ERROR: pg_atoi: error in "BIXBY": can't parse "BIXBY"
> ERROR: pg_atoi: error in "BIXBY": can't parse "BIXBY"
> syntax error 25 : - .

I'm curious about that last line. Is that the shell complaining? The offending command seems to be

insert OID = 0 ( POSTGRES PGUID t t t t _null_ _null_ )

in the file global1.bki.source. (This is the file that creates the global relations.) The POSTGRES and PGUID quantities are substituted when initdb runs:

cat "$GLOBAL" \
  | sed -e "s/POSTGRES/$POSTGRES_SUPERUSERNAME/g" \
        -e "s/PGUID/$POSTGRES_SUPERUSERID/g" \
  | "$PGPATH"/postgres $BACKENDARGS template1

For some reason the line probably ends up being

insert OID = 0 ( MGR BIXBY 484 t t t t _null_ _null_ )

which causes the observed failure to parse "BIXBY" as a user id. This brings us back to why the dot disappears, which seems to be related to the error message "syntax error 25 : - .". Can you try using a different sed command (e.g., GNU sed)?

-- Peter Eisentraut [EMAIL PROTECTED] http://yi.org/peter-e/
Re: [HACKERS] undefined reference pq
Jeff Lu writes:
> $ make intrasend
> gcc -o /c/inetpub/wwwroot/cgi-bin/intrasend.exe intrasend.c intrautils.c -I/usr/local/pgsql/include -L/usr/local/pgsql/lib -lpq
> intrautils.c:7: warning: initialization makes integer from pointer without a cast
> /c/TEMP/ccXES02E.o(.text+0x32c):intrasend.c: undefined reference to `PQconnectdb'
[...]

-- Peter Eisentraut [EMAIL PROTECTED] http://yi.org/peter-e/
Re: [HACKERS] porting question: funky uid names?
Peter Eisentraut [EMAIL PROTECTED] writes: cat "$GLOBAL" \ | sed -e "s/POSTGRES/$POSTGRES_SUPERUSERNAME/g" \ -e "s/PGUID/$POSTGRES_SUPERUSERID/g" \ | "$PGPATH"/postgres $BACKENDARGS template1 For some reason the line probably ends up being insert OID = 0 ( MGR BIXBY 484 t t t t _null_ _null_ ) ^ which causes the observed failure to parse BIXBY as user id. Good thought. Just looking at this, I wonder if we shouldn't flip the order of the sed patterns --- as is, won't it mess up if the superuser name contains PGUID? A further exercise would be to make it not foul up if the superuser name contains '/'. I'd be kind of inclined to use ':' for the pattern delimiter, since in normal Unix practice usernames can't contain colons (cf. passwd file format). Of course one doesn't generally put a slash in a username either, but I think it's physically possible to do it... But none of these fully explain Mark's problem. If we knew where the "syntax error 25 : - ." came from, we'd be closer to an answer. regards, tom lane ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
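Tom's two suggestions combined, as a shell sketch (values follow Mark's MPE report): flipping the order means a superuser name containing "PGUID" can't be corrupted by the second substitution, and ':' pattern delimiters keep a '/' in the name from confusing sed.

```shell
# Illustrative values from the MPE report in this thread.
POSTGRES_SUPERUSERNAME="MGR.BIXBY"
POSTGRES_SUPERUSERID=484

# Substitute PGUID first, then POSTGRES, using ':' as the sed pattern
# delimiter so a '/' in the user name cannot terminate the pattern.
echo 'insert OID = 0 ( POSTGRES PGUID t t t t _null_ _null_ )' \
  | sed -e "s:PGUID:$POSTGRES_SUPERUSERID:g" \
        -e "s:POSTGRES:$POSTGRES_SUPERUSERNAME:g"
```

With a working sed this prints `insert OID = 0 ( MGR.BIXBY 484 t t t t _null_ _null_ )`; it does not explain Mark's vanishing dot, which is why a broken platform sed remains the suspect.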
Re: [HACKERS] porting question: funky uid names?
Tom Lane wrote: But none of these fully explain Mark's problem. If we knew where the "syntax error 25 : - ." came from, we'd be closer to an answer. After scanning the source for "syntax error", line 126 of backend/bootstrap/bootscanner.l seems to be the likely culprit. -- [EMAIL PROTECTED] Remainder of .sig suppressed to conserve scarce California electrons... ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: AW: AW: AW: [HACKERS] WAL does not recover gracefully from out-of -dis k-sp ace
Tom Lane [EMAIL PROTECTED] writes: We just bought back almost all the system time. The only possible explanation is that this way either doesn't keep the buffers from prior blocks, or does not scan them for dirtybits. I note that the open(2) man page is phrased so that O_SYNC is actually defined not to fsync the whole file, but only the part you just wrote --- I wonder if it's actually implemented that way? Sure, why not? That's how it is implemented in the Linux kernel. If you do a write with O_SYNC set, the write simply flushes out the buffers it just modified. If you call fsync, the kernel has to walk through all the buffers looking for ones associated with the file in question. Ian ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
[HACKERS] Internationalized dates (was Internationalized error messages)
Now you're talking about i18n, maybe someone could think about input and output of dates in the local language. As far as I can tell, PostgreSQL will only use English for dates, eg January, February, and weekdays, Monday, Tuesday etc. Not the local name. -- Kaare Rasmussen -- Linux, spil, -- Tlf: 3816 2582 Kaki Data tshirts, merchandize Fax: 3816 2501 Howitzvej 75 Åben 14.00-18.00 Email: [EMAIL PROTECTED] 2000 Frederiksberg Lørdag 11.00-17.00 Web: www.suse.dk
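At the C level, strftime() already localizes month and weekday names through the LC_TIME locale category, which suggests one conceivable route for this. A sketch, not how PostgreSQL's date output actually works; whether a given locale (say a Danish one) is installed is system-dependent:

```c
#include <locale.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

/* Format a month name through strftime(); the result depends on the
 * LC_TIME locale currently in effect, so setlocale(LC_TIME, "da_DK")
 * (if installed) would yield Danish names instead of English. */
static void month_name(char *buf, size_t len, int month_0_to_11)
{
    struct tm tm;

    memset(&tm, 0, sizeof tm);
    tm.tm_mon  = month_0_to_11;
    tm.tm_mday = 1;
    strftime(buf, len, "%B", &tm);   /* %B = full localized month name */
}
```

This also illustrates Karel's earlier point upthread: the wanted message or date language need not match the server's locale, so a settable LC_TIME-like knob would be needed.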
Re: [HACKERS] porting question: funky uid names?
Tom Lane wrote: Mark Bixby [EMAIL PROTECTED] writes: I just hacked src/test/regress/run_check.sh to invoke initdb with --show. The user name/id is behaving "correctly" for an MPE machine: SUPERUSERNAME: MGR.BIXBY SUPERUSERID:484 Okay, so much for that theory. Can you set a breakpoint at elog() and provide a stack backtrace so we can see where this is happening? I can't think where else in the code might be affected, but obviously the problem is somewhere else... Here's a stack trace from the native MPE debugger (we don't have gdb support yet). I'm assuming that all results after the initdb failure should be suspect, and that's possibly why pg_log wasn't created. I haven't tried troubleshooting the pg_log problem yet until after I resolve the uid names issue. === Initializing check database instance DEBUG/iX C.25.06 DEBUG Intrinsic at: 129.0009d09c ?$START$ $1 ($4b) nmdebug b elog added: NM[1] PROG 129.001ad7d8 elog $2 ($4b) nmdebug c Break at: NM[1] PROG 129.001ad7d8 elog $3 ($4b) nmdebug tr PC=129.001ad7d8 elog * 0) SP=41843ef0 RP=129.0018f7a4 pg_atoi+$b4 1) SP=41843ef0 RP=129.00182994 int4in+$14 2) SP=41843e70 RP=129.0018296c ?int4in+$8 export stub: 129.001aed28 $CODE$+$138 3) SP=41843e30 RP=129.001af428 fmgr+$98 4) SP=41843db0 RP=129.000c3354 InsertOneValue+$264 5) SP=41843cf0 RP=129.000c05d4 Int_yyparse+$924 6) SP=41843c70 RP=129. (end of NM stack) $4 ($4b) nmdebug c === Starting regression postmaster Regression postmaster is running - PID=125239393 PGPORT=65432 === Creating regression database... 
NOTICE: mdopen: couldn't open /BIXBY/PUB/src/postgresql-7.0.3-mpe/src/test/regress/tmp_check/data/pg_log: No such file or directory
NOTICE: mdopen: couldn't open /BIXBY/PUB/src/postgresql-7.0.3-mpe/src/test/regress/tmp_check/data/pg_log: No such file or directory
psql: FATAL 1: cannot open relation pg_log
createdb: database creation failed
createdb failed
make: *** [runcheck] Error 1

-- [EMAIL PROTECTED] Remainder of .sig suppressed to conserve scarce California electrons...
RE: AW: AW: AW: [HACKERS] WAL does not recover gracefully from out-of -dis k-sp ace
> I tried this on HPUX 10.20, which has not only O_SYNC but also O_DSYNC
> (defined to do the equivalent of fdatasync()), and got truly
> fascinating results. Apparently, on this platform these flags change
> the kernel's buffering behavior! Observe:
>
> $ gcc -Wall -O -DINIT_WRITE -DUSE_ODSYNC tfsync.c
> $ time a.out
> real 0m21.40s
> user 0m0.02s
> sys  0m0.60s

Solaris 2.6 fascinates even more!!!

bash-2.02# gcc -Wall -O -DINIT_WRITE -DUSE_ODSYNC tfsync.c
bash-2.02# time a.out
real 0m4.242s
user 0m0.000s
sys  0m0.450s

It's hard to believe... Writing with DSYNC takes the same time as file initialization - ~2 sec. Also, there is no difference if using 64k blocks. INIT_WRITE + OSYNC gives 52 sec for 8k blocks and 5.7 sec for 256k ones, but INIT_WRITE + DSYNC doesn't depend on block size. Modern IDE drive? -:))

Probably we should change the code to use O_DSYNC if defined, even without changing XLogWrite to write more than 1 block at once (if requested)?

As for O_SYNC:

bash-2.02# gcc -Wall -O -DINIT_WRITE tfsync.c
bash-2.02# time a.out
real 0m54.786s
user 0m0.010s
sys  0m10.820s

bash-2.02# gcc -Wall -O -DINIT_WRITE -DUSE_OSYNC tfsync.c
bash-2.02# time a.out
real 0m52.406s
user 0m0.020s
sys  0m0.650s

Not a big win. Does Solaris have a more optimized search for dirty blocks than Tom's HP and Andreas' platform?

Vadim
RE: AW: AW: AW: [HACKERS] WAL does not recover gracefully from out-of -dis k-sp ace
> $ gcc -Wall -O -DINIT_WRITE tfsync.c
> $ time a.out
> real 1m15.11s
> user 0m0.04s
> sys  0m32.76s
>
> Note the large amount of system time here, and the fact that the extra
> time in INIT_WRITE is all system time. I have previously observed that
> fsync() on HPUX 10.20 appears to iterate through every kernel disk
> buffer belonging to the file, presumably checking their dirtybits one
> by one. The INIT_WRITE form loses because each fsync in the second
> loop has to iterate through a full 16Mb worth of buffers, whereas
> without INIT_WRITE there will only be as many buffers as the amount of
> file we've filled so far. (On this platform, it'd probably be a win to
> use log segments smaller than 16Mb...) It's interesting that there's
> no visible I/O cost here for the extra write pass --- the extra I/O
> must be completely overlapped with the extra system time.

Tom, could you run this test for different block sizes? Up to 32*8k? Just curious when you get something close to

> $ gcc -Wall -O -DINIT_WRITE -DUSE_ODSYNC tfsync.c
> $ time a.out
> real 0m21.40s
> user 0m0.02s
> sys  0m0.60s

Vadim
Re: [HACKERS] porting question: funky uid names?
Mark Bixby wrote:
> It seems that plpgsql.sl didn't get built. Might be an autoconf issue,
> since quite frequently config scripts don't know about shared
> libraries on MPE. I will investigate this further.

Ah. I found src/Makefile.shlib and added the appropriate stuff. Woohoo! We have test output! The regression README was clear about how some platform dependent errors can be expected, and how to code for these differences in the expected outputs. Now I'm off to examine the individual failures.

MULTIBYTE=;export MULTIBYTE; \
/bin/sh ./run_check.sh hppa1.0-hp-mpeix
=== Removing old ./tmp_check directory ...
=== Create ./tmp_check directory
=== Installing new build into ./tmp_check
=== Initializing check database instance
=== Starting regression postmaster
Regression postmaster is running - PID=125042790 PGPORT=65432
=== Creating regression database...
CREATE DATABASE
=== Installing PL/pgSQL...
=== Running regression queries...
parallel group1 (12 tests) ... boolean text name oid float4 varchar char int4 int2 float8 int8 numeric
test boolean ... ok
test char ... ok
test name ... ok
test varchar ... ok
test text ... ok
test int2 ... ok
test int4 ... ok
test int8 ... ok
test oid ... ok
test float4 ... ok
test float8 ... FAILED
test numeric ... ok
sequential test strings ... ok
sequential test numerology ... ok
parallel group2 (15 tests) ... comments path polygon lseg point box reltime interval tinterval circle inet timestamp type_sanity opr_sanity oidjoins
test point ... ok
test lseg ... ok
test box ... ok
test path ... ok
test polygon ... ok
test circle ... ok
test interval ... FAILED
test timestamp ... FAILED
test reltime ... ok
test tinterval ... ok
test inet ... ok
test comments ... ok
test oidjoins ... ok
test type_sanity ... ok
test opr_sanity ... ok
sequential test abstime ... ok
sequential test geometry ... FAILED
sequential test horology ... FAILED
sequential test create_function_1 ... ok
sequential test create_type ... ok
sequential test create_table ... ok
sequential test create_function_2 ... ok
sequential test copy ... ok
parallel group3 (6 tests) ... create_aggregate create_operator triggers constraints create_misc create_index
test constraints ... ok
test triggers ... ok
test create_misc ... ok
test create_aggregate ... ok
test create_operator ... ok
test create_index ... ok
sequential test create_view ... ok
sequential test sanity_check ... ok
sequential test errors ... ok
sequential test select ... ok
parallel group4 (16 tests) ... arrays union select_having transactions portals join select_implicit select_distinct_on subselect case random select_distinct select_into aggregates hash_index btree_index
test select_into ... ok
test select_distinct ... ok
test select_distinct_on ... ok
test select_implicit ... ok
test select_having ... ok
test subselect ... ok
test union ... ok
test case ... ok
test join ... ok
test aggregates ... ok
test transactions ... ok
test random ... ok
test portals ... ok
test arrays ... ok
test btree_index ... ok
test hash_index ... ok
sequential test misc ... ok
parallel group5 (5 tests) ... portals_p2 foreign_key rules alter_table select_views
test select_views ... ok
test alter_table ... ok
Re: [HACKERS] Performance monitor
In article [EMAIL PROTECTED], "Bruce Momjian" [EMAIL PROTECTED] wrote: The problem I see with the shared memory idea is that some of the information needed may be quite large. For example, query strings can be very long. Do we just allocate 512 bytes and clip off the rest. And as I add more info, I need more shared memory per backend. I just liked the file system dump solution because I could modify it pretty easily, and because the info only appears when you click on the process, it doesn't happen often. Of course, if we start getting the full display partly from each backend, we will have to use shared memory. Long-term, perhaps a monitor server (like Sybase ASE uses) might be a reasonable approach. That way, only one process (and a well- regulated one at that) would be accessing the shared memory, which should make it safer and have less of an impact performance-wise if semaphores are needed to regulate access to the various regions of shared memory. Then, 1-N clients may access the monitor server to get performance data w/o impacting the backends. Gordon. -- It doesn't get any easier, you just go faster. -- Greg LeMond ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
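Bruce's 512-byte clipping concern can be made concrete with a fixed-size per-backend slot such as a monitor might read from shared memory. All names here are hypothetical, and real shared-memory allocation and locking are omitted; the point is only that a fixed-size buffer forces a clip-and-terminate policy on query strings.

```c
#include <string.h>

/* Hypothetical fixed-size per-backend status slot.  Because shared
 * memory must be sized up front, long query strings are clipped to
 * fit; 512 bytes follows the number floated in the message above. */
#define QUERY_SNIPPET_LEN 512

typedef struct BackendStatus
{
    int  pid;
    char query[QUERY_SNIPPET_LEN];   /* clipped, always NUL-terminated */
} BackendStatus;

static void set_current_query(BackendStatus *slot, const char *query)
{
    strncpy(slot->query, query, QUERY_SNIPPET_LEN - 1);
    slot->query[QUERY_SNIPPET_LEN - 1] = '\0';   /* guarantee termination */
}
```

Gordon's monitor-server idea would then have a single well-behaved process reading these slots and serving N clients, so the backends themselves never contend with monitoring traffic.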
[HACKERS] Interesting failure mode for initdb
Assume a configuration problem that causes standalone backends to fail without doing anything. (I happened across this by tweaking global.bki in such a way that the superuser name entered into pg_shadow was different from what getpwname returns. I don't have a real-world example, but I'm sure there are some.) Unless the failure is so bad as to provoke a coredump, the backend will print a FATAL error message and then exit with exit status 0, because that's what it's supposed to do under the postmaster. Unfortunately, given the exit status 0, initdb doesn't notice anything wrong. And since initdb carefully stuffs ALL stdout and stderr output from its standalone-backend calls into /dev/null, the user will never notice anything wrong either, unless he's attuned enough to realize that initdb should've taken longer.

I think one part of the fix is to modify elog() so that a FATAL exit results in exit status 1, not 0, if not IsUnderPostmaster. But this will not help the user of initdb, who will still have no clue why the initdb is failing, even if he turns on debug output from initdb. I tried modifying initdb along the lines of removing "-o /dev/null" from PGSQL_OPT, and then writing (eg)

echo "CREATE TRIGGER pg_sync_pg_pwd AFTER INSERT OR UPDATE OR DELETE ON pg_shadow" \
     "FOR EACH ROW EXECUTE PROCEDURE update_pg_pwd()" \
  | "$PGPATH"/postgres $PGSQL_OPT template1 2>&1 >/dev/null \
  | grep -v ^DEBUG || exit_nicely

so that all non-DEBUG messages from the standalone backend would appear in initdb's output. However, this does not work because then the || tests the exit status of grep, not postgres. I don't think (postgres || exit_nicely) | grep would work either --- the exit will occur in a subprocess. At the very least we should hack initdb so that --debug removes "-o /dev/null" from PGSQL_OPT, but can you see any way to provide filtered stderr output from the backend in the normal mode of operation? 
regards, tom lane ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
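One portable answer to the pipe-eats-the-exit-status problem Tom describes is to smuggle the status out of the subshell through a temp file. A sketch with a fake backend standing in for the real postgres call (file names are illustrative; plain Bourne constructs only, since this predates bash's pipefail):

```shell
# fake_backend stands in for a standalone postgres call: it emits
# DEBUG noise plus a FATAL line on stderr, then fails.
fake_backend() {
    echo "DEBUG: processing query" >&2
    echo "FATAL 1: something broke" >&2
    return 1
}

STATUSFILE=/tmp/initdb_status.$$

# Inside the subshell: 2>&1 first sends stderr into the pipe, then
# >/dev/null discards stdout.  The subshell records the backend's real
# exit status in a file, which survives the pipeline even though the
# pipeline's own status is grep's.
( fake_backend 2>&1 >/dev/null; echo $? > "$STATUSFILE" ) | grep -v '^DEBUG'

status=$(cat "$STATUSFILE")
rm -f "$STATUSFILE"
echo "backend exit status: $status"
```

Running it prints the FATAL line (the DEBUG noise is filtered away) and then reports exit status 1, so an initdb script could do `[ "$status" -eq 0 ] || exit_nicely` afterward.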
Re: [HACKERS] porting question: funky uid names?
Mark Bixby [EMAIL PROTECTED] writes: So why is there a backend/global1.bki.source *and* a backend/catalog/global1.bki.source? You don't want to know ;-) ... it's all cleaned up for 7.1 anyway. I think in 7.0 you have to run make install in src/backend to get the .bki files installed. But now runcheck dies during the install of PL/pgSQL, with createlang complaining about a missing lib/plpgsql.sl. I did do an MPE implementation of dynloader.c, but I was under the dim impression this was only used for user-added functions, not core functionality. Am I mistaken? Are you dynaloading core functionality too? No, but the regress tests try to test plpgsql too ... you should be able to dike out the createlang call and have all tests except the plpgsql regress test work. regards, tom lane ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://www.postgresql.org/search.mpl
Re: AW: AW: AW: [HACKERS] WAL does not recover gracefully from ou t-of -dis k-sp ace
"Mikheev, Vadim" [EMAIL PROTECTED] writes:
>>> Tom, could you run this test for different block sizes? Up to 32*8k?
>> You mean changing the amount written per write(), while holding the
>> total file size constant, right?
> Yes. Currently XLogWrite writes 8k blocks one by one. From what I've
> seen on Solaris we can use O_DSYNC there without changing XLogWrite to
> write() more than 1 block (if 1 block is available for writing). But
> on other platforms write(BLOCKS_TO_WRITE * 8k) + fsync() probably will
> be faster than BLOCKS_TO_WRITE * write(8k) (for file opened with
> O_DSYNC) if BLOCKS_TO_WRITE > 1. I just wonder with what
> BLOCKS_TO_WRITE we'll see same times for both approaches.

Okay, I changed the program to

char zbuffer[8192 * BLOCKS];

(all else the same) and on HPUX 10.20 I get

$ gcc -Wall -O -DINIT_WRITE -DUSE_DSYNC -DBLOCKS=1 tfsync.c
$ time a.out
real 1m18.48s
user 0m0.04s
sys  0m34.69s

$ gcc -Wall -O -DINIT_WRITE -DUSE_DSYNC -DBLOCKS=4 tfsync.c
$ time a.out
real 0m35.10s
user 0m0.01s
sys  0m9.08s

$ gcc -Wall -O -DINIT_WRITE -DUSE_DSYNC -DBLOCKS=8 tfsync.c
$ time a.out
real 0m29.75s
user 0m0.01s
sys  0m5.23s

$ gcc -Wall -O -DINIT_WRITE -DUSE_DSYNC -DBLOCKS=32 tfsync.c
$ time a.out
real 0m22.77s
user 0m0.01s
sys  0m1.80s

$ gcc -Wall -O -DINIT_WRITE -DUSE_DSYNC -DBLOCKS=64 tfsync.c
$ time a.out
real 0m22.08s
user 0m0.01s
sys  0m1.25s

$ gcc -Wall -O -DINIT_WRITE -DUSE_ODSYNC -DBLOCKS=1 tfsync.c
$ time a.out
real 0m20.64s
user 0m0.02s
sys  0m0.67s

$ gcc -Wall -O -DINIT_WRITE -DUSE_ODSYNC -DBLOCKS=4 tfsync.c
$ time a.out
real 0m20.72s
user 0m0.01s
sys  0m0.57s

$ gcc -Wall -O -DINIT_WRITE -DUSE_ODSYNC -DBLOCKS=32 tfsync.c
$ time a.out
real 0m20.59s
user 0m0.01s
sys  0m0.61s

$ gcc -Wall -O -DINIT_WRITE -DUSE_ODSYNC -DBLOCKS=64 tfsync.c
$ time a.out
real 0m20.86s
user 0m0.01s
sys  0m0.69s

So I also see that there is no benefit to writing more than one block at a time with ODSYNC. And even at half a meg per write, DSYNC is slower than ODSYNC with 8K per write! 
Note the fairly high system-time consumption for DSYNC, too. I think
this is not so much a matter of a really good ODSYNC implementation, as
a really bad DSYNC one ...

			regards, tom lane

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
RE: AW: AW: AW: [HACKERS] WAL does not recover gracefully from out-of-disk-space
> $ gcc -Wall -O -DINIT_WRITE -DUSE_DSYNC -DBLOCKS=1 tfsync.c
                              ^^^^^^^^^^
You should use -DUSE_OSYNC to test O_SYNC. So you've tested
N * write() + fsync(), exactly what I asked for -:)

> So I also see that there is no benefit to writing more than one block
> at a time with ODSYNC. And even at half a meg per write, DSYNC is
> slower than ODSYNC with 8K per write!
> Note the fairly high system-time consumption for DSYNC, too. I think
> this is not so much a matter of a really good ODSYNC implementation,
> as a really bad DSYNC one ...

So it seems we can use O_DSYNC without losing log write performance
compared with write() + fsync(). Though we haven't tested
write() + fdatasync() yet...

Vadim
Re: AW: AW: AW: [HACKERS] WAL does not recover gracefully from out-of-disk-space
More numbers, these from a Powerbook G3 laptop running Linux 2.2:

[tgl@g3 tmp]$ uname -a
Linux g3 2.2.18-4hpmac #1 Thu Dec 21 15:16:15 MST 2000 ppc unknown

[tgl@g3 tmp]$ gcc -Wall -O -DINIT_WRITE -DUSE_DSYNC -DBLOCKS=1 tfsync.c
[tgl@g3 tmp]$ time ./a.out
real    0m32.418s
user    0m0.020s
sys     0m14.020s

[tgl@g3 tmp]$ gcc -Wall -O -DINIT_WRITE -DUSE_DSYNC -DBLOCKS=4 tfsync.c
[tgl@g3 tmp]$ time ./a.out
real    0m10.894s
user    0m0.000s
sys     0m4.030s

[tgl@g3 tmp]$ gcc -Wall -O -DINIT_WRITE -DUSE_DSYNC -DBLOCKS=8 tfsync.c
[tgl@g3 tmp]$ time ./a.out
real    0m7.211s
user    0m0.000s
sys     0m2.200s

[tgl@g3 tmp]$ gcc -Wall -O -DINIT_WRITE -DUSE_DSYNC -DBLOCKS=32 tfsync.c
[tgl@g3 tmp]$ time ./a.out
real    0m4.441s
user    0m0.020s
sys     0m0.870s

[tgl@g3 tmp]$ gcc -Wall -O -DINIT_WRITE -DUSE_DSYNC -DBLOCKS=64 tfsync.c
[tgl@g3 tmp]$ time ./a.out
real    0m4.488s
user    0m0.000s
sys     0m0.640s

[tgl@g3 tmp]$ gcc -Wall -O -DINIT_WRITE -DUSE_ODSYNC -DBLOCKS=1 tfsync.c
[tgl@g3 tmp]$ time ./a.out
real    0m3.725s
user    0m0.000s
sys     0m0.310s

[tgl@g3 tmp]$ gcc -Wall -O -DINIT_WRITE -DUSE_ODSYNC -DBLOCKS=4 tfsync.c
[tgl@g3 tmp]$ time ./a.out
real    0m3.785s
user    0m0.000s
sys     0m0.290s

[tgl@g3 tmp]$ gcc -Wall -O -DINIT_WRITE -DUSE_ODSYNC -DBLOCKS=64 tfsync.c
[tgl@g3 tmp]$ time ./a.out
real    0m3.753s
user    0m0.010s
sys     0m0.300s

Starting to look like we should just use ODSYNC where available, and
forget about dumping more per write ...

			regards, tom lane
RE: AW: AW: AW: [HACKERS] WAL does not recover gracefully from out-of-disk-space
> Starting to look like we should just use ODSYNC where available, and
> forget about dumping more per write ...

I'll run these tests on RedHat 7.0 tomorrow.

Vadim
Re: AW: AW: AW: [HACKERS] WAL does not recover gracefully from out-of-disk-space
"Mikheev, Vadim" [EMAIL PROTECTED] writes:
>> $ gcc -Wall -O -DINIT_WRITE -DUSE_DSYNC -DBLOCKS=1 tfsync.c
>                               ^^^^^^^^^^
> You should use -DUSE_OSYNC to test O_SYNC.

Ooops ... let's hear it for cut-and-paste, and for sharp-eyed readers!
Just for completeness, here are the results for O_SYNC:

$ gcc -Wall -O -DINIT_WRITE -DUSE_OSYNC -DBLOCKS=1 tfsync.c
$ time a.out
real    0m43.44s
user    0m0.02s
sys     0m0.74s

$ gcc -Wall -O -DINIT_WRITE -DUSE_OSYNC -DBLOCKS=4 tfsync.c
$ time a.out
real    0m26.38s
user    0m0.01s
sys     0m0.59s

$ gcc -Wall -O -DINIT_WRITE -DUSE_OSYNC -DBLOCKS=8 tfsync.c
$ time a.out
real    0m23.86s
user    0m0.01s
sys     0m0.59s

$ gcc -Wall -O -DINIT_WRITE -DUSE_OSYNC -DBLOCKS=64 tfsync.c
$ time a.out
real    0m22.93s
user    0m0.01s
sys     0m0.66s

Better than fsync(), but still not up to O_DSYNC.

> So it seems we can use O_DSYNC without losing log write performance
> compared with write() + fsync(). Though we haven't tested
> write() + fdatasync() yet...

Good point, we should check fdatasync() too --- although I have no
machines where it's different from fsync().

			regards, tom lane
Re: AW: AW: AW: [HACKERS] WAL does not recover gracefully from out-of-disk-space
"Mikheev, Vadim" [EMAIL PROTECTED] writes:
> Ok, I've made changes in xlog.c and run tests:

Could you send me your diffs?

			regards, tom lane