Re: [HACKERS] Internationalized error messages
On Thu, Mar 08, 2001 at 09:00:09PM -0500, Tom Lane wrote:
> On Thu, Mar 08, 2001 at 11:49:50PM +0100, Peter Eisentraut wrote:
> > I really feel that translated error messages need to happen soon.
> Agreed.

Yes, error codes are a *very* much wanted feature.

> ERROR: Attribute "foo" not found        -- basic message for dumb frontends
> ERRORCODE: UNREC_IDENT                  -- key for finding localized message
> PARAM1: foo                             -- something to embed in the localized message
> MESSAGE: Attribute or table name not known within context of query
> CODELOC: src/backend/parser/parse_clause.c line 345
> QUERYLOC: 22

Great idea! I agree that we need a powerful error protocol instead of the current string-based messages. For translation to other languages I am not sure about the gettext() stuff in the backend -- IMHO a better (faster) solution would be a Postgres system catalog for it. Maybe add a new command too: SET MESSAGE_LANGUAGE TO xxx, because the wanted language need not always be the same as the locale setting. Something like elog(ERROR, gettext(...)); is usable, but doesn't sound good to me.

Karel

-- Karel Zak [EMAIL PROTECTED] http://home.zf.jcu.cz/~zakkr/ C, PostgreSQL, PHP, WWW, http://docs.linux.cz, http://mape.jcu.cz

---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/users-lounge/docs/faq.html
AW: [HACKERS] WAL does not recover gracefully from out-of-disk-space
> > Even with true fdatasync it's not obviously good for performance - it takes too long to write 16Mb files and fills the OS buffer cache with trash :-(
> True. But at least the write is (hopefully) being done at a non-performance-critical time.

So you have non-critical time every five minutes? Those platforms that don't have fdatasync won't profit anyway. Even if the I/O is done by the postmaster, the write to disk has a severe impact on other concurrent disk activity. In a real 5-minute checkpoint setup we are seriously talking about at least 48 Mb, or you risk foreground log creation. On systems I know, that means keeping the disk 100% busy for at least 8 seconds.

Andreas
Re: [HACKERS] Internationalized error messages
> For translation to other languages I am not sure about the gettext() stuff in the backend -- IMHO a better (faster) solution would be a Postgres system catalog for it. Maybe add a new command too: SET MESSAGE_LANGUAGE TO xxx, because the wanted language need not always be the same as the locale setting.

In a multibyte-enabled environment, that kind of command would not be necessary except for UNICODE and MULE_INTERNAL, since those are multi-lingual encodings. For them, we might need something like: SET LANGUAGE_PREFERENCE TO 'Japanese'; For the long term, this kind of problem should be solved in the implementation of the SQL-92/99 i18n features.

-- Tatsuo Ishii
Re: [HACKERS] WAL SHM principles
> > BTW, what does "bummer" mean?
> Sorry, it means, "Oh, I am disappointed."

Thanks :)

> > But on many OSes you CAN control when to write data - you can mlock individual pages.
> mlock() controls locking in physical memory. I don't see it controlling write().

When you mmap, you don't use write()! mlock actually locks a page in memory, and as long as the page is locked the OS doesn't attempt to store the dirty page. It is intended also for security applications, to ensure that sensitive data are not written to insecure storage (hdd). That is the definition of mlock, so you can probably rely on it.

There is a way to do it without mlock (as a fallback): you definitely need some kind of page header. The header should have info on whether the page can be mmaped or is in the "dirty pool". Pages in the dirty pool are pages which are dirty but not yet written, and are waiting for the appropriate log record to be flushed. When the log is flushed, the data in the dirty pool can be copied to their regular mmap location and discarded. If the dirty pool gets too large, simply sync the log and the whole pool can be discarded.

The mmap version could be faster when loading data from disk, and will result in better utilization of memory (because you are working directly with the data in the OS's page cache instead of having duplicates in pg's buffer cache). Also, page cache expiration is handled by the OS, and it will allow pg to use as much memory as is available (no need to specify a buffer cache size).

devik
RE: [HACKERS] WAL SHM principles
> > Pros: upper layers can think that buffers are always safe/logged and there is no special handling for indices; very simple/fast redo
> > Cons: can't implement undo - but in non-overwriting it is not needed (?)
> But needed if we want to get rid of vacuum and have savepoints.

Hmm. How do you implement savepoints? When there is a rollback to a savepoint, do you use the xlog to undo all the changes that the particular transaction has done? Hmmm, it seems nice ... those records are locked by that transaction, so it can safely undo them :-) Am I right? But how can you use the xlog to get rid of vacuum? Do you treat all delete log records as candidates for free space?

regards, devik
Re: AW: [HACKERS] WAL does not recover gracefully from out-of-disk-space
Zeugswetter Andreas SB [EMAIL PROTECTED] writes:
> > > Even with true fdatasync it's not obviously good for performance - it takes too long to write 16Mb files and fills the OS buffer cache with trash :-(
> > True. But at least the write is (hopefully) being done at a non-performance-critical time.
> So you have non-critical time every five minutes? Those platforms that don't have fdatasync won't profit anyway.

Yes they will; you're forgetting the cost of updating filesystem overhead. Suppose that we do not preallocate the log files. Each WAL fsync will require a write of the added data block(s), plus a write of at least one indirect block to record the allocation of new blocks to the file, plus a write of the file's inode, plus a write of the cylinder group's free-space bitmap. It takes extremely lucky placement of the file and indirect blocks to achieve fewer than four seeks per WAL block written. Total cost to write a 16MB file: roughly eight thousand seeks, assuming 8K block size. Even if we consider it safe to use fdatasync in this scenario, it will save only one of the four seeks, since the indirect block and freespace map *must* be updated regardless.

Now consider the preallocation approach. In the preallocation phase, we write like mad and then fsync the file ONCE. This means *one* write of each affected data block, indirect block, freespace map block, and the inode, versus one write of each data block and circa two thousand writes of the others. Furthermore the kernel is free to schedule these writes in some reasonable fashion, and so we may hope that something less than two thousand seeks will be used to do it.

Then we come to the phase of actually writing the file. No indirect block or freespace bitmap updates will occur. On a machine that implements fdatasync, we write data blocks and nothing else: one seek per block written, possibly no seeks if the layout is good. Even if we don't have fdatasync, it's only two seeks per block written (the block and the inode only). So, at worst four thousand seeks in this phase, at best much less than two thousand.

Bottom line is that it should take fewer seeks overall to do it this way, even on a machine without fdatasync, and even if we don't get to count any benefit from doing a large part of the work outside the critical path of transaction commit. Also, given that modern systems *do* have fdatasync, I do not see why we should not optimize for that case. It is true that prezeroing the file will tend to fill the kernel's disk cache with entirely useless blocks. I don't know of any portable way around that, but even an unportable way might be worth #ifdefing in on platforms where it works. Does anyone know a way of suppressing caching of outgoing blocks, or flushing them from the kernel's cache right away?

regards, tom lane
Re: [HACKERS] Internationalized error messages
Nathan Myers writes:
> > elog(ERROR, "XYZ01", gettext("stuff happened"));
> Similar approaches have been tried frequently, and even enshrined in standards (e.g. POSIX catgets), but have almost always proven too cumbersome. The problem is that keeping programs that interpret the numeric code in sync with the program they monitor is hard, and trying to avoid breaking all those secondary programs hinders development on the primary program. That's why no one uses catgets and everyone uses gettext. Furthermore, assigning code numbers is a nuisance, and they add uninformative clutter.

The error codes are exactly what we want, to allow client programs (as opposed to humans) to identify the errors. The code in my example has nothing to do with the message id in the catgets interface.

> It's better to scan the program for elog() arguments, and generate a catalog by using the string itself as the index code. Those maintaining the secondary programs can compare catalogs to see what has been broken by changes and what new messages to expect. elog() itself can (optionally) invent tokens (e.g. catalog indices) to help out those programs.

That's what gettext does for you.

-- Peter Eisentraut [EMAIL PROTECTED] http://yi.org/peter-e/
Re: [HACKERS] Internationalized error messages
Tom Lane writes:
> There's a difficult tradeoff to make here, but I think we do want to distinguish between the "official error code" --- the thing that has translations into various languages --- and what the backend is actually allowed to print out. It seems to me that a fairly large fraction of the unique messages found in the backend can all be lumped under the category of "internal error", and that we need to have only one official error code and one user-level translated message for the lot of them.

That's exactly what I was trying to avoid. You'd still be allowed to choose the error message text freely, but client programs will be able to make sense of them by looking at the code only, as opposed to parsing the message text. I'm trying to avoid making the message text be computed from the error code, because that obscures the source code.

> Another thing that's bothered me for a long time is our inconsistent approach to determining where in the code a message comes from. A lot of the messages currently embed the name of the generating routine right into the error text. Again, we ought to separate the functionality: the source-code location is valuable but ought not form part of the primary error message. I would like to see elog() become a macro that invokes __FILE__ and __LINE__ to automatically make the *exact* source code location become part of the secondary error information, and then drop the convention of using the routine name in the message text.

These sorts of things have been on my mind as well, but they're really independent of my issue. We can easily have runtime options to append, or not append, additional things to the error string. I don't see this as part of my proposal.

> Another thing that I missed in Peter's proposal is how we are going to cope with messages that include parameters. Surely we do not expect gettext to start with 'Attribute "foo" not found' and distinguish fixed from variable parts of that string?

Sure we do.

> That would mean we'd have to dump all the info into a single string, which is doable but would perhaps look pretty ugly:
> ERROR: Attribute "foo" not found -- basic message for dumb frontends
> ERRORCODE: UNREC_IDENT -- key for finding localized message

There should not be a "key" to look up localized messages. Remember that the localization will also have to be done in all the front-end programs. Surely we do not wish to make a list of messages that pg_dump or psql print out. Gettext takes care of this stuff. The only reason why we need error codes is for the sake of ease of interpretation by programs.

> PARAM1: foo -- something to embed in the localized message

Not necessary.

> MESSAGE: Attribute or table name not known within context of query

How's that different from ERROR:?

> CODELOC: src/backend/parser/parse_clause.c line 345

Can be appended to ERROR (or MESSAGE) depending on a configuration setting.

> QUERYLOC: 22

Not all errors are related to a query. The general problem here is also that this would introduce a client incompatibility. Will older clients that do not expect this amount of detail print all this garbage to the screen?

-- Peter Eisentraut [EMAIL PROTECTED] http://yi.org/peter-e/
Re: [HACKERS] Internationalized error messages
Peter Eisentraut [EMAIL PROTECTED] writes:
> That's exactly what I was trying to avoid. You'd still be allowed to choose the error message text freely, but client programs will be able to make sense of them by looking at the code only, as opposed to parsing the message text. I'm trying to avoid making the message text be computed from the error code, because that obscures the source code.

I guess I don't understand what you have in mind, because this seems self-contradictory. If "client programs can look at the code only", then how can the error message text be chosen independently of the code?

> > Surely we do not expect gettext to start with 'Attribute "foo" not found' and distinguish fixed from variable parts of that string?
> Sure we do.

How does that work exactly? You're assuming an extremely intelligent localization mechanism, I guess, which I was not. I think it makes more sense to work a little harder in the backend to avoid requiring AI software in every frontend.

> > MESSAGE: Attribute or table name not known within context of query
> How's that different from ERROR:?

Sorry, I meant that as an example of the "secondary message string", but it's a pretty lame example...

> The general problem here is also that this would introduce a client incompatibility. Older clients that do not expect this amount of detail will print all this garbage to the screen?

Yes, if we send it to them. It would make sense to control the amount of detail presented via some option (a GUC variable, probably). For backwards-compatibility reasons we'd want the default to correspond to roughly the existing amount of detail.

regards, tom lane
Re: [HACKERS] 7.0.3 Bitset dumping
Christopher Kings-Lynne writes:
> In case anyone cares, there is a bug in pg_dump in 7.0.3 when using the bit fields.

There are no bit fields in 7.0, at least not officially. 7.1 does support them.

-- Peter Eisentraut [EMAIL PROTECTED] http://yi.org/peter-e/
Re: [HACKERS] Internationalized error messages
Tom Lane writes:
> I guess I don't understand what you have in mind, because this seems self-contradictory. If "client programs can look at the code only", then how can the error message text be chosen independently of the code?

Let's say "type mismatch error", code 2200G according to SQL. At one place in the source you write

    elog(ERROR, "2200G", "type mismatch in CASE expression (%s vs %s)", ...);

Elsewhere you'd write

    elog(ERROR, "2200G", "type mismatch in argument %d of function %s, expected %s, got %s", ...);

Humans can look at this and have a fairly good idea what they'd need to fix. However, a client program currently only has the option of failing or not failing. In this example it would probably be better for it to fail, but someone else already put forth the example of a constraint violation. In that case the program might want to do something else.

> > Surely we do not expect gettext to start with 'Attribute "foo" not found' and distinguish fixed from variable parts of that string?
> > Sure we do.
> How does that work exactly? You're assuming an extremely intelligent localization mechanism, I guess, which I was not. I think it makes more sense to work a little harder in the backend to avoid requiring AI software in every frontend.

Gettext takes care of this. In the source you'd write

    elog(ERROR, "2200G", gettext("type mismatch in CASE expression (%s vs %s)"), string, string);

When you run the xgettext utility program it scans the source for instances of gettext(...) and creates message catalogs for the translators. When it finds printf arguments it automatically includes marks in the message, such as

    "type mismatch in CASE expression (%1$s vs %2$s)"

which the translator had better keep in his version. This also handles the case where the arguments might have to appear in a different order in a different language.

> Sorry, I meant that as an example of the "secondary message string", but it's a pretty lame example...

I guess I'm not sold on the concept of primary and secondary message strings. If the primary message isn't good enough, you'd better fix that.

-- Peter Eisentraut [EMAIL PROTECTED] http://yi.org/peter-e/
Re: [HACKERS] Internationalized error messages
Peter Eisentraut [EMAIL PROTECTED] writes:
> Let's say "type mismatch error", code 2200G acc. to SQL. At one place in the source you write elog(ERROR, "2200G", "type mismatch in CASE expression (%s vs %s)", ...); Elsewhere you'd write elog(ERROR, "2200G", "type mismatch in argument %d of function %s, expected %s, got %s", ...);

Okay, so your notion of an error code is not a localizable entity at all, it's something for client programs to look at. Now I get it. I object to writing "2200G" however, because that has no mnemonic value whatever, and is much too easy to get wrong. How about

    elog(ERROR, ERR_TYPE_MISMATCH, "type mismatch in argument %d of function %s, expected %s, got %s", ...);

where ERR_TYPE_MISMATCH is #defined as "2200G" someplace? Or for that matter #defined as "TYPE_MISMATCH"? Content-free numeric codes are no fun to use on the client side either...

> Gettext takes care of this. In the source you'd write elog(ERROR, "2200G", gettext("type mismatch in CASE expression (%s vs %s)"), string, string);

Duh. For some reason I was envisioning the localization substitution as occurring on the client side, but of course we'd want to do it on the server side, and before parameters are substituted into the message. Sorry for the noise. I am not sure we can/should use gettext (possible license problems?), but certainly something like this could be cooked up.

> > Sorry, I meant that as an example of the "secondary message string", but it's a pretty lame example...
> I guess I'm not sold on the concept of primary and secondary message strings. If the primary message isn't good enough you better fix that.

The motivation isn't so much to improve on the primary message as to reduce the number of distinct strings that really need to be translated. Remember all those internal "can't happen" errors. If we have only one message component then the translator is faced with a huge pile of internal messages and not a lot of gain from translating them. If there's a primary and a secondary component then all the internal messages can share the same primary component ("Internal error, please file a bug report"). Now the translator translates that one message, and can ignore the many secondary-component messages with a clear conscience. (Of course, he can translate those too if he really wants to, but the point is that he doesn't *have* to do it to attain reasonably friendly behavior.)

Perhaps another way to look at it is that we have a bunch of errors that are user-oriented (ie, relate pretty directly to something the user did wrong) and another bunch that are system-oriented (relate to internal problems, such as consistency-check failures or violations of internal APIs). We want to provide localized translations of the first set, for sure. I don't think we need localized translations of the second set, so long as we have some sort of "covering message" that can be localized for them. Maybe instead of "primary" and "secondary" strings for a single error, we ought to distinguish these two categories of error and plan different localization strategies for them.

regards, tom lane
Re: [HACKERS] Query not using index, please explain.
Richard Poole [EMAIL PROTECTED] writes:
> [ snip a bunch of commentary about optimizer statistics ] Can someone who really knows this stuff (Tom?) step in if what I've just said is completely wrong?

Looked good to me.

> > select domain from history_entries group by domain;
> > To me, since there is an index on domain, it seems like this should be a rather fast thing to do? It takes a *very* long time, no matter if I turn seqscan on or off.
> The reason this is slow is that Postgres always has to look at heap tuples, even when it's been sent there by indexes. This in turn is because of the way the storage manager works (only by looking in the heap can you tell for sure whether a tuple is valid for the current transaction). So a "group by" always has to look at every heap tuple (that hasn't been eliminated by a where clause). "select distinct" has the same problem. I don't think there's a way to do what you want here with your existing schema without a sequential scan over the table.

But this last I object to. You certainly could use an index scan here; it's just that it's not very likely to be faster. The way that Postgres presently does GROUP BY and SELECT DISTINCT is to sort the tuples into order and then combine/eliminate adjacent duplicates. (If you've ever used a "sort | uniq" pipeline in Unix then you know the principle.) So the initial requirement for this query is to scan the history_entries tuples in order by domain. We can do this either by a sequential scan and explicit sort, or by an index scan using an ordered index on domain. It turns out that unless the physical order of the tuples is pretty close to their index order, the index-scan method is actually slower, because it results in a lot of random-access thrashing. But neither way is exactly speedy.

One thing that's on the to-do list is to look at reimplementing these operations using in-memory hash tables, with one hash entry per distinct value of the GROUP/DISTINCT columns. Then you can just do a sequential scan, with no sort, and as long as there aren't so many distinct values as to make the hash table overflow memory, you're going to win. However, until we have statistics that can give us some idea how many distinct values there might be, there's no way for the planner to make an intelligent choice between this way and the sort/uniq way...

regards, tom lane
[HACKERS] PQfinish(const PGconn *conn) question
Hi, I'm wondering how safe it is to pass an uninitialized conn to PQfinish. I have set a SIGALRM handler that terminates my daemon and closes the connection to Postgres if a time-out happens. However, I'm setting the alarm a few times before the connection to Postgres is established... which means that the conn pointer is not initialized yet and it will be passed to PQfinish in my SIGALRM handler. I'm not sure how Postgres will handle that; so far it doesn't seem to be causing any errors, but I'm not sure what's actually happening behind the scenes... Any input would be appreciated.

Regards, Boulat Khakimov

-- Nothing Like the Sun
AW: AW: [HACKERS] WAL does not recover gracefully from out-of-disk-space
> > > Even with true fdatasync it's not obviously good for performance - it takes too long to write 16Mb files and fills the OS buffer cache with trash :-(
> > True. But at least the write is (hopefully) being done at a non-performance-critical time.
> > So you have non-critical time every five minutes? Those platforms that don't have fdatasync won't profit anyway.
> Yes they will; you're forgetting the cost of updating filesystem overhead.

I did have that in mind, but I thought that in effect the OS would optimize sparse file allocation somehow. Doing some tests, however, showed that while your variant is really good and saves 12 seconds, the performance is *very* poor for either variant. A short test shows that opening the file with O_SYNC, and thus avoiding fsync(), would cut the effective time needed to sync-write the xlog by more than half. Of course we would need to buffer at least one xlog page before write (or commit) to gain the full advantage.

    prewrite 0 + write and fsync:      60.4 sec
    sparse file + write with O_SYNC:   37.5 sec
    no prewrite + write with O_SYNC:   36.8 sec
    prewrite 0 + write with O_SYNC:    24.0 sec

These times include the prewrite when applicable, on AIX with jfs. Test program attached. I may be overlooking something, though.

Andreas

tfsync.c
Re: AW: AW: [HACKERS] WAL does not recover gracefully from out-of-disk-space
Zeugswetter Andreas SB [EMAIL PROTECTED] writes:
> A short test shows that opening the file with O_SYNC, and thus avoiding fsync(), would cut the effective time needed to sync-write the xlog by more than half. Of course we would need to buffer at least one xlog page before write (or commit) to gain the full advantage.
> prewrite 0 + write and fsync:      60.4 sec
> sparse file + write with O_SYNC:   37.5 sec
> no prewrite + write with O_SYNC:   36.8 sec
> prewrite 0 + write with O_SYNC:    24.0 sec

This seems odd. As near as I can tell, O_SYNC is simply a command to do fsync implicitly during each write call. It cannot save any I/O unless I'm missing something significant. Where is the performance difference coming from? The reason I'm inclined to question this is that what we want is not an fsync per write but an fsync per transaction, and we can't easily buffer all of a transaction's XLOG writes...

regards, tom lane
Re: [HACKERS] PQfinish(const PGconn *conn) question
Boulat Khakimov [EMAIL PROTECTED] writes:
> I'm wondering how safe it is to pass an uninitialized conn to PQfinish

You can pass a NULL pointer to PQfinish safely, if that's what you meant. Passing a pointer to uninitialized memory seems like a bad idea.

regards, tom lane
AW: AW: AW: AW: [HACKERS] WAL does not recover gracefully from out-of-disk-space
> This seems odd. As near as I can tell, O_SYNC is simply a command to do fsync implicitly during each write call. It cannot save any I/O unless I'm missing something significant. Where is the performance difference coming from?

Yes, odd, but very reproducible here.

> I tried this on HPUX 10.20, which has not only O_SYNC but also O_DSYNC

AIX has O_DSYNC (which is _FDATASYNC) too, but I assumed O_SYNC would be more portable. Now that we have two, maybe it is more widespread than I thought.

> I attach my modified version of Andreas' program. Note I do not believe his assertion that close() implies fsync() --- on the machines I've used, it demonstrably does not sync.

Ok, I am not sure, but essentially do we need it to sync? The OS surely isn't supposed to notice, after closing the file, that it ran out of disk space.

Andreas
[HACKERS] porting question: funky uid names?
Hi pgsql-hackers, I'm currently porting 7.0.3 to the HP MPE/iX OS to join my other ports of Apache, BIND, sendmail, Perl, and others. I'm at the point where I'm trying to run the "make runcheck" regression tests, and I've just run into a problem where I need to seek the advice of pgsql-hackers.

MPE is a proprietary OS with a POSIX layer on top. The concept of POSIX uids and gids has been mapped to the concept of MPE usernames and MPE accountnames. An example MPE username would be "MGR.BIXBY", and if you do a POSIX getpwuid(getuid()), the contents of pw_name will be the same "MGR.BIXBY". The fact that pw_name contains a period on MPE has been confusing to some previous ports I've done, and it now appears PostgreSQL is being confused too. "make runcheck" is dying in the initdb phase:

    Creating global relations in /blah/blah/blah
    ERROR: pg_atoi: error in "BIXBY": can't parse "BIXBY"
    ERROR: pg_atoi: error in "BIXBY": can't parse "BIXBY"
    syntax error 25 : - .

I'm guessing that something tried to parse "MGR.BIXBY", saw the decimal point character, and passed the string to pg_atoi() thinking it's a number instead of a name. This seems like a really bad omen hinting at trouble on a fundamental level. What are my options here?

1) I'm screwed; go try porting MySQL instead. ;-)
2) Somehow modify username parsing to be tolerant of the "." character? I was able to do this when I ported sendmail. Where should I be looking in the PostgreSQL source? Is this going to require language grammar changes?
3) Always specify numeric uids instead of user names. Is this even possible?

Your advice will be greatly appreciated. MPE users are currently whining on their mailing list about the lack of standard databases for the platform, and I wanted to surprise them by releasing a PostgreSQL port. Thanks!

-- [EMAIL PROTECTED] Remainder of .sig suppressed to conserve scarce California electrons...
Re: AW: AW: AW: AW: [HACKERS] WAL does not recover gracefully from out-of-disk-space
Zeugswetter Andreas SB [EMAIL PROTECTED] writes:
> > I attach my modified version of Andreas' program. Note I do not believe his assertion that close() implies fsync() --- on the machines I've used, it demonstrably does not sync.
> Ok, I am not sure, but essentially do we need it to sync? The OS surely isn't supposed to notice, after closing the file, that it ran out of disk space.

I believe that out-of-space would be reported during the writes anyway, so that's not the issue. The point of fsync'ing after the prewrite is to ensure that the indirect blocks are down on disk. If you trust fdatasync (or O_DSYNC) to write indirect blocks then it's not necessary --- but I'm pretty sure I heard somewhere that some versions of fdatasync fail to guarantee that. In any case, the real point of the prewrite is to move work out of the transaction commit path, and so we're better off if we can sync the indirect blocks during prewrite.

> > I tried this on HPUX 10.20, which has not only O_SYNC but also O_DSYNC
> AIX has O_DSYNC (which is _FDATASYNC) too, but I assumed O_SYNC

Oh? What speeds do you get if you use that?

regards, tom lane
Re: [HACKERS] porting question: funky uid names?
Mark Bixby [EMAIL PROTECTED] writes:
> MPE is a proprietary OS with a POSIX layer on top. The concept of POSIX uids and gids has been mapped to the concept of MPE usernames and MPE accountnames. An example MPE username would be "MGR.BIXBY", and if you do a POSIX getpwuid(getuid()), the contents of pw_name will be the same "MGR.BIXBY".

Hm. And what is returned in pw_uid? I think you are getting burnt by initdb's attempt to assign the postgres superuser's numeric ID to be the same as the Unix userid number of the user running initdb. Look at the uses of pg_id in the initdb script, and experiment with running pg_id by hand to see what it produces. A quick-and-dirty experiment would be to run "initdb -i 42" (or whatever) to override the result of pg_id. If that succeeds, the real answer may be that pg_id needs a patch to behave reasonably on MPE. Let us know...

regards, tom lane
Re: [HACKERS] Internationalized error messages
Peter Eisentraut [EMAIL PROTECTED] writes: Let's say "type mismatch error", code 2200G acc. to SQL. At one place in the source you write elog(ERROR, "2200G", "type mismatch in CASE expression (%s vs %s)", ...); Tom Lane [EMAIL PROTECTED] spake: I object to writing "2200G" however, because that has no mnemonic value whatever, and is much too easy to get wrong. How about elog(ERROR, ERR_TYPE_MISMATCH, "type mismatch in argument %d of function %s, expected %s, got %s", ...); where ERR_TYPE_MISMATCH is #defined as "2200G" someplace? Or for that matter #defined as "TYPE_MISMATCH"? Content-free numeric codes are no fun to use on the client side either... This is one thing I think VMS does well. All error messages are a composite of the subsystem where they originated, the severity of the error, and the actual error itself. Internally this is stored in a 32-bit word. It's been a long time, so I don't recall how many bits they allocated for each component. The human-readable representation looks like "subsystem-severity-error". -- Andrew Evans ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
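Tom's mnemonic-macro suggestion might look like this sketch. The macro name ERR_TYPE_MISMATCH comes from his message; the `format_error` stand-in for elog() and its output format are hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical mnemonic name for a SQLSTATE code, so source code
 * never spells out an opaque string like "2200G" directly. */
#define ERR_TYPE_MISMATCH "2200G"   /* class 22: data exception */

/* Stand-in for elog(): formats "CODE: message" into buf so the code
 * travels with the message rather than replacing it. */
static char *format_error(char *buf, size_t len,
                          const char *code, const char *fmt, const char *arg)
{
    int n = snprintf(buf, len, "%s: ", code);
    snprintf(buf + n, len - n, fmt, arg);
    return buf;
}
```

Changing the #define in one place (to "TYPE_MISMATCH", say) would then change the wire code everywhere without touching any call sites.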
Re: [HACKERS] Internationalized error messages
On Fri, Mar 09, 2001 at 12:05:22PM -0500, Tom Lane wrote: Gettext takes care of this. In the source you'd write elog(ERROR, "2200G", gettext("type mismatch in CASE expression (%s vs %s)"), string, string); Duh. For some reason I was envisioning the localization substitution as occurring on the client side, but of course we'd want to do it on the server side, and before parameters are substituted into the message. Sorry for the noise. I am not sure we can/should use gettext (possible license problems?), but certainly something like this could be cooked up. I've been assuming that PG's needs are specialized enough that the project wouldn't use gettext directly, but instead something inspired by it. If you look at my last posting on the subject, by the way, you will see that it could work without a catalog underneath; integrating a catalog would just require changes in a header file (and the programs to generate the catalog, of course). That quality seems to me essential to allow the changeover to be phased in gradually, and to allow different underlying catalog implementations to be tried out. Nathan ncm ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/users-lounge/docs/faq.html
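A minimal sketch of the point being agreed on here: translation must be applied to the format string before parameters are substituted into it. `xlate` is a stub standing in for gettext(), and the German catalog entry is purely illustrative.

```c
#include <stdio.h>
#include <string.h>

/* Stub standing in for gettext(): looks up a translation for the
 * *format string*, not the finished message.  The catalog entry is
 * illustrative only. */
static const char *xlate(const char *msgid)
{
    if (strcmp(msgid, "type mismatch in CASE expression (%s vs %s)") == 0)
        return "Typkonflikt im CASE-Ausdruck (%s vs %s)";
    return msgid;               /* no translation: fall back to the msgid */
}

/* Server-side localization: translate first, then substitute the
 * parameters.  Doing it in the other order could never work, because
 * the substituted message would no longer match any catalog key. */
static void localized_elog(char *out, size_t len,
                           const char *fmt, const char *a, const char *b)
{
    snprintf(out, len, xlate(fmt), a, b);
}
```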
Re: [HACKERS] porting question: funky uid names?
Tom Lane wrote: Mark Bixby [EMAIL PROTECTED] writes: MPE is a proprietary OS with a POSIX layer on top. The concept of POSIX uids and gids has been mapped to the concept of MPE usernames and MPE accountnames. An example MPE username would be "MGR.BIXBY", and if you do a POSIX getpwuid(getuid()), the contents of pw_name will be the same "MGR.BIXBY". Hm. And what is returned in pw_uid? A valid numeric uid. I think you are getting burnt by initdb's attempt to assign the postgres superuser's numeric ID to be the same as the Unix userid number of the user running initdb. Look at the uses of pg_id in the initdb script, and experiment with running pg_id by hand to see what it produces. pg_id without parameters returns uid=484(MGR.BIXBY), which matches what I get from MPE's native id command. The pg_id -n and -u options behave as expected. A quick and dirty experiment would be to run "initdb -i 42" (or whatever) to override the result of pg_id. If that succeeds, the real answer may be that pg_id needs a patch to behave reasonably on MPE. I just hacked src/test/regress/run_check.sh to invoke initdb with --show. The user name/id is behaving "correctly" for an MPE machine: SUPERUSERNAME: MGR.BIXBY SUPERUSERID:484 The initdb -i option will only override the SUPERUSERID, but it's already correct. -- [EMAIL PROTECTED] Remainder of .sig suppressed to conserve scarce California electrons... ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://www.postgresql.org/search.mpl
RE: [HACKERS] WAL does not recover gracefully from out-of-disk-sp ace
>> Even with true fdatasync it's not obviously good for performance - it
>> takes too long to write 16Mb files and fills the OS buffer cache with
>> trash :-(
> True. But at least the write is (hopefully) being done at a
> non-performance-critical time.

There is no such hope: XLogWrite may be called from XLogFlush (at commit time and from bufmgr on replacements) *and* from XLogInsert - ie a new log file may be required at any time. Probably we need a separate process like LGWR (log writer) in Oracle.

> I think the create-ahead feature in the checkpoint maker should be on
> by default.

I'm not sure - it increases disk requirements. I considered this mostly as a hint for the OS about how the log file should be allocated (to decrease fragmentation). Not sure how OSes use such hints, but seek+write costs nothing.

> AFAIK, extant Unixes will not regard this as a hint at all; they'll
> think it is a great opportunity to not store zeroes :-(.

Yes, but if I were writing a file system I wouldn't allocate space for a file block by block - I would try to pre-allocate more than required by write(). So I hoped that seek+write is a hint for the OS: "Hey, I need a 16Mb file - try to make it as contiguous as possible". Don't know whether it works, though -:)

> One reason that I like logfile fill to be done separately is that it's
> easier to convince ourselves that failure (due to out of disk space)
> need not require elog(STOP) than if we have the same failure during
> XLogWrite. You are right that we don't have time to consider each STOP
> in the WAL code, but I think we should at least look at that case...

What's the problem with elog(STOP) in the absence of disk space? I think running out of disk is bad enough to stop DB operations.

Vadim
RE: AW: AW: [HACKERS] WAL does not recover gracefully from out-of-dis k-sp ace
The reason I'm inclined to question this is that what we want is not an fsync per write but an fsync per transaction, and we can't easily buffer all of a transaction's XLOG writes... WAL keeps records in WAL buffers (wal-buffers parameter may be used to increase # of buffers), so we can make write()-s buffered. Seems that my Solaris has fdatasync, so I'll test different approaches... Vadim ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
RE: [HACKERS] WAL SHM principles
But needed if we want to get rid of vacuum and have savepoints. Hmm. How do you implement savepoints ? When there is rollback to savepoint do you use xlog to undo all changes which the particular transaction has done ? Hmmm it seems nice ... these resords are locked by such transaction so that it can safely undo them :-) Am I right ? Yes, but there is no savepoints in 7.1 - hopefully in 7.2 But how can you use xlog to get rid of vacuum ? Do you treat all delete log records as candidates for free space ? Vaccum removes deleted records *and* records inserted by aborted transactions - last ones will be removed by UNDO. Vadim ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: AW: AW: [HACKERS] WAL does not recover gracefully from out-of -dis k-sp ace
"Mikheev, Vadim" [EMAIL PROTECTED] writes: Seems that my Solaris has fdatasync, so I'll test different approaches... A Sun guy told me that Solaris does this just the same way that HPUX does it: fsync() scans all kernel buffers for the file, but O_SYNC doesn't, because it knows it only needs to sync the blocks covered by the write(). He didn't say about fdatasync/O_DSYNC but I bet the same difference exists for those two. The Linux 2.4 kernel allegedly is set up so that fsync() is smart enough to only look at dirty buffers, not all the buffers of the file. So the performance tradeoffs would be different there. But on HPUX and probably Solaris, O_DSYNC is likely to be a big win, unless we can find a way to stop the kernel from buffering so much of the WAL files. regards, tom lane ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://www.postgresql.org/search.mpl
Re: [HACKERS] porting question: funky uid names?
Mark Bixby [EMAIL PROTECTED] writes: I just hacked src/test/regress/run_check.sh to invoke initdb with --show. The user name/id is behaving "correctly" for an MPE machine: SUPERUSERNAME: MGR.BIXBY SUPERUSERID:484 Okay, so much for that theory. Can you set a breakpoint at elog() and provide a stack backtrace so we can see where this is happening? I can't think where else in the code might be affected, but obviously the problem is somewhere else... regards, tom lane ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/users-lounge/docs/faq.html
Re: [HACKERS] WAL does not recover gracefully from out-of-disk-sp ace
"Mikheev, Vadim" [EMAIL PROTECTED] writes: True. But at least the write is (hopefully) being done at a non-performance-critical time. There is no such hope: XLogWrite may be called from XLogFlush (at commit time and from bufmgr on replacements) *and* from XLogInsert - ie new log file may be required at any time. Sure, but if we have create-ahead enabled then there's a good chance of the log files being made by the checkpoint process, rather than by working backends. In that case the prefill is not time critical. In any case, my tests so far show that prefilling and then writing with O_SYNC or better O_DSYNC is in fact faster than not prefilling; this matches pretty well the handwaving argument I gave Andreas this morning. (With fsync() or fdatasync() it seems we're at the mercy of inefficient kernel algorithms, a factor I didn't consider before.) regards, tom lane ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [HACKERS] Internationalized error messages
Tom Lane writes:
> I object to writing "2200G" however, because that has no mnemonic value
> whatever, and is much too easy to get wrong. How about
>     elog(ERROR, ERR_TYPE_MISMATCH,
>          "type mismatch in argument %d of function %s, expected %s, got %s",
>          ...);
> where ERR_TYPE_MISMATCH is #defined as "2200G" someplace? Or for that
> matter #defined as "TYPE_MISMATCH"? Content-free numeric codes are no
> fun to use on the client side either...

Well, SQL defines these. Do we want to make our own list? However, numeric codes also have the advantage that some hierarchy is possible. E.g., the "22" in "2200G" is actually the category code "data exception". Personally, I would stick to the SQL codes but make some readable macro name for backend internal use.

> I am not sure we can/should use gettext (possible license problems?),

Gettext is an open standard, invented at Sun IIRC. There is also an independent implementation for BSDs in the works. On GNU/Linux systems it's in the C library. I don't see any license problems that way. It has been used widely for free software and so far I haven't seen any real alternative.

> but certainly something like this could be cooked up.

Well, I'm trying to avoid having to do the cooking. ;-)

> Perhaps another way to look at it is that we have a bunch of errors
> that are user-oriented (ie, relate pretty directly to something the
> user did wrong) and another bunch that are system-oriented (relate to
> internal problems, such as consistency check failures or violations of
> internal APIs). We want to provide localized translations of the first
> set, for sure. I don't think we need localized translations of the
> second set, so long as we have some sort of "covering message" that
> can be localized for them.

I'm sure this can be covered in some macro way. A random idea: elog(ERROR, INTERNAL_ERROR("text"), ...) expands to elog(ERROR, gettext("Internal error: %s"), ...) OTOH, we should not yet make presumptions about what dedicated translators can be capable of. 
:-) -- Peter Eisentraut [EMAIL PROTECTED] http://yi.org/peter-e/ ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
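Peter's INTERNAL_ERROR idea could be realized with a macro that expands to two arguments: the translated covering message plus the untranslated detail text. A sketch under stated assumptions (gettext() is stubbed as the identity, and `my_elog` stands in for the real elog):

```c
#include <stdio.h>
#include <string.h>

/* gettext() stubbed as identity so the sketch is self-contained. */
#define gettext(s) (s)

/* Only the covering message gets a translatable msgid; the internal
 * detail text stays untranslated English, exactly as proposed. */
#define INTERNAL_ERROR(text) gettext("Internal error: %s"), (text)

static char msgbuf[128];

/* Stand-in for elog(): note it receives *two* arguments from a
 * single INTERNAL_ERROR(...) use, thanks to the comma in the macro. */
static void my_elog(const char *fmt, const char *detail)
{
    snprintf(msgbuf, sizeof msgbuf, fmt, detail);
}
```

A call like `my_elog(INTERNAL_ERROR("cache lookup failed"))` therefore expands to `my_elog(gettext("Internal error: %s"), "cache lookup failed")`, which is the expansion Peter sketches.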
Re: [HACKERS] Internationalized error messages
Peter Eisentraut [EMAIL PROTECTED] writes: Well, SQL defines these. Do we want to make our own list? However, numeric codes also have the advantage that some hierarchy is possible. E.g., the "22" in "2200G" is actually the category code "data exception". Personally, I would stick to the SQL codes but make some readable macro name for backend internal use. We will probably find cases where we need codes not defined by SQL (since we have non-SQL features). If there is room to invent our own codes then I have no objection to this. I am not sure we can/should use gettext (possible license problems?), Gettext is an open standard, invented at Sun IIRC. There is also an independent implementation for BSDs in the works. On GNU/Linux system it's in the C library. I don't see any license problems that way. Unless that BSD implementation is ready to go, I think we'd be talking about relying on GPL'd (not LGPL'd) code for an essential component of the system functionality. Given RMS' recent antics I am much less comfortable with that than I might once have been. regards, tom lane ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
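Since the first two characters of a five-character SQLSTATE name its class (Peter's "22" = "data exception" example), inventing project-specific codes comes down to choosing values in the ranges the standard leaves implementation-defined. A small sketch of extracting the class; the function name is hypothetical:

```c
#include <string.h>

/* A SQLSTATE is five characters: a two-character class plus a
 * three-character subclass.  "2200G" is class "22" (data exception),
 * subclass "00G".  Classes outside the standard-defined ranges are
 * implementation-defined, which is where codes for non-SQL features
 * could live. */
static void sqlstate_class(const char *sqlstate, char out[3])
{
    strncpy(out, sqlstate, 2);
    out[2] = '\0';
}
```

A client that only wants coarse error handling could then dispatch on the class alone and ignore the subclass.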
Re: [HACKERS] porting question: funky uid names?
Mark Bixby writes:
> Creating global relations in /blah/blah/blah
> ERROR: pg_atoi: error in "BIXBY": can't parse "BIXBY"
> ERROR: pg_atoi: error in "BIXBY": can't parse "BIXBY"
> syntax error 25 : - .

I'm curious about that last line. Is that the shell complaining? The offending command seems to be

insert OID = 0 ( POSTGRES PGUID t t t t _null_ _null_ )

in the file global1.bki.source. (This is the file that creates the global relations.) The POSTGRES and PGUID quantities are substituted when initdb runs:

cat "$GLOBAL" \
  | sed -e "s/POSTGRES/$POSTGRES_SUPERUSERNAME/g" \
        -e "s/PGUID/$POSTGRES_SUPERUSERID/g" \
  | "$PGPATH"/postgres $BACKENDARGS template1

For some reason the line probably ends up being

insert OID = 0 ( MGR BIXBY 484 t t t t _null_ _null_ )

which causes the observed failure to parse "BIXBY" as a user id. This brings us back to why the dot disappears, which seems to be related to the error message "syntax error 25 : - .". Can you try using a different sed command (e.g., GNU sed)?

-- Peter Eisentraut [EMAIL PROTECTED] http://yi.org/peter-e/
Re: [HACKERS] undefined reference pq
Jeff Lu writes:
> $ make intrasend
> gcc -o /c/inetpub/wwwroot/cgi-bin/intrasend.exe intrasend.c intrautils.c -I/usr/local/pgsql/include -L/usr/local/pgsql/lib -lpq
> intrautils.c:7: warning: initialization makes integer from pointer without a cast
> /c/TEMP/ccXES02E.o(.text+0x32c):intrasend.c: undefined reference to `PQconnectdb'
[...]

-- Peter Eisentraut [EMAIL PROTECTED] http://yi.org/peter-e/
Re: [HACKERS] porting question: funky uid names?
Peter Eisentraut [EMAIL PROTECTED] writes: cat "$GLOBAL" \ | sed -e "s/POSTGRES/$POSTGRES_SUPERUSERNAME/g" \ -e "s/PGUID/$POSTGRES_SUPERUSERID/g" \ | "$PGPATH"/postgres $BACKENDARGS template1 For some reason the line probably ends up being insert OID = 0 ( MGR BIXBY 484 t t t t _null_ _null_ ) ^ which causes the observed failure to parse BIXBY as user id. Good thought. Just looking at this, I wonder if we shouldn't flip the order of the sed patterns --- as is, won't it mess up if the superuser name contains PGUID? A further exercise would be to make it not foul up if the superuser name contains '/'. I'd be kind of inclined to use ':' for the pattern delimiter, since in normal Unix practice usernames can't contain colons (cf. passwd file format). Of course one doesn't generally put a slash in a username either, but I think it's physically possible to do it... But none of these fully explain Mark's problem. If we knew where the "syntax error 25 : - ." came from, we'd be closer to an answer. regards, tom lane ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
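Tom's two suggestions combined, as a shell sketch (values follow Mark's MPE report): flipping the order means a superuser name containing "PGUID" can't be corrupted by the second substitution, and ':' pattern delimiters keep a '/' in the name from confusing sed.

```shell
# Illustrative values from the MPE report in this thread.
POSTGRES_SUPERUSERNAME="MGR.BIXBY"
POSTGRES_SUPERUSERID=484

# Substitute PGUID first, then POSTGRES, using ':' as the sed pattern
# delimiter so a '/' in the user name cannot terminate the pattern.
echo 'insert OID = 0 ( POSTGRES PGUID t t t t _null_ _null_ )' \
  | sed -e "s:PGUID:$POSTGRES_SUPERUSERID:g" \
        -e "s:POSTGRES:$POSTGRES_SUPERUSERNAME:g"
```

With a working sed this prints `insert OID = 0 ( MGR.BIXBY 484 t t t t _null_ _null_ )`; it does not explain Mark's vanishing dot, which is why a broken platform sed remains the suspect.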
Re: [HACKERS] porting question: funky uid names?
Tom Lane wrote: But none of these fully explain Mark's problem. If we knew where the "syntax error 25 : - ." came from, we'd be closer to an answer. After scanning the source for "syntax error", line 126 of backend/bootstrap/bootscanner.l seems to be the likely culprit. -- [EMAIL PROTECTED] Remainder of .sig suppressed to conserve scarce California electrons... ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: AW: AW: AW: [HACKERS] WAL does not recover gracefully from out-of -dis k-sp ace
Tom Lane [EMAIL PROTECTED] writes: We just bought back almost all the system time. The only possible explanation is that this way either doesn't keep the buffers from prior blocks, or does not scan them for dirtybits. I note that the open(2) man page is phrased so that O_SYNC is actually defined not to fsync the whole file, but only the part you just wrote --- I wonder if it's actually implemented that way? Sure, why not? That's how it is implemented in the Linux kernel. If you do a write with O_SYNC set, the write simply flushes out the buffers it just modified. If you call fsync, the kernel has to walk through all the buffers looking for ones associated with the file in question. Ian ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
[HACKERS] Internationalized dates (was Internationalized error messages)
Now you're talking about i18n, maybe someone could think about input and output of dates in the local language. As far as I can tell, PostgreSQL will only use English for dates, eg January, February, and weekdays, Monday, Tuesday etc. Not the local name. -- Kaare Rasmussen -- Linux, spil, -- Tlf: 3816 2582 Kaki Data tshirts, merchandize Fax: 3816 2501 Howitzvej 75 Åben 14.00-18.00 Email: [EMAIL PROTECTED] 2000 Frederiksberg Lørdag 11.00-17.00 Web: www.suse.dk
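At the C level, strftime() already localizes month and weekday names through the LC_TIME locale category, which suggests one conceivable route for this. A sketch, not how PostgreSQL's date output actually works; whether a given locale (say a Danish one) is installed is system-dependent:

```c
#include <locale.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

/* Format a month name through strftime(); the result depends on the
 * LC_TIME locale currently in effect, so setlocale(LC_TIME, "da_DK")
 * (if installed) would yield Danish names instead of English. */
static void month_name(char *buf, size_t len, int month_0_to_11)
{
    struct tm tm;

    memset(&tm, 0, sizeof tm);
    tm.tm_mon  = month_0_to_11;
    tm.tm_mday = 1;
    strftime(buf, len, "%B", &tm);   /* %B = full localized month name */
}
```

This also illustrates Karel's earlier point upthread: the wanted message or date language need not match the server's locale, so a settable LC_TIME-like knob would be needed.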
Re: [HACKERS] porting question: funky uid names?
Tom Lane wrote: Mark Bixby [EMAIL PROTECTED] writes: I just hacked src/test/regress/run_check.sh to invoke initdb with --show. The user name/id is behaving "correctly" for an MPE machine: SUPERUSERNAME: MGR.BIXBY SUPERUSERID:484 Okay, so much for that theory. Can you set a breakpoint at elog() and provide a stack backtrace so we can see where this is happening? I can't think where else in the code might be affected, but obviously the problem is somewhere else... Here's a stack trace from the native MPE debugger (we don't have gdb support yet). I'm assuming that all results after the initdb failure should be suspect, and that's possibly why pg_log wasn't created. I haven't tried troubleshooting the pg_log problem yet until after I resolve the uid names issue. === Initializing check database instance DEBUG/iX C.25.06 DEBUG Intrinsic at: 129.0009d09c ?$START$ $1 ($4b) nmdebug b elog added: NM[1] PROG 129.001ad7d8 elog $2 ($4b) nmdebug c Break at: NM[1] PROG 129.001ad7d8 elog $3 ($4b) nmdebug tr PC=129.001ad7d8 elog * 0) SP=41843ef0 RP=129.0018f7a4 pg_atoi+$b4 1) SP=41843ef0 RP=129.00182994 int4in+$14 2) SP=41843e70 RP=129.0018296c ?int4in+$8 export stub: 129.001aed28 $CODE$+$138 3) SP=41843e30 RP=129.001af428 fmgr+$98 4) SP=41843db0 RP=129.000c3354 InsertOneValue+$264 5) SP=41843cf0 RP=129.000c05d4 Int_yyparse+$924 6) SP=41843c70 RP=129. (end of NM stack) $4 ($4b) nmdebug c === Starting regression postmaster Regression postmaster is running - PID=125239393 PGPORT=65432 === Creating regression database... 
NOTICE: mdopen: couldn't open /BIXBY/PUB/src/postgresql-7.0.3-mpe/src/test/regress/tmp_check/data/pg_log: No such file or directory
NOTICE: mdopen: couldn't open /BIXBY/PUB/src/postgresql-7.0.3-mpe/src/test/regress/tmp_check/data/pg_log: No such file or directory
psql: FATAL 1: cannot open relation pg_log
createdb: database creation failed
createdb failed
make: *** [runcheck] Error 1

-- [EMAIL PROTECTED] Remainder of .sig suppressed to conserve scarce California electrons...
RE: AW: AW: AW: [HACKERS] WAL does not recover gracefully from out-of -dis k-sp ace
> I tried this on HPUX 10.20, which has not only O_SYNC but also O_DSYNC
> (defined to do the equivalent of fdatasync()), and got truly
> fascinating results. Apparently, on this platform these flags change
> the kernel's buffering behavior! Observe:
>
> $ gcc -Wall -O -DINIT_WRITE -DUSE_ODSYNC tfsync.c
> $ time a.out
> real 0m21.40s
> user 0m0.02s
> sys  0m0.60s

Solaris 2.6 fascinates even more!!!

bash-2.02# gcc -Wall -O -DINIT_WRITE -DUSE_ODSYNC tfsync.c
bash-2.02# time a.out
real 0m4.242s
user 0m0.000s
sys  0m0.450s

It's hard to believe... Writing with DSYNC takes the same time as file initialization - ~2 sec. Also, there is no difference if using 64k blocks. INIT_WRITE + OSYNC gives 52 sec for 8k blocks and 5.7 sec for 256k ones, but INIT_WRITE + DSYNC doesn't depend on block size. Modern IDE drive? -:))

Probably we should change the code to use O_DSYNC if defined, even without changing XLogWrite to write more than 1 block at once (if requested)?

As for O_SYNC:

bash-2.02# gcc -Wall -O -DINIT_WRITE tfsync.c
bash-2.02# time a.out
real 0m54.786s
user 0m0.010s
sys  0m10.820s

bash-2.02# gcc -Wall -O -DINIT_WRITE -DUSE_OSYNC tfsync.c
bash-2.02# time a.out
real 0m52.406s
user 0m0.020s
sys  0m0.650s

Not a big win. Does Solaris have a more optimized search for dirty blocks than Tom's HP and Andreas' platform?

Vadim
RE: AW: AW: AW: [HACKERS] WAL does not recover gracefully from out-of -dis k-sp ace
> $ gcc -Wall -O -DINIT_WRITE tfsync.c
> $ time a.out
> real 1m15.11s
> user 0m0.04s
> sys  0m32.76s
>
> Note the large amount of system time here, and the fact that the extra
> time in INIT_WRITE is all system time. I have previously observed that
> fsync() on HPUX 10.20 appears to iterate through every kernel disk
> buffer belonging to the file, presumably checking their dirtybits one
> by one. The INIT_WRITE form loses because each fsync in the second
> loop has to iterate through a full 16Mb worth of buffers, whereas
> without INIT_WRITE there will only be as many buffers as the amount of
> file we've filled so far. (On this platform, it'd probably be a win to
> use log segments smaller than 16Mb...) It's interesting that there's
> no visible I/O cost here for the extra write pass --- the extra I/O
> must be completely overlapped with the extra system time.

Tom, could you run this test for different block sizes? Up to 32*8k? Just curious when you get something close to

> $ gcc -Wall -O -DINIT_WRITE -DUSE_ODSYNC tfsync.c
> $ time a.out
> real 0m21.40s
> user 0m0.02s
> sys  0m0.60s

Vadim
Re: [HACKERS] porting question: funky uid names?
Mark Bixby wrote:
> It seems that plpgsql.sl didn't get built. Might be an autoconf issue,
> since quite frequently config scripts don't know about shared
> libraries on MPE. I will investigate this further.

Ah. I found src/Makefile.shlib and added the appropriate stuff. Woohoo! We have test output! The regression README was clear about how some platform dependent errors can be expected, and how to code for these differences in the expected outputs. Now I'm off to examine the individual failures.

MULTIBYTE=;export MULTIBYTE; \
/bin/sh ./run_check.sh hppa1.0-hp-mpeix
=== Removing old ./tmp_check directory ...
=== Create ./tmp_check directory
=== Installing new build into ./tmp_check
=== Initializing check database instance
=== Starting regression postmaster
Regression postmaster is running - PID=125042790 PGPORT=65432
=== Creating regression database...
CREATE DATABASE
=== Installing PL/pgSQL...
=== Running regression queries...
parallel group1 (12 tests) ... boolean text name oid float4 varchar char int4 int2 float8 int8 numeric
test boolean ... ok
test char ... ok
test name ... ok
test varchar ... ok
test text ... ok
test int2 ... ok
test int4 ... ok
test int8 ... ok
test oid ... ok
test float4 ... ok
test float8 ... FAILED
test numeric ... ok
sequential test strings ... ok
sequential test numerology ... ok
parallel group2 (15 tests) ... comments path polygon lseg point box reltime interval tinterval circle inet timestamp type_sanity opr_sanity oidjoins
test point ... ok
test lseg ... ok
test box ... ok
test path ... ok
test polygon ... ok
test circle ... ok
test interval ... FAILED
test timestamp ... FAILED
test reltime ... ok
test tinterval ... ok
test inet ... ok
test comments ... ok
test oidjoins ... ok
test type_sanity ... ok
test opr_sanity ... ok
sequential test abstime ... ok
sequential test geometry ... FAILED
sequential test horology ... FAILED
sequential test create_function_1 ... ok
sequential test create_type ... ok
sequential test create_table ... ok
sequential test create_function_2 ... ok
sequential test copy ... ok
parallel group3 (6 tests) ... create_aggregate create_operator triggers constraints create_misc create_index
test constraints ... ok
test triggers ... ok
test create_misc ... ok
test create_aggregate ... ok
test create_operator ... ok
test create_index ... ok
sequential test create_view ... ok
sequential test sanity_check ... ok
sequential test errors ... ok
sequential test select ... ok
parallel group4 (16 tests) ... arrays union select_having transactions portals join select_implicit select_distinct_on subselect case random select_distinct select_into aggregates hash_index btree_index
test select_into ... ok
test select_distinct ... ok
test select_distinct_on ... ok
test select_implicit ... ok
test select_having ... ok
test subselect ... ok
test union ... ok
test case ... ok
test join ... ok
test aggregates ... ok
test transactions ... ok
test random ... ok
test portals ... ok
test arrays ... ok
test btree_index ... ok
test hash_index ... ok
sequential test misc ... ok
parallel group5 (5 tests) ... portals_p2 foreign_key rules alter_table select_views
test select_views ... ok
test alter_table ... ok
Re: [HACKERS] Performance monitor
In article [EMAIL PROTECTED], "Bruce Momjian" [EMAIL PROTECTED] wrote: The problem I see with the shared memory idea is that some of the information needed may be quite large. For example, query strings can be very long. Do we just allocate 512 bytes and clip off the rest. And as I add more info, I need more shared memory per backend. I just liked the file system dump solution because I could modify it pretty easily, and because the info only appears when you click on the process, it doesn't happen often. Of course, if we start getting the full display partly from each backend, we will have to use shared memory. Long-term, perhaps a monitor server (like Sybase ASE uses) might be a reasonable approach. That way, only one process (and a well- regulated one at that) would be accessing the shared memory, which should make it safer and have less of an impact performance-wise if semaphores are needed to regulate access to the various regions of shared memory. Then, 1-N clients may access the monitor server to get performance data w/o impacting the backends. Gordon. -- It doesn't get any easier, you just go faster. -- Greg LeMond ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
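Bruce's 512-byte clipping concern can be made concrete with a fixed-size per-backend slot such as a monitor might read from shared memory. All names here are hypothetical, and real shared-memory allocation and locking are omitted; the point is only that a fixed-size buffer forces a clip-and-terminate policy on query strings.

```c
#include <string.h>

/* Hypothetical fixed-size per-backend status slot.  Because shared
 * memory must be sized up front, long query strings are clipped to
 * fit; 512 bytes follows the number floated in the message above. */
#define QUERY_SNIPPET_LEN 512

typedef struct BackendStatus
{
    int  pid;
    char query[QUERY_SNIPPET_LEN];   /* clipped, always NUL-terminated */
} BackendStatus;

static void set_current_query(BackendStatus *slot, const char *query)
{
    strncpy(slot->query, query, QUERY_SNIPPET_LEN - 1);
    slot->query[QUERY_SNIPPET_LEN - 1] = '\0';   /* guarantee termination */
}
```

Gordon's monitor-server idea would then have a single well-behaved process reading these slots and serving N clients, so the backends themselves never contend with monitoring traffic.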
[HACKERS] Interesting failure mode for initdb
Assume a configuration problem that causes standalone backends to fail without doing anything. (I happened across this by tweaking global.bki in such a way that the superuser name entered into pg_shadow was different from what getpwname returns. I don't have a real-world example, but I'm sure there are some.) Unless the failure is so bad as to provoke a coredump, the backend will print a FATAL error message and then exit with exit status 0, because that's what it's supposed to do under the postmaster. Unfortunately, given the exit status 0, initdb doesn't notice anything wrong. And since initdb carefully stuffs ALL stdout and stderr output from its standalone-backend calls into /dev/null, the user will never notice anything wrong either, unless he's attuned enough to realize that initdb should've taken longer.

I think one part of the fix is to modify elog() so that a FATAL exit results in exit status 1, not 0, if not IsUnderPostmaster. But this will not help the user of initdb, who will still have no clue why the initdb is failing, even if he turns on debug output from initdb. I tried modifying initdb along the lines of removing "-o /dev/null" from PGSQL_OPT, and then writing (eg)

echo "CREATE TRIGGER pg_sync_pg_pwd AFTER INSERT OR UPDATE OR DELETE ON pg_shadow" \
     "FOR EACH ROW EXECUTE PROCEDURE update_pg_pwd()" \
  | "$PGPATH"/postgres $PGSQL_OPT template1 2>&1 >/dev/null \
  | grep -v ^DEBUG || exit_nicely

so that all non-DEBUG messages from the standalone backend would appear in initdb's output. However, this does not work because then the || tests the exit status of grep, not postgres. I don't think (postgres || exit_nicely) | grep would work either --- the exit will occur in a subprocess. At the very least we should hack initdb so that --debug removes "-o /dev/null" from PGSQL_OPT, but can you see any way to provide filtered stderr output from the backend in the normal mode of operation? 
regards, tom lane ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
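One portable answer to the pipe-eats-the-exit-status problem Tom describes is to smuggle the status out of the subshell through a temp file. A sketch with a fake backend standing in for the real postgres call (file names are illustrative; plain Bourne constructs only, since this predates bash's pipefail):

```shell
# fake_backend stands in for a standalone postgres call: it emits
# DEBUG noise plus a FATAL line on stderr, then fails.
fake_backend() {
    echo "DEBUG: processing query" >&2
    echo "FATAL 1: something broke" >&2
    return 1
}

STATUSFILE=/tmp/initdb_status.$$

# Inside the subshell: 2>&1 first sends stderr into the pipe, then
# >/dev/null discards stdout.  The subshell records the backend's real
# exit status in a file, which survives the pipeline even though the
# pipeline's own status is grep's.
( fake_backend 2>&1 >/dev/null; echo $? > "$STATUSFILE" ) | grep -v '^DEBUG'

status=$(cat "$STATUSFILE")
rm -f "$STATUSFILE"
echo "backend exit status: $status"
```

Running it prints the FATAL line (the DEBUG noise is filtered away) and then reports exit status 1, so an initdb script could do `[ "$status" -eq 0 ] || exit_nicely` afterward.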
Re: [HACKERS] porting question: funky uid names?
Mark Bixby [EMAIL PROTECTED] writes: So why is there a backend/global1.bki.source *and* a backend/catalog/global1.bki.source? You don't want to know ;-) ... it's all cleaned up for 7.1 anyway. I think in 7.0 you have to run make install in src/backend to get the .bki files installed. But now runcheck dies during the install of PL/pgSQL, with createlang complaining about a missing lib/plpgsql.sl. I did do an MPE implementation of dynloader.c, but I was under the dim impression this was only used for user-added functions, not core functionality. Am I mistaken? Are you dynaloading core functionality too? No, but the regress tests try to test plpgsql too ... you should be able to dike out the createlang call and have all tests except the plpgsql regress test work. regards, tom lane ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://www.postgresql.org/search.mpl
Re: AW: AW: AW: [HACKERS] WAL does not recover gracefully from ou t-of -dis k-sp ace
"Mikheev, Vadim" [EMAIL PROTECTED] writes:
>>> Tom, could you run this test for different block sizes? Up to 32*8k?
>> You mean changing the amount written per write(), while holding the
>> total file size constant, right?
> Yes. Currently XLogWrite writes 8k blocks one by one. From what I've
> seen on Solaris we can use O_DSYNC there without changing XLogWrite to
> write() more than 1 block (if 1 block is available for writing). But
> on other platforms write(BLOCKS_TO_WRITE * 8k) + fsync() probably will
> be faster than BLOCKS_TO_WRITE * write(8k) (for file opened with
> O_DSYNC) if BLOCKS_TO_WRITE > 1. I just wonder with what
> BLOCKS_TO_WRITE we'll see same times for both approaches.

Okay, I changed the program to

char zbuffer[8192 * BLOCKS];

(all else the same) and on HPUX 10.20 I get

$ gcc -Wall -O -DINIT_WRITE -DUSE_DSYNC -DBLOCKS=1 tfsync.c
$ time a.out
real 1m18.48s
user 0m0.04s
sys  0m34.69s

$ gcc -Wall -O -DINIT_WRITE -DUSE_DSYNC -DBLOCKS=4 tfsync.c
$ time a.out
real 0m35.10s
user 0m0.01s
sys  0m9.08s

$ gcc -Wall -O -DINIT_WRITE -DUSE_DSYNC -DBLOCKS=8 tfsync.c
$ time a.out
real 0m29.75s
user 0m0.01s
sys  0m5.23s

$ gcc -Wall -O -DINIT_WRITE -DUSE_DSYNC -DBLOCKS=32 tfsync.c
$ time a.out
real 0m22.77s
user 0m0.01s
sys  0m1.80s

$ gcc -Wall -O -DINIT_WRITE -DUSE_DSYNC -DBLOCKS=64 tfsync.c
$ time a.out
real 0m22.08s
user 0m0.01s
sys  0m1.25s

$ gcc -Wall -O -DINIT_WRITE -DUSE_ODSYNC -DBLOCKS=1 tfsync.c
$ time a.out
real 0m20.64s
user 0m0.02s
sys  0m0.67s

$ gcc -Wall -O -DINIT_WRITE -DUSE_ODSYNC -DBLOCKS=4 tfsync.c
$ time a.out
real 0m20.72s
user 0m0.01s
sys  0m0.57s

$ gcc -Wall -O -DINIT_WRITE -DUSE_ODSYNC -DBLOCKS=32 tfsync.c
$ time a.out
real 0m20.59s
user 0m0.01s
sys  0m0.61s

$ gcc -Wall -O -DINIT_WRITE -DUSE_ODSYNC -DBLOCKS=64 tfsync.c
$ time a.out
real 0m20.86s
user 0m0.01s
sys  0m0.69s

So I also see that there is no benefit to writing more than one block at a time with ODSYNC. And even at half a meg per write, DSYNC is slower than ODSYNC with 8K per write! 
Note the fairly high system-time consumption for DSYNC, too. I think
this is not so much a matter of a really good ODSYNC implementation, as
a really bad DSYNC one ...

			regards, tom lane

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
RE: AW: AW: AW: [HACKERS] WAL does not recover gracefully from out-of-disk-space
> $ gcc -Wall -O -DINIT_WRITE -DUSE_DSYNC -DBLOCKS=1 tfsync.c
                              ^^^^^^^^^^
You should use -DUSE_OSYNC to test O_SYNC. So you've tested
N * write() + fsync(), exactly what I asked for -:)

> So I also see that there is no benefit to writing more than one block
> at a time with ODSYNC. And even at half a meg per write, DSYNC is
> slower than ODSYNC with 8K per write!
> Note the fairly high system-time consumption for DSYNC, too. I think
> this is not so much a matter of a really good ODSYNC implementation,
> as a really bad DSYNC one ...

So it seems we can use O_DSYNC without losing log write performance
compared with write() + fsync(). Though we haven't tested
write() + fdatasync() yet...

Vadim
Re: AW: AW: AW: [HACKERS] WAL does not recover gracefully from out-of-disk-space
More numbers, these from a Powerbook G3 laptop running Linux 2.2:

[tgl@g3 tmp]$ uname -a
Linux g3 2.2.18-4hpmac #1 Thu Dec 21 15:16:15 MST 2000 ppc unknown

[tgl@g3 tmp]$ gcc -Wall -O -DINIT_WRITE -DUSE_DSYNC -DBLOCKS=1 tfsync.c
[tgl@g3 tmp]$ time ./a.out
real    0m32.418s
user    0m0.020s
sys     0m14.020s

[tgl@g3 tmp]$ gcc -Wall -O -DINIT_WRITE -DUSE_DSYNC -DBLOCKS=4 tfsync.c
[tgl@g3 tmp]$ time ./a.out
real    0m10.894s
user    0m0.000s
sys     0m4.030s

[tgl@g3 tmp]$ gcc -Wall -O -DINIT_WRITE -DUSE_DSYNC -DBLOCKS=8 tfsync.c
[tgl@g3 tmp]$ time ./a.out
real    0m7.211s
user    0m0.000s
sys     0m2.200s

[tgl@g3 tmp]$ gcc -Wall -O -DINIT_WRITE -DUSE_DSYNC -DBLOCKS=32 tfsync.c
[tgl@g3 tmp]$ time ./a.out
real    0m4.441s
user    0m0.020s
sys     0m0.870s

[tgl@g3 tmp]$ gcc -Wall -O -DINIT_WRITE -DUSE_DSYNC -DBLOCKS=64 tfsync.c
[tgl@g3 tmp]$ time ./a.out
real    0m4.488s
user    0m0.000s
sys     0m0.640s

[tgl@g3 tmp]$ gcc -Wall -O -DINIT_WRITE -DUSE_ODSYNC -DBLOCKS=1 tfsync.c
[tgl@g3 tmp]$ time ./a.out
real    0m3.725s
user    0m0.000s
sys     0m0.310s

[tgl@g3 tmp]$ gcc -Wall -O -DINIT_WRITE -DUSE_ODSYNC -DBLOCKS=4 tfsync.c
[tgl@g3 tmp]$ time ./a.out
real    0m3.785s
user    0m0.000s
sys     0m0.290s

[tgl@g3 tmp]$ gcc -Wall -O -DINIT_WRITE -DUSE_ODSYNC -DBLOCKS=64 tfsync.c
[tgl@g3 tmp]$ time ./a.out
real    0m3.753s
user    0m0.010s
sys     0m0.300s

Starting to look like we should just use ODSYNC where available, and
forget about dumping more per write ...

			regards, tom lane
RE: AW: AW: AW: [HACKERS] WAL does not recover gracefully from out-of-disk-space
> Starting to look like we should just use ODSYNC where available, and
> forget about dumping more per write ...

I'll run these tests on RedHat 7.0 tomorrow.

Vadim
Re: AW: AW: AW: [HACKERS] WAL does not recover gracefully from out-of-disk-space
"Mikheev, Vadim" [EMAIL PROTECTED] writes:
>> $ gcc -Wall -O -DINIT_WRITE -DUSE_DSYNC -DBLOCKS=1 tfsync.c
>                               ^^^^^^^^^^
> You should use -DUSE_OSYNC to test O_SYNC.

Ooops ... let's hear it for cut-and-paste, and for sharp-eyed readers!
Just for completeness, here are the results for O_SYNC:

$ gcc -Wall -O -DINIT_WRITE -DUSE_OSYNC -DBLOCKS=1 tfsync.c
$ time a.out
real    0m43.44s
user    0m0.02s
sys     0m0.74s

$ gcc -Wall -O -DINIT_WRITE -DUSE_OSYNC -DBLOCKS=4 tfsync.c
$ time a.out
real    0m26.38s
user    0m0.01s
sys     0m0.59s

$ gcc -Wall -O -DINIT_WRITE -DUSE_OSYNC -DBLOCKS=8 tfsync.c
$ time a.out
real    0m23.86s
user    0m0.01s
sys     0m0.59s

$ gcc -Wall -O -DINIT_WRITE -DUSE_OSYNC -DBLOCKS=64 tfsync.c
$ time a.out
real    0m22.93s
user    0m0.01s
sys     0m0.66s

Better than fsync(), but still not up to O_DSYNC.

> So it seems we can use O_DSYNC without losing log write performance
> compared with write() + fsync(). Though we haven't tested
> write() + fdatasync() yet...

Good point, we should check fdatasync() too --- although I have no
machines where it's different from fsync().

			regards, tom lane
Re: AW: AW: AW: [HACKERS] WAL does not recover gracefully from out-of-disk-space
"Mikheev, Vadim" [EMAIL PROTECTED] writes:
> Ok, I've made changes in xlog.c and run tests:

Could you send me your diffs?

			regards, tom lane