Re: [HACKERS] Changing the default configuration (was Re:
scott.marlowe said: On 11 Feb 2003, Greg Copeland wrote: Besides, I'm not sure that it makes sense to let other product needs dictate the default configurations for this one. It would be one thing if the vast majority of people only used PostgreSQL with Apache. I know I'm using it in environments that in no way relate to the web. I'm thinking I'm not alone. [...] You don't have to be using apache to need more than 32 simultaneous connections. Heck, how many postgresql databases do you figure are in production with that setting still in there? My guess is not many. I would second this. One of my larger PostgreSQL applications uses Apache, although it's not a typical web server. Apache is restricted to a particular number of processes and it rarely uses too many backends (but one should consider the possible N databases x M apache processes when using persistent database connections). The main connection load on that system, however, comes from lots of scripts that run asynchronously and access the same data (collect, analyze, archive data). Even if database access is serialized as much as possible (at the cost of performance or wasted wall clock time), this load sometimes represents hundreds of backends. My opinion too is that increasing the number of connections will benefit the first-time usability experience more than it will deliver a performance boost. The main trouble is that more connections require not only more semaphores, but also more shared memory. If we are toying with the 'performance' idea, we should definitely increase the sort memory default as well :-) ... and this means LOTS of memory for many processes. Would it be possible to gather some useful connections/memory usage statistics - run this code on different installations and collect sufficient data to make a better choice? Daniel ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/users-lounge/docs/faq.html
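Daniel's worry about connections multiplying into memory can be sketched with a back-of-the-envelope calculation. The worst-case assumption (every backend sorting at full sort_mem simultaneously) and the exact figures are illustrative only, not PostgreSQL's actual memory accounting:

```python
# Rough worst-case memory footprint: shared buffers plus one full
# sort_mem allocation per connection.  Figures are illustrative only.

def worst_case_kb(max_connections, sort_mem_kb, shared_buffers_kb):
    return shared_buffers_kb + max_connections * sort_mem_kb

# 7.3-era defaults: 32 connections, 1024 kB sort_mem, 64 * 8 kB buffers
print(worst_case_kb(32, 1024, 64 * 8))     # 33280 kB, roughly 32 MB
# A "performance" default: 200 connections, 8 MB sort_mem
print(worst_case_kb(200, 8192, 64 * 8))    # 1638912 kB, roughly 1.6 GB
```

which is exactly the "LOTS of memory for many processes" problem.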
Re: [HACKERS] PostgreSQL Windows port strategy
Hi all! Things are very unclear to me. Are we in a personal war with Microsoft that I'm not aware of? I don't much care which OS our product is running on. I care a great deal about our product's high availability, speed, scalability etc. In the last month I saw on this list a lot of opinions regarding the differences between various operating systems. I haven't seen opinions regarding the functionality exposed by our direct competitors: other DBMSs. Do we want to transform PostgreSQL into the next generation's OS without my being aware? I can guarantee you good software can be written on Windows too. I can guarantee you that moving PostgreSQL to Windows is the best move for PostgreSQL. I can guarantee you an MMC Snap-In for PostgreSQL and a Visual Studio .NET add-in for PostgreSQL can help PostgreSQL become a leader. But discussions like "Windows is a B category platform" and "Windows will die tomorrow" and "Linux is the best" won't lead us to a top position.
Re: [HACKERS] location of the configuration files
Tom Lane wrote: Kevin Brown [EMAIL PROTECTED] writes: I assume $PGDATA was around long before GUC? Yes, it was. But I have not yet seen an argument here that justifies why $SOMECONFIGDIRECTORY/postgresql.conf is better than $PGDATA/postgresql.conf. Okay, here's one: most Unix systems store all of the configuration files in a well known directory: /etc. These days it's a hierarchy of directories with /etc as the root of the hierarchy. When an administrator is looking for configuration files, the first place he's going to look is in /etc and its subdirectories. After that, he's forced to look through the startup scripts to figure out where things are located. And if those aren't revealing, then he has to read manpages and hope they're actually useful. :-) And if that doesn't work, then he has to resort to tricks like doing strings on the binaries (he doesn't necessarily have access to the sources that the binaries were compiled from, which is all that matters here). The latter keeps all the related files together. The former seems only to introduce unnecessary complexity. Well, I'd say it's unnecessary only when you already know where the data files are located -- which is true when you're a developer or someone who is already familiar with the installation you're working with. But if you're just getting started and installed it from a package like an RPM file, then you have to look in the package to see where it created the data file areas, or look at the startup scripts, etc. You can only justify it as simpler if you propose hardwiring a value for $SOMECONFIGDIRECTORY ... Making things simpler from the standpoint of the code isn't the point. Making things simpler for the DBA and/or Unix sysadmin is. I'd say $SOMECONFIGDIRECTORY should be a hardwired default with a command line override. I doubt you'll get a whole lot of argument from the general user community if you say that the hard wired default should be /etc/postgresql. 
which is a proposal that will not fly with any of the core developers, because we all run multiple versions of Postgres on our machines so that we can deal with back-version bug reports, test installations, etc. I absolutely agree that the config directory to use should be something that can be controlled with a command line option. It is unlikely to fly with any of the RPM packagers either, due to the wildly varying ideas out there about the One True Place where applications should put their config files. There seems to be substantial agreement among the distribution maintainers that config files belong somewhere in /etc. At least, I've seen very little disagreement with that idea except from people who believe that each package should have its own, separate directory hierarchy. And the fact that the vast majority of packages put their config files somewhere in /etc supports this. Debian, for instance, actually *does* put the PostgreSQL config files in /etc/postgresql and creates symlinks in the data directory that point to them. This works, but it's a kludge. There are highly practical reasons for putting all the config files under /etc, not the least of which is that it makes backing up files that are *very* likely to change from the default, and which are also highly critical to the operation of the system, very easy. You'll get A LOT more disagreement about where to put data files than config files, as standards go. And in the case of PostgreSQL, where you put your data files is especially important for performance reasons, so it therefore makes even less sense to put the config files in the same location: it means that the config files could literally be anywhere, and any administrator who is unfamiliar with the system will have to dig through startup scripts (or worse!) to figure it out. Oh, here's another reason $SOMECONFIGDIRECTORY is better than $PGDATA: it allows much more appropriate separation of concern by default. 
Most installations of PostgreSQL start the database from a startup script that's run at boot time. With $PGDATA, changing the target data directory requires changing the startup script, which requires root access to the system -- if it didn't require root access then the entire system is open to the possibility of a world of hurt, because the DBA isn't necessarily the same guy as the Unix sysadmin and therefore doesn't necessarily know his way around shell scripts in general, and rc scripts in particular, the way the Unix admin will. The possibility of hurt comes from the fact that the rc script runs as root, at a time when the system is hardest to work with in the event of a failure (many systems haven't even put up any console login prompts, and may not have even started any remote login facilities, before the PostgreSQL startup script runs). A sufficiently bad screwup on the part of the DBA with that kind of setup will require the Unix sysadmin to go to single user mode or worse to fix it. So unless the
Re: [HACKERS] Wrong charset mappings
I think the problem you see is due to the mapping table changes between 7.2 and 7.3. It seems there are more changes other than u301c. Moreover, according to the recent discussion on the Japanese local mailing list, 7.3's JDBC driver now relies on the encoding conversion performed by the backend, i.e. the driver issues set client_encoding = 'UNICODE'. This problem is very complex and I need time to find a good solution. I don't think simply backing out the changes to the mapping table solves the problem. Hi all, One Japanese character has been causing my head to swim lately. I've finally tracked down the problem to both Java 1.3 and Postgresql. The problem character is namely: utf-16: 0x301C utf-8: 0xE3809C SJIS: 0x8160 EUC_JP: 0xA1C1 Otherwise known as the WAVE DASH character. The confusion stems from a very similar character, 0xFF5E (utf-16) or 0xEFBD9E (utf-8), the FULLWIDTH TILDE. Java has just lately (1.4.1) finally fixed their mappings so that 0x301C maps correctly to both the correct SJIS and EUC-JP character. Previously (at least in 1.3.1) they mapped SJIS to 0xFF5E and EUC to 0x301C, causing all sorts of trouble. Postgresql at least picked one of the two characters, namely 0xFF5E, so conversions in and out of the database to/from sjis/euc seemed to be working. The problem is when you try to view utf-8 from the database, or if you read the data into java (utf-16) and try converting to euc or sjis from there. Anyway, I think postgresql needs to be fixed for this character. In my opinion what needs to be done is to change the mappings:

  euc-jp -> utf-8    -> euc-jp
  0xA1C1 -> 0xE3809C -> 0xA1C1

  sjis   -> utf-8    -> sjis
  0x8160 -> 0xE3809C -> 0x8160

As to what to do with the current mapping of 0xEFBD9E (utf-8)? It probably should be removed. Maybe you could keep the mapping back to the sjis/euc characters to help backward compatibility, though. I'm not sure what is the correct approach there.
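The WAVE DASH / FULLWIDTH TILDE split described above can be reproduced with Python's codecs (offered as an independent illustration, not part of the original exchange): the strict JIS-based codecs map SJIS 0x8160 and EUC_JP 0xA1C1 to U+301C, while Microsoft's cp932 variant uses U+FF5E for the same SJIS bytes.

```python
# WAVE DASH (U+301C) round-trips through the strict JIS codecs using
# exactly the byte values quoted in the message above...
assert "\u301c".encode("euc_jp") == b"\xa1\xc1"      # EUC_JP 0xA1C1
assert "\u301c".encode("shift_jis") == b"\x81\x60"   # SJIS 0x8160
assert "\u301c".encode("utf-8") == b"\xe3\x80\x9c"   # UTF-8 0xE3809C

# ...while Microsoft's cp932 decodes the same SJIS bytes to
# FULLWIDTH TILDE (U+FF5E) -- the same confusion Java 1.3 had.
assert b"\x81\x60".decode("cp932") == "\uff5e"
assert "\uff5e".encode("utf-8") == b"\xef\xbd\x9e"   # UTF-8 0xEFBD9E
print("mappings confirmed")
```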
If anyone can tell me how to edit the mappings under: src/backend/utils/mb/Unicode/ and rebuild postgres to use them, then I can test this out locally. Just edit src/backend/utils/mb/Unicode/*.map and rebuild PostgreSQL. Probably you want to modify utf8_to_euc_jp.map and euc_jp_to_utf8.map. -- Tatsuo Ishii
Re: [HACKERS] PostgreSQL Windows port strategy
Vatamanescu Victor said: I don't much care which OS our product is running on. I care a great deal about our product's high availability, speed, scalability etc. In the last month I saw on this list a lot of opinions regarding the differences between various operating systems. I haven't seen opinions regarding the functionality exposed by our direct competitors: other DBMSs. Do we want to transform PostgreSQL into the next generation's OS without my being aware? PostgreSQL is what it is, which certainly is not an OS. If PostgreSQL were an 'Operating System' we wouldn't care much on what 'Operating System' it runs, right? But most of the things related to performance AND functionality of PostgreSQL depend very much on what OS you use to build and run it on. (otherwise it could well contain portions of the OS, much like Oracle does :) While I agree that (any) Windows platform may be useful for a lot of things, it's true that one should be wary of Windows for things like software development (due to its unstable API), platform stability (I have yet to know someone who didn't have to reinstall Windows every month or so), and performance (you can hardly explain to customers why their new personal Windows Workstation requires a 1 GHz processor, 256 MB RAM etc. while showing them how a moderate PostgreSQL database server that could serve their entire company can run on much older/slower/cheaper hardware). There is nothing wrong in having a nice Windows based GUI for accessing databases, but turning a graphical workstation into a database server is something that has never worked in computer history (the opposite has happened from time to time). Windows lost the game when Microsoft decided to abandon support for non-Intel CPUs in Windows NT (VMS actually). I can guarantee you good software can be written on Windows too. I can guarantee you that moving PostgreSQL to Windows is the best move for PostgreSQL.
I can guarantee you an MMC Snap-In for PostgreSQL and a Visual Studio .NET add-in for PostgreSQL can help PostgreSQL become a leader. While enough was said in this list regarding the Windows port - nothing could stop anyone from porting an open-source database to whatever platform, including Windows. The better the platform knowledge of the porting team, the better the result. But 'moving' PostgreSQL to Windows won't happen. But discussions like Windows is a B category platform and Windows will die tomorrow and Linux is the best won't lead us to a top position. My personal favorite is BSD/OS, and for what it matters, Windows cannot solve the kind of tasks that BSD/OS solves for me. Period. Daniel PS: I probably sound too anti-Microsoft biased, which is not true. I just don't care about Microsoft - I use Windows from time to time when someone sends me a $@^#$@*#*$-Microsoft-Office formatted document that will not open anywhere else. Windows also makes a great launcher for my father's Heroes III game. ;-)
Re: [HACKERS] Contract Programmer Advice.
On Tuesday 11 February 2003 20:56, Lamar Owen wrote: Being that this group of hackers is one I trust, and that this is a pretty common scenario for contract programming, I thought I'd ask this group a question. I hope you don't mind. I want to thank everyone for their responses. We will see where things go from here. -- Lamar Owen WGCR Internet Radio 1 Peter 4:11
Re: [HACKERS] PostgreSQL Windows port strategy
Well, I haven't seen much of that unstable API. If you saw something unstable, please provide me source code that proves the Windows API is unstable. Don't tell me about some "expert"'s opinion: if you have a problem with Windows, show it to me. We are not users here. I agree that Windows 95/98 was a sh..., but I have news for you: Windows is becoming stronger every day. In 1998-1999 we could talk about the dying Microsoft - they were null on the server platform and the growth of Linux seemed unlimited. Can you tell me that the situation is the same? Have you carefully tested Windows 2003? About reinstalling: at home I have a Pentium III 1 GHz workstation. OS: Windows XP. I have installed on it Visual Studio .NET, Visual Studio 6, Microsoft SQL Server, IBM DB2, Cygwin, IIS, MSMQ etc. etc. (you get the picture). I installed it about 9 months ago (when I bought the hdd) and I don't have any trouble with it; I haven't had to reinstall it after a month... By the way, DB2 has no problem working on Windows. I haven't heard IBM calling Windows a "B category platform", on the contrary...
Re: [HACKERS] PostgreSQL Windows port strategy
Vatamanescu Victor said: Well, I haven't seen much of that unstable API. If you saw something unstable please provide me source code that proves the Windows API is unstable. Don't tell me about some expert's opinion: if you have a problem with Windows show it to me. We are not users here. Really? We ARE users of Windows here. Nobody has ever bothered to tell US how this thing works, why it works, and what we, the users, can do when it doesn't work. I am talking about the Windows API here, with its numerous variants and 'Service Packs'. I have learned my lesson on Windows: use only default configurations, anything else will bite you badly some day. (this is not to say that you can't get the same effect on other platforms, just that on most 'commercial' UNIX platforms some things are documented in great detail - sometimes too much detail) I agree that Windows 95/98 was a sh..., but I have news for you: Windows is becoming stronger every day. In 1998-1999 we could talk about the dying Microsoft - they were null on the server platform and the growth of Linux seemed unlimited. Can you tell me that the situation is the same? Have you carefully tested Windows 2003? You are wrong about Microsoft here. They were null on the server platform, say, in 1985, when their wonderful product Microsoft BASIC was my favorite on the Apple ][. Then IBM gave them the IBM PC to write an 'OS' for. I don't claim to be an expert on Windows, although I have actually 'tested' (both from the user's perspective and for usability as a 'platform') every Windows release since 1.03. About reinstalling: at home I have a Pentium III 1 GHz workstation. OS: Windows XP. I have installed on it Visual Studio .NET, Visual Studio 6, Microsoft SQL Server, IBM DB2, Cygwin, IIS, MSMQ etc. etc. (you get the picture). I installed it about 9 months ago (when I bought the hdd) and I don't have any trouble with it; I haven't had to reinstall it after a month... But rebooted it how many times?
I have never ever reinstalled a UNIX system. Some run for years (yes, that is some N x 365 days) without being rebooted or powered down. They usually stop when that same flaky PC hardware breaks, and I have to build a new system then anyway. By the way, DB2 has no problem working on Windows. I haven't heard IBM calling Windows a B category platform, on the contrary... I was trying to avoid this part of the discussion... mostly because I don't believe PostgreSQL is yet able to compete with the 'big' database systems. Here I mean BIG database systems. One of the reasons, in my opinion, is that those guys that run their data on big iron just can't explain why they would spend millions of dollars a year on OS licenses and still want to install an open-source database there. It seems you got my opinion wrong: I don't see anything wrong with running PostgreSQL on Windows. In fact, I was considering such a port some time ago (or alternatively a 'DOS' port :), because this allows greater flexibility in certain cases. However, moving the main development effort onto Windows, just to satisfy the Windows lovers' ego, would be catastrophic and simply will not happen to PostgreSQL. One of the reasons this will not happen is because PostgreSQL has already grown, has become quality software, and many people already know how it runs on the SAME Intel hardware on which Windows runs. The difference with DB2, which runs on IBM's other platforms, is significant for the same reason. When it fails on your data set on the Windows platform, IBM can always tell you: but... this is because of the cheap and flaky PC hardware - here is what this wonderful software is designed to run on...
Re: [HACKERS] PostgreSQL Windows port strategy
I think there should be a special mailing list set up called pg-sql-win32-advocacy where people can continually harass the postgres dev team and debate the merits of the win32 operating system. In particular, make sure to insult them for being elitists (while at the same time asking them to work on software which you have not paid for). We can also route all the questions about how to set up cygwin and the ipc-daemon over there too. Merlin
Re: [HACKERS] Windows SHMMAX (was: Default configuration)
Another way of looking at it is memory mapped files. This probably most closely resembles unix shared memory and is the de facto standard way for interprocess memory block sharing. Sadly, performance will suffer because you have to rely on the virtual memory system (think: writing to files) to do a lot of stupid stuff you don't necessarily want or need. To the contrary, for the majority of the shared memory usage of postgres, which is cached file data, the virtual memory system is doing exactly what you want it to: managing the movement of data between memory and disk, and caching the more frequently accessed data to reduce the chances you will actually need to access the disk for it. Yes. Generally, I was trying to point out the disadvantages of memory mapped files compared to shared memory. In windows, there is no direct equivalent to shared memory. MMFs are very similar in usage. I suspect they might not perform quite as well as the shared memory functions. For example, if used in place of shared memory to cache static file data, you are maintaining: 1. the file itself, 2. the file cache handled by the os, 3. the MMF memory side cache (following a page fault), 4. the virtual memory space set aside for the os to swap it out should the os need more memory. MMFs are efficient when memory allocations are relatively static: they work especially well with a freestore memory allocation system (this minimizes movement inside the virtual memory pagefile). For example, the MMF is allocated at the startup of the backend and doled out to processes on an internal 'as needed' basis. This is equivalent in function to memory allocations using the VirtualAlloc() family, except it's good for IPC. (IMHO, it will still run slower). If memory allocations are frequent and dynamic, you start to run into problems with fragmentation of the pagefile and such. This is very undesirable.
Also, if memory allocations are large, you could potentially run into the worst possible scenario: your file cache system competing with the virtual memory system. This will cause the server to thrash. One workaround for that is to set up the files for sequential access: this minimizes os caching of files. This also more or less removes 'double dipping' into the memory system to cache your static file data. The down side is that the work of maintaining an intelligent file cache has been offloaded from the OS to you, the programmer. I am not experienced enough with the postgres memory allocation system to say how well this would work for PostgreSQL. For shared memory used only for IPC, typically a VM system treats it no differently from any other non-shared memory, so if it's doing something you don't want or need (a proposition I quite heartily disagree with), it's going to be doing that for every piece of memory your application allocates and uses, shared or not. The OS has to guarantee that the memory can be swapped out to file at any time and therefore mirrors the pagefile to the allocated memory blocks. The OS does not need to write the pagefile. On modern Unix systems that are not allowing overcommit, the space will be allocated but never written unless there's a need to free up some physical memory, and the pages in question are used infrequently enough that the system decides that they are good candidates to be paged out. I would imagine that Windows does the same. In windows, things are backwards: the space is allocated in virtual memory *first* (i.e. the page file), then following a page fault it gets swapped into memory. The overhead I spoke of was related to the fact that windows always has to ensure space exists in the page file (or some user defined file) to swap the file back out. IMHO, *nix has a much superior approach to IPC in this context. It's much simpler and very straightforward.
It also explains why in windows, most server apps are multi threaded, not multi process. I agree with you on most salient points. The question is: are MMFs the proper analog of SHMEM in a native port of postgres? My answer to that question is: it is by no means certain, but what else is there to use? Merlin cjs -- Curt Sampson [EMAIL PROTECTED] +81 90 7737 2974 http://www.netbsd.org Don't you know, in this new Dark Age, we're all light. --XTC
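The Unix-style IPC that the thread calls "much simpler and very straightforward" can be sketched in a few lines: an anonymous shared mapping inherited across fork(). This is a minimal POSIX-only illustration (not from the thread); on Windows the rough analog would be a pagefile-backed mapping via CreateFileMapping/MapViewOfFile.

```python
import mmap
import os

# Anonymous shared mapping: mmap(-1, n) creates MAP_SHARED anonymous
# memory, so a forked child and its parent see the same bytes.
buf = mmap.mmap(-1, 4096)

pid = os.fork()
if pid == 0:
    buf[:5] = b"hello"      # child writes through the shared mapping
    os._exit(0)
os.waitpid(pid, 0)

print(buf[:5])              # the parent observes the child's write
assert buf[:5] == b"hello"
```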
Re: [HACKERS] location of the configuration files
On Wed, 2003-02-12 at 08:24, Kevin Brown wrote: Tom Lane wrote: You can only justify it as simpler if you propose hardwiring a value for $SOMECONFIGDIRECTORY ... Making things simpler from the standpoint of the code isn't the point. Making things simpler for the DBA and/or Unix sysadmin is. I'd say $SOMECONFIGDIRECTORY should be a hardwired default with a command line override. I doubt you'll get a whole lot of argument from the general user community if you say that the hard wired default should be /etc/postgresql. which is a proposal that will not fly with any of the core developers, because we all run multiple versions of Postgres on our machines so that we can deal with back-version bug reports, test installations, etc. I absolutely agree that the config directory to use should be something that can be controlled with a command line option. It is unlikely to fly with any of the RPM packagers either, due to the wildly varying ideas out there about the One True Place where applications should put their config files. There seems to be substantial agreement among the distribution maintainers that config files belong somewhere in /etc. At least, I've seen very little disagreement with that idea except from people who believe that each package should have its own, separate directory hierarchy. And the fact that the vast majority of packages put their config files somewhere in /etc supports this. Debian, for instance, actually *does* put the PostgreSQL config files in /etc/postgresql and creates symlinks in the data directory that point to them. This works, but it's a kludge. Seems like a good compromise would be to make the hard wired default $SOMECONFIGDIRECTORY be $PGDATA; this makes each version of the software more self contained/ less likely to interfere with another installation. (This becomes really handy when doing major upgrades). If you really have a strong desire to change this, you can. As I see it, this change would (should?) 
need to be something that could be changed in the configure script when building postgresql, as well as changeable via a command line option; any other places? Robert Treat
Re: [HACKERS] Options for growth
On Thu, 23 Jan 2003 11:19:36 -0700 (MST) scott.marlowe [EMAIL PROTECTED] wrote: On 23 Jan 2003, Hannu Krosing wrote: Curt Sampson kirjutas N, 23.01.2003 kell 17:42: If the OS can handle the scheduling (which, last I checked, Linux couldn't, When did you do your checking ? (just curious, not to start a flame war ;) at least not without patches), eight or sixteen CPUs will be fine. Yeah, take a look here: http://www.sgi.com/servers/altix/ 64 CPUs seems scalable enough for me. :-) When can we expect BSD to run on this system and use all 64 CPUs efficiently? I think FreeBSD 5.[1|2] will be able to. That was the entire reason for SMPng and KSE. There is not too much of the kernel left untouched from the 4.0 split. As far as NetBSD or OpenBSD goes, I would not expect it too soon... GB -- GB Clark II | Roaming FreeBSD Admin [EMAIL PROTECTED] | General Geek CTHULU for President - Why choose the lesser of two evils?
[HACKERS] Q about InsertIndexResult
Hi! The insert function of any type of index must return a palloc'ed InsertIndexResult (containing blkno and offset). This result is returned by index_insert (backend/access/indexam/indexam.c). This function is called in 3 places: ./access/heap/tuptoaster.c ./catalog/indexing.c ./executor/execUtils.c But in all these places the returned value isn't used, just pfree'd (except ./access/heap/tuptoaster.c, where it's checked against NULL). Q: why? Why isn't it a bool value? Is there some idea which I haven't seen? -- Teodor Sigaev [EMAIL PROTECTED]
Re: [HACKERS] PostgreSQL Windows port strategy
On Wednesday 12 February 2003 at 15:49, Merlin Moncure wrote: I think there should be a special mailing list set up called pg-sql-win32-advocacy where people can continually harass the postgres dev team and debate the merits of the win32 operating system. I realize my views about PostgreSQL are not shared by anyone; otherwise, within a few hours' time, I would have received several emails saying "I have the same feeling", etc. It seems like everyone is looking for something different, which ultimately turns out to become ... PostgreSQL.
Re: [HACKERS] Q about InsertIndexResult
Teodor Sigaev [EMAIL PROTECTED] writes: [ why do we have InsertIndexResult and not just a bool? ] Good question. Perhaps it was used once upon a time? That API has been like that since Berkeley days. I can't offhand see a good reason to return the index tuple's tid. There isn't any legitimate reason for anything outside the index AM to be doing anything directly with the index tuple. I dunno if it's worth the trouble to change it just to save one palloc per insert, though. If we ever decided that there was some other piece of information that the index AM should return, we'd have to change right back to returning a struct... regards, tom lane
Re: [HACKERS] Changing the default configuration
On Tue, 2003-02-11 at 21:00, Tatsuo Ishii wrote: while 200 may seem high, 32 definitely seems low. So, what IS a good compromise? For this and ALL the other settings that should probably be a bit higher. I'm guessing sort_mem of 4 or 8 meg hits the knee for most folks, and the max fsm settings tom has suggested make sense. 32 is not too low if the kernel file descriptor limit is not increased. Beware that running out of kernel file descriptors is a serious problem for the entire system, not only for PostgreSQL. Had this happen at a previous employer, and it definitely is bad. I believe we had to do a reboot to clear it up. And we saw the problem a couple of times, since the sysadmin wasn't able to deduce what had happened the first time we got it. IIRC the problem hit somewhere around 150 connections, so we ran with 128 max. I think this is a safe number on most servers these days (running linux at least), though out of the box I might be more inclined to limit it to 64. If you do hit a file descriptor problem, *you are hosed*. Robert Treat
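The sizing instinct here is simple multiplication: connections times descriptors per backend against the available limit. The per-backend figure below is an assumption for illustration only (the real number depends on how many relations each backend touches), not something from the thread:

```python
import resource

# Assumed per-backend descriptor usage; illustrative only.
FDS_PER_BACKEND = 64

# Per-process soft limit as a stand-in for the descriptor budget.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
for max_connections in (32, 64, 128):
    need = max_connections * FDS_PER_BACKEND
    verdict = "fits" if need <= soft else "risks fd exhaustion"
    print(max_connections, need, verdict)
```

The same multiplication is why roughly 150 connections could exhaust the kernel table on that machine while 128 stayed (barely) safe.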
Re: [HACKERS] Changing the default configuration (was Re: [pgsql-advocacy]
On Tue, Feb 11, 2003 at 05:25:29PM -0700, Rick Gigger wrote: The type of person who can't configure it or doesn't think to try is probably not doing a project that requires any serious performance. I have piles of email, have fielded thousands of phone calls, and have had many conversations which prove that claim false. People think that computers are magic. That they don't think the machines require a little bit of attention is in no way an indication that they don't need the system to come with reasonable defaults. A -- Andrew Sullivan 204-4141 Yonge Street Liberty RMS Toronto, Ontario Canada [EMAIL PROTECTED] M2P 2A8 +1 416 646 3304 x110
Re: [HACKERS] PostgreSQL Windows port strategy
Well said. I was just trying to be funny. Since we are all programmers here, I'll try and re-express my thought in a (somewhat) portable language:

char HackerThought[] = "Maybe there really is something to this windows business. Perhaps it's time to consider a port.";
int iAnnoyanceLevel = 0;

PerformDailyRoutine();

PerformDailyRoutine()
{
    char *EmailMessage;

    Think(HackerThought);
    while ((EmailMessage = GetEmailMessage(HACKER_LIST)) != NULL)
    {
        if (!strcmp(EmailMessage, "Windows is an unstable, crappy OS"))
            iAnnoyanceLevel++;
        if (!strcmp(EmailMessage, "Windows is better than that pile of trash, linux"))
            iAnnoyanceLevel += 2;
        if (!strcmp(EmailMessage, "Can anybody here tell me how to install the IPC-Daemon as a service on my win98 machine?"))
            iAnnoyanceLevel += 100;
    }
}

Really, I'm new here, and I have no business telling anybody anything, anyway :)

Merlin

-----Original Message-----
From: Jean-Michel POURE [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, February 12, 2003 11:26 AM
To: [EMAIL PROTECTED]
Subject: Re: [HACKERS] PostgreSQL Windows port strategy

On Wednesday 12 February 2003 at 15:49, Merlin Moncure wrote: I think there should be a special mailing list set up called pg-sql-win32-advocacy where people can continually harass the postgres dev team and debate the merits of the win32 operating system. I realize my views about PostgreSQL are not shared by anyone, otherwise in a few hours' time I would have received several emails saying "I have the same feeling", etc. It seems like everyone is looking for something different, which ultimately turns out to become ... PostgreSQL. ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED]) ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] Changing the default configuration (was Re:
On Wed, 2003-02-12 at 11:39, Andrew Sullivan wrote: On Tue, Feb 11, 2003 at 05:25:29PM -0700, Rick Gigger wrote: The type of person who can't configure it or doesn't think to try is probably not doing a project that requires any serious performance. I have piles of email, have fielded thousands of phone calls, and have had many conversations which prove that claim false. People But IBM told me computers are self-healing, so if there is a performance problem should it just fix itself? -- Rod Taylor [EMAIL PROTECTED] PGP Key: http://www.rbt.ca/rbtpub.asc signature.asc Description: This is a digitally signed message part
Re: [HACKERS] Changing the default configuration
On Wed, 2003-02-12 at 10:36, Robert Treat wrote: On Tue, 2003-02-11 at 21:00, Tatsuo Ishii wrote: while 200 may seem high, 32 definitely seems low. So, what IS a good compromise? for this and ALL the other settings that should probably be a bit higher. I'm guessing sort_mem of 4 or 8 meg hits the knee for most folks, and the max fsm settings Tom has suggested make sense. 32 is not too low if the number of kernel file descriptors is not increased. Beware that running out of kernel file descriptors is a serious problem for the entire system, not only for PostgreSQL. Had this happen at a previous employer, and it definitely is bad. I believe we had to do a reboot to clear it up. And we saw the problem a couple of times since the sysadmin wasn't able to deduce what had happened the first time we got it. IIRC the problem hit somewhere around 150 connections, so we ran with 128 max. I think this is a safe number on most servers these days (running Linux at least), though out of the box I might be more inclined to limit it to 64. If you do hit a file descriptor problem, *you are hosed*. That does seem like a more reasonable upper limit. I would rather see people have to knowingly increase the limit rather than bump into system upper limits and start scratching their heads trying to figure out what the heck is going on. -- Greg Copeland [EMAIL PROTECTED] Copeland Computer Consulting
Re: [HACKERS] Changing the default configuration (was Re:
On Tue, Feb 11, 2003 at 05:32:58PM -0700, scott.marlowe wrote: So, what OSes would have a problem, if any, with boosting something like max connects to 200? What are the breaking points on other OSes? Solaris 8 broke at 200, the last time I tried on a new box. 150 didn't break it, though. Given the absolute bare-minimum hardware you can get Solaris to work on, Solaris installs with totally silly defaults for all this stuff. I dunno why they do it that way, but anyway, anyone using Solaris _will_ need to reconfigure the kernel (just for the default config), so problems there aren't an argument for leaving the defaults alone. A -- Andrew Sullivan 204-4141 Yonge Street Liberty RMS Toronto, Ontario Canada [EMAIL PROTECTED] M2P 2A8 +1 416 646 3304 x110 ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] Changing the default configuration
Robert Treat [EMAIL PROTECTED] writes: Had this happen at a previous employer, and it definitely is bad. I believe we had to do a reboot to clear it up. And we saw the problem a couple of times since the sysadmin wasn't able to deduce what had happened the first time we got it. IIRC the problem hit somewhere around 150 connections, so we ran with 128 max. I think this is a safe number on most servers these days (running Linux at least), though out of the box I might be more inclined to limit it to 64. If you do hit a file descriptor problem, *you are hosed*. If you want to run lots of connections, it's a real good idea to set max_files_per_process to positively ensure Postgres won't overflow your kernel file table, ie, max_connections * max_files_per_process should be less than the file table size. Before about 7.2, we didn't have max_files_per_process, and would naively believe whatever sysconf() told us was an okay number of files to open. Unfortunately, way too many kernels promise more than they can deliver ... regards, tom lane
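Tom's rule of thumb above is simple arithmetic; here is a minimal sketch (hypothetical helper name, not anything shipped with PostgreSQL) of the check he describes:

```python
def file_table_headroom(max_connections, max_files_per_process, kernel_file_max):
    """True if PostgreSQL's worst case stays below the kernel file table
    size, i.e. max_connections * max_files_per_process < kernel_file_max."""
    # Worst case: every backend holds its full quota of open files at once.
    worst_case = max_connections * max_files_per_process
    return worst_case < kernel_file_max

# 128 backends capped at 25 files each fit comfortably under an
# 8192-entry file table; the same 128 backends with a 1000-file
# quota apiece do not.
```

The pre-7.2 behavior Tom mentions amounted to trusting sysconf() for the per-process quota, which made the left-hand side of this inequality unknowable in advance.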
Re: [HACKERS] Q about InsertIndexResult
I can't offhand see a good reason to return the index tuple's tid. One reason why the existing interface is not good: who says that for one heap tuple only one index tuple should exist? For example, Oleg and Vadim Mikheev had a discussion a couple of years ago about indexing arrays with a B-tree: for each heap tuple, store one index tuple per element of the array. There isn't any legitimate reason for anything outside the index AM to be doing anything directly with the index tuple. I dunno if it's worth the trouble to change it just to save one palloc per insert, though. If we ever decided that there was some other piece of information that the index AM should return, we'd have to change right back to returning a struct... Agreed. -- Teodor Sigaev [EMAIL PROTECTED]
Re: [HACKERS] Options for growth
On Wed, 12 Feb 2003, GB Clark wrote: On Thu, 23 Jan 2003 11:19:36 -0700 (MST) scott.marlowe [EMAIL PROTECTED] wrote: On 23 Jan 2003, Hannu Krosing wrote: Curt Sampson wrote on Thu, 23.01.2003 at 17:42: If the OS can handle the scheduling (which, last I checked, Linux couldn't, When did you do your checking? (just curious, not to start a flame war ;) at least not without patches), eight or sixteen CPUs will be fine. Yeah, take a look here: http://www.sgi.com/servers/altix/ 64 CPUs seems scalable enough for me. :-) When can we expect BSD to run on this system and use all 64 CPUs efficiently? I think FreeBSD 5.[1|2] will be able to. That was the entire reason for SMPng and KSE. There is not too much of the kernel left untouched from the 4.0 split. As far as NetBSD or OpenBSD goes, I would not expect it too soon... I just downloaded 5.0 last week and I've a pretty little dual PPro sitting here that needs to be ridden hard. It has lots of spare drives and Linux is already on one, so this will be a nice box for playing with different distros and whatnot. Now I just need an Altix... Even a little one would do. Now how do I convince the powers that be where I work that we have a need for an 8 to 64 way SMP monster box?
Re: [HACKERS] Q about InsertIndexResult
Teodor Sigaev [EMAIL PROTECTED] writes: One reason why existing interface is not good: Who say, that for one heap tuple should exists only one index tuple? Good point. Is that reason enough to change it? Not sure. The index AM doesn't really have to bother with telling the truth in the InsertIndexResult it returns ... regards, tom lane ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
[HACKERS] parse_expr.c another type produced by transformations?
Ran into this in cvs tip checked out as of yesterday:

db=> SELECT sum(CASE WHEN (upper(substr((select 'en'::varchar),1,1)) not between 'A' and 'Z') THEN 1 ELSE 0 END) AS n FROM tab;
ERROR: transformExpr: does not know how to transform node 309 (internal error)

309 is T_FuncExpr. Is this a node type that should be added as described by the comment? Appended is the patch to add it, but I don't know if it should be or not.

parse_expr.c:

	/*
	 * Quietly accept node types that may be presented when we are
	 * called on an already-transformed tree.
	 *
	 * Do any other node types need to be accepted? For now we are
	 * taking a conservative approach, and only accepting node
	 * types that are demonstrably necessary to accept.
	 */

--- parse_expr.c.~1.144.~	2003-02-09 23:44:46.0 -0500
+++ parse_expr.c	2003-02-12 13:02:35.0 -0500
@@ -670,6 +670,7 @@
 		case T_Param:
 		case T_Aggref:
 		case T_ArrayRef:
+		case T_FuncExpr:
 		case T_FieldSelect:
 		case T_RelabelType:
 		case T_CoerceToDomain:

-- greg
Re: [HACKERS] Changing the default configuration (was Re:
On Tue, 11 Feb 2003, Tom Lane wrote: scott.marlowe [EMAIL PROTECTED] writes: ... If he starts running out of semaphores, that's a problem he can address while his database is still up and running in most operating systems, at least in the ones I use. Back in the day, this took a kernel rebuild and system reboot to fix. If this has changed, great ... but on exactly which Unixen can you alter SEMMAX on the fly? Tom, now you're making me all misty-eyed for 14-platter 10 Meg hard drives and paper tape readers. :-) Seriously, I know Linux can change these on the fly, and I'm pretty sure Solaris can too. I haven't played with BSD for a while so can't speak about that. Anyone else know? So, my main point is that for any setting that requires you to shut down postgresql to make the change, we should pick a compromise value that means you will likely never have to shut down the database once you've started it up and it's under load. When I started using Postgres, it did not allocate the max number of semas it might need at startup, but was instead prone to fail when you tried to open the 17th or 33rd or so connection. It was universally agreed to be an improvement to refuse to start at all if we could not meet the specified max_connections setting. I don't want to backtrack from that. If we can up the default max_connections setting, great ... but let's not increase the odds of failing under load. I don't want to backtrack either, and I prefer that we now grab the semaphores we need at startup. Note that on a stock RH 7.2 box, the max number of backends you can start up before you exhaust semaphores is 2047 backends, more than I'd ever want to try and run on normal PC hardware. So, on a Linux box, 150 to 200 max backends comes nowhere near exhausting semaphores. I imagine that any joe average who doesn't really understand sysadmin duties that well and is trying for the first time to install Postgresql WILL be doing so on one of three general platforms: Linux, BSD, or Windows.
As long as the initial settings use only 10% or so of the file handle and/or semaphore resources on each of those systems, we're probably safe. 64 or 128 seems like a nice power-of-two number that is likely to keep us safe on initial installs while forestalling problems with too many connections. Just for score, here's the default max output of RH 7.2's kernel config:

kernel.sem = 250 32000 32 128
fs.file-max = 8192

Note that while older kernels needed to have max inodes bumped up as well, nowadays that doesn't seem to be a problem, or they just set it really high and I can't hit the ceiling on my workstation without swap storms. The definitions of the kernel.sem line are:

kernel.sem: max_sem_per_id max_sem_total max_ops_sem_call max_sem_ids

I'll try to get FreeBSD running today and see what research I can find on it, but 5.0 is likely to be a whole new beast for me, so if someone can tell us what the maxes are by default on different BSDs and what the settings are in postgresql that can exhaust them, that gets us closer. Like I've said before, anyone running HPUX, Irix, Solaris, or any other Industrial Strength Unix should already know to increase all these things and likely had to long before Postgresql showed up on their box, so a setting that keeps pgsql from coming up won't be likely, and if it happens, they'll most likely know how to handle it. BSD and Linux users are more likely to contain the group of folks who don't know all this and don't ever want to (not that all BSD/Linux users are like that, just that the sub-group mostly exists on those platforms, and Windows as well). So the default settings really probably should be determined, for the most part, by the needs of those users.
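The kernel.sem line decoded above packs four limits into one value; a tiny illustrative parser (a hypothetical helper, not a real tool) makes the field names explicit:

```python
def parse_kernel_sem(value):
    """Split a Linux kernel.sem string into its four named fields."""
    fields = ("max_sem_per_id", "max_sem_total",
              "max_ops_sem_call", "max_sem_ids")
    return dict(zip(fields, (int(p) for p in value.split())))

# The stock RH 7.2 values: 250 semaphores per set, 32000 total,
# 32 ops per semop call, 128 semaphore sets.
parsed = parse_kernel_sem("250 32000 32 128")
```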
Re: [HACKERS] Changing the default configuration
Oh, another setting that should be a default for most users is to initdb automatically with locale of C. If they need a different locale, they should have to pick it. The performance of Postgresql with a locale other than C when doing like and such is a serious ding. I'd much rather have the user experience the faster searches first, then get to test with other locales and see if performance is good enough, than to start out slow and wonder why they need to change their initdb settings to get decent performance on a where clause with like in it. ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/users-lounge/docs/faq.html
[HACKERS] Brain dump: btree collapsing
I've been thinking hard for the last few days about how to do space reclamation in b-tree indexes, i.e., recycle pages that are in no-longer-useful portions of the tree structure. We know we need this to solve the index bloat problem that occurs when the distribution of keys changes over time. I feel that it's critical that the reclamation be doable by plain VACUUM, ie, without acquiring exclusive lock on the index as a whole. This discussion therefore assumes that page deletion must be able to operate in parallel with insertions and searches. Issues -- We need to get rid of parent links in btree pages; otherwise removal of a non-leaf page implies we must find and update all parent links that lead to it. This is messy enough that it would be better to do without. The only thing the parent link is really needed for is to find the new parent level after a root split, and we can handle that (very infrequent) case by re-descending from the new root. Instead of storing parent links, label all pages with level (counting levels up from zero = leaf, so that a page's level doesn't change in a root split). Then, if we find ourselves needing to re-descend, we can be sure of finding the correct parent level, one above where we were, even if there's been multiple root splits meanwhile. The level will also make for a useful cross-check on link validity: we will always know the level of the page we expect to arrive at when following a link, so we can check that the page has the right level. Unfortunately this means tossing out most of the FixBtree code Vadim wrote 2 years ago, because it seems critically dependent on having parent links. But I don't really see why we need it if we rely on WAL to maintain btree consistency. That will require some improvements in WAL-logging for btrees, however. (Details below.) 
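The level-labeling scheme just described can be modeled in a few lines; this is an illustrative sketch (toy classes, not PostgreSQL source): every page carries its level (0 = leaf, unchanged by root splits), and a reader following a link verifies that the target page has the level it expects.

```python
class Page:
    def __init__(self, level):
        # Levels count up from zero = leaf, so a page's level never
        # changes, even across root splits.
        self.level = level

def follow_link(pages, page_no, expected_level):
    """Follow a downlink or sidelink, cross-checking the level invariant."""
    page = pages[page_no]
    if page.level != expected_level:
        raise RuntimeError("link led to a page of level %d, expected %d"
                           % (page.level, expected_level))
    return page
```

This is what makes re-descending from a new root safe: a reader that needs the parent level of a level-N page simply descends until it reaches level N+1, whatever root splits happened in the meantime.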
When a page is deleted, it can't actually be recycled until there are no more potentially in-flight references to it (ie, readers that intend to visit the page but have not yet acquired a lock on it). Those readers must be able to find the page, realize it's dead, and follow the correct sidelink from it. [LaninShasha86] describe the drain technique, which they define as delay freeing the empty page until the termination of all processes whose locate phase began when pointers to the page still existed. We can conveniently implement this by reference to transactions: after removing all links to the page, label the now-dead page with the current contents of the next-transaction-ID counter. VACUUM can recycle the page when this is older than the oldest open transaction. Instead of an actively maintained freelist on disk as per Alvaro Herrera's patch, I plan to use the FSM to remember where recyclable pages are, much as we do for tables. The FSM space requirements would be small, since we'd not be needing to enter any data about partially-full pages; only truly empty, recyclable pages would need to be stored. (Is it worth having an alternate representation in the FSM for indexes, so that we only store page numbers and not the useless amount-free statistic?) Without a freelist on disk, VACUUM would need to scan indexes linearly to find dead pages, but that seems okay; I'm thinking of doing that anyway to look for empty pages to turn into dead ones. Restructuring the tree during page deletion --- We will delete only completely-empty pages. If we were to merge nearly-empty pages by moving data items from one page to an adjacent one, this would imply changing the parent's idea of the bounding key between them --- which is okay if we are just deleting an internal key in the parent, but what if the pages have different parent pages? We'd have to adjust the parents' own bounding key, meaning the parents' parent changes, perhaps all the way to the root. 
(Not to mention that with variable-size keys, there's no guarantee we can make such changes without splitting the upper-level pages.) And, since we support both forward and backward index scans, we can't move leaf items in either direction without risking having a concurrent scan miss them. This is way too messy, especially for something that has only minimal return according to the literature [Johnson89]. So, no merging. Deletion of an empty page only requires removal of the parent's item linking to it (plus fixing side pointers, which is pretty trivial). We also remove the next higher key in the parent, which is the parent's upper bound for data that would have belonged on the target page. Therefore, the page's right sibling becomes responsible for storing the key range that used to belong on the deleted page. What if there is no next-higher key, you ask? Well, I'm going to punt. It is critical that the key space associated with a parent page match the key space associated with its children (eg, the high key of the rightmost child must match the parent's high key). There is no way to atomically
Re: [HACKERS] Changing the default configuration (was Re: [pgsql-advocacy]
Tom Lane writes: Well, as I commented later in that mail, I feel that 1000 buffers is a reasonable choice --- but I have to admit that I have no hard data to back up that feeling. I know you like it in that range, and 4 or 8 MB of buffers by default should not be a problem. But personally I think if the optimal buffer size does not depend on both the physical RAM you want to dedicate to PostgreSQL and the nature and size of the database, then we have achieved a medium revolution in computer science. ;-) -- Peter Eisentraut [EMAIL PROTECTED] ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
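Peter's objection can be made concrete with a toy sizing function; the fraction and floor below are made-up illustration knobs, not recommendations from the thread:

```python
BLCKSZ = 8192  # default PostgreSQL disk page size, in bytes

def candidate_shared_buffers(ram_bytes, fraction=0.1, floor=1000):
    """Size shared_buffers as a fraction of physical RAM, never dropping
    below the 1000-buffer figure discussed above."""
    return max(floor, int(ram_bytes * fraction) // BLCKSZ)
```

The point being: any such formula takes the machine's RAM (and really the workload) as input, which is exactly what a static compiled-in default cannot know.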
[HACKERS] lock method
hi, does anyone know what lockmethod means in the lock.h file and whats the use of lockmethodTable.? thank you SumairaAdd photos to your messages with MSN 8. Get 2 months FREE*.
[HACKERS] plpython attypmod broken in MODIFY trigger returns
In a plpython trigger, if you return MODIFY, the parsing of the TD["new"] dictionary uses the wrong (C array) index to locate the atttypmod value, leading to subtle bugs dependent on the exact types, names, and order of fields in the table in question. (Types need to be those that use the typmod, names affect the ordering in the python dictionary, and order affects the C array). In my case, I ended up with TIMESTAMP(68), which was just slightly more precision than I really wanted ;-) Here's a patch that fixes my issue. I have not extensively developed tests for it.

-Brad

diff -ur postgresql-7.3.2.orig/src/pl/plpython/plpython.c postgresql-7.3.2/src/pl/plpython/plpython.c
--- postgresql-7.3.2.orig/src/pl/plpython/plpython.c	Fri Jan 31 17:35:27 2003
+++ postgresql-7.3.2/src/pl/plpython/plpython.c	Wed Feb 12 19:22:08 2003
@@ -630,7 +630,7 @@
 			modvalues[j] = FunctionCall3(proc->result.out.r.atts[atti].typfunc,
 						CStringGetDatum(src),
 						ObjectIdGetDatum(proc->result.out.r.atts[atti].typelem),
-						Int32GetDatum(tupdesc->attrs[j]->atttypmod));
+						Int32GetDatum(tupdesc->attrs[atti]->atttypmod));
 			modnulls[j] = ' ';
 			Py_DECREF(plstr);
Re: [HACKERS] PostgreSQL Tuning Results
Christopher Kings-Lynne wrote: I reckon that sort_mem is the hardest thing to optimise! Agreed... in part because it depends a lot on the query. Also, if I understand correctly, sort_mem not only affects sorts but also hash table stuff as well, right? If that's true for the new hash aggregates, I think this means large sort_mem settings will become even more useful for data-warehouse-type applications. One thing I've been wondering is whether sort_mem could be per connection/backend-process instead of per sorting operation, so that sort_mem could be set more aggressively without running out of memory so easily with large queries. If that's not possible (i.e. one couldn't know how many simultaneous sorts are needed beforehand), how about only letting the first one or two get all the memory and making the rest use a smaller one. Anecdote: I have a reasonably large data warehouse (1e6 to 1e8 rows in various tables) with quite a bit of data (500k rows) added each day. A lot of processing (7 hours/day) is spent loading data and generating various aggregates. In a couple of places in the ETL part of the data warehouse code I have:

set sort_mem = 25;    /// something that only needs a single sort
set sort_mem = 65536;
...
set sort_mem = 4096;  /// some ugly aggregate-creating join generated by a reporting tool
set sort_mem = 65536;

Ron
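Part of why sort_mem is hard to set aggressively is that it is a per-sort-operation limit, not a per-backend budget, so the worst case multiplies; a toy calculation (illustrative names only):

```python
def worst_case_sort_memory_kb(max_connections, sorts_per_query, sort_mem_kb):
    """Theoretical worst case: every backend runs a query with this many
    concurrent sorts, each allowed the full sort_mem."""
    return max_connections * sorts_per_query * sort_mem_kb

# 128 backends x 3 sorts apiece at sort_mem = 65536 KB could, in the
# worst case, ask for about 24 GB.
```

This is the arithmetic behind the per-connection idea above: a per-backend budget would cap the middle factor at 1.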
Re: [HACKERS] Brain dump: btree collapsing
Tom Lane wrote: snip The deletion procedure could be triggered immediately upon removal of the last item in a page, or when the next VACUUM scan finds an empty page. Not sure yet which way is better. Having it triggered immediately upon removal of the last item in a page would make for a more self maintaining system wouldn't it? That sounds nice. :) snip In theory, if we find recyclable page(s) at the physical end of the index, we could truncate the file (ie, give the space back to the filesystem) instead of reporting these pages to FSM. I am not sure if this is worth doing --- in most cases it's likely that little space can be released this way, and there may be some tricky locking issues. Sounds like this would be beneficial for environments with high update/delete transaction volumes, perhaps on smaller amounts of live/valid data. snip This could be ignored in first implementation (there's always REINDEX). Later, possibly handle it via LaninShasha's notion of a critic (think VACUUM) that sets a fast pointer to the current effective root level. (Actually I think we wouldn't need a separate critic process; split and delete steps could be programmed to update the fast pointer for themselves, in a separate atomic action, when they split a one-page level or delete the next-to-last page of a level.) This really sounds like good initial thoughts too. :-) Regards and best wishes, Justin Clift -- My grandfather once told me that there are two kinds of people: those who work and those who take the credit. He told me to try to be in the first group; there was less competition there. - Indira Gandhi ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [HACKERS] location of the configuration files
Okay, here's one: most Unix systems store all of the configuration files in a well known directory: /etc. These days it's a hierarchy of directories with /etc as the root of the hierarchy. When an administrator is looking for configuration files, the first place he's going to look is in /etc and its subdirectories. After that, he's forced to look through the startup scripts to figure out where things are located. And if those aren't revealing, then he has to read manpages and hope they're actually useful. :-) And if that doesn't work, then he has to resort to tricks like doing strings on the binaries (he doesn't necessarily have access to the sources that the binaries were compiled from, which is all that matters here). No goddammit - /usr/local/etc. Why can't the Linux community respect history It is the ONE TRUE PLACE dammit!!! Chris (btw, there is humour + seriousness in above post...) ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] Changing the default configuration
Had this happen at a previous employer, and it definitely is bad. I believe we had to do a reboot to clear it up. And we saw the problem a couple of times since the sysadmin wasn't able to deduce what had happened the first time we got it. IIRC the problem hit somewhere around 150 connections, so we ran with 128 max. I think this is a safe number on most servers these days (running Linux at least), though out of the box I might be more inclined to limit it to 64. If you do hit a file descriptor problem, *you are hosed*. Just yesterday I managed to hose my new Postgres installation during a particular benchmarking run. Postgres did restart itself nicely though. I have no idea why that particular run caused problems when all other runs with identical settings didn't. I checked the log and saw file descriptor probs. I was doing 128 connections with 128 max connections. This was the log:

2003-02-12 04:16:15 LOG: PGSTAT: cannot open temp stats file /usr/local/pgsql/data/global/pgstat.tmp.41388: Too many open files in system
2003-02-12 04:16:15 LOG: PGSTAT: cannot open temp stats file /usr/local/pgsql/data/global/pgstat.tmp.41388: Too many open files in system
2003-02-12 04:16:39 PANIC: could not open transaction-commit log directory (/usr/local/pgsql/data/pg_clog): Too many open files in system
2003-02-12 04:16:39 LOG: statement: SET autocommit TO 'on';VACUUM ANALYZE
2003-02-12 04:16:39 LOG: PGSTAT: cannot open temp stats file /usr/local/pgsql/data/global/pgstat.tmp.41388: Too many open files in system

This was the MIB:

kern.maxfiles: 1064
kern.maxfilesperproc: 957

This was the solution:

sysctl -w kern.maxfiles=65536
sysctl -w kern.maxfilesperproc=8192

.. and then stick

kern.maxfiles=65536
kern.maxfilesperproc=8192

in /etc/sysctl.conf so it's set during a reboot. Which just goes to highlight the importance of rigorously testing a production installation...

Chris
Re: [HACKERS] Changing the default configuration (was Re:
Seriously, I know Linux can change these on the fly, and I'm pretty sure Solaris can too. I haven't played with BSD for a while so can't speak about that. Anyone else know? You cannot change SHMMAX on the fly on FreeBSD. Chris ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [HACKERS] location of the configuration files
binaries (he doesn't necessarily have access to the sources that the binaries were compiled from, which is all that matters here). No goddammit - /usr/local/etc. Why can't the Linux community respect history History? It's the only way to make a read-only / (enforced by secure-level) and still be able to change the user applications. I don't mind /usr/X11R6/etc either, but it's not exactly appropriate for PostgreSQL ;) -- Rod Taylor [EMAIL PROTECTED] PGP Key: http://www.rbt.ca/rbtpub.asc signature.asc Description: This is a digitally signed message part
[HACKERS] horology and time failures on freebsd/alpha
I'm still getting failures, but it gets weirder:

regression=# drop table timetz_tbl;
DROP TABLE
regression=# CREATE TABLE TIMETZ_TBL (f1 time(2) with time zone); INSERT INTO TIMETZ_TBL VALUES ('00:01 PDT');
CREATE TABLE
regression=#
regression=# INSERT INTO TIMETZ_TBL VALUES ('00:01 PDT'); INSERT INTO TIMETZ_TBL VALUES ('01:00 PDT'); INSERT INTO TIMETZ_TBL VALUES ('02:03 PDT'); INSERT INTO TIMETZ_TBL VALUES ('07:07 PST');
INSERT 154307 1
regression=# INSERT INTO TIMETZ_TBL VALUES ('01:00 PDT'); INSERT INTO TIMETZ_TBL VALUES ('02:03 PDT'); INSERT INTO TIMETZ_TBL VALUES ('07:07 PST'); INSERT INTO TIMETZ_TBL VALUES ('08:08 EDT'); INSERT INTO TIMETZ_TBL VALUES ('11:59 PDT');
INSERT 154308 1
regression=# INSERT INTO TIMETZ_TBL VALUES ('02:03 PDT'); INSERT INTO TIMETZ_TBL VALUES ('07:07 PST'); INSERT INTO TIMETZ_TBL VALUES ('08:08 EDT'); INSERT INTO TIMETZ_TBL VALUES ('11:59 PDT'); INSERT INTO TIMETZ_TBL VALUES ('12:00 PDT'); INSERT INTO TIMETZ_TBL VALUES ('12:01 PDT');
INSERT 154309 1
regression=# INSERT INTO TIMETZ_TBL VALUES ('07:07 PST'); INSERT INTO TIMETZ_TBL VALUES ('08:08 EDT'); INSERT INTO TIMETZ_TBL VALUES ('11:59 PDT'); INSERT INTO TIMETZ_TBL VALUES ('12:00 PDT'); INSERT INTO TIMETZ_TBL VALUES ('12:01 PDT'); INSERT INTO TIMETZ_TBL VALUES ('23:59 PDT'); INSERT INTO TIMETZ_TBL VALUES ('11:59:59.99 PM PDT');
INSERT 154310 1
regression=# INSERT INTO TIMETZ_TBL VALUES ('08:08 EDT');
INSERT 154311 1
regression=# INSERT INTO TIMETZ_TBL VALUES ('11:59 PDT');
INSERT 154312 1
regression=# INSERT INTO TIMETZ_TBL VALUES ('12:00 PDT');
INSERT 154313 1
regression=# INSERT INTO TIMETZ_TBL VALUES ('12:01 PDT');
INSERT 154314 1
regression=# INSERT INTO TIMETZ_TBL VALUES ('23:59 PDT');
INSERT 154315 1
regression=# INSERT INTO TIMETZ_TBL VALUES ('11:59:59.99 PM PDT');
INSERT 154316 1
regression=# SELECT f1 AS "Time TZ" FROM TIMETZ_TBL;
    Time TZ
----------------
 00:01:00-07
 01:00:00-07
 02:03:00-07
 07:07:00-08
 08:08:00-04
 11:59:00-07
 12:00:00-07
 12:01:00-07
 23:59:00-07
 23:59:59.99-07
(10 rows)

-- Now run EXACTLY the same query again...

regression=# SELECT f1 AS "Time TZ" FROM TIMETZ_TBL;
    Time TZ
----------------
 00:00:00-07
 00:00:00-07
 00:00:00-07
 00:00:00-08
 00:00:00-04
 00:00:00-07
 00:00:00-07
 00:00:00-07
 00:00:00-07
 00:00:59.99-07

What the heck!!?!?!?! The SELECT statement has somehow edited the data in the table or something??? From this point forward, the table is all stuffed...

Chris
Re: [HACKERS] location of the configuration files
On Wednesday 12 February 2003 20:37, Christopher Kings-Lynne wrote: Okay, here's one: most Unix systems store all of the configuration files in a well known directory: /etc. These days it's a hierarchy of No [snip] - /usr/local/etc. Why can't the Linux community respect history? It is the ONE TRUE PLACE [snip] If PostgreSQL is supported as a part of the base operating system in a Linux distribution, and that distribution wishes to be Linux Standard Base compliant (most do), then PostgreSQL cannot go in /usr/local -- period. IDIC at work. -- Lamar Owen WGCR Internet Radio 1 Peter 4:11 ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [HACKERS] Brain dump: btree collapsing
Justin Clift [EMAIL PROTECTED] writes: Tom Lane wrote: The deletion procedure could be triggered immediately upon removal of the last item in a page, or when the next VACUUM scan finds an empty page. Not sure yet which way is better. Having it triggered immediately upon removal of the last item in a page would make for a more self maintaining system wouldn't it? That sounds nice. :) Maybe. This isn't about getting rid of VACUUM --- there's still a need for routine maintenance vacuums. So the question really comes down to whether it's more efficient to do it in bulk during routine maintenance sweeps, or retail. I'm not sold yet, but am leaning to the bulk side. In theory, if we find recyclable page(s) at the physical end of the index, we could truncate the file (ie, give the space back to the filesystem) instead of reporting these pages to FSM. I am not sure if this is worth doing --- in most cases it's likely that little space can be released this way, and there may be some tricky locking issues. Sounds like this would be beneficial for environments with high update/delete transaction volumes, perhaps on smaller amounts of live/valid data. It would only really be worth doing if you made a substantial reduction in the number of rows in a table, and had no near-term intention of loading the table back up again. Seems like it might be sufficient to tell people to REINDEX if they want the index size to drop in that scenario. I will look at physically truncating the index during VACUUM, but I don't think it's worth getting real tense about... regards, tom lane ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [HACKERS] horology and time failures on freebsd/alpha
Oops - just to clarify: I accidentally copied too many INSERTs into the email. The result of the first SELECT is correct compared to the INSERTs that I did. I copied the INSERTs and CREATE TABLE as-is from timetz.sql

Chris

-Original Message-
From: Christopher Kings-Lynne [mailto:[EMAIL PROTECTED]]
Sent: Thursday, 13 February 2003 11:08 AM
To: Hackers
Cc: Tom Lane
Subject: horology and time failures on freebsd/alpha

I'm still getting failure, but it gets weirder:

regression=# drop table timetz_tbl;
DROP TABLE
regression=# CREATE TABLE TIMETZ_TBL (f1 time(2) with time zone);
INSERT INTO TIMETZ_TBL VALUES ('00:01 PDT');
CREATE TABLE
regression=#
regression=# INSERT INTO TIMETZ_TBL VALUES ('00:01 PDT');
INSERT INTO TIMETZ_TBL VALUES ('01:00 PDT');
INSERT INTO TIMETZ_TBL VALUES ('02:03 PDT');
INSERT INTO TIMETZ_TBL VALUES ('07:07 PST');
INSERT 154307 1
regression=# INSERT INTO TIMETZ_TBL VALUES ('01:00 PDT');
INSERT INTO TIMETZ_TBL VALUES ('02:03 PDT');
INSERT INTO TIMETZ_TBL VALUES ('07:07 PST');
INSERT INTO TIMETZ_TBL VALUES ('08:08 EDT');
INSERT INTO TIMETZ_TBL VALUES ('11:59 PDT');
INSERT 154308 1
regression=# INSERT INTO TIMETZ_TBL VALUES ('02:03 PDT');
INSERT INTO TIMETZ_TBL VALUES ('07:07 PST');
INSERT INTO TIMETZ_TBL VALUES ('08:08 EDT');
INSERT INTO TIMETZ_TBL VALUES ('11:59 PDT');
INSERT INTO TIMETZ_TBL VALUES ('12:00 PDT');
INSERT INTO TIMETZ_TBL VALUES ('12:01 PDT');
INSERT 154309 1
regression=# INSERT INTO TIMETZ_TBL VALUES ('07:07 PST');
INSERT INTO TIMETZ_TBL VALUES ('08:08 EDT');
INSERT INTO TIMETZ_TBL VALUES ('11:59 PDT');
INSERT INTO TIMETZ_TBL VALUES ('12:00 PDT');
INSERT INTO TIMETZ_TBL VALUES ('12:01 PDT');
INSERT INTO TIMETZ_TBL VALUES ('23:59 PDT');
INSERT INTO TIMETZ_TBL VALUES ('11:59:59.99 PM PDT');
INSERT 154310 1
regression=# INSERT INTO TIMETZ_TBL VALUES ('08:08 EDT');
INSERT 154311 1
regression=# INSERT INTO TIMETZ_TBL VALUES ('11:59 PDT');
INSERT 154312 1
regression=# INSERT INTO TIMETZ_TBL VALUES ('12:00 PDT');
INSERT 154313 1
regression=# INSERT INTO TIMETZ_TBL VALUES ('12:01 PDT');
INSERT 154314 1
regression=# INSERT INTO TIMETZ_TBL VALUES ('23:59 PDT');
INSERT 154315 1
regression=# INSERT INTO TIMETZ_TBL VALUES ('11:59:59.99 PM PDT');
INSERT 154316 1
regression=# SELECT f1 AS "Time TZ" FROM TIMETZ_TBL;
    Time TZ
----------------
 00:01:00-07
 01:00:00-07
 02:03:00-07
 07:07:00-08
 08:08:00-04
 11:59:00-07
 12:00:00-07
 12:01:00-07
 23:59:00-07
 23:59:59.99-07
(10 rows)

-- Now run EXACTLY the same query again...

regression=# SELECT f1 AS "Time TZ" FROM TIMETZ_TBL;
    Time TZ
----------------
 00:00:00-07
 00:00:00-07
 00:00:00-07
 00:00:00-08
 00:00:00-04
 00:00:00-07
 00:00:00-07
 00:00:00-07
 00:00:00-07
 00:00:59.99-07

What the heck!!?!?!?! The SELECT statement has somehow edited the data in the table or something??? From this point forward, the table is all stuffed... Chris ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
Re: [HACKERS] horology and time failures on freebsd/alpha
Christopher Kings-Lynne [EMAIL PROTECTED] writes: What the heck!!?!?!?! The SELECT statement has somehow edited the data in the table or something??? From this point forward, the table is all stuffed... Hm, time to dig out the debugger and figure out where the breakage is. Do you want to do it, or lend an account to let someone else dig into it? regards, tom lane ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/users-lounge/docs/faq.html
[HACKERS] psql source suggestions
I've been playing around with the source for psql, and as a result I did some cleaning up in common.c. Would anyone be interested in seeing patches for that? The main benefit is some eliminated code duplication, plus the removal of some warts like 'continue' and 'break' where they weren't needed. Jeroen ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] log_duration
Christopher Kings-Lynne wrote: Someone asked about this at FOSDEM. The only way I know to do it is look in the pgsql_temp directory, but they disappear pretty quickly. Folks, do we need something to report sort file usage? Fwiw here's the perl one-liner I used to tune sort_mem recently (run in the $PGDATA/base directory):

perl -e 'while (sleep(1)) { if ($s = -s <pgsql_tmp/*>) { if ($s > $m) { $m = $s; print "$s\n"; } } }'

When doing this I had precisely the same thought about having Postgres print out the disk space usage for sorts. -- greg ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
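A rough Python equivalent of the one-liner above, for anyone allergic to Perl (the pgsql_tmp path and the one-second polling are assumptions carried over from the original; this is a sketch, not anything shipped with Postgres):

```python
import glob
import os

def pgsql_tmp_usage(base="pgsql_tmp"):
    """Total bytes currently held by sort spill files under base.

    Run from the $PGDATA/base directory, like the perl one-liner.
    """
    total = 0
    for path in glob.glob(os.path.join(base, "*")):
        try:
            total += os.path.getsize(path)
        except OSError:
            pass  # spill file vanished between glob() and stat()
    return total

def high_water_mark(samples):
    """Largest usage seen across a series of samples."""
    return max(samples, default=0)
```

Polling pgsql_tmp_usage() once a second and keeping high_water_mark() of the samples reproduces what the perl loop prints.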
[HACKERS] loading libraries on Postmaster startup
While using PL/R in a web based application, I noticed that the library load and initialization time is significant enough to be annoying. So I wrote a quick hack to load and initialize the library on postmaster startup. This way, the backends get a fully initialized copy of the interpreter when they are forked. The hack was to postmaster.c just after the SSL initialization code at about line 650 (just remembered this is 7.3.2 though):

	if (true)	/* later use startup GUC var */
	{
		char	   *fullname = "$libdir/plr.so";
		char	   *funcname = "start_interp";
		func_ptr	initfunc;

		initfunc = (func_ptr) load_external_function(fullname, funcname, true, NULL);
		(*initfunc)();
	}

(I also had to add a #define for func_ptr) This brings me to a couple questions: 1. Is there anything inherently dangerous with this approach? My light testing seems to show that it works quite well for my purpose. 2. It seems to me that other libraries such as those for PL/Tcl, PL/Perl, etc may have the same issue. Is there any merit in a GUC variable to allow libraries such as this to be loaded and initialized at postmaster start? I'll generalize this and send in a patch if there is interest. Joe ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
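The GUC form Joe asks about might look something like this in postgresql.conf (the parameter name and the library:init-function list syntax are hypothetical here, sketched from the hack above, not a settled design):

```
# Load and initialize shared libraries in the postmaster, before any
# backends are forked. Hypothetical syntax: a comma-separated list of
# library[:init-function] pairs.
preload_libraries = '$libdir/plr:start_interp'
```

Each forked backend would then inherit the already-initialized interpreter via copy-on-write, which is exactly what the hard-coded hack achieves for PL/R alone.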
Re: [HACKERS] log_duration
Greg Stark wrote: Not a big deal though, since I doubt anyone's actually parsing postgres logs. Hm, brings up an interesting idea though, I wonder if it would be useful to log directly into postgres tables. I was wondering roughly the same thing. If you ran an external program to process the logs and put them into a PostgreSQL database, you'd have problems with the transactions of the log processor landing in the logs as well, at least if all transactions were logged. The logging process would have to filter out its own transactions, which might not be all that easy. -- Kevin Brown [EMAIL PROTECTED] ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] loading libraries on Postmaster startup
Joe Conway [EMAIL PROTECTED] writes: 2. It seems to me that other libraries such as those for PL/Tcl, PL/Perl, etc may have the same issue. Is there any merit in a GUC variable to allow libraries such as this to be loaded and initialized at postmaster start? I'll generalize this and send in a patch if there is interest. A similar situation arises with mod_perl. Because perl is quite heavy-weight and systems often need lots of packages with static data, it's common to load a startup.pl script that just loads lots of packages before the Apache server forks. This reduces memory usage drastically. The main gotcha is that you have to be careful about resources that you don't want shared. The typical case is database handles, which are sockets that wouldn't be happy having two processes writing and reading on them. At first blush it seemed unlikely you would have a database connection in an embedded perl script. But then, hm, that would be a sly way of doing interdatabase connections. In any case there are other situations where you might want to have open file descriptors or sockets lying around. So in short, not only is it useful, but it would be valuable to provide a mechanism for making the language load modules before forking. But there have to be prominent caveats that no such shared packages should create resources that can't be safely shared. -- greg ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
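Greg's caveat about shared resources is easy to demonstrate outside of Apache or Postgres. The sketch below (illustrative only; POSIX fork() semantics assumed) shows that a file descriptor created before fork() shares one file offset between parent and child, which is exactly why a database socket opened before the fork would end up with two processes interleaving on the same connection:

```python
import os
import tempfile

# Open a descriptor, then fork: parent and child now share one open
# file description, including its offset. The child's write advances
# the offset the parent sees, so the parent's write lands after the
# child's instead of overwriting it. Sockets behave the same way,
# which is why preloaded modules must not create connections before
# the server forks.
fd, path = tempfile.mkstemp()
pid = os.fork()
if pid == 0:                      # child: write, then exit immediately
    os.write(fd, b"child\n")
    os._exit(0)
os.waitpid(pid, 0)                # parent: wait for the child's write
os.write(fd, b"parent\n")
os.lseek(fd, 0, os.SEEK_SET)
data = os.read(fd, 64)
os.close(fd)
os.unlink(path)
```

Both lines end up in the file, one after the other; with a shared database socket the same interleaving would corrupt the wire protocol instead of a scratch file.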
Re: [HACKERS] psql source suggestions
Jeroen T. Vermeulen [EMAIL PROTECTED] writes: I've been playing around with the source for psql, and as a result I did some cleaning up in common.c. Would anyone be interested in seeing patches for that? The main benefit is some eliminated code duplication, plus the removal of some warts like 'continue' and 'break' where they weren't needed. Sure, send it along to pgsql-patches. Code beautification is a worthwhile goal that we spend far too little time on. Of course, beauty is in the eye of the beholder, so there may not be universal agreement that you made improvements ;-). But send 'em along and then we can debate the point. regards, tom lane ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] loading libraries on Postmaster startup
Joe Conway [EMAIL PROTECTED] writes: [ what about autoloading libraries into the postmaster? ] I can see a couple possible downsides: (a) the library might have some weird behavior across fork boundaries; (b) the additional memory space that has to be duplicated into child processes will cost something per child launch, even if the child never uses it. But these are only arguments that it might not *always* be a prudent thing to do, not that we shouldn't give the DBA the tool to do it if he wants. So fire away. (I seem to recall Peter muttering about linking plperl, pltcl, etc statically into the backend; which would reduce the need for this. But it would not eliminate it ... and he hasn't done it anyway...) regards, tom lane ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
[HACKERS] More benchmarking of wal_buffers
Hi Everyone, I've just spent the last day and a half trying to benchmark our new database installation to find a good value for wal_buffers. The quick answer - there isn't, just leave it on the default of 8. The numbers just swing up and down so much it's impossible to say that one setting is better than another. I've attached an openoffice doc with my old shared_buffers tests plus the wal_buffers tests. The wal results are a bit deceptive as the results I've included are really what I consider the 'average' results. Just occasionally, I'd get a spike that I could never repeat... Even if you look at the attached charts and you think that 128 buffers are better than 8, think again - there's nothing in it. Next time I run that benchmark it could be the same, lower or higher. And the difference between the worst and best results is less than 3 TPS - ie. nothing. One result that has come out of this is that wal_buffers does not affect SELECT-only performance in any way. So, for websites where the select/update ratio is very high, wal_buffers is almost an irrelevant optimisation. Even on massively heavy sites where 64 simultaneous users issue write transactions continuously, I was unable to prove that any setting other than the default helped. In this situation, probably the commit_delay and commit_siblings variables will give you the best gains. I'm not sure what I could test next. Does FreeBSD support anything other than fsync? eg. fdatasync, etc. I can't see it in the man pages... Chris ps. I don't think the attachments are too large, but if they annoy anyone, tell me. Also, I've cross-posted to make sure people who read my previous benchmark see this one also. [Attachments: PostgreSQL Benchmarking.sxc (OpenOffice Calc spreadsheet), select.gif, tpcb.gif] ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
Re: [HACKERS] location of the configuration files
At 12:31 AM -0500 2/13/03, mlw wrote: The idea that a, more or less, arbitrary data location determines the database configuration is wrong. It should be obvious to any administrator that a configuration file location which controls the server is the right way to do it. Isn't the database data itself a rather significant portion of the 'configuration' of the database? What do you gain by having the postmaster config and the database data live in different locations? -pmb ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
Re: [HACKERS] More benchmarking of wal_buffers
On Thu, 2003-02-13 at 00:16, Christopher Kings-Lynne wrote: Even if you look at the attached charts and you think that 128 buffers are better than 8, think again - there's nothing in it. Next time I run that benchmark it could be the same, lower or higher. And the difference between the worst and best results is less than 3 TPS - ie. nothing. One could conclude that this is a result of the irrelevancy of wal_buffers; another possible conclusion is that the testing tool (pgbench) is not a particularly good database benchmark, as it tends to be very difficult to get reproducible results out of it. Alternatively, it's possible that the limited set of test-cases you've used doesn't happen to include any circumstances in which wal_buffers is useful. We definitely need some better benchmarking tools for PostgreSQL (and no, OSDB does not cut it, IMHO). I've been thinking of taking a look at improving this, but I can't promise I'll get the time or inclination to actually do anything about it :-) Cheers, Neil -- Neil Conway [EMAIL PROTECTED] || PGP Key ID: DB3C29FC ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] location of the configuration files
Peter Bierman wrote: At 12:31 AM -0500 2/13/03, mlw wrote: The idea that a, more or less, arbitrary data location determines the database configuration is wrong. It should be obvious to any administrator that a configuration file location which controls the server is the right way to do it. Isn't the database data itself a rather significant portion of the 'configuration' of the database? What do you gain by having the postmaster config and the database data live in different locations? While I don't like to use another product as an example, I think amongst the number of things Oracle does right is that it has a fairly standard way for an admin to find everything. All one needs to do is find the ORACLE_HOME directory, and everything can be found from there. If, say, PostgreSQL worked like every other system, it would have either an entry in /etc or some other directory specified by configure. Someone please tell me how what I'm proposing differs from things like sendmail, named, or any other standards-based UNIX server? ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/users-lounge/docs/faq.html
Re: [HACKERS] Changing the default configuration
Bruce Momjian said: [...] For example, we can ask them how many rows and tables they will be changing, on average, between VACUUM runs. That will allow us to set the FSM params. We can ask them about using 25% of their RAM for shared buffers. If they have other major apps running on the server or have small tables, we can make no changes. We can basically ask them questions and use that info to set values. Bruce, this is a very good idea and such a tool would simplify setup for the me-too type of DBA - we should definitely try to attract them. However, how could one possibly answer the above question when setting up their database for the first time? What is more, these settings apply per installation, not per database - which means that if you have several small but active databases and one large database, the requirements will be very different. Nobody likes answering such questions when installing new software. You might enjoy it the first few times, but then you learn the 'answers' and don't even think about what the question is. (we all know the answer :) Perhaps indeed a better idea is to have PostgreSQL itself collect usage statistics, and from time to time print 'suggestions' to the log file (best in my opinion), or have these available via some query. These suggestions should best reflect the collected statistics and, of course, require minimal intervention to the database system, such as a restart. Daniel ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] location of the configuration files
Before I get started, I should note that it may be a good compromise to have the data directory be the same as the config file directory, when neither the config file nor the command line specify something different. So the changes I think may make the most sense are: 1. We add a new GUC variable which specifies where the data is. The data is assumed to reside in the same place the config files reside unless the GUC variable is defined (either in postgresql.conf or on the command line, as usual for a GUC variable). Both -D and $PGDATA therefore retain their current semantics unless overridden by the GUC variable, in which case they fall back to the new semantics of specifying only where the config files can be found. 2. We add a configure option that specifies what the hardcoded fallback directory should be when neither -D nor $PGDATA are specified: /etc/postgresql when the option isn't specified to configure. 3. We supply a different default startup script and a different default configuration file (but can make the older versions available in the distribution as well if we wish). The former uses neither $PGDATA nor -D (or uses /etc/postgresql for them), and the latter uses the new GUC variable to specify a data directory location (/var/lib/postgres by default?) This combination should work nicely for transitioning and for package builders. It accomplishes all of the goals mentioned in this thread and will cause minimal pain for developers, since they can use their current methods. Sounds like it'll make Tom happy, at least. :-) Tom Lane wrote: mlw [EMAIL PROTECTED] writes: The idea that a, more or less, arbitrary data location determines the database configuration is wrong. It should be obvious to any administrator that a configuration file location which controls the server is the right way to do it. I guess I'm just dense, but I entirely fail to see why this is the One True Way To Do It. 
But we're not saying it's the One True Way, just saying that it's a way that has very obvious benefits over the way we're using now, if your job is to manage a system that someone else set up. What you seem to be proposing (ignoring syntactic-sugar issues) is that we replace postmaster -D /some/data/dir by postmaster -config /some/config/file. I am not seeing the nature of the improvement. The nature of the improvement is that the configuration of a PostgreSQL install will become obvious to anyone who looks in the obvious places. Remember, the '-D ...' is optional! The PGDATA environment variable can be used instead, and *is* used in what few installations I've seen. That's not something that shows up on the command line when looking at the process list, which forces the administrator to hunt down the data directory through other means. It looks to me like the sysadmin must now grant the Postgres DBA write access on *two* directories, viz /some/config/ and /wherever/the/data/directory/is. How is that better than granting write access on one directory? The difference in where you grant write access isn't a benefit to be gained here. The fact that you no longer have to give root privileges to the DBA so that he can change the data directory as needed is the benefit (well, one of them, at least). A standard packaged install can easily set the /etc/postgresql directory up with write permissions for the postgres user by default, so the sysadmin won't even have to touch it if he doesn't want to. A big production database box is usually managed by one or more system administrators and one or more DBAs. Their roles are largely orthogonal. The sysadmins have the responsibility of keeping the boxes up and making sure they don't fall over or crawl to a standstill. The DBAs have the responsibility of maximizing the performance and availability of the database and *that's all*.
Giving the DBAs root privileges means giving them the power to screw up the system in ways that they can't recover from and might not even know about. The ways you can take down a system by misconfiguring the database are bad enough. No sane sysadmin is going to give the DBA the power to run an arbitrary script as root at a time during the boot cycle that the system is the most difficult to manage unless he thinks the DBA is *really* good at system administration tasks, too. And that's assuming the sysadmin even *has* the authority to grant the DBA that kind of access. Many organizations keep a tight rein on who can do what in an effort to minimize the damage from screwups. The point is that the DBA isn't likely to have root access to the box. When the DBA lacks that ability, the way we currently do things places greater demand on the sysadmin than is necessary, because root access is required to change the startup scripts, as it should be, and the location of the data, as it should *not* be. Given that we can't manage to
Re: [HACKERS] psql and readline
Patch applied. Thanks. --- Ross J. Reedstrom wrote: On Fri, Jan 10, 2003 at 11:02:55PM +0100, Peter Eisentraut wrote: Ross J. Reedstrom writes: I already posted a one-line patch to implement this, but it doesn't seem to have come through to the list. Here it is inline, instead of as an attachment: We need this to work without readline as well. (Of course there won't be any history, but it needs to compile.) blush Even after slogging my way through the nesting #ifdefs for readline and win32, I forgot! Let's make that a three line patch, then.

Index: src/bin/psql/command.c
===================================================================
RCS file: /projects/cvsroot/pgsql-server/src/bin/psql/command.c,v
retrieving revision 1.84
diff -u -r1.84 command.c
--- src/bin/psql/command.c	2002/10/23 19:23:56	1.84
+++ src/bin/psql/command.c	2003/01/10 22:06:07
@@ -1639,6 +1639,9 @@
 			error = true;
 		}
 
+#ifdef USE_READLINE
+		replace_history_entry(where_history(), query_buf->data, NULL);
+#endif
 		fclose(stream);
 	}

Ross ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED])

--
Bruce Momjian | http://candle.pha.pa.us
[EMAIL PROTECTED] | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073

---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
Re: [HACKERS] log_duration
Bruce Momjian [EMAIL PROTECTED] writes: One nice thing is that each element is orthogonal. But, for the functionality desired, we have to merge log_statement and log_duration and have it print for statements taking over X milliseconds. I have no problem adding it, but it has to be clear it isn't orthogonal but is a conditional combination of two other parameters. Actually, I was wondering if we shouldn't *replace* the current log_duration with a combined form (that specifies a minimum interesting duration). I can't quite see the need for orthogonality here. The only reason you'd care about query duration is that you're looking for the slow ones, no? So why bother logging the fast ones? Besides, you can specify min-duration zero if you really want 'em all. regards, tom lane ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
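The combined form Tom describes is essentially a minimum-duration threshold. In postgresql.conf it might look like this (the parameter name and disable convention here are a hypothetical sketch of the proposal, not anything that exists yet):

```
# Log any statement whose execution time is at least this many
# milliseconds. 0 logs every statement (the old log_duration
# behaviour); a negative value disables duration logging entirely.
log_min_duration_statement = 500
```

This keeps one knob instead of two, and the zero setting preserves the current log-everything behaviour for anyone who wants it.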
Re: [HACKERS] log_duration
One option is to have log_query output an identifier with the query, such as a hash of the query or the pointer value for the plan, suppressing duplicates. Then log_duration prints the identifier with the duration. Actually, log_pid is the proper way to do this. You can then add log_connections, and get a full snapshot of what is happening for that session. Personally I would prefer a unique identifier. I guess the best way of illustrating my intuition would be: imagine loading all this data into a relational database; what would you need to fully normalize it? Parsing log files programmatically is much easier if you have unique identifiers instead of having to rely on the relative relationships of entries in the log. Not a big deal though, since I doubt anyone's actually parsing postgres logs. Hm, brings up an interesting idea though, I wonder if it would be useful to log directly into postgres tables. -- greg ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
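Greg's unique identifier could be as simple as a hash of the normalized query text. A sketch in Python (the normalization rules here are assumptions for illustration, not anything in Postgres):

```python
import hashlib

def query_id(sql):
    """Stable short identifier for a statement, for correlating log lines.

    Whitespace runs are collapsed and case is folded, so trivially
    different spellings of the same query hash alike. (Note: this naive
    folding also lowercases string literals, which a real implementation
    would want to avoid.)
    """
    normalized = " ".join(sql.split()).lower()
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()[:12]
```

Every log line emitted for a statement (plan, duration, errors) would carry query_id(stmt), so a log parser can join related entries without relying on their adjacency in the file.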
[HACKERS] PostgreSQL Windows port strategy
Dear Friends, As a minor contributor to pgAdmin, I would like to express some ideas as regards the Windows port. This is a personal point of view; it may or may not reflect the community's ideas, who knows. Don't flame me too much, I am only a casual user of PostgreSQL... ** Microsoft's success is not due to any particular piece of software (Word, Excel, Access, MS SQL Server, Internet Explorer, Visual Basic, etc.) but to the combination of all of them on a proprietary platform (Windows). Whatever free software breaks into this combination of proprietary products will participate, one step after another, in the destruction of the Microsoft monopoly. Therefore, PostgreSQL and OpenOffice for Windows are probably the most valuable efforts to break down the Microsoft monopoly. Windows alone, without Word, Excel, PowerPoint, Visual Basic and MS SQL Server, will not convince a single customer. And this will be the end of Microsoft. Conversely, the rise of Linux is due not only to the availability of a large amount of software, but also to the notion of collaborative work inside a community. Even without cross-porting software from Unixes to Windows, I believe Windows will die of its own accord, because collaborative development in communities is superior. It is only a question of time. The condition for PostgreSQL to defeat Microsoft quickly is that we stay united and wage war on two fronts: Windows on the one hand and Linux/Unixes on the other. Most of you are Americans and, as such, probably studied the American Civil War. One of the reasons the Civil War claimed 500,000 lives is that the armies fought on different fronts, at different times, without synchronization. As regards PostgreSQL, our efforts should concentrate on both platforms at the same time. In other words, this means porting PostgreSQL to Windows and bringing pgAdmin2/3 to Linux and the other free Unixes.
And probably delivering bundles under Linux, free Unixes and Windows, offering the best free software: the PostgreSQL server, the pgAdmin client and the phpPgAdmin web interface. MySQL's success is largely due to its availability under Windows in bundles, which is a pure Microsoft strategy. PostgreSQL is not and will never be playing Microsoft's strategy, because we are a world community. To sum up: 1) Microsoft's success is due to the availability of bundles on a proprietary platform. Replacing Microsoft's leading products with free alternatives participates in the destruction of Microsoft. 2) Linux and the other free Unixes also offer a large amount of software. Our competitive advantage is that we work in communities. Even without cross-porting, free software is going to replace closed software; this is only a question of time. 3) To accelerate the replacement of closed-source Windows solutions, the PostgreSQL community should synchronize the releases of PostgreSQL under Windows, Linux and the Unixes. Not porting PostgreSQL to Windows is playing Microsoft's strategy. 4) Also, we should focus on offering users a bundle including: server (PostgreSQL), client (pgAdmin), web interface (phpPgAdmin) and probably PHP. This does not need to be a single installer, but at least it should exist as links on the web page. Just my 2 cents. Best regards, Jean-Michel POURE ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [HACKERS] PGP signing release
On Tue, 2003-02-11 at 20:17, Bruce Momjian wrote: I hate to poo-poo this, but this web of trust sounds more like a web of confusion. I liked the idea of mentioning the MD5 in the email announcement. It doesn't require much extra work, and doesn't require a 'web of %$* to be set up to check things. Yea, it isn't as secure as going through the motions, but if someone breaks into that FTP server and changes the tarball and MD5 file, we have much bigger problems than someone modifying the tarballs; our CVS is on that machine too. --- Greg Copeland wrote: On Tue, 2003-02-11 at 18:27, Curt Sampson wrote: On Wed, 11 Feb 2003, Greg Copeland wrote: On Wed, 2003-02-05 at 18:53, Curt Sampson wrote: [Re: everybody sharing a single key] This issue doesn't change regardless of the mechanism you pick. Anyone that is signing a key must take reasonable measures to ensure the protection of their key. Right. Which is why you really want to use separate keys: you can determine who compromised a key if it is compromised, and you can revoke one without having to revoke all of them. Which pretty much inevitably leads you to just having the developers use their own personal keys to sign the release. Basically, you are saying: You trust a core developer You trust they can protect their keys You trust they can properly distribute their trust You don't trust a core developer with a key Not at all. I trust core developers with keys, but I see no reason to weaken the entire system by sharing keys when it's not necessary. Having each developer sign the release with his own personal key solves every problem you've brought up. cjs You need to keep in mind, I've not been advocating, rather, clarifying. The point being, having a shared key between trusted core developers is hardly an additional risk. After all, either they can be trusted or they can't. At this point, I think we both understand where the other stands. Either we agree or agree to disagree. 
The next step is for the developers to decide which path they prefer to enforce and to ensure they have the tools and knowledge at hand to support it. Anyone know if Tom and Bruce know each other well enough to sign each other's keys outright, via phone, or via phone and snail-mail? That would put us off to an excellent start. Bruce, since you just got back in town I'm not sure if you've been able to follow the thread or not. Just the same, I wanted to remind you that using MD5 is not a security mechanism of any worth. As such, this thread was an effort to add a layer of authenticity. Again, this is not something that MD5 is going to provide for, now or in the future. If it sounds confusing, it's only because you've never done it. Honestly, once you take the 20 minutes to do it the first time, you'll understand what's going on. Beyond that, you won't have to sign additional keys until you can validate them or as they expire. It only takes minutes once you understand what's going on after that. The time to actually sign packages is more or less the same as creating your hashes. Lastly, don't forget that your site is mirrored all over the place. As such, you're not the only place open to attack. Just because you have additional software running on this box is no reason to throw your hands in the air and say, I don't care. The simple fact is, it only takes one site becoming compromised to significantly affect PostgreSQL's reputation. And that site doesn't have to be yours. If it's an official mirror, it reflects (oh... a pun!) accordingly on the project. Regards, -- Greg Copeland [EMAIL PROTECTED] Copeland Computer Consulting ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
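Greg's distinction between integrity and authenticity can be made concrete with a small sketch (Python purely for illustration; the tarball bytes are made up). A checksum detects corruption, but an attacker who can replace the tarball on a mirror can just as easily replace the published checksum:

```python
import hashlib

def md5_hex(data: bytes) -> str:
    """A checksum proves the bytes are intact, not who produced them."""
    return hashlib.md5(data).hexdigest()

tarball = b"postgresql-7.3.2 release contents"
print(md5_hex(tarball))

# Tampering changes the digest, so MD5 catches accidental corruption...
tampered = b"postgresql-7.3.2 release contents (trojaned)"
assert md5_hex(tampered) != md5_hex(tarball)
# ...but nothing stops an attacker from publishing md5_hex(tampered)
# next to the tampered tarball.  Only a signature made with a private
# key the attacker does not hold (e.g. a GPG-signed release) binds the
# file to an identity.
```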
Re: [JDBC] [HACKERS] Wrong charset mappings
I don't see any JDBC-specific requirements here, other than the fact that JDBC assumes that the following conversions are done correctly: dbcharset <-> utf8 <-> java/utf16, where the dbcharset to/from utf8 conversion is done by the backend and the utf8 to/from java/utf16 conversion is done in the JDBC driver. Prior to 7.3 the JDBC driver did the entire conversion itself. However, versions of the JDK prior to 1.4 do a terrible job when it comes to the performance of the conversion. So for a significant speedup in 7.3 we moved most of the work to the backend. thanks, --Barry Thomas O'Dowd wrote: Hi Ishii-san, Thanks for the reply. Why was the particular change made between 7.2 and 7.3? It seems to have moved away from the standard. I found the following file... src/backend/utils/mb/Unicode/UCS_to_EUC_JP.pl which generates the mappings. I found it references 3 files from the Unicode organisation, namely: http://www.unicode.org/Public/MAPPINGS/OBSOLETE/EASTASIA/JIS/JIS0201.TXT http://www.unicode.org/Public/MAPPINGS/OBSOLETE/EASTASIA/JIS/JIS0208.TXT http://www.unicode.org/Public/MAPPINGS/OBSOLETE/EASTASIA/JIS/JIS0212.TXT JIS0208.TXT has the line... 0x8160 0x2141 0x301C # WAVE DASH (1st col is sjis, 2nd is EUC - 0x8080, 3rd is utf16). Incidentally, those mapping files are marked obsolete, but I guess the old mappings still hold. I guess if I run the perl script it will generate a mapping file different to what postgresql is currently using. It might be interesting to pull out the diffs and see what's right/wrong. I guess it's not run anymore? I can't see how the change will affect the JDBC driver. It should only improve the situation. Right now it's not possible to go from sjis -> database (utf8) -> java (jdbc/utf16) -> sjis for the WAVE DASH character, because the mapping is wrong in postgresql.
Changing the mapping is, I think, the correct thing to do, from what I can see all around me in different tools like iconv, Java 1.4.1, a utf-8 terminal, and any Unicode reference on the web. What do you think? Tom. On Wed, 2003-02-12 at 22:30, Tatsuo Ishii wrote: I think the problem you see is due to the mapping table changes between 7.2 and 7.3. It seems there are more changes other than u301c. Moreover, according to the recent discussion in the Japanese local mailing list, 7.3's JDBC driver now relies on the encoding conversion performed by the backend, i.e. the driver issues set client_encoding = 'UNICODE'. This problem is very complex and I need time to find a good solution. I don't think simply backing out the changes to the mapping table solves the problem. Hi all, One Japanese character has been causing my head to swim lately. I've finally tracked down the problem to both Java 1.3 and Postgresql. The problem character is: utf-16: 0x301C, utf-8: 0xE3809C, SJIS: 0x8160, EUC_JP: 0xA1C1, otherwise known as the WAVE DASH character. The confusion stems from a very similar character, 0xFF5E (utf-16) or 0xEFBD9E (utf-8), the FULLWIDTH TILDE. Java has just lately (1.4.1) finally fixed their mappings so that 0x301C maps correctly to both the correct SJIS and EUC-JP characters. Previously (at least in 1.3.1) they mapped SJIS to 0xFF5E and EUC to 0x301C, causing all sorts of trouble. Postgresql at least picked one of the two characters, namely 0xFF5E, so conversions in and out of the database to/from sjis/euc seemed to be working. The problem is when you try to view utf-8 from the database, or if you read the data into java (utf-16) and try converting to euc or sjis from there. Anyway, I think postgresql needs to be fixed for this character. In my opinion what needs to be done is to change the mappings...

euc-jp -> utf-8 -> euc-jp:  0xA1C1 -> 0xE3809C -> 0xA1C1
sjis   -> utf-8 -> sjis:    0x8160 -> 0xE3809C -> 0x8160

As to what to do with the current mapping of 0xEFBD9E (utf-8)?
It probably should be removed. Maybe you could keep the mapping back to the sjis/euc characters to help backward compatibility, though. I'm not sure what the correct approach is there. If anyone can tell me how to edit the mappings under src/backend/utils/mb/Unicode/ and rebuild postgres to use them, then I can test this out locally. Just edit src/backend/utils/mb/Unicode/*.map and rebuild PostgreSQL. You probably want to modify utf8_to_euc_jp.map and euc_jp_to_utf8.map. -- Tatsuo Ishii ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
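The WAVE DASH / FULLWIDTH TILDE split described in this thread can be reproduced with Python's built-in codecs, independently of PostgreSQL's *.map files (an illustration of the mapping disagreement, not part of the proposed fix):

```python
# WAVE DASH (U+301C) vs FULLWIDTH TILDE (U+FF5E), as seen by Python's codecs.
# This mirrors the mapping question above; it does not touch PostgreSQL itself.

WAVE_DASH = "\u301c"
FULLWIDTH_TILDE = "\uff5e"

# The JIS-based codecs agree with the unicode.org tables quoted above:
assert WAVE_DASH.encode("shift_jis") == b"\x81\x60"
assert WAVE_DASH.encode("euc_jp") == b"\xa1\xc1"
assert b"\x81\x60".decode("shift_jis") == WAVE_DASH
assert b"\xa1\xc1".decode("euc_jp") == WAVE_DASH

# Microsoft's cp932 variant is where 0x8160 becomes FULLWIDTH TILDE instead,
# the same confusion old JDKs (and PostgreSQL 7.3's map files) ran into:
assert b"\x81\x60".decode("cp932") == FULLWIDTH_TILDE

print("wave dash demo ok")
```

A round trip sjis -> utf8 -> sjis only works if both conversion steps agree on which Unicode character 0x8160 is, which is exactly the failure Thomas describes.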
Re: [HACKERS] Incremental backup
Someone at Red Hat is working on point-in-time recovery, also known as incremental backups. It will be in 7.4. --- Martin Marques wrote: How's this issue going on the 7.4 development tree? I saw it on the TODO list, but didn't find much in the archives of this mailing list. -- Why use just any relational database, when you can use PostgreSQL? - Martín Marqués |[EMAIL PROTECTED] Programador, Administrador, DBA | Centro de Telemática Universidad Nacional del Litoral -- Bruce Momjian| http://candle.pha.pa.us [EMAIL PROTECTED] | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup.| Newtown Square, Pennsylvania 19073 ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [HACKERS] UNIQUE not unique with inheritance (workaround?)
Or create a sequence, and make each table in the inheritance hierarchy use the same sequence as its default value. --- Oliver Elphick wrote: On Sat, 2003-02-08 at 19:34, Samuel Sieb wrote: Is there any workaround for this problem? I'm getting involved in a project where inheritance is an incredibly useful feature, but the non-unique issue could be a serious stumbling block. Is there any way to work around it with a trigger or something? Give each table in the hierarchy a foreign key reference to another table which holds a unique list of the primary keys and a column that says which table they are in. Use triggers to update this other table and to prevent duplications in the hierarchy. -- Oliver Elphick[EMAIL PROTECTED] Isle of Wight, UK http://www.lfix.co.uk/oliver GPG: 1024D/3E1D0C1C: CA12 09E0 E8D5 8870 5839 932A 614D 4C34 3E1D 0C1C But the LORD is in his holy temple; let all the earth keep silence before him. Habakkuk 2:20 ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster -- Bruce Momjian| http://candle.pha.pa.us [EMAIL PROTECTED] | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup.| Newtown Square, Pennsylvania 19073 ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/users-lounge/docs/faq.html
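Bruce's shared-sequence suggestion might look like the following sketch (the table and sequence names are invented for illustration):

```sql
-- Hypothetical schema: one sequence shared by every table in the hierarchy,
-- so ids stay unique across parent and children even though UNIQUE
-- constraints are not inherited.
CREATE SEQUENCE person_id_seq;

CREATE TABLE person (
    id   integer DEFAULT nextval('person_id_seq') PRIMARY KEY,
    name text
);

CREATE TABLE employee (
    salary numeric
) INHERITS (person);

-- Column DEFAULTs are inherited, so inserts into either table draw from the
-- same sequence.  Note this only guarantees uniqueness for values the
-- sequence hands out; explicitly supplied ids can still collide, which is
-- where Oliver's trigger-maintained key table is the stricter workaround.
```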
Re: [HACKERS] log_duration
Greg Stark wrote: Tom Lane [EMAIL PROTECTED] writes: Christopher Kings-Lynne [EMAIL PROTECTED] writes: Looking at the log_duration postgresql.conf option. How about adding an option log_duration_min which is a value in milliseconds that is the minimum time a query must run for before being logged. Fine with me --- but you'll need to add more logic than that. Right now, log_duration *only* causes the query duration to be printed out; if you ain't got log_statement on, you're in the dark as to what the query itself was. You'll need to add some code to print the query (the log_min_error_statement logic might be a useful source of inspiration). Not sure how this should interact with the case where log_duration is set and the min-duration isn't. But maybe that case is silly, and we should just redefine log_duration as a minimum runtime that causes the query *and* its runtime to be printed to the log. Tom is right here. log_duration _just_ prints the duration, so we would need to basically create a merged param that does log_duration and log_statement and have it activate only if the statement takes more than X milliseconds, something like log_long_statement. Here are the log_* params we have:

log_connections = false
log_hostname = false
log_source_port = false
log_pid = false
log_statement = false
log_duration = false
log_timestamp = false

Basically, log_pid pulls them all together. Without that, you don't have any way to pull together individual lines in the log, and with pid wraparound, you can't even do that 100%. I wonder if we should put a number before the pid and increment it on every pid wraparound. One nice thing is that each element is orthogonal. But, for the functionality desired, we have to merge log_statement and log_duration and have it print for statements taking over X milliseconds. I have no problem adding it, but it has to be clear it isn't orthogonal but is a conditional combination of two other parameters.
Is it even guaranteed to be properly ordered on a busy server with multiple processors anyways? One option is to have log_query output an identifier with the query such as a hash of the query or the pointer value for the plan, suppressing duplicates. Then log_duration prints the identifier with the duration. This means on a busy server running lots of prepared queries you would see a whole bunch of queries on startup, then hopefully no durations. Any durations printed could cause alarms to go off. To find the query you grep the logs for the identifier in the duration message. Actually, log_pid is the proper way to do this. You can then add log connections, and get a full snapshot of what is happening for that session. This only really works if you're using prepared queries everywhere. But in the long run that will be the case for OLTP systems, which is where log_duration is really useful. -- Bruce Momjian| http://candle.pha.pa.us [EMAIL PROTECTED] | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup.| Newtown Square, Pennsylvania 19073 ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [HACKERS] win32 port
It is going to be non-threaded in 7.4. SRA may contribute their threaded version to a future release of PostgreSQL, but I don't think it will be 7.4. We are using PeerDirect's Win32 port, with a few improvements from SRA's port (minus their thread changes). I am going to work on it in March. --- Merlin Moncure wrote: Has a final decision been made if the win32 port is going to be threaded or not? Merlin -- Bruce Momjian| http://candle.pha.pa.us [EMAIL PROTECTED] | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup.| Newtown Square, Pennsylvania 19073 ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/users-lounge/docs/faq.html
Re: [HACKERS] Incremental backup
Someone at Red Hat is working on point-in-time recovery, also known as incremental backups. It will be in 7.4. Does that mean that the poor guy/gal is implementing redo for all the index types? Chris ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] Changing the default configuration (was Re:
--On Thursday, February 13, 2003 09:47:28 +0800 Christopher Kings-Lynne [EMAIL PROTECTED] wrote: Seriously, I know Linux can change these on the fly, and I'm pretty sure Solaris can too. I haven't played with BSD for a while so can't speak about that. Anyone else know? You cannot change SHMMAX on the fly on FreeBSD. Yes you can, on recent 4-STABLE:

Password:
lerlaptop# sysctl kern.ipc.shmmax=6600
kern.ipc.shmmax: 33554432 -> 6600
lerlaptop# uname -a
FreeBSD lerlaptop.lerctr.org 4.7-STABLE FreeBSD 4.7-STABLE #38: Mon Feb 3 21:51:25 CST 2003 [EMAIL PROTECTED]:/usr/obj/usr/src/sys/LERLAPTOP i386
lerlaptop#

Chris ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster -- Larry Rosenman http://www.lerctr.org/~ler Phone: +1 972-414-9812 E-Mail: [EMAIL PROTECTED] US Mail: 1905 Steamboat Springs Drive, Garland, TX 75044-6749 ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] log_duration
Tom is right here. log_duration _just_ prints the duration, so we would need to basically create a merged param that does log_duration and log_statement and have it activate only if the statement takes more than X milliseconds, something like log_long_statement. Here are the log_* params we have:

log_connections = false
log_hostname = false
log_source_port = false
log_pid = false
log_statement = false
log_duration = false
log_timestamp = false

OK, while I'm doing all this benchmarking and stuff - is there any sort of option where I can see it logged when a sort doesn't have enough sort memory and hence hits the disk? e.g. an elog(LOG) is emitted? Chris ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] Incremental backup
Christopher Kings-Lynne wrote: Someone at Red Hat is working on point-in-time recovery, also known as incremental backups. It will be in 7.4. Does that mean that the poor guy/gal is implementing redo for all the index types? No idea. -- Bruce Momjian| http://candle.pha.pa.us [EMAIL PROTECTED] | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup.| Newtown Square, Pennsylvania 19073 ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] log_duration
Christopher Kings-Lynne wrote: Tom is right here. log_duration _just_ prints the duration, so we would need to basically create a merged param that does log_duration and log_statement and have it activate only if the statement takes more than X milliseconds, something like log_long_statement. Here are the log_* params we have:

log_connections = false
log_hostname = false
log_source_port = false
log_pid = false
log_statement = false
log_duration = false
log_timestamp = false

OK, while I'm doing all this benchmarking and stuff - is there any sort of option where I can see it logged when a sort doesn't have enough sort memory and hence hits the disk? e.g. an elog(LOG) is emitted? Someone asked about this at FOSDEM. The only way I know to do it is look in the pgsql_temp directory, but they disappear pretty quickly. Folks, do we need something to report sort file usage? -- Bruce Momjian| http://candle.pha.pa.us [EMAIL PROTECTED] | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup.| Newtown Square, Pennsylvania 19073 ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] log_duration
Someone asked about this at FOSDEM. The only way I know to do it is look in the pgsql_temp directory, but they disappear pretty quickly. Folks, do we need something to report sort file usage? How about a new GUC variable: log_sort_tempfiles And in the code that creates the temp file, if the GUC variable is true, then do: elog(LOG, Sort needed temp file. Sort required 2456K. Try increasing sort_mem.); Or something? Chris ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
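Chris's proposal amounts to a conditional check at temp-file creation time. Roughly, in a sketch (the GUC name log_sort_tempfiles, the threshold logic, and the message wording are all hypothetical, and written in Python rather than the backend's C):

```python
from typing import Optional

# Sketch of the proposed GUC-gated warning.  In the real backend this would
# be an elog(LOG, ...) call in the code path that spills a sort to disk.
def maybe_warn_sort_spill(log_sort_tempfiles: bool,
                          sort_bytes: int,
                          sort_mem_kb: int) -> Optional[str]:
    """Return a log line when a sort overflows sort_mem and hits disk."""
    if log_sort_tempfiles and sort_bytes > sort_mem_kb * 1024:
        return (f"LOG: sort spilled to temp file: needed {sort_bytes // 1024}K, "
                f"sort_mem is {sort_mem_kb}K; consider increasing sort_mem")
    return None

print(maybe_warn_sort_spill(True, 2456 * 1024, 1024))
```

As Bruce notes in the follow-up, the harder question is the criterion, since a spill that stays in the kernel disk cache is not necessarily a problem.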
Re: [HACKERS] Changing the default configuration (was Re:
Christopher Kings-Lynne wrote: Seriously, I know Linux can change these on the fly, and I'm pretty sure Solaris can too. I haven't played with BSD for a while so can't speak about that. Anyone else know? You cannot change SHMMAX on the fly on FreeBSD. And part of the reason is because some/most BSDs map the page tables into physical RAM (kernel space) rather than use some shared page table mechanism. This is good because it prevents the shared memory from being swapped out (a performance disaster). It doesn't actually allocate RAM unless someone needs it, but it does lock the shared memory into a specific fixed location for all processes. The more flexible approach is to make shared memory act just like the memory of a user process, and have other user processes share those page tables, but that adds extra overhead and can cause the memory to behave just like user memory (swappable). -- Bruce Momjian| http://candle.pha.pa.us [EMAIL PROTECTED] | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup.| Newtown Square, Pennsylvania 19073 ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/users-lounge/docs/faq.html
Re: [HACKERS] location of the configuration files
Christopher Kings-Lynne [EMAIL PROTECTED] wrote: Okay, here's one: most Unix systems store all of the configuration files in a well known directory: /etc. These days it's a hierarchy of directories with /etc as the root of the hierarchy. When an administrator is looking for configuration files, the first place he's going to look is in /etc and its subdirectories. No goddammit - /usr/local/etc. Why can't the Linux community respect history? It is the ONE TRUE PLACE dammit!!! Well, to the extent that you're serious, you understand that a lot of people feel that /usr/local should be reserved for stuff that's installed by the local sysadmin, and your vendor/distro isn't supposed to be messing with it. Which means if the vendor installed PostgreSQL (say, the Red Hat Database) you'd expect config files to be in /etc. If PostgreSQL is compiled from source by the local admin, you might look somewhere in /usr/local. I've got the vague feeling that this is all more than a little silly... directory locations floating about depending on who did what, as though it were such a radical thing to do a ./configure, make, make install. But this is a pretty common feeling in the Unix world (more widespread than just the Linux world). ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/users-lounge/docs/faq.html
Re: [HACKERS] Changing the default configuration (was Re: [pgsql-advocacy]
Peter Eisentraut [EMAIL PROTECTED] writes: I know you like it in that range, and 4 or 8 MB of buffers by default should not be a problem. But personally I think if the optimal buffer size does not depend on both the physical RAM you want to dedicate to PostgreSQL and the nature and size of the database, then we have achieved a medium revolution in computer science. ;-) But this is not about optimal settings. This is about pretty good settings. As long as we can get past the knee of the performance curve, I think we've done what should be expected of a default parameter set. I believe that 1000 buffers is enough to get past the knee in most scenarios. Again, I haven't got hard evidence, but that's my best guess. regards, tom lane ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
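For scale, the buffer counts under discussion translate to modest amounts of memory, assuming the standard 8 kB block size (the 64-buffer figure is the shipped default of the era; 1000 is Tom's proposed "past the knee" value):

```python
# Rough memory math for the shared_buffers values in this thread.
# 8192 bytes is PostgreSQL's standard block size.
BLOCK_SIZE = 8192

for shared_buffers in (64, 1000, 4096):
    mib = shared_buffers * BLOCK_SIZE / 2**20
    print(f"shared_buffers = {shared_buffers:5d}  ->  {mib:.1f} MiB")
```

So the proposed default sits just under 8 MB, in line with Peter's "4 or 8 MB of buffers by default should not be a problem."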
Re: [HACKERS] log_duration
Well, part of the issue here is that it isn't always bad to use sort files; certainly it is better to use them than to swap. We have a checkpoint_warning in 7.4 that will warn about excessive checkpointing. What would our criteria be for warning about sort_mem? Seems we would have to know how much free memory there is available, and in fact, if there is lots of free memory, the sort files will just sit in the kernel disk cache anyway. I am not saying this is a bad idea --- we just need to define it more clearly. --- Christopher Kings-Lynne wrote: Someone asked about this at FOSDEM. The only way I know to do it is look in the pgsql_temp directory, but they disappear pretty quickly. Folks, do we need something to report sort file usage? How about a new GUC variable: log_sort_tempfiles And in the code that creates the temp file, if the GUC variable is true, then do: elog(LOG, Sort needed temp file. Sort required 2456K. Try increasing sort_mem.); Or something? Chris -- Bruce Momjian| http://candle.pha.pa.us [EMAIL PROTECTED] | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup.| Newtown Square, Pennsylvania 19073 ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [HACKERS] Brain dump: btree collapsing
Bruce Momjian [EMAIL PROTECTED] writes: It would be nice if VACUUM FULL would be able to compress the actual index file and return unused space to the operating system. REINDEX does this, but I was thinking of something a little lighter that could be done automatically as part of VACUUM FULL. But indexes tend to be very dependent on physical layout. You can't just shove stuff around without thinking about the consequences. Tables (heaps) are *much* more forgiving about that. My feeling is that what we need to fix now is index bloat during normal operation. If you want the indexes to actually *shrink*, that's a job for REINDEX. Perhaps someday we can improve on that --- but let's not blur our focus on the immediate problem. regards, tom lane ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
Re: [HACKERS] Changing the default configuration (was Re:
You cannot change SHMMAX on the fly on FreeBSD. I think we suffered some topic drift here --- wasn't the last question about whether SEMMAX can be increased on-the-fly? That wouldn't have anything to do with memory-mapping strategies... regards, tom lane ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [HACKERS] Brain dump: btree collapsing
Tom Lane wrote: Bruce Momjian [EMAIL PROTECTED] writes: It would be nice if VACUUM FULL would be able to compress the actual index file and return unused space to the operating system. REINDEX does this, but I was thinking of something a little lighter that could be done automatically as part of VACUUM FULL. But indexes tend to be very dependent on physical layout. You can't just shove stuff around without thinking about the consequences. Tables (heaps) are *much* more forgiving about that. My feeling is that what we need to fix now is index bloat during normal operation. If you want the indexes to actually *shrink*, that's a job for REINDEX. Perhaps someday we can improve on that --- but let's not blur our focus on the immediate problem. My point is only that while we need VACUUM and VACUUM FULL to match all heap needs, we need a VACUUM FULL capability for indexes too. REINDEX may be that capability, but it would be nice if we could compress out some index space during VACUUM FULL. -- Bruce Momjian| http://candle.pha.pa.us [EMAIL PROTECTED] | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup.| Newtown Square, Pennsylvania 19073 ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
Re: [HACKERS] Brain dump: btree collapsing
I think having VACUUM record free index pages just like free heap pages makes perfect sense, and is consistent. This brings up one item it would be nice to address at the same time. It would be nice if VACUUM FULL would be able to compress the actual index file and return unused space to the operating system. REINDEX does this, but I was thinking of something a little lighter that could be done automatically as part of VACUUM FULL. If we can do that, it would make consistent behavior for vacuum and heap/index files. --- Tom Lane wrote: Justin Clift [EMAIL PROTECTED] writes: Tom Lane wrote: The deletion procedure could be triggered immediately upon removal of the last item in a page, or when the next VACUUM scan finds an empty page. Not sure yet which way is better. Having it triggered immediately upon removal of the last item in a page would make for a more self maintaining system wouldn't it? That sounds nice. :) Maybe. This isn't about getting rid of VACUUM --- there's still a need for routine maintenance vacuums. So the question really comes down to whether it's more efficient to do it in bulk during routine maintenance sweeps, or retail. I'm not sold yet, but am leaning to the bulk side. In theory, if we find recyclable page(s) at the physical end of the index, we could truncate the file (ie, give the space back to the filesystem) instead of reporting these pages to FSM. I am not sure if this is worth doing --- in most cases it's likely that little space can be released this way, and there may be some tricky locking issues. Sounds like this would be beneficial for environments with high update/delete transaction volumes, perhaps on smaller amounts of live/valid data. It would only really be worth doing if you made a substantial reduction in the number of rows in a table, and had no near-term intention of loading the table back up again. Seems like it might be sufficient to tell people to REINDEX if they want the index size to drop in that scenario. 
I will look at physically truncating the index during VACUUM, but I don't think it's worth getting real tense about... regards, tom lane ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster -- Bruce Momjian| http://candle.pha.pa.us [EMAIL PROTECTED] | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup.| Newtown Square, Pennsylvania 19073 ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] psql source suggestions
Sure. Shoot them over. --- Jeroen T. Vermeulen wrote: I've been playing around with the source for psql, and as a result I did some cleaning up in common.c. Would anyone be interested in seeing patches for that? The main benefit is some eliminated code duplication, plus the removal of some warts like 'continue' and 'break' where they weren't needed. Jeroen ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org -- Bruce Momjian| http://candle.pha.pa.us [EMAIL PROTECTED] | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup.| Newtown Square, Pennsylvania 19073 ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] log_duration
Tom Lane wrote: Bruce Momjian [EMAIL PROTECTED] writes: One nice thing is that each element is orthoginal. But, for the functionality desired, we have to merge log_statement and log_duration and have it print for statements taking over X milliseconds. I have no problem adding it, but it has to be clear it isn't orthoginal but is a conditional combination of two other parameters. Actually, I was wondering if we shouldn't *replace* the current log_duration with a combined form (that specifies a minimum interesting duration). I can't quite see the need for orthogonality here. The only reason you'd care about query duration is that you're looking for the slow ones, no? So why bother logging the fast ones? Besides, you can specify min-duration zero if you really want 'em all. We did talk about this a while ago, and folks wanted the query printed _before_ it was executed, so they could see the query in the logs at the time it was issued, both for monitoring and for showing the time the query started when log_timestamp is enabled. Seems the clearest option would be for log_duration to print the query string too, and convert it to an integer field. I can see zero meaning print all queries and durations. What value do we use to turn it off? -1? This would give us log_statement that prints at query start, and log_duration that prints query and duration at query end. How is that? Maybe we should rename them as log_statement_start and log_statement_duration. -- Bruce Momjian| http://candle.pha.pa.us [EMAIL PROTECTED] | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup.| Newtown Square, Pennsylvania 19073 ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
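The merged parameter Bruce describes would behave roughly like this sketch (Python for illustration; the function name is invented, but the -1 / 0 / N semantics are the ones proposed above):

```python
from typing import Optional

# Sketch of the proposed combined behavior: one integer setting where
#   -1     disables duration logging entirely,
#    0     logs every statement with its duration,
#    N > 0 logs only statements that ran at least N milliseconds.
def log_line(log_min_duration_ms: int, query: str,
             duration_ms: int) -> Optional[str]:
    if log_min_duration_ms < 0:
        return None
    if duration_ms >= log_min_duration_ms:
        return f"LOG: duration: {duration_ms} ms statement: {query}"
    return None

print(log_line(500, "SELECT * FROM big_table", 1342))
```

This prints the query together with its duration at statement end, which is why a separate start-time log (the proposed log_statement_start) would still be wanted for monitoring long-running queries while they execute.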
Re: [HACKERS] location of the configuration files
Tom Lane wrote: Kevin Brown [EMAIL PROTECTED] writes: I assume $PGDATA was around long before GUC? Yes, it was. But I have not yet seen an argument here that justifies why $SOMECONFIGDIRECTORY/postgresql.conf is better than $PGDATA/postgresql.conf. The latter keeps all the related files together. The former seems only to introduce unnecessary complexity. You can only justify it as simpler if you propose hardwiring a value for $SOMECONFIGDIRECTORY ... which is a proposal that will not fly with any of the core developers, because we all run multiple versions of Postgres on our machines so that we can deal with back-version bug reports, test installations, etc. It is unlikely to fly with any of the RPM packagers either, due to the wildly varying ideas out there about the One True Place where applications should put their config files. (This point was pretty much why mlw's previous proposal was rejected, IIRC.) I wasn't talking about a "default directory" I was talking about configuring a database in a configuration file. While I accept that the PostgreSQL group can not be playing catch-up with other databases, this does not preclude the notion accepting common practices and adopting them. Understand, I really like PostgreSQL. I like it better than Oracle, and it is my DB of choice. That being said, I see what other DBs do right. Putting the configuration in the data directory is "wrong," no other database or service under UNIX or Windows does this, Period. Does the PostgreSQL team know better than the rest of the world? The idea that a, more or less, arbitrary data location determines the database configuration is wrong. It should be obvious to any administrator that a configuration file location which controls the server is the "right" way to do it. 
Regardless of where you choose to put the default configuration file, it is EASIER to configure a database by using a file in a standard configuration directory (/etc, /usr/etc, /usr/local/etc, /usr/local/pgsql/conf, or whatever). The data directory should not contain configuration data, as its location is typically dependent on where the admin chooses to mount storage. I am astounded that this point of view is missed by the core group. Mark.
Re: [HACKERS] More benchmarking of wal_buffers
Christopher Kings-Lynne wrote: I'm not sure what I could test next. Does FreeBSD support anything other than fsync? eg. fdatasync, etc. I can't see it in the man pages...

You are already getting the best default for your OS. It says 'fsync' as the default, but the comment notes that the default is OS-specific. The only thing you could compare there is open_fdatasync vs. fdatasync. -- Bruce Momjian | http://candle.pha.pa.us [EMAIL PROTECTED] | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania 19073
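For readers unfamiliar with the setting under discussion, here is a postgresql.conf fragment sketching the wal_sync_method choices. Which values are accepted depends on what the platform supports; this is illustrative, not a recommendation:

```conf
# postgresql.conf -- WAL knobs discussed in this thread.
# Only the methods your OS supports will be accepted; the compiled-in
# default is platform-specific even though the sample file shows 'fsync'.
#wal_sync_method = fsync          # fsync() the WAL file at commit
#wal_sync_method = fdatasync      # fdatasync(), where the OS provides it
#wal_sync_method = open_sync      # open the WAL file with O_SYNC
#wal_sync_method = open_datasync  # open the WAL file with O_DSYNC
wal_buffers = 8                   # the parameter being benchmarked above
```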
Re: [HACKERS] location of the configuration files
mlw [EMAIL PROTECTED] writes: The idea that a, more or less, arbitrary data location determines the database configuration is wrong. It should be obvious to any administrator that a configuration file location which controls the server is the right way to do it.

I guess I'm just dense, but I entirely fail to see why this is the One True Way To Do It. What you seem to be proposing (ignoring syntactic-sugar issues) is that we replace "postmaster -D /some/data/dir" by "postmaster -config /some/config/file". I am not seeing the nature of the improvement. It looks to me like the sysadmin must now grant the Postgres DBA write access on *two* directories, viz /some/config/ and /wherever/the/data/directory/is. How is that better than granting write access on one directory? Given that we can't manage to standardize the data directory location across multiple Unixen, how is it that we will be more successful at standardizing a config file location? All I see here is an arbitrary break with our past practice. I do not see any net improvement. regards, tom lane
Re: [HACKERS] location of the configuration files
Tom Lane wrote: mlw [EMAIL PROTECTED] writes: The idea that a, more or less, arbitrary data location determines the database configuration is wrong. It should be obvious to any administrator that a configuration file location which controls the server is the "right" way to do it.

I guess I'm just dense, but I entirely fail to see why this is the One True Way To Do It. What you seem to be proposing (ignoring syntactic-sugar issues) is that we replace "postmaster -D /some/data/dir" by "postmaster -config /some/config/file". I am not seeing the nature of the improvement. It looks to me like the sysadmin must now grant the Postgres DBA write access on *two* directories, viz /some/config/ and /wherever/the/data/directory/is. How is that better than granting write access on one directory? Given that we can't manage to standardize the data directory location across multiple Unixen, how is it that we will be more successful at standardizing a config file location? All I see here is an arbitrary break with our past practice. I do not see any net improvement.

There is a pretty well understood convention that a configuration file will be located in some standard location depending on your distro. Would you disagree with that? There is also a convention that most servers are configured by a configuration file located in a central location. Look at sendmail, named, et al. Here is the test: configure a server with sendmail, named, apache, and PostgreSQL. Tell me which of these systems doesn't configure the standard way.
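mlw's "test" can be made concrete. Typical central config locations for the daemons he names are listed below; exact paths vary by distro and build, so these are common examples only:

```text
/etc/mail/sendmail.cf        -- sendmail
/etc/named.conf              -- BIND (named)
/etc/httpd/conf/httpd.conf   -- Apache (often /usr/local/apache/conf)
$PGDATA/postgresql.conf      -- PostgreSQL: inside the data directory
```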
Re: [HACKERS] Changing the default configuration (was Re: [pgsql-advocacy]
Peter Eisentraut wrote: Tom Lane writes: Well, as I commented later in that mail, I feel that 1000 buffers is a reasonable choice --- but I have to admit that I have no hard data to back up that feeling.

I know you like it in that range, and 4 or 8 MB of buffers by default should not be a problem. But personally I think if the optimal buffer size does not depend on both the physical RAM you want to dedicate to PostgreSQL and the nature and size of the database, then we have achieved a medium revolution in computer science. ;-)

I have thought about this and I have an idea. Basically, increasing the default values may get us closer, but it will discourage some from tweaking, and it will cause problems with some OSes that have small SysV parameters. So, my idea is to add a message at the end of initdb stating that people should run the pgtune script before running a production server. The pgtune script would basically allow us to query the user, test the OS version and perhaps its parameters, and modify postgresql.conf with reasonable values. I think this is the only way to cleanly get folks close to where they should be. For example, we can ask them how many rows and tables they will be changing, on average, between VACUUM runs. That will allow us to set the FSM parameters. We can ask them about using 25% of their RAM for shared buffers. If they have other major apps running on the server, or have small tables, we can make no changes. We can basically ask them questions and use that info to set values. We can even ask about sort usage and set sort memory. We can even control checkpoint_segments this way, if they say they will have high database write activity and don't worry about disk space usage. We may even be able to compute a random page cost estimate. A script seems to be the best way to test values and assist folks in making reasonable decisions about each parameter.
Of course, they can still edit the file, and we can ask them if they want assistance to set each parameter or leave it alone. I would restrict the script to only deal with tuning values, and tell people they still need to review that file for other useful parameters. Another option would be to make a big checklist or web page that asks such questions and computes proper values, but a script seems easiest. We could even support '?', which would explain why the question is being asked and how it affects the value. -- Bruce Momjian | http://candle.pha.pa.us [EMAIL PROTECTED] | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania 19073
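The pgtune idea above can be sketched in a few lines. This is a minimal, hypothetical version of the calculation only, using the heuristics mentioned in the message (roughly 25% of RAM for shared buffers, FSM sized from rows changed between VACUUM runs, more checkpoint segments for write-heavy loads); the function name, signature, and exact formulas are illustrative assumptions, not the actual script:

```python
def suggest_settings(ram_mb, rows_changed_per_vacuum, heavy_writes=False):
    """Return a dict of illustrative postgresql.conf tuning values."""
    page_size_kb = 8  # PostgreSQL's default page size
    # ~25% of physical RAM, expressed in 8 KB buffer pages
    shared_buffers = (ram_mb * 1024 // 4) // page_size_kb
    settings = {
        "shared_buffers": shared_buffers,
        # rough rule: one FSM page slot per row changed between VACUUMs,
        # never below a conservative floor
        "max_fsm_pages": max(10000, rows_changed_per_vacuum),
    }
    if heavy_writes:
        # allow more WAL between checkpoints for write-heavy workloads,
        # trading disk space for fewer checkpoint stalls
        settings["checkpoint_segments"] = 16
    return settings

if __name__ == "__main__":
    # e.g. a 512 MB box changing ~50k rows between VACUUM runs
    print(suggest_settings(512, 50000, heavy_writes=True))
```

An interactive pgtune would wrap questions (and the '?' help Bruce describes) around a calculation like this before writing the answers into postgresql.conf.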