Re: [HACKERS] Idea about better configuration options for sort memory

2004-05-30 Thread scott.marlowe
On 12 Feb 2004, Greg Stark wrote:

 Tom Lane [EMAIL PROTECTED] writes:
 
   Hmmm ... maybe query_work_mem and maintenance_work_mem, or something 
   similar?
  
  I'll go with these unless someone has another proposal ...
 
 dml_sort_mem and ddl_sort_mem ?

I like those.  Are they an accurate representation of what's going on?  If 
so, I'd go with these, as they are more easily recognizable by folks 
who've worked with dbs for a while.  On the other hand, they're probably 
less recognizable to the newbies.


---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [HACKERS] PITR Dead horse?

2004-05-13 Thread scott.marlowe
On Thu, 5 Feb 2004, Rod Taylor wrote:

   Don't know. But apparently different users will have 
   different demands from a database.
  
  Of course, but I would argue that my claim that PostgreSQL is reliable
  is backed up by the lack of people posting messages like 'we had a
  powercut and now my DB is hosed'.
 
 One thing we could use (and I have no idea how to do it) is a This
 hardware is not appropriate for a database test kit.
 
 Something to detect lying disks, battery backed write cache that isn't
 so battery backed, etc.

But I'm not sure you can test that without power-off tests...  so it 
would have to be a test that kinda started up, then told you to pull the 
plug on the box.  Even a kernel panic wouldn't detect it, because the drive 
would still be powered up.

Or, you could have a test that checked what kind of drive it was (IDE 
versus SCSI) and maybe had a table of drives that are known to lie, 
possibly even by version, should drives of the same model stop lying 
halfway through production due to fixes in their firmware.

I'd guess the table would still have to be built the old-fashioned way, 
by doing power-off tests.


---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [HACKERS] ALTER TABLE TODO items

2004-05-06 Thread scott.marlowe
On Thu, 6 May 2004, Richard Huxton wrote:

 Bruce Momjian wrote:
  Tom Lane wrote:
  
 Richard Huxton [EMAIL PROTECTED] writes:
 
 Does that mean I'll want to disable triggers while I do this?
 
 Hrm.  Right now the code does not fire triggers at all, but that seems
 wrong.  However, I doubt that very many triggers could cope with update
 events in which the old and new rows have different rowtypes :-(.
 Any thoughts what to do about that?
  
  
  If triggers exist, I think we should just throw a warning that triggers
  will not be fired.
 
 Tom's point about triggers probably not working after the upgrade is a 
 good one though. Is it reasonable to just refuse to act on a table until 
   all triggers are dropped? I'd rather be forced to go through and 
 drop/restore triggers in a script than be surprised later on.

How about a "cascade drop triggers" option so you can still do it in 
one line should you want to?
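
For anyone scripting it by hand in the meantime, a rough sketch (trigger, 
table, and function names here are made up):

BEGIN;
DROP TRIGGER audit_trg ON mytable;
-- ... do the ALTER TABLE work here ...
CREATE TRIGGER audit_trg AFTER INSERT OR UPDATE ON mytable
    FOR EACH ROW EXECUTE PROCEDURE audit_fn();
COMMIT;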


---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [HACKERS] PostgreSQL pre-fork speedup

2004-05-06 Thread scott.marlowe
On Thu, 6 May 2004, Tom Lane wrote:

 sdv mailer [EMAIL PROTECTED] writes:
  The point is pre-forking can *potentially* speed up
  connections by 5x as shown in this simplistic
  non-conclusive benchmark.
 
 I think this benchmark proves no such thing.
 
 The thing that pgpool is doing is not preforking connections at all, but
 re-using prior connections.  The important difference is that you are
 using a hot backend that has already loaded a full working set of
 relcache and syscache entries --- and not just any old entries, but
 exactly those needed to process your query.  (The fact that the pgbench
 test uses only a very limited set of queries probably causes this test
 to overstate the effect compared to more realistic workloads.)
 
 The profiling that I've done of backend startup shows that cache
 initialization accounts for the bulk of the startup delay.  And IIRC,
 I was just measuring the time needed to be ready to accept the first
 query, not the additional effort to fetch query-specific cache entries.
 So having a hot backend would make a significant difference, but merely
 avoiding the fork wouldn't necessarily.

Wouldn't the db selection / authentication be as or more expensive than 
buffer creation?  Even in "trust" mode the backend still has to 
authenticate; it just doesn't have to do as much as it would with 
passwords.  I'd expect that to be a big chunk of time too.

It appears the best place to fix this problem (not a problem with 
postgresql, but an engineering problem in an abstract sense) is with 
pooling, and once the flushing etc. in Tatsuo's code is fixed up to be 
zippy, pgpool would be THE answer for such issues.


---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [HACKERS] PostgreSQL pre-fork speedup

2004-05-05 Thread scott.marlowe
On Wed, 5 May 2004, sdv mailer wrote:

 Forking is quite fast on Linux but creating a new
 process is still 10x more expensive than creating a
 thread and is even worse on Win32 platform. CPU load
 goes up because the OS needs to allocate/deallocate
 memory making it difficult to get a steady state
 resource consumption.

Just a nit to pick here.  In Linux, the difference between forking and 
spawning a new thread is almost nothing.  Definitely less than a factor of 
2, and most assuredly less than the quoted factor of 10 here.

The fact that windows has a heavy process / lightweight thread design 
means little to me, since I'll likely never deploy a production postgresql 
server on it that needs to handle any serious load.


---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])


Re: [HACKERS] PostgreSQL pre-fork speedup

2004-05-05 Thread scott.marlowe
On Wed, 5 May 2004, Rod Taylor wrote:

  And, of course, most development environments (perl, php, java etc)
  have their own language specific connection pooling solutions.
 
 Yes, the one for php is what I was thinking of when I made my statement.
 They work on a per backend basis as Apache does not allow for the type
 of communication between processes that would otherwise be required. A
 connection created by Apache backend A cannot be used by Apache backend
 B.
 
 Java is an example where it is done well, but the language decision was
 made long before I joined the firm.
 
 I cannot tell if mod_pg_pool works across Apache forked backends or is
 still bound to a single process. They state it is intended for sharing
 connections across modules, so it is probably still backend specific.

Have you looked at sqlrelay.sourceforge.net?  It looks like it might do 
what you need.


---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html


[HACKERS] The features I'm waiting for.

2004-05-04 Thread scott.marlowe
For me, the only feature I'm likely to use in the upcoming releases is 
nested transactions.  While PITR is a great selling point, and the Windows 
port is something I do look forward to, having to do half my job 
programming Windows boxes, nested transactions are a feature I can 
genuinely use in my daily (maybe weekly??? :-) life.

While a focus on things that make postgresql more market acceptable is 
important, the things that make it more feature complete for me as a user 
are the things I'd gladly wait an extra month or two for.  

But I'm not programming any of the code, so I'm just sayin'...


---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [HACKERS] The features I'm waiting for.

2004-05-04 Thread scott.marlowe
On Tue, 4 May 2004, David Garamond wrote:

 scott.marlowe wrote:
  For me, the only feature I'm likely to use in the upcoming releases is 
  nested transactions.  While PITR is a great selling point, and the Windows 
  port is something I do look forward to, having to do half my job 
  programming Windows boxes, nested transactions are a feature I can 
  genuinely use in my daily (maybe weekly??? :-) life.
  
  While a focus on things that make postgresql more market acceptable is 
  important, the things that make it more feature complete for me as a user 
  are the things I'd gladly wait an extra month or two for.  
  
  But I'm not programming any of the code, so I'm just sayin'...
 
 I'm sure everybody has their own favorite feature. But I can say quite 
 confidently that the upcoming release contains the largest number of highly 
 anticipated features ever. Nested transactions, 2-phase commit, Windows 
 port... I mean these are all major stuff. They are paving the way for 
 deployments of Postgres in new areas and applications. Plus don't forget 
 all the other sweet goodies like autovacuum and PITR.
 
 But the next release could also be the buggiest version ever, due to the 
 number of these new features. :-)

Agreed.  My only real point was that userland features are the reason I 
upgrade.  Simple stuff like the inclusion of regex in the substring 
function in version 7.3 or 7.4 was awesome.  It was small and simple, and 
made my life MUCH easier.  

And while I might not be hacking the code, I'm quite willing to be a beta 
tester.  :-0  

So, thanks to all the hackers, for doing such a great job.


---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: How to Welcome Windows Users (was Re: [HACKERS] 7.5 features)

2004-04-28 Thread scott.marlowe
On Tue, 27 Apr 2004, Richard Huxton wrote:

 On Tuesday 27 April 2004 14:27, Bruce Momjian wrote:
  Here are features that are being worked on, hopefully for 7.5:
 
  o  tablespaces (Gavin)
  o  nested transactions (Alvaro)
  o  two-phase commit (Heikki Linnakangas)
  o  integrated pg_autovacuum (O'Connor)
  o  PITR (Riggs)
  o  Win32 (Claudio, Magnus)
 
  If we get the majority of them, and I think we will, this will be a
  great release.
 
 Sounds like the biggest release since 7.0 to me, and all good stuff. I do have 
 a nagging concern with the Windows support though. I'm guessing most people 
 running Windows servers will either be running php on top, or have windows 
 clients. AFAIK this means .NET or ODBC, and for older Access-based systems 
 upgrading definitely ODBC.
 
 Dave Page has bravely stepped into the breach to maintain the ODBC driver, but 
 the niggles in it will generate a flood of support messages as Windows users 
 test it out. Basically, I'm asking what would need to be done technically for 
 the ODBC driver, and is there anything a non-hacker can do to help?

I would say the OLE-DB driver would be nice to have ready to go.  There 
are apparently a few projects on SourceForge to build one that are making 
good progress, and it would be nice to have a fairly workable solution 
about the time 7.5 rolls out.


---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [HACKERS] [pgsql-advocacy] What can we learn from MySQL?

2004-04-28 Thread scott.marlowe
On Tue, 27 Apr 2004, Andrew Payne wrote:

 
 Scott Marlowe wrote:
 
  While Apache is and has been wildly popular for bulk hosting and domain
  parking, for serious commercial use, Netscape's enterprise server, now Sun
  One, has long been a leader in commercial web sites.
 
 Netscrape/SunONE may have been a leader in some sub-market, but this misses
 the point.

Not A submarket, THE submarket, enterprise class application server, i.e. 
web commerce and such.  Just because apache hosts hundreds of thousands of 
personal web sites with all static content does not make it a market 
leader.  When it came to commercial usage, apache still had to fight its 
way to the top.

 Apache + NCSA never had less than 50% market share, overall.
 
   http://news.netcraft.com/archives/web_server_survey.html

Again, if 98% of those sites are personal web sites with static content, 
(they certainly were until a few years ago) and you remove those from the 
counting, then you find out that in enterprise class web servers, apache 
had sound competition it is only now starting to consume.

 Postgres is in a completely different situation:  95+?% of the world's
 databases don't run on Postgres, and it's been this way for a long time.

And some large percentage of the world's app servers were running on 
something other than apache for quite some time too.

If postgresql was ubiquitous as the database of choice for simple access 
type applications, it would still have to earn its stripes in the 
enterprise one at a time.

 My point:  Apache was successful in a situation that may not apply here.

I agree that the situations aren't exactly the same, but they're more 
similar than most people realize.  Apache was never a market leader in the 
enterprise realm until fairly late in the 1.3.x series releases.

 Does anyone know of an open source project that *has* successfully displaced
 a market of mature, established products WITHOUT a commercial entity
 providing marketing, support & direction?

gcc?


---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [HACKERS] Bringing PostgreSQL torwards the standard regarding

2004-04-27 Thread scott.marlowe
On Mon, 26 Apr 2004, Josh Berkus wrote:

 Shachar,
 
  I think the consensus was that the runtime part was approx. four lines 
  where the case folding currently takes place. Obviously, you would have 
  to get a var, and propagate that var to that place, but not actually 
  change program flow.
 
 That's only if you ignore the system catalogs entirely, which maybe you're 
 prepared to do.  If you want to change case folding for the system catalogs, 
 though, you'll need to update code in thousands of places, because the 
 back-end code is expecting lower-case identifiers.

As someone who has discussed this with Tom in the past, I seem to remember 
that there were major issues with handling the system catalogs, because 
internally, the backends treat the identifiers as if they have already 
been quoted.

I think the answer to all of this would require a lot of code being 
touched to either make it case fold, costing performance, or the 
replacement of the default lower cased catalog with upper cased catalog.

I.e. no simple switch setting, but an initdb option that would be set, like 
locale currently is, for the life of the cluster.

A more comprehensive solution, one which allowed switching from upper 
folding to lower folding to no folding, to case insensitive, or some 
subset of those possibilities results in 

a:  slower backend performance, due to folding case for system catalogs
b:  touching a helluva lot of backend code to make it possible to fold up 
or down.

I'm not 100% sure on this all, but that seems to be the point Tom and I 
came to in our discussion, and neither of the two solutions seemed very 
good at the time.


---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [HACKERS] Usability, MySQL, Postgresql.org, gborg, contrib, etc.

2004-04-27 Thread scott.marlowe
On Mon, 26 Apr 2004, Josh Berkus wrote:

  I think that a talented manager could make the case for certain features.
 
 So?   So could any community member with a good grasp of database engineering 
 and an ability to write persuasive e-mails.

I'd like to inject here that I was the one who started the whole argument 
over date style enforcement about a year ago.  It was a bit of a slog at 
first, but I felt my input was good, and the cause was just :-).

Because of that discussion, PostgreSQL accepting dates like 2004-31-03 
and turning them into 2004-03-31 on the fly stopped.  As did postgresql 
accepting a date like 03/31/2004 and turning it into 31/03/2004 
when it was set to accept Euro style dates.

I am not a PostgreSQL hacker, just a user.  My input made the difference.  
No title needed.  What PostgreSQL needs more of is caring users who can 
spot bugs / misbehaviours, make a decent argument about it, and maybe even 
a patch or two.  We're way more short on indians than chiefs in my 
opinion.


---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] Usability, MySQL, Postgresql.org, gborg, contrib, etc.

2004-04-27 Thread scott.marlowe
On Tue, 27 Apr 2004, Jochem van Dieten wrote:

 [EMAIL PROTECTED] wrote:
  (5) Programming languages. We need to make a programming language standard
  in PostgreSQL. plpgsql is good, but isn't someone working on a Java
  language. That would be pretty slick.
 
 IMHO SQL/PSM would be the obvious choice for the standard 
 procedural language. Not only because it is part of the SQL 
 standard (ISO/IEC 9075-4:2003), but also because it is reasonably 
 intuitive and it is the same as what is implemented in DB2 (and 
 in MySQL 5 due sometime not so soon).
 The only problem is that there is no PostgreSQL implementation.

While I can see heading in that direction, the lack of an implementation 
makes this suggestion impractical.  It will take time not just to 
implement it, but to test it, debug it, and let it reach maturity.

PL/pgSQL is mature and well tested, and has a great deal of code already 
written for it.  I'd say including it by default represents little or no 
security risk, and increases the value, out of the box, of postgresql for 
most folks while costing very little in terms of wasted disk space etc.
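
For anyone who hasn't tried it, a trivial sketch of what you get once the 
language is installed (e.g. via createlang plpgsql mydb; the function here 
is made up):

CREATE FUNCTION add_one(integer) RETURNS integer AS '
BEGIN
    RETURN $1 + 1;
END;
' LANGUAGE 'plpgsql';

SELECT add_one(41);  -- returns 42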


---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])


Re: [HACKERS] [pgsql-advocacy] What can we learn from MySQL?

2004-04-27 Thread scott.marlowe
On Mon, 26 Apr 2004, Andrew Payne wrote:

 
 Bruce asked an excellent question:
 
  My question is, What can we learn from MySQL?  I don't know there is
  anything, but I think it makes sense to ask the question.
 
 After watching the traffic on this, the biggest MySQL lesson has gone
 largely unmentioned:  that a well-funded, well-marketed, focused commercial
 entity clearly associated with the project can do wonders to overcome
 feature and technical shortcomings.
 
 At some point (probably there now), I think the lack of a Postgres, Inc.
 is going to hinder adoption.  Companies want to 'buy' from vendors that look
 like real, viable companies, and provide them products with support,
 training, features, and direction.  With MySQL, you get one stop shopping.
 With Postgres, you've got to find and assemble the parts yourself.  Most
 CIOs stop there, and start waiting for MySQL to get better before switching
 from Oracle.

I'm gonna disagree here.  I think that not having a PostgreSQL Inc. to go 
to means that by the time postgresql becomes ubiquitous, it will be like 
apache: no company behind it, every company using it.  I.e. we'll earn 
our stripes one at a time by proving we're the better database for 95% of 
all purposes, and anyone not using postgresql will be behind the power 
curve and doing themselves no favor.  Like CIOs who call Open Source 
"Shareware" and believe that .NET provides for a more efficient 
programming environment, people who poo-poo postgresql will find 
themselves behind the 8 ball in the long run.  No need for a PostgreSQL 
Inc. to do that, just time, good code, and knowledgeable DBAs choosing it 
more and more often.


---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [HACKERS] Is there any method to keep table in memory at startup

2004-04-27 Thread scott.marlowe
On Wed, 21 Apr 2004 [EMAIL PROTECTED] wrote:

 Hi
 I am working on a project in postgres in which I designed a customized data type
 and operations on it. It requires a look up table.
 I have three options regarding this table:
 1. Every time a query is executed it creates the table, assigns values, and after
 execution destroys it... which is overhead.
 
 2. Store the table on disk in the database and access it whenever required, but it
 degrades the performance.
 
 3. Whenever psql starts it can load the table in memory from the database, which is
 the efficient way to do it.

PostgreSQL has no facility to put tables into memory.

Assuming this lookup table will be hit quite often, it WILL be in memory 
for selects.  Updates / deletes / inserts will have to get flushed out to 
disk of course.

The Linux and BSD kernels are both quite good at keeping commonly used 
data in memory.  I think you are mistaken in assuming that an on-disk 
table will be significantly slower than if it was fixed in memory, due to 
the very efficient caching of the most common unix kernels.
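
An easy way to see this for yourself (hypothetical table name; turn on 
\timing in psql first):

SELECT count(*) FROM lookup;  -- first run may have to read from disk
SELECT count(*) FROM lookup;  -- repeats are normally served from cache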


---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] [pgsql-advocacy] What can we learn from MySQL?

2004-04-27 Thread scott.marlowe
On Mon, 26 Apr 2004, Andrew Payne wrote:

 For those that look to Apache:  Apache never had a well-established
 incumbent (Oracle), an a well-funded upstart competitor (MySQL).  Rob
 McCool's NCSA httpd (and later, Apache) were good enough and developed
 rapidly enough that they prevented any other HTTP server projects from
 getting critical mass.

This is a followup to my previous message where I mentioned apache, but 
did not really followup on it.

While Apache is and has been wildly popular for bulk hosting and domain 
parking, for serious commercial use, Netscape's enterprise server, now Sun 
One, has long been a leader in commercial web sites.  That has now changed 
too.  While Netscape's server was pretty good, it is simply harder to 
configure, not as versatile as apache, and not as reliable or as fast 
nowadays.  This was not always the case.  There was a time when its 
performance was considered to be much better than apache's (I'm thinking 
of apache 1.3.4 or so) and apache configuration was a black art few 
understood.  With modern GUI tools for configuring apache, and the 
incredible performance gains the late model 1.3 versions and now 2.0.x 
versions have, it is quickly displacing the more expensive Netscape.

Apache did not start out in first place when it comes to enterprise class 
web servers, no matter how many small personal web sites ran on it.  Most 
commercial companies didn't use it at first.  It too had to earn its 
stripes over time by proving it was better.  Now I know people who 
think Open Source is just so much pie in the sky hand waving philosophical 
candy who think apache and jboss are the bomb.  They'll come around on 
PostgreSQL too, once someone with some foresight points out the advantages 
it has.  And one of its advantages is that it doesn't have a large 
monolithic organization driving development.


---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [HACKERS] TPC H data

2004-04-27 Thread scott.marlowe
On Wed, 21 Apr 2004, Shalu Gupta wrote:

 Hello,
 
 We are trying to import the TPC-H data into postgresql using the COPY
 command and for the larger files we get an error due to insufficient
 memory space.
 
 We are using a linux system with Postgresql-7.3.4
 
 Is it that Postgresql cannot handle such large files or is there some
 other possible reason.

What method(s) are you using to load the data?
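
For reference, dbgen's flat files are '|'-delimited, so the load would 
normally look something like this (the path is just an example, and note 
that dbgen ends each line with a trailing '|', which may need stripping 
first):

COPY lineitem FROM '/tmp/lineitem.tbl' WITH DELIMITER '|';

If a single huge COPY is what's exhausting memory, splitting the file and 
loading it in several smaller chunks is the usual workaround.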


---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html


Re: [HACKERS] [pgsql-advocacy] What can we learn from MySQL?

2004-04-23 Thread scott.marlowe
On Fri, 23 Apr 2004, Bruce Momjian wrote:

 Here is a blog about a recent MySQL conference with title, Why MySQL
 Grew So Fast:
 
   http://www.oreillynet.com/pub/wlg/4715
 
 and a a Slashdot discussion about it:
 
   
  http://developers.slashdot.org/article.pl?sid=04/04/20/2229212&mode=nested&tid=137&tid=185&tid=187&tid=198
 
 My question is, What can we learn from MySQL?  I don't know there is
 anything, but I think it makes sense to ask the question.

My immediate rhetorical response is What could the Tortoise learn from 
the Hare?

I think we all know which is which in my question.

 Questions I have are:
 
   o  Are we marketing ourselves properly?

I'm never sure about this.  I think the best marketing is experienced 
users selling pg to their bosses one at a time.  While our MSSQL servers 
at work have died under load innumerable times, our small collection of 
postgresql servers (one's so old and embedded it's running 6.4) have been 
very reliable.  So, slowly but surely, PostgreSQL is proving itself here.

   o  Are we focused enough on ease-of-use issues?

Enough for me, but I don't think databases should necessarily be all that 
easy to use by people who don't understand basic relational theory.  So 
for me, ease of use means things like transactional DDL and well indexed, 
well written documentation.  I suspect ease of use for my boss is 
something entirely different, and may have to do with why he bought the EMS 
postgresql manager packages he did.

   o  How do we position ourselves against a database that some
  say is good enough (MySQL), and another one that some
  say is too much  (Oracle)

Hey, we're like the porridge in Goldilocks, just right... :-)

DBA folks don't pick MySQL, because it's so limited and basically has 
so many bugs ("it's a feature that we don't bounds check data!")  And it's 
pretty easy to get an Oracle guy to play with postgresql when you show him 
things like transactional DDL.
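
For instance, this little demo usually gets their attention, since in 
Oracle a DDL statement implicitly commits, while here it rolls back 
cleanly:

BEGIN;
CREATE TABLE demo (id integer);
-- changed our mind; the table vanishes with the rollback
ROLLBACK;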

   o  Are our priorities too technically driven?

I don't think so.  But I'm a database / coder geek.  :-)


---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [HACKERS] contrib vs. gborg/pgfoundry for replication solutions

2004-04-21 Thread scott.marlowe
I almost agree, but I think things that are being actively developed to 
eventually move into the backend, like autovacuum or Slony-I, should be in 
contrib.  Things that aren't destined for backend integration should be 
removed though, like pgbench or dblink or whatnot.

On Wed, 21 Apr 2004, Joshua D. Drake wrote:

 Hello,
 
 My personal opinion is that contrib should be removed entirely. Just 
 have a contrib.txt that says all contrib modules are at pgfoundry or 
 whatever.
 
 Sincerely,
 
 Joshua D. Drake
 
 
 Jan Wieck wrote:
 
  Taking into account that quite a few people have repeatedly stated that 
  the components in contrib are considered more supported/recommended than 
  similar solutions found on gborg or any other external site, I suggest 
  we move the projects dbmirror and dblink to gborg. The rserv contrib 
  module seems to me to be an early Perl prototype of erserver, nobody is 
  working on any more. I suggest we drop that entirely.
  
  Comments/alternatives?
  
  
  Jan
  
 
 ---(end of broadcast)---
 TIP 8: explain analyze is your friend
 


---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html


Re: [HACKERS] pg_encoding not needed anymore

2004-04-20 Thread scott.marlowe
On Tue, 20 Apr 2004, Bruce Momjian wrote:

 Andrew Dunstan wrote:
  past.  I think createuser is much worse.  :-)
  

  
  Agreed. Actually, the big problem with the name initdb is that the 
  name is misleading, and newbies often get confused by it. You are 
  preparing a data store for many databases, not a single database.  But I 
  think it is far too sanctified by history to change now, just as Ken 
  Thompson now wishes he had put an 'e' on the end of 'creat' but can't go 
  back and fix it. Maybe we should think about a symlink/hardlink to use a 
  better name.
 
 Yea, initcluster would have been better, but "cluster" confuses with
 CLUSTER, just like "database schema" confuses with CREATE SCHEMA.

Maybe initpg or pg_init or something like that?


---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])


Re: [HACKERS] Remove MySQL Tools from Source?

2004-04-16 Thread scott.marlowe
On Fri, 16 Apr 2004, Christopher Kings-Lynne wrote:

  I always ran one of the 2 scripts (can't remember which one) and after that 
  started checking the dump file, because there were things that didn't get 
  changed correctly[1].
  
  [1]: I always remember the first conversion I did. I found out that MySQL 
  accepted dates like 30/2/2000 or 0000-00-00.
  Very odd.
 
 Yes, MySQL has always accepted those as perfectly valid dates.  It's 
 quite broken.

The sad thing is that while MySQL implemented a --ansi switch that 
supposedly turns on ANSI compliance, it only fixes about half of all the 
non-compliance issues.  Yet another half-implemented feature... :)


---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [HACKERS] make == as = ?

2004-04-08 Thread scott.marlowe
On Thu, 8 Apr 2004, Bruce Momjian wrote:

 Fabien COELHO wrote:
  
This would help me, at least, write correct and portable SQL. :)
  
   Added to TODO:
  
 * Add a session mode to warn about non-standard SQL usage
  
  So it seems that having C-like operators would hurt a lot;-)
  
  So you want to generate warnings for SERIAL, TEXT and a bunch of other
  types, WITH[OUT] OIDS, RULE, ~ regular expressions, arrays, any use of
  pg_catalog instead of information_schema (I may be wrong in the list, I
  don't have the standard at hand, but I think I'm right in the spirit)...
  
  This is going to be noisy;-)
 
 Yep, it sure is going to be noisy.

Could we consider a three (or more) way setting for what to do?  
Something like:

sql_noncompliance_mode = error;
sql_noncompliance_mode = warn / notice;
sql_noncompliance_mode = ignore;

Just wondering...


---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] pg_dump and INCREMENT BY

2004-04-07 Thread scott.marlowe
On Wed, 7 Apr 2004, Dennis Bjorklund wrote:

 On Wed, 7 Apr 2004, Dennis Bjorklund wrote:
 
 Replying to myself here :-)
 
  wants to import it into a 7.3 database. Use the 7.3 dump you might say, 
  but since BY does not do anything why not remove it from the dump output?
 
 I just realized there is yet another new construct in 7.4. The sequences
 are created with NO MAXVALUE (and NO MINVALUE). Couldn't these just be
 removed in the pg_dump output in this case? They're the default anyway,
 right?
 
 I just imported a fairly big dump from a 7.4 database into a 7.3 database 
 and the sequences were the only thing I had to change. Of course the 
 database itself was designed on 7.3 so it didn't use any 7.4 features.
 
 It's not a big issue, it's just that it seems simple to dump SQL code (for
 sequences) that works the same on both 7.4 and older, not just 7.4.

Maybe a switch to set the destination version would be a good compromise?


---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [HACKERS] what do postgresql with view ?

2004-04-06 Thread scott.marlowe
On 31 Mar 2004, elrik wrote:

 [EMAIL PROTECTED] (elrik) wrote in message news:[EMAIL PROTECTED]...
  In information I have:
  
  1. when creating a view: PostgreSQL parses the query and stores the query tree.
  2. when using it: PostgreSQL uses this tree like a subselect.
  
  my question: Does PostgreSQL make an analysis of the resulting tree?
 
 
 I can answer by myself: YES!
 
 In fact the problem is when I have more than 11 tables in the query...

Sounds like you might be wrestling with the genetic query optimizer.  Try 
turning up the geqo threshold and see if things work better.
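
The relevant knob can be set per session or in postgresql.conf (11 is the 
default in this era):

SHOW geqo_threshold;
SET geqo_threshold = 15;  -- exhaustive planner for queries of up to 14 FROM items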


---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [HACKERS] pg_dump end comment

2004-03-31 Thread scott.marlowe
On Wed, 31 Mar 2004, Philip Warner wrote:

 At 12:13 AM 31/03/2004, Bruce Momjian wrote:
 Yes, they have to check for a proper exit from pg_dump, but there is
 still a file sitting around after the dump, with no way to tell if it is
 accurate.
 
 Why don't we write a hash into the header or footer. Then use something like:
 
  pg_restore --verify dump-file
 
 if file integrity is the objective.

I like this idea.  Nice to be able to check the md5 sig to make sure a 
backup is not corrupted OR short.


---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [HACKERS] Increasing security in a shared environment ...

2004-03-29 Thread scott.marlowe
On Mon, 29 Mar 2004, Marc G. Fournier wrote:

 On Mon, 29 Mar 2004, Dave Page wrote:
 
  It's rumoured that Euler Taveira de Oliveira once said:
   Hi Christopher,
  
The \l command should only list databases that the current user is
authorized for, the \du command should only list users authorized
for the current database (and perhaps only superusers should get
even that much information), etc.  Perhaps it is possible to set PG
to do this, but that should probably be the default.
   
    Seems reasonable. Why not prevent normal users from digging in the pg_catalog?
    What is the impact of it?
 
  Because they can't use tools like pgAdmin or phpPgAdmin unless they can at
  least read all the catalogs.
 
 k, but what I'm suggesting shouldn't prevent that, should it?  They should
 only be able to see those resources that they have permissions to see, not
 all of them ... no?

I think an auto-filtering system for \l and other backslash commands as 
needed makes a lot more sense than trying to deny access to the catalogs.  
Obscuring them for security reasons is no win, really.  Obscuring them so 
user number 1,000,000 in his own database doesn't have to look at user 
numbers 1 through 999,999 to see his database go by, however, is.

While I'm not sure I'd build a 1,000,000 user database, somewhere between 
the 80 we currently have at work and a few thousand you'd go nuts if you 
saw a bunch of data that didn't belong to you every time you hit \l.


---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [HACKERS] Log rotation

2004-03-24 Thread scott.marlowe
On Wed, 24 Mar 2004, Andrew Dunstan wrote:

 . Peter Eisentraut's program
   pro: portable, better featured, no license issues
   con: code state uncertain, less well tested

Where is Peter's code available?  I'd like to try it out.


---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html


Re: [HACKERS] Default Stats Revisited

2004-03-11 Thread scott.marlowe
On Thu, 11 Mar 2004, Josh Berkus wrote:

 Scott,
 
  I like it.  Would a multiplier be acceptable?  
  default_stats_index_multiplier = 10
 
 Yeah, I thought about that, but a multiplier would be harder to manage for 
 most people.I mean, what if your default_stats are at 25 and you want 
 your index_stats at 40?   PITA.   Also, if you wanted to increase the default 
 stats but *forgot* that the index_stats were a multiplier ...
 
 I think a straight number is less confusing.

But possibly more error prone.  If you crank up the default statistics to 
50, but the index default is still 25...  OTOH, you could always have the 
setting used for the index default be whichever is greater... hmmm.
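
Worth remembering that the per-column knob already exists, so a DBA can do 
this by hand today (table and column names made up):

ALTER TABLE mytable ALTER COLUMN indexed_col SET STATISTICS 40;
ANALYZE mytable;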


---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [HACKERS] About hierarchical_query of Oracle

2004-03-10 Thread scott.marlowe
On Wed, 10 Mar 2004, Li Yuexin wrote:

 Who can tell me how to complete oracle's hierarchical_query through 
 postgresql?

Look in the contrib/tablefunc directory for the connectby() function.
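
A sketch of what a call looks like once tablefunc.sql has been loaded into 
the database (table and column names are made up; check the README in 
contrib/tablefunc for the exact signature):

SELECT *
  FROM connectby('tree', 'id', 'parent_id', '2', 0)
    AS t(id text, parent_id text, level int);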


---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [HACKERS] raising the default default_statistics_target

2004-03-09 Thread scott.marlowe
On Sun, 7 Mar 2004, Tom Lane wrote:

 Neil Conway [EMAIL PROTECTED] writes:
  Tom Lane wrote:
  This is something we need to consider, but we'll need more evidence
  before making a choice.  One thing that we have very little data about
  is how much difference it makes in the quality of planner choices.
 
  Right, but is there a practical way to actually get this data?
 
 I haven't thought of one yet, but perhaps someone will have an idea.

Hi Tom.  I ran some very simple tests on analyze times and query plan 
times on a very simple table, with data randomly distributed.  The index 
was on a date field, since that's what I was testing last.

This was all done on my 512Meg memory 1.1GHz celeron workstation with an 
IDE drive.  I'd love more input on better testing methodologies here...

with 100k or 1M rows that look kinda like this:  (I'll test 10M rows 
later, which means the dataset won't fit in memory, so there'll be lots of 
access going on.  Right now the 1M row table is 80 meg)

 select * from test2 limit 5;
                     info                      |         dt          |   id
-----------------------------------------------+---------------------+---------
 Francize perfectible swirling fluctuates      | 2004-05-20 20:12:04 | 2721995
 Fields chauffeur attentionality grandmother   | 2004-04-07 14:36:02 | 2721996
 Belgium bilked explosively defendant          | 2004-09-16 16:27:22 | 2721997
 perspectives Buenos Pollux discriminates      | 2004-11-11 12:28:31 | 2721998
 Victorianize Savonarola blackmails sufficed   | 2004-02-27 21:17:20 | 2721999
(5 rows)

Here's what I get with different statistics targets (analyze times for the 
100k and 1M row tables, plus plan times where I measured them):

target   100k analyze (ms)   1M analyze (ms)   1M plan (ms)
    10                 250                875             2
    20                 350               1250
    30                 430               1500
    40                 520               1725
    50                 580               1900
    60                 690               2100
    70                 775               2175
    80                 850               2300
    90                 950               2400
   100                1000               2600           2.5
   200                1806               3700
   300                2600               4800
   400                2600               5900
   500                2600               7200
   700                2600               9500
  1000                2600              13000             5

Since this data is randomly distributed, I didn't bother doing a lot of 
testing to see how accurate each target setting was.  If that would be 
useful to know I'd gladly test it, but I was only setting out to test the 
time to analyze and the time to plan.

Note that I only tested 3 targets for planning time, as it didn't seem to 
make a very big difference.  The query was:

select * from test2 where dt between 'march 11, 2004' and 'march 13, 2004';

I also ran some quick tests on smaller tables (1000 and 10k rows) and 
there, the plateau that we see in the 100k analyze shows up much quicker, 
at something like 50 or so.  I.e. the analyze time flattened out quickly 
and higher numbers cost very little if anything.

Since this query was quite an easy plan, I'd expect to need a much more 
complex one to test the increase in planning time, say something that has 
to look at a lot of statistics.  Any particular join type or something 
that's likely to do that?
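
For anyone wanting to reproduce this, the test boils down to something 
like the following, run in psql with \timing turned on:

SET default_statistics_target = 100;
ANALYZE test2;
EXPLAIN ANALYZE select * from test2
  where dt between 'march 11, 2004' and 'march 13, 2004';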



---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [pgsql-hackers-win32] [HACKERS] Tablespaces

2004-03-05 Thread scott.marlowe
On Fri, 5 Mar 2004, Thomas Swan wrote:

 [EMAIL PROTECTED] wrote:
 
 [EMAIL PROTECTED] writes:
 
 [EMAIL PROTECTED] wrote:
 
 Zeugswetter Andreas SB SD [EMAIL PROTECTED] writes:
 
 My feeling is that we need not support tablespaces on OS's without
 symlinks.
 
 To create symlinked directories on Win2k NTFS see:
   http://www.sysinternals.com/ntw2k/source/misc.shtml#junction
 I think Win2000 or XP would be a reasonable restriction for Win32 PG
 installations that want tablespaces.
 
 Oh, good --- symlinks for directories are all that we need for this
 design.  I think that settles it then.
 
 What archival tools are there that would restore this back to the
 filesystem: tar? zip?  What would happen if a symlink were removed or
 pointed to an invalid location while the postmaster was running?
 
 Well, for backup, just run tar or find on /data with a flag to
 follow symlinks, and you are done.  Can't get much easier than
 that.
 
 I'm referring to NTFS and the win32 platforms.  How does tar handle
 these symlinks on the NTFS filesystem?  What about if someone finds
 that FAT32 is significantly better for the database?
 
 tar doesn't know anything about PostgreSQL system catalogs.  If we use
 symlinks for tablespaces then it would be possible to backup downed
 databases with a simple tar command on every platform *I* care about
 (and probably Windows too).  Using system catalogs for this stuff
 would simply guarantee that I would have to read the system catalogs
 and then back up each tablespace manually.  In short, your idea would
 trade off (maybe) having to backup tablespaces manually on a few
 platforms for the certainty of having to backup tablespaces manually
 on all platforms.
 
 How is that a win?
 
 Apparently, I have failed tremendously in addressing a concern.  The
 question is does PostgreSQL need to rely on symlinks and will that
 dependency introduce problems?
 
 There is an active win32 port underway (see this mailing list).  One
 proposal was to try to use an OS specific filesystem feature to perform
 a symlink on NTFS.  Can the special symlink that NTFS allegedly supports
 be archived the same way symlinks are archived on Unix?  If so, is there
 a utility that can do this (zip, tar, etc)?  The backup operator would
 still need to know what directories needed to be archived in addition to
 the pgdata directory.  Is this symlink structure a normal/special file
 that can be archived by normal means (tar, zip, etc)?

According to this page:

http://www.linuxinfor.com/en/man1/ln.1.html

from the Linux man pages:

"On existing implementations, if it is at all possible to make a hard link 
to a directory, this may be done by the superuser only. POSIX forbids the 
system call link(2) and the utility ln to make hard links to directories 
(but does not forbid hard links to cross filesystem boundaries)."

It also states that the command ln is POSIX 1003.2, but that "POSIX 1003.2 
(1996) does not discuss soft links. Soft links were introduced by BSD, and 
do not occur in System V release 3 (and older) systems."

I fear the more useful of the two would be soft links, but if soft links 
are not a part of the POSIX standard, then postgresql probably shouldn't 
base key features on them unless said features would be onerous to 
implement without soft links.

Oddly enough though, Microsoft has now released their Unix services 
package for free, and it comes with the commands to create a symbolic 
link; it runs on Windows NT 4.0, Windows 2000, Windows XP, and Windows 
Server 2003.

So, soft links would appear not to be a real non-starter, as long as the 
ability to make soft links on those systems won't rely on having some 
strange package installed (like MS's Unix services package).

I imagine there's a standard OS call in modern MS OSes that will let you 
create a symbolic link with no special libs installed, and if that's the 
case, then the argument that maybe FAT would be a better file system comes 
under the same heading as running your database on NFS.  Neither is a good 
idea, and PGSQL can't guarantee normal, reliable operation.




 
 Example:
 
 PGDATA is C:\pgdata
 I have a tablespace in Z:\1\ and Z:\2\
 There exists an alleged symlink in
 C:\pgdata\data\base\tablespaces\schmoo -> Z:\1
 
 Can I archive [ C:\pgdata, Z:\1, Z:\2 ], restore them, and have
 postgresql working just as before?

Have you tried building said structure and backing up and restoring it to 
see?

 We aren't talking about a feature that work[s] on Linux but not on
 FreeBSD.  We are talking about a feature that works on every OS that
 supports symlinks (which includes even operating systems like Windows
 that PostgreSQL doesn't currently support).
 
 Hello?  What was this response from Tom Lane?  "My feeling is that we
 need not support tablespaces on OS's without symlinks."  That seems to
 be indicative of a feature set restriction 

Re: [HACKERS] BTrees with record numbers

2004-03-01 Thread scott.marlowe
On 27 Feb 2004, Chad wrote:

 Is it possible for Postgres Btrees to support access by logical row number?
 If not available, is it a huge job to support for somebody willing to have a go?

Are you talking about logical row numbers as maintained by your own code 
outside the database, or having postgresql suddenly start maintaining 
logical row numbers?  I doubt postgresql will ever have built-in logical 
row numbers, because maintaining them in a materialized way is quite 
expensive.

On the other hand, if you just need logical row numbers for a one off 
thing, you can do this:

create temp sequence judy;
select *, nextval('judy') from mytable;
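
If the numbers need to follow a particular order, ordering in a subselect 
usually does the trick in practice, though SQL makes no hard guarantee 
about the outer evaluation order (the id column here is made up):

create temp sequence judy2;
select *, nextval('judy2') as rownum
  from (select * from mytable order by id) as sub;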


---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [HACKERS] Tablespaces

2004-02-27 Thread scott.marlowe
On Fri, 27 Feb 2004, Zeugswetter Andreas SB SD wrote:

 
   Ahh. I forgot to detail my ideas on this. It seems to me that we cannot
   drop a table space until the directory is empty.
 
 Agreed.
 
  
  How would it get to be empty?  Are you thinking of some sort of connect
  database to tablespace and disconnect database from tablespace
  commands that would respectively create and delete the per-database
  subdirectory?  That seems moderately reasonable to me.  We could then
 
 I would only allow the drop if the directory only contains empty db oid 
 directories.

Wouldn't this be better tracked in the dependency tracking that's already 
built into postgresql?  Checking to see if the directory is empty is open 
to race conditions, but locking the dependency tracking while dropping a 
tablespace isn't.



---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html


Re: [HACKERS] Tablespaces

2004-02-27 Thread scott.marlowe
On Fri, 27 Feb 2004, Gavin Sherry wrote:

 On Thu, 26 Feb 2004, Alex J. Avriette wrote:
 
  On Thu, Feb 26, 2004 at 11:22:28PM +1100, Gavin Sherry wrote:
 
   Certainly, table spaces are used in many ways in oracle, db2, etc. You can
   mirror data across them, have different buffer sizes for example.
   In some implementations, they can be raw disk partitions (no file system).
   I don't intend going this far, however.
 
  Perhaps now would be a good time to bring up my directio on Solaris question
  from a year or so back? Is there any interest in the ability to use raw
  disk?
 
 I do not intend to undertake raw disk tablespaces for 7.5. I'd be
 interested if anyone could provide some real world benchmarking of file
 system vs. raw disk. Postgres benefits a lot from kernel file system cache
 at the moment. Also, I believe that database designers have traditionally
 made bad file system designers. Raw database partitions often lack the
 tools essential to a scalable environment. For example, the ability to
 resize partitions.

Is it possible / reasonable / smart and-or dumb to look at implementing 
tablespaces as riding atop the initlocation-handled stuff?  I.e. 
postgresql would only create tablespaces in areas that are created by 
initlocation, thus keeping it in its box, so to speak.


---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [HACKERS] Tablespaces

2004-02-27 Thread scott.marlowe
On Fri, 27 Feb 2004, Tom Lane wrote:

 scott.marlowe [EMAIL PROTECTED] writes:
  Is possible / reasonable / smart and or dumb to look at implementing the 
  tablespaces as riding atop the initlocation handled stuff.
 
 In my mind, one of the main benefits of this work will be that we'll be
 able to get *rid* of the initlocation stuff.  It's a crock.

OK, that's fine, but I keep thinking that a superuser should have to 
create the tablespace itself, and then tables can be assigned by users 
based on the rights assigned by the dba / superuser.  Is that how we're 
looking at doing it?  Or will any user be able to create a tablespace 
anywhere postgresql has write permission, or will only dbas be able to 
create AND use tablespaces?  I'm just not sure how that's gonna be 
handled, and haven't seen it addressed.


---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])


Re: [HACKERS] Sparc optimizations

2004-02-24 Thread scott.marlowe
On Tue, 24 Feb 2004, Shridhar Daithankar wrote:

 http://www.osnews.com/printer.php?news_id=6136

That page gets a "please don't link to printer ready pages" error and 
redirects here:

http://www.osnews.com/story.php?news_id=6136




---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [HACKERS] MS SQL features for new version

2004-02-17 Thread scott.marlowe
On Tue, 10 Feb 2004, Rodrigo wrote:

 Shridhar Daithankar wrote:
  Just stumbled upon this. just an FYI,
  
  http://www.microsoft.com/sql/yukon/productinfo/top30features.asp
  
   Shridhar
 
  From the page:
 
   A new Snapshot Isolation (SI) level will be provided at the
   database level. With SI, users will be able to access the
   last committed row using a transitionally consistent view
   of the database. This capability will provide greater
   scalability for very large database (VLDB) implementations.
 
 Is Snapshot Isolation == MVCC ?

I think it goes that MVCC is a kind of snapshotting, but snapshotting could 
be provided in more ways than just MVCC.

But I'm not 100% certain of that.


---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [HACKERS] How can I have 2 completely seperated databases in

2004-02-12 Thread scott.marlowe
On Wed, 11 Feb 2004 [EMAIL PROTECTED] wrote:

 Thank you very much for your reply.
 
 Yes, that's true. But it seems not a good idea if I have many databases
 and I want them totally separated from each other.
 
 What's your opinion? Thanks.

OK, here's the issue.  Postgresql uses certain resources in a shared 
manner, and other resources are completely separate.  For instance, the 
shared memory buffers are shared within a single instance or cluster by 
all the databases.  One data directory, one collation, and one set of 
logs are also shared by one instance.

The individual databases within a cluster share a set of global users.  
I.e. if I create a user in one database, he can then be granted access to 
the other databases (or denied access) with a simple change to 
pg_hba.conf.  So, it's very easy to add / remove people's access to 
individual databases.
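
For example, the per-database access control in a 7.3/7.4-era pg_hba.conf 
looks something like this (database, user, and network are made up):

# TYPE  DATABASE  USER   IP-ADDRESS   IP-MASK        METHOD
host    sales     alice  192.168.1.0  255.255.255.0  md5
host    sales     all    0.0.0.0      0.0.0.0        reject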

If you separate out each database into its own instance, you now have two 
(or more) postgresql instances, each with a different data directory, 
shared memory buffers, user list and passwords.  I.e. now nothing passes 
between them, at all.

If you would have allocated 128 megs of shared buffer memory for a single 
cluster which contained 4 databases, and you split those out into 
individual instances, then you'd need to give each of the four clusters / 
instances 32 megs of shared buffers, or you'd risk using up more memory than 
before.

With the single cluster, if one database has a lot of data to sling 
around, and the others are handling a few k at a time, it has 128 Megs to 
work in.  With four clusters, no matter how little the other three are 
working, you'd only have 32 meg to play in.

Taking this to the logical extreme of having n databases, where n is 
fairly good size, say 20, 40, or 100, then you have issues that if you set 
up each database with enough shared memory to do its job when it needed 
to, you risk starting a swap storm should a couple dozen of those 
databases have a few large result sets open, thus using up all the shared 
memory they'd have been allotted.

Lowering the shared memory for each database low enough to prevent this 
would result in individual databases that each had very small amounts of 
shared memory.

Also, maintenance gets harder.  You now have to vacuum multiple separate 
clusters, and you need to schedule it so that you don't have two or three 
running at once and swamping your storage subsystem.

For certain setups, multiple clusters are a great thing.  I've used them 
as hot backups where I put a secondary instance online, placed its 
storage on a NAS, backed up to it, and then shut it down to have a cold 
live spare, or used it for testing administrative procedures one shouldn't 
practice on a live database.

But for separating out applications from each other, there's really 
nothing to be gained by putting each separate database application into 
its own cluster.


---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])


Re: [HACKERS] How can I have 2 completely seperated databases in

2004-02-12 Thread scott.marlowe
On Thu, 12 Feb 2004, Rod Taylor wrote:

  But for separating out applications from each other, there's really 
  nothing to be gained by putting each separate database application into 
  its own cluster.
 
 I believe the initial email requested individual logs, and presumably
 the ability to grant superuser access without risking a user crossing
 into another clients space.

Well, I got the feeling it was only the logs he wanted separate, and if 
that's all he wanted, then separating will get him that, but there's a 
cost, and I wanted to make sure he understood that as well.

I'm not sure he was intimating separate superusers, but I could 
definitely see situations where that would be useful.


---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [HACKERS] RFC: Query Planner making a distinction between Cross

2004-02-12 Thread scott.marlowe
On Thu, 12 Feb 2004, Stef wrote:

  U.  Postgresql doesn't natively support cross database queries...
  
 
 I know, but it does schemas, and currently, the same
 notation is used to specify schemas as 'cross database'.
 
 So the planner often reports 'cross-database not allowed'
 in areas where it should at least report 'cross-schema
 support is unavailable for this'.
 
 Case in point, the example trigger. I would have expected
 a deliberate schemaname.table during an insert to work, but
 instead the parser complains about cross-database.

I would think just changing the error message to "no schema by the name of 
suchandsuch found" would make it pretty clear.


---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [HACKERS] RFC: Query Planner making a distinction between Cross

2004-02-12 Thread scott.marlowe
On Thu, 12 Feb 2004, Stef wrote:

   Case in point, the example trigger. I would have expected
   a deliberate schemaname.table during an insert to work, but
   instead the parser complains about cross-database.
  
  I would think just changing the error message to "no schema by the name of 
  suchandsuch found" would make it pretty clear.
  
 
 Indeed, the only problem being that this is a 
 -deliberately- named schema and it does exist,
 just that it's not in the search_path. Surely
 pl/pgsql or the parser should be able to search the
 schema names if I give it a -deliberate- name and,
 if it isn't there, say 'not found'?
 
 So, either that's an error (improper parsing/expr
 forming) in the pl/pgsql trigger code, or, well,
 I don't know. 
 
 Thoughts? Comments? Barking mad?

Hmmm.  I would think the first step would be to simply change the "cross-db 
queries aren't supported" message to one of "schema either does not exist 
or is not in the search path".
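
Outside of the pl/pgsql corner case Stef hit, the two normal spellings 
(schema and table names made up) are an explicit qualification, or putting 
the schema on the path first:

INSERT INTO otherschema.mytable (id) VALUES (1);

SET search_path TO otherschema, public;
INSERT INTO mytable (id) VALUES (1);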


---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [HACKERS] How can I have 2 completely seperated databases in

2004-02-11 Thread scott.marlowe
On Wed, 11 Feb 2004 [EMAIL PROTECTED] wrote:

 Hi, all
 
 What should I do if I want to have 2 completely separated databases in
 PostgreSQL? I want each database to have its own data, log and
 everything needed to access that database. I don't want them to share
 anything. Has anyone done this before? Or,
 could anyone give me some clue of how to do this?

You're telling us what you want, but not the why.  The why may help us to 
figure out both how to do it, and whether or not it's a good idea.


---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [HACKERS] [PATCHES] Current-stream read for psql's \copy

2004-02-10 Thread scott.marlowe
On Tue, 10 Feb 2004, Tom Lane wrote:

 Bruce Momjian [EMAIL PROTECTED] writes:
  Tom Lane wrote:
  No it doesn't.  EOF will do fine.  The source program doesn't
  necessarily have to know anything about COPY, as long as its output is
  in a format COPY can cope with (eg, tab-delimited).
 
  The current behavior seems quite strange and counter-intuitive.  We
  might break the code for 1-2 people, but we will make it more
  predicable for everyone using it.
 
 I don't think it's acceptable to simply remove the functionality.
 If you wanted to argue about swapping the meanings of STDIN and -
 (as defined by the patch) then we could debate about which way is
 more consistent and whether it's worth breaking backwards compatibility
 to improve consistency.  I could probably be talked into supporting
 that; as you say, we've done that before.  But taking out a useful
 behavior that has been there a long time, simply because you've decided
 it's unintuitive, is not okay.

Why not make a -i switch (for input file) that does it the way Bruce wants 
and leave -f alone so people who depend on it behaving the way it does 
won't get a surprise in their next upgrade?


---(end of broadcast)---
TIP 8: explain analyze is your friend


[HACKERS] bug in substring???

2004-02-06 Thread scott.marlowe
I'm using substring.  Since I'm a coder more than a database guy, I 
expected this:

select substring('abcdefgh',0,4);

would give me 

abcd

but it gives me a left aligned 'abc'

select substring('abcdefgh',1,4);

works fine.

select substring('abcdefgh',-4,4);

gives me nothing.  Shouldn't a negative offset, or even 0 offset result in 
an error or something here?  Or is there a special meaning to a negative 
offset I'm not getting?

Just wondering.  




---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html


Re: [HACKERS] bug in substring???

2004-02-06 Thread scott.marlowe
On Fri, 6 Feb 2004, Joe Conway wrote:

 scott.marlowe wrote:
  gives me nothing.  Shouldn't a negative offset, or even 0 offset result in 
  an error or something here?  Or is there a special meaning to a negative 
  offset I'm not getting?
 
 In varlena.c there is this comment:
 
   * text_substr()
   * Return a substring starting at the specified position.
   * - thomas 1997-12-31
   *
   * Input:
   *   - string
   *   - starting position (is one-based)
   *   - string length
   *
   * If the starting position is zero or less, then return from the start
   *  of the string adjusting the length to be consistent with the
   *  negative start per SQL92. If the length is less than zero, return
   *  the remaining string.

thanks.  I just got done looking up the SQL explanation, and I think my 
head exploded.  Thanks for the heads up.
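
A quick sketch of how that SQL92 adjustment plays out (same string as in 
my original post):

select substring('abcdefgh', 0, 4);
-- the start is shifted up to 1 and the length shrinks, giving 'abc'
select substring('abcdefgh', -4, 4);
-- the requested range ends before position 1, so the result is empty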


---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [HACKERS] Question on database backup

2004-02-04 Thread scott.marlowe
On Wed, 4 Feb 2004, Michael Brusser wrote:

 We have customers who prefer to use their backup facilities
 instead of what we provide in the app (we use pg_dump)
 I hear speed is at least one consideration.
 
 The questions I need to answer are these:
 
 1) Is this absolutely safe to do file copy (cpio, or 
 smth. else, whatever the robust backup app. would use)
 on the Postgres db, when it's completely shut down.

Yes, it is.

 2) Same question, but the database is up and running in
 read-only mode. We're making sure that no updates are 
 taking place.

Most likely, it is.  No guarantees if the database is up and running, even 
if you're certain there are no updates happening.

Also, you can use a snapshotting file system to make a backup image and 
then back up the image, while the database is up and being accessed, both 
by readers and writers, assuming you get REAL snapshots.

 If it matters - this is on Solaris, HP, Linux.
 We're using v.7.3.2 running on UDS and v.7.3.4 on TCP/IP
 We provide no explicit settings for wal, fsync and the like.
 And (yes, I know) they often install it on NFS.

OK, here are a couple more issues to chew on as well.  Suppose you have a 
backup, and the database server was compiled with UNKNOWN switches.  The 
machine burns to the ground.  Now you get to try and figure out how to 
compile the database on the new server so it can read the old dataset.  
This may or may not be a complete friggin' nightmare for you.  Dumps can 
move between versions / hardware configs / differently compiled versions 
of postgresql with some amount of reliability.  Binary copies may or may 
not move so easily.

Scenario II: the HP burns to the ground, and your boss just buys a big old 
intel box.  How do you get your data up and running with a binary backup?  
You don't.

Scenario III: subtle corruption gets into your dataset due to a bad block 
or whatnot.  No one notices for a while.  Suddenly, someone notices.  
With only file system backups, with no error messages in them, how do you 
determine when the corruption occurred and get the uncorrupted data out 
while leaving the corrupted data behind?

Plain and simple: PostgreSQL is designed to be backed up by pg_dump.  
Using anything else isn't supported, so to speak, and may cause you 
untold grief in the future.  That said, sometimes file backups are the 
perfect solution; just go into it with your eyes open to the possible 
problems, and I'd make a pg_dump every so often just in case.


---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [HACKERS] Write cache

2004-01-28 Thread scott.marlowe
On Wed, 28 Jan 2004 [EMAIL PROTECTED] wrote:

 I agree I MAY have an hardware problem. What happens is more a system
 freeze than a system crash (there's no panic, no nothing, just freezes, no
 disk activity, not network)

I would suspect either bad hardware, a flaky SCSI driver, or a possible 
kernel bug.  If your system is freezing hard, it is NOT postgresql's 
fault.  It simply doesn't have access to the kind of system resources 
needed to freeze a machine.

Is there a different SCSI driver / card you can try in there?  We've (and 
many others have too) had good luck with the LSI/MegaRAID cards and both 
the older 1.18 series and newer 2.x series drivers.  No freezes, no crashes, 
no hangs on the boxes with those in them.


---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] Bunch o' dead code in GEQO

2004-01-22 Thread scott.marlowe
On Wed, 21 Jan 2004, Tom Lane wrote:

 The GEQO planner module contains six different recombination algorithms,
 only one of which is actually used --- the others are ifdef'd out, and
 have been ever since we got the code.  Does anyone see a reason not to
 prune the deadwood?

Considering the recent discussion about REALLY slow query planning by the 
GEQO module, it might be worth testing each one to see which works best 
before lopping them off.

But I don't do anything that needs GEQO, so for me, it doesn't really 
matter.


---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [HACKERS] Bunch o' dead code in GEQO

2004-01-22 Thread scott.marlowe
On Thu, 22 Jan 2004, Tom Lane wrote:

 scott.marlowe [EMAIL PROTECTED] writes:
  On Wed, 21 Jan 2004, Tom Lane wrote:
  The GEQO planner module contains six different recombination algorithms,
 
  considering the recent discussion about REALLY slow query planning by the 
  GEQO module, it might be worth testing each one to see which works best 
  before lopping them off.
 
 I'm assuming that the original author of the GEQO code already did that
 testing ...

Hmmm.  I was figuring he wasn't sure, so he left them in for other people 
to test.  Is this a part of the code that eats up much time, or something 
simple and fast that isn't part of the "GEQO takes 8 seconds to plan" 
problem?


---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [HACKERS] cache control?

2004-01-16 Thread scott.marlowe
On Fri, 16 Jan 2004, Michael Brusser wrote:

 Is there a way to force database to load
 a frequently-accessed table into cache and keep it there?

Nope.  But there is a new cache buffer handler that may make it into 7.5 
that would make that happen automagically.


---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [HACKERS] What's planned for 7.5?

2004-01-12 Thread scott.marlowe
On Mon, 12 Jan 2004, Martin Marques wrote:

 Mensaje citado por Marc G. Fournier [EMAIL PROTECTED]:
 
  
  Native Win32 is planned for it (whether it makes it or not is another
  question, but it is the goal) ...
 
 Replication wasn't another BIG one?

Actually, I think it was PITR (Point in Time Recovery).  Of course, since 
that works by replaying log files into the database, it's a pretty good 
way of setting up replication, which would likely follow a release or so 
after PITR was implemented.

OTOH, there are some replication solutions already available, just not as 
part of the core.


---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [HACKERS] Project status pages

2003-12-22 Thread scott.marlowe
On Fri, 19 Dec 2003, Bruce Momjian wrote:

 Robert Treat wrote:
  Wasn't there a patch posted many months ago for PITR. IIRC it wasn't
  complete, but would be a good starting point for those interested in
  helping out. If it's in the archives it would be nice to add a link to
  it on the project page... which brings up the question on whats the
  process for updating these pages?  Perhaps they should be moved into the
  wiki framework up on techdocs?
 
 They are just web pages.  If someone want to set up wiki and pull the
 content, I can remove my versions.

http://techdocs.postgresql.org/guides is an open wiki instance; you could 
just put it there.  I've edited a couple of pages there.  Nothing fancy, 
but it is pretty easy to edit.  You don't (or at least didn't until 
recently, if that's changed) have to register to use it.


---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] Resurrecting pg_upgrade

2003-12-16 Thread scott.marlowe
On Tue, 16 Dec 2003, Jon Jensen wrote:

 On Tue, 16 Dec 2003, Jan Wieck wrote:
 
  If you want to prevent accidential access, start postmaster on a 
  non-standard port.
 
 That seems like an unfriendly thing to do. You'd have to check to see what 
 port is standard for this particular installation, and pick something 
 else. You may choose an unused port, but perhaps it needs to be used in a 
 few minutes by some other process, but then will be occupied. The 
 administrator may also not be happy to have an open port facing the world, 
 or even just other possibly untrusted users on the same machine, assuming 
 you bind to localhost.

But aren't ports above a certain number fair game?

Yep, just answered my own question, quoting from /etc/services:

# The latest IANA port assignments can be gotten from
#   http://www.iana.org/assignments/port-numbers
# The Well Known Ports are those from 0 through 1023.
# The Registered Ports are those from 1024 through 49151
# The Dynamic and/or Private Ports are those from 49152 through 65535

So as long as we use 49152 and above we're cool.


---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] Performance features the 4th

2003-11-10 Thread scott.marlowe
On Sun, 9 Nov 2003, Jan Wieck wrote:

 scott.marlowe wrote:
 
  On Fri, 7 Nov 2003, Matthew T. O'Connor wrote:
  
  - Original Message - 
  From: Jan Wieck [EMAIL PROTECTED]
   Tom Lane wrote:
Gaetano and a couple of other people did experiments that seemed to show
it was useful.  I think we'd want to change the shape of the knob per
later suggestions (sleep 10 ms every N blocks, instead of N ms every
block) but it did seem that there was useful bang for little buck there.
  
   I thought it was sleep N ms every M blocks.
  
   Have we seen any numbers? Anything at all? Something that gives us a
   clue by what factor one has to multiply the total time a VACUUM
   ANALYZE takes, to get what effect in return?
  
  I have some time on sunday to do some testing.  Is there a patch that I can
  apply that implements either of the two options? (sleep 10ms every M blocks
  or sleep N ms every M blocks).
  
  I know Tom posted the original patch that slept N ms every 1 block (where N
  is >= 10 due to OS limitations).  Jan can you post a patch that has just the
  sleep code in it? Or should it be easy enough for me to cull out of the
  larger patch you posted?
  
  The reason for the change is that the minimum sleep period on many systems 
  is 10 ms, which meant that vacuum was running 20X slower than normal.  
  While it might be necessary in certain very I/O starved situations to make 
  it this slow, it would probably be better to be able to get a vacuum that 
  ran at about 1/2 to 1/5 speed for most folks.  So, since the delta can't 
  be less than 10 ms on most systems, it's better to just leave it at a fixed 
  amount and change the number of pages vacuumed per sleep.
 
 I disagree with that. If you limit yourself to the number of pages being 
 the only knob you have and set the napping time fixed, you can only 
 lower the number of sequentially read pages to slow it down. Making read 
 ahead absurd in an IO starved situation ...
 
 I'll post a patch doing
 
  every N pages nap for M milliseconds
 
 using two GUC variables and based on a select(2) call later.

I didn't mean fixed in the code, I meant fixed in your setup.  I.e., find a 
delay (10 ms, 50, 100, etc.), then vary the number of pages processed at a 
time until you start to notice the load, then back it off.

The point is not being forced by the code to have one and only one delay 
value, but setting it yourself.


---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [HACKERS] Performance features the 4th

2003-11-07 Thread scott.marlowe
On Fri, 7 Nov 2003, Matthew T. O'Connor wrote:

 - Original Message - 
 From: Jan Wieck [EMAIL PROTECTED]
  Tom Lane wrote:
   Gaetano and a couple of other people did experiments that seemed to show
   it was useful.  I think we'd want to change the shape of the knob per
   later suggestions (sleep 10 ms every N blocks, instead of N ms every
   block) but it did seem that there was useful bang for little buck there.
 
  I thought it was sleep N ms every M blocks.
 
  Have we seen any numbers? Anything at all? Something that gives us a
  clue by what factor one has to multiply the total time a VACUUM
  ANALYZE takes, to get what effect in return?
 
 I have some time on sunday to do some testing.  Is there a patch that I can
 apply that implements either of the two options? (sleep 10ms every M blocks
 or sleep N ms every M blocks).
 
 I know Tom posted the original patch that slept N ms every 1 block (where N
 is >= 10 due to OS limitations).  Jan can you post a patch that has just the
 sleep code in it? Or should it be easy enough for me to cull out of the
 larger patch you posted?

The reason for the change is that the minimum sleep period on many systems 
is 10 ms, which meant that vacuum was running 20X slower than normal.  
While it might be necessary in certain very I/O starved situations to make 
it this slow, it would probably be better to be able to get a vacuum that 
ran at about 1/2 to 1/5 speed for most folks.  So, since the delta can't 
be less than 10 ms on most systems, it's better to just leave it at a fixed 
amount and change the number of pages vacuumed per sleep.

I'm certainly gonna test the patch out too.  We aren't really I/O bound, 
but it would be nice to have a database that only slowed down ~1% or so 
during vacuuming.


---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [HACKERS] Hacking PostgreSQL to work in Mac OS X 10.3 (Panther

2003-11-05 Thread scott.marlowe
Is this a bug we should fix for 7.3.5 when it eventually comes out?

On Tue, 4 Nov 2003, Andrew Rawnsley wrote:

 
 Just build RC1 today on Panther, no problems.
 
 
 On Nov 4, 2003, at 5:06 PM, Jeff Hoffmann wrote:
 
  Tom Lane wrote:
  [EMAIL PROTECTED] writes:
  After spending a few hours of trying to get Postgresql7.3.4 to build
  from source (tar.gz) on a Panther (release, not beta) system,
  Try 7.4RC1 instead.  Apple made some incompatible changes in their
  compiler in Panther.
 
  I was going to recommend the same thing.  I compiled a 7.4 beta out of 
  the box without a hitch, so I'd assume the RC would be fine as well.
 
  -- 
 
  Jeff Hoffmann
  PropertyKey.com
 
 
  ---(end of 
  broadcast)---
  TIP 8: explain analyze is your friend
 
 
 
 Andrew Rawnsley
 President
 The Ravensfield Digital Resource Group, Ltd.
 (740) 587-0114
 www.ravensfield.com
 
 
 ---(end of broadcast)---
 TIP 5: Have you checked our extensive FAQ?
 
http://www.postgresql.org/docs/faqs/FAQ.html
 
 


---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] 7.4RC1 tag'd, branched and bundled ...

2003-11-05 Thread scott.marlowe
On Tue, 4 Nov 2003, Gaetano Mendola wrote:

 I agree in general with you for these general arguments, but here we 
 are talking about introducing a sleep ( removable by guc ) or not! What 
 about the hash refactoring introduced with 7.4? Are you going to
 discourage people from using the hash?

That's not fair.  Everyone's had the chance to test 7.4 betas 1 through 4 
on their own hardware to see if it is safe and stable.  If I'd load 
tested the betas and found them reliable, and then suddenly found that 7.4 
RC1 or the release was less reliable than the betas (I'm not talking about 
THIS patch in particular, I'm talking about ANY patch...), I would be 
upset that the change happened at the end instead of the beginning of the 
release cycle.

I think this patch is a great idea, but it's probably got a lot of 
refinement coming down the line.  I'd be glad to test it on the 7.4 branch 
as a patch, but it's just too late to put performance enhancements into 
the main branch.  Now is bug fixing time, not performance tweaking time.  
That was pre-beta1 really.

To review: The hashagg has had testing, and lots of it, by the people who 
need it.  This patch has had little testing.  While I'm sure it's a fairly 
safe change, and likely to be a good thing overall, it's just too late in 
the cycle for it to get in.


---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] Very poor estimates from planner

2003-11-05 Thread scott.marlowe
On Wed, 5 Nov 2003, Rod Taylor wrote:

 Since this is a large query, attachments for the explains / query.
 
 Configuration:
 dev_iqdb=# select version();
 version
 
  PostgreSQL 7.4beta1 on i386-portbld-freebsd4.7, compiled by GCC
 2.95.4
 (1 row)
 
 SET default_statistics_target = 1000;
 ANALYZE;
 set from_collapse_limit = 100;
 set join_collapse_limit = 20;

I'm not sure if that will actually change the default_statistics_target of 
the tables you're analyzing; I think it will only apply to newly created 
tables.

I believe you have to "alter table ... alter column ... set statistics 1000" for 
each column you want a statistics target of 1000 on.  You might wanna try 
starting with 50 or 100 and see if that works first.
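
For instance, something along these lines (a sketch using columns from the 
query above; the actual targets to use are guesses):

alter table acs_object_types alter column object_type set statistics 100;
alter table acs_object_type_attributes alter column ancestor_type set statistics 100;
analyze acs_object_types;
analyze acs_object_type_attributes;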


---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [HACKERS] Very poor estimates from planner

2003-11-05 Thread scott.marlowe
On Wed, 5 Nov 2003, Tom Lane wrote:

 Rod Taylor [EMAIL PROTECTED] writes:
  I'm not sure if that will actually change the default_statistics_target
 
  Hmm.. I was under the impression that it would work for any tables that
  haven't otherwise been overridden.
 
 It will.  I think Scott is recalling the original circa-7.2
 implementation, where it wouldn't.  If you're unsure that you
 affected it, check out the actual sizes of the array values in pg_stats.

Hey, can anyone guess what version I'm running in production :-)  Thanks 
for the catch.


---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [HACKERS] Experimental patch for inter-page delay in VACUUM

2003-11-04 Thread scott.marlowe
On Tue, 4 Nov 2003, Tom Lane wrote:

 Jan Wieck [EMAIL PROTECTED] writes:
  What still needs to be addressed is the IO storm cause by checkpoints. I 
  see it much relaxed when stretching out the BufferSync() over most of 
  the time until the next one should occur. But the kernel sync at it's 
  end still pushes the system hard against the wall.
 
 I have never been happy with the fact that we use sync(2) at all.  Quite
 aside from the I/O storm issue, sync() is really an unsafe way to do a
 checkpoint, because there is no way to be certain when it is done.  And
 on top of that, it does too much, because it forces syncing of files
 unrelated to Postgres.
 
 I would like to see us go over to fsync, or some other technique that
 gives more certainty about when the write has occurred.  There might be
 some scope that way to allow stretching out the I/O, too.
 
 The main problem with this is knowing which files need to be fsync'd.

Wasn't this a problem that the win32 port had to solve by keeping a list 
of all files that need fsyncing since Windows doesn't do sync() in the 
classical sense?  If so, then could we use that code to keep track of the 
files that need fsyncing?


---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [HACKERS] 7.4RC1 planned for Monday

2003-10-31 Thread scott.marlowe
On Thu, 30 Oct 2003, Joshua D. Drake wrote:

 If I understood correctly, Josh was complaining about VACUUM sucking too
 
 much of his disk bandwidth.  autovacuum wouldn't help that --- in fact
 would likely make it worse, since a cron-driven vacuum script can at
 least be scheduled for low-load times of day.  autovacuum is likely to
 kick in at the least convenient times.
 
   
 
 Exactly!

Wait a minute, I thought the problem was that vacuums were happening too 
far apart, therefore taking too long, and may have been full vacuums, no?

If the autovacuum daemon causes a lazy vacuum to run on only the busiest 
(i.e. most updated) tables, then it is likely to take only a few minutes 
to run, instead of hours, plus you can try to keep things in a steady 
state, i.e. no more than x% or so dead tuples in a table at any given 
time, with them all getting reused via the FSM / lazy vacuum.

So, have you TESTED the autovacuum daemon with your load, and set it to 
run every 5 minutes?  Keep in mind, it won't actually vacuum every table 
every 5 minutes; it'll just check the stats to see which ones have updated 
a fair bit and vacuum those, and they're lazy vacuums.  I've found it to 
be a win.  If you've dismissed it out of hand without testing it, then you 
should really give it a try to see if it can be configured to provide good 
performance and behavior.


---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [HACKERS] Deadlock problem

2003-10-30 Thread scott.marlowe
On 30 Oct 2003, Vatsal Avasthi wrote:

 
 Hi,
   I am facing a strange problem and thats bugging me for a long time,
 I am using postgres version 7.2.1.

Is it possible for you to upgrade to 7.2.4 just to make sure it's not a 
problem that was fixed from 7.2.1 to 7.2.4?


---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [HACKERS] pg_user

2003-10-30 Thread scott.marlowe
On Thu, 30 Oct 2003, ivan wrote:

 
 you can also patch your kernel so that when you run cat /etc/passwd the
 system gives you only your own line, without any other users, which is
 exactly what you need.
 in pgsql i think that users don't need to know about other users, or
 about their
 databases. i call it security :)

Technically, that's just obscurity.



---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html


Re: [HACKERS] 7.4RC1 planned for Monday

2003-10-30 Thread scott.marlowe
On Thu, 30 Oct 2003, Joshua D. Drake wrote:

 Hello,
 
I know I will probably be flamed into oblivion for this but I would 
 like to make a suggestion about
 the upcoming release.
 
What if we delayed until the end of the year?
 
The two reasons that I can come up with are:
 
1. The irritating (but work around capable) bigint index issue.
2. More importantly the recent potential discovery by Jan on vacuum.
 
   I have several high end users that are really beating their heads 
 against the wall with even lazy vacuum
 because of how brutal it can be on the system. If we could make vacuum a 
 little less harsh it could be
 a large boon.

Are these folks for whom the autovacuum daemon provides no relief?

I think it's too late in the release cycle to put everything on hold, 
especially with no known fix in sight (for bigint, at least).


---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [HACKERS] O_DIRECT in freebsd

2003-10-29 Thread scott.marlowe
On 29 Oct 2003, Doug McNaught wrote:

 Christopher Kings-Lynne [EMAIL PROTECTED] writes:
 
  FreeBSD 4.9 was released today.  In the release notes was:
  
  2.2.6 File Systems
  
  A new DIRECTIO kernel option enables support for read operations that
  bypass the buffer cache and put data directly into a userland
  buffer. This feature requires that the O_DIRECT flag is set on the
  file descriptor and that both the offset and length for the read
  operation are multiples of the physical media sector size.
  
  Is that of any use?
 
 Linux and Solaris have had this for a while.  I'm pretty sure it's
 been discussed before--search the archives.  I think the consensus
 was that it might be useful for WAL writes, but would be a fair amount
 of work and would introduce portability issues...

I would think the biggest savings could come from using directIO for 
vacuuming, so it doesn't cause the kernel to flush buffers.

Would that be just as hard to implement?  


---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [BUGS] [HACKERS] Autocomplete TAB on Postgres7.4beta5 not

2003-10-28 Thread scott.marlowe
On Tue, 28 Oct 2003, Tom Lane wrote:

 Rod Taylor [EMAIL PROTECTED] writes:
  I say leave it the way it is. If you want system table tab completion,
  simply:
  ALTER USER ... SET search_path = pg_catalog,...;
 
 Unfortunately, that *does not* affect the tab-completion behavior;
 it will still not offer the system catalogs as completions unless
 you explicitly prefix pg_catalog..

It seems a good compromise then would be that if pg_catalog is in your 
search path, then do the old-fashioned completion, i.e. p<tab> will list 
everything in both pg_catalog and the other schemas in your search path.

After all, the system catalog kind of hitchhikes along for a ride in your 
search path.  Most users aren't interested in the system catalogs, so 
having them suddenly show up when you just wanted the phonebook table is 
kind of a bother for them.

Is it possible to remove the implicit search path of pg_catalog from a 
psql session without it breaking lots of stuff?  If not, then it's kind of 
ugly to the user to have no way to keep the system from proffering 
pg_catalog tables when they have no interest in them.  If so, then why 
make it part of the default?
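
To make that concrete, a user who actually wants catalog completion could 
opt in explicitly with something like:

set search_path = pg_catalog, public;
show search_path;
-- with pg_catalog listed explicitly, tab completion would then offer
-- the system catalogs alongside the user's own tables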




---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [BUGS] [HACKERS] Autocomplete TAB on Postgres7.4beta5 not

2003-10-28 Thread scott.marlowe
On Tue, 28 Oct 2003, Tom Lane wrote:

 scott.marlowe [EMAIL PROTECTED] writes:
  Is it possible to remove the implicit search path of pg_catalog from a 
  psql session without it breaking lots of stuff?
 
 Do you consider +, count(), etc to be important stuff?

Me, I hardly ever use them  :-)  So I can assume that removing the implicit 
pg_catalog from the search path is a bad thing.

In that case, does my proposed solution (having to explicitly include 
pg_catalog in your search path to get it seen by tab completion as 
just another schema) make sense?


---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [HACKERS] round() function wrong?

2003-10-24 Thread scott.marlowe
On Wed, 22 Oct 2003, Jochen Westland [invigo] wrote:

 Hi All,
 i'm running Postgresql 7.2.x, so i am not quite sure whether the bug i am reporting 
 is already fixed
 in newer versions or not.
 
 In my version 
 select round(2.5); returns 2;
 select round(2.501) returns 3;
 
 referring to my math professor that's wrong, at least in germany.
 select round(2.5); should return 3

I just tried that on my 7.2.4 and 7.4 beta 4 machines and I get 2 for 
round(2.5)


---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [HACKERS] round() function wrong?

2003-10-24 Thread scott.marlowe
On Fri, 24 Oct 2003, Michael Brusser wrote:

 But this seems to work correctly on 7.3.2 and 7.3.4:
 psql -c select round (2.5)
 Password: 
  round 
 ---
  3
 (1 row)
 
 =
  
  I just tried that on my 7.2.4 and 7.4 beta 4 machines and I get 2 for 
  round(2.5)

Ackkk.  I accidentally tested it on my 7.3.4 box, not my 7.4 beta 4 box, 
but the output is the same.  The original, I believe, had quote marks in it.

Anyway, it seems the quotes matter in 7.3.x:


In pgsql 7.2:

select round(2.5::float);
 round
---
 2
(1 row)

select round(2.5::numeric);
 round
---
 3
(1 row)

select round(2.5);  -- would appear to be coerced to float here
 round
---
 2
(1 row)

select round('2.5');
ERROR:  Function 'round(unknown)' does not exist
Unable to identify a function that satisfies the given argument 
types
You may need to add explicit typecasts


pgsql 7.3.4:


select round(2.5::float);
 round
---
 2
(1 row)

select round(2.5::numeric);
 round
---
 3
(1 row)

select round(2.5);  -- would appear to be coerced to numeric
 round
---
 3
(1 row)

select round('2.5');
 round
---
 2
(1 row)

pgsql 7.4 beta5: behaves the same as 7.3.4

So it would appear that the automatic assumptions about what is 
float and what is numeric changed from 7.2 to 7.3, i.e. an undecorated 
literal like 2.5 is now assumed to be numeric.

But I'm just guessing here.




---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [HACKERS] Failed to create temporary file

2003-10-23 Thread scott.marlowe
Sounds like your drives are full.

On Thu, 23 Oct 2003, Yuval Lieberman wrote:

 Hi!
 
 I'm doing a select (from an OACS page or from psql) and I get:
 ERROR:  Failed to create temporary file pgsql_tmp/pgsql_tmp27212.775
 
 The same select work ok a different database (which is on a different
 machine)
 
 the select is :
 select a.attribute_id, a.pretty_name,
a.ancestor_type, t.pretty_name as ancestor_pretty_name
   from acs_object_type_attributes a,
(select t2.object_type, t2.pretty_name,
  tree_level(t2.tree_sortkey) - tree_level(t1.tree_sortkey) + 1 as
 type_level
   from acs_object_types t1, acs_object_types t2
   where t1.object_type = 'group'
 and t2.tree_sortkey between t1.tree_sortkey and
 tree_right(t1.tree_sortkey)) t
  where a.object_type = 'group'
and t.object_type = a.ancestor_type
 order by type_level
 
 
 ---(end of broadcast)---
 TIP 5: Have you checked our extensive FAQ?
 
http://www.postgresql.org/docs/faqs/FAQ.html
 
 


---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] integer ceiling in LIMIT and OFFSET

2003-10-22 Thread scott.marlowe
On Wed, 22 Oct 2003, Tom Lane wrote:

 Rod Taylor [EMAIL PROTECTED] writes:
  That said, perhaps the TODO for changing LIMIT / OFFSET to be expression
  based should also mention bumping them to int8.
 
 Can't get excited about it ... this would slow down the normal use of
 the facility for what seems a completely hypothetical need.

While I'm pretty sure 

select * from sometable limit 2147483648

isn't gonna be common, maybe someone would be likely to do something like:

select * from sometable limit 10 offset 2147483648

I wouldn't do it, but who knows what shadows lurk in men's minds?


---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html


Re: [HACKERS] So, are we going to bump catversion for beta5, or not?

2003-10-22 Thread scott.marlowe
On Wed, 22 Oct 2003, Tom Lane wrote:

 Richard Huxton [EMAIL PROTECTED] writes:
  On Wednesday 22 October 2003 06:55, Peter Eisentraut wrote:
  The idea is that you give each function its own schema search path at
  creation time, and that path applies to that function for the rest of its
  life.  Then that function would be immune to schema path changes later on.
 
  But surely that would mean I couldn't do ...
 
 Certainly you can invent scenarios where letting the search path vary
 from call to call is useful, but the question is which behavior is
 *more* useful.  I think it's becoming clear that having a predictable
 search path is usually what a function author will want.
 
 It would probably be a good idea to allow the function's search path to
 be explicitly specified as a clause of CREATE FUNCTION (otherwise it
 will be a headache for pg_dump).  So we could allow both viewpoints,
 if there is a way to explicitly say don't force any search path.
 Perhaps specifying an empty path could mean that.  But I think the
 default should be to adopt the current search path (at the time of
 CREATE FUNCTION) as the function's permanent path.

It might be nice to have an ALTER FUNCTION capability that could change 
the search path later on, should one add schemas etc. after the fact.
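
Purely hypothetical syntax for such a command (nothing like this existed 
at the time; the function name and path are made up):

-- hypothetical sketch, not an implemented command:
alter function my_func(integer)
    set search_path = app_schema, pg_catalog;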


---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [HACKERS] pg_ctl reload - is it safe?

2003-10-14 Thread scott.marlowe
On 14 Oct 2003, Greg Stark wrote:

 
 Michael Brusser [EMAIL PROTECTED] writes:
 
   Michael Brusser [EMAIL PROTECTED] writes:
2003-10-10 22:37:05 ERROR:  cannot read block 0 of s_noteimportlinks:
Interrupted system call
   
   Hmm.  I found this hard to believe at first, but indeed my local man
   pages for read() and write() say they can return EINTR if interrupted
   by a signal.  This may depend on the filesystem in use (are you using
   NFS?)
 
 The traditional unix semantics are that read/write may return EINTR if
 interrupted -- but that would only EVER happen for network connections.
 The traditional semantics are that it would NEVER happen on disk i/o. BSD
 kernels at least, and probably all unix kernels, do an uninterruptible sleep
 on disk accesses, hence the dreaded D in ps listings.
 
  Yes, we use NFS. Many of our customers use it as well.
 
 Normally NFS guarantees the traditional unix semantics. 
 Unless you're using either soft or intr options.
 
 If you are, well, stop.
 
 If you use intr then this type of thing can happen. Lots of programs assume
 the unix semantics for disk accesses. You can get all kinds of bugs when
 they're violated.
 
 If you use soft then the consequences can be much much worse. If your
 fileserver were to reboot you could silently lose disk writes corrupting your
 database.

What if the WAL was local on disk, and the data was going to nfs storage, 
would that be safe, or saferer?  :-)


---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


[HACKERS] IDE Drives and fsync

2003-10-08 Thread scott.marlowe
OK, I've done some more testing on our IDE drive machine.

First, some background.  The hard drives we're using are Seagate 
drives, model number ST380023A.  Firmware version is 3.33.  The machine 
they are in is running RH9.  The setup string I'm feeding them on startup 
right now is:  hdparm -c3 -f -W1 /dev/hdx

where:

-c3 sets I/O to 32 bit w/sync (uh huh, sure...)
-f sets the drive to flush buffer cache on exit
-W1 turns on write caching

The drives come up using DMA.  Turning unmask IRQ on / off has no effect 
on the tests I've been performing.

Without the -f switch, data corruption due to sudden power down is almost 
certain.  Running 'pgbench -c 5 -t 100' and pulling the plug 
will result in recovery failing with the typical "invalid page type" 
messages.

The pgbench database was originally initialized with -s 1.

If I turn off write caching (-W0) then the data is coherent no matter how 
many concurrent clients I'm running, but performance is abysmal (drops from 
~200 tps down to 45, or to 10 if I'm using /dev/md0, a mirror set).  This 
is all on a single drive.

If I use -W1 and -f, then I get corruption on about every 4th test or so 
if the number of parallel beaters is 50 or so, and more often if I crank 
it up to 200 or increase the size of the database by using -s 10 during 
initialization.  Note that EITHER a larger test database OR a larger 
number of clients seems to increase the chance of corruption.

I'm guessing that with -W1 and -f, what's happening is that at higher 
levels of parallel access, or with a larger data set, the time between 
when the drive reports an fsync and when it actually writes the data out 
is climbing, and it is more likely that data in transit to the WAL is 
getting lost when the plug is pulled.

Tom, you had mentioned adding a delay of some kind to the fsync logic, and 
I'd be more than willing to try out any patch you'd like to toss out to me 
to see if we can get semi-stable behaviour out of IDE drives with the 
-W1 and -f switches turned on.  As it is, the performance is quite good, 
and under low to medium loads it seems to be capable of surviving the 
power plug being pulled, so I'm wondering if we can come up with a slight 
delay that might drop the performance some small percentage while 
greatly decreasing the chance of data corruption.

Is this worth looking into?  I can see plenty of uses for a machine that 
runs on IDE for cost savings, while still providing a reasonable amount of 
data security in case of power failure, but I'm not sure if we can get rid 
of the problem completely or not.


---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [HACKERS] Question regarding coopting Database Engine

2003-10-03 Thread scott.marlowe
On Tue, 30 Sep 2003, Steve Yalovitser wrote:

 Hello,
 
 I'd like to know if its possible to coopt the postgres storage subsystem to
 rely entirely on ram based structures, rather than disk. Any documentation
 or ideas would be appreciated.

Sure, create a ram disk.  Set $PGDATA to it with proper permissions, 
initdb, and restore.

Not sure why you'd wanna do it, as any crash loses all data, and 
postgresql's strengths lie in its ability to survive the most brutal power 
off situations mid-transaction etc...


---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [HACKERS] Thoughts on maintaining 7.3

2003-10-01 Thread scott.marlowe
On Tue, 30 Sep 2003, Joshua D. Drake wrote:

 Hello,
 
   With the recent stint of pg_upgrade statements and the impending 
 release of 7.4 what
 do people think about having a dedicated maintenance team for 7.3? 7.3 
 is a pretty
 solid release and I think people will be hard pressed to upgrade to 7.4. 
 Of course
 a lot of people will, but I have customer that are just now upgrading to 
 7.3 because
 of legacy application and migratory issues.
 
 Anyway I was considering a similar situation to how Linux works, where 
 there is a
 maintainer for each release... Heck even Linux 2.0 still saw releases until 
 recently.
 
   Of course the theory being that we backport some features and fix 
 any bugs that
 we find?
 
What are people's thoughts on this?

It seems to me the upgrade from 7.2 to 7.4 is easier than an upgrade to 
7.3, since at least 7.4's pg_dumpall can connect to a 7.2 database and 
suck in everything, whereas in 7.3 I had to dump with 7.2's dumpall and 
then tweak the file by hand a fair bit to get it to go into 7.3.

With 7.4 I'm finding upgrading to be easier.  I'll likely upgrade our 
production servers to 7.4.0 when it comes out and wind up skipping 7.3 
altogether.


---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [HACKERS] Wednesday beta postponed till Thursday

2003-10-01 Thread scott.marlowe
On Tue, 30 Sep 2003, Tom Lane wrote:

 It seems some junior electrician in Panama pulled the wrong circuit
 breaker ... and then the mail.postgresql.org server spent an
 unreasonable number of hours fsck'ing.  (Why is Marc a FreeBSD fan
 anyway?  Don't ask me, I work for Red Hat.)  Anyhow, due to the loss
 of project communications for today, it seems best to put off tomorrow's
 intended 7.4beta4 release for a day.  We'll plan Thursday instead.

Hey Jim!?  What does this big red switch do?
@#$% NO CARRIER


---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [HACKERS] Thoughts on maintaining 7.3

2003-10-01 Thread scott.marlowe
On Wed, 1 Oct 2003, Joshua D. Drake wrote:

 
 With 7.4 I'm finding upgrading to be easier.  I'll likely upgrade out 
 production servers to 7.4.0 when it comes out and wind up skipping 7.3 
 altogether.
   
 
 
 Sure, but I'm talking about people who are running 7.3 and are happy with 
 it. The reality is that for probably 95% of the people
 out there, there is no reason for 7.4. When you have an existing system 
 that works... why upgrade? That is one of the benefits
 of Open Source stuff, we no longer get forced into un-needed upgrade cycles.

Agreed, we've been on 7.2 for a while now because it just works.  
The regex substring introduced in 7.3 was a pretty cool feature, for 
instance, that makes life easy.
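
A quick example of the 7.3 form (the string is made up):

select substring('order #1234 shipped' from '[0-9]+');
-- POSIX regular expression match, returns '1234'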

 When you deal with the systems I do, the cost to a customer to migrate 
 to 7.4 would be a minimum of 10,000-20,000 dollars.
 They start to ask why we're upgrading with those numbers.

Then maybe they would be willing to donate some small amount each ($500 or 
so) to pay for backporting issues.  Since mostly what I'd want on an older 
version would be bug / security fixes, that $500 should go a long way 
towards backporting.

 That is not to say that 7.4 is not worth it in a technical sense, but 
 for my customers, "If it ain't broke, don't fix it" is a mantra and
 the reality is that 7.3 is not broken in their minds. There are 
 limitations: pg_dump/pg_restore has some issues, having to reindex the 
 database
 (which 7.4 doesn't fix), vacuum (which 7.4 doesn't fix), but my customers 
 accept them as such.

I was under the impression that 7.4 removed the need to reindex caused by 
monotonically increasing index keys, no?

 Your mileage may vary but I can only talk from my experience.

Yeah, I would rather have had more backporting to 7.2, because there were 
tons of little improvements from 7.2 to 7.3 I could have used while 
waiting for 7.4's improved pg_dumpall to come along.

Cheers:-)


---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [HACKERS] Lost mails

2003-10-01 Thread scott.marlowe
On Wed, 1 Oct 2003, Darko Prenosil wrote:

 Two mails with updated translations for /src/backend/po/hr.po were lost.
 The first time I sent a plain po file, the second time a tar.gz - no result.
 Is something blocking mails with attachments? I didn't receive notification
 that the mail was blocked or anything like that.
 Can I try to send it to some other address?

Just send it again; yesterday a junior electrical tech in Panama turned off 
the mail server for the lists for a little while.


---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [HACKERS] feature request: show pgsql version when running initdb

2003-09-26 Thread scott.marlowe
On Fri, 26 Sep 2003, Christopher Kings-Lynne wrote:

 Is there any chance we could have initdb show the version of postgresql
 it is running as when initdb is run?
  
  
  If you install many different versions in parallel, don't you give your
  installation paths some meaning that contain the version number?  In any
  case, you can run initdb --version first if you're not sure about what is
  where.
 
 Yes I do, but sometimes as different users you don't know what the path 
 is.  I guess I can just go --version.

Anytime I'm running multiple versions on the same box, I install them to 
paths like /usr/local/pgsql72, create a user like pgsql72, and then set 
it so the path for that user is /usr/local/pgsql72/bin.

That way, just looking at the command prompt I know which version / 
superuser I'm playing with.  Plus, no matter how badly I try to screw up, 
pgsql72 doesn't have permission to initdb pgsql73's $PGDATA directory.


---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [HACKERS] initdb failure (was Re: [GENERAL] sequence's plpgsql)

2003-09-26 Thread scott.marlowe
On Fri, 26 Sep 2003, Bruce Momjian wrote:

 Tom Lane wrote:
  Peter Eisentraut [EMAIL PROTECTED] writes:
   Tom Lane writes:
   so it appears that cygwin's echo generates a different newline style
   than what got put into sql_features.txt.  A possible way to fix this is
   to put the \. line into sql_features.txt, but maybe there's a cleaner
   answer.  Peter, any thoughts?
  
   There's no clean answer to this on Cygwin.  This specific case is just a
   little problem that we could solve locally, but in general you'll just end
   up annoying people if you require them to use consistent line endings on
   Cygwin.
  
  Yeah, I was wondering whether you wouldn't propose dropping the newline
  consistency check.  I'm not very comfortable with that, but maybe we
  should.  Bruce?
 
 I posted on that a few minutes ago.  Yea, we can drop it, but we risk
 eating carriage returns as data values.  I am not sure how consistently
 we output literal carriage returns in old dumps, nor how many apps
 produce literal carriage returns in COPY. If we conditionally eat
 them, we run the risk of discarding some of their data without warning. 
 Perhaps we can throw a warning rather than an error, and adjust initdb
 to be consistent.

I'm running into issues where 7.4's pg_dump/pg_dumpall from a 7.2 database 
to a 7.4beta3 database is producing some errors like this:

ERROR:  literal newline found in data
HINT:  Use \n to represent newline.
CONTEXT:  COPY FROM, line 59

ERROR:  literal carriage return found in data
HINT:  Use \r to represent carriage return.
CONTEXT:  COPY FROM, line 41


These show up with little or no context, only the line number within the 
dump file.  Since I'm wrapping these up in pg_dumpall, I don't have the 
dump file, so I don't know where the error is really occurring.  It would be 
nice to have such occurrences echo the table / row they are getting the 
error on, or maybe just the first 20 or so characters, so they'd be easier 
to identify.

Is this related to this issue?


---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [HACKERS] initdb failure (was Re: [GENERAL] sequence's plpgsql)

2003-09-26 Thread scott.marlowe
On Fri, 26 Sep 2003, Tom Lane wrote:

 scott.marlowe [EMAIL PROTECTED] writes:
  I'm running into issues where 7.4's pg_dump/pg_dumpall from a 7.2 database 
  to a 7.4beta3 database is producing some errors like this:
 
  ERROR:  literal newline found in data
  HINT:  Use \n to represent newline.
  CONTEXT:  COPY FROM, line 59
 
  ERROR:  literal carriage return found in data
  HINT:  Use \r to represent carriage return.
  CONTEXT:  COPY FROM, line 41
 
 Really?  7.2 should dump data \r or \n as the backslash versions ...
 and does in my tests.  Can you make a reproducible test case?

The attached file produces this problem.  Note it's a blank trailing field 
that looks to be causing it.  The error for this .sql file is:

ERROR:  literal carriage return found in data
HINT:  Use \r to represent carriage return.
CONTEXT:  COPY FROM, line 2

Note that loading this into pico and saving it back out fixes the problem.

If I remove the preceding row that doesn't end in a blank field, I get a 
different error, this one:

ERROR:  end-of-copy marker does not match previous newline style
CONTEXT:  COPY FROM, line 2

  It would be nice to have such occurances echo the table / row they are
  getting the error on, or maybe just the first 20 or so characters, so
  they'd be easier to identify.
 
 That's not a bad idea.  I think it would be fairly easy now for the
 CONTEXT line of the error message to include the input data line:
 
   CONTEXT:  COPY FROM, line 41: data here 
 
 at least up through the field where the error gets thrown, and with some
 limit on the length of the data that will get echoed.  If people like
 that idea I'll see about making it happen.

Table name too, like Bruce said.  The bothersome bit is that pg_dump 
reports the line number relative to just that part of the COPY command, so 
you don't even know which table is giving the error.
--
-- PostgreSQL database dump
--

\connect - marl8412

--
-- TOC entry 2 (OID 283043147)
-- Name: people2; Type: TABLE; Schema: ; Owner: marl8412
--

CREATE TABLE people2 (
id integer,
persons text
);


--
-- Data for TOC entry 3 (OID 283043147)
-- Name: people2; Type: TABLE DATA; Schema: ; Owner: marl8412
--

COPY people2 (id, persons) FROM stdin;
59  Chance Terry--S
60  

\.



---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [HACKERS] initdb failure (was Re: [GENERAL] sequence's plpgsql)

2003-09-26 Thread scott.marlowe
On Fri, 26 Sep 2003, Bruce Momjian wrote:

 scott.marlowe wrote:
  The attached file produces this problem.  Note it's a blank trailing field 
  that looks to be causing it.  The error for this .sql file is:
  
  ERROR:  literal carriage return found in data
  HINT:  Use \r to represent carriage return.
  CONTEXT:  COPY FROM, line 2
  
  Note that loading this into pico and saving it back out fixes the problem.
  
  If I remove the preceding row that doesn't end in a blank field, I get a 
  different error, this one:
  
  ERROR:  end-of-copy marker does not match previous newline style
  CONTEXT:  COPY FROM, line 2
 
 OK, 'vi' shows it as:
   
   COPY people2 (id, persons) FROM stdin;
   59  Chance Terry--S
   60  ^M
   \.
 
 which is _exactly the case the error was supposed to catch.  Now, the
 big question is where did this dump come from?  Pg version?  OS platform?

The originating system is a RedHat 7.2 box with postgresql 7.2.x running 
on it.

The destination system is a RedHat 7.2 box with postgresql 7.4 beta3 
running on it.

The data likely came out of a (gasp, horrors) windows box.


---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [HACKERS] initdb failure (was Re: [GENERAL] sequence's plpgsql)

2003-09-26 Thread scott.marlowe
On Fri, 26 Sep 2003, Bruce Momjian wrote:

 scott.marlowe wrote:
   OK, 'vi' shows it as:
 
 COPY people2 (id, persons) FROM stdin;
 59  Chance Terry--S
 60  ^M
 \.
   
   which is _exactly the case the error was supposed to catch.  Now, the
   big question is where did this dump come from?  Pg version?  OS platform?
  
  The originating system is a RedHat 7.2 box with postgresql 7.2.x running 
  on it.
  
  The destination system is a RedHat 7.2 box with postgresql 7.4 beta3 
  running on it.
  
  The data likely came out of a (gasp, horrors) windows box.
 
 OK, try this on your 7.2:
 
   test= create table test(x text);
   CREATE TABLE
   test= insert into test values ('\r');
   INSERT 17158 1
   test= copy test to '/tmp/out';
   COPY
 
 Then 'vi' /tmp/out. It should show \r, not ^M.  Please report back.

I'm not much of a vi guy, so the out file thing didn't tell me much, but 
if I try to load it into a new table on the 7.4 beta3 box, I get:

ERROR:  end-of-copy marker does not match previous newline style
CONTEXT:  COPY FROM, line 2



---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [HACKERS] initdb failure (was Re: [GENERAL] sequence's plpgsql)

2003-09-26 Thread scott.marlowe
On Fri, 26 Sep 2003, Bruce Momjian wrote:

 scott.marlowe wrote:
   OK, 'vi' shows it as:
 
 COPY people2 (id, persons) FROM stdin;
 59  Chance Terry--S
 60  ^M
 \.
   
   which is _exactly the case the error was supposed to catch.  Now, the
   big question is where did this dump come from?  Pg version?  OS platform?
  
  The originating system is a RedHat 7.2 box with postgresql 7.2.x running 
  on it.
  
  The destination system is a RedHat 7.2 box with postgresql 7.4 beta3 
  running on it.
  
  The data likely came out of a (gasp, horrors) windows box.
 
 OK, try this on your 7.2:
 
   test= create table test(x text);
   CREATE TABLE
   test= insert into test values ('\r');
   INSERT 17158 1
   test= copy test to '/tmp/out';
   COPY
 
 Then 'vi' /tmp/out. It should show \r, not ^M.  Please report back.

Figured out vi (-b switch, man pages rock) and it's a ^M



---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [HACKERS] initdb failure (was Re: [GENERAL] sequence's plpgsql)

2003-09-26 Thread scott.marlowe
On Fri, 26 Sep 2003, Peter Eisentraut wrote:

 Bruce Momjian writes:
 
  The argument that you want a warning because you might have mixed
  newlines in the file seems less likely than this case where they are
  using a literal carriage return as a data value at the end of the line.
 
 I don't agree with that assessment.  Who actually has CRs in their data?

I do.  And so do lots of other people.


---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] initdb failure (was Re: [GENERAL] sequence's plpgsql)

2003-09-26 Thread scott.marlowe
On Fri, 26 Sep 2003, Peter Eisentraut wrote:

 scott.marlowe writes:
 
  but I get basically the same thing if I dump it to a .sql file and do:
 
  psql dbname < dbname.sql
 
 Use psql -f dbname.sql instead.

and the output is:

psql:webport.sql:803: ERROR:  function odbc_user already exists with 
same argument types
REVOKE
REVOKE
GRANT
You are now connected as new user ayousuff.
psql:webport.sql:869: ERROR:  literal newline found in data
HINT:  Use \n to represent newline.
CONTEXT:  COPY FROM, line 59
You are now connected as new user smarlowe.
You are now connected as new user ayousuff.
CREATE INDEX
CREATE INDEX
CREATE INDEX


the same.  It doesn't tell me which table in my dump caused the problem, 
and it certainly isn't line 59 of the dump file; the line number is 
relative to the COPY data for the table producing the error.


---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [HACKERS] initdb failure (was Re: [GENERAL] sequence's plpgsql)

2003-09-26 Thread scott.marlowe
On Fri, 26 Sep 2003, Tom Lane wrote:

 Peter Eisentraut [EMAIL PROTECTED] writes:
  scott.marlowe writes:
  but I get basically the same thing if I dump it to a .sql file and do:
  psql dbname dbname.sql
 
  Use psql -f dbname.sql instead.
 
 This doesn't seem like a good argument not to add more information to
 the CONTEXT line for COPY errors.  Sure, in theory the existing info
 should be sufficient, but what if the information is not coming in
 through psql?  (For instance, maybe the COPY data is being generated
 on-the-fly by some other program.)  Or what if the dump file is so large
 you can't easily edit it to determine which line number is in question?
 There are plenty of scenarios where it's not all that convenient to
 triangulate on a problem from outside information.  Minimalism isn't
 really a virtue in error reports anyway.
 
 I'm thinking maybe:
 
   CONTEXT:  COPY tablename, line 41: data ...
 
 would serve the purpose nicely.

Yeah, just having the table name and line number would be plenty for me.  
It's the lack of a table name that makes it so frustrating.  I had to 
basically dump / restore the tables one at a time to figure out which one 
was causing the error.  On a database with hundreds of tables, that could 
be painful.


---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [HACKERS] initdb failure (was Re: [GENERAL] sequence's plpgsql)

2003-09-26 Thread scott.marlowe
On Fri, 26 Sep 2003, Peter Eisentraut wrote:

 scott.marlowe writes:
 
  table name too, like Bruce said.  The bothersome bit is that in pg_dump,
  it says the line, relative to just this part of the copy command, so you
  don't even know which table is giving the error.
 
 I don't see the problem.  Can't you identify the failing command by the
 line number that psql gives you?

OK, here's the output from pg_dump while it's running

ERROR:  function odbc_user already exists with same argument types
REVOKE
REVOKE
GRANT
You are now connected as new user ayousuff.
ERROR:  literal newline found in data
HINT:  Use \n to represent newline.
CONTEXT:  COPY FROM, line 59
You are now connected as new user smarlowe.
You are now connected as new user ayousuff.
CREATE INDEX
CREATE INDEX
CREATE INDEX
CREATE INDEX
CREATE INDEX
You are now connected as new user smarlowe.
CREATE INDEX
CREATE INDEX
CREATE INDEX
CREATE INDEX

So how am I supposed to figure out which table and which row broke?  Keep 
in mind, I'm doing this:

pg_dump -h otherserver dbname|psql -h desthost dbname

but I get basically the same thing if I dump it to a .sql file and do:

psql dbname < dbname.sql




---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [HACKERS] PostgreSQL not ACID compliant?

2003-09-19 Thread scott.marlowe
On Fri, 19 Sep 2003, Joshua D. Drake wrote:

 Hello,
 
   I just read a rather disturbing post
 
   PostgreSQL does not support read uncommitted and repeatable read 
 isolation levels? If that is so... then PostgreSQL is NOT ACID compliant?
 
   What is the real deal on this?

Postgresql supports Serializable transactions, which are 100% ACID 
compliant. 

I'm pretty sure read committed mode is also ACID compliant since anything 
that would cause a failure of ACID compliance would result in a rollback, 
but I'm not sure.  
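
For the record, asking for the strict mode is straightforward (and, as I 
understand it, requesting read uncommitted or repeatable read just quietly 
gets you the next stricter level that is implemented):

    BEGIN;
    SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
    -- queries in here all see one consistent snapshot
    SELECT count(*) FROM pg_class;
    COMMIT;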


---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [HACKERS] Maximum table size

2003-09-09 Thread scott.marlowe
On Tue, 9 Sep 2003, Tom Lane wrote:

 Bruce Momjian [EMAIL PROTECTED] writes:
  Is our maximum table size limited by the maximum block number?
 
 Certainly.
 
  Is the 16TB number a hold-over from when we weren't sure block number
  was unsigned, though now we are pretty sure it is handled as unsigned
  consistently?
 
 It's a holdover.  As to how certain we are that all the
 signed-vs-unsigned bugs are fixed, who have you heard from running a
 greater-than-16Tb table?  And how often have they done CLUSTER, REINDEX,
 or even VACUUM FULL on it?  AFAIK we have zero field experience to
 justify promising that it works.
 
 We can surely fix any such bugs that get reported, but we haven't got
 any infrastructure that would find or prevent 'em.

any chance OSDL could test it?
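
For reference, the arithmetic behind the two numbers, with the default 8k 
block size:

    2^32 blocks * 8 KB/block = 32 TB   (block number treated as unsigned)
    2^31 blocks * 8 KB/block = 16 TB   (if the sign bit can't be trusted)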


---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [HACKERS] pgsql vc++|win32

2003-09-08 Thread scott.marlowe
On Mon, 8 Sep 2003, Bruce Momjian wrote:

 Dann Corbit wrote:
  Mingw uses the native Win32 libraries.
  
  Porting from a Mingw port to VC++ will be trivial compared to what we
  have now.
   
   where can I access the latest dev source code and dev docs in/from CVS?
  
  Maybe you want the Win32 page.  There are some links to it in recent
  messages here.
 
 URL is:
 
   http://momjian.postgresql.org/main/writings/pgsql/win32.html

Just a reminder to everyone looking at porting: check the licensing of the 
MS software you might be linking in.


---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [HACKERS] Seqscan in MAX(index_column)

2003-09-05 Thread scott.marlowe
Would it be possible to catch an unconstrained max(id)/min(id) and rewrite 
it as select id from table order by id [desc] limit 1 on the fly in the 
parser somewhere?

That would require fairly little code, and be transparent to the user.  
I.e. low hanging fruit.

On 5 Sep 2003, Greg Stark wrote:

 Note that that only handles min()/max() for the whole table. It doesn't handle
 the GROUP BY case; for that you need DISTINCT ON with an ORDER BY clause.
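
Spelled out, the two hand rewrites would be something like this (table and 
column names invented; the grouped form wants an index on (grp, id)):

    -- whole-table case: index-friendly form of SELECT max(id) FROM t
    SELECT id FROM t ORDER BY id DESC LIMIT 1;

    -- per-group case: max(id) for each grp, per Greg's DISTINCT ON note
    SELECT DISTINCT ON (grp) grp, id
    FROM t
    ORDER BY grp, id DESC;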


---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [HACKERS] Nasty problem in hash indexes

2003-08-29 Thread scott.marlowe
On Fri, 29 Aug 2003, Tom Lane wrote:

 scott.marlowe [EMAIL PROTECTED] writes:
  On Thu, 28 Aug 2003, Neil Conway wrote:
  On Thu, Aug 28, 2003 at 05:37:39PM -0400, Tom Lane wrote:
  shrug Who's to say?  We've found bugs in the btree logic recently,
  too.
  
  I'd rather print a loud warning when a hash index is created, but keep
  the code in the tree, than just remove it entirely.
 
  Postgresql's philosophy has always seemed to be correctness first, 
  convenience and performance second.
 
 I agree --- we either fix this bug or remove hash indexes.  There's no
 third choice.  However, I don't agree with killing hash indexes just
 because there *might* be more bugs in them.  If we have an impractical-
 to-fix bug in front of us, then it's time for harsh measures, but
 otherwise ...

Sorry if I gave the impression earlier that we should get rid of hash 
indexes because there might be more bugs.  I didn't really mean it that 
way.  I just meant that if this one was going to be a hard fix, then that 
might be one of the mitigating factors for how much work someone's going 
to be willing to put into this.  

If it's an easy fix then it's likely worth the effort to keep the hash 
indexes around.


---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] Nasty problem in hash indexes

2003-08-28 Thread scott.marlowe
On Thu, 28 Aug 2003, Tom Lane wrote:

 I've traced through the failure reported here by Markus Kräutner:
 http://archives.postgresql.org/pgsql-hackers/2003-08/msg01132.php
 
 What is happening is that as the UPDATE adds tuples (all with the same
 hash key value) to the table, the hash bucket being filled eventually
 requires more pages, and this results in a _hash_splitpage() operation
 (which is misnamed, it should be _hash_splitbucket).  By chance, the
 bucket that is selected to be split is the one containing the older key
 values, all of which get relocated to the new bucket.  So when control
 returns to the indexscan that is sourcing the tuples for the UPDATE,
 there are no tuples remaining in the bucket it is looking at, and it
 exits thinking it's done.
 
 I'm not sure how many variants on this problem there might be, but
 clearly the fundamental bug is that a hash bucket split takes no account
 of preserving the state of concurrent index scans.
 
 This is likely to be messy to fix :-(.  A brute-force solution may be
 possible by generalizing hash_adjscans so that it can update indexscans
 of our own backend for bucket-split operations; we'd have to rely on
 page locking to prevent problems against scans of other backends.  The
 locking aspect is particularly unattractive because of the possibility
 of deadlocks.  If a bucket split fails because of deadlock, we're
 probably left with a corrupt hash index.
 
 Does anyone see a better way?
 
 Does anyone want to vote to jettison the hash index code entirely?
 Personally I'm not eager to put a lot of work into fixing it.

I've had naught but bad experiences with hash indexes myself.  Maybe toss 
it and see if someone wants to reimplement it some day in the future?

If I'm reading this right, this bug means you could do:

select * from table where field in (1,2,3,4)

where you should get, say, 100 rows but might not get them all?  If so, 
then how many other bugs are lurking in the hash index code waiting to 
bite?
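
A rough sketch of the failure mode, assuming a table loaded heavily enough 
on one key that the UPDATE forces a bucket split mid-scan (names and row 
counts invented; generate_series is 7.4-era):

    CREATE TABLE t (k int, v int DEFAULT 0);
    CREATE INDEX t_k_hash ON t USING hash (k);
    INSERT INTO t SELECT 1, 0 FROM generate_series(1, 50000);
    -- the UPDATE's own index scan sources the tuples; if the bucket
    -- splits underneath it, the scan can exit early and miss rows
    UPDATE t SET v = 1 WHERE k = 1;
    SELECT count(*) FROM t WHERE v = 0;  -- nonzero = rows the UPDATE missed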


---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [HACKERS] Bumping block size to 16K on FreeBSD...

2003-08-28 Thread scott.marlowe
On Thu, 28 Aug 2003, Marc G. Fournier wrote:

 
 
 On Thu, 28 Aug 2003, Thomas Swan wrote:
 
  Has anyone looked at changing the default block size across the board
  and what the performance improvements/penalties might be?  Hardware has
  changed quite a bit over the years.
 
 I *think* that the reason for the performance improvement on FreeBSD is
 that our FS block size is 16k, instead of 8k ... are there any other
 OSs that have increased theirs?

Linux is still, as far as I know, limited to the page size of the CPU 
it's on, which for most x86 is 4k.

Windows 2k can go up to 64k block sizes.
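
For anyone wanting to try it, the recipe is roughly the following -- only a 
sketch, since the header that houses BLCKSZ has moved around between 
releases, and a cluster initdb'd with one block size can't be reused with 
another:

    # set  #define BLCKSZ 16384  in the tree's config header
    make clean && make && make install
    initdb -D /usr/local/pgsql/data   # then reload from a dump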


---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [HACKERS] Oversight?

2003-08-14 Thread scott.marlowe
On Tue, 12 Aug 2003, Christopher Kings-Lynne wrote:

   rbt=# ALTER USER rbt SET CONSTRAINTS ALL DEFERRED;
   ERROR:  syntax error at or near ALL at character 32
   rbt=# ALTER USER rbt SET CONSTRAINTS = DEFERRED;
   ERROR:  constraints is not a recognized option
 
  SET CONSTRAINTS ALL DEFERRED is a SQL-spec-mandated command syntax.
  Any similarity to Postgres' SET var = value syntax ends with the
  initial keyword.
 
 I assume his point is how do we set all of a user's constraints deferred by
 default?

But constraints aren't really per user, are they?  I can see wanting to 
set all of a table's constraints / dependent table constraints to deferred, 
but a user's constraints don't make much sense to me.
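
For comparison, the spec'd per-transaction form, which only affects 
constraints declared DEFERRABLE (table names invented):

    BEGIN;
    SET CONSTRAINTS ALL DEFERRED;
    -- deferrable FK checks now wait until commit
    UPDATE parent SET id = id + 1000;
    UPDATE child  SET parent_id = parent_id + 1000;
    COMMIT;   -- the checks fire here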


---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])


Re: [HACKERS] logging stuff

2003-08-06 Thread scott.marlowe
On Tue, 5 Aug 2003, Andrew Dunstan wrote:

 
 (Responding to the deafening silence regarding my posts a couple of days 
 ago about logging dbnames and disconnections) ;-)
 
 The dbname patch is now done. If nobody objects to the format 
 ([db:yourdbname]) I'll submit it - I did it that way to make it fairly 
 easy to split a log file based on it, although you would have to be 
 careful with multiline log entries such as query strings. It is 
 intentionally minimalist.
 
 I had some thoughts about logging disconnections - I can see a way to do 
 it via an on_proc_exit handler, I think. Then I started wondering if it 
 might be useful to log session times instead of just noting a disconnect 
 and letting the user have to calculate the time.
 
 But I won't bother with this if there's no interest. *I* have no current 
 use for it, but I could well imagine others might. (I might too in the 
 future if I wanted to debug my connection pooling app).

Actually, I'd certainly like to see it done (both dbname and disconnect).

I'd guess the deafening silence was more because of no objections than lack 
of interest.  I know for me it was.
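
With a [db:yourdbname] prefix in place, splitting should be about as simple 
as (modulo the multiline-entry caveat you mentioned):

    grep '^\[db:sales\]' postmaster.log > sales.log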


---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [HACKERS] logging stuff

2003-08-05 Thread scott.marlowe
If we're looking at this, we might want to look at how apache does it with 
its CustomLog feature.  That lets you define named log formats once, then 
assign them per virtual server.

I could see that being nice so you could create a couple of different 
custom log types, then use one for one database, and another for a 
different database.  

For people running large numbers of databases on a single cluster, it 
might be quite useful to be able to log each database differently.
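
For anyone who hasn't used it, the apache idiom I mean is roughly (address 
and file names invented):

    # httpd.conf: name a format once, then reuse it per virtual host
    LogFormat "%h %l %u %t \"%r\" %>s %b" common
    <VirtualHost 10.0.0.1>
        CustomLog logs/site1_access_log common
    </VirtualHost>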

On Tue, 5 Aug 2003, Bruce Momjian wrote:

 
 Another interesting idea, if we do a single log variable with parameters
 separated by commas, is to allow some syntax where you could specify the
 delimiter between fields, so it could be:
 
   log_line:   |, dbname, username
 
 or
 
   log_line:, dbname, username
 
 ---
 
 Josh Berkus wrote:
  Tom,
  
    I'm prepared to be guided by consensus, though.
  
   I'm not dead set on it either, just wanted to raise a flag.  Who else
   has an opinion?
  
  From my perspective, we could really use a delimiter between the fields of 
  log output that is unlikely to appear within those fields, instead of parsing 
  by character count or making dbname a special case.
  
  Or do we already do this and I'm not parsing my log right?
  
  -- 
  Josh Berkus
  Aglio Database Solutions
  San Francisco
  
  ---(end of broadcast)---
  TIP 8: explain analyze is your friend
  
 
 


---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] truncate all?

2003-08-04 Thread scott.marlowe
I agree, a plain truncate blasting a whole database is a very bad thing.

However, truncate with cascade would be quite useful.
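
Per Robert's syntax suggestion quoted below, I'd picture something like 
(purely hypothetical, nothing implemented):

    TRUNCATE TABLE orders CASCADE;   -- would also empty the tables whose
                                     -- foreign keys reference orders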

On Mon, 4 Aug 2003, Bruce Momjian wrote:

 
 Is this a TODO?  Keep in mind that if we follow the syntax of VACUUM and
 (7.4) CLUSTER, the all-database truncate would just be TRUNCATE.
 That seems very risky to me.  I wonder if the risk is worth adding this
 feature.
 
 ---
 
 Robert Treat wrote:
  On Mon, 2003-08-04 at 05:40, Shridhar Daithankar wrote:
   On 4 Aug 2003 at 11:25, Andreas wrote:
Would it be possible to implement a truncate all that purges all tuples 
from *all* tables, without taking into account any rules or triggers, but 
leaving all table structures and rules, triggers, functions, etc intact 
(sequences do not need to be reinitialized)?

As far as I understand, the 'no truncate if table is referenced' change was 
introduced to ensure database integrity. However, if the referencing table 
is truncated, too, there should be no problem as far as foreign keys are 
concerned, correct?

The rationale behind this suggestion is that in our project we need a 
*quick* way to get rid of all the tuples in all tables in order to 
accelerate the reinitialization of the database for our unit tests. This 
needs to be done fairly often, and so the quicker the unit tests run, the 
easier it will be to include many unit tests in our project, thus ensuring 
that we can develop efficiently and safely in postgresql.

If you know of some other *quick* way to truncate all tables, please let us 
know. BTW: Starting and later rolling back a transaction will not work, as 
   
   As a workaround, I would dump the schema to a file using pg_dump, drop the 
   database and recreate it from schema.
   
   Will that do for you? Unfortunately that is not transaction safe, and any 
   clients connected at that time need to disconnect first. Hopefully you can 
   do that in the test environment.
   
  
  Truncate isn't transaction safe either, so that shouldn't be a problem. 
  
  Proper syntax for his feature would seem like: 
  truncate table [cascade|restrict] ?
  
  
  Robert Treat
  -- 
  Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL
  
  
  ---(end of broadcast)---
  TIP 8: explain analyze is your friend
  
 
 


---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html


  1   2   3   >