Re: [HACKERS] Oracle/PostgreSQL incompatibilities

2003-10-04 Thread Mark Kirkwood
Finger or brain trouble; here is the correction:

NUMBER -> INTEGER when transporting schemas from Oracle to Pg.
(This needs a little care - as NUMBER in Oracle has bigger *precision* 
than INTEGER in Pg)

Thinking about this a bit more, it's probably fine to just substitute 
NUMERIC for NUMBER, but obviously INTEGER is going to perform better if 
it can be used.
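
A hedged sketch of the substitution (table and column names here are made up, not from the thread):

```sql
-- Hypothetical porting example: Oracle NUMBER columns moved to Pg.
-- NUMERIC is the safe default; INTEGER is faster where the values fit.
CREATE TABLE accounts (
    id      INTEGER,        -- Oracle NUMBER(10) -> fits in INTEGER
    balance NUMERIC(38, 0)  -- wide Oracle NUMBER -> NUMERIC keeps precision
);
```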

regards

Mark



---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [HACKERS] Dreaming About Redesigning SQL

2003-10-04 Thread Hannu Krosing
Mike Mascari wrote on Sat, 04.10.2003 at 06:32:

 
 2) The query language should be computationally complete. The user
 should be able to author complete applications in the language, rather
 than the language being a sublanguage.

To me it seems like requiring that one should be able to author complete
programs in regex.

Yes, when all you have is a hammer everything looks like a nail ;)


Hannu




Re: [HACKERS] Using backslash in query

2003-10-04 Thread Hannu Krosing
Tom Lane wrote on Fri, 03.10.2003 at 18:34:
 Michael Brusser [EMAIL PROTECTED] writes:
 But when I doubled the number of backslashes:
   SELECT smth. FROM  tbl WHERE situation LIKE '%\\\\%';
  - it actually worked fine.
 
 Backslash is special to both the string-literal parser and the LIKE code.
 So when you write the above, the pattern value that arrives at the LIKE
 processor has one less level of backslashing:
   %\\%
 and the LIKE processor interprets this as percent, a literal backslash,
 and another percent.
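
The two levels of escaping can be illustrated like this (a sketch, assuming the 2003-era backslash-escaping string literals the thread is about):

```sql
-- The string-literal parser eats one level of backslashes, so the
-- pattern that reaches the LIKE processor is %\\%, which it reads as
-- percent, one literal backslash, percent.
SELECT 'a\\b' LIKE '%\\\\%';   -- the value a\b contains a backslash
```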

Regarding the dollar-quoting discussions -

Will we be able to write the above query as 

SELECT smth. FROM  tbl WHERE WHERE situation LIKE $$%\\%$$;

in 7.4 or is \ still special there ?

if it is then one \ in regex in plpython still needs to be entered as
\\\\ which has some geek coolness but would not be what I'd prefer
to do on a regular basis.


Hannu




[HACKERS] hr translations

2003-10-04 Thread Darko Prenosil
I see that the hr translation for libpq is applied, but not the one for
the backend that I sent a few days ago.
Is there any problem with it?

Regards !




Re: [HACKERS] Quick question

2003-10-04 Thread Christopher Kings-Lynne
Thanks - that's what I ended up doing.  Hooray for CLUSTER indexes and 
constraints in phpPgAdmin CVS now!

Chris

Tom Lane wrote:

Christopher Kings-Lynne [EMAIL PROTECTED] writes:

If someone could help me with this, it would be cool.  How do I query 
the catalogs to find the underlying index for a constraint?  (Assuming 
the constraint is primary or unique)


A first approximation is that the constraint and the index have the same
name, but I suppose someone could break that association by renaming the
index.  Look in pg_depend for an internal dependency entry from the
index to the constraint if you want to be sure.
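
A sketch of that pg_depend lookup (the constraint name is hypothetical):

```sql
-- Follow the internal ('i') dependency from the index (pg_class entry)
-- to the constraint (pg_constraint entry).
SELECT idx.relname AS index_name
FROM pg_depend d
JOIN pg_class idx ON idx.oid = d.objid
JOIN pg_constraint con ON con.oid = d.refobjid
WHERE d.classid = 'pg_class'::regclass
  AND d.refclassid = 'pg_constraint'::regclass
  AND d.deptype = 'i'
  AND con.conname = 'my_table_pkey';  -- hypothetical constraint name
```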
			regards, tom lane




Re: [HACKERS] Weird locking situation

2003-10-04 Thread Christopher Kings-Lynne

 regression=# UPDATE foo SET f2=now() WHERE f1=1;
 ERROR:  deadlock detected
 DETAIL:  Process 18122 waits for AccessExclusiveLock on relation 154635 of
 database 17139; blocked by process 18133.
 Process 18133 waits for ShareLock on transaction 6330; blocked by process 18122.

 The trouble here is that GIST indexes are not concurrency-safe.
 This is on the TODO list but I fear it's not a small task ...

Wow, that's bad.  I always thought the TODO item was talking about poor 
concurrent performance - not actual concurrency errors!

Chris





Re: [HACKERS] Index/Function organized table layout (from Re:

2003-10-04 Thread Hannu Krosing
Christopher Browne wrote on Fri, 03.10.2003 at 00:57:
 [EMAIL PROTECTED] (Jean-Luc Lachance) writes:
  That's one of the draw back of MVCC.  
  I once suggested that the transaction number and other house keeping
  info be included in the index, but was told to forget it...
  It would solve once and for all the issue of seq_scan vs index_scan.
  It would simplify the aggregate problem.
 
 It would only simplify _one_ case, namely the case where someone cares
 about the cardinality of a relation, and it would do that at
 _considerable_ cost.
 
 A while back I outlined how this would have to be done, and for it to
 be done efficiently, it would be anything BUT simple.  

Could this be made a TODO item, perhaps with your attack plan?
Of course as a strictly optional feature, useful only for special situations
(see below).

I cross-post this to [HACKERS] as it seems relevant to a problem recently
discussed there.

 It would be very hairy to implement it correctly, and all this would
 cover is the single case of SELECT COUNT(*) FROM SOME_TABLE;

Not really. Just yesterday there was a discussion on [HACKERS] about
implementing btree-organized tables, which would be much less needed if
the visibility info were kept in indexes. 

 If you had a single WHERE clause attached, you would have to revert to
 walking through the tuples looking for the ones that are live and
 committed, which is true for any DBMS.

If the WHERE clause could use the same index (or any index with
visibility info) then there would be no need for walking through the
tuples in the data relation.

The typical use case cited on [HACKERS] was time series data, where
inserts are roughly in (timestamp, id) order but queries in (id, timestamp)
order. Now if the index included all relevant fields
(id, timestamp, data1, data2, ..., dataN) then the query could run on the
index only, touching just a few pages and thus vastly improving performance.
I agree that this is not something everybody needs, but when it is needed
it is needed badly.
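
A sketch of that use case (all names hypothetical); the point is that, with visibility info in the index, the query could in principle be answered from the index alone:

```sql
-- Inserts arrive in (timestamp, id) order; queries want (id, timestamp).
CREATE INDEX series_covering_idx
    ON series (id, timestamp, data1, data2);

-- With visibility info in the index this would touch only a few index
-- pages instead of heap pages scattered all over the table.
SELECT data1, data2
FROM series
WHERE id = 42
ORDER BY timestamp;
```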

 And it still begs the same question, of why the result of this query
 would be particularly meaningful to anyone.  I don't see the
 usefulness; I don't see the value of going to the considerable effort
 of fixing this purported problem.

Being able to do fast count(*) is just a side benefit.


Hannu




Re: [HACKERS] Using backslash in query

2003-10-04 Thread Andrew Dunstan

- Original Message - 
From: Hannu Krosing [EMAIL PROTECTED]

 Regarding the dollar-quoting discussions -

 Will we be able to write the above query as

 SELECT smth. FROM  tbl WHERE WHERE situation LIKE $$%\\%$$;


Yes, as I understand it (as long as you remove one of the WHEREs :-) ).
If not we need some rethinking.  Think of this as our equivalent of XML's
CDATA quoting - you don't need to escape <, >, or & inside a CDATA node.


 in 7.4 or is \ still special there ?

I don't believe so.


cheers

andrew




Re: [HACKERS] Using backslash in query

2003-10-04 Thread Andrew Dunstan
but this will be in 7.5, not 7.4, won't it?

Andrew Dunstan wrote:

- Original Message - 
From: Hannu Krosing [EMAIL PROTECTED]
 

Regarding the dollar-quoting discussions -

Will we be able to write the above query as

SELECT smth. FROM  tbl WHERE WHERE situation LIKE $$%\\%$$;

   

Yes, as I understand it (as long as you remove one of the WHEREs :-) ).
If not we need some rethinking.  Think of this as our equivalent of XML's
CDATA quoting - you don't need to escape <, >, or & inside a CDATA node.

in 7.4 or is \ still special there ?

I don't believe so.





Re: [HACKERS] Beta4 Tag'd and Bundled ...

2003-10-04 Thread Adam Witney

Hi,

Many of the regression tests are failing on my OSX 10.2.6 machine. I have
put the regression.diffs file here

http://bugs.sghms.ac.uk/downloads/regression.diffs

Has this been seen before?

Thanks

adam

 
 Check her over and let me know if there are any problems ... will do a
 full general announce tomorrow for it ...
 






Re: [HACKERS] Dreaming About Redesigning SQL

2003-10-04 Thread Christopher Browne
The world rejoiced as [EMAIL PROTECTED] (Mike Mascari) wrote:
 It's a very provocative read. At a minimum, one can learn what to
 avoid with SQL. The language looks neat on paper. Perhaps one day
 someone will provide an open source implementation. One could envision
 a D project along the same lines as the same sort of project that
 added SQL to Postgres...

I think you summed it up nicely.  The manifesto is a provocative, if
painful, read.  It is very useful at pointing out the pointy edges of
SQL that it might be wise to avoid.

I'm not thrilled with the language; I think they have made a mistake
in trying to make it too abbreviation-oriented.  They keep operator
names short, to a fault.

As you say, the most likely way for a D to emerge in a popular way
would be by someone adding the language to an existing database
system.

There is a project out on SourceForge for a D implementation, called
Duro.  It takes the opposite approach; the operators are all defined
as C functions, so you write all your code in C.  It uses a data store
built atop Berkeley DB.

I think an implementor would be better off using an SQL database
underneath, and using their code layer in between to accomplish the
divorce from the aspects of SQL that they disapprove of.  Sort of
like MaVerick, a Pick implementation in Java that uses a DBMS such as
PostgreSQL as the underlying data store.

You do a proof of concept by building something that translates D
requests to SQL requests.  And _then_ get a project going to put a D
parser in as an alternative to the SQL parser.  (Yes, that
oversimplifies matters.  Tough...)
-- 
let name=cbbrowne and tld=ntlug.org in name ^ @ ^ tld;;
http://www3.sympatico.ca/cbbrowne/rdbms.html
Rules of the Evil Overlord #81. If I am fighting with the hero atop a
moving platform, have disarmed him, and am about to finish him off and
he glances behind  me and drops flat, I too will  drop flat instead of
quizzically turning around to find out what he saw.
http://www.eviloverlord.com/



Re: [HACKERS] Thoughts on maintaining 7.3

2003-10-04 Thread Tom Lane
Bruce Momjian [EMAIL PROTECTED] writes:
 Do we move empty index pages to the end before truncation during vacuum
 full?

No.  You'd be better off using REINDEX for that, I think.  IIRC we have
speculated about making VAC FULL fix the indexes via REINDEX rather than
indexbulkdelete.

regards, tom lane



Re: [HACKERS] Using backslash in query

2003-10-04 Thread Tom Lane
Hannu Krosing [EMAIL PROTECTED] writes:
 Regarding the dollar-quoting discussions -

 Will we be able to write the above query as 
 SELECT smth. FROM  tbl WHERE WHERE situation LIKE $$%\\%$$;
 in 7.4 or is \ still special there ?

We were discussing that for 7.5 not 7.4.  But yeah, it would work that
way because \ won't be treated specially in dollar-quoted literals.
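
Concretely, under the proposed behavior (a sketch, reusing the names from upthread):

```sql
-- Inside $$...$$ backslashes are not special, so the LIKE processor
-- receives %\\% exactly as written: percent, backslash, percent.
SELECT smth FROM tbl WHERE situation LIKE $$%\\%$$;
```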

regards, tom lane



Re: [HACKERS] Beta4 Tag'd and Bundled ...

2003-10-04 Thread Tom Lane
Adam Witney [EMAIL PROTECTED] writes:
 Many of the regression tests are failing on my OSX 10.2.6 machine. I have
 put the regression.diffs file here
 http://bugs.sghms.ac.uk/downloads/regression.diffs

Seems to be all

! psql: FATAL:  sorry, too many clients already

What did initdb set your max_connections to?

regards, tom lane



Re: [HACKERS] Beta4 Tag'd and Bundled ...

2003-10-04 Thread Adam Witney
On 4/10/03 5:16 pm, Tom Lane [EMAIL PROTECTED] wrote:

 Adam Witney [EMAIL PROTECTED] writes:
 Many of the regression tests are failing on my OSX 10.2.6 machine. I have
 put the regression.diffs file here
 http://bugs.sghms.ac.uk/downloads/regression.diffs
 
 Seems to be all
 
 ! psql: FATAL:  sorry, too many clients already
 
 What did initdb set your max_connections to?
 
 regards, tom lane

From src/test/regress/log/initdb.log

selecting default max_connections... 10

Is this the info you are asking for?

adam




Re: [HACKERS] timestamp.c is broken (probably by ecpg) in 7.4

2003-10-04 Thread Bruno Wolff III
On Fri, Oct 03, 2003 at 08:57:49 -0500,
  Bruno Wolff III [EMAIL PROTECTED] wrote:
 I get the error message below when trying to 'make' current cvs:
 gcc -O2 -Wall -Wmissing-prototypes -Wmissing-declarations -fpic 
 -I../../../../src/interfaces/ecpg/include -I../../../../src/include/utils 
 -I../../../../src/include -D_GNU_SOURCE   -g  -c timestamp.c -o timestamp.o
 In file included from ../../../../src/interfaces/ecpg/include/compatlib.h:7,
  from ../../../../src/interfaces/ecpg/include/datetime.h:4,
  from timestamp.c:14:
 ../../../../src/interfaces/ecpg/include/ecpglib.h:9: libpq-fe.h: No such file or 
 directory
 make[4]: *** [timestamp.o] Error 1
 
 My config is:
 ./configure --prefix=/usr/local/pgsql --enable-integer-datetimes --with-openssl 
 --with-pgport=5433
 
 This is on a redhat 6.1 system.

It looks like this has been fixed in the current cvs.



Re: [HACKERS] Beta4 Tag'd and Bundled ...

2003-10-04 Thread Tom Lane
Adam Witney [EMAIL PROTECTED] writes:
 On 4/10/03 5:16 pm, Tom Lane [EMAIL PROTECTED] wrote:
 What did initdb set your max_connections to?

 From src/test/regress/log/initdb.log
 selecting default max_connections... 10

Hm.  The parallel regression tests require at least 20.  I deliberately
allowed initdb to select values as small as 10 on the theory that
installing and not being able to run the parallel regression tests is
better than not installing at all.  Does anyone want to argue the
opposite?  Perhaps we just need a note in the documentation --- but if
so, where?

regards, tom lane



Re: [HACKERS] Beta4 Tag'd and Bundled ...

2003-10-04 Thread Rod Taylor
 Hm.  The parallel regression tests require at least 20.  I deliberately
 allowed initdb to select values as small as 10 on the theory that
 installing and not being able to run the parallel regression tests is
 better than not installing at all.  Does anyone want to argue the
 opposite?  Perhaps we just need a note in the documentation --- but if
 so, where?

Another alternative is to have the regression suite discover the max
connections, and defer tests when there are (max_connections - 1)
connections already.
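
The discovery step could presumably be as simple as reading the live setting before scheduling the parallel groups:

```sql
-- A regression driver could read this and cap the parallel group size.
SHOW max_connections;
```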




Re: [HACKERS] Beta4 Tag'd and Bundled ...

2003-10-04 Thread Tom Lane
Rod Taylor [EMAIL PROTECTED] writes:
 Hm.  The parallel regression tests require at least 20.  I deliberately
 allowed initdb to select values as small as 10 on the theory that
 installing and not being able to run the parallel regression tests is
 better than not installing at all.

 Another alternative is to have the regression suite discover the max
 connections, and defer tests when there are (max_connections - 1)
 connections already.

Maybe.  After thinking a bit more, I seem to recall one of the reasons
for having wide parallel sets in the regression tests is that we
*wanted* to consider inability to support a dozen or two connections as
a serious problem.  If we still believe that old logic, then indeed the
right thing to do is for initdb to insist on setting max_connections no
smaller than 20.  (Pre-7.4, the default setting was generally 32, so
this is still more flexible than before from a portability standpoint.)

regards, tom lane



Re: [HACKERS] pg_dump bug in 7.4

2003-10-04 Thread Bruno Wolff III
The following is still a problem in current cvs (as of 2 hours ago).
Normally I wouldn't bug people about this again this soon, but with talk of
a release candidate next week I wanted to make sure that it wasn't forgotten.

On Sun, Sep 28, 2003 at 20:14:03 -0500,
  Bruno Wolff III [EMAIL PROTECTED] wrote:
 If you have a check constraint that tests if a boolean column is not
 false by just using the column name, pg_dump doesn't include parens
 around the check constraint which causes a syntax error when reloading
 the database.
 
 Using the following to create a table:
 create table test (col1 boolean constraint test check (col1));
 
 pg_dump -c produced the following:
 --
 -- PostgreSQL database dump
 --
 
 SET SESSION AUTHORIZATION 'postgres';
 
 SET SESSION AUTHORIZATION 'bruno';
 
 SET search_path = public, pg_catalog;
 
 DROP TABLE public.test;
 SET SESSION AUTHORIZATION 'postgres';
 
 --
 -- TOC entry 3 (OID 2200)
 -- Name: public; Type: ACL; Schema: -; Owner: postgres
 --
 
 REVOKE ALL ON SCHEMA public FROM PUBLIC;
 GRANT ALL ON SCHEMA public TO PUBLIC;
 
 
 SET SESSION AUTHORIZATION 'bruno';
 
 --
 -- TOC entry 4 (OID 605016)
 -- Name: test; Type: TABLE; Schema: public; Owner: bruno
 --
 
 CREATE TABLE test (
 col1 boolean,
 CONSTRAINT test CHECK col1
 );
 
 
 --
 -- Data for TOC entry 5 (OID 605016)
 -- Name: test; Type: TABLE DATA; Schema: public; Owner: bruno
 --
 
 COPY test (col1) FROM stdin;
 \.
 
 
 SET SESSION AUTHORIZATION 'postgres';
 
 --
 -- TOC entry 2 (OID 2200)
 -- Name: SCHEMA public; Type: COMMENT; Schema: -; Owner: postgres
 --
 
 COMMENT ON SCHEMA public IS 'Standard public schema';
 
 



Re: [HACKERS] pg_dump bug in 7.4

2003-10-04 Thread Robert Creager
When grilled further on (Sat, 4 Oct 2003 12:50:27 -0500),
Bruno Wolff III [EMAIL PROTECTED] confessed:

 The following is still a problem in current cvs (as of 2 hours ago).
 Normally I wouldn't bug people about this again this soon, but with talk of
 a release candidate next week I wanted to make sure that it wasn't forgotten.
 

I just hit the same problem (with 7.4b4).

CREATE TABLE processing (
index integer NOT NULL,
time timestamp with time zone DEFAULT now() NOT NULL,
archname_index integer,
CONSTRAINT archname_index CHECK NULL::boolean
);


Cheers,
Rob

-- 
 11:49:30 up 64 days,  4:19,  4 users,  load average: 4.37, 3.83, 3.53




Re: [HACKERS] hr translations

2003-10-04 Thread Peter Eisentraut
Darko Prenosil writes:

 I see that the hr translation for libpq is applied, but not the one for
 the backend that I sent a few days ago.
 Is there any problem with it?

It might have gotten caught in my mail filters.  I've taken it from the
archives and applied it now.

-- 
Peter Eisentraut   [EMAIL PROTECTED]




Re: [HACKERS] pg_dump bug in 7.4

2003-10-04 Thread Bruce Momjian

I have added this to the 7.4 open items list:

Fix pg_dump of CREATE TABLE test (col1 BOOLEAN CONSTRAINT test
CHECK (col1))


---

Bruno Wolff III wrote:
 The following is still a problem in current cvs (as of 2 hours ago).
 Normally I wouldn't bug people about this again this soon, but with talk of
 a release candidate next week I wanted to make sure that it wasn't forgotten.
 
 On Sun, Sep 28, 2003 at 20:14:03 -0500,
   Bruno Wolff III [EMAIL PROTECTED] wrote:
  If you have a check constraint that tests if a boolean column is not
  false by just using the column name, pg_dump doesn't include parens
  around the check constraint which causes a syntax error when reloading
  the database.
  
  Using the following to create a table:
  create table test (col1 boolean constraint test check (col1));
  
  pg_dump -c produced the following:
  --
  -- PostgreSQL database dump
  --
  
  SET SESSION AUTHORIZATION 'postgres';
  
  SET SESSION AUTHORIZATION 'bruno';
  
  SET search_path = public, pg_catalog;
  
  DROP TABLE public.test;
  SET SESSION AUTHORIZATION 'postgres';
  
  --
  -- TOC entry 3 (OID 2200)
  -- Name: public; Type: ACL; Schema: -; Owner: postgres
  --
  
  REVOKE ALL ON SCHEMA public FROM PUBLIC;
  GRANT ALL ON SCHEMA public TO PUBLIC;
  
  
  SET SESSION AUTHORIZATION 'bruno';
  
  --
  -- TOC entry 4 (OID 605016)
  -- Name: test; Type: TABLE; Schema: public; Owner: bruno
  --
  
  CREATE TABLE test (
  col1 boolean,
  CONSTRAINT test CHECK col1
  );
  
  
  --
  -- Data for TOC entry 5 (OID 605016)
  -- Name: test; Type: TABLE DATA; Schema: public; Owner: bruno
  --
  
  COPY test (col1) FROM stdin;
  \.
  
  
  SET SESSION AUTHORIZATION 'postgres';
  
  --
  -- TOC entry 2 (OID 2200)
  -- Name: SCHEMA public; Type: COMMENT; Schema: -; Owner: postgres
  --
  
  COMMENT ON SCHEMA public IS 'Standard public schema';
  
  
 
 

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073



Re: [HACKERS] pg_dump bug in 7.4

2003-10-04 Thread Tom Lane
Bruno Wolff III [EMAIL PROTECTED] writes:
 The following is still a problem in current cvs (as of 2 hours ago).

Not any more ;-)

regards, tom lane



Re: [HACKERS] pg_dump bug in 7.4

2003-10-04 Thread Tom Lane
Bruce Momjian [EMAIL PROTECTED] writes:
 I have added this to the 7.4 open items list:

   Fix pg_dump of CREATE TABLE test (col1 BOOLEAN CONSTRAINT test
   CHECK (col1))

Fixed now.

regards, tom lane



max_connections/shared_buffers (was Re: [HACKERS] Beta4 Tag'd and Bundled ...)

2003-10-04 Thread Tom Lane
I said:
 Hm.  The parallel regression tests require at least 20.  I deliberately
 allowed initdb to select values as small as 10 on the theory that
 installing and not being able to run the parallel regression tests is
 better than not installing at all.

Actually, after trying to reproduce the problem on my own OS X machine,
I realize that it's a little more subtle than I thought.  The Darwin
port does not use SysV semaphores at all (it uses Posix semaphores) and
so the resource restriction you're hitting must actually be the
max-shared-memory limit, rather than number-of-semaphores as I first
assumed.  max_connections does have an impact on shared memory size,
though not as large as shared_buffers has.

Therefore, the real problem is that initdb initially probes for a
workable number of shared_buffers while using max_connections=5, and
then it selects max_connections while holding shared_buffers constant.
I was thinking that a small max_connections would prevent us from
mistakenly choosing tiny shared_buffers when the system's real
restriction is on number of semaphores.  But what we seem to have here
is that the selected number of buffers was just a little under the
system's max-shared-memory limit, so that max_connections could be
raised to 10 but not to 20.

(BTW, on my OS X machine, with out-of-the-box configuration, initdb
selects shared_buffers 400 and max_connections 20.  I'm guessing that
you had either a nondefault shared memory limit, or some other process
using shared memory.)

Perhaps we should avoid all attempts at cuteness and just run the
initial probes for workable shared_buffers with max_connections=20,
as well as making that be the minimum max_connections value probed for.

Anyone see a better way?

regards, tom lane



Re: [HACKERS] Index/Function organized table layout

2003-10-04 Thread James Rogers
On 10/2/03 11:34 PM, Hannu Krosing [EMAIL PROTECTED] wrote:
 James Rogers wrote on Thu, 02.10.2003 at 23:44:
 Not exactly. What you are describing is more akin to partitioning or
 hash-organized tables i.e. sorting insert/update tuples to various pages
 according to some hash function.
 
 What I actually thought I was describing is how CLUSTER should work in a
 postgres flavour of MVCC storage ;). Not the CLUSTER command, but the
 whole feature.


Yeah, I can see this.  Clustering doesn't really imply ordering, just
tuple proximity in the heap.  Loosely implementing it by default (i.e. only
grouping a tuple if it is cheap to do so at insert/update time) on a table's
primary key might give a measurable query performance improvement in the
typical case without impacting the average performance of queries that use
other indexes.

Even if the tuples were not ordered in the heap, tuple proximity in the heap
would solve a significant percentage of the performance issue that
index-organized tables solve i.e. greatly improved cache efficiency.

 
 AFAICS we could resolve this problem (querying indexes only) by keeping
 a copy of visibility info (tmin,tmax,...) in index tuples. This would
 make index updates bigger and thus slower, so this should be optional.


It wouldn't be a feature you would want to use anyway if actually indexing a
table.  But yes, adding the necessary support to make an index page
masquerade as a proper heap page would be most of the effort.  That and
making the SQL execution efficiently make use of the fact that the index and
the table are the same relation with two different interfaces, such that you
don't have to modify or access both when using either one.

 
 If you then put all fields in primary key, then the main table could be
 dropped. If there is no data table then no other indexes would then be
 allowed, or they must be double-indexes referencing the primary key,
 not tuple and thus even bigger ...


Putting an index on an index is pretty horrible.  But quite frankly, if you
actually require a second index on a table, then B-Tree organized tables are
not what you need.

 
 But if we had clustered the table on (id, timestamp), then the data
 would be in right order for queries, if cluster worked well.


Right, and it doesn't even have to be in perfect index order really, as what
I'm really trying to do is significantly decrease the amount of buffers
required for a typical query.  Having to order the tuples later is a small
matter.

 
 Yeah, index-organized tables seems exact fit for your problem, but then
 my abstract idea of what clustering should do is exactly that - keep
 tuples in roughly the same order as an index ;)


It would be a good approximation of what index-organizing aims to
accomplish.

 
 So what really is needed is a smart tuple-placer which can keep tuples
 that are close (as defined by index) together in a small number of
 pages. These pages themselves need not be continuous, they can be
 sprinkled around in the whole data table, but they need to stay clusters
 of index-close tuples.


More or less, yes.

Cheers,

-James Rogers
 [EMAIL PROTECTED]




Re: [HACKERS] initdb

2003-10-04 Thread Tom Lane
Andrew Dunstan [EMAIL PROTECTED] writes:
 I now have a C implementation of initdb, ...
 There's a little work still to go (see below), but I'd appreciate some 
 eyeballs on it to see if I have made any major booboos, or could have 
 done things better. What's the best way to proceed?

Put it on a web page and post the link, or if you don't have a website,
send it to pgsql-patches (clearly marked as not ready to apply).

regards, tom lane



COUNT(*) again (was Re: [HACKERS] Index/Function organized table layout)

2003-10-04 Thread Tom Lane
Hannu Krosing [EMAIL PROTECTED] writes:
 Christopher Browne wrote on Fri, 03.10.2003 at 00:57:
 A while back I outlined how this would have to be done, and for it to
 be done efficiently, it would be anything BUT simple.  

 Could this be made a TODO item, perhaps with your attack plan. 

If I recall that discussion correctly, no one including Christopher
thought the attack plan was actually reasonable.

What this keeps coming down to is that an optimization that helps only
COUNT(*)-of-one-table-with-no-WHERE-clause would be too expensive in
development and maintenance effort to justify its existence.

At least if you insist on an exact, MVCC-correct answer.  So far as I've
seen, the actual use cases for unqualified COUNT(*) could be handled
equally well by an approximate answer.  What we should be doing rather
than wasting large amounts of time trying to devise exact solutions is
telling people to look at pg_class.reltuples for approximate answers.
We could also be looking at beefing up support for that approach ---
maybe provide some syntactic sugar for the lookup, maybe see if we can
update reltuples in more places than we do now, make sure that the
autovacuum daemon includes "keep reltuples accurate" as one of its
design goals, etc etc.
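
The approximate-answer lookup is just this (a sketch; the table name is hypothetical):

```sql
-- Planner's row-count estimate, maintained by VACUUM/ANALYZE;
-- cheap, but only approximately current.
SELECT reltuples FROM pg_class WHERE relname = 'my_table';
```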

regards, tom lane



Re: [PERFORM] COUNT(*) again (was Re: [HACKERS] Index/Function

2003-10-04 Thread Hannu Krosing
Tom Lane wrote on Sat, 04.10.2003 at 19:07:
 Hannu Krosing [EMAIL PROTECTED] writes:
  Christopher Browne wrote on Fri, 03.10.2003 at 00:57:
  A while back I outlined how this would have to be done, and for it to
  be done efficiently, it would be anything BUT simple.  
 
  Could this be made a TODO item, perhaps with your attack plan. 
 
 If I recall that discussion correctly, no one including Christopher
 thought the attack plan was actually reasonable.
 
 What this keeps coming down to is that an optimization that helps only
 COUNT(*)-of-one-table-with-no-WHERE-clause would be too expensive in
 development and maintenance effort to justify its existence.

Please read further in my email ;)

The point I was trying to make was that faster count(*) is just a side
effect. If we could (conditionally) keep visibility info in indexes,
then this would also solve the much more tricky problem of
index-structured tables.

Count(*) is *not* the only query that could benefit from not needing to
go to the actual data table for visibility info. The much more needed
case would be the inverse time series type of queries, which would
otherwise trash cache pages badly.


Hannu


---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: max_connections/shared_buffers (was Re: [HACKERS] Beta4 Tag'd and Bundled ...)

2003-10-04 Thread Greg Stark

Tom Lane [EMAIL PROTECTED] writes:

 (BTW, on my OS X machine, with out-of-the-box configuration, initdb
 selects shared_buffers 400 and max_connections 20.  I'm guessing that
 you had either a nondefault shared memory limit, or some other process
 using shared memory.)

This points out another issue with this approach of probing for the maximum
shared memory: there might be another program using shared memory when the
probe is done -- or worse, when the database is started but *not* when the
probe is run.

Perhaps shared_buffers should only be set to 50% of the maximum size
probed? That would (a) give postgres a decent chance of starting even if
some other program grabs some of the shared memory between initdb and
database start, and (b) leave enough memory for a reasonable
max_connections.

 Anyone see a better way?

Switch everything to mmap and pthreads and dump all this antiquated SysV IPC
and semaphore junk? *DUCK*

-- 
greg


---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [HACKERS] Index/Function organized table layout

2003-10-04 Thread Greg Stark

James Rogers [EMAIL PROTECTED] writes:

 On 10/2/03 11:34 PM, Hannu Krosing [EMAIL PROTECTED] wrote:
  
  What I actually thought I was describing is how CLUSTER should work in a
  postgres flavour of MVCC storage ;). Not the CLUSTER command, but the
  whole feature.
 
 
 Yeah, I can see this.  Clustering doesn't really imply ordering, just
 tuple proximity in the heap.  

So I'm a bit confused about the term Clustering. It seems Postgres uses it
to mean simply ordering the tuple storage within an otherwise normal table.

However in other databases it seems to mean something more complex. I've never
used it in Oracle, but from what I read it seems Oracle thinks clustering
means storing the tuples for one table in the heap for *another* table
entirely.

The typical usage scenario envisioned is to store the tuples for child records
near the parent record in a related table. Ie, say you have a users table with
an ACL list, each user has possibly many ACL entries on the ACL list. You
could cluster the ACL entry list onto the users table so that the entries for
a given user would be stored *in the users table heap* near the record for
that user.
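A hedged sketch of the Oracle feature being described (syntax from memory of Oracle's DDL; all object names are hypothetical):

```sql
-- Oracle-style table cluster: rows of both tables sharing a user_id
-- are physically stored together in the same cluster blocks.
CREATE CLUSTER user_acl_cluster (user_id NUMBER);

CREATE TABLE users (
    user_id NUMBER PRIMARY KEY,
    name    VARCHAR2(100)
) CLUSTER user_acl_cluster (user_id);

CREATE TABLE acl_entries (
    user_id NUMBER REFERENCES users,
    perm    VARCHAR2(30)
) CLUSTER user_acl_cluster (user_id);

-- Oracle requires a cluster index before rows can be inserted:
CREATE INDEX user_acl_cluster_idx ON CLUSTER user_acl_cluster;
```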

I've never seen anyone use this feature, and I never seriously considered it
myself. It sort of has the feel of an antiquated feature that traded too much
flexibility and abstraction for raw performance on very slow disk hardware. 
However, I wonder if the nested-tables feature uses it under the hood.
It seems they would both be useful for the same types of tables.

I'm not sure what this means for Postgres. I'm not sure if Postgres should use
a different name to avoid confusion and possibly to leave room in the future
for the possibility of supporting something like this. Or perhaps something
like this would be useful for Postgres now or in the near future? Or perhaps
the consensus is as I said, that this is an old idea that no longer gets any
respect and postgres should just pretend it doesn't exist?

-- 
greg


---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: max_connections/shared_buffers (was Re: [HACKERS] Beta4 Tag'd and Bundled ...)

2003-10-04 Thread Tom Lane
Greg Stark [EMAIL PROTECTED] writes:
 Perhaps the shared_buffers should only be set to 50% of the maximum size
 probed?

I think it's reasonable to expect the DBA to make any adjustments needed
for changes in environment.  Partly this is because I don't see any
defensible way to do otherwise --- your 50% slop figure is without
foundation in terms of what might really be going on --- and partly
because we'd be handicapping ourselves unnecessarily if there *aren't*
any subsequent changes in environment.

On machines where shared memory actually gets used for anything by
default, I think that the default limits are likely to be fairly sane.
If shared memory is tight, then very likely Postgres is the only thing
on the machine that's going to want it.  We might as well use what we
can get.

regards, tom lane

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])


Re: [HACKERS] Index/Function organized table layout

2003-10-04 Thread Tom Lane
Greg Stark [EMAIL PROTECTED] writes:
 So I'm a bit confused about the term Clustering. It seems Postgres uses it
 to mean simply ordering the tuple storage within an otherwise normal table.

 However in other databases it seems to mean something more complex.

My take is that clustering means not only placing related tuples near
each other, but taking steps to preserve that organization over time.
PG is really misusing the term because our current CLUSTER command does
the first but not the second.

If tuple insertions were to try to preserve the heap ordering induced
by the latest CLUSTER command, then I think we'd have something closer
to what is usually meant by the term.

 I've never used it in Oracle, but from what I read it seems Oracle
 thinks clustering means storing the tuples for one table in the heap
 for *another* table entirely.

I think that's an implementation detail rather than the fundamental
meaning.

regards, tom lane

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: max_connections/shared_buffers (was Re: [HACKERS] Beta4 Tag'd

2003-10-04 Thread Joshua D. Drake


  Anyone see a better way?

 Switch everything to mmap and pthreads and dump all this antiquated SysV IPC
 and semaphore junk? *DUCK*

You are a brave soul. I salute you.

Sincerely,

Joshua D. Drake



--
Command Prompt, Inc., home of Mammoth PostgreSQL - S/ODBC - S/JDBC
Postgresql support, programming, shared hosting and dedicated hosting.
+1-503-222-2783 - [EMAIL PROTECTED] - http://www.commandprompt.com
PostgreSQL.Org - Editor-N-Chief - http://www.postgresql.org


---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
   (send unregister YourEmailAddressHere to [EMAIL PROTECTED])


Re: [PERFORM] COUNT(*) again (was Re: [HACKERS] Index/Function organized table layout)

2003-10-04 Thread Tom Lane
Hannu Krosing [EMAIL PROTECTED] writes:
 The point I was trying to make was that faster count(*)'s is just a side
 effect. If we could (conditionally) keep visibility info in indexes,

I think that's not happening, conditionally or otherwise.  The atomicity
problems alone are sufficient reason why not, even before you look at
the performance issues.

regards, tom lane

---(end of broadcast)---
TIP 8: explain analyze is your friend


[HACKERS] Uses for Index/Function organizing

2003-10-04 Thread James Rogers
On 10/4/03 2:00 AM, Hannu Krosing [EMAIL PROTECTED] wrote:
 
 If the WHERE clause could use the same index (or any index with
 visibility info) then there would be no need for walking through the
 tuples in data relation.
 
 the typical use case cited on [HACKERS] was time series data, where
 inserts are roughly in (timestamp, id) order but queries in (id, timestamp)
 order. Now if the index would include all relevant fields
 (id,timestamp,data1,data2,...,dataN) then the query could run on index
 only touching just a few pages and thus vastly improving performance. I
 agree that this is not something everybody needs, but when it is needed
 it is needed bad.
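The quoted layout might be sketched as follows (table and column names are hypothetical; note that the executor today still visits the heap for visibility checks, which is exactly the limitation under discussion):

```sql
-- Time-series table: inserts arrive in roughly (ts, id) order,
-- but queries want one id's history in timestamp order.
CREATE TABLE readings (
    id    integer,
    ts    timestamp,
    data1 float8,
    data2 float8
);

-- Index in query order, carrying all referenced columns, so that if
-- visibility info lived in the index the heap need never be touched.
CREATE INDEX readings_by_id_ts ON readings (id, ts, data1, data2);

SELECT ts, data1, data2
FROM readings
WHERE id = 42
ORDER BY ts;
```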



I would add that automatically index-organizing tuples isn't just useful for
time-series data (though it is a good example), but can be used to
substantially improve the query performance of any really large table in a
number of different and not always direct ways.  Once working sets routinely
exceed the size of physical RAM, buffer access/utilization efficiency often
becomes the center of performance tuning, but not one that many people know
much about.

One of the less direct ways of using btree-organized tables for improving
scalability is to materialize table indexes of tables that *shouldn't* be
btree-organized.  Not only can you turn tables into indexes, but you can
also turn indexes into tables, which can have advantages in some cases.


For example, I did some scalability consulting at a well-known movie rental
company with some very large Oracle databases running on big Sun boxen.  One
of the biggest problems was that their rental history table, which had a
detailed record of every movie ever rented by every customer, had grown so
large that the performance was getting painfully slow.  To make matters
worse, it and a few related tables had high concurrent usage, a mixture of
many performance-sensitive queries grabbing windows of a customer's history
plus a few broader OLAP queries which were not time sensitive.  Everything
was technically optimized in a relational and basic configuration sense, and
the database guys at the company were at a loss on how to fix this problem.
Performance of all queries was essentially bound by how fast pages could be
moved between the disk and buffers.

Issue #1:  The history rows had quite a lot of columns and the OLAP
processes used non-primary indexes, so the table was not particularly
suitable for btree-organizing.

Issue #2:  Partitioning was not an option because it would have exceeded
certain limits in Oracle (at that time, I don't know if that has changed).

Issue #3:  Although customer histories were being constantly queried, the
data needed most was really an index view of the customer's history, not
the details of the history itself.


The solution I came up with was to use a synced btree-organized partial
clone of the main history table that only contained a small number of key
columns that mattered for generating customer history indexes in the
applications that used them.  While this substantially increased the disk
space footprint for the same data (since we were cloning it), it greatly
reduced the total number of cache misses for the typical query, only
fetching the full history row pages when actually needed.  In other words,
basically concentrating more buffer traffic into a smaller number of page
buffers.  What we had was an exceedingly active but relatively compact
materialized index of the history table that could essentially stay resident
in RAM, and a much less active history table+indexes that while less likely
to be buffered than before, had pages accessed at such a reduced frequency
that there was a huge net performance gain because disk access plummeted.
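In PostgreSQL terms, the skinny "materialized index" clone might look something like this (all names are hypothetical, and the sync trigger is a sketch rather than the actual mechanism used on that system):

```sql
-- Skinny, btree-friendly clone of the wide history table: just the
-- key columns the applications need to build a customer's history index.
CREATE TABLE rental_history_idx (
    customer_id integer,
    rented_at   timestamp,
    movie_id    integer
);
CREATE INDEX rental_history_idx_key
    ON rental_history_idx (customer_id, rented_at);

-- Keep the clone in sync on insert (deletes/updates handled similarly).
CREATE FUNCTION sync_history_idx() RETURNS trigger AS '
BEGIN
    INSERT INTO rental_history_idx
        VALUES (NEW.customer_id, NEW.rented_at, NEW.movie_id);
    RETURN NEW;
END' LANGUAGE plpgsql;

CREATE TRIGGER rental_history_sync AFTER INSERT ON rental_history
    FOR EACH ROW EXECUTE PROCEDURE sync_history_idx();
```

Queries that only need the window of a customer's history then hit the compact clone, touching far fewer pages than the wide table would.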

Average performance improvement for the time sensitive queries: 50-70x

So btree-organized tables can do more than make tables behave like indexes.
They can also make indexes behave like tables.  Both are very useful in some
cases when your working set exceeds the physical buffer space.  For smaller
databases this has much less utility and users need to understand the
limitations, nonetheless when tables and databases get really big it becomes
an important tool in the tool belt.

Cheers,

-James Rogers
 [EMAIL PROTECTED]


---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [pgsql-hackers-win32] [HACKERS] initdb

2003-10-04 Thread Andrew Dunstan

Excellent idea.

Here's the URL: http://www.dunslane.net/~andrew/Initdb_In_C.html

cheers

andrew

- Original Message - 
From: Tom Lane [EMAIL PROTECTED]

 Andrew Dunstan [EMAIL PROTECTED] writes:
  I now have a C implementation of initdb, ...
  There's a little work still to go (see below), but I'd appreciate some 
  eyeballs on it to see if I have made any major booboos, or could have 
  done things better. What's the best way to proceed?
 
 Put it on a web page and post the link, or if you don't have a website,
 send it to pgsql-patches (clearly marked as not ready to apply).
 


---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] Thoughts on maintaining 7.3

2003-10-04 Thread Alvaro Herrera
On Sat, Oct 04, 2003 at 11:41:17AM -0400, Tom Lane wrote:
 Bruce Momjian [EMAIL PROTECTED] writes:
  Do we move empty index pages to the end before truncation during vacuum
  full?
 
 No.  You'd be better off using REINDEX for that, I think.  IIRC we have
 speculated about making VAC FULL fix the indexes via REINDEX rather than
 indexbulkdelete.

I can't agree with that idea.  Imagine having to VACUUM FULL a huge
table.  Not only will it take all the time required to do the VACUUM on
the heap itself, it will also have to rebuild all indexes from scratch.  I
think there are scenarios where the REINDEX will be much worse, say when
there are not too many deleted tuples (but in that case, why is the user
doing VACUUM FULL in the first place?).  Of course there are also
scenarios where the opposite is true.

I wonder if VACUUM FULL could choose what method to use based on some
statistics.

-- 
Alvaro Herrera (alvherre[a]dcc.uchile.cl)
Vivir y dejar de vivir son soluciones imaginarias.
La existencia está en otra parte (Andre Breton)

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])


Re: [HACKERS] Thoughts on maintaining 7.3

2003-10-04 Thread Bruce Momjian
Tom Lane wrote:
 Bruce Momjian [EMAIL PROTECTED] writes:
  Do we move empty index pages to the end before truncation during vacuum
  full?
 
 No.  You'd be better off using REINDEX for that, I think.  IIRC we have
 speculated about making VAC FULL fix the indexes via REINDEX rather than
 indexbulkdelete.

I guess my point is that if you forget to run regular vacuum for a
month, then realize the problem, you can just do a VACUUM FULL and the
heap is back to a perfect state as if you had been running regular
vacuum all along.  That is not true of indexes.  It would be nice if it
would.

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [HACKERS] Thoughts on maintaining 7.3

2003-10-04 Thread Alvaro Herrera
On Sat, Oct 04, 2003 at 11:17:09PM -0400, Bruce Momjian wrote:
 Tom Lane wrote:
  Bruce Momjian [EMAIL PROTECTED] writes:
   Do we move empty index pages to the end before truncation during vacuum
   full?
  
  No.  You'd be better off using REINDEX for that, I think.  IIRC we have
  speculated about making VAC FULL fix the indexes via REINDEX rather than
  indexbulkdelete.
 
 I guess my point is that if you forget to run regular vacuum for a
 month, then realize the problem, you can just do a VACUUM FULL and the
 heap is back to a perfect state as if you had been running regular
 vacuum all along.  That is not true of indexes.  It would be nice if it
 would.

In this scenario, the VACUUM FULL-does-REINDEX idea would be the perfect
fit because it will probably be much faster than doing indexbulkdelete.

-- 
Alvaro Herrera (alvherre[a]dcc.uchile.cl)
Endurecerse, pero jamás perder la ternura (E. Guevara)

---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] Thoughts on maintaining 7.3

2003-10-04 Thread Tom Lane
Alvaro Herrera [EMAIL PROTECTED] writes:
 On Sat, Oct 04, 2003 at 11:41:17AM -0400, Tom Lane wrote:
 No.  You'd be better off using REINDEX for that, I think.  IIRC we have
 speculated about making VAC FULL fix the indexes via REINDEX rather than
 indexbulkdelete.

 I can't agree with that idea.

Why not?  There is plenty of anecdotal evidence in the archives saying
that it's faster to drop indexes, VACUUM FULL, recreate indexes than
to VACUUM FULL with indexes in place.  Most of those reports date from
before we had the lazy-vacuum alternative, but I don't think that
renders them less relevant.

 Imagine having to VACUUM FULL a huge
 table.  Not only will it take all the time required to do the VACUUM on
 the heap itself, it will also have to rebuild all indexes from scratch.

A very large chunk of VACUUM FULL's runtime is spent fooling with the
indexes.  Have you looked at the code in any detail?  It goes like this:

1. Scan heap looking for dead tuples and free space.

2. Make a pass over the indexes to delete index entries for dead tuples.

3. Copy remaining live tuples to lower-numbered pages to compact heap.
3a.  Every time we copy a tuple, make new index entries pointing to its
 new location.  (The old index entries still remain, though.)

4. Commit transaction so that new copies of moved tuples are good and
   old ones are not.

5. Make a pass over the indexes to delete index entries for old copies
   of moved tuples.

When there are only a few tuples being moved, this isn't too bad of a
strategy.  But when there are lots, steps 2, 3a, and 5 represent a huge
amount of work.  What's worse, step 3a swells the index well beyond its
final size.  This used to mean permanent index bloat.  Nowadays step 5
will be able to recover some of that space --- but not at zero cost.

I think it's entirely plausible that dropping steps 2, 3a, and 5 in
favor of an index rebuild at the end could be a winner.
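The proposed strategy is roughly what a DBA can already do by hand (table name hypothetical):

```sql
-- Rebuild indexes from scratch instead of paying steps 2, 3a and 5:
VACUUM FULL big_table;
REINDEX TABLE big_table;
-- or, equivalently: drop the indexes, VACUUM FULL, then re-create them.
```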

 I think there are scenarios where the REINDEX will be much worse, say when
 there are not too many deleted tuples (but in that case, why is the user
 doing VACUUM FULL in the first place?).

Yeah, I think that's exactly the important point.  These days there's
not a lot of reason to do VACUUM FULL unless you have a major amount of
restructuring to do.  I would once have favored maintaining two code
paths with two strategies, but now I doubt it's worth the trouble.
(Or I should say, we have two code paths, the other being lazy VACUUM
--- do we need three?)

regards, tom lane

---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [HACKERS] Thoughts on maintaining 7.3

2003-10-04 Thread Tom Lane
Bruce Momjian [EMAIL PROTECTED] writes:
 Tom Lane wrote:
 No.  You'd be better off using REINDEX for that, I think.

 I guess my point is that if you forget to run regular vacuum for a
 month, then realize the problem, you can just do a VACUUM FULL and the
 heap is back to a perfect state as if you had been running regular
 vacuum all along.  That is not true of indexes.  It would be nice if it
 would.

A VACUUM FULL that invoked REINDEX would accomplish that *better* than
one that didn't, because of the problem of duplicate entries for moved
tuples.  See my response just now to Alvaro.

regards, tom lane

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [PERFORM] COUNT(*) again (was Re: [HACKERS] Index/Function organized

2003-10-04 Thread Bruce Momjian
Tom Lane wrote:
 Hannu Krosing [EMAIL PROTECTED] writes:
  The point I was trying to make was that faster count(*)'s is just a side
  effect. If we could (conditionally) keep visibility info in indexes,
 
 I think that's not happening, conditionally or otherwise.  The atomicity
 problems alone are sufficient reason why not, even before you look at
 the performance issues.

What are the atomicity problems of adding a create/expire xid to the
index tuples?

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073

---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [HACKERS] Thoughts on maintaining 7.3

2003-10-04 Thread Bruce Momjian
Tom Lane wrote:
 Bruce Momjian [EMAIL PROTECTED] writes:
  Tom Lane wrote:
  No.  You'd be better off using REINDEX for that, I think.
 
  I guess my point is that if you forget to run regular vacuum for a
  month, then realize the problem, you can just do a VACUUM FULL and the
  heap is back to a perfect state as if you had been running regular
  vacuum all along.  That is not true of indexes.  It would be nice if it
  would.
 
 A VACUUM FULL that invoked REINDEX would accomplish that *better* than
 one that didn't, because of the problem of duplicate entries for moved
 tuples.  See my response just now to Alvaro.

Right, REINDEX is closer to what you expect VACUUM FULL to be doing ---
it mimics the heap result of full compaction.

I think Alvaro's point is that if you are doing VACUUM FULL on a large
table with only a few expired tuples, the REINDEX could take a while,
which would seem strange considering you only have a few expired tuples
--- maybe we should reindex only if more than 10% of the heap rows are
expired, or the index contains more than 10% empty space, or something
like that.

Of course, that is very arbitrary, but only VACUUM knows how many rows it
is moving --- the user typically will not know that.

In an extreme case, with REINDEX done always, I can imagine a site that is
doing only VACUUM FULL at night, but no regular vacuums, and they find
they can't do VACUUM FULL at night anymore because it is taking too
long.  By always doing REINDEX, we would alienate the folks who are happy
doing VACUUM FULL at night because very few tuples are expired.

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


[HACKERS] Open 7.4 items

2003-10-04 Thread Bruce Momjian
   P O S T G R E S Q L

  7 . 4   O P E N   I T E M S


Current at ftp://momjian.postgresql.org/pub/postgresql/open_items.

Changes
---
Fix REVOKE ALL ON FUNCTION error when removing owner permissions
Improve speed of building of constraints during restore
What to do about exposing the list of possible SQLSTATE error codes

Documentation Changes
-
Move release notes to SGML
Freeze message strings



-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [HACKERS] Open 7.4 items

2003-10-04 Thread Stephan Szabo

On Sun, 5 Oct 2003, Bruce Momjian wrote:

 Improve speed of building of constraints during restore

Did we get consensus on what to do with this, whether we're doing
only the superuser option to not check, only speeding up fk constraint
checks by using a statement instead of the repeated calls, both, or
something else?

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html