Re: [HACKERS] Where to hook my custom access control module?

2007-05-10 Thread Karel Gardas

On Wed, 9 May 2007, Andrew Dunstan wrote:


Karel Gardas wrote:




[...]


I'd like to look at the possibility of integrating OpenPMF
(http://www.openpmf.org) with PostgreSQL.


[...]


Exactly what is open about this product other than the name? It looks
closed and proprietary to me.


It was free software (GPL) at the beginning, but we were not able to
attract a community around it; it seems it was made public too early, so
we decided to take it back to closed source and, in particular, improve its
platform support. Hence I'm asking here about any authorization hook inside
PGSQL.


Thanks,
Karel
--
Karel Gardas  [EMAIL PROTECTED]
ObjectSecurity Ltd.   http://www.objectsecurity.com

---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org


Re: [HACKERS] Behavior of GENERATED columns per SQL2003

2007-05-10 Thread Zoltan Boszormenyi

Tom Lane wrote:

After some more study of the SQL spec, the distinction between GENERATED
ALWAYS AS IDENTITY and GENERATED BY DEFAULT AS IDENTITY is not what
I thought it was.

* As far as I can find from the spec, there is *no* difference between
the two cases for INSERT commands.  The rule is that you ignore any
user-supplied data and use the default (ie, nextval()) unless OVERRIDING
SYSTEM VALUE is specified.  It is not an error to try to insert data
into an identity column, it's just ignored unless OVERRIDING SYSTEM
VALUE.

* The difference for UPDATE commands is that you can update a BY DEFAULT
identity column to anything you want, whereas for an ALWAYS identity
it's an error to update to anything but DEFAULT (which causes a fresh
nextval() to be assigned).  Both behaviors are different from a
generated column, which is updated whether you mention it or not.
  


The quoted SIGMOD paper mentioned that specifying a value
for a generated column should raise an error in INSERT, but
this behaviour is not mentioned by the standard.
BTW, do you know what a self-referencing column is?
I haven't found a definition of it, and there are places where the standard
uses this term for behaviour that would be natural for generated columns.
E.g. page 860 in the latest drafts, section 10.14, or SQL:2003, section 14.8,
about the INSERT statement: the value the user specified should be stored if
some underlying column of Ci is a self-referencing column and
OVERRIDING SYSTEM VALUE is specified.


This means that GENERATED BY DEFAULT AS IDENTITY is not at all
equivalent to our historical behavior for SERIAL columns and hence we
cannot merge the two cases.
  


Yes, they are equivalent if you read 5IWD2-02-Foundation-2006-04.pdf
or 5CD2-02-Foundation-2006-01.pdf, i.e. the latest two drafts.
(The latter seems to be misnamed considering that www.wiscorp.com
refreshed the sql200n.zip on 2007-03-11.) Page 860, section 14.10, INSERT.
The value the user provides should be accepted for storage if:
- the column is an identity column and you provide
  OVERRIDING SYSTEM VALUE, or
- the column is a GENERATED BY DEFAULT AS IDENTITY column
  and you provide neither OVERRIDING USER VALUE nor
  the DEFAULT specification for the column.
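
Spelled out as statements (draft-style syntax only; this is just my
reading of the rules above, not a claim about what PostgreSQL currently
accepts), the two cases would look like this:

  CREATE TABLE t1 (id integer GENERATED ALWAYS AS IDENTITY, val text);

  -- value accepted for storage: identity column plus OVERRIDING SYSTEM VALUE
  INSERT INTO t1 (id, val) OVERRIDING SYSTEM VALUE VALUES (42, 'x');

  CREATE TABLE t2 (id integer GENERATED BY DEFAULT AS IDENTITY, val text);

  -- value accepted: BY DEFAULT identity, no OVERRIDING USER VALUE, no DEFAULT
  INSERT INTO t2 (id, val) VALUES (42, 'x');

  -- here the user value would be discarded and nextval() used instead
  INSERT INTO t2 (id, val) OVERRIDING USER VALUE VALUES (42, 'x');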

I think the babble about OVERRIDING USER VALUE
is somewhat controversial. Why would you want to do e.g.
INSERT INTO tabname (id, ...) OVERRIDING USER VALUE VALUES (N, ...);
where N is an explicit constant? And I haven't even implemented
handling it. Anyway, without specifying OVERRIDING USER VALUE,
GENERATED BY DEFAULT AS IDENTITY is
equivalent to the traditional SERIAL in PostgreSQL.
Implementing the OVERRIDING USER VALUE behaviour means
that GENERATED BY DEFAULT AS IDENTITY (or SERIAL) would be
marked as an identity column as well, not as a column simply having a DEFAULT
clause.

Otherwise OVERRIDING USER VALUE would override every
user-specified value for regular columns having a DEFAULT expression.


The lack of any behavioral difference for INSERT seems surprising
and counterintuitive; have I just missed something in the spec?
  


No, I was just ahead of the times and read newer drafts than SQL:2003.


BTW, I found what they did about the problem that generated columns
are out of sync with their underlying columns during BEFORE-trigger
execution: in 11.39

	12) If BEFORE is specified, then:
	    ...

	    c) The triggered action shall not contain a field
	       reference that references a field in the new transition
	       variable corresponding to a generated column of T.
  


I vaguely remember reading it, although only the idea seems to have
remained in my mind. :-)


IOW they just pretend you can't look.  So I think we need not worry
about leaving the values out-of-date until after the triggers fire.

regards, tom lane

---(end of broadcast)---
TIP 6: explain analyze is your friend

  



--
Zoltán Böszörményi
Cybertec Geschwinde & Schönig GmbH
http://www.postgresql.at/


---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] Behavior of GENERATED columns per SQL2003

2007-05-10 Thread Zoltan Boszormenyi

Zoltan Boszormenyi wrote:

The quoted SIGMOD paper mentioned that specifying a value
for a generated column should raise an error in INSERT but
this behaviour is not mentioned by the standard.


I found it now; I hadn't read carefully enough before.
SQL:2003, section 14.8, syntax rules:

10) If <contextually typed table value constructor> CTTVC is specified,
    then every <contextually typed row value constructor element> simply
    contained in CTTVC whose positionally corresponding <column name> in
    <insert column list> references a column of which some underlying
    column is a generated column shall be a <default specification>.

So, I can only omit the generated column or specify DEFAULT.
Anything else should raise an error. Should it be done in analyze.c
or in rewriteHandler.c?
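
For example (a sketch against a hypothetical table, following the rule
just quoted rather than any particular implementation):

  CREATE TABLE t (a integer, b integer GENERATED ALWAYS AS (a * 2));

  INSERT INTO t (a) VALUES (1);              -- OK: generated column omitted
  INSERT INTO t (a, b) VALUES (1, DEFAULT);  -- OK: DEFAULT specified
  INSERT INTO t (a, b) VALUES (1, 2);        -- should raise an error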

--
Zoltán Böszörményi
Cybertec Geschwinde & Schönig GmbH
http://www.postgresql.at/


---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

   http://www.postgresql.org/about/donate


Re: [HACKERS] [PERFORM] Cannot make GIN intarray index be used by the planner

2007-05-10 Thread Valentine Gogichashvili

Hello again,

I got the opclass for the index and it looks like it is a default one

myvideoindex=# select pg_opclass.*, pg_type.typname
myvideoindex-#   from pg_index, pg_opclass, pg_type
myvideoindex-#  where pg_index.indexrelid =
'idx_nonnulls_myintarray_int4_gin'::regclass
myvideoindex-#and pg_opclass.oid = any (pg_index.indclass::oid[] )
myvideoindex-#and pg_type.oid = pg_opclass.opcintype;

 opcamid |  opcname  | opcnamespace | opcowner | opcintype | opcdefault | opckeytype | typname
---------+-----------+--------------+----------+-----------+------------+------------+---------
    2742 | _int4_ops |           11 |       10 |      1007 | t          |         23 | _int4
(1 row)

The search_path is set to the following

myvideoindex=# show search_path;
   search_path
------------------
 versionA, public
(1 row)

With best regards,

-- Valentine

On 5/9/07, Tom Lane [EMAIL PROTECTED] wrote:


[cc'ing to pgsql-hackers since this is looking like a contrib/intarray
bug]

Valentine Gogichashvili [EMAIL PROTECTED] writes:
 here is the DT

That works fine for me in 8.2:

regression=# explain SELECT id, (myintarray_int4)
  FROM myintarray_table_nonulls
 WHERE ARRAY[8] @> myintarray_int4;
                                   QUERY PLAN
---------------------------------------------------------------------------------------------
 Index Scan using idx_nonnulls_myintarray_int4_gin on myintarray_table_nonulls  (cost=0.00..8.27 rows=1 width=36)
   Index Cond: ('{8}'::integer[] @> myintarray_int4)
(2 rows)

What I am betting is that you've installed contrib/intarray in this
database and that's bollixed things up somehow.  In particular, intarray
tries to take over the position of default gin opclass for int4[],
and the opclass that it installs as default has operators named just
like the built-in ones.  If somehow your query is using pg_catalog.@>
instead of intarray's public.@>, then the planner wouldn't think the
index is relevant.

In a quick test your example still works with intarray installed, because
what it's really created is public.@> (integer[], integer[]) which is
an exact match and therefore takes precedence over the built-in
pg_catalog.@> (anyarray, anyarray).  But if for example you don't have
public in your search_path then the wrong operator would be chosen.
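
One way to see which operator a query ends up with, assuming the built-in
array containment operator is spelled @> as in 8.2 (adjust the name if your
installation differs), is to schema-qualify it explicitly and compare plans:

explain SELECT id FROM myintarray_table_nonulls
 WHERE ARRAY[8] OPERATOR(pg_catalog.@>) myintarray_int4;   -- built-in operator

explain SELECT id FROM myintarray_table_nonulls
 WHERE ARRAY[8] OPERATOR(public.@>) myintarray_int4;       -- intarray's operator

If only one of these produces the index scan, the ambiguity described above
is the culprit.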

Please look at the pg_index entry for your index, eg

select * from pg_index where indexrelid =
'versionA.idx_nonnulls_myintarray_int4_gin'::regclass;

and see whether the index opclass is the built-in one or not.

Note to hackers: we've already discussed that intarray shouldn't be
trying to take over the default gin opclass, but I am beginning to
wonder if it still has a reason to live at all.  We should at least
consider removing the redundant operators to avoid risks like this one.

regards, tom lane





--
ვალენტინ გოგიჩაშვილი
Valentine Gogichashvili


Re: [HACKERS] Windows Vista support (Buildfarm Vaquita)

2007-05-10 Thread Michael Meskes
On Wed, May 09, 2007 at 12:46:52PM +0100, Dave Page wrote:
 Oh, hang on... Vista's new 'security' features include popups that ask 
 permission from the user before running any installers. One of the more 
 basic checks they use is the filename - *anything* called setup.exe will 
 cause user confirmation to be required before it will run. I believe for 
 non-interactive sessions it'll just refuse to run. I just tried running 
 update.exe myself, and yes, you guessed it, a user confirmation dialog 
 popped up :-(

Seems to be a little bit braindead to me. But anyway, I renamed it and
just committed the changes. Let's see if this works. 

Michael

P.S.: More on the other problem later.
-- 
Michael Meskes
Email: Michael at Fam-Meskes dot De, Michael at Meskes dot (De|Com|Net|Org)
ICQ: 179140304, AIM/Yahoo: michaelmeskes, Jabber: [EMAIL PROTECTED]
Go SF 49ers! Go Rhein Fire! Use Debian GNU/Linux! Use PostgreSQL!

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] Windows Vista support (Buildfarm Vaquita)

2007-05-10 Thread Dave Page
Michael Meskes wrote:
 On Wed, May 09, 2007 at 12:46:52PM +0100, Dave Page wrote:
 Oh, hang on... Vista's new 'security' features include popups that ask 
 permission from the user before running any installers. One of the more 
 basic checks they use is the filename - *anything* called setup.exe will 
 cause user confirmation to be required before it will run. I believe for 
 non-interactive sessions it'll just refuse to run. I just tried running 
 update.exe myself, and yes, you guessed it, a user confirmation dialog 
 popped up :-(
 
 Seems to be a little bit braindead to me. 

Yeah - according to Microsoft we should include a manifest with the
executable these days that can prevent the check by specifying that
administrative privileges won't be needed by the executable - but that
involves us adding version resources to the exe, and generating the
manifest during build which seems somewhat over the top for a quick
regression test.

 But anyway, I renamed it and
 just committed the changes. Let's see if this works. 

Thanks.

 P.S.: More on the other problem later.

OK.

Regards, Dave.

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] Seq scans roadmap

2007-05-10 Thread Zeugswetter Andreas ADI SD

 In reference to the seq scans roadmap, I have just submitted 
 a patch that addresses some of the concerns.
 
 The patch does this:
 
 1. for small relation (smaller than 60% of bufferpool), use
    the current logic
 2. for big relation:
   - use a ring buffer in heap scan
   - pin first 12 pages when scan starts
   - on consumption of every 4 pages, read and pin the next 4 pages
   - invalidate used pages of the scan so they do not
     force out other useful pages

A few comments regarding the effects:

I do not see how this speedup could be caused by readahead, so what are
the effects?
(It should make no difference to do the CPU work for count(*) in between
reading each block when the pages are not dirtied.)
Is the improvement solely reduced CPU because no search for a free
buffer is needed, and/or L2 cache locality?

What effect does the advance pinning have? Avoiding VACUUM?

A 16 x 8k page ring is too small to allow the needed IO blocksize of
256k.
The readahead is done 4 x one page at a time (= 32k).
What is the reasoning behind 1/4 of the ring for readahead (why not 1/2)?
Is 3/4 the trail for followers and the bgwriter?

I think in anticipation of doing a single IO call for more than one
page, the KillAndReadBuffer function should be split into two parts: one
that does the killing for n pages, and one that does the reading for n
pages.
Killing n before reading n would also have the positive effect of
grouping any needed writes (not interleaving them with the reads).

I think the 60% Nbuffers is a very good starting point. I would only
introduce a GUC when we see evidence that it is needed (I agree with
Simon's partitioning comments, but I'd still wait and see). 

Andreas

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] Seq scans roadmap

2007-05-10 Thread Heikki Linnakangas

Zeugswetter Andreas ADI SD wrote:
In reference to the seq scans roadmap, I have just submitted 
a patch that addresses some of the concerns.


The patch does this:

1. for small relation (smaller than 60% of bufferpool), use
   the current logic
2. for big relation:

- use a ring buffer in heap scan
- pin first 12 pages when scan starts
- on consumption of every 4 pages, read and pin the next 4 pages
- invalidate used pages of the scan so they do not
  force out other useful pages


A few comments regarding the effects:

I do not see how this speedup could be caused by readahead, so what are
the effects ?


I was wondering that as well. We'd really need to test all the changes 
separately to see where the improvements are really coming from.


Also, that patch doesn't address the VACUUM issue at all. And using a 
small fixed size ring with scans that do updates can be devastating. I'm 
experimenting with different ring sizes for COPY at the moment. Too 
small a ring leads to a lot of WAL flushes; it's basically the same
problem we have with VACUUM in CVS HEAD.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

   http://www.postgresql.org/about/donate


Re: [HACKERS] Seq scans roadmap

2007-05-10 Thread Zeugswetter Andreas ADI SD

 Also, that patch doesn't address the VACUUM issue at all. And 
 using a small fixed size ring with scans that do updates can 
 be devastating. I'm experimenting with different ring sizes 
 for COPY at the moment. Too small ring leads to a lot of WAL 
 flushes, it's basically the same problem we have with VACUUM 
 in CVS HEAD.

My first take on that would be to simply abandon any dirty (and actually
also any still pinned) buffer from the ring and replace the ring slot
with a buffer from the freelist.
If the freelist is empty and LSN allows writing the buffer, write it
(and maybe try to group these).
If the LSN does not allow the write, replace the slot with a buffer from
LRU.

Andreas 

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: [HACKERS] Seq scans roadmap

2007-05-10 Thread Heikki Linnakangas

Zeugswetter Andreas ADI SD wrote:
Also, that patch doesn't address the VACUUM issue at all. And 
using a small fixed size ring with scans that do updates can 
be devastating. I'm experimenting with different ring sizes 
for COPY at the moment. Too small ring leads to a lot of WAL 
flushes, it's basically the same problem we have with VACUUM 
in CVS HEAD.


My first take on that would be to simply abandon any dirty (and actually
also any still pinned) buffer from the ring and replace the ring slot
with a buffer from the freelist.
If the freelist is empty and LSN allows writing the buffer, write it
(and maybe try to group these).
If the LSN does not allow the write, replace the slot with a buffer from
LRU.


That would effectively disable the ring for COPY and the 2nd phase of 
VACUUM.


One problem with looking at the LSN is that you need the content lock to 
read it, and I wouldn't want to add any new locking. It could be done 
inside FlushBuffer when we hold the lock anyway, but I'm afraid the 
changes would be pretty invasive.


I'm struggling to get a grip on what the optimal ring size is under
various circumstances. Some thoughts I have so far:

- a small ring gives better L2 cache behavior
- for read-only queries, and for queries that just set hint bits, 1 buffer
is enough
- a small ring with a query that writes WAL (COPY, mass updates, 2nd phase
of VACUUM) leads to a lot of WAL flushes, which can become a bottleneck.


But all these assumptions need to be validated. I'm setting up tests 
with different ring sizes and queries to get a clear picture of this:

- VACUUM on a clean table
- VACUUM on a table with 1 dead tuple per page
- read-only scan, large table
- read-only scan, table fits in OS cache
- COPY

In addition, I'm going to run VACUUM in a DBT-2 test to see the effect
on other queries running concurrently.


I think a ring that grows when WAL flushes occur covers all the use 
cases reasonably well, but I need to do the testing...


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] Re: [COMMITTERS] psqlodbc - psqlodbc: Put Autotools-generated files into subdirectory

2007-05-10 Thread Alvaro Herrera
Hiroshi Inoue wrote:
 Alvaro Herrera wrote:
 Robert Treat wrote:
 On Monday 07 May 2007 15:52, Joshua D. Drake wrote:
 Andrew Dunstan wrote:
 Hiroshi Inoue wrote:
 Maybe it's BSD which is different from the license of psqlodbc (LGPL).
 Is there no problem with their coexistence ?
 Or is it possible for psqlodbc to be LGPL entirely ?
 I am having difficulty in understanding what the problem is. My
 understanding is that using BSD licensed code is ok in an LGPL project,
 but (probably) not vice versa.
 To my knowledge you can do it either way, as long as you remember that
 any changes to the lgpl code have to be released.
 It's generally a very bad idea for a BSD licensed project to include LGPL 
 licensed code, because people who try to use your work in their own 
 projects, under the assumption that it really is BSD licensed, get 
 bitten when they find out that they have illegally included code that 
 is licensed under some other license.  
 
 Of course, the developer who owns the LGPL-licensed copyright is free to
 relicense his work under a different license, so if the ODBC developers
 want to contribute code to Postgres they can give their work under the
 Postgres license.  (They must obtain permission from all the involved
 developers, obviously).
 
 There are no original developers left in the project now, and I don't know
 where they are or how to reach them. I personally am not so eager to
 change the license to BSD because it has been LGPL for too long.

Yes, that is a problem for releasing old code whose developers are long
gone.  (What I was thinking was copying *new* code from psqlodbc into
Postgres).

 Conversely, I thought we could implement the BSD licensed autoconf
 macros ourselves, but I'm not sure how that could be considered
 *not derived*.

ISTM it would be necessary to get legal advice to be sure that it would
be considered not derived, but one would think that that's too much
hassle for something that can be done much more simply by including the
differently-licensed files in the first place, which is legal anyway.

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: [HACKERS] Re: [COMMITTERS] psqlodbc - psqlodbc: Put Autotools-generated files into subdirectory

2007-05-10 Thread Hiroshi Inoue
Tom Lane wrote:
 Hiroshi Inoue [EMAIL PROTECTED] writes:
 Robert Treat wrote:
 It's generally a very bad idea for a BSD licensed project to include lgpl 
 licensed code
 
 The psqlodbc package is LGPL licensed and seems to have little problem
 including a copy of BSD licensed code as part of it.
 
 Right, that direction is fine, it's the other way around that's
 problematic (because adding some BSD code adds no new restrictions on
 what users can do with an overall-LGPL project, whereas having some LGPL
 components in a supposedly BSD project does limit what they can do with
 it).  I don't see any reason why you shouldn't include those PG autoconf
 macros in psqlodbc.  You just need to document that they have a BSD
 license, in case someone wants to use them by themselves.

Yes, that is the direction I am taking. You may see some confusing wording
in my other posts, but that only reflects what I thought at one point.
Thanks for your clarification.

regards,
Hiroshi Inoue

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] Behavior of GENERATED columns per SQL2003

2007-05-10 Thread Tom Lane
Zoltan Boszormenyi [EMAIL PROTECTED] writes:
 Tom Lane wrote:
 This means that GENERATED BY DEFAULT AS IDENTITY is not at all
 equivalent to our historical behavior for SERIAL columns and hence we
 cannot merge the two cases.

 Yes, they are equivalent if you read 5IWD2-02-Foundation-2006-04.pdf
 or 5CD2-02-Foundation-2006-01.pdf, i.e. the latest two drafts.

Hm.  So what we've got here is that the committee has decided the 2003
spec is broken, and they may someday come out with a revised definition
that might, or might not, bear any resemblance to the current 200n
working papers.  According to some off-list discussion, nobody is
entirely sure what the current draft is trying to say anyway.

That about tears it for me: I think we should reject at least the
IDENTITY parts of this patch, and very likely the entire thing.  I've
spent more than three days now trying to get it into a committable form,
time I can't really afford to spend right now on a patch that adds such
a marginal feature.

AFAICS the only actual new feature that IDENTITY adds is the ability to
make the default expression silently override user-specified insertion
data, as in fact was the use-case argued by you here:
http://archives.postgresql.org/pgsql-hackers/2006-08/msg00038.php

Now we find that as the spec is actually written, it doesn't work that
way: if you try to specify a value other than DEFAULT for an inserted
column, you get a syntax error, unless you specify OVERRIDING SYSTEM
VALUE or OVERRIDING USER VALUE, neither of which are going to be found
in legacy apps ported from other DBMSes, and neither of which add any
actual new functionality (if you have to write OVERRIDING USER VALUE,
you could just as well not specify the column).  So I'm seeing a lot
of complexity and a lot of confusion added for not much, not even
considering the risks of trying to track a moving standards target.

As for GENERATED ALWAYS AS (expr), now that we understand that it's not
supposed to define a virtual column, what's the point?  You can get the
same behavior with a trivial BEFORE INSERT/UPDATE trigger that
recomputes the derived value, and you don't have to buy into the rather
ill-defined spec behavior (in particular the point that the generated
column is effectively undefined during trigger firing seems really
poorly done).  In fact, given that the only plausible use-cases involve
expressions that are expensive to compute, a trigger can probably do
*better* than the built-in feature, since it can make use of application
knowledge about when a recomputation is really necessary.  The current
patch recomputes the expression on every UPDATE, and would have a hard
time being any brighter than that, given that we don't know what BEFORE
triggers might do to the row.
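
A minimal sketch of such a trigger (the table, columns and expression are
made up; the point is only the shape of the emulation):

  CREATE TABLE item (price numeric, qty integer, total numeric);

  CREATE FUNCTION item_total() RETURNS trigger AS $$
  BEGIN
    -- recompute the derived column; a smarter version could skip the
    -- work when price and qty are unchanged
    NEW.total := NEW.price * NEW.qty;
    RETURN NEW;
  END;
  $$ LANGUAGE plpgsql;

  CREATE TRIGGER item_total_trg BEFORE INSERT OR UPDATE ON item
    FOR EACH ROW EXECUTE PROCEDURE item_total();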

So at this point I'm feeling that we've poured a lot of effort down a
hole :-(.  We are not really close to having a patch that implements
the current 200n draft (in particular note that OVERRIDING USER VALUE
is not a no-op according to this morning's understanding of the draft).
Even if we were, I'd vote against implementing a part of the draft
that's clearly still in flux --- if they change it again, we'd be stuck.
But the real bottom line is that I don't see enough use-case for these
features to justify the work done already, let alone a bunch more work.

regards, tom lane

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] Re: [COMMITTERS] psqlodbc - psqlodbc: Put Autotools-generated files into subdirectory

2007-05-10 Thread Hiroshi Inoue

Alvaro Herrera wrote:

Hiroshi Inoue wrote:

Alvaro Herrera wrote:

Robert Treat wrote:

On Monday 07 May 2007 15:52, Joshua D. Drake wrote:

Andrew Dunstan wrote:

Hiroshi Inoue wrote:


snip


Of course, the developer who owns the LGPL-licensed copyright is free to
relicense his work under a different license, so if the ODBC developers
want to contribute code to Postgres they can give their work under the
Postgres license.  (They must obtain permission from all the involved
developers, obviously).

There are no original developers left in the project now, and I don't know
where they are or how to reach them. I personally am not so eager to
change the license to BSD because it has been LGPL for too long.


Yes, that is a problem for releasing old code whose developers are long
gone.  (What I was thinking was copying *new* code from psqlodbc into
Postgres).


What do you mean by *new* code?
New lines? Words? Or some other kind of classification?


Conversely, I thought we could implement the BSD licensed autoconf
macros ourselves, but I'm not sure how that could be considered
*not derived*.


ISTM it would be necessary to get legal advice to be sure that it would
be considered not derived, but one would think that that's too much
hassle for something that can be done much more simply by including the
differently-licensed files in the first place, which is legal anyway.


OK I understand.
Thanks.

regards,
Hiroshi Inoue


---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [HACKERS] Windows Vista support (Buildfarm Vaquita)

2007-05-10 Thread Michael Meskes
On Thu, May 10, 2007 at 11:05:39AM +0100, Dave Page wrote:
  P.S.: More on the other problem later.
 
 OK.

Dave, I just committed some small changes to get additional error
logging. Hopefully this enables me to find out where exactly the error
is coming up. If possible, could you please restart a run on Vista?

Thanks

Michael
-- 
Michael Meskes
Email: Michael at Fam-Meskes dot De
ICQ: 179140304, AIM/Yahoo: michaelmeskes, Jabber: [EMAIL PROTECTED]
Go SF 49ers! Go Rhein Fire! Use Debian GNU/Linux! Use PostgreSQL!

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: [HACKERS] [pgsql-es-ayuda] Error SSL en Postgresql

2007-05-10 Thread Alvaro Herrera
Just got the report below in pgsql-es-ayuda.  He says his log files are
filled with 50 MB of the error message

LOG:  SSL SYSCALL error: Unknown winsock error 10004

I looked up 10004 and apparently it's WSAEINTR, according to
http://msdn2.microsoft.com/en-us/library/aa924071.aspx

Interrupted function call. This error is returned when a socket
is closed or a process is terminated, on a pending Winsock
operation for that socket.

I wonder if the SSL code should rather be interpreting this error code
as one of those for which it must retry.  It seems strange that it is
named EINTR (which sounds to me like operation interrupted) while the
description talks about closed sockets and terminated processes.

This is Windows XP, Postgres 8.2.3.


Henry wrote:
 Hi there,
 I'm working on a test database with PostgreSQL 8.2.3 and WinXP. I generated
 an SSL certificate named server.crt and server.key in the Data folder, and
 set the option SSL=True.
 I don't know how, but out of nowhere the Postgres functions started giving
 me bad results, I mean the return values.
 
 When I take a look at the logs, I'm surprised to find that I have about 5
 .txt files of 10MB each,
 and every line in those files says:
 
 LOG:  SSL SYSCALL error: Unknown winsock error 10004
 LOG:  SSL SYSCALL error: Unknown winsock error 10004
 LOG:  SSL SYSCALL error: Unknown winsock error 10004
 LOG:  SSL SYSCALL error: Unknown winsock error 10004

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] Windows Vista support (Buildfarm Vaquita)

2007-05-10 Thread Dave Page
Michael Meskes wrote:
 On Thu, May 10, 2007 at 11:05:39AM +0100, Dave Page wrote:
 P.S.: More on the other problem later.
 OK.
 
 Dave, I just committed some small changes to get additional error
 logging. Hopefully this enables me to find out where exactly the error
 is coming up. If possible could you please restart a run on vista?

Running now - I won't have access to the machine again until Monday now
though.

Regards, Dave

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: [HACKERS] Seq scans roadmap

2007-05-10 Thread Heikki Linnakangas

Heikki Linnakangas wrote:
But all these assumptions need to be validated. I'm setting up tests 
with different ring sizes and queries to get a clear picture of this:

- VACUUM on a clean table
- VACUUM on a table with 1 dead tuple per page
- read-only scan, large table
- read-only scan, table fits in OS cache
- COPY


Just to keep you guys informed, here are my results for a read-only scan on
a table bigger than shared_buffers but smaller than RAM:


 select-1| 00:00:10.853831
 select-1| 00:00:10.380667
 select-1| 00:00:11.530528
 select-2| 00:00:08.634105
 select-2| 00:00:02.674084
 select-4| 00:00:02.65664
 select-8| 00:00:02.662922
 select-16   | 00:00:02.682475
 select-32   | 00:00:02.693163
 select-64   | 00:00:02.722031
 select-128  | 00:00:02.873645
 select-256  | 00:00:03.185586
 select-512  | 00:00:03.534285
 select-1024 | 00:00:03.741867

The lshw utility tells me that this server has 32KB of L1 cache and 4MB of
L2 cache. The performance starts to drop between 64 and 128 buffers, which
is 512-1024 KB, so I'm not sure how it's related to the cache size, but
using a small number of buffers is clearly better than using a large number.


However, it caught me by total surprise that the performance with 1 
buffer is so horrible. Using 2 buffers is enough to avoid whatever the 
issue is with just 1 buffer. I have no idea what's causing that. There 
must be some interaction that I don't understand.


All the numbers are quite repeatable; I ran the same test script many
times. The runtime of the first select-2 test, however, varied between
3 and 10 seconds; somehow the bad karma from using just 1 buffer in the
earlier test carries over to the next test.


I'm not sure what to think about this, but I'll set up more test 
scenarios with VACUUM and COPY.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


[HACKERS] Question concerning failed installation of Postgres 8.2.3 on Windows XP

2007-05-10 Thread Sven Janson

Dear mailing list,

I have a problem installing Postgres 8.2.3 on Windows XP Pro with
all the latest updates.
All Postgres installations, folders and the service account had been
removed beforehand, so it was a clean installation.


The following steps lead to a reproducible result on a few machines but 
not on all:


I chose English as the installer language and deselected all database
drivers (JDBC, NPSQL, ...) for installation.
Then I just clicked Next on the following installer pages, leaving all
options as they were. I entered passwords for both the superuser and the
service account.

I selected German (Germany) as  locale and UTF8 as encoding.

Near the end of the installation a message box appears reporting the 
following error:
Failed to connect to the database. Procedural languages files are 
installed, but are not activated in any databases.


These are the critical lines written by msiexec in its log file when 
encountering the error:



MSI (s) (C0:E0) [15:17:00:531]: Executing op: 
ActionStart(Name=InstallPl,Description=Activating procedural languages...,)

Action 15:17:00: InstallPl. Activating procedural languages...
MSI (s) (C0:E0) [15:17:00:531]: Executing op: 
CustomActionSchedule(Action=InstallPl,ActionType=1025,Source=BinaryData,[EMAIL PROTECTED],CustomActionData=1033;postgres;kdsahp;5432;C:\Programme\PostgreSQL\8.2\share\;1;;;)
MSI (s) (C0:48) [15:17:00:593]: Invoking remote custom action. DLL: 
C:\WINDOWS\Installer\MSI77.tmp, Entrypoint: [EMAIL PROTECTED]

MSI (c) (D8:DC) [15:17:15:095]: Note: 1: 2205 2:  3: Error
MSI (c) (D8:DC) [15:17:15:095]: Note: 1: 2228 2:  3: Error 4: SELECT 
`Message` FROM `Error` WHERE `Error` = 2867

DEBUG: Error 2867:  The error dialog property is not set
The installer has encountered an unexpected error installing this 
package. This may indicate a problem with this package. The error code 
is 2867. The arguments are: , ,
MSI (c) (D8:DC) [15:17:15:095]: Font created.  Charset: Req=0, Ret=0, 
Font: Req=MS Shell Dlg, Ret=MS Shell Dlg


MSI (c) (D8:DC) [15:18:24:198]: Note: 1: 2205 2:  3: Error
MSI (c) (D8:DC) [15:18:24:214]: Note: 1: 2228 2:  3: Error 4: SELECT 
`Message` FROM `Error` WHERE `Error` = 2888

DEBUG: Error 2888:  Executing the TextStyle view failed
The installer has encountered an unexpected error installing this 
package. This may indicate a problem with this package. The error code 
is 2888. The arguments are: TextStyle, ,

MSI (c) (D8:DC) [15:18:38:028]: Note: 1: 2205 2:  3: Error
MSI (c) (D8:DC) [15:18:38:028]: Note: 1: 2228 2:  3: Error 4: SELECT 
`Message` FROM `Error` WHERE `Error` = 2888

DEBUG: Error 2888:  Executing the TextStyle view failed
The installer has encountered an unexpected error installing this 
package. This may indicate a problem with this package. The error code 
is 2888. The arguments are: TextStyle, ,

MSI (c) (D8:DC) [15:18:51:531]: Note: 1: 2205 2:  3: Error
MSI (c) (D8:DC) [15:18:51:531]: Note: 1: 2228 2:  3: Error 4: SELECT 
`Message` FROM `Error` WHERE `Error` = 2888

DEBUG: Error 2888:  Executing the TextStyle view failed
The installer has encountered an unexpected error installing this 
package. This may indicate a problem with this package. The error code 
is 2888. The arguments are: TextStyle, ,

MSI (c) (D8:DC) [15:21:42:346]: Note: 1: 2205 2:  3: Error
MSI (c) (D8:DC) [15:21:42:346]: Note: 1: 2228 2:  3: Error 4: SELECT 
`Message` FROM `Error` WHERE `Error` = 2888

DEBUG: Error 2888:  Executing the TextStyle view failed
The installer has encountered an unexpected error installing this 
package. This may indicate a problem with this package. The error code 
is 2888. The arguments are: TextStyle, ,
Failed to connect to the database. Procedural languages files are 
installed, but are not activated in any databases.


--

Could you please tell me how to work around or avoid this error?  Thank 
you very much in advance for your help!


Best regards, Sven


---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] Seq scans roadmap

2007-05-10 Thread Heikki Linnakangas

Heikki Linnakangas wrote:
However, it caught me by total surprise that the performance with 1 
buffer is so horrible. Using 2 buffers is enough to avoid whatever the 
issue is with just 1 buffer. I have no idea what's causing that. There 
must be some interaction that I don't understand.


Ok, I found the reason for that. I was using this query for the selects:
SELECT COUNT(*) FROM (SELECT 1 FROM stock_copytest LIMIT 1000) AS a;

stock_copytest is larger than RAM, which is why I used the LIMIT to
make the result set memory resident. That had the side effect that the
Limit node apparently kept the single buffer pinned, which defeated
the buffer ring completely. To avoid issues like that, we apparently want
to use 2-4 buffers instead of just 1.


I'll review my test methodology and keep testing...

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] Seq scans roadmap

2007-05-10 Thread CK Tan
The patch has no effect on scans that do updates. The  
KillAndReadBuffer routine does not force out a buffer if the dirty  
bit is set. So updated pages revert to the current performance  
characteristics.


-cktan
GreenPlum, Inc.

On May 10, 2007, at 5:22 AM, Heikki Linnakangas wrote:


Zeugswetter Andreas ADI SD wrote:
In reference to the seq scans roadmap, I have just submitted a  
patch that addresses some of the concerns.


The patch does this:

1. for small relation (smaller than 60% of bufferpool), use the
current logic
2. for big relation:

- use a ring buffer in heap scan
- pin first 12 pages when scan starts
- on consumption of every 4 pages, read and pin the next 4 pages
- invalidate used pages of the scan so they do not force out
other useful pages

A few comments regarding the effects:
I do not see how this speedup could be caused by readahead, so  
what are

the effects ?


I was wondering that as well. We'd really need to test all the  
changes separately to see where the improvements are really coming  
from.


Also, that patch doesn't address the VACUUM issue at all. And using  
a small fixed size ring with scans that do updates can be  
devastating. I'm experimenting with different ring sizes for COPY  
at the moment. Too small ring leads to a lot of WAL flushes, it's  
basically the same problem we have with VACUUM in CVS HEAD.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com





---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org


[HACKERS] Planning large IN lists

2007-05-10 Thread Neil Conway
When planning queries with a large IN expression in the WHERE clause,
the planner transforms the IN list into a scalar array expression. In
clause_selectivity(), we estimate the selectivity of the ScalarArrayExpr
by calling scalararraysel(), which in turn estimates the selectivity of
*each* array element in order to determine the selectivity of the array
expression as a whole.

This is quite inefficient when the IN list is large. In a test case that
someone sent me privately, a simple query involving two cheap joins and
a ~1800 element IN list in the WHERE clause requires about 100ms to plan
but only ~10 ms to execute -- about 85% of the total runtime is spent in
scalararraysel(). (I'd include the profiling data, but KCacheGrind seems
stubbornly opposed to providing a textual summary of its results...)

Clearly, the current approach is fine when the array is small -- perhaps
for arrays above a certain number of elements, we could switch to
randomly sampling array elements, estimating their selectivities, and
then using that information to infer the estimated selectivity of the
entire array expression. That seems fairly crude, though: does anyone
have any better ideas?

-Neil



---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] Seq scans roadmap

2007-05-10 Thread CK Tan
Sorry, a 16 x 8K page ring is indeed too small. The reason we selected 16
is that Greenplum DB runs with a 32K page size, so we are in fact
reading 128K at a time. The number of pages in the ring should be made
relative to the page size, so that you achieve 128K per read.


Also agreed that KillAndReadBuffer could be split into
KillPinDontRead() and ReadThesePinnedPages() functions. However, we
are thinking of AIO and would rather see a ReadNPagesAsync() function.


-cktan
Greenplum, Inc.

On May 10, 2007, at 3:14 AM, Zeugswetter Andreas ADI SD wrote:




In reference to the seq scans roadmap, I have just submitted
a patch that addresses some of the concerns.

The patch does this:

1. for small relation (smaller than 60% of bufferpool), use
the current logic
2. for big relation:
- use a ring buffer in heap scan
- pin first 12 pages when scan starts
- on consumption of every 4 pages, read and pin the next 4 pages
- invalidate used pages of the scan so they do not
force out other useful pages


A few comments regarding the effects:

I do not see how this speedup could be caused by readahead, so what  
are

the effects ?
(It should make no difference to do the CPU work for count(*)  
inbetween

reading each block when the pages are not dirtied)
Is the improvement solely reduced CPU because no search for a free
buffer is needed and/or L2 cache locality ?

What effect does the advance pinning have? Avoiding VACUUM?

A 16 x 8k page ring is too small to allow the needed IO blocksize of
256k.
The readahead is done 4 x one page at a time (=32k).
What is the reasoning behind 1/4 ring for readahead (why not 1/2), is
3/4 the trail for followers and bgwriter ?

I think in anticipation of doing a single IO call for more than one
page, the KillAndReadBuffer function should be split into two
parts. One

that does the killing
for n pages, and one that does the reading for n pages.
Killing n before reading n would also have the positive effect of
grouping perhaps needed writes (not interleaving them with the reads).

I think the 60% Nbuffers is a very good starting point. I would only
introduce a GUC when we see evidence that it is needed (I agree with
Simon's partitioning comments, but I'd still wait and see).

Andreas





---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

   http://www.postgresql.org/about/donate


Re: [HACKERS] Planning large IN lists

2007-05-10 Thread Lukas Kahwe Smith

Neil Conway wrote:


Clearly, the current approach is fine when the array is small -- perhaps
for arrays above a certain number of elements, we could switch to
randomly sampling array elements, estimating their selectivities, and
then using that information to infer the estimated selectivity of the
entire array expression. That seems fairly crude, though: does anyone
have any better ideas?


Optimizer hints in SQL
/me ducks and runs for cover.

regards,
Lukas

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] Planning large IN lists

2007-05-10 Thread Tom Lane
Neil Conway [EMAIL PROTECTED] writes:
 When planning queries with a large IN expression in the WHERE clause,
 the planner transforms the IN list into a scalar array expression. In
 clause_selectivity(), we estimate the selectivity of the ScalarArrayExpr
 by calling scalararraysel(), which in turn estimates the selectivity of
 *each* array element in order to determine the selectivity of the array
 expression as a whole.

 This is quite inefficient when the IN list is large.

That's the least of the problems.  We really ought to convert such cases
into an IN (VALUES(...)) type of query, since often repeated indexscans
aren't the best implementation.
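
For instance, a hand-written rewrite (table and values are made up, just
to show the shape of the transformation):

  -- original shape: one selectivity estimate, and potentially one
  -- indexscan probe, per list element
  SELECT * FROM orders WHERE customer_id IN (1, 2, 3 /* ... */);

  -- VALUES-based shape, which the planner can treat like a join
  SELECT * FROM orders
   WHERE customer_id IN (VALUES (1), (2), (3) /* ... */);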

regards, tom lane

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: [HACKERS] Planning large IN lists

2007-05-10 Thread Dann Corbit
 -Original Message-
 From: [EMAIL PROTECTED] [mailto:pgsql-hackers-
 [EMAIL PROTECTED] On Behalf Of Tom Lane
 Sent: Thursday, May 10, 2007 11:53 AM
 To: Neil Conway
 Cc: pgsql-hackers@postgresql.org
 Subject: Re: [HACKERS] Planning large IN lists
 
 Neil Conway [EMAIL PROTECTED] writes:
  When planning queries with a large IN expression in the WHERE
clause,
  the planner transforms the IN list into a scalar array expression.
In
  clause_selectivity(), we estimate the selectivity of the
ScalarArrayExpr
  by calling scalararraysel(), which in turn estimates the selectivity
of
  *each* array element in order to determine the selectivity of the
array
  expression as a whole.
 
  This is quite inefficient when the IN list is large.
 
 That's the least of the problems.  We really ought to convert such
cases
 into an IN (VALUES(...)) type of query, since often repeated
indexscans
 aren't the best implementation.

It seems to me that if you have a unique index on the IN-list column,
then the problem is simplified.  In that case, you just have to estimate
at what point the index seeks cost more than a table scan.  Usually the
crossover is around 5-10% of the table size for the average database.
Not sure how it works out in PostgreSQL.

So in the special case of an IN list on a uniquely indexed column, compare
the cardinality of the table with the number of IN-list items and decide
whether to table scan or index seek based on that.
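
As a rough sketch of that comparison (hypothetical table name and numbers):

  -- planner's stored row-count estimate for the table
  SELECT reltuples FROM pg_class WHERE relname = 'orders';

  -- if the IN list has, say, 1800 entries and reltuples is 20000, the
  -- index seeks already touch ~9% of the table, i.e. near the crossover
  -- point mentioned above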

For arbitrary queries, it seems that it would be necessary to keep
histograms for the columns in question.  Perhaps they could be collected
with an advanced ANALYZE option.


---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] Windows Vista support (Buildfarm Vaquita)

2007-05-10 Thread Dave Page
Dave Page wrote:
 Michael Meskes wrote:
 On Thu, May 10, 2007 at 11:05:39AM +0100, Dave Page wrote:
 P.S.: More on the other problem later.
 OK.
 Dave, I just committed some small changes to get additional error
 logging. Hopefully this enables me to find out where exactly the error
 is coming up. If possible could you please restart a run on vista?
 
 Running now - I won't have access to the machine again until Monday now
 though.

It's passing what was the update test now, which is good - thanks :-).

Here's the results of the run, showing just the dt_test failure:
 
http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=vaquita&dt=2007-05-10%2015:55:16

Regards, Dave.

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] Feature lists for 8.3 and 8.4

2007-05-10 Thread Andrew Hammond
On May 8, 7:16 am, [EMAIL PROTECTED] (Abraham, Danny) wrote:
 Hi,

 I am  migrating from Sybase to Postgres.

 3. Who is the guy to ask about a feature like startup migrate in ORACLE?

You could check out EnterpriseDB, which is based on Postgres and
provides an Oracle compatibility layer. That gives you room to breathe
while you migrate to a PostgreSQL-native implementation. I've never
used it, but I hear good things.

Andrew



---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] Behavior of GENERATED columns per SQL2003

2007-05-10 Thread Simon Riggs
On Thu, 2007-05-10 at 10:11 -0400, Tom Lane wrote:

 As for GENERATED ALWAYS AS (expr), now that we understand that it's not
 supposed to define a virtual column, what's the point?  You can get the
 same behavior with a trivial BEFORE INSERT/UPDATE trigger that
 recomputes the derived value, and you don't have to buy into the rather
 ill-defined spec behavior (in particular the point that the generated
 column is effectively undefined during trigger firing seems really
 poorly done).  In fact, given that the only plausible use-cases involve
 expressions that are expensive to compute, a trigger can probably do
 *better* than the built-in feature, since it can make use of application
 knowledge about when a recomputation is really necessary.  The current
 patch recomputes the expression on every UPDATE, and would have a hard
 time being any brighter than that, given that we don't know what BEFORE
 triggers might do to the row.

We do need virtual columns, whether the spec requires them or not. They
would allow us to completely remove the column value when using
value-list based partitioning, giving considerable space savings for
VLDBs.

ISTM that we should interpret this as a requirement for a virtual
column. We can always move from that to a stored column if the spec
becomes more specific, though it would be harder to move the other way.
And as you point out, storing the value would make no sense.

-- 
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com



---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] Feature lists for 8.3 and 8.4

2007-05-10 Thread Simon Riggs
On Tue, 2007-05-08 at 17:16 +0300, Abraham, Danny wrote:

 I am  migrating from Sybase to Postgres.
 
 trying to get a hold on the issue of future releases feature list.
 
 1. Where can I find the feature list for 8.3 ? When is it expected?
 2. And for 8.4?
 3. Who is the guy to ask about a feature like startup migrate in ORACLE?

There is some planning done for each release, though reacting to
requirements is also a large part of how things work here. 

What do you need?

In-place upgrades are possible, but only when the on-disk format stays
the same. It is set to change for 8.3.

-- 
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com



---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] Behavior of GENERATED columns per SQL2003

2007-05-10 Thread Tom Lane
Simon Riggs [EMAIL PROTECTED] writes:
 On Thu, 2007-05-10 at 10:11 -0400, Tom Lane wrote:
 As for GENERATED ALWAYS AS (expr), now that we understand that it's not
 supposed to define a virtual column, what's the point?

 We do need virtual columns, whether the spec requires them or not.

Agreed, they seem more useful than what the spec's got in mind.  You can
fake a virtual column using a view, but it's a whole lot more painful
than faking a GENERATED column using a trigger (at least if you wish the
view to be updatable).
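
For example (hypothetical table and expression, just to illustrate the
view-based fake):

  CREATE TABLE item (price numeric, qty integer);

  CREATE VIEW item_v AS
      SELECT price, qty, price * qty AS total   -- the virtual column
        FROM item;

Making item_v accept INSERTs and UPDATEs is then where the extra pain
comes in, since that currently requires rewrite rules.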

 ISTM that we should interpret this as a requirement for a virtual
 column. We can always move from that to a stored column if the spec
 becomes more specific, though it would be harder to move the other way.

If you're suggesting commandeering the spec's GENERATED ALWAYS syntax
to represent virtual columns, when the committee has made it clear that
that's not what they intend, I say that's sheer folly.  What will you do
when they tweak the spec to the point where a virtual column clearly
doesn't satisfy it?  If we want a nonstandard feature we should use a
nonstandard syntax for it.

regards, tom lane

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly