Re: [HACKERS] An idea for parallelizing COPY within one backend

2008-02-27 Thread Dimitri Fontaine
Hi,

On Wednesday 27 February 2008, Florian G. Pflug wrote:
 Upon reception of a COPY INTO command, a backend would
 .) Fork off a dealer and N worker processes that take over the
 client connection. The dealer distributes lines received from the
 client to the N workers, while the original backend receives them
 as tuples back from the workers.

This looks so much like what pgloader does now (version 2.3.0~dev2, release 
candidate) at the client side, when configured for it, that I can't help 
answering the mail :)
 http://pgloader.projects.postgresql.org/dev/pgloader.1.html#_parallel_loading
  section_threads = N
  split_file_reading = False

Of course, the backends still have to parse the input given by pgloader, which 
only pre-processes data. I'm not sure having the client prepare the data some 
more (binary format or whatever) is a wise idea, as you mentioned and wrt 
Tom's follow-up. But maybe I'm all wrong, so I'm all ears!

Regards,
-- 
dim




Re: [HACKERS] Two Coverity Scan volunteers needed

2008-02-27 Thread Martijn van Oosterhout
On Tue, Feb 26, 2008 at 02:57:12PM -0800, Joshua D. Drake wrote:
  If we get volunteers set up, they will start running it daily.
 
 Would there be a way to script the responses to flag us for things
 that are important? 

There was (briefly) a way for them to send emails whenever something
new was detected. That was kinda useful. However, the number of false
positives is quite large. Maybe it got better, but last time I checked
(a while back, admittedly) it didn't notice that ereport(ERROR, ...) never
returns.

It is possible to export results, and I did that once for all the ECPG
errors so the developers could fix them. Looking at the latest results
it has a lot of warnings about dead-code in libstemmer, which is not
entirely surprising given that it's generated code.

Have a nice day,
-- 
Martijn van Oosterhout   [EMAIL PROTECTED]   http://svana.org/kleptog/
 Those who make peaceful revolution impossible will make violent revolution 
 inevitable.
  -- John F Kennedy




Re: [HACKERS] pg_dump additional options for performance

2008-02-27 Thread Simon Riggs
On Tue, 2008-02-26 at 20:14 +, Gregory Stark wrote:
 Tom Lane [EMAIL PROTECTED] writes:
 
  Simon Riggs [EMAIL PROTECTED] writes:
  I've not been advocating improving pg_restore, which is where the -Fc
  tricks come in.
  ...
  I see you thought I meant pg_restore. I don't think extending
  pg_restore in that way is of sufficiently generic use to make it
  worthwhile. Extending psql would be worth it, since not all psql scripts
  come from pg_dump.
 
  OK, the reason I didn't grasp what you are proposing is that it's insane.
 
  We can easily, and backwards-compatibly, improve pg_restore to do
  concurrent restores.  Trying to make psql do something like this will
  require a complete rewrite, and there is no prospect that it will work
  for any input that didn't come from (an updated version of) pg_dump
  anyway.  
 
 I didn't read everything in the thread previously so I'm not sure if this is
 what Simon had in mind. But I think one thing that could be done in parallel
 even in psql scripts is index builds. That doesn't help speed up COPY but it
 does speed up a case where we currently are limited by only occupying a single
 CPU. And I would expect it to play well with synchronized scans too.
 
 The complete rewrite in this case would be the concurrent psql patch I
 submitted a while back. I think it's a bit of a mess right now because I was
 trying to chase down some bugs with sigint handling so I've been thinking of
 rewriting it.
 
 I think this is a low-hanging fruit which would help a lot of users. The
 ability to load multiple COPY dumps would be the other piece of the puzzle but
 personally I'm not sure how to tackle that.

The current design for concurrent psql includes commands that say which
session a command should be run on. Switches between sessions are
explicit. That is good, but prevents us from easily saying "use N
sessions to make it go faster", because we have already hardwired the
commands to the sessions.

If we were able to express dependency info then we would be able to alter the
amount of parallelism. That would require us to 
* identify each command
* identify its dependents

possibly like this

<psql id=5 absolute-dependents=3,4>
Some SQL...
</psql>

the current default behaviour is this

<psql id=5 relative-dependents=-1 ...>

That's a leap ahead of concurrent psql.

I'd rather we had concurrent psql as it is now than attempt to leap
ahead too far, but the dependent definition approach seems likely to
yield benefits in the long run.

-- 
  Simon Riggs
  2ndQuadrant  http://www.2ndQuadrant.com 




Re: [HACKERS] pg_dump additional options for performance

2008-02-27 Thread Dimitri Fontaine
On Tuesday 26 February 2008, Tom Lane wrote:
  Or in more practical terms in this case, we have to balance
 speed against potentially-large costs in maintainability, datatype
 extensibility, and suchlike issues if we are going to try to get more
 than percentage points out of straight COPY.

Could COPY begin with checking the table type involved and use some internal 
knowledge about in-core types to avoid extensibility costs, if any? OK, that 
sounds like a maintainability cost :)

Or maybe just provide an option to pg_dump to force usage of the binary COPY 
format, which would then allow pg_restore to skip the data parsing altogether. 
If the binary format doesn't actually allow that, maybe it's time for another 
COPY format to be invented?
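
For reference, COPY itself already has a binary mode; the missing piece is the 
pg_dump/pg_restore plumbing around it. A minimal illustration (table and file 
names made up):

  COPY mytable TO '/tmp/mytable.dat' WITH BINARY;
  COPY mytable FROM '/tmp/mytable.dat' WITH BINARY;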

On the binary compatibility between architectures, I'm wondering whether using 
pg_dump in binary format from the new architecture couldn't be a solution.
Of course, when you only have the binary archives, have lost server A, and need 
to get the data onto server B which does not share A's architecture, you're not 
in a comfortable situation. But a pg_dump binary option would make it clear that 
you don't want to use it for your regular backups...
And it wouldn't help the case where the data is not coming from PostgreSQL. It 
could still be a common enough use case to bother with?

Just trying to put some ideas in the game, hoping this is more helpful than 
not,
-- 
dim

They did not know it was impossible, so they did it! -- Mark Twain




Re: [HACKERS] Proposed changes to DTrace probe implementation

2008-02-27 Thread Magnus Hagander
On Tue, Feb 26, 2008 at 03:48:28PM -0600, Robert Lor wrote:
 Gregory Stark wrote:
 I think both types of probes are useful to different people.
   
 I think certain higher level probes can be really useful to DBAs.
 Perhaps looking at the standard database SNMP MIB counters would give us a
 place to start for upward facing events people want to trace for databases
 in general.
   
 Great idea. I found this link for public RDBMS MIB 
 http://www.mnlab.cs.depaul.edu/cgi-bin/sbrowser.cgi?HOST=OID=RDBMS-MIB!rdbmsMIB
 
 The stats in rdbmsSrvInfoTable is quite useful, and it's one of the 
 tables that Oracle implements in their SNMP support. 
 http://download-east.oracle.com/docs/cd/B14099_19/manage.1012/b16244/appdx_d_rdbms.htm

Incidentally, most of that's already supported by the pg snmp provider,
through the stats system.

//Magnus



Re: [HACKERS] win32 build problem (cvs, msvc 2005 express)

2008-02-27 Thread Magnus Hagander
On Wed, Feb 27, 2008 at 02:02:29AM -0800, craigp wrote:
 I'm having trouble compiling the current cvs version on windows xp (msvc 2005 
 express). Compile errors below.

Did you by any chance use a tree that's been sitting around for a long
time? Like sometime earlier in the 8.3 series. We had a problem like that
which was fixed, but there might be an old file sitting around since then.
Try deleting parse.h in include/parser - it's not supposed to be there.

//Magnus



[HACKERS] win32 build problem (cvs, msvc 2005 express)

2008-02-27 Thread craigp
I'm having trouble compiling the current cvs version on windows xp (msvc 2005 
express). Compile errors below.

I have bison 1.875 (I can't find 2.2+ for windows) and flex 2.5.4. These tools 
seem to generate correct outputs.

It looks like it might be including parse.h from include/parser/parse.h instead 
of src/backend/parser/parse.h. Are these two files supposed to be identical? 
NULLS_P, etc. are defined in the latter, but not the former. If I manually copy 
the file over, compilation succeeds (still struggling with linking).

Here are the compile errors:

Build started: Project: postgres, Configuration: Release|Win32
Running bison on src\backend\parser\gram.y
1 file(s) copied.
Running flex on src\backend\parser\scan.l
Compiling...
parser.c
.\src\backend\parser\parser.c(100): error C2065: 'NULLS_P' : undeclared 
identifier
.\src\backend\parser\parser.c(100): error C2051: case expression not 
constant
.\src\backend\parser\parser.c(111): error C2065: 'NULLS_FIRST' : 
undeclared identifier
.\src\backend\parser\parser.c(114): error C2065: 'NULLS_LAST' : 
undeclared identifier
keywords.c
.\src\backend\parser\keywords.c(45): error C2065: 'ALWAYS' : undeclared 
identifier
.\src\backend\parser\keywords.c(45): error C2099: initializer is not a 
constant
.\src\backend\parser\keywords.c(90): error C2065: 'CONFIGURATION' : 
undeclared identifier
.\src\backend\parser\keywords.c(90): error C2099: initializer is not a 
constant
.\src\backend\parser\keywords.c(94): error C2065: 'CONTENT_P' : 
undeclared identifier
.\src\backend\parser\keywords.c(94): error C2099: initializer is not a 
constant

thanks!
--craig





  





Re: [HACKERS] pg_dump additional options for performance

2008-02-27 Thread Dimitri Fontaine
On Tuesday 26 February 2008, Joshua D. Drake wrote:
  Think 100GB+ of data that's in a CSV or delimited file.  Right now
  the best import path is with COPY, but it won't execute very fast as
  a single process.  Splitting the file manually will take a long time
  (time that could be spent loading instead) and substantially increase
  disk usage, so the ideal approach would figure out how to load in
  parallel across all available CPUs against that single file.

 You mean load from position? That would be very, very cool.

Did I mention pgloader now does exactly this when configured like this:
 http://pgloader.projects.postgresql.org/dev/pgloader.1.html#_parallel_loading
  section_threads = N
  split_file_reading = True

IIRC, Simon and Greg Smith asked for pgloader to get those parallel loading 
features in order to have some first results and ideas about the performance 
gain, as a first step in the parallel COPY backend implementation design.

Hope this helps,
-- 
dim




Re: [HACKERS] An idea for parallelizing COPY within one backend

2008-02-27 Thread Simon Riggs
On Wed, 2008-02-27 at 09:09 +0100, Dimitri Fontaine wrote:
 Hi,
 
 On Wednesday 27 February 2008, Florian G. Pflug wrote:
  Upon reception of a COPY INTO command, a backend would
  .) Fork off a dealer and N worker processes that take over the
  client connection. The dealer distributes lines received from the
  client to the N workers, while the original backend receives them
  as tuples back from the workers.
 
 This looks so much like what pgloader does now (version 2.3.0~dev2, release 
 candidate) at the client side, when configured for it, that I can't help 
 answering the mail :)
  http://pgloader.projects.postgresql.org/dev/pgloader.1.html#_parallel_loading
   section_threads = N
   split_file_reading = False
 
 Of course, the backends still have to parse the input given by pgloader, which
 only pre-processes data. I'm not sure having the client prepare the data some
 more (binary format or whatever) is a wise idea, as you mentioned and wrt
 Tom's follow-up. But maybe I'm all wrong, so I'm all ears!

ISTM the external parallelization approach is more likely to help us
avoid bottlenecks, so I support Dimitri's approach.

We also need error handling, which pgloader also has.

Writing error handling and parallelization into COPY isn't going to be
easy, and not very justifiable either if we already have both.

There might be a reason to re-write it in C one day, but that will be a
fairly easy task if we ever need to do it.

-- 
  Simon Riggs
  2ndQuadrant  http://www.2ndQuadrant.com 




Re: [HACKERS] OSSP can be used in the windows environment now!

2008-02-27 Thread Magnus Hagander
What exactly is needed for building the required libuuid files? From what I
can tell, the author has no binaries available, correct?

It builds with mingw only? Or with msvc? does the mingw build generate all
the required libraries for the msvc build as well? (sorry, I'm on a win64
box right now, so mingw doesn't work at all, so I can't test that right
now)

Also, documentation updates will be needed, but I can handle those.

//Magnus

On Mon, Feb 25, 2008 at 06:44:07PM +0900, Hiroshi Saito wrote:
 Hi.
 
 Please check it.
 build is successful for it in my environment. 
 Thanks!
 
 Regards,
 Hiroshi Saito
 
 - Original Message - 
 From: Hiroshi Saito [EMAIL PROTECTED]
 
 
 Hi.
 
 From: Magnus Hagander [EMAIL PROTECTED]
 
 
 we can include in the next release.:-)
 Thanks!
 
 Good news. Can you provide a patch for the msvc build system to build
 with it? We can't really ship it in the next release if we can't build
 with it ;-)
 
 Ahh OK, I will try it first; I need to check in a clean environment.
 Probably Monday, because I'm very busy.
 Thanks!
 
 Regards,
 Hiroshi Saito
 





[HACKERS] Full text search - altering the default parser

2008-02-27 Thread Richard Huxton

The default parser doesn't allow commas in numbers (I can see why, I think).

SELECT ts_parse('default', '123,000');
 ts_parse
----------
 (22,123)
 (12,",")
 (22,000)

One option of course is to pre-process the text, but since we can 
support custom parsers I thought I'd take a look at the code to teach it 
some flexibility on numbers. I'm guessing this would be of interest to 
anyone wanting to support European-style ',' decimal separators too.


My C is horribly rusty, so can I check I've got this right? Before I 
start exploring compiler errors I've not seen for decades ;-)



The parser functions (prsd_xxx) are all defined in 
backend/tsearch/wparser_def.c


The state machine is driven through the TParserStateActionItem 
definitions on lines 644 - 1263. Changing one of these will change the 
definition of the corresponding token-type.


To add a new token-type, I'd add it to the various lists line 30-194, 
then add the relevant TParserStateActionItems.


Thanks

--
  Richard Huxton
  Archonet Ltd



[HACKERS] Proposal: wildcards in pg_service.conf

2008-02-27 Thread Albe Laurenz
I'd like to extend the libpq service file by allowing
wildcards, e.g. like this:

[%]
host=dbhost.mycompany.com
dbname=%

Such an entry would match any requested service name,
and all occurrences of the wildcard to the right of an '=' would
be replaced with the requested service name.

That implies that a [%] entry is only useful as last entry
in pg_service.conf.

I'd like to keep it simple and only accept standalone
wildcards (not things like 'prefix%suffix'), but that's
debatable (at the cost of more work).

As a special case, I would want to allow wildcards in LDAP URLs
like this:

[%]
ldap://ldap.mycompany.com/cn=%,cn=databases?connectstring?one?objectclass=*

This also explains my preference of '%' over '*' as a
wildcard to avoid problems with * in LDAP filter expressions.

The added value is that you don't need one line per database
in the configuration file.

The two examples above illustrate two cases where I think that
such a setup could be useful:
a) You have most of your databases in one cluster.
b) You use LDAP to store your connection parameters centrally.
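
As an illustration of the intended behaviour (this describes only the proposal, 
not anything libpq does today), a connection made with service=salesdb would, 
given the [%] entry above, act as if pg_service.conf contained:

[salesdb]
host=dbhost.mycompany.com
dbname=salesdb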

What do you think?

Yours,
Laurenz Albe



Re: [HACKERS] An idea for parallelizing COPY within one backend

2008-02-27 Thread Florian G. Pflug

Tom Lane wrote:

Florian G. Pflug [EMAIL PROTECTED] writes:

...
Neither the dealer nor the workers would need access to either
the shared memory or the disk, thereby not messing with the one backend
is one transaction is one session dogma.
...


Unfortunately, this idea has far too narrow a view of what a datatype
input function might do.  Just for starters, consider enum input,
which certainly requires catalog access.  We have also explicitly
acknowledged the idea that datatype I/O functions might try to store
typmod-related data in some special catalog somewhere.


Hm... how many in-core datatypes are there which need catalog access in
their input or output functions? Maybe we could change the API for
i/o functions in a way that allows us to request all needed information
to be cached?

regards, Florian Pflug



Re: [HACKERS] OSSP can be used in the windows environment now!

2008-02-27 Thread Hiroshi Saito

Hi.

- Original Message - 
From: Magnus Hagander [EMAIL PROTECTED]




What exactly is needed for building the required libuuid files? From what I
can tell, the author has no binaries available, correct?


Yes, both can be built. However, the msvc build is not official, which is 
regrettable, but it can be built without source-code changes. It is created as our module.

http://winpg.jp/~saito/pg_work/OSSP_win32/msvc/
I have checked by examination. 



It builds with mingw only? Or with msvc? does the mingw build generate all
the required libraries for the msvc build as well? (sorry, I'm on a win64
box right now, so mingw doesn't work at all, so I can't test that right
now)


Therefore, it has now been built in both environments. 



Also, documentation updates will be needed, but I can handle those.


Yeah:-)
Thanks.

Regards,
Hiroshi Saito



Re: [HACKERS] Required make version

2008-02-27 Thread Alvaro Herrera
Peter Eisentraut wrote:

 Using the order-only prerequisites feature, which is what is failing with the 
 old make version, solves item 1).
 
 The alternative is your suggestion
 
 If the dependencies
 need to stay as they are, maybe we could avoid the annoyance by having
 make not print the echo command.
 
 but I'm not a friend of hiding commands because you don't know what is going 
 on and it will come back to bite you.

How about we use the order-only prerequisite only if available, and fall back
to the ugly or undesirable way otherwise?  I see that you can find out whether
your Make version supports it by checking .FEATURES.

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] An idea for parallelizing COPY within one backend

2008-02-27 Thread Florian G. Pflug

Dimitri Fontaine wrote:
Of course, the backends still have to parse the input given by pgloader, which 
only pre-processes data. I'm not sure having the client prepare the data some 
more (binary format or whatever) is a wise idea, as you mentioned and wrt 
Tom's follow-up. But maybe I'm all wrong, so I'm all ears!


As far as I understand, pgloader starts N threads or processes that open 
up N individual connections to the server. In that case, moving the 
text-to-binary conversion from the backend into the loader won't give any 
additional performance, I'd say.

The reason that I'd love some within-one-backend solution is that it'd 
allow you to utilize more than one CPU for a restore within a *single* 
transaction. This is something that a client-side solution won't be able 
to deliver, unless major changes to the architecture of postgres happen 
first...


regards, Florian Pflug




Re: [HACKERS] Required make version

2008-02-27 Thread Alvaro Herrera
Alvaro Herrera wrote:

 How about we use order-only prerequisite only if present, and use the
 ugly or undesirable way as fallback?  I see that you can find out if
 your Make version supports it by checking .FEATURES.

I think this can be used with a conditional like

ifneq (,$(findstring order-only,$(.FEATURES)))
...
endif
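
As a sketch of how that could look in a Makefile (the target, source and 
variable names here are made up for illustration; this is not a proposed patch):

ifneq (,$(findstring order-only,$(.FEATURES)))
# taken when make advertises order-only support via .FEATURES (GNU make >= 3.81):
# $(objdir) must exist before parse.o is built, but its timestamp never
# forces a rebuild of parse.o
parse.o: parse.c | $(objdir)
else
# fallback for older makes: an ordinary prerequisite, accepting the extra
# rebuilds or echo noise being complained about
parse.o: parse.c $(objdir)
endif
# (plus the usual rule that creates $(objdir) with mkdir -p)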

I am not sure I understand the problem being complained about well enough
to propose a more specific patch.

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.



[HACKERS] select avanced

2008-02-27 Thread cristianopintado
I have the following table:

Objeto  Valor

ob1 10
ob1 20
ob2 50
ob2 10
ob3 50

With the following command:

select distinct Objeto, sum(valor) from tb
group by Objeto;


It returns:

Objeto  Valor

ob1 30
ob2 60
ob3 50

What I need to do is combine ob2 and ob3 into one row,
giving the following table:


Objeto  Valor

ob1 30
ob2e3   110


Can this be done with SELECT alone?
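
For the record, one way to express this with a plain SELECT (assuming the 
grouping of ob2 and ob3 is hard-coded in the query) is a CASE expression:

select case when Objeto in ('ob2', 'ob3') then 'ob2e3' else Objeto end as Objeto,
       sum(Valor) as Valor
from tb
group by 1;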



Re: [HACKERS] An idea for parallelizing COPY within one backend

2008-02-27 Thread Brian Hurt

Tom Lane wrote:

Florian G. Pflug [EMAIL PROTECTED] writes:

...
Neither the dealer nor the workers would need access to either
the shared memory or the disk, thereby not messing with the one backend
is one transaction is one session dogma.
...

Unfortunately, this idea has far too narrow a view of what a datatype
input function might do.  Just for starters, consider enum input,
which certainly requires catalog access.  We have also explicitly
acknowledged the idea that datatype I/O functions might try to store
typmod-related data in some special catalog somewhere.

regards, tom lane

Would it be possible to determine, when the copy is starting, whether this 
case holds, and not use the parallel parsing idea in those cases?


I'm a big user of copy, generally into very simple tables- few indexes, 
simple column types (numeric, varchar, and int almost exclusively), no 
fancy features.  A parallel copy input in the simple cases would be of 
great advantage to me, even if it doesn't parallelize complicated cases.


Brian



Re: [HACKERS] An idea for parallelizing COPY within one backend

2008-02-27 Thread Alvaro Herrera
A.M. wrote:

 On Feb 27, 2008, at 9:11 AM, Florian G. Pflug wrote:

 The reason that I'd love some within-one-backend solution is that it'd 
 allow you to utilize more than one CPU for a restore within a *single* 
 transaction. This is something that a client-side solution won't be 
 able to deliver, unless major changes to the architecture of postgres 
 happen first...

 It seems like multiple backends should be able to take advantage of 2PC 
 for transaction safety.

Yeah, but it wouldn't take advantage of, say, the hack to disable WAL
when the table was created in the same transaction.

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] select avanced

2008-02-27 Thread Richard Huxton

[EMAIL PROTECTED] wrote:

I have the following table:


The hackers list is for development of the PostgreSQL database itself. 
Please try reposting on the general or sql mailing lists.


--
  Richard Huxton
  Archonet Ltd



Re: [HACKERS] idea: simple variadic functions in SQL and PL/pgSQL

2008-02-27 Thread Pavel Stehule
Hello

On 26/02/2008, Andrew Dunstan [EMAIL PROTECTED] wrote:


  Pavel Stehule wrote:
   Hello,
  
   I found an easy implementation of variadic functions. It's based on
   adapting FuncnameGetCandidates. When I find a variadic function, I
   create the right number of trailing arguments (the difference between
   pronargs and nargs). A variadic function can be marked via a flag or via
   some pseudotype. A flag is better - it allows variadic arguments of any
   type. In static languages (like SQL or PL/pgSQL) variadic variables
   can be accessed via an array (variadic arguments can only be non-array).
   This isn't a problem in the C language, where the arguments are available
   directly.
  


 There are a whole slew of questions around this, ISTM.

Sure. It's time to think about it.


  For example: What will be the type inferred for the array of variadic
  args in plpgsql?

SQL and PL/pgSQL should be limited.  In my prototype I use this fragment:

/* replace ANYNONARRAYS argument with ARRAY OF some */
argtype = get_array_type(argtype);
if (!OidIsValid(argtype))
    ereport(ERROR,
            (errcode(ERRCODE_DATATYPE_MISMATCH),
             errmsg("could not determine actual array type for variadic argument")));

SPI is unlimited (C language). I can use a list of arrays in Perl or
Python - I don't see complications there either. Domains and arrays can
be a problem. Using variadic array variables isn't typical, so I will
have to solve only domains. But that's a more general problem. Maybe 8.4
can support arrays of domains.


  If we are going to do this I think we need some discussion on design
  before we rush into it.


yes, I agree. Variadic functions are simple. And I believe they can
simplify life for programmers who have to maintain families of functions like:

foo(int),
foo(int, int)
foo(int, int, int)...
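
Purely as an illustration of the idea (the VARIADIC marker and its exact syntax 
are hypothetical here), a single definition could replace that whole family:

create function foo(variadic args int[]) returns int as $$
declare
    s int := 0;
    i int;
begin
    for i in array_lower(args, 1) .. array_upper(args, 1) loop
        s := s + args[i];
    end loop;
    return s;
end;
$$ language plpgsql;

select foo(1), foo(1, 2), foo(1, 2, 3);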

regards
Pavel



  cheers


  andrew







[HACKERS] proposal: plpgsql return execute ...

2008-02-27 Thread Pavel Stehule
Hello

I think RETURN QUERY is a successful idea. It should be completed with
support for dynamic SQL.

Syntax:

RETURN EXECUTE sqlstring [USING];

This is a shortcut for
FOR r IN EXECUTE sqlstring USING LOOP
  RETURN NEXT r;
END LOOP;
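
A usage sketch with the proposed (hypothetical) syntax, returning the rows of 
an arbitrary table from one function:

create function rows_from(tabname text) returns setof record as $$
begin
    return execute 'select * from ' || quote_ident(tabname);
end;
$$ language plpgsql;

(callers would supply a column definition list, as for any function returning 
setof record)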

Regards
Pavel Stehule



Re: [HACKERS] OSSP can be used in the windows environment now!

2008-02-27 Thread Magnus Hagander
On Wed, Feb 27, 2008 at 09:46:14PM +0900, Hiroshi Saito wrote:
 What exactly is needed for building the required libuuid files? From what I
 can tell, the author has no binaries available, correct?
 
 Yes, both can be built. However, the msvc build is not official, which is 
 regrettable, but it can be built without source-code changes. It is 
 created as our module.
 http://winpg.jp/~saito/pg_work/OSSP_win32/msvc/
 I have checked by examination. 

Ok.
Do you know if there are any plans to include this in the distribution? It
would make life a whole lot easier. If not, perhaps we should include the
win32.mak file in a subdir to our uuid module?

I found a couple of other problems with your patch, but I've been able to
fix those. Building a test with msvc now, and will write documentation.

//Magnus



Re: [HACKERS] An idea for parallelizing COPY within one backend

2008-02-27 Thread A.M.


On Feb 27, 2008, at 9:11 AM, Florian G. Pflug wrote:


Dimitri Fontaine wrote:
Of course, the backends still have to parse the input given by
pgloader, which only pre-processes data. I'm not sure having the
client prepare the data some more (binary format or whatever) is a
wise idea, as you mentioned and wrt Tom's follow-up. But maybe I'm
all wrong, so I'm all ears!


As far as I understand, pgloader starts N threads or processes that
open up N individual connections to the server. In that case, moving
the text-to-binary conversion from the backend into the loader won't
give any additional performance, I'd say.

The reason that I'd love some within-one-backend solution is that
it'd allow you to utilize more than one CPU for a restore within a
*single* transaction. This is something that a client-side solution
won't be able to deliver, unless major changes to the architecture
of postgres happen first...


It seems like multiple backends should be able to take advantage of  
2PC for transaction safety.


Cheers,
M



Re: [HACKERS] An idea for parallelizing COPY within one backend

2008-02-27 Thread Heikki Linnakangas

A.M. wrote:


On Feb 27, 2008, at 9:11 AM, Florian G. Pflug wrote:


Dimitri Fontaine wrote:
Of course, the backends still have to parse the input given by 
pgloader, which only pre-processes data. I'm not sure having the 
client prepare the data some more (binary format or whatever) is a 
wise idea, as you mentioned and wrt Tom's follow-up. But maybe I'm 
all wrong, so I'm all ears!


As far as I understand, pgloader starts N threads or processes that 
open up N individual connections to the server. In that case, moving 
the text-to-binary conversion from the backend into the loader won't 
give any additional performance, I'd say.

The reason that I'd love some within-one-backend solution is that it'd 
allow you to utilize more than one CPU for a restore within a *single* 
transaction. This is something that a client-side solution won't be 
able to deliver, unless major changes to the architecture of postgres 
happen first...


It seems like multiple backends should be able to take advantage of 2PC 
for transaction safety.


Yes, whatever is coordinating the multiple backends (a master backend? i 
haven't followed this thread closely) would then have to have logic to 
finish the prepared transactions if you crash after you've committed one 
but not all of them. IOW, it would need a mini transaction log of its own.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] Proposal: wildcards in pg_service.conf

2008-02-27 Thread Tom Lane
Albe Laurenz [EMAIL PROTECTED] writes:
 I'd like to extend the libpq service file by allowing
 wildcards, e.g. like this:

 [%]
 host=dbhost.mycompany.com
 dbname=%

  Such an entry would match any requested service name,
  and all occurrences of the wildcard to the right of an '=' would
  be replaced with the requested service name.

This seems poorly thought out.  How could you have any other service
entries besides this one?  What is the point of doing it like this
and not simply overriding the service's database selection?

The special case for LDAP makes it even more obvious that this is
a kluge.

regards, tom lane



Re: [HACKERS] An idea for parallelizing COPY within one backend

2008-02-27 Thread Florian G. Pflug

Brian Hurt wrote:

Tom Lane wrote:

Florian G. Pflug [EMAIL PROTECTED] writes:

...
Neither the dealer nor the workers would need access to either
the shared memory or the disk, thereby not messing with the one backend
is one transaction is one session dogma.
...

Unfortunately, this idea has far too narrow a view of what a datatype
input function might do.  Just for starters, consider enum input,
which certainly requires catalog access.  We have also explicitly
acknowledged the idea that datatype I/O functions might try to store
typmod-related data in some special catalog somewhere.


Would it be possible to determine when the copy is starting that this 
case holds, and not use the parallel parsing idea in those cases?


In theory, yes. In practice, I don't want to be the one who has to answer 
to an angry user who just suffered a major drop in COPY performance 
after adding an ENUM column to his table.


I was thinking more along the lines of letting a datatype specify a
function void* ioprepare(typmod) which returns some opaque object
specifying all that the input and output functions need to know.
We could then establish the rule that input/output functions may not 
access the catalog, and instead pass them a pointer to that opaque object.
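
Roughly the sort of API change being hand-waved at (none of these functions 
exist; the names are purely illustrative):

/* called once per COPY in the main backend, catalog access allowed */
void *enum_ioprepare(Oid typid, int32 typmod);

/* called per value in the workers; no catalog access allowed, everything
 * needed comes out of the opaque object built by the prepare step */
Datum enum_in_prepared(char *str, void *iodata);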


All pretty pie-in-the-sky at the moment, though...

regards, Florian Pflug




Re: [HACKERS] Proposed changes to DTrace probe implementation

2008-02-27 Thread Paul van den Bogaard
But putting these and other counters in context is what could be missing.
Correlating a given (set of) stats with others (possibly outside of the
application domain) is one of the assets offered by DTrace.  Besides the
generic transaction begin/start/end it could also be helpful to see the
number of parses, binds, and executes per transaction, user, connection, etc.
And yes, I feel Tom is right in fearing that these things can be used in
creative ways. However, is this not true for most benchmark results when
one's objective is to show how perfect/better/whatever product/platform A
behaves/is compared to B, C, etc...?


One benefit of generalizing a subset of the DTrace probes is the possibility
of creating generic DTrace scripts that can be used for many database
installations. DTrace is great, and programming DTrace scripts is fun (my
view, and sure, I am biased as a Sun employee :-), but it is not likely to
be something a DBA would like to master. The availability of generic scripts
does add value.
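
As a rough illustration of what such a generic script could look like (the probe 
names here are assumptions made for the example, not a statement of what the 
probes are actually called):

postgresql*:::transaction-start { self->ts = timestamp; }

postgresql*:::transaction-commit
/self->ts/
{
    @commits[pid] = count();
    @txtime[pid] = sum(timestamp - self->ts);
    self->ts = 0;
}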



BTW, I wonder if we could somehow combine DTrace as a contextual tool
with the counters provided through the stats interface. Any insight/ideas?


--Paul.






On 27-feb-2008, at 10:28, Magnus Hagander wrote:


On Tue, Feb 26, 2008 at 03:48:28PM -0600, Robert Lor wrote:

Gregory Stark wrote:

I think both types of probes are useful to different people.


I think certain higher level probes can be really useful to DBAs.
Perhaps looking at the standard database SNMP MIB counters would give us a
place to start for upward facing events people want to trace for databases
in general.


Great idea. I found this link for public RDBMS MIB
http://www.mnlab.cs.depaul.edu/cgi-bin/sbrowser.cgi?HOST=OID=RDBMS-MIB!rdbmsMIB


The stats in rdbmsSrvInfoTable is quite useful, and it's one of the
tables that Oracle implements in their SNMP support.
http://download-east.oracle.com/docs/cd/B14099_19/manage.1012/b16244/appdx_d_rdbms.htm


Incidentally, most of that's already supported by the pg snmp provider,
through the stats system.

//Magnus



 
---------------------------------------------------------------
Paul van den Bogaard                 [EMAIL PROTECTED]
ISV-E -- ISV Engineering, Opensource Engineering group

Sun Microsystems, Inc                phone:     +31 334 515 918
Saturnus 1                           extension: x (70)15918
3824 ME Amersfoort                   mobile:    +31 651 913 354
The Netherlands                      fax:       +31 334 515 001





Re: [HACKERS] An idea for parallelizing COPY within one backend

2008-02-27 Thread Tom Dunstan
On Wed, Feb 27, 2008 at 9:26 PM, Florian G. Pflug [EMAIL PROTECTED] wrote:
  I was thinking more along the line of letting a datatype specify a
  function void* ioprepare(typmod) which returns some opaque object
  specifying all that the input and output function needs to know.
  We could than establish the rule that input/output functions may not
  access the catalog, and instead pass them a pointer to that opaque object.

Callers of IO functions don't always know which type they're dealing
with - we had to go to some lengths to pass type information along
with the enum value itself so that it could be looked up in the
syscache in the output function. I think the main culprits are the
P/Ls, but I think there was a security related concern about passing
the type through to the IO function as well. If you want to do
something like this, it would certainly be possible to cache the enum
info for a particular type, but you might want to have a separate set
of io functions just for this case.

On the plus side, if such a cache were to be used by IO generally, we
could reimplement enums to just store the ordinal on disk and save a
couple of bytes, like I wanted but was unable to do the first time
around. :)

Enums are an easy case, though, as there's very little data to deal
with. I don't know about other UDTs out there - do any require more
extensive catalog access?

Cheers

Tom



Re: [HACKERS] OSSP can be used in the windows environment now!

2008-02-27 Thread Hiroshi Saito

Hi.

- Original Message - 
From: Magnus Hagander [EMAIL PROTECTED]




Ok.
Do you know if there are any plans to include this in the distribution? I
would make life a whole lot easier. If not, perhaps we should include the
win32.mak file in a subdir to our uuid module?


Ahh, I don't have a good idea... a MinGW build is required before 
win32.mak can be used. Probably this needs to be written up in a document. 



I found a couple of other problems with your patch, but I've been able to
fix those. Building a test with msvc now, and will write documentation.


Thanks!

Regards,
Hiroshi Saito



Re: [HACKERS] An idea for parallelizing COPY within one backend

2008-02-27 Thread Tom Lane
Alvaro Herrera [EMAIL PROTECTED] writes:
 Yeah, but it wouldn't take advantage of, say, the hack to disable WAL
 when the table was created in the same transaction.

In the context of a parallelizing pg_restore this would be fairly easy
to get around.  We could probably teach the thing to combine table
creation and loading steps into one action (transaction) in most cases.
If that couldn't work because of some weird dependency chain, the
data loading transaction could be done as

BEGIN;
TRUNCATE table;
COPY table FROM stdin;
...
COMMIT;

which I believe already invokes the no-WAL optimization, and could
certainly be made to do so if not.

Obviously, pg_restore would have to be aware of whether or not it had
created that table in the current run (else it mustn't TRUNCATE),
but it would be tracking more or less exactly that info anyway to handle
dependency ordering.

regards, tom lane



Re: [HACKERS] An idea for parallelizing COPY within one backend

2008-02-27 Thread Andrew Dunstan



Florian G. Pflug wrote:


Would it be possible to determine, when the copy is starting, whether this 
case holds, and not use the parallel parsing idea in those cases?


In theory, yes. In practice, I don't want to be the one who has to 
answer to an angry user who just suffered a major drop in COPY 
performance after adding an ENUM column to his table.





I am yet to be convinced that this is even theoretically a good path to 
follow. Any sufficiently large table could probably be partitioned and 
then we could use the parallelism that is being discussed for pg_restore 
without any modification to the backend at all. Similar tricks could be 
played by an external bulk loader for third party data sources.


cheers

andrew



Re: [HACKERS] An idea for parallelizing COPY within one backend

2008-02-27 Thread Brian Hurt

Andrew Dunstan wrote:




Florian G. Pflug wrote:



Would it be possible to determine, when the copy is starting, whether 
this case holds, and not use the parallel parsing idea in those cases?



In theory, yes. In practice, I don't want to be the one who has to 
answer to an angry user who just suffered a major drop in COPY 
performance after adding an ENUM column to his table.





I am yet to be convinced that this is even theoretically a good path 
to follow. Any sufficiently large table could probably be partitioned 
and then we could use the parallelism that is being discussed for 
pg_restore without any modification to the backend at all. Similar 
tricks could be played by an external bulk loader for third party data 
sources.




I was just floating this as an idea - I don't know enough about the 
backend to know whether it was a good idea or not; it sounds like not.


Brian




[HACKERS] ResourceOwners for Snapshots? holdable portals

2008-02-27 Thread Alvaro Herrera
Hi,

I'm toying around with the idea of tracking snaphots more accurately to
be able to advance Xmin for read committed transactions.

I think it's relatively easy to do it in the straightforward way, which
is to just add snapshot-destroy calls in the spots where a snapshot
variable goes out of scope.  However, I've been thinking of doing it in
a little more elaborate (and, AFAICS, better) way: having the
ResourceOwner code be responsible for keeping track of snapshots.

Offhand I don't see any big problem with that, although I admit I
haven't yet tried any code.  One thing that jumps at me, however, is the
handling of holdable portals.

We currently just copy the portal's content into a Materialize node, and
let the snapshot go away at transaction's end.  This works, but ISTM we
could improve that by keeping track of the portal's snapshot separately
from the transaction -- that is to say, to hang it from the portal's
ResourceOwner.  This would allow us to avoid the Materialize node
altogether, and just keep the xmin back until the portal's gone.  Vacuum
would, of course, not be able to clean up rows needed by the portal.  I
don't see this as a problem, but rather as an improvement.

Thoughts?


Also, is there anything else on the whole Snapshots-on-ResourceOwners
idea that could be a showstopper?

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.



Re: [HACKERS] An idea for parallelizing COPY within one backend

2008-02-27 Thread Florian G. Pflug

Andrew Dunstan wrote:

Florian G. Pflug wrote:
Would it be possible to determine, when the copy is starting, whether this 
case holds, and not use the parallel parsing idea in those cases?


In theory, yes. In practice, I don't want to be the one who has to 
answer to an angry user who just suffered a major drop in COPY 
performance after adding an ENUM column to his table.


I am yet to be convinced that this is even theoretically a good path to 
follow. Any sufficiently large table could probably be partitioned and 
then we could use the parallelism that is being discussed for pg_restore 
without any modification to the backend at all. Similar tricks could be 
played by an external bulk loader for third party data sources.


That assumes that some specific bulkloader like pg_restore, pgloader
or similar is used to perform the load. Plain libpq-users would either 
need to duplicate the logic these loaders contain, or wouldn't be able 
to take advantage of fast loads.


Plus, I'd see this as a kind of testbed for gently introducing 
parallelism into postgres backends (especially thinking about sorting 
here). CPUs gain more and more cores, so in the long run I fear that we 
will have to find ways to utilize more than one of those to execute a 
single query.


But of course the architectural details need to be sorted out before any 
credible judgement about the feasibility of this idea can be made...


regards, Florian Pflug




Re: [HACKERS] An idea for parallelizing COPY within one backend

2008-02-27 Thread Tom Lane
Florian G. Pflug [EMAIL PROTECTED] writes:
 Plus, I'd see this as a kind of testbed for gently introducing 
 parallelism into postgres backends (especially thinking about sorting 
 here).

This thinking is exactly what makes me scream loudly and run in the
other direction.  I don't want threads introduced into the backend,
whether gently or otherwise.  The portability and reliability hits
that we'll take are too daunting.  Threads that invoke user-defined
code (as anything involved with datatype-specific operations must)
are especially fearsome, as there is precisely 0 chance of that code
being thread-safe.

regards, tom lane



Re: [HACKERS] An idea for parallelizing COPY within one backend

2008-02-27 Thread Florian G. Pflug

Tom Lane wrote:

Florian G. Pflug [EMAIL PROTECTED] writes:
Plus, I'd see this as a kind of testbed for gently introducing 
parallelism into postgres backends (especially thinking about sorting 
here).


This thinking is exactly what makes me scream loudly and run in the
other direction.  I don't want threads introduced into the backend,
whether gently or otherwise.  The portability and reliability hits
that we'll take are too daunting.  Threads that invoke user-defined
code (as anything involved with datatype-specific operations must)
are especially fearsome, as there is precisely 0 chance of that code
being thread-safe.


Exactly my thinking. That is why I was looking for a way to introduce 
parallelism *without* threading. Though it's not so much the 
user-defined code that scares me, but rather the portability issues. The 
differences between NPTL and non-NPTL threads on linux alone make me 
shudder...


What I was saying is that there might be a chance to get some parallelism 
without threading, by executing well-defined pieces of code with 
controlled dependencies in separate processes. COPY seemed like an ideal 
testbed for that idea, since the conversion of received lines into 
tuples seemed reasonably self-contained, and with few outside 
dependencies. If the idea can't be made to work there, it probably won't 
work anywhere. If it turns out that it does (with an API change for 
input/output functions) however, then it *might* be possible to apply it 
to other relatively self-contained parts in the future...


To restate, I don't want threaded backends. Not in the foreseeable 
future at least. But I'd still love to see a single transaction using 
more than one core.


regards, Florian Pflug






Re: [HACKERS] ResourceOwners for Snapshots? holdable portals

2008-02-27 Thread Tom Lane
Alvaro Herrera [EMAIL PROTECTED] writes:
 We currently just copy the portal's content into a Materialize node, and
 let the snapshot go away at transaction's end.  This works, but ISTM we
 could improve that by keeping track of the portal's snapshot separately
 from the transaction -- that is to say, to hang it from the portal's
 ResourceOwner.  This would allow us to avoid the Materialize node
 altogether, and just keep the xmin back until the portal's gone.

That's a pretty horrid idea: what if the query being executed by the
portal has side-effects?  You can't get away with not executing it
to completion before you close the transaction.  Not to mention that
it depends on locks not just snapshots.

As far as the general point goes, I had been thinking of managing
snapshots in a central cache, because if you want to advance xmin
intratransaction then some piece of code has to be aware of *all* the
open snapshots in the backend; and the ResourceOwners can't do that
conveniently because they're fairly independent.  Or were you meaning
that you would do that and on top of it have the ResourceOwners track
references into the cache?

regards, tom lane



Re: [HACKERS] proposal: plpgsql return execute ...

2008-02-27 Thread Neil Conway
On Wed, 2008-02-27 at 15:24 +0100, Pavel Stehule wrote:
 I thing RETURN QUERY is successful idea. It should be completed with
 support  of dynamic SQL.

Yeah, I can see that being useful.

 RETURN EXECUTE sqlstring [USING];

What is the USING clause for?

-Neil





Re: [HACKERS] ResourceOwners for Snapshots? holdable portals

2008-02-27 Thread Alvaro Herrera
Tom Lane wrote:
 Alvaro Herrera [EMAIL PROTECTED] writes:
  We currently just copy the portal's content into a Materialize node, and
  let the snapshot go away at transaction's end.  This works, but ISTM we
  could improve that by keeping track of the portal's snapshot separately
  from the transaction -- that is to say, to hang it from the portal's
  ResourceOwner.  This would allow us to avoid the Materialize node
  altogether, and just keep the xmin back until the portal's gone.
 
 That's a pretty horrid idea: what if the query being executed by the
 portal has side-effects?  You can't get away with not executing it
 to completion before you close the transaction.

Ah, excellent point -- I guess that's what I was missing.

 As far as the general point goes, I had been thinking of managing
 snapshots in a central cache, because if you want to advance xmin
 intratransaction then some piece of code has to be aware of *all* the
 open snapshots in the backend; and the ResourceOwners can't do that
 conveniently because they're fairly independent.  Or were you meaning
 that you would do that and on top of it have the ResourceOwners track
 references into the cache?

Yeah, I think there needs to be a separate list either way, but having
references to it from ResourceOwners means there's no need to have extra
cleanup calls at (sub)transaction commit/abort.

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.



Re: [HACKERS] Required make version

2008-02-27 Thread Peter Eisentraut
Alvaro Herrera wrote:
 I think this can be used with a conditional like

 ifneq (,$(findstring order-only,$(.FEATURES)))
 ...
 endif

Yes, that was my thought.

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/



Re: [HACKERS] OSSP can be used in the windows environment now!

2008-02-27 Thread Magnus Hagander

Hiroshi Saito wrote:

Hi.

Ok.
Do you know if there are any plans to include this in the distribution? I
would make life a whole lot easier. If not, perhaps we should include the
win32.mak file in a subdir to our uuid module?


Ahh, I don't have a good idea... a MinGW build is required before 
win32.mak can be used. Probably this needs to be written up in a document.


I take it you are in contact with them, since you helped them with the 
port? Can you ask them if they are interested in distributing that file?


If not, what do other people think about adding this Makefile and a 
README file to our contrib directory?


//Magnus




Re: [HACKERS] proposal: plpgsql return execute ...

2008-02-27 Thread Merlin Moncure
On Wed, Feb 27, 2008 at 1:58 PM, Neil Conway [EMAIL PROTECTED] wrote:
 On Wed, 2008-02-27 at 15:24 +0100, Pavel Stehule wrote:
   I thing RETURN QUERY is successful idea. It should be completed with
   support  of dynamic SQL.

  Yeah, I can see that being useful.

   RETURN EXECUTE sqlstring [USING];

  What is the USING clause for?

parameter binding.

see: http://archives.postgresql.org/pgsql-patches/2006-08/msg00267.php

merlin



[HACKERS] Idea for minor tstore optimization

2008-02-27 Thread Neil Conway
I notice that several of the call sites of tuplestore_puttuple() start
with arrays of datums and nulls, call heap_form_tuple(), and then switch
into the tstore's context and call tuplestore_puttuple(), which
deep-copies the HeapTuple into the tstore. ISTM it would be faster and
simpler to provide a tuplestore_putvalues(), which just takes the datum
+ nulls arrays and avoids the additional copy.
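
The caller-side pattern in question looks roughly like this today (a sketch; 
per_query_ctx stands in for whatever context the tuplestore was created in, 
following the usual set-returning-function idiom):

    HeapTuple     tuple;
    MemoryContext oldcxt;

    tuple = heap_form_tuple(tupdesc, values, nulls);
    oldcxt = MemoryContextSwitchTo(per_query_ctx);
    tuplestore_puttuple(tstore, tuple);     /* deep-copies the tuple again */
    MemoryContextSwitchTo(oldcxt);
    heap_freetuple(tuple);

A tuplestore_putvalues(tstore, tupdesc, values, nulls) could form the tuple 
directly where the tuplestore wants it and skip one of the two copies.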

-Neil





Re: [HACKERS] [PATCHES] 2WRS [WIP]

2008-02-27 Thread manolo.espa

Referring to tuplesort.c and tuplestore.c

BACKGROUND: Starting from dumptuples() [ tuplesort.c ], the write functions move 
the tuple from one buffer to another in order to finally write it to a logical 
tape. Is there a way (even the most inefficient way) to use the current 
read/write functions provided by PostgreSQL in order to retrieve the first 
tuple of a certain run while performing external sorting?


NOTE: I need the first tuple in order to manipulate the whole corresponding 
run, tuple by tuple, since they are written sequentially within a run.


Thanks for your attention.
Regards, Manolo.


--
From: [EMAIL PROTECTED]
Sent: Tuesday, February 26, 2008 4:10 PM
To: Jaime Casanova [EMAIL PROTECTED]; [EMAIL PROTECTED]
Cc: Decibel! [EMAIL PROTECTED]; David Fetter [EMAIL PROTECTED]; 
[EMAIL PROTECTED]; pgsql-hackers@postgresql.org

Subject: Re: [HACKERS] [PATCHES] 2WRS [WIP]


For the joy of all of you: that's the correct WIP patch.
At the moment it only tries to create runs using two heaps. Hope you can
help me with writing those runs to tapes.

I'd be very pleased to give you more details.

Thanks for your time.
Regards, Manolo.


--
From: Jaime Casanova [EMAIL PROTECTED]
Sent: Friday, February 22, 2008 5:30 AM
To: [EMAIL PROTECTED]
Cc: Decibel! [EMAIL PROTECTED]; Manolo _ [EMAIL PROTECTED];
David Fetter [EMAIL PROTECTED]; [EMAIL PROTECTED];
pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] [PATCHES] 2WRS [WIP]


On Thu, Feb 21, 2008 at 6:44 AM,  [EMAIL PROTECTED] wrote:

Hi.

That's the last release and refers to 8.3.0 and not to 8.2.5 as before.
Hope
you can tell me if I created it correctly please.



no, it doesn't...


! /* GUC variables */
  #ifdef TRACE_SORT
  bool trace_sort = false;
  #endif
- #ifdef DEBUG_BOUNDED_SORT
- bool optimize_bounded_sort = true;
- #endif


it seems you're removing something added in 8.3

--
regards,
Jaime Casanova

Programming today is a race between software engineers striving to
build bigger and better idiot-proof programs and the universe trying
to produce bigger and better idiots.
So far, the universe is winning.
Richard Cook






Re: [HACKERS] proposal: plpgsql return execute ...

2008-02-27 Thread Pavel Stehule
On 27/02/2008, Merlin Moncure [EMAIL PROTECTED] wrote:
 On Wed, Feb 27, 2008 at 1:58 PM, Neil Conway [EMAIL PROTECTED] wrote:
   On Wed, 2008-02-27 at 15:24 +0100, Pavel Stehule wrote:
 I thing RETURN QUERY is successful idea. It should be completed with
 support  of dynamic SQL.
  
Yeah, I can see that being useful.
  
 RETURN EXECUTE sqlstring [USING];
  
What is the USING clause for?


 parameter binding.

  see: http://archives.postgresql.org/pgsql-patches/2006-08/msg00267.php


  merlin


I sent a modernized version:
http://archives.postgresql.org/pgsql-patches/2007-10/msg00161.php

Pavel



[HACKERS] new warning message

2008-02-27 Thread Jeff Davis
On IRC today someone brought up a problem in which users were still able
to connect to a database after a REVOKE CONNECT ... FROM theuser. The
reason theuser is still able to connect is that PUBLIC still has
privileges to connect by default (AndrewSN was the one who answered
this).

Would it be reasonable to throw a warning if you revoke a privilege from
some role, and that role inherits the privilege from some other role (or
PUBLIC)?

Regards,
Jeff Davis






Re: [HACKERS] Varlena Type Creation

2008-02-27 Thread Martijn van Oosterhout
On Tue, Feb 26, 2008 at 06:19:48PM +0100, Dimitri Fontaine wrote:
 So... where do I start to create a varlena datatype which has to store the 3 
 following values: text prefix, char start, char end.
 
 It's not clear for me whether this is what I need to provide:
 
 typedef struct

I see no-one responded to this: a varlena has no fixed header size, so
you can't fit it in a structure anyway. Once you're passed a pointer
you use the LEN/PTR macros to extract what you want.

Not sure what the chars are for, but perhaps it would be easiest to
treat it as a single text object with the two leading characters
signifying something?
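
For what it's worth, a minimal sketch of building and reading back such a
value with the standard varlena macros (the payload layout here -- the two
chars first, then the prefix bytes -- is just an arbitrary convention for
this sketch, not an established one):

/* build: caller passes the prefix string and the two boundary chars */
static struct varlena *
make_prefix_range(const char *prefix, char start, char end)
{
    int     prefix_len = strlen(prefix);
    int     total = VARHDRSZ + 2 + prefix_len;
    struct varlena *result = (struct varlena *) palloc(total);

    SET_VARSIZE(result, total);
    VARDATA(result)[0] = start;
    VARDATA(result)[1] = end;
    memcpy(VARDATA(result) + 2, prefix, prefix_len);
    return result;
}

/* read back (value assumed already detoasted): no fixed struct, just macros */
static void
get_prefix_range(struct varlena *v, char *start, char *end)
{
    *start = VARDATA(v)[0];
    *end = VARDATA(v)[1];
    /* the prefix is at VARDATA(v) + 2, length VARSIZE(v) - VARHDRSZ - 2 */
}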

Have a nice day,
-- 
Martijn van Oosterhout   [EMAIL PROTECTED]   http://svana.org/kleptog/
 Those who make peaceful revolution impossible will make violent revolution 
 inevitable.
  -- John F Kennedy




Re: [HACKERS] new warning message

2008-02-27 Thread Tom Lane
Jeff Davis [EMAIL PROTECTED] writes:
 Would it be reasonable to throw a warning if you revoke a privilege from
 some role, and that role inherits the privilege from some other role (or
 PUBLIC)?

This has been suggested and rejected before --- the consensus is it'd
be too noisy.

Possibly the REVOKE manual page could be modified to put more stress
on the point.

regards, tom lane



[HACKERS] Snapshot Reuse

2008-02-27 Thread Simon Riggs
In Read Committed transactions we take snapshots much more frequently
than transactions begin and commit. It would help scalability if we
didn't need to re-take a snapshot. That's only helpful if the chance of
seeing the same snapshot again is relatively high.

Now that we have virtual transactions we may more frequently find
ourselves taking identical snapshots.

If we had a counter that incremented each time the main snapshot altered
in a meaningful way we could set that atomically. We could then read
this when we take a snapshot to see if it matches our existing snapshot;
if so then drop the lock quickly and continue with what we already have.
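
Roughly the shape I have in mind (just a sketch -- the names are made up and
the detail of where the counter lives and how it is bumped is hand-waved):

/* shared counter, bumped under ProcArrayLock whenever a commit/abort would
 * change what a new snapshot sees (hypothetical) */
static volatile uint32 *SnapshotChangeCount;

/* backend-local cache */
static Snapshot cachedSnapshot = NULL;
static uint32   cachedChangeCount = 0;

Snapshot
GetSnapshotDataCached(Snapshot snapshot)
{
    uint32  current;

    LWLockAcquire(ProcArrayLock, LW_SHARED);
    current = *SnapshotChangeCount;
    if (cachedSnapshot != NULL && current == cachedChangeCount)
    {
        /* nothing changed: drop the lock quickly, reuse what we have */
        LWLockRelease(ProcArrayLock);
        return cachedSnapshot;
    }
    LWLockRelease(ProcArrayLock);

    /* fall back to the existing proc array scan (the current
     * GetSnapshotData() path, wrapped here under a made-up name) */
    cachedSnapshot = TakeFreshSnapshot(snapshot);
    cachedChangeCount = current;
    return cachedSnapshot;
}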

I can see some downsides to this as well as potential benefits:

* we ping the counter across CPUs - yes, we will, but that's probably
better than pinging the whole procarray

* this relies upon the rate of change of snapshots - need to do the math
to see how often this might apply

Not sure yet myself, but it seems worth recording in case it spurs an
idea from someone else.

-- 
  Simon Riggs
  2ndQuadrant  http://www.2ndQuadrant.com 




Re: [HACKERS] Idea for minor tstore optimization

2008-02-27 Thread Tom Lane
Neil Conway [EMAIL PROTECTED] writes:
 I notice that several of the call sites of tuplestore_puttuple() start
 with arrays of datums and nulls, call heap_form_tuple(), and then switch
 into the tstore's context and call tuplestore_puttuple(), which
 deep-copies the HeapTuple into the tstore. ISTM it would be faster and
 simpler to provide a tuplestore_putvalues(), which just takes the datum
 + nulls arrays and avoids the additional copy.

Seems reasonable.  Check whether tuplesort should offer the same, while
you are at it.

regards, tom lane



[HACKERS] UUID data format 4x-4x-4x-4x-4x-4x-4x-4x

2008-02-27 Thread Dawid Kuroczko
Hello.

I am currently playing with the UUID data type and trying to use it to store
UUIDs provided by a third-party (Hewlett-Packard) application.  The problem is
they format UUIDs as xxxx-xxxx-xxxx-xxxx-xxxx-xxxx-xxxx-xxxx (eight groups of
four hex digits), so I have to replace(text,'-','')::uuid for this kind of
data.

Nooow, the case is quite simple and it might be that there are other
applications
formatting UUIDs too liberally.

I am working on a patch to support this format (yes, it is a simple
modification).
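
For illustration only (this is not the actual patch; the real uuid_in() is
structured differently), the kind of change is roughly: collect exactly 32
hex digits and ignore hyphens wherever they appear:

/* sketch: accept any hyphen placement, reject anything but 32 hex digits */
static bool
parse_uuid_lenient(const char *src, unsigned char *out /* 16 bytes */)
{
    int ndigits = 0;
    int hi = 0;

    for (; *src; src++)
    {
        int v;

        if (*src == '-')
            continue;           /* allow '-' regardless of position */
        if (*src >= '0' && *src <= '9')
            v = *src - '0';
        else if (*src >= 'a' && *src <= 'f')
            v = *src - 'a' + 10;
        else if (*src >= 'A' && *src <= 'F')
            v = *src - 'A' + 10;
        else
            return false;       /* not a hex digit */

        if (ndigits >= 32)
            return false;       /* too long */
        if (ndigits % 2 == 0)
            hi = v;
        else
            out[ndigits / 2] = (unsigned char) ((hi << 4) | v);
        ndigits++;
    }
    return ndigits == 32;
}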

And in the meantime, I would like to ask you: what do you think about it?

Cons: Such format is not standard.

Pros: This will help UUID data type adoption. [1]  While good applications
format their data well, there are others which don't follow standards.  Also
I think it is easier for a human being to enter a UUID as eight groups of
four digits.

Your thoughts?  Should I submit a patch?

   Regards,
 Dawid

[1]: My first thought when I received the error message was "hey! this is
not a UUID, it is too long/too short!"; only later did I check that they
just don't format it too well.



Re: [HACKERS] UUID data format 4x-4x-4x-4x-4x-4x-4x-4x

2008-02-27 Thread Josh Berkus
Dawid,

 I am working on a patch to support this format (yes, it is a simple
 modification).

I'd suggest writing a formatting function for UUIDs instead.  Not sure what 
it should be called, though.  to_char is pretty overloaded right now.

-- 
--Josh

Josh Berkus
PostgreSQL @ Sun
San Francisco



Re: [HACKERS] UUID data format 4x-4x-4x-4x-4x-4x-4x-4x

2008-02-27 Thread Gevik Babakhani

  I am working on a patch to support this format (yes, it is a simple 
  modification).

There was a proposal and a discussion regarding how this datatype would be
formatted before I started developing it. We decided to go with the format
proposed in the RFC. Unless there is a strong case, I doubt any non-standard
formatting will be accepted into core. IIRC we were also opposed to
supporting Java-style formatted UUIDs back then. This is no different.

Regards,
Gevik.




Re: [HACKERS] UUID data format 4x-4x-4x-4x-4x-4x-4x-4x

2008-02-27 Thread Tom Lane
Josh Berkus [EMAIL PROTECTED] writes:
 I am working on a patch to support this format (yes, it is a simple
 modification).

 I'd suggest writing a formatting function for UUIDs instead.

That seems like overkill, if not outright encouragement of people to
come up with yet other nonstandard formats for UUIDs.

I think the question we have to answer is whether we want to be
complicit in the spreading of a nonstandard UUID format.  Even if
we answer yes for this HP case, it doesn't follow that we should
create a mechanism for anybody to do anything with 'em.  That way
lies the madness people already have to cope with for datetime
data :-(

regards, tom lane



Re: [HACKERS] OSSP can be used in the windows environment now!

2008-02-27 Thread Hiroshi Saito

Hi.

- Original Message - 
From: Magnus Hagander [EMAIL PROTECTED]



I take it you are in contact with them, since you helped them with the 
port? Can you ask them if they are interested in distributing that file?


Yes. However, we have not discussed MSVC yet, because it needs to take into 
account the generation process (xx.in) of the source code. Furthermore, 
uuid_cli had a further problem...
Although I don't have spare time right now, I have taken it as my TODO. 



If not, what do other people think about adding this Makefile and a 
README file to our contrib directory?


If there is no contrary opinion, I believe that it will be a help to many people.

Regards,
Hiroshi Saito



[HACKERS] Buildfarm member gypsy_moth seems not to like alignment patch

2008-02-27 Thread Tom Lane
It looks like gypsy_moth has been failing like this:

creating directory 
/export/home/tmp/pg-test/build-suncc/HEAD/pgsql.21325/src/test/regress/./tmp_check/data
 ... ok
creating subdirectories ... ok
selecting default max_connections ... 100
selecting default shared_buffers/max_fsm_pages ... 32MB/204800
creating configuration files ... ok
creating template1 database in 
/export/home/tmp/pg-test/build-suncc/HEAD/pgsql.21325/src/test/regress/./tmp_check/data/base/1
 ... ok
initializing pg_authid ... ok
initializing dependencies ... ok
creating system views ... Bus Error - core dumped
child process exited with exit code 138
initdb: data directory 
/export/home/tmp/pg-test/build-suncc/HEAD/pgsql.21325/src/test/regress/./tmp_check/data
 not removed at user's request

since I put in this patch:
http://archives.postgresql.org/pgsql-committers/2008-02/msg00270.php

This is unfortunate and surprising, since that patch was intended to
prevent compilers from making unsafe alignment assumptions, but it sure
looks like this compiler has instead added a new one.  Could you poke
into it --- at least get a stack trace from the core dump?

regards, tom lane



Re: [HACKERS] Producer/Consumer Issues in the COPY across network

2008-02-27 Thread Simon Riggs
On Tue, 2008-02-26 at 12:29 +0100, Martijn van Oosterhout wrote:

  When we're running a COPY over a high latency link then network time is
  going to become dominant, so potentially, running COPY asynchronously
  might help performance for loads or initial Slony configuration. This is
  potentially more important on Slony where we do both a PQgetCopyData()
  and PQputCopyData() in a tight loop.
 
 When you check the packets being sent, are you showing only one record
 being sent per packet? If so, there's your problem.

I've not inspected the packet flow. It seemed easier to ask.

  I also note that PQgetCopyData always returns just one row. Is there an
  underlying buffering between the protocol (which always sends one
  message per row) and libpq (which is one call per row)? It seems
  possible for us to request a number of rows from the server up to a
  preferred total transfer size.
 
 AIUI the server merely streams the rows to you, the client doesn't get
 to say how many :)

Right, but presumably we generate a new message per PQgetCopyData()
request? So my presumption is we need to wait for that to be generated
each time?
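
For concreteness, the loop in question looks roughly like this in libpq terms
(a simplified sketch, not Slony's actual code):

/* pull COPY data rows one at a time from 'src' and push them to 'dst' */
static void
relay_copy(PGconn *src, PGconn *dst)
{
    char   *buf;
    int     len;

    /* async = 0: block until one complete data-row message is available */
    while ((len = PQgetCopyData(src, &buf, 0)) > 0)
    {
        if (PQputCopyData(dst, buf, len) != 1)
        {
            PQfreemem(buf);
            break;              /* error on the sending side */
        }
        PQfreemem(buf);
    }
    /* len == -1 means the COPY stream ended, -2 means an error */
    PQputCopyEnd(dst, NULL);
}

AIUI the one-row-per-call granularity is an API property; libpq still reads
from the socket into its input buffer in larger chunks, so it doesn't by
itself imply one row per network packet.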

-- 
  Simon Riggs
  2ndQuadrant  http://www.2ndQuadrant.com 




Re: [HACKERS] Batch update of indexes on data loading

2008-02-27 Thread ITAGAKI Takahiro

Simon Riggs [EMAIL PROTECTED] wrote:

 The LOCK is only required because we defer the inserts into unique
 indexes, yes?

No, not as far as the present pg_bulkload is concerned. It creates a new
relfilenode like REINDEX; therefore, an access exclusive lock is needed. When
there are violations of unique constraints, all of the loading is rolled back
at the end of loading.

BTW, why does REINDEX require an access exclusive lock? Read-only queries
are forbidden during the operation now, but I feel they are OK
because REINDEX only reads existing tuples. Can we do REINDEX
holding only a shared lock on the index?


 I very much like the idea of index merging, or put another way: batch
 index inserts. How big do the batch of index inserts have to be for us
 to gain benefit from this technique?

Hmm, we might need to know *why* COPY with indexes is slow. If the major
cause is searching for the position to insert, batch inserts will work well.
However, if the cause is index page splitting and the resulting random I/O,
batch insertion cannot solve the problem; a rebuild is still required.

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center





[HACKERS] ES7000 Windows 2003 server 64bit processor

2008-02-27 Thread Babu, Gabriel Suresh
Hello All,

 

We are facing some problems while downloading the 64-bit PostgreSQL 8.2.4
version for both the Windows and SUSE Linux ES-7000 partitions. All of the
download URLs for the PostgreSQL 8.2.4 version are displaying the error
message "The webpage cannot be found." 

Can you please point us to any archives of the 64-bit PostgreSQL 8.2.4
version?

 

Postgresql 8.2.4 Win32 Application

I am trying to install postgresql-8.2.4 on ES7000 with Windows Server
2003 (pack 1).

I tried to install on the ES-7000 using the postgresql-8.2.msi installer:

I executed the installation as Administrator (user group
Administrators).

I got the error "Failed to run initdb: 128!" with an error log that is
empty.

 

Though PostgreSQL claims that the PostgreSQL 8.2.4 Win32 application supports
64-bit Windows processors, we are assuming the PostgreSQL 8.2.4 Win32
application does not support the ES7000 Windows 2003 Server 64-bit
processor. 

 

Can you please help me with this?

 

Regards

Suresh