Re: [HACKERS] [WIP] Performance Improvement by reducing WAL for Update Operation

2012-08-06 Thread Heikki Linnakangas

On 06.08.2012 06:10, Amit Kapila wrote:

Currently the solution for fixed length columns cannot handle the case of
variable length columns and NULLS. The reason is for fixed length columns
there is no need of diff technology between old and new tuple, however for
other cases it will be required.
For fixed length columns, if we just note the OFFSET, LENGTH, VALUE of
changed columns of new tuple in WAL, it will be sufficient to do the replay
of WAL. However to handle other cases we need to use diff mechanism.

Can we do something like if the changed columns are fixed length and doesn't
contain NULL's, then store [OFFSET, LENGTH, VALUE] format in WAL and for
other cases store diff format.

This has advantage that for Updates containing only fixed length columns
don't have to pay penality of doing diff between new and old tuple. Also we
can do the whole work in 2 parts, one for fixed length columns and second to
handle other cases.


Let's keep it simple and use the same diff format for all tuples, at 
least for now. If it turns out that you can indeed get even more gain 
for fixed length tuples by something like that, then let's do that later 
as a separate patch.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] postgres 9 bind address for replication

2012-08-06 Thread Magnus Hagander
On Mon, Jul 23, 2012 at 8:45 PM, Andrew Dunstan and...@dunslane.net wrote:

 On 07/23/2012 02:23 PM, Adam Crews wrote:

 On Mon, Jul 23, 2012 at 8:34 AM, Robert Haas robertmh...@gmail.com
 wrote:

 On Sat, Jul 21, 2012 at 1:24 AM, Adam Crews adam.cr...@gmail.com wrote:

 Hello,

 I'm sorry for cross-posting, however I originally posted this to
 pgsql-general list, but didnt get any replies.

 Then I posted to pgsql-cluster-hackers..., and now here.


 I’m using pg 9.1.3 on CentOS 5 and have a few slave databases setup
 using the built in streaming replication.

 On the slaves I set the “listen_addresses” config option to an ip
 address for a virtual alias on my network interfaces.  The host has an
 address of 10.1.1.10, and there is a virtual alias of 10.1.1.40 that
 the slave postmaster binds to.

 When the slave makes it connection to the master to start replication
 the source address for the connection is the host address, not the
 virtual alias address.  Connections appear to come from 10.1.1.10,
 instead of the slave postmaster address of 10.1.1.40.

 This seems like a bug to me.  I could understand that if the
 postmaster is listening on all interfaces, then it should use whatever
 the IP is for the for the host, but in an instance where the
 postmaster has been configured to listen to a specific address it
 seems like the call to start the replication should be passed that
 address so connections come from the slave postmaster’s IP, instead of
 the host.

 Is there a config option that can be used to adjust this?  I've looked
 in the docs, but haven't found one yet.

 Is this perhaps a bug, or lack of feature?

 I don't think it's a bug, because the behavior you're hoping for might
 not be what everyone would want in a similar situation.  It might
 qualify as an unimplemented feature.

 This mailing list isn't heavily used and this seems a bit off-topic
 for it anyway; you might want to try a different one for further
 discussion of this issue.

 So, I think this, as Robert states, an unimplemented feature.

 For my situation it would be very useful to have an option to be able
 to specify the source address for replication.

 I discovered this because I bind the listen address for postgres to a
 single address even though the host system may have multiple
 addresses.  I then use that single address in iptables rules on other
 systems.  Since I expect the slave to be at a .40 address, but the
 replication comes from the primary address of the interface (in this
 case .10), my iptables rules were missing the access for the slave to
 connect to the master.

 This site http://linux-ip.net/html/routing-saddr-selection.html
 describes the behavior I'm seeing.

 How do I go about requesting a config option that would allow me to
 specify the source address for the replication connections?


 You just have :-)

 You could just add an iptables rule redirecting .10 packets on port 5432 (or
 whatever you're using) appropriately.

 We don't have any provision for binding the local end of any connection
 AFAIK. So the first question is Do we want to? and the second is If yes,
 when and how? I don't see that replication should be a special case - if
 this is worth providing for it should be applicable to all clients, ISTM.

I have an ugly patch lying around that implemented this as a libpq
connection option. It was just a quick hack to work around a situation
just like this (though not for replication), and ISTM that's the
proper place to put it.

I'll stick it on my TODO to try to clean that one up and submit for 9.3..

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [WIP] Performance Improvement by reducing WAL for Update Operation

2012-08-06 Thread Amit Kapila
From: Heikki Linnakangas [mailto:heikki.linnakan...@enterprisedb.com] 
Sent: Monday, August 06, 2012 2:32 PM
To: Amit Kapila
Cc: 'Bruce Momjian'; pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] [WIP] Performance Improvement by reducing WAL for
Update Operation
On 06.08.2012 06:10, Amit Kapila wrote:
 Currently the solution for fixed length columns cannot handle the case of
 variable length columns and NULLS. The reason is for fixed length columns
 there is no need of diff technology between old and new tuple, however
for
 other cases it will be required.
 For fixed length columns, if we just note the OFFSET, LENGTH, VALUE of
 changed columns of new tuple in WAL, it will be sufficient to do the
replay
 of WAL. However to handle other cases we need to use diff mechanism.

 Can we do something like if the changed columns are fixed length and
doesn't
 contain NULL's, then store [OFFSET, LENGTH, VALUE] format in WAL and for
 other cases store diff format.

 This has advantage that for Updates containing only fixed length columns
 don't have to pay penality of doing diff between new and old tuple. Also
we
 can do the whole work in 2 parts, one for fixed length columns and second
to
 handle other cases.

 Let's keep it simple and use the same diff format for all tuples, at 
 least for now. If it turns out that you can indeed get even more gain 
 for fixed length tuples by something like that, then let's do that later 
 as a separate patch.

Okay, I shall first try to design and implement the same format for all
tuples
and discuss the results of same with community.

With Regards,
Amit Kapila.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] several problems in pg_receivexlog

2012-08-06 Thread Magnus Hagander
On Tue, Jul 31, 2012 at 5:06 PM, Alvaro Herrera
alvhe...@commandprompt.com wrote:

 Excerpts from Magnus Hagander's message of jue jul 12 07:35:11 -0400 2012:
 On Tue, Jul 10, 2012 at 6:45 PM, Fujii Masao masao.fu...@gmail.com wrote:

  When an error happens after replication connection has been established,
  pg_receivexlog doesn't close an open file descriptor and release an 
  allocated
  memory area. This was harmless before 
  16282ae688de2b320cf176e9be8a89e4dfc60698
  because pg_receivexlog exits immediately when an error happens. But
  currently in an error case, pg_receivexlog tries reconnecting to the server
  infinitely, so file descriptors and memory would leak. I think this is 
  problem
  and should be fixed. The patch which I submitted yesterday changes
  pg_receivexlog so that it closes the open file and frees the memory area
  before reconnecting to the server.

 Thanks. I get it now, and this explains why I didn't see it before - I
 didn't check properly after we added the loop mode. Patch applied with
 minor changes (e.g. there's no point in doing PQfinish(tmpconn) right
 after you've verified tmpconn is NULL)

 For some reason, Magnus neglected to backpatch this to 9.2, so I just
 did.

Thanks. I believe that was just an oversight.

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] several problems in pg_receivexlog

2012-08-06 Thread Magnus Hagander
On Tue, Jul 31, 2012 at 6:50 PM, Fujii Masao masao.fu...@gmail.com wrote:
 On Wed, Aug 1, 2012 at 12:09 AM, Alvaro Herrera
 alvhe...@commandprompt.com wrote:

 Excerpts from Fujii Masao's message of mar jul 17 13:58:38 -0400 2012:

  You're right. If the error is detected, that function always returns 
  false
  and the error message is emitted (but I think that current error message
  pg_basebackup: child process exited with error 1 is confusing),
  so it's OK. But if walsender in the server is terminated by SIGTERM,
  no error is detected and pg_basebackup background process gets out
  of the loop in ReceiveXlogStream() and returns true.
 
  Oh. Because the server does a graceful shutdown. D'uh, of course.
 
  Then yes, your suggested fix seems like a good one.

 Attached patch adds the fix.

 Also I found I had forgotten to set the file descriptor to -1 at the end of
 ReceiveXlogStream(), in previously-committed my patch. Attached patch
 fixes this problem.

 This hasn't been committed yet AFAICT, and it probably needs a refresh
 now after my changes to pg_basebackup.  Please update the patch.

 I attached the updated version.

Thanks, applied.


  Also,
 if this is not in the Open Items list, please put it there so that we
 don't forget it before the 9.2 release.

 Yep, done.

And I'll go take it off :-)

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [COMMITTERS] pgsql: Fix mapping of PostgreSQL encodings to Python encodings.

2012-08-06 Thread Heikki Linnakangas

On 20.07.2012 10:13, Jan Urbański wrote:

On 20/07/12 08:59, Jan Urbański wrote:

On 18/07/12 17:17, Heikki Linnakangas wrote:

On 14.07.2012 17:50, Jan Urbański wrote:

If pg_do_encoding_conversion() throws an error, you don't get a chance
to call Py_DECREF() to release the string. Is that a problem?

If an error occurs in PLy_traceback(), after incrementing
recursion_depth, you don't get a chance to decrement it again. I'm not
sure if the Py* function calls can fail, but at least seemingly trivial
things like initStringInfo() can throw an out-of-memory error.


Of course you're right (on both accounts).

Here's a version with a bunch of PG_TRies thrown in.


Silly me, playing tricks with postincrements before fully waking up.

Here's v3, with a correct inequality test for exceeding the traceback
recursion test.


Committed the convert-via-UTF-8 part of this. I'll take a closer look at 
the recursion check next.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP patch for LATERAL subqueries

2012-08-06 Thread Robert Haas
On Sun, Aug 5, 2012 at 5:58 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 I've been idly amusing myself by trying to hack up support for
 SQL-standard LATERAL subqueries.

Cool!

 Currently the patch only implements the syntax called out in the standard,
 namely that you can put LATERAL in front of a derived table, which is
 to say a parenthesized sub-SELECT in FROM.  It strikes me that it might be
 worth allowing LATERAL with a function-in-FROM as well.  So basically
 LATERAL func(args) alias
 would be an allowed abbreviation for
 LATERAL (SELECT * FROM func(args)) alias
 Since the standard doesn't have function-in-FROM, it has nothing to say
 about whether this is sane or not.  The argument for this is mainly that
 SRFs are one of the main use-cases for LATERAL (replacing SRF-in-the-
 SELECT-list usages), so we might as well make it convenient.  Any opinions
 pro or con about that?

Apparently Sybase and Microsoft SQL server use a slightly different
syntax, CROSS APPLY, for this.

http://iablog.sybase.com/paulley/2008/07/cross-and-outer-apply/

It may make sense to consider mimicking that instead of inventing our
own way of doing it, but I haven't investigated much so it's also
possible that it doesn't make sense.

 While fooling around in the planner I realized that I have no idea what
 outer-level aggregates mean in a LATERAL subquery, and neither does
 Postgres:
 regression=# select 1 from tenk1 a, lateral (select * from int4_tbl b where 
 f1 = max(a.unique1)) x;
 ERROR:  plan should not reference subplan's variable
 I don't see anything prohibiting this in SQL:2008, but ordinarily this
 would be taken to be an outer-level aggregate, and surely that is not
 sensible in the LATERAL subquery.  For the moment it seems like a good
 idea to disallow it, though I am not sure where is a convenient place
 to test for such things.  Has anyone got a clue about whether this is
 well-defined, or is it simply an oversight in the spec?

My mental picture of LATERAL (which might be inaccurate) is that it
has the semantics that you'd get from a parameterized nestloop.  So I
can't assign any meaning to that either.

 Comments, better ideas?

Thanks for working on this - sorry I don't have more thoughts right at
the moment.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP Patch: Use sortedness of CSV foreign tables for query planning

2012-08-06 Thread Robert Haas
On Sun, Aug 5, 2012 at 10:41 PM, Etsuro Fujita
fujita.ets...@lab.ntt.co.jp wrote:
 I think file_fdw is useful for managing log files such as PG CSV logs.  Since
 often, such files are sorted by timestamp, I think the patch can improve the
 performance of log analysis, though I have to admit my demonstration was not
 realistic.

Hmm, I guess I could buy that as a plausible use case.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Windows Streaming replication -- Windows 2008 servers

2012-08-06 Thread Robert Haas
On Mon, Aug 6, 2012 at 1:41 AM, chinnaobi chinna...@gmail.com wrote:
 Hi All, It would be helpful if someone tell me, how to verify the streaming
 replication started in standby server successfully, just after the service
 is started ?? Like knowing the exit code of the service started...

It doesn't really work that way.  The service starts up and begins
recovery, and only when it finds that it needs more write-ahead log
records does it begin streaming (or restoring from the archive).  So
although it's normally quick, in theory there could be a very long
delay between the time the service is started and the time streaming
begins; so it's not something we can really report via an exit code at
startup time.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] spinlock-pthread_mutex : real world results

2012-08-06 Thread Robert Haas
On Sun, Aug 5, 2012 at 7:19 PM, Nils Goroll sl...@schokola.de wrote:
 meanwhile we're using the patch in production (again, this is 9.1.3) and
 after running it under full load for one week I believe it is pretty safe to
 say that replacing the spinlock code with pthread_mutexes on Linux (which
 basically are a futex wrapper) has solved the scalability issue and all
 stability/performance problems on this system are simply gone.

 While the improved pgbench run had already given a clear indication
 regarding the optimization potential, we can now be pretty certain that
 spinlock contention had really been the most significant root cause for the
 issues I had described in my early postings (why roll-your-own s_lock? /
 improving scalability / experimental: replace s_lock spinlock code with
 pthread_mutex on linux).

 I am attaching annotated graphs showing the load averages and cpu statistics
 of the respective machine. Please note the fact that the highest spikes have
 been averaged out in these graphs. As I had mentioned before, with the
 original code in place we had seen saturation of 64 cores and load averages
 in excess of 300.


 I fully agree that improvements in more recent pgsql code to reduce the
 number of required locks or, even better, lockless data structures are the
 way to go, but for the remaining cases it should now have become apparent
 that favoring efficient mutex implementations is advantageous for large
 SMPs, where they exist (e.g. futexes on Linux).

Interesting data.  I guess the questions in my mind are:

1. How much we're paying for this in the uncontended case?

2. Should we be modifying our spinlock implementation on Linux to use
futexes rather than pulling pthreads into the mix?

Anyone have data on the first point, or opinions on the second one?

I certainly think there is some potential here in terms of preventing
the worst-case situation where the entire machine ends up spending a
major portion of its CPU time in s_lock.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP patch for LATERAL subqueries

2012-08-06 Thread Amit Kapila
From: pgsql-hackers-ow...@postgresql.org
[mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Robert Haas
Sent: Monday, August 06, 2012 6:16 PM
On Sun, Aug 5, 2012 at 5:58 PM, Tom Lane t...@sss.pgh.pa.us wrote:

 Currently the patch only implements the syntax called out in the
standard,
 namely that you can put LATERAL in front of a derived table, which is
 to say a parenthesized sub-SELECT in FROM.  It strikes me that it might
be
 worth allowing LATERAL with a function-in-FROM as well.  So basically
 LATERAL func(args) alias
 would be an allowed abbreviation for
 LATERAL (SELECT * FROM func(args)) alias
 Since the standard doesn't have function-in-FROM, it has nothing to say
 about whether this is sane or not.  The argument for this is mainly that
 SRFs are one of the main use-cases for LATERAL (replacing SRF-in-the-
 SELECT-list usages), so we might as well make it convenient.  Any
opinions
 pro or con about that?

 Apparently Sybase and Microsoft SQL server use a slightly different
 syntax, CROSS APPLY, for this.

 http://iablog.sybase.com/paulley/2008/07/cross-and-outer-apply/

 It may make sense to consider mimicking that instead of inventing our
 own way of doing it, but I haven't investigated much so it's also
 possible that it doesn't make sense.

There are certain differences mentioned in the link due to which I am 
not sure it can be mimicked exactly, and may be that's why Sybase also has
both syntaxes.

Differences
-
The LATERAL keyword cannot NULL-supply rows, whereas an OUTER APPLY will.
Additionally, with a LATERAL derived table, the derived table and the outer
reference must be separated by a comma. For an APPLY operator, the table
expression on the right and the outer reference cannot be separated by a
comma, but they can be separated by any other join operator. In other words,
the APPLY operator allows references to any table within the left table
expression, whereas the LATERAL keyword allows references to tables outside
the current table expression.


With Regards,
Amit Kapila.



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Statistics and selectivity estimation for ranges

2012-08-06 Thread Heikki Linnakangas

On 04.08.2012 12:31, Alexander Korotkov wrote:

Hackers,

attached patch is for collecting statistics and selectivity estimation for
ranges.

In order to make our estimations accurate for every distribution of
ranges, we would collect 2d-distribution of lower and upper bounds of range
into some kind of 2d-histogram. However, this patch use some simplification
and assume distribution of lower bound and distribution of length to be
independent.


Sounds reasonable. Another possibility would be to calculate the average 
length for each lower-bound bin. So you would e.g know the average 
length of values with lower bound between 1-10, and the average length 
of values with lower bound between 10-20, and so forth. Within a bin, 
you would have to assume that the distribution of the lengths is fixed.


PS. get_position() should guard against division by zero, when subdiff 
returns zero.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP patch for LATERAL subqueries

2012-08-06 Thread Amit Kapila
From: pgsql-hackers-ow...@postgresql.org
[mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Tom Lane
Sent: Monday, August 06, 2012 3:28 AM
To: pgsql-hackers@postgreSQL.org
Subject: [HACKERS] WIP patch for LATERAL subqueries

 I've been idly amusing myself by trying to hack up support for
SQL-standard LATERAL subqueries.  I've got something 
 that turns over, more or less:

 regression=# select * from int4_tbl a, lateral (select unique1,unique2
from tenk1 b where a.f1 = unique1) x;
 f1 | unique1 | unique2 
 +-+-
  0 |   0 |9998
 (1 row)

 regression=# explain select * from int4_tbl a, lateral (select
unique1,unique2 from tenk1 b where a.f1 = unique1) x;
QUERY PLAN



---
 Nested Loop  (cost=0.00..42.55 rows=5 width=12)
   -  Seq Scan on int4_tbl a  (cost=0.00..1.05 rows=5 width=4)
   -  Index Scan using tenk1_unique1 on tenk1 b  (cost=0.00..8.28 rows=1
width=8)
 Index Cond: (a.f1 = unique1)
 (4 rows)

 but there's a good deal of work left to do, some of which could use some
discussion.

 Feature/semantics issues:

 Currently the patch only implements the syntax called out in the standard,
namely that you can put LATERAL in front of  a derived table, which is to
say a parenthesized sub-SELECT in FROM.  It strikes me that it might be
worth allowing 
 LATERAL with a function-in-FROM as well.  So basically
   LATERAL func(args) alias
 would be an allowed abbreviation for
   LATERAL (SELECT * FROM func(args)) alias Since the standard
doesn't have function-in-FROM, it has nothing to say  about whether this is
sane or not.  The argument for this is mainly that SRFs are one of the main
use-cases for LATERAL  (replacing SRF-in-the- SELECT-list usages), so we
might as well make it convenient.  Any opinions pro or con about 
 that?

I have checked Sybase also has similar syntax for functions by other keyword
APPLY. So this should be good way to specify.

 While fooling around in the planner I realized that I have no idea what
outer-level aggregates mean in a LATERAL 
 subquery, and neither does
 Postgres:
 regression=# select 1 from tenk1 a, lateral (select * from int4_tbl b
where f1 = max(a.unique1)) x;
 ERROR:  plan should not reference subplan's variable I don't see anything
prohibiting this in SQL:2008, but ordinarily  this would be taken to be an
outer-level aggregate, and surely that is not sensible in the LATERAL
subquery.  For the 
 moment it seems like a good idea to disallow it, though I am not sure
where is a convenient place to test for such 
 things.  Has anyone got a clue about whether this is well-defined, or is
it simply an oversight in the spec?

I have checked in Oracle and it gives error in such query:
SQL select * from t1, Lateral (select * from t2 where t2.c2 = max(t1.c1)); 
select * from t1, Lateral (select * from t2 where t2.c2 = max(t1.c1)) 
  * 
ERROR at line 1: 
ORA-00934: group function is not allowed here


With Regards,
Amit Kapila.



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] tzdata2012d

2012-08-06 Thread Magnus Hagander
On Tue, Jul 31, 2012 at 1:09 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Alvaro Herrera alvhe...@commandprompt.com writes:
 Excerpts from David Fetter's message of lun jul 30 17:27:46 -0400 2012:
 Can we slide this into the upcoming point release?  When would that
 be?

 Usually, Tom installs the latest timezone data just before each point
 release.

 It's part of the release checklist.

 In practice, people who need the latest TZ data shouldn't be relying
 on our copy anyway.  I think pretty much all distros build with

That's a dangerous thing to say. All the Windows users out there
have no choice. I'm not sure about Mac, but AFAIK certainly the ones
downloading the binaries will get our set of TZ files. That
represents a very non-trival portion of our users.

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP Patch: Use sortedness of CSV foreign tables for query planning

2012-08-06 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Sun, Aug 5, 2012 at 10:41 PM, Etsuro Fujita
 fujita.ets...@lab.ntt.co.jp wrote:
 I think file_fdw is useful for managing log files such as PG CSV logs.  Since
 often, such files are sorted by timestamp, I think the patch can improve the
 performance of log analysis, though I have to admit my demonstration was not
 realistic.

 Hmm, I guess I could buy that as a plausible use case.

In the particular case of PG log files, I'd bet good money against them
being *exactly* sorted by timestamp.  Clock skew between backends, or
varying amounts of time to construct and send messages, will result in
small inconsistencies.  This would generally not matter, until the
planner relied on the claim of sortedness for something like a mergejoin
... and then it would matter a lot.

In general I'm quite suspicious of the idea of believing that externally
supplied data is sorted in exactly the way that PG thinks it should
sort.  If we implement this you can bet that people will screw up, for
instance by using the wrong locale/collation to sort text data.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP patch for LATERAL subqueries

2012-08-06 Thread Robert Haas
On Mon, Aug 6, 2012 at 10:07 AM, Amit Kapila amit.kap...@huawei.com wrote:
 Currently the patch only implements the syntax called out in the
 standard,
 namely that you can put LATERAL in front of a derived table, which is
 to say a parenthesized sub-SELECT in FROM.  It strikes me that it might
 be
 worth allowing LATERAL with a function-in-FROM as well.  So basically
 LATERAL func(args) alias
 would be an allowed abbreviation for
 LATERAL (SELECT * FROM func(args)) alias
 Since the standard doesn't have function-in-FROM, it has nothing to say
 about whether this is sane or not.  The argument for this is mainly that
 SRFs are one of the main use-cases for LATERAL (replacing SRF-in-the-
 SELECT-list usages), so we might as well make it convenient.  Any
 opinions
 pro or con about that?

 Apparently Sybase and Microsoft SQL server use a slightly different
 syntax, CROSS APPLY, for this.

 http://iablog.sybase.com/paulley/2008/07/cross-and-outer-apply/

 It may make sense to consider mimicking that instead of inventing our
 own way of doing it, but I haven't investigated much so it's also
 possible that it doesn't make sense.

 There are certain differences mentioned in the link due to which I am
 not sure it can be mimicked exactly, and may be that's why Sybase also has
 both syntaxes.

 Differences
 -
 The LATERAL keyword cannot NULL-supply rows, whereas an OUTER APPLY will.
 Additionally, with a LATERAL derived table, the derived table and the outer
 reference must be separated by a comma. For an APPLY operator, the table
 expression on the right and the outer reference cannot be separated by a
 comma, but they can be separated by any other join operator. In other words,
 the APPLY operator allows references to any table within the left table
 expression, whereas the LATERAL keyword allows references to tables outside
 the current table expression.

I think you can always simulate CROSS APPLY using LATERAL.  The syntax
is different but the functionality is the same.  However, OUTER APPLY
allows you to do something that I don't think is possible using
LATERAL.  While it would be nice to have both CROSS APPLY and OUTER
APPLY, my main point was to suggest supporting CROSS APPLY rather than
the extension to the LATERAL syntax Tom proposed.  That is, the spec
allows:

FROM x,  LATERAL (SELECT * FROM srf(x.a)) y

...and Tom proposed allowing this to be shortened to:

FROM x, LATERAL srf(x.a)

...and what I'm saying is maybe we should instead allow it to be shortened to:

FROM x CROSS APPLY srf(x.a)

...as some other database systems are already doing.  I can't think of
any particular reason why Tom's proposed shorthand would be
problematic; I'm just suggesting that it may be better to support the
same shorthand that other people already support rather than inventing
our own, idiosyncratic shorthand.

That having been said, I get paid the same either way.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP patch for LATERAL subqueries

2012-08-06 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 Apparently Sybase and Microsoft SQL server use a slightly different
 syntax, CROSS APPLY, for this.

 http://iablog.sybase.com/paulley/2008/07/cross-and-outer-apply/

Well, this is only a blog entry and not their manual, but AFAICT that is
just a useless deviation from SQL-standard syntax; it does nothing that
CROSS JOIN LATERAL or LEFT JOIN LATERAL ... ON true wouldn't do.
I can't tell if the blogger simply doesn't know that LATERAL can be used
in a JOIN nest, or if that's actually a misfeature of the DBMS.

I'm not in favor of duplicating this.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] SP-GiST for ranges based on 2d-mapping and quad-tree

2012-08-06 Thread Heikki Linnakangas
Just to check where we stand on this: Are you going to send a finalized 
version of this patch, based on the one I sent earlier, or should I pick 
up that version and try to get it into committable state?


On 23.07.2012 10:37, Alexander Korotkov wrote:

On Fri, Jul 20, 2012 at 3:48 PM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com  wrote:


On 13.07.2012 02:00, Alexander Korotkov wrote:


Done. There are separate patch for get rid of TrickFunctionCall2 and
version of SP-GiST for ranges based on that patch.



Looking at the SP-GiST patch now..

It would be nice to have an introduction, perhaps as a file comment at the
top of rangetypes_spgist.c, explaining how the quad tree works. I have a
general idea of what a quad tree is, but it's not immediately obvious how
it maps to SP-GiST. What is stored on a leaf node and an internal node?
What is the 'prefix' (seems to be the centroid)? How are ranges mapped to
2D points? (the function comment of getQuadrant() is a good start for that
last one)



I've added some comments at the top of rangetypes_spgist.c.

In spg_range_quad_inner_**consistent(), if in-hasPrefix == true, ISTM that

in all cases where 'empty' is true, 'which' is set to 0, meaning that there
can be no matches in any of the quadrants. In most of the case-branches,
you explicitly check for 'empty', but even in the ones where you don't, I
think you end up setting which=0 if empty==true. I'm not 100% sure about
the RANGESTRAT_ADJACENT case, though. Am I missing something?



Ops., it was a bug: RANGESTRAT_ADJACENT shoud set which=0 if empty==true,
while RANGESTRAT_CONTAINS and RANGESTRAT_CONTAINED_BY not. Corrected.

It would be nice to avoid the code duplication between the new

bounds_adjacent() function, and the range_adjacent_internal(). Perhaps move
bounds_adjacent() to rangetypes.c and use it in range_adjacent_internal()
too?



Done.

--
With best regards,
Alexander Korotkov.




--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP patch for LATERAL subqueries

2012-08-06 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 I think you can always simulate CROSS APPLY using LATERAL.  The syntax
 is different but the functionality is the same.  However, OUTER APPLY
 allows you to do something that I don't think is possible using
 LATERAL.

Uh, what exactly?  AFAICT from that blog entry, x OUTER APPLY y is
exactly the same as x LEFT JOIN LATERAL y ON true.  Okay, so you
saved three words, but is that a good enough reason to invent a
nonstandard syntax?

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP Patch: Use sortedness of CSV foreign tables for query planning

2012-08-06 Thread Robert Haas
On Mon, Aug 6, 2012 at 10:33 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 On Sun, Aug 5, 2012 at 10:41 PM, Etsuro Fujita
 fujita.ets...@lab.ntt.co.jp wrote:
 I think file_fdw is useful for managing log files such as PG CSV logs.  
 Since
 often, such files are sorted by timestamp, I think the patch can improve the
 performance of log analysis, though I have to admit my demonstration was not
 realistic.

 Hmm, I guess I could buy that as a plausible use case.

 In the particular case of PG log files, I'd bet good money against them
 being *exactly* sorted by timestamp.  Clock skew between backends, or
 varying amounts of time to construct and send messages, will result in
 small inconsistencies.  This would generally not matter, until the
 planner relied on the claim of sortedness for something like a mergejoin
 ... and then it would matter a lot.

Hmm, true.

 In general I'm quite suspicious of the idea of believing that externally
 supplied data is sorted in exactly the way that PG thinks it should
 sort.  If we implement this you can bet that people will screw up, for
 instance by using the wrong locale/collation to sort text data.

I think that optimizations like this are going to be essential for
things like pgsql_fdw (or other_rdms_fdw).  Despite the thorny
semantic issues, we're just not going to be able to get around it.
There will even be people who want SELECT * FROM ft ORDER BY 1 to
order by the remote side's notion of ordering rather than ours,
despite the fact that the remote side has some insane-by-PG-standards
definition of ordering.  People are going to find ways to do that kind
of thing whether we condone it or not, so we might as well start
thinking now about how we're going to live with it.  But that doesn't
answer the question of whether or not we ought to support it for
file_fdw in particular, which seems like a more arguable point.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP patch for LATERAL subqueries

2012-08-06 Thread Robert Haas
On Mon, Aug 6, 2012 at 11:09 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 I think you can always simulate CROSS APPLY using LATERAL.  The syntax
 is different but the functionality is the same.  However, OUTER APPLY
 allows you to do something that I don't think is possible using
 LATERAL.

 Uh, what exactly?  AFAICT from that blog entry, x OUTER APPLY y is
 exactly the same as x LEFT JOIN LATERAL y ON true.  Okay, so you
 saved three words, but is that a good enough reason to invent a
 nonstandard syntax?

I wasn't sure that x LEFT JOIN LATERAL (SELECT * FROM y) ON true is
valid syntax.  I thought that perhaps LATERAL() was only allowed
around a top-level FROM-list item.

However, if it is allowed, then I agree that the extra syntax isn't
adding any functionality; it's just a question of whether you happen
to like their particular choice of notational shorthand.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] tzdata2012d

2012-08-06 Thread Tom Lane
Magnus Hagander mag...@hagander.net writes:
 On Tue, Jul 31, 2012 at 1:09 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 In practice, people who need the latest TZ data shouldn't be relying
 on our copy anyway.  I think pretty much all distros build with

 That's a dangerous thing to say. All the Windows users out there
 have no choice. I'm not sure about Mac, but AFAIK certainly the ones
 downloading the binaries will get our set of TZ files. That
 represents a very non-trival portion of our users.

Well, I'm certainly not suggesting that we shouldn't update those files
whenever we make a release.  I'm just pointing out that in many cases,
the OS will provide a copy that's updated on a faster cycle than that.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP patch for LATERAL subqueries

2012-08-06 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Mon, Aug 6, 2012 at 11:09 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Uh, what exactly?  AFAICT from that blog entry, x OUTER APPLY y is
 exactly the same as x LEFT JOIN LATERAL y ON true.  Okay, so you
 saved three words, but is that a good enough reason to invent a
 nonstandard syntax?

 I wasn't sure that x LEFT JOIN LATERAL (SELECT * FROM y) ON true is
 valid syntax.  I thought that perhaps LATERAL() was only allowed
 around a top-level FROM-list item.

No.  LATERAL is allowed in a table reference, which can be either
a top-level FROM item or a component of a JOIN nest.  (My current
patch doesn't actually work for the latter case, but I'm going to
work on fixing that next.)  What's curious about that Sybase blog
is that the blogger seems to think that LATERAL can only be used
at top level ... but I'm not sure if that's actually a restriction
in Sybase, or just a gap in his knowledge.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Fwd: Statistics and selectivity estimation for ranges

2012-08-06 Thread Matthias
Having statistics on ranges was really missing! The planner was doing
some really, really bad choices on bigger tables regarding seq/random
scans, nested loop/other joins etc.

Is there any chance this makes it into 9.2 final? It would really
round-off the introduction of range types and maybe avoid problems
like the new range types are slow (just due to the bad row
estimates).

Thanks for implementing this feature,
-Matthias

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fwd: Statistics and selectivity estimation for ranges

2012-08-06 Thread Josh Berkus

 Is there any chance this makes it into 9.2 final? It would really
 round-off the introduction of range types and maybe avoid problems
 like the new range types are slow (just due to the bad row
 estimates).

Nope, that's strictly a 9.3 feature.  9.2 is in beta2.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Mailsystem maintenance/migration announcement

2012-08-06 Thread Stefan Kaltenbrunner
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

Hi all!

We are currently planning to finalize the ongoing work on the mailsystem
migration we started earlier this year by migrating the
two remaining components of the postgresql.org mailsystem infrastructure
to new systems.
Those parts (listserver and mailbox hosting) will be moved to new
systems in an maintenance window on:

Friday, 10th of august starting 15:00 GMT

The migration is expected to take about 2 hours, in that time period all
mails will be held queued on our inbound systems (which are already on
the new infrastructure) and no outbound mails will be sent (or can be
sent using the webmail system).
We expect no loss of in-transit emails at all and for the mailbox users
with local storage we are going to complete migrate all the content of
their mailboxes per that date.

People using mailboxes (as in have an @postgresql.org address) and do
NOT have a forwards will have to make modifications to their
configuration and will get a seperate email with appropriate details on
what (and if) they have to change anything.
Apart from that we do not expect any user-visible behaviour changes with
regards to the list-service itself




Stefan
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.10 (GNU/Linux)

iEYEARECAAYFAlAf/SYACgkQr1aG+WhhYQGgxACfVDQ+l4K52zoZYUlrD4jRQozK
/0YAn1V5QU99KWEqDl1f2zFAcN2dzkxZ
=frEs
-END PGP SIGNATURE-

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Docs: Make notes on sequences and rollback more obvious

2012-08-06 Thread Robert Haas
On Sat, Aug 4, 2012 at 12:56 AM, Craig Ringer ring...@ringerc.id.au wrote:
 On 08/04/2012 04:12 AM, Kevin Grittner wrote:
 I haven't reviewed it in detail but noticed an apparent editing error:
 which are used the counters should probably have an as thrown in there.
 Or something.

 Thanks. Editing fail. I revised that spot repeatedly to try to keep it short
 and simple without in any way implying that SEQUENCEs are *only* used for
 SERIAL columns.

 Fixed attached.

In datatype.sgml, I think that adding that important block in the
middle of the existing paragraph is too choppy.  I moved it down a
bit, changed it to a note, expanded it a little, and fixed some typos
and markup.

In func.sgml, I chose to keep the important at the end, instead of
switching the order of the paragraphs as you did, but I moved it up
under nextval instead of having it at the end, as you had it.  I kept
your note in setval() but cleaned it up a bit.

I did not commit the advanced.sgml changes.  I am not sure I believe
the assertion that any function or type with special transactional
behavior will include a documentation mention.  It doesn't seem like a
terribly future-proof assertion at any rate.  With respect to the
mention of autocommit, I think it would be good to add something
there, but maybe it should cross-reference our existing documentation
mentions of autocommit.  Also, it's a bit ambiguous the way it's
worded whether you get the automatic BEGIN/COMMIT with autocommit=on
or with autocommit=off; somehow we should try to clarify what we mean
a little more there.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] redundant message?

2012-08-06 Thread Alvaro Herrera
I noticed we have this message in utils/adt/misc.c:176 (9.2 branch):

  (errmsg(must be superuser or have the same role to terminate backends 
running in other server processes;

I think the wording backends running in other server processes is
redundant.

How about this?
  (errmsg(must be superuser or have the same role to terminate other server 
processes;

Other ideas are welcome.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training  Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] redundant message?

2012-08-06 Thread Robert Haas
On Mon, Aug 6, 2012 at 2:54 PM, Alvaro Herrera alvhe...@2ndquadrant.com wrote:
 I noticed we have this message in utils/adt/misc.c:176 (9.2 branch):

   (errmsg(must be superuser or have the same role to terminate backends 
 running in other server processes;

 I think the wording backends running in other server processes is
 redundant.

Yeah.

 How about this?
   (errmsg(must be superuser or have the same role to terminate other server 
 processes;

Sounds good to me.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] spinlock-pthread_mutex : real world results

2012-08-06 Thread Martijn van Oosterhout
On Mon, Aug 06, 2012 at 08:54:11AM -0400, Robert Haas wrote:
 2. Should we be modifying our spinlock implementation on Linux to use
 futexes rather than pulling pthreads into the mix?
 
 Anyone have data on the first point, or opinions on the second one?

I'm not sure whether pthreads is such a thick layer. Or are you
referring to the fact that you don't want to link against the library
at all?

If we've found a situation where our locks work better than the ones in
pthreads than either (a) we're doing something wrong or (b) the
pthreads implementation could do with improvement.

In either case it might be worth some investigation. If we can improve
the standard pthreads implementation everybody wins.

BTW, I read that some *BSDs have futex implementations (to emulate
linux), it might be an idea to see where they're going.

e.g. http://osdir.com/ml/os.dragonfly-bsd.kernel/2003-10/msg00232.html

Have a nice day,
-- 
Martijn van Oosterhout   klep...@svana.org   http://svana.org/kleptog/
 He who writes carelessly confesses thereby at the very outset that he does
 not attach much importance to his own thoughts.
   -- Arthur Schopenhauer


signature.asc
Description: Digital signature


Re: [HACKERS] spinlock-pthread_mutex : real world results

2012-08-06 Thread Nils Goroll

Robert,


1. How much we're paying for this in the uncontended case?


Using glibc, we have the overhead of an additional library function call, which 
we could eliminate by pulling in the code from glibc/nptl or a source of other 
proven reference code.


The pgbench results I had posted before 
http://archives.postgresql.org/pgsql-hackers/2012-07/msg00061.php could give an 
indication on the higher base cost for the simple approach.



I have mentioned this before: While I agree that minimizing the base overhead is 
good, IMHO, optimizing the worst case is the important part here.


Nils

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP patch for LATERAL subqueries

2012-08-06 Thread Amit Kapila
From: Robert Haas [mailto:robertmh...@gmail.com] 
Sent: Monday, August 06, 2012 8:07 PM
On Mon, Aug 6, 2012 at 10:07 AM, Amit Kapila amit.kap...@huawei.com wrote:
 I think you can always simulate CROSS APPLY using LATERAL.  The syntax
 is different but the functionality is the same.  However, OUTER APPLY
 allows you to do something that I don't think is possible using
 LATERAL.  While it would be nice to have both CROSS APPLY and OUTER
 APPLY, my main point was to suggest supporting CROSS APPLY rather than
 the extension to the LATERAL syntax Tom proposed.  That is, the spec
 allows:

 FROM x,  LATERAL (SELECT * FROM srf(x.a)) y

I think in SQL specs it is not clearly mentioned about functions.
The same is mentioned by Tom in his mail
 So basically
LATERAL func(args) alias
   would be an allowed abbreviation for
LATERAL (SELECT * FROM func(args)) alias Since the standard
doesn't have function-in-FROM, it has nothing to say about whether this is
sane or not.

 ...and Tom proposed allowing this to be shortened to:

 FROM x, LATERAL srf(x.a)

 ...and what I'm saying is maybe we should instead allow it to be shortened
to:

FROM x CROSS APPLY srf(x.a)

...as some other database systems are already doing.  

I think if specs doesn't mention clearly  about functions then we can use
LATERAL syntax similar to CROSS APPLY
which is proposed by Tom.

With Regards,
Amit Kapila.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Beta 3

2012-08-06 Thread Craig Ringer

Heya all

It seems like it's a bit trickier to find beta downloads than might be 
ideal.


The beta info page simply reads:

 PostgreSQL 9.2 beta 3 was released on August 6, 2012.

with no information about how to obtain a build or the sources, not even 
a link to the downloads page, despite being linked to with the text:


More information on how to test and report issues: 
http://www.postgresql.org/developer/beta;


in the announcement email.

It should at least link to:

http://www.postgresql.org/download/snapshots/

and the -bugs form or guide to reporting problems page.

The beta announcement email links to the main downloads page from which 
the snapshots/beta downloads page can be reached, but the main beta page 
is a dead-end. Not ideal for encouraging feedback and testing.


--
Craig Ringer


Re: [HACKERS] Statistics and selectivity estimation for ranges

2012-08-06 Thread Alexander Korotkov
On Mon, Aug 6, 2012 at 6:09 PM, Heikki Linnakangas 
heikki.linnakan...@enterprisedb.com wrote:

 On 04.08.2012 12:31, Alexander Korotkov wrote:

 Hackers,

 attached patch is for collecting statistics and selectivity estimation for
 ranges.

 In order to make our estimations accurate for every distribution of
 ranges, we would collect 2d-distribution of lower and upper bounds of
 range
 into some kind of 2d-histogram. However, this patch use some
 simplification
 and assume distribution of lower bound and distribution of length to be
 independent.


 Sounds reasonable. Another possibility would be to calculate the average
 length for each lower-bound bin. So you would e.g know the average length
 of values with lower bound between 1-10, and the average length of values
 with lower bound between 10-20, and so forth. Within a bin, you would have
 to assume that the distribution of the lengths is fixed.


Interesting idea. AFAICS, if we store average length for each lower-bound
bin, we still have to assume some kind of distribution of range length in
order to do estimates. For example, assume that range length have
exponential distribution. Correspondingly, we've following trade off: we
don't have to assume lower bound distribution to be independent from length
distribution, but we have to assume kind of length distribution. Actually,
I don't know what is better.
Ideally, we would have range length histogram for each lower-bound bin, or
upper-bound histogram for each lower-bound bin. But, storing such amount of
data seems too expensive.

--
With best regards,
Alexander Korotkov.


Re: [HACKERS] WIP Patch: Use sortedness of CSV foreign tables for query planning

2012-08-06 Thread Etsuro Fujita
 From: Robert Haas [mailto:robertmh...@gmail.com]

 On Mon, Aug 6, 2012 at 10:33 AM, Tom Lane t...@sss.pgh.pa.us wrote:
  Robert Haas robertmh...@gmail.com writes:
  On Sun, Aug 5, 2012 at 10:41 PM, Etsuro Fujita
  fujita.ets...@lab.ntt.co.jp wrote:
  I think file_fdw is useful for managing log files such as PG CSV logs.
Since
  often, such files are sorted by timestamp, I think the patch can improve
 the
  performance of log analysis, though I have to admit my demonstration was
 not
  realistic.
 
  Hmm, I guess I could buy that as a plausible use case.
 
  In the particular case of PG log files, I'd bet good money against them
  being *exactly* sorted by timestamp.  Clock skew between backends, or
  varying amounts of time to construct and send messages, will result in
  small inconsistencies.  This would generally not matter, until the
  planner relied on the claim of sortedness for something like a mergejoin
  ... and then it would matter a lot.
 
 Hmm, true.
 
  In general I'm quite suspicious of the idea of believing that externally
  supplied data is sorted in exactly the way that PG thinks it should
  sort.  If we implement this you can bet that people will screw up, for
  instance by using the wrong locale/collation to sort text data.
 
 I think that optimizations like this are going to be essential for
 things like pgsql_fdw (or other_rdms_fdw).  Despite the thorny
 semantic issues, we're just not going to be able to get around it.
 There will even be people who want SELECT * FROM ft ORDER BY 1 to
 order by the remote side's notion of ordering rather than ours,
 despite the fact that the remote side has some insane-by-PG-standards
 definition of ordering.  People are going to find ways to do that kind
 of thing whether we condone it or not, so we might as well start
 thinking now about how we're going to live with it.  But that doesn't
 answer the question of whether or not we ought to support it for
 file_fdw in particular, which seems like a more arguable point.

For file_fdw, I feel inclined to simply implement file_fdw (1) to verify the key
column is sorted in the specified way at the execution phase ie, at the (first)
scan of a data file, only when pathkeys are set, and (2) to abort the
transaction if it detects the data file is not sorted.

Thanks,

Best regards,
Etsuro Fujita


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers