Re: [HACKERS] join removal
reducejoins.c ? flattenjoins.c ? filterjoins.c ?

-- dim

On 28 March 2010, at 22:12, Tom Lane wrote:

> Robert Haas writes:
>> On Sun, Mar 28, 2010 at 2:10 PM, Tom Lane wrote:
>>> joinremoval.c ?
>>
>> Maybe, except as I mentioned in the email linked upthread, my plan for implementing inner join removal would also include allowing join reordering in cases where we currently don't. So I don't want to sandbox it too tightly as join removal, per se, though that's certainly what we have on the table ATM. It's more like advanced open-heart join-tree surgery - like prepjointree, but much later in the process.
>
> Hm. At this point we're not really working with a join *tree* in any case --- the data structure we're mostly concerned with is the list of SpecialJoinInfo structs, and what we're trying to do is weaken the constraints described by that list. So I'd rather stay away from "tree" terminology.
>
> planjoins.c would fit with other names in the plan/ directory but it seems like a misnomer because we're not really "planning" any joins at this stage. adjustjoins.c? loosenjoins.c? weakenjoins.c?
>
> regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Patch for 9.1: initdb -C option
David Christensen wrote:
> Enclosed is a patch to add a -C option to initdb to allow you to easily append configuration directives to the generated postgresql.conf file for use in programmatic generation.

We had a patch not quite make it for 9.0 that switched over the postgresql.conf file to make it easy to scan a whole directory looking for configuration files: http://archives.postgresql.org/message-id/9837222c0910240641p7d75e2a4u2cfa6c1b5e603...@mail.gmail.com

The idea there was to eventually reduce the amount of postgresql.conf hacking that initdb and other tools have to do. Your patch would add more code into a path that I'd like to see reduced significantly. That implementation would make something easy enough for your use case too (the below is untested but shows the general idea):

$ for cluster in 1 2 3 4 5 6; do
>   initdb -D data$cluster
>   cat <<EOF > data$cluster/conf.d/99clustersetup
> port = 1234$cluster
> max_connections = 10
> shared_buffers = 1M
> EOF
> done

This would actually work just fine for what you're doing right now if you used ">> data$cluster/postgresql.conf" as the redirection target there. There would be duplicates, which I'm guessing is what you wanted to avoid with this patch, but the later values set for the parameters added to the end would win and be the active ones.

--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com   www.2ndQuadrant.us
Re: [HACKERS] Patch for 9.1: initdb -C option
David Christensen wrote:
> Enclosed is a patch to add a -C option to initdb to allow you to easily append configuration directives to the generated postgresql.conf file

Why don't you use just "echo 'options' >> $PGDATA/postgresql.conf"? Could you explain where the -C option is better than initdb + echo?

Regards,
---
Takahiro Itagaki
NTT Open Source Software Center
[HACKERS] Patch for 9.1: initdb -C option
Hackers,

Enclosed is a patch to add a -C option to initdb to allow you to easily append configuration directives to the generated postgresql.conf file for use in programmatic generation. In my case, I'd been creating multiple db clusters with a script and would have specific overrides that I needed to make. This patch fell out of the desire to make this a little cleaner. Please review and comment.

From the commit message:

This is a simple mechanism to allow you to provide explicit overrides to any GUC at initdb time. As a basic example, consider the case where you are programmatically generating multiple db clusters in order to test various configurations:

$ for cluster in 1 2 3 4 5 6;
> do initdb -D data$cluster -C "port = 1234$cluster" -C 'max_connections = 10' -C shared_buffers=1M;
> done

A possible future improvement would be to provide some basic formatting corrections to allow specifications such as -C 'port 1234', -C port=1234, and -C 'port = 1234' to all be ultimately output as 'port = 1234' in the final output. This would be consistent with postmaster's parsing. The -C flag was chosen to be a mnemonic for "config".

Regards,

David
--
David Christensen
End Point Corporation
da...@endpoint.com

0001-Add-C-option-to-initdb-to-allow-invocation-time-GUC-.patch
Description: Binary data

initdb-dash-C.diff
Description: Binary data
Re: [HACKERS] Proposal: Add JSON support
2010/3/29 Andrew Dunstan :
> Robert Haas wrote:
>> On Sun, Mar 28, 2010 at 4:48 PM, Joseph Adams wrote:
>>> I'm wondering whether the internal representation of JSON should be plain JSON text, or some binary code that's easier to traverse and whatnot. For the sake of code size, just keeping it in text is probably best.
>>
>> +1 for text.
>
> Agreed.

There's another choice, called BSON: http://www.mongodb.org/display/DOCS/BSON
I've not researched it deeply yet, but it seems a reasonable format to store in databases, as it was invented for MongoDB.

>>> Now my thoughts and opinions on the JSON parsing/unparsing itself:
>>>
>>> It should be built-in, rather than relying on an external library (like XML does).
>>
>> Why? I'm not saying you aren't right, but you need to make an argument rather than an assertion. This is a community, so no one is entitled to decide anything unilaterally, and people want to be convinced - including me.
>
> Yeah, why? We should not be in the business of reinventing the wheel (and then maintaining the reinvented wheel), unless the code in question is *really* small.

The many JSON implementations in many languages show that parsing JSON is not so difficult to code, and that needs vary. Hence, I wonder if we could have our very own. Don't take this wrongly - I don't object to the text format, nor to using an external library.

Regards,
--
Hitoshi Harada
Re: [HACKERS] Proposal: Add JSON support
On Sun, Mar 28, 2010 at 11:24 PM, Joseph Adams wrote:
> I apologize; I was just starting the conversation with some of my ideas to receive feedback. I didn't want people to have to wade through too many "I think"s. I'll be sure to use tags in the future :-)

FWIW, I don't care at all whether you say "I think" or "I know"; the point is that you have to provide backup for any position you choose to take.

> My reasoning for "It should be built-in" is:
> * It would be nice to have a built-in serialization format that's available by default.
> * It might be a little faster because it doesn't have to link to an external library.

I don't think either of these reasons is valid.

> * The code to interface between JSON logic and PostgreSQL will probably be much larger than the actual JSON encoding/decoding itself.

If true, this is a good argument.

> * The externally-maintained and packaged libjson implementations I saw brought in lots of dependencies (e.g. glib).

As is this.

> * "Everyone else" (e.g. PHP) uses a statically-linked JSON implementation.

But this isn't.

> Is the code in question "*really*" small? Well, not really, but it's not enormous either. By the way, I found a bug in PHP's JSON_parser (json_decode("true "); /* with a space */ returns null instead of true). I'll have to get around to reporting that.
>
> Now, assuming JSON support is built-in to PostgreSQL and is enabled by default, it is my opinion that encoding issues should not be dealt with in the JSON code itself, but that the JSON code itself should assume UTF-8. I think conversions should be done to/from UTF-8 before passing it through the JSON code because this would likely be the smallest way to implement it (not necessarily the fastest, though).
>
> Mike Rylander pointed out something wonderful, and that is that JSON code can be stored in plain old ASCII using \u... . If a target encoding supports all of Unicode, the JSON serializer could be told not to generate \u escapes. Otherwise, the \u escapes would be necessary.
>
> Thus, here's an example of how (in my opinion) character sets and such should be handled in the JSON code:
>
> Suppose the client's encoding is UTF-16, and the server's encoding is Latin-1. When JSON is stored to the database:
> 1. The client is responsible and sends a valid UTF-16 JSON string.
> 2. PostgreSQL checks to make sure it is valid UTF-16, then converts it to UTF-8.
> 3. The JSON code parses it (to ensure it's valid).
> 4. The JSON code unparses it (to get a representation without needless whitespace). It is given a flag indicating it should only output ASCII text.
> 5. The ASCII is stored in the server, since it is valid Latin-1.
>
> When JSON is retrieved from the database:
> 1. ASCII is retrieved from the server
> 2. If user needs to extract one or more fields, the JSON is parsed, and the fields are extracted.
> 3. Otherwise, the JSON text is converted to UTF-16 and sent to the client.
>
> Note that I am being biased toward optimizing code size rather than speed.

Can you comment on my proposal elsewhere on this thread and compare your proposal to mine? In what ways are they different, and which is better, and why?

> Here's a question about semantics: should converting JSON to text guarantee that Unicode will be \u escaped, or should it render actual Unicode whenever possible (when the client uses a Unicode-complete charset)?

I feel pretty strongly that the data should be stored in the database in the format in which it will be returned to the user - any conversion which is necessary should happen on the way in. I am not 100% sure to what extent we should attempt to canonicalize the input and to what extent we should simply store it in whichever way the user chooses to provide it.

> As for reinventing the wheel, I'm in the process of writing yet another JSON implementation simply because I didn't find the other ones I looked at palatable. I am aiming for simple code, not fast code. I am using malloc for structures and realloc for strings/arrays rather than resorting to clever buffering tricks. Of course, I'll switch it over to palloc/repalloc before migrating it to PostgreSQL.

I'm not sure that optimizing for simplicity over speed is a good idea. I think we can reject implementations as unpalatable because they are slow or feature-poor or have licensing issues or are not actively maintained, but rejecting them because they use complex code in order to be fast doesn't seem like the right trade-off to me.

...Robert
Re: [HACKERS] Proposal: Add JSON support
On Sun, Mar 28, 2010 at 5:19 PM, Robert Haas wrote:
> On Sun, Mar 28, 2010 at 4:48 PM, Joseph Adams wrote:
>> Now my thoughts and opinions on the JSON parsing/unparsing itself:
>>
>> It should be built-in, rather than relying on an external library (like XML does).
>
> Why? I'm not saying you aren't right, but you need to make an argument rather than an assertion. This is a community, so no one is entitled to decide anything unilaterally, and people want to be convinced - including me.

I apologize; I was just starting the conversation with some of my ideas to receive feedback. I didn't want people to have to wade through too many "I think"s. I'll be sure to use tags in the future :-)

My reasoning for "It should be built-in" is:
* It would be nice to have a built-in serialization format that's available by default.
* It might be a little faster because it doesn't have to link to an external library.
* The code to interface between JSON logic and PostgreSQL will probably be much larger than the actual JSON encoding/decoding itself.
* The externally-maintained and packaged libjson implementations I saw brought in lots of dependencies (e.g. glib).
* "Everyone else" (e.g. PHP) uses a statically-linked JSON implementation.

Is the code in question "*really*" small? Well, not really, but it's not enormous either. By the way, I found a bug in PHP's JSON_parser (json_decode("true "); /* with a space */ returns null instead of true). I'll have to get around to reporting that.

Now, assuming JSON support is built-in to PostgreSQL and is enabled by default, it is my opinion that encoding issues should not be dealt with in the JSON code itself, but that the JSON code itself should assume UTF-8. I think conversions should be done to/from UTF-8 before passing it through the JSON code because this would likely be the smallest way to implement it (not necessarily the fastest, though).

Mike Rylander pointed out something wonderful, and that is that JSON code can be stored in plain old ASCII using \u... . If a target encoding supports all of Unicode, the JSON serializer could be told not to generate \u escapes. Otherwise, the \u escapes would be necessary.

Thus, here's an example of how (in my opinion) character sets and such should be handled in the JSON code:

Suppose the client's encoding is UTF-16, and the server's encoding is Latin-1. When JSON is stored to the database:
1. The client is responsible and sends a valid UTF-16 JSON string.
2. PostgreSQL checks to make sure it is valid UTF-16, then converts it to UTF-8.
3. The JSON code parses it (to ensure it's valid).
4. The JSON code unparses it (to get a representation without needless whitespace). It is given a flag indicating it should only output ASCII text.
5. The ASCII is stored in the server, since it is valid Latin-1.

When JSON is retrieved from the database:
1. ASCII is retrieved from the server
2. If user needs to extract one or more fields, the JSON is parsed, and the fields are extracted.
3. Otherwise, the JSON text is converted to UTF-16 and sent to the client.

Note that I am being biased toward optimizing code size rather than speed.

Here's a question about semantics: should converting JSON to text guarantee that Unicode will be \u escaped, or should it render actual Unicode whenever possible (when the client uses a Unicode-complete charset)?

As for reinventing the wheel, I'm in the process of writing yet another JSON implementation simply because I didn't find the other ones I looked at palatable. I am aiming for simple code, not fast code. I am using malloc for structures and realloc for strings/arrays rather than resorting to clever buffering tricks. Of course, I'll switch it over to palloc/repalloc before migrating it to PostgreSQL.
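[Editor's note: steps 3-5 of the storage workflow above can be sketched with an off-the-shelf JSON library. The snippet below uses Python's json module purely as a stand-in for the proposed backend code; the variable names and the use of this library are illustrative assumptions, not part of any patch.]

```python
import json

# Step 3: parse the incoming (already UTF-8-converted) text to validate it.
raw = '{ "name": "café",  "n": 1 }'
parsed = json.loads(raw)

# Step 4: unparse with no needless whitespace, emitting only ASCII;
# ensure_ascii turns every non-ASCII character into a \uXXXX escape.
canon = json.dumps(parsed, ensure_ascii=True, separators=(",", ":"))

# Step 5: canon is pure ASCII, so it can be stored under any server
# encoding, Latin-1 included.
# canon == '{"name":"caf\\u00e9","n":1}'
```

Round-tripping `canon` through the parser recovers the original value, which is what makes the ASCII form safe as the stored representation.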
Re: [HACKERS] GSoC Query
On Sun, Mar 28, 2010 at 10:01 PM, gaurav gupta wrote:
> My idea is to add a functionality of Auto tuning and Auto Indexing/Reindexing in DB languages.
>
> Though I am not working on this I have some idea about implementation. Idea is that on the no. of rows deleted, Inserted in the table we can make our system capable to reindex the table that will save the time of user.

Reindexing is not routine maintenance for PostgreSQL, so this seems fairly pointless.

> Similarly using the no. of select hits on a table we can check that if maximum no. of times it is on a non-index field we can index on that field to make select faster.

Well, a SELECT statement "hits" a whole row, not a single column; but even if you could somehow figure out a way to tally up per-column statistics (and it's certainly not obvious to me how to do such a thing), it doesn't follow that a column which is frequently accessed is a good candidate for indexing.

I don't think this is a good project for a first-time hacker, or something that can realistically be completed in one summer. It sounds more like a PhD project to me. I wrote to another student who is considering submitting a GSOC proposal with some ideas I thought might be suitable. You might want to review that email: http://archives.postgresql.org/pgsql-hackers/2010-03/msg01034.php

...Robert
[HACKERS] GSoC Query
Sir/Ma'am,

I am an M.Tech student and want to participate in GSoC. I have a project idea and want to discuss its feasibility, usability and chance of selection with you.

My idea is to add a functionality of Auto tuning and Auto Indexing/Reindexing in DB languages.

Though I am not working on this I have some idea about implementation. Idea is that on the no. of rows deleted, Inserted in the table we can make our system capable to reindex the table that will save the time of user. Similarly using the no. of select hits on a table we can check that if maximum no. of times it is on a non-index field we can index on that field to make select faster.

I am looking forward to hearing from you.

--
Thanks & Regards,
Gaurav Kumar Gupta
+91-9032844745
Re: [HACKERS] Proposal: Add JSON support
On Sun, Mar 28, 2010 at 8:33 PM, Robert Haas wrote:
> On Sun, Mar 28, 2010 at 8:23 PM, Mike Rylander wrote:
>> In practice, every parser/serializer I've used (including the one I helped write) allows (and, often, forces) any non-ASCII character to be encoded as \u followed by a string of four hex digits.
>
> Is it correct to say that the only feasible place where non-ASCII characters can be used is within string constants?

Yes. That includes object property strings -- they are quoted string literals.

> If so, it might be reasonable to disallow characters with the high-bit set unless the server encoding is one of the flavors of Unicode of which the spec approves. I'm tempted to think that when the server encoding is Unicode we really ought to allow Unicode characters natively, because turning a long string of two-byte wide chars into a long string of six-byte wide chars sounds pretty evil from a performance point of view.

+1

As an aside, \u-encoded (escaped) characters and native multi-byte sequences (of any RFC-allowable Unicode encoding) are exactly equivalent in JSON -- it's a storage and transmission format, and doesn't prescribe the application-internal representation of the data. If it's faster (which it almost certainly is) to not mangle the data when it's all staying server side, that seems like a useful optimization. For output to the client, however, it would be useful to provide a \u-escaping function, which (AIUI) should always be safe regardless of client encoding.

--
Mike Rylander
 | VP, Research and Design
 | Equinox Software, Inc. / The Evergreen Experts
 | phone: 1-877-OPEN-ILS (673-6457)
 | email: mi...@esilibrary.com
 | web: http://www.esilibrary.com
Re: [HACKERS] Proposal: Add JSON support
Andrew Dunstan wrote:
> Robert Haas wrote:
>> On Sun, Mar 28, 2010 at 8:23 PM, Mike Rylander wrote:
>>> In practice, every parser/serializer I've used (including the one I helped write) allows (and, often, forces) any non-ASCII character to be encoded as \u followed by a string of four hex digits.
>>
>> Is it correct to say that the only feasible place where non-ASCII characters can be used is within string constants? If so, it might be reasonable to disallow characters with the high-bit set unless the server encoding is one of the flavors of Unicode of which the spec approves. I'm tempted to think that when the server encoding is Unicode we really ought to allow Unicode characters natively, because turning a long string of two-byte wide chars into a long string of six-byte wide chars sounds pretty evil from a performance point of view.
>
> We support exactly one unicode encoding on the server side: utf8. And the maximum possible size of a validly encoded unicode char in utf8 is 4 (and that's pretty rare, IIRC).

Sorry. Disregard this. I see what you mean. Yeah, I think *requiring* non-ascii characters to be escaped would be evil.

cheers

andrew
Re: [HACKERS] Proposal: Add JSON support
Robert Haas wrote:
> On Sun, Mar 28, 2010 at 8:23 PM, Mike Rylander wrote:
>> In practice, every parser/serializer I've used (including the one I helped write) allows (and, often, forces) any non-ASCII character to be encoded as \u followed by a string of four hex digits.
>
> Is it correct to say that the only feasible place where non-ASCII characters can be used is within string constants? If so, it might be reasonable to disallow characters with the high-bit set unless the server encoding is one of the flavors of Unicode of which the spec approves. I'm tempted to think that when the server encoding is Unicode we really ought to allow Unicode characters natively, because turning a long string of two-byte wide chars into a long string of six-byte wide chars sounds pretty evil from a performance point of view.

We support exactly one unicode encoding on the server side: utf8. And the maximum possible size of a validly encoded unicode char in utf8 is 4 (and that's pretty rare, IIRC).

cheers

andrew
Re: [HACKERS] Proposal: Add JSON support
On Sun, Mar 28, 2010 at 8:23 PM, Mike Rylander wrote:
> In practice, every parser/serializer I've used (including the one I helped write) allows (and, often, forces) any non-ASCII character to be encoded as \u followed by a string of four hex digits.

Is it correct to say that the only feasible place where non-ASCII characters can be used is within string constants? If so, it might be reasonable to disallow characters with the high-bit set unless the server encoding is one of the flavors of Unicode of which the spec approves. I'm tempted to think that when the server encoding is Unicode we really ought to allow Unicode characters natively, because turning a long string of two-byte wide chars into a long string of six-byte wide chars sounds pretty evil from a performance point of view.

...Robert
Re: [HACKERS] Proposal: Add JSON support
On Sun, Mar 28, 2010 at 7:36 PM, Tom Lane wrote:
> Andrew Dunstan writes:
>> Here's another thought. Given that JSON is actually specified to consist of a string of Unicode characters, what will we deliver to the client where the client encoding is, say Latin1? Will it actually be a legal JSON byte stream?
>
> No, it won't. We will *not* be sending anything but latin1 in such a situation, and I really couldn't care less what the JSON spec says about it. Delivering wrongly-encoded data to a client is a good recipe for all sorts of problems, since the client-side code is very unlikely to be expecting that. A datatype doesn't get to make up its own mind whether to obey those rules. Likewise, data on input had better match client_encoding, because it's otherwise going to fail the encoding checks long before a json datatype could have any say in the matter.
>
> While I've not read the spec, I wonder exactly what "consist of a string of Unicode characters" should actually be taken to mean. Perhaps it only means that all the characters must be members of the Unicode set, not that the string can never be represented in any other encoding. There's more than one Unicode encoding anyway...

In practice, every parser/serializer I've used (including the one I helped write) allows (and, often, forces) any non-ASCII character to be encoded as \u followed by a string of four hex digits.

Whether it would be easy inside the backend, when generating JSON from user data stored in tables that are not in a UTF-8 encoded cluster, to convert to UTF-8, that's something else entirely. If it /is/ easy and safe, then it's just a matter of scanning for multi-byte sequences and replacing those with their \u equivalents. I have some simple and fast code I could share, if it's needed, though I suspect it's not. :)

UPDATE: Thanks, Robert, for pointing to the RFC.

--
Mike Rylander
 | VP, Research and Design
 | Equinox Software, Inc. / The Evergreen Experts
 | phone: 1-877-OPEN-ILS (673-6457)
 | email: mi...@esilibrary.com
 | web: http://www.esilibrary.com
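[Editor's note: the scan-and-escape step described above can be sketched as follows. This is a hypothetical illustration, not the code Mike offers to share: each character outside ASCII is replaced by a \uXXXX escape, and characters beyond the Basic Multilingual Plane become a UTF-16 surrogate pair, per RFC 4627 section 2.5. The function name is invented for the example.]

```python
def escape_non_ascii(s: str) -> str:
    # Pass ASCII through untouched; escape everything else so the
    # result is plain ASCII and safe under any client encoding.
    out = []
    for ch in s:
        cp = ord(ch)
        if cp < 0x80:
            out.append(ch)
        elif cp <= 0xFFFF:
            out.append("\\u%04x" % cp)
        else:
            # Characters above U+FFFF are written as a UTF-16
            # surrogate pair, as the JSON RFC requires.
            cp -= 0x10000
            out.append("\\u%04x\\u%04x"
                       % (0xD800 + (cp >> 10), 0xDC00 + (cp & 0x3FF)))
    return "".join(out)

# escape_non_ascii("café") == "caf\\u00e9"
```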
Re: [HACKERS] Proposal: Add JSON support
On Sun, Mar 28, 2010 at 7:36 PM, Tom Lane wrote:
> Andrew Dunstan writes:
>> Here's another thought. Given that JSON is actually specified to consist of a string of Unicode characters, what will we deliver to the client where the client encoding is, say Latin1? Will it actually be a legal JSON byte stream?
>
> No, it won't. We will *not* be sending anything but latin1 in such a situation, and I really couldn't care less what the JSON spec says about it. Delivering wrongly-encoded data to a client is a good recipe for all sorts of problems, since the client-side code is very unlikely to be expecting that. A datatype doesn't get to make up its own mind whether to obey those rules. Likewise, data on input had better match client_encoding, because it's otherwise going to fail the encoding checks long before a json datatype could have any say in the matter.
>
> While I've not read the spec, I wonder exactly what "consist of a string of Unicode characters" should actually be taken to mean. Perhaps it only means that all the characters must be members of the Unicode set, not that the string can never be represented in any other encoding. There's more than one Unicode encoding anyway...

See sections 2.5 and 3 of: http://www.ietf.org/rfc/rfc4627.txt?number=4627

...Robert
Re: [HACKERS] Proposal: Add JSON support
Andrew Dunstan writes:
> Here's another thought. Given that JSON is actually specified to consist of a string of Unicode characters, what will we deliver to the client where the client encoding is, say Latin1? Will it actually be a legal JSON byte stream?

No, it won't. We will *not* be sending anything but latin1 in such a situation, and I really couldn't care less what the JSON spec says about it. Delivering wrongly-encoded data to a client is a good recipe for all sorts of problems, since the client-side code is very unlikely to be expecting that. A datatype doesn't get to make up its own mind whether to obey those rules. Likewise, data on input had better match client_encoding, because it's otherwise going to fail the encoding checks long before a json datatype could have any say in the matter.

While I've not read the spec, I wonder exactly what "consist of a string of Unicode characters" should actually be taken to mean. Perhaps it only means that all the characters must be members of the Unicode set, not that the string can never be represented in any other encoding. There's more than one Unicode encoding anyway...

regards, tom lane
Re: [HACKERS] Proposal: Add JSON support
Tom Lane wrote:
> Andrew Dunstan writes:
>> Robert Haas wrote:
>>> I think you need to assume that the encoding will be the server encoding, not UTF-8. Although others on this list are better qualified to speak to that than I am.
>>
>> The trouble is that JSON is defined to be specifically Unicode, and in practice for us that means UTF8 on the server side. It could get a bit hairy, and it's definitely not something I think you can wave away with a simple "I'll just throw some encoding/decoding function calls at it."
>
> It's just text, no? Are there any operations where this actually makes a difference?

If we're going to provide operations on it that might involve some. I don't know.

> Like Robert, I'm *very* wary of trying to introduce any text storage into the backend that is in an encoding different from server_encoding. Even the best-case scenarios for that will involve multiple new places for encoding conversion failures to happen.

I agree entirely. All I'm suggesting is that there could be many wrinkles here.

Here's another thought. Given that JSON is actually specified to consist of a string of Unicode characters, what will we deliver to the client where the client encoding is, say Latin1? Will it actually be a legal JSON byte stream?

cheers

andrew
Re: [HACKERS] Alpha release this week?
On Sun, Mar 28, 2010 at 4:40 PM, Josh Berkus wrote:
> We've got two locations and some individuals signed up for a test-fest this weekend. Would it be possible to do an alpha release this week? It would really help to be testing later code than Alpha4.

I'm willing to do the CVS bits, if that's helpful. Or maybe Peter wants to do it. Anyway I have no problem with the idea.

...Robert
Re: [HACKERS] Proposal: Add JSON support
Andrew Dunstan writes:
> Robert Haas wrote:
>> I think you need to assume that the encoding will be the server encoding, not UTF-8. Although others on this list are better qualified to speak to that than I am.
>
> The trouble is that JSON is defined to be specifically Unicode, and in practice for us that means UTF8 on the server side. It could get a bit hairy, and it's definitely not something I think you can wave away with a simple "I'll just throw some encoding/decoding function calls at it."

It's just text, no? Are there any operations where this actually makes a difference?

Like Robert, I'm *very* wary of trying to introduce any text storage into the backend that is in an encoding different from server_encoding. Even the best-case scenarios for that will involve multiple new places for encoding conversion failures to happen.

regards, tom lane
[HACKERS] five-key syscaches
Per previous discussion, PFA a patch to change the maximum number of keys for a syscache from 4 to 5: http://archives.postgresql.org/pgsql-hackers/2010-02/msg01105.php

This is intended for application to 9.1, and is supporting infrastructure for knngist.

...Robert

syscache5.patch
Description: Binary data
Re: [HACKERS] Proposal: Add JSON support
Robert Haas wrote:
> On Sun, Mar 28, 2010 at 4:48 PM, Joseph Adams wrote:
>> I'm wondering whether the internal representation of JSON should be
>> plain JSON text, or some binary code that's easier to traverse and
>> whatnot. For the sake of code size, just keeping it in text is
>> probably best.
>
> +1 for text.

Agreed.

>> Now my thoughts and opinions on the JSON parsing/unparsing itself:
>>
>> It should be built-in, rather than relying on an external library
>> (like XML does).
>
> Why? I'm not saying you aren't right, but you need to make an argument
> rather than an assertion. This is a community, so no one is entitled to
> decide anything unilaterally, and people want to be convinced -
> including me.

Yeah, why? We should not be in the business of reinventing the wheel (and then maintaining the reinvented wheel), unless the code in question is *really* small.

>> As far as character encodings, I'd rather keep that out of the JSON
>> parsing/serializing code itself and assume UTF-8. Wherever I'm wrong,
>> I'll just throw encode/decode/validate operations at it.
>
> I think you need to assume that the encoding will be the server
> encoding, not UTF-8. Although others on this list are better
> qualified to speak to that than I am.

The trouble is that JSON is defined to be specifically Unicode, and in practice for us that means UTF8 on the server side. It could get a bit hairy, and it's definitely not something I think you can wave away with a simple "I'll just throw some encoding/decoding function calls at it."

cheers

andrew
Re: [HACKERS] Proposal: Add JSON support
On Sun, Mar 28, 2010 at 4:48 PM, Joseph Adams wrote:
> I'm wondering whether the internal representation of JSON should be
> plain JSON text, or some binary code that's easier to traverse and
> whatnot. For the sake of code size, just keeping it in text is
> probably best.

+1 for text.

> Now my thoughts and opinions on the JSON parsing/unparsing itself:
>
> It should be built-in, rather than relying on an external library
> (like XML does).

Why? I'm not saying you aren't right, but you need to make an argument rather than an assertion. This is a community, so no one is entitled to decide anything unilaterally, and people want to be convinced - including me.

> As far as character encodings, I'd rather keep that out of the JSON
> parsing/serializing code itself and assume UTF-8. Wherever I'm wrong,
> I'll just throw encode/decode/validate operations at it.

I think you need to assume that the encoding will be the server encoding, not UTF-8. Although others on this list are better qualified to speak to that than I am.

...Robert
Re: [HACKERS] join removal
On Sun, Mar 28, 2010 at 4:12 PM, Tom Lane wrote:
> Robert Haas writes:
>> On Sun, Mar 28, 2010 at 2:10 PM, Tom Lane wrote:
>>> joinremoval.c ?
>
>> Maybe, except as I mentioned in the email linked upthread, my plan for
>> implementing inner join removal would also include allowing join
>> reordering in cases where we currently don't. So I don't want to
>> sandbox it too tightly as join removal, per se, though that's
>> certainly what we have on the table ATM. It's more like advanced
>> open-heart join-tree surgery - like prepjointree, but much later in
>> the process.
>
> Hm. At this point we're not really working with a join *tree* in any
> case --- the data structure we're mostly concerned with is the list of
> SpecialJoinInfo structs, and what we're trying to do is weaken the
> constraints described by that list. So I'd rather stay away from "tree"
> terminology.
>
> planjoins.c would fit with other names in the plan/ directory but it
> seems like a misnomer because we're not really "planning" any joins
> at this stage.
>
> adjustjoins.c? loosenjoins.c? weakenjoins.c?

How about analyzejoins.c? Loosen and weaken don't seem like quite the right idea; adjust is a little generic and perhaps overused, but not bad. If you don't like analyzejoins then go with adjustjoins.

...Robert
[HACKERS] Proposal: Add JSON support
I introduced myself in the thread "Proposal: access control jails (and introduction as aspiring GSoC student)", and we discussed jails and session-local variables. But, as Robert Haas suggested, implementing variable support in the backend would probably be way too ambitious a project for a newbie like me. I decided instead to pursue the task of adding JSON support to PostgreSQL, hence the new thread.

I plan to reference datatype-xml.html and functions-xml.html in some design decisions, but there are some things that apply to XML that don't apply to JSON and vice versa. For instance, jsoncomment wouldn't make sense because (standard) JSON doesn't have comments.

For access, we might have something like json_get('foo[1].bar') and json_set('foo[1].bar', 'hello'). jsonforest and jsonagg would be beautiful. For mapping, jsonforest/jsonagg could be used to build a JSON string from a result set (SELECT jsonagg(jsonforest(col1, col2, ...)) FROM tbl), but I'm not sure of the best way to go the other way around (generate a result set from JSON). CSS-style selectors would be cool, but "selecting" is what SQL is all about, and I'm not sure having a json_select("dom-element[key=value]") function is a good, orthogonal approach.

I'm wondering whether the internal representation of JSON should be plain JSON text, or some binary code that's easier to traverse and whatnot. For the sake of code size, just keeping it in text is probably best.

Now my thoughts and opinions on the JSON parsing/unparsing itself:

It should be built-in, rather than relying on an external library (like XML does). Priorities of the JSON implementation, in descending order, are:
 * Small
 * Correct
 * Fast

Moreover, JSON operations shall not crash due to stack overflows. I'm thinking Bison/Flex is overkill for parsing JSON (I haven't seen any JSON implementations out there that use it anyway). I would probably end up writing the JSON parser/serializer manually. It should not take more than a week.

As far as character encodings, I'd rather keep that out of the JSON parsing/serializing code itself and assume UTF-8. Wherever I'm wrong, I'll just throw encode/decode/validate operations at it.

Thoughts?

Thanks.
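For scale, a hand-written recursive-descent validator of the sort described above — no Bison/Flex, and an explicit depth cap so malformed input cannot overflow the stack — can be quite small. This is a sketch only, not code from any patch; the names json_valid and JSON_MAX_DEPTH are invented for illustration, and real code would also need Unicode-aware string handling:

```c
#include <stdbool.h>
#include <ctype.h>
#include <string.h>

#define JSON_MAX_DEPTH 64  /* reject deeper nesting instead of recursing forever */

static const char *parse_value(const char *s, int depth);

static const char *skip_ws(const char *s)
{
    while (*s == ' ' || *s == '\t' || *s == '\n' || *s == '\r')
        s++;
    return s;
}

static const char *parse_string(const char *s)
{
    if (*s != '"')
        return NULL;
    s++;
    while (*s && *s != '"')
    {
        if (*s == '\\' && s[1])
            s++;                    /* skip the escaped character */
        s++;
    }
    return (*s == '"') ? s + 1 : NULL;
}

static const char *parse_number(const char *s)
{
    const char *start = s;

    if (*s == '-')
        s++;
    while (isdigit((unsigned char) *s))
        s++;
    if (s == start || (*start == '-' && s == start + 1))
        return NULL;                /* no digits seen */
    if (*s == '.')
    {
        s++;
        if (!isdigit((unsigned char) *s))
            return NULL;
        while (isdigit((unsigned char) *s))
            s++;
    }
    if (*s == 'e' || *s == 'E')
    {
        s++;
        if (*s == '+' || *s == '-')
            s++;
        if (!isdigit((unsigned char) *s))
            return NULL;
        while (isdigit((unsigned char) *s))
            s++;
    }
    return s;
}

static const char *parse_value(const char *s, int depth)
{
    if (depth > JSON_MAX_DEPTH)
        return NULL;
    s = skip_ws(s);
    if (*s == '"')
        return parse_string(s);
    if (*s == '{' || *s == '[')
    {
        bool is_obj = (*s == '{');
        char close = is_obj ? '}' : ']';

        s = skip_ws(s + 1);
        if (*s == close)
            return s + 1;           /* empty object/array */
        for (;;)
        {
            if (is_obj)
            {
                s = parse_string(skip_ws(s));
                if (!s)
                    return NULL;
                s = skip_ws(s);
                if (*s != ':')
                    return NULL;
                s++;
            }
            s = parse_value(s, depth + 1);
            if (!s)
                return NULL;
            s = skip_ws(s);
            if (*s == ',')
            {
                s++;
                continue;
            }
            return (*s == close) ? s + 1 : NULL;
        }
    }
    if (strncmp(s, "true", 4) == 0)
        return s + 4;
    if (strncmp(s, "false", 5) == 0)
        return s + 5;
    if (strncmp(s, "null", 4) == 0)
        return s + 4;
    return parse_number(s);
}

bool json_valid(const char *s)
{
    s = parse_value(s, 0);
    return s != NULL && *skip_ws(s) == '\0';
}
```

A validator like this is the easy half; the serializer and the access functions are where most of the design decisions above come into play.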
[HACKERS] Alpha release this week?
All,

We've got two locations and some individuals signed up for a test-fest this weekend. Would it be possible to do an alpha release this week? It would really help to be testing later code than Alpha4.

--
-- Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com
Re: [HACKERS] join removal
Robert Haas writes:
> On Sun, Mar 28, 2010 at 2:10 PM, Tom Lane wrote:
>> joinremoval.c ?

> Maybe, except as I mentioned in the email linked upthread, my plan for
> implementing inner join removal would also include allowing join
> reordering in cases where we currently don't. So I don't want to
> sandbox it too tightly as join removal, per se, though that's
> certainly what we have on the table ATM. It's more like advanced
> open-heart join-tree surgery - like prepjointree, but much later in
> the process.

Hm. At this point we're not really working with a join *tree* in any case --- the data structure we're mostly concerned with is the list of SpecialJoinInfo structs, and what we're trying to do is weaken the constraints described by that list. So I'd rather stay away from "tree" terminology.

planjoins.c would fit with other names in the plan/ directory but it seems like a misnomer because we're not really "planning" any joins at this stage.

adjustjoins.c? loosenjoins.c? weakenjoins.c?

			regards, tom lane
Re: [HACKERS] join removal
On Sun, Mar 28, 2010 at 2:10 PM, Tom Lane wrote:
> Robert Haas writes:
>> On Sun, Mar 28, 2010 at 2:04 PM, Tom Lane wrote:
>>> * in a new file in plan/. Not sure if it's worth this, though your
>>> thought that we might add more logic later makes it more defensible.
>
>> I sort of like the last of these ideas though I'm at a loss for what
>> to call it. Otherwise I kind of like planmain.c.
>
> joinremoval.c ?

Maybe, except as I mentioned in the email linked upthread, my plan for implementing inner join removal would also include allowing join reordering in cases where we currently don't. So I don't want to sandbox it too tightly as join removal, per se, though that's certainly what we have on the table ATM. It's more like advanced open-heart join-tree surgery - like prepjointree, but much later in the process.

...Robert
Re: [HACKERS] join removal
Robert Haas writes:
> On Sun, Mar 28, 2010 at 2:04 PM, Tom Lane wrote:
>> * in a new file in plan/. Not sure if it's worth this, though your
>> thought that we might add more logic later makes it more defensible.

> I sort of like the last of these ideas though I'm at a loss for what
> to call it. Otherwise I kind of like planmain.c.

joinremoval.c ?

			regards, tom lane
Re: [HACKERS] join removal
On Sun, Mar 28, 2010 at 2:04 PM, Tom Lane wrote:
> Robert Haas writes:
>> On Sun, Mar 28, 2010 at 12:19 AM, Tom Lane wrote:
>>> * I left join_is_removable where it was, mainly so that it was easy to
>>> compare how much it changed for this usage (not a lot). I'm not sure
>>> that joinpath.c is an appropriate place for it anymore, though I can't
>>> see any obviously better place either. Any thoughts on that?
>
>> I dislike the idea of leaving it in joinpath.c. I don't even think it
>> properly belongs in the path subdirectory since it no longer has
>> anything to do with paths. Also worth thinking about where we would
>> put the logic I pontificated about here:
>> http://archives.postgresql.org/pgsql-hackers/2009-10/msg01012.php
>
> The only argument I can see for leaving it where it is is that it
> depends on clause_sides_match_join, which we'd have to either duplicate
> or global-ize in order to continue sharing that code. However, since
> join_is_removable now needs a slightly different API for that anyway
> (cf changes in draft patch), it's probably better to not try to share it.
> So let's put the join removal code somewhere else. The reasonable
> alternatives seem to be:
>
> * in a new file in prep/. Although this clearly has the flavor of
> preprocessing, all the other work in prep/ is done before we get into
> query_planner(). So this choice seems a bit dubious.
>
> * directly in plan/planmain.c. Has the advantage of being where the
> caller is, so no globally visible function declaration needed. No other
> redeeming social value though.
>
> * in plan/initsplan.c. Somewhat reasonable, although that file is
> rather large already.
>
> * in a new file in plan/. Not sure if it's worth this, though your
> thought that we might add more logic later makes it more defensible.

I sort of like the last of these ideas though I'm at a loss for what to call it. Otherwise I kind of like planmain.c.

...Robert
Re: [HACKERS] join removal
Robert Haas writes:
> On Sun, Mar 28, 2010 at 12:19 AM, Tom Lane wrote:
>> * I left join_is_removable where it was, mainly so that it was easy to
>> compare how much it changed for this usage (not a lot). I'm not sure
>> that joinpath.c is an appropriate place for it anymore, though I can't
>> see any obviously better place either. Any thoughts on that?

> I dislike the idea of leaving it in joinpath.c. I don't even think it
> properly belongs in the path subdirectory since it no longer has
> anything to do with paths. Also worth thinking about where we would
> put the logic I pontificated about here:
> http://archives.postgresql.org/pgsql-hackers/2009-10/msg01012.php

The only argument I can see for leaving it where it is is that it depends on clause_sides_match_join, which we'd have to either duplicate or global-ize in order to continue sharing that code. However, since join_is_removable now needs a slightly different API for that anyway (cf changes in draft patch), it's probably better to not try to share it. So let's put the join removal code somewhere else. The reasonable alternatives seem to be:

* in a new file in prep/. Although this clearly has the flavor of preprocessing, all the other work in prep/ is done before we get into query_planner(). So this choice seems a bit dubious.

* directly in plan/planmain.c. Has the advantage of being where the caller is, so no globally visible function declaration needed. No other redeeming social value though.

* in plan/initsplan.c. Somewhat reasonable, although that file is rather large already.

* in a new file in plan/. Not sure if it's worth this, though your thought that we might add more logic later makes it more defensible.

Comments?

			regards, tom lane
Re: [HACKERS] join removal
I wrote:
> Robert Haas writes:
>> I'm alarmed by your follow-on statement that the current code can't
>> handle the two-levels of removable join case. Seems like it ought to
>> form {B C} as a path over {B} and then {A B C} as a path over {A}.

> Actually I think it ought to form {A B} as a no-op join and then be able
> to join {A B} to {C} as a no-op join. It won't recognize joining A to
> {B C} as a no-op because the RHS isn't a baserel. But yeah, I was quite
> surprised at the failure too. We should take the time to understand why
> it's failing before we go further.

OK, I traced through it, and the reason HEAD fails on this example is that it *doesn't* recognize {A B} as a feasible no-op join, for precisely the reason that it sees some B vars marked as being needed for the not-yet-done {B C} join. So that path is blocked, and the other possible path to the desired result is also blocked because it won't consider {B C} as a valid RHS for a removable join.

I don't see any practical way to escape the false-attr_needed problem given the current code structure. We could maybe hack our way to a solution by weakening the restriction against the RHS being a join, eg by noting that the best path for the RHS is a no-op join and then drilling down to the one baserel. But it seems pretty ugly.

So I think the conclusion is clear: we should consign the current join-removal code to the dustbin and pursue the preprocessing way instead. Will work on it today.

			regards, tom lane
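The attr_needed interaction described in this exchange can be modeled in a few lines. The sketch below is a toy (invented names, Python sets standing in for relid bitmaps — nothing like the real planner structures); it shows why clearing the removed rel's bits is what unblocks the second removal:

```python
# Toy model: a left join's inner rel is removable only when, after
# discounting rels already removed, no rel outside the join itself
# still "needs" the inner rel's vars (its attr_needed set).

def remove_useless_joins(joins, attr_needed):
    """joins: (outer, inner) pairs; attr_needed: inner rel -> set of rels
    that reference its vars. Returns the set of removed inner rels."""
    removed = set()
    progress = True
    while progress:                      # rescan until a pass removes nothing
        progress = False
        for outer, inner in joins:
            if inner in removed:
                continue
            still_needed = attr_needed[inner] - removed
            if still_needed <= {outer, inner}:
                removed.add(inner)       # clearing its bits may unblock others
                progress = True
    return removed

# A LEFT JOIN (B LEFT JOIN C ON Pbc) ON Pab: B's vars used in Pbc carry
# attr_needed {B, C}, which blocks removing B until C's bit is cleared.
joins = [("A", "B"), ("B", "C")]
attr_needed = {"B": {"A", "B", "C"}, "C": {"B", "C"}}
print(sorted(remove_useless_joins(joins, attr_needed)))  # -> ['B', 'C']
```

Without the `- removed` subtraction, the model removes only C and then stalls on B — the same false-attr_needed blockage the trace-through found in HEAD.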
Re: [HACKERS] More idle thoughts
Simon Riggs writes:
> On Fri, 2010-03-26 at 18:59 +0000, Greg Stark wrote:
>> It occurs to me we could do the same for CHECK_FOR_INTERRUPTS() by
>> conditionally having it call a function which calls gettimeofday and
>> compares with the previous timestamp received at the last CFI().

> Reducing latency sounds good, but what has CFI got to do with that?

It took me about five minutes to figure out what Greg was on about too. His point is that we need to locate code paths in which an extremely long time can pass between successive CFI calls, because that means the backend will fail to respond to SIGINT/SIGTERM for a long time. Instrumenting CFI itself is a possible tool for that.

			regards, tom lane
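As a sketch of the idea (hypothetical code, not from any patch — the names instrumented_cfi, timeval_diff_usec, and CFI_WARN_USEC are invented): the check could record a timestamp and complain whenever the gap since the previous check exceeds a threshold, pinpointing the code paths that go too long without responding to interrupts:

```c
#include <stdio.h>
#include <sys/time.h>

/* Hypothetical instrumentation: measure the wall-clock gap between
 * successive CHECK_FOR_INTERRUPTS()-style calls. */

#define CFI_WARN_USEC 100000L   /* complain about gaps longer than 100 ms */

long
timeval_diff_usec(const struct timeval *earlier, const struct timeval *later)
{
    return (later->tv_sec - earlier->tv_sec) * 1000000L
         + (later->tv_usec - earlier->tv_usec);
}

static struct timeval last_cfi;  /* zero-initialized: no previous call yet */

void
instrumented_cfi(const char *where)
{
    struct timeval now;

    gettimeofday(&now, NULL);
    if (last_cfi.tv_sec != 0)   /* skip the very first call */
    {
        long gap = timeval_diff_usec(&last_cfi, &now);

        if (gap > CFI_WARN_USEC)
            fprintf(stderr, "CFI gap of %ld us at %s\n", gap, where);
    }
    last_cfi = now;
    /* ...then perform the normal interrupt checks... */
}
```

The gettimeofday cost is why the thread suggests enabling this only conditionally, e.g. under a debug build or GUC.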
Re: [HACKERS] join removal
Simon Riggs writes:
> Does the new patch find more than two levels of join removal?

Well, I'd assume if it can do two nested levels then it should work for any number, but I plead guilty to not having actually tested that.

			regards, tom lane
Re: [HACKERS] More idle thoughts
On Fri, 2010-03-26 at 18:59 +0000, Greg Stark wrote:
> The Linux kernel had a big push to reduce latency, and one of the
> tricks they did was they replaced the usual interrupt points with a
> call which noted how long it had been since the last interrupt point.
> It occurs to me we could do the same for CHECK_FOR_INTERRUPTS() by
> conditionally having it call a function which calls gettimeofday and
> compares with the previous timestamp received at the last CFI().

Reducing latency sounds good, but what has CFI got to do with that?

--
Simon Riggs
www.2ndQuadrant.com
Re: [HACKERS] join removal
On Sun, 2010-03-28 at 02:15 -0400, Tom Lane wrote:
> I wrote:
>> [ crude patch ]
>
> Oh, btw, if you try to run the regression test additions in that patch
> against CVS HEAD, you'll find out that HEAD actually fails to optimize
> the two-levels-of-removable-joins case. Seems like another reason to
> press ahead with making the change.

Yes, please.

Does the new patch find more than two levels of join removal?

--
Simon Riggs
www.2ndQuadrant.com
Re: [HACKERS] join removal
Robert Haas writes:
> I'm alarmed by your follow-on statement that the current code can't
> handle the two-levels of removable join case. Seems like it ought to
> form {B C} as a path over {B} and then {A B C} as a path over {A}.

Actually I think it ought to form {A B} as a no-op join and then be able to join {A B} to {C} as a no-op join. It won't recognize joining A to {B C} as a no-op because the RHS isn't a baserel. But yeah, I was quite surprised at the failure too. We should take the time to understand why it's failing before we go further. I ran out of steam last night but will have a look into that today.

			regards, tom lane
Re: [HACKERS] join removal
On Sun, Mar 28, 2010 at 12:19 AM, Tom Lane wrote:
> Robert Haas writes:
>> On Sat, Mar 27, 2010 at 4:11 PM, Tom Lane wrote:
>>> I'm not seeing how that would occur or would matter, but the worst case
>>> answer is to restart the scan of the SpecialJoinInfos from scratch any
>>> time you succeed in doing a join removal.
>
>> Well, say you have something like
>
>> SELECT 1 FROM A LEFT JOIN (B LEFT JOIN C ON Pbc) ON Pab
>
>> I think that the SpecialJoinInfo structure for the join between B and
>> C will match the criteria I articulated upthread, but the one for the
>> join between A and {B C} will not. If C had not been in the query
>> from the beginning then we'd have had:
>
>> SELECT 1 FROM A LEFT JOIN B ON Pab
>
>> ...under which circumstances the SpecialJoinInfo would match the
>> aforementioned criteria.
>
> I experimented with this and found that you're correct: the tests on the
> different SpecialJoinInfos do interact, which I hadn't believed
> initially. The reason for this is that when we find out we can remove a
> particular rel, we have to remove the bits for it in other relations'
> attr_needed bitmaps. In the above example, we first discover we can
> remove C. Whatever B vars were used in Pbc will have an attr_needed
> set of {B,C}, and that C bit will prevent us from deciding that B can
> be removed when we are examining the upper SpecialJoinInfo (which will
> not consider C to be part of either min_lefthand or min_righthand).
> So we have to remove the C bits when we remove C.
>
> Attached is an extremely quick-and-dirty, inadequately commented draft
> patch that does it along the lines you are suggesting. This was just to
> see if I could get it to work at all; it's not meant for application in
> anything like its current state. However, I feel a very strong
> temptation to finish it up and apply it before we enter beta. As you
> noted, this way is a lot cheaper than the original coding, whether one
> focuses on the cost of failing cases or the cost when the optimization
> is successful. And if we hold it off till 9.1, then any bug fixes that
> have to be made in the area later will need to be made against two
> significantly different implementations, which will be a real PITA.
>
> Things that would need to be cleaned up:
>
> * I left join_is_removable where it was, mainly so that it was easy to
> compare how much it changed for this usage (not a lot). I'm not sure
> that joinpath.c is an appropriate place for it anymore, though I can't
> see any obviously better place either. Any thoughts on that?

I dislike the idea of leaving it in joinpath.c. I don't even think it properly belongs in the path subdirectory since it no longer has anything to do with paths. Also worth thinking about where we would put the logic I pontificated about here:

http://archives.postgresql.org/pgsql-hackers/2009-10/msg01012.php

> * The removed relation has to be taken out of the set of baserels
> somehow, else for example the Assert in make_one_rel will fail.
> The current hack is to change its reloptkind to RELOPT_OTHER_MEMBER_REL,
> which I think is a bit unclean. We could try deleting it from the
> simple_rel_array altogether, but I'm worried that that could result in
> dangling-pointer failures, since we're probably not going to go to the
> trouble of removing every single reference to the rel from the planner
> data structures. A possible compromise is to invent another reloptkind
> value that is only used for "dead" relations.

+1 for dead relation type.

> * It would be good to not count the removed relation in
> root->total_table_pages. If we made either of the changes suggested
> above then we could move the calculation of total_table_pages down to
> after remove_useless_joins and ignore the removed relation(s)
> appropriately. Otherwise I'm tempted to just subtract off the relation
> size from total_table_pages on-the-fly when we remove it.

Sounds good.

> * I'm not sure yet about the adjustment of PlaceHolder bitmaps --- we
> might need to break fix_placeholder_eval_levels into two steps to get
> it right.

Not familiar enough with that code to comment.

> * Still need to reverse out the now-dead code from the original patch,
> in particular the NoOpPath support.

Yeah.

> Thoughts?

I'm alarmed by your follow-on statement that the current code can't handle the two-levels of removable join case. Seems like it ought to form {B C} as a path over {B} and then {A B C} as a path over {A}. Given that it doesn't, we already have a fairly serious bug, so we've either got to put more work into the old implementation or switch to this new one - and I think at this point you and I are both fairly convinced that this is a better way going forward.

...Robert
[HACKERS] Re: [COMMITTERS] pgsql: Augment WAL records for btree delete with GetOldestXmin() to
On Sat, 2010-03-27 at 22:39 +0000, Greg Stark wrote:
> On Sat, Mar 27, 2010 at 7:36 PM, Simon Riggs wrote:
>> On Sat, 2010-03-27 at 19:15 +0000, Greg Stark wrote:
>>> If we're pruning an index entry to a heap tuple that has been HOT
>>> pruned wouldn't the HOT pruning record have already conflicted with
>>> any queries that could see it?
>>
>> Quite probably, but a query that started after that record arrived might
>> slip through. We have to treat each WAL record separately.
>
> Slip through? I'm not following you.

No, there is no possibility for it to slip through, you're right. (After much thinking.)

>> Do you agree with the conjecture? That LP_DEAD items can be ignored
>> because their xid would have been earlier than the latest LP_NORMAL
>> tuple we find? (on any page).
>>
>> Or is a slightly less strong condition true: we can ignore LP_DEAD items
>> on a page that is also referenced by an LP_NORMAL item.
>
> I don't like having dependencies on the precise logic in vacuum rather
> than only on the guarantees that vacuum provides. We want to improve
> the logic in vacuum and hot pruning to cover more cases and that will
> be harder if there's code elsewhere depending on its simple-minded xid
> <= globalxmin test.

Agreed

--
Simon Riggs
www.2ndQuadrant.com