Re: [HACKERS] On Scalability

2010-10-08 Thread Vincenzo Romano
2010/10/7 Simon Riggs si...@2ndquadrant.com:
 On Thu, 2010-10-07 at 14:10 +0200, Vincenzo Romano wrote:

 Making these things sub-linear (whether not O(log n) or even O(1) ),
 provided that there's  way to, would make this RDBMS more appealing
 to enterprises.
 I mean also partial indexes (as an alternative to table partitioning).
 Being able to effectively cope with a dozen child tables or so it's more
 like an amateur feature.
 If you really need partitioning (or just hierarchical stuff) I think you'll 
 need
 for quite more than a dozen items.
 If you partition by just weeks, you'll need 50+ a year.

 Is there any precise direction to where look into the code for it?

 Is there a way to put this into a wish list?

 It's already on the wish list (TODO) and has been for many years.

 We've mostly lacked somebody with the experience and time/funding to
 complete that implementation work. I figure I'll be doing it for 9.2
 now; it may be difficult to do this for next release.

 Theoretically, this can be O(n.log n) for range partitioning and O(1)
 for exact value partitioning, though the latter isn't a frequent use
 case.

O(n*log n) is what I would expect from a good algorithm.

 Your conclusion that the current partitioning only works with a dozen or
 so items doesn't match the experience of current users however.

People on the mailing lists says so.
I think I'm forced now to plan for tests on our side, despite this is
not what I'd
like to do with the most advanced open source database.
I'll publish the results on my blog, anyway.

 --
  Simon Riggs           www.2ndQuadrant.com
  PostgreSQL Development, 24x7 Support, Training and Services





-- 
Vincenzo Romano at NotOrAnd Information Technologies
Software Hardware Networking Training Support Security
--
cel +393398083886 fix +390823454163 fax +3902700506964
gtalk. vincenzo.rom...@notorand.it skype. notorand.it
--
NON QVIETIS MARIBVS NAVTA PERITVS

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] On Scalability

2010-10-08 Thread Vincenzo Romano
2010/10/7 Simon Riggs si...@2ndquadrant.com:
 On Thu, 2010-10-07 at 14:10 +0200, Vincenzo Romano wrote:

 Making these things sub-linear (whether not O(log n) or even O(1) ),
 provided that there's  way to, would make this RDBMS more appealing
 to enterprises.
 I mean also partial indexes (as an alternative to table partitioning).
 Being able to effectively cope with a dozen child tables or so it's more
 like an amateur feature.
 If you really need partitioning (or just hierarchical stuff) I think you'll 
 need
 for quite more than a dozen items.
 If you partition by just weeks, you'll need 50+ a year.

 Is there any precise direction to where look into the code for it?

 Is there a way to put this into a wish list?

 It's already on the wish list (TODO) and has been for many years.

 We've mostly lacked somebody with the experience and time/funding to
 complete that implementation work. I figure I'll be doing it for 9.2
 now; it may be difficult to do this for next release.

 Theoretically, this can be O(n.log n) for range partitioning and O(1)
 for exact value partitioning, though the latter isn't a frequent use
 case.

 Your conclusion that the current partitioning only works with a dozen or
 so items doesn't match the experience of current users however.

 --
  Simon Riggs           www.2ndQuadrant.com
  PostgreSQL Development, 24x7 Support, Training and Services


Do the same conclusions apply to partial indexes?
I mean, if I have a large number (n=100 or n=1000) of partial indexes
on a single very large table (m=10**12), how good is the planner to choose the
right indexes to plan a query?
Has also this algorithm superlinear complexity?


-- 
Vincenzo Romano at NotOrAnd Information Technologies
Software Hardware Networking Training Support Security
--
NON QVIETIS MARIBVS NAVTA PERITVS

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] On Scalability

2010-10-07 Thread Vincenzo Romano
Any feedbacks from TGL and Heikki, then?

2010/7/29 Joshua D. Drake j...@commandprompt.com:
 On Thu, 2010-07-29 at 19:52 +0200, Vincenzo Romano wrote:
 2010/7/29 Joshua D. Drake j...@commandprompt.com:
  On Thu, 2010-07-29 at 19:34 +0200, Vincenzo Romano wrote:
 
  I expect that a more complex schema will imply higher workloads
  on the query planner. What I don't know is how the increase in the
  workload will happen: linearly, sublinearly, polynomially or what?

 Do you think I should ask somewhere else?
 Any hint?

 The two people that would likely know the best are on vacation, TGL and
 Heikki. You may have to wait a bit.

 Sincerely,

 Joshua D. Drake

 --
 PostgreSQL.org Major Contributor
 Command Prompt, Inc: http://www.commandprompt.com/ - 509.416.6579
 Consulting, Training, Support, Custom Development, Engineering
 http://twitter.com/cmdpromptinc | http://identi.ca/commandprompt





-- 
Vincenzo Romano at NotOrAnd Information Technologies
Software Hardware Networking Training Support Security
--
cel +393398083886 fix +390823454163 fax +3902700506964
gtalk. vincenzo.rom...@notorand.it skype. notorand.it
--
NON QVIETIS MARIBVS NAVTA PERITVS

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] On Scalability

2010-10-07 Thread Vincenzo Romano
2010/10/7 Heikki Linnakangas heikki.linnakan...@enterprisedb.com:
 On 07.10.2010 10:41, Simon Riggs wrote:

 On Thu, 2010-10-07 at 10:28 +0300, Heikki Linnakangas wrote:

 constraint exclusion of CHECK constraints on the partitions is
 exponential

 Constraint exclusion is linear with respect to number of partitions.
 Why do you say exponential?

 For some reason I thought the planner needs to check the constraints of the
 partitions against each other, but you're right, clearly that's not the
 case. Linear it is.

 --
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com


Making these things sub-linear (whether not O(log n) or even O(1) ),
provided that there's  way to, would make this RDBMS more appealing
to enterprises.
I mean also partial indexes (as an alternative to table partitioning).
Being able to effectively cope with a dozen child tables or so it's more
like an amateur feature.
If you really need partitioning (or just hierarchical stuff) I think you'll need
for quite more than a dozen items.
If you partition by just weeks, you'll need 50+ a year.

Is there any precise direction to where look into the code for it?

Is there a way to put this into a wish list?

-- 
Vincenzo Romano at NotOrAnd Information Technologies
Software Hardware Networking Training Support Security
--
NON QVIETIS MARIBVS NAVTA PERITVS

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] On Scalability

2010-10-07 Thread Vincenzo Romano
2010/10/7 Robert Haas robertmh...@gmail.com:
 Well, you can't just arbitrarily turn a O(n) algorithm into an O(lg n)

That's trivially true. I was not asking for the recipe to do it.

 algorithm.  I think the most promising approach to scaling to large
 numbers of partitions is the patch that Itagaki Takahiro was working
 on back in July.  Unfortunately, that patch still needs a lot of work
 - and some redesign - before it will really meet our needs.  Right
 now, the way to set up partitioning is to create a parent table and
 then create a bunch of child tables that inherit from them and then
 put mutually exclusive CHECK constraints on all the children and make
 sure constraint_exclusion is on so that the planner can notice when
 not all children need to be scanned.  As a totally general
 architecture, this is probably hard to beat (or to make sublinear).

This is exactly what's described into the official documentation.
Everyone I ask information about before going deeper in test I get
the same answer: don't try to use more than a dozen child tables.

 However, if we have DDL that allows the user to say: this is a set of
 child tables that are range partitions on this key column, with these
 boundaries, then you should be able to make the constraint exclusion
 calculations much more efficient, because it won't have to infer so
 much from first principles.  O(lg n) doesn't seem out of the question
 given that architecture.

I see the main problem in the way the planner understands which partition
is useful and which one is not.
Having the DDL supporting the feature could just be syntactic sugar
if the underlying mechanism is inadequate.

 I think, though, that that is still some way off.  If you're in a
 position to help with (or fund) the coding, it can be made to happen
 faster, of course.

This is why I was asking for directions: brwosing the whole code to look for the
relevant stuff is quite time consuming.

 --
 Robert Haas
 EnterpriseDB: http://www.enterprisedb.com
 The Enterprise Postgres Company

-- 
Vincenzo Romano at NotOrAnd Information Technologies
Software Hardware Networking Training Support Security
--
NON QVIETIS MARIBVS NAVTA PERITVS

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] On Scalability

2010-10-07 Thread Vincenzo Romano
2010/10/7 Tom Lane t...@sss.pgh.pa.us:
 Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes:
 On 07.10.2010 10:41, Simon Riggs wrote:
 Constraint exclusion is linear with respect to number of partitions.
 Why do you say exponential?

 For some reason I thought the planner needs to check the constraints of
 the partitions against each other, but you're right, clearly that's not
 the case. Linear it is.

 Well, it's really more like O(mn) where m is the number of partitions
 and n is the number of clauses in the query --- and not only that, but
 the O() notation is hiding a depressingly high constant factor.  And
 then there are practical problems like failing to exclude partitions as
 soon as there are any parameters in the query.


Does the same considerations apply to partial indexes?
I mean, I can replace table partitioning with index partitioning concept.
(Well I know it's not really the same).
Would then it be the same O(nm) to let the planner choose the right indexes
given a certain query?

 There's basically no way that we're going to get decent performance for
 large numbers of partitions as long as we have to resort to
 theorem-proving to lead us to the correct partition.

                        regards, tom lane


I'm not sure about MySQL, but Oracle can handle large partitioning.
So I would say there's a way to achieve the same goal.

-- 
Vincenzo Romano at NotOrAnd Information Technologies
Software Hardware Networking Training Support Security
--
cel +393398083886 fix +390823454163 fax +3902700506964
gtalk. vincenzo.rom...@notorand.it skype. notorand.it
--
NON QVIETIS MARIBVS NAVTA PERITVS

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] On Scalability

2010-10-07 Thread Vincenzo Romano
2010/10/7 Stephen Frost sfr...@snowman.net:
 * Vincenzo Romano (vincenzo.rom...@notorand.it) wrote:
 I see the main problem in the way the planner understands which partition
 is useful and which one is not.
 Having the DDL supporting the feature could just be syntactic sugar
 if the underlying mechanism is inadequate.

 I'm pretty sure the point with the DDL would be to have a way for the
 user to communicate to the planner more understanding about the
 partitioning, not just to be syntactic sugar.  With that additional
 information, the planner can make a faster and better decision.

        Stephen

Which kind of information are you thinking about?
I think that the stuff you put into the CHECK condition for the table
will say it all.
Infact there you have not just the column names with relevant values, but the
actual expression(s) to be checked,

-- 
Vincenzo Romano at NotOrAnd Information Technologies
Software Hardware Networking Training Support Security
--
NON QVIETIS MARIBVS NAVTA PERITVS

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] On Scalability

2010-10-07 Thread Vincenzo Romano
2010/10/7 Greg Smith g...@2ndquadrant.com:
 Vincenzo Romano wrote:

 I see the main problem in the way the planner understands which
 partition
 is useful and which one is not.
 Having the DDL supporting the feature could just be syntactic sugar
 if the underlying mechanism is inadequate.


 You have the order of this backwards.  In order to do better than the way
 the current scheme is implemented, the optimizer needs higher quality
 metadata about the structure of the partitions to work with.  Right now,
 it's inferring them from the CHECK constraints, which requires the whole
 theorem-proving bit Tom mentioned.  That's never going to get any more
 algorithmically efficient than it already is.
 If the DDL that created the partitions also made better quality metadata
 available about the structure of the partitions, at that point it would be
 possible to also do better in how the optimizer pruned partitions to
 consider too.  If the list it has was known to be in a particular
 structured/sorted order, the optimizer could do a binary search to find
 relevant partitions, rather than the linear scan required right now.


Do you mean the check constraint is used as plain text to be (somehow) executed?
If this is the case, then you (all) are perfectly and obviously right
and I'm just fishing
for bicycles in the sea.

I would expect a parser to ... ehm ... parse the CHECK constraint
expression at CREATE TABLE  time and
extract all the needed high quality metadata, like the list of
columns involved and the type of
checks (range, value list, etc.).
The same would be useful for partial indexes, as well.

But maybe this is just wishful thinking.

 Until that work is done, any other improvement attempts are doomed to fail.
  That's the point Robert was trying to make to you.  And the fact Oracle
 does this is why it's able to scale to high partition counts better than
 PostgreSQL can.

 You can read more about the work that was being done here at
 http://wiki.postgresql.org/wiki/Table_partitioning

Done. As well as the official documentation.
The point is that there are no hints on the topic.
There should be a caveat in the documentation saying that partitioning
is not scalable. As well as partial indexing.

Thanks so far for the information.

 --
 Greg Smith, 2ndQuadrant US g...@2ndquadrant.com Baltimore, MD
 PostgreSQL Training, Services and Support  www.2ndQuadrant.us

-- 
Vincenzo Romano at NotOrAnd Information Technologies
Software Hardware Networking Training Support Security
--
NON QVIETIS MARIBVS NAVTA PERITVS

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] On Scalability

2010-10-07 Thread Vincenzo Romano
2010/10/7 Alvaro Herrera alvhe...@commandprompt.com:
 Excerpts from Vincenzo Romano's message of jue oct 07 10:44:34 -0400 2010:

 Do you mean the check constraint is used as plain text to be (somehow) 
 executed?
 If this is the case, then you (all) are perfectly and obviously right
 and I'm just fishing
 for bicycles in the sea.

 Yeah, hence this thread hasn't advanced things very much in any useful
 direction.  That we need to improve the partitioning implementation is
 already known.

Maybe I'm willing to help and possibly able to.
But I need to understand things that are already known but I didn't know yet.

 --
 Álvaro Herrera alvhe...@commandprompt.com
 The PostgreSQL Company - Command Prompt, Inc.
 PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Vincenzo Romano at NotOrAnd Information Technologies
Software Hardware Networking Training Support Security
--
NON QVIETIS MARIBVS NAVTA PERITVS

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] On Scalability

2010-10-07 Thread Vincenzo Romano
2010/10/7 Stephen Frost sfr...@snowman.net:
 * Vincenzo Romano (vincenzo.rom...@notorand.it) wrote:
 Which kind of information are you thinking about?
 I think that the stuff you put into the CHECK condition for the table
 will say it all.

 The problem is that CHECK conditions can contain just about anything,
 hence the planner needs to deal with that possibility.

Not really. For partitioning there would be some constraints as you
have in the DEFAULT values.

 Infact there you have not just the column names with relevant values, but the
 actual expression(s) to be checked,

 Yes, that would be the problem.  Proving something based on expressions
 is alot more time consuming and complicated than being explicitly told
 what goes where.

Consuming computing resources at DDL-time should be OK if that will
lead to big savings at DML-time (run-time), my opinion. It'd be just like
compile time optimizations.

        Stephen

 -BEGIN PGP SIGNATURE-
 Version: GnuPG v1.4.9 (GNU/Linux)

 iEYEARECAAYFAkyt3qMACgkQrzgMPqB3kiih3wCcCwLlvpDCjgG5LSgim/XGieEE
 MsEAn0mHfAizDOpvepGXWTWlxHtJibA5
 =Szx4
 -END PGP SIGNATURE-


-- 
Vincenzo Romano at NotOrAnd Information Technologies
Software Hardware Networking Training Support Security
--
NON QVIETIS MARIBVS NAVTA PERITVS

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] On Scalability

2010-10-07 Thread Vincenzo Romano
2010/10/7 Stephen Frost sfr...@snowman.net:
 * Vincenzo Romano (vincenzo.rom...@notorand.it) wrote:
 2010/10/7 Stephen Frost sfr...@snowman.net:
  * Vincenzo Romano (vincenzo.rom...@notorand
.it) wrote:
  The problem is that CHECK conditions can contain just about anything,
  hence the planner needs to deal with that possibility.

 Not really. For partitioning there would be some constraints as you
 have in the DEFAULT values.

 How do we know when it's partitioning and not a CHECK constraint being
 used for something else..?

Why asking? You don't need to tell them apart.
Just parse the expression, extract the metadata to be used when the expression
need to be evaluated. Being it a plain CHECK constraint or something
for the partition
management would then be irrelevant.

 I'll tell you- through the user using
 specific partitioning DDL statements.

That could be the next step, once  the underlying stuff is already in place.

 Consuming computing resources at DDL-time should be OK if that will
 lead to big savings at DML-time (run-time), my opinion. It'd be just like
 compile time optimizations.

 CHECK constraints, inheiritance, etc, are general things which can be
 used for more than just partitioning.  Abusing them to go through tons
 of extra gyrations to make the specific partitioning case faster at DML
 time (if that's really even possible...  I'm not convinced you could
 make it bullet-proof) isn't a good approach.

At the moment I'm not interested in particular cases.
I think that CHECK constraints (as well as partial indexes expressions) should
be handled in a more effective way. Better partitioning (both for
tables and indexes) would
be a side effect.

Thanks for the insights.

        Thanks,

                Stephen

 -BEGIN PGP SIGNATURE-
 Version: GnuPG v1.4.9 (GNU/Linux)

 iEYEARECAAYFAkyt5CAACgkQrzgMPqB3kijAUACfd9QcB00Nic6mSwWmwoXABc4p
 kBoAnAijF39ZTFOGjpk1CN/8/I3Tj9HI
 =C8G/
 -END PGP SIGNATURE-


-- 
Vincenzo Romano at NotOrAnd Information Technologies
Software Hardware Networking Training Support Security
--
NON QVIETIS MARIBVS NAVTA PERITVS

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] On Scalability

2010-10-07 Thread Vincenzo Romano
2010/10/7 Stephen Frost sfr...@snowman.net:
 * Vincenzo Romano (vincenzo.rom...@notorand.it) wrote:
 I would expect a parser to ... ehm ... parse the CHECK constraint
 expression at CREATE TABLE  time and
 extract all the needed high quality metadata, like the list of
 columns involved and the type of
 checks (range, value list, etc.).

 Check constraints can be added after the table is created.  Inheiritance
 can be added/changed independently of check constraints.  Hacking all of
 the inheiritance, check constraint creation, and any other possibly
 involved code paths to try to figure out if this particular table, check
 constraint, inheiritance relationship, etc, is part of a partitioning
 setup isn't exactly trivial, or the right approach.

        Thanks,

                Stephen

I think none will say things are trivial.
So, what'd be the right approach in your vision?
I mean, if you think about partitioning a-la Oracle, then you'll have to
parse those expressions anyway.

-- 
Vincenzo Romano at NotOrAnd Information Technologies
Software Hardware Networking Training Support Security
--
cel +393398083886 fix +390823454163 fax +3902700506964
gtalk. vincenzo.rom...@notorand.it skype. notorand.it
--
NON QVIETIS MARIBVS NAVTA PERITVS

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] On Scalability

2010-10-07 Thread Vincenzo Romano
2010/10/7 Stephen Frost sfr...@snowman.net:
 * Vincenzo Romano (vincenzo.rom...@notorand.it) wrote:
 So, what'd be the right approach in your vision?

 Have you read http://wiki.postgresql.org/wiki/Table_partitioning and the
 various places it links to..?

 I mean, if you think about partitioning a-la Oracle, then you'll have to
 parse those expressions anyway.

 Oracle's approach is discussed there.

I didn't meant the implementation, but the goals achieved.


        Thanks,

                Stephen

 -BEGIN PGP SIGNATURE-
 Version: GnuPG v1.4.9 (GNU/Linux)

 iEYEARECAAYFAkyt6ikACgkQrzgMPqB3kih0HwCcD8rQQhD6oXao8ZnG/bMAvx2d
 4HkAnjrzox4XemzVyFkhKRXb3ZjS2nba
 =6WlP
 -END PGP SIGNATURE-





-- 
Vincenzo Romano at NotOrAnd Information Technologies
Software Hardware Networking Training Support Security
--
NON QVIETIS MARIBVS NAVTA PERITVS

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] On Scalability

2010-07-30 Thread Vincenzo Romano
2010/7/29 Josh Berkus j...@agliodbs.com:

 Or maybe checking against the source code and its documentation, if any.

 No, not really.  What you really want to know is: what's the real
 planner overhead of having dozens/hundreds of partial indexes?  What's
 the write overhead?  There's no way you can derive that from the source
 code faster than you can test it.

Again, as the test would be rather killing for my group at this stage.

I think that knowing whether certain parts have been implemented
with linear or sub-linear (or whatever else) algorithms would
give good insights about scalability.

At a first glance it seems that for inheritance some bottleneck is
hindering a full exploit for table partitioning.

Is there anyone who knows whether those algorithms are linear or not?

And of course, I agree that real tests on real data will provide the real thing.

-- 
Vincenzo Romano at NotOrAnd Information Technologies
Software Hardware Networking Training Support Security
--
NON QVIETIS MARIBVS NAVTA PERITVS

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] On Scalability

2010-07-30 Thread Vincenzo Romano
2010/7/30 Greg Stark gsst...@mit.edu:
 On Fri, Jul 30, 2010 at 11:24 AM, Vincenzo Romano
 vincenzo.rom...@notorand.it wrote:
 At a first glance it seems that for inheritance some bottleneck is
 hindering a full exploit for table partitioning.

 There have been lengthy discussions of how to implement partitioning
 to fix these precise problems, yes.

Any reference?

 Is there anyone who knows whether those algorithms are linear or not?

 They're linear in both cases. But they happen at plan time rather than
 query execution time. So if your application prepares all its queries
 and then uses them many times it would not slow down query execution
 but would slow down the query planning time. In some applications this
 is much better but in others unpredictable run-times is as bad as long
 run-times.

Hmmm ... maybe I'm missing the inner meaning of your remarks, Greg.
By using PREPARE I run the query planned sooner and I should use
the plan with the later execution.
You can bet that some of the PREPAREd query variables will
pertain to either the child table's CHECK contraints (for table partitions)
or to the partial index's WHERE condition (for index partitioning).

It's exactly this point (execution time) where the linearity will
kill the query
over a largely partitioned table.

Is this what you meant?  :-)

 Also in the case of having many partial indexes it would slow down
 inserts and updates as well, though to a lesser degree, and that would
 happen at execution time.

This makes fully sense to me.


-- 
Vincenzo Romano at NotOrAnd Information Technologies
Software Hardware Networking Training Support Security
--
NON QVIETIS MARIBVS NAVTA PERITVS

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] On Scalability

2010-07-30 Thread Vincenzo Romano
2010/7/30 Josh Berkus j...@agliodbs.com:

 Is there anyone who knows whether those algorithms are linear or not?

 Read the code?  It's really very accessible, and there's lots and lots
 of comments.  While the person who wrote the code is around, isn't it
 better to see the real implementation?

If the programmer(s) who wrote that part is around, a simple hint would suffice.
Even an hint to where look into the code would be very appreciated: the query
planner is not as simple as the ls command (which is not that simple any
more, though).

It looks like I need to go the hard way ...
Starting from postgresql-8.4.4/src/backend/optimizer

-- 
Vincenzo Romano at NotOrAnd Information Technologies
Software Hardware Networking Training Support Security
--
cel +393398083886 fix +390823454163 fax +3902700506964
gtalk. vincenzo.rom...@notorand.it skype. notorand.it
--
NON QVIETIS MARIBVS NAVTA PERITVS

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] On Scalability

2010-07-29 Thread Vincenzo Romano
Hi all.
I'm wondering about PGSQL scalability.
In particular I have two main topics in my mind:

1. What'd be the behavior of the query planner in the case I have
a single huge table with hundreds or thousands of partial indexes
(just differing by the WHERE clause).
This is an idea of mine to make index-partitioning instead of
table-partitioning.

2. What'd be the behavior of the query planner in the case I have
hundreds or thousands of child tables, possibly in a multilevel hierarchy
(let's say, partitioning by year, month and company).

I fear the presence of linear selection algorithms in these two cases that
would kill my design.

Is there any insight about these two points?

-- 
NotOrAnd Information Technologies
Vincenzo Romano
--
NON QVIETIS MARIBVS NAVTA PERITVS

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] On Scalability

2010-07-29 Thread Vincenzo Romano
2010/7/29 Joshua D. Drake j...@commandprompt.com:
 On Thu, 2010-07-29 at 19:08 +0200, Vincenzo Romano wrote:
 Hi all.
 I'm wondering about PGSQL scalability.
 In particular I have two main topics in my mind:

 1. What'd be the behavior of the query planner in the case I have
 a single huge table with hundreds or thousands of partial indexes
 (just differing by the WHERE clause).
 This is an idea of mine to make index-partitioning instead of
 table-partitioning.

 Well the planner is not going to care about the partial indexes that
 don't match the where clause but what you are suggesting is going to
 make writes and maintenance extremely expensive. It will also increase
 planning time as the optimizer at a minimum has to discard the use of
 those indexes.


 2. What'd be the behavior of the query planner in the case I have
 hundreds or thousands of child tables, possibly in a multilevel hierarchy
 (let's say, partitioning by year, month and company).

 Again, test it. Generally speaking the number of child tables directly
 correlates to planning time. Most experience that 60-100 tables is
 really the highest you can go.

 It all depends on actual implementation and business requirements
 however.

 Sincerely,

 Joshua D. Drake

I expect that a more complex schema will imply higher workloads
on the query planner. What I don't know is how the increase in the
workload will happen: linearly, sublinearly, polinomially or what?

Significant testing would require a prototype implementation with
an almost complete feed of data from the current solution.
But I'm at the feasibility study stage and have not enough resources
for that.

Thanks anyway for the insights, Joshua.
Does the 60-100 tables limit applies to a single level
of inheritance? Or is it more general?

-- 
NotOrAnd Information Technologies
Vincenzo Romano
--
NON QVIETIS MARIBVS NAVTA PERITVS

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] On Scalability

2010-07-29 Thread Vincenzo Romano
2010/7/29 Joshua D. Drake j...@commandprompt.com:
 On Thu, 2010-07-29 at 19:34 +0200, Vincenzo Romano wrote:

 I expect that a more complex schema will imply higher workloads
 on the query planner. What I don't know is how the increase in the
 workload will happen: linearly, sublinearly, polynomially or what?

Do you think I should ask somewhere else?
Any hint?

 Thanks anyway for the insights, Joshua.
 Does the 60-100 tables limit applies to a single level
 of inheritance? Or is it more general?

 I do not currently have experience (except that it is possible) with
 multi-level inheritance and postgresql.

Thanks anyway.

-- 
Vincenzo Romano at NotOrAnd Information Technologies
Software Hardware Networking Training Support Security
--
NON QVIETIS MARIBVS NAVTA PERITVS

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] On Scalability

2010-07-29 Thread Vincenzo Romano
2010/7/29 Josh Berkus j...@agliodbs.com:

 Do you think I should ask somewhere else?
 Any hint?

 I might suggest asking on the pgsql-performance mailing list instead.
 You'll get *lots* more speculation there.  However, the only way you're
 really going to know is to test.

Or maybe checking against the source code and its documentation, if any.

-- 
Vincenzo Romano at NotOrAnd Information Technologies
Software Hardware Networking Training Support Security
--
NON QVIETIS MARIBVS NAVTA PERITVS

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PL/PgSQL] EXECUTE...USING enhancement proposal

2010-01-15 Thread Vincenzo Romano
2010/1/14 Pavel Stehule pavel.steh...@gmail.com:
 Hello

 I thing, so this is bad idea.

 a) this behave depends on DDL implementation, not plpgsql implementation

 b) proposed implementation needs some escape magic. This was first
 implementation of USING clause and it was rejected. Some composite and
 nested values are significant break.

 see in archive 
 http://archives.postgresql.org/pgsql-patches/2006-08/msg00267.php

 Regards
 Pavel Stehule


 2010/1/14 Vincenzo Romano vincenzo.rom...@notorand.it:
 Hi all.
 There's currently a limitation in the v8.4.2 implementation of the
 EXECUTE...USING predicate in PL/PgSQL which prevents you from
 exploiting the USING-supplied value list with DDL commands.
 For example:

 CREATE TABLE test ( i int );
 ...
 EXECUTE 'ALTER TABLE test ALTER COLUMN i SET DEFAULT $1' USING 42;

 complains with:

 ERROR:  there is no parameter $1
 CONTEXT:  SQL statement ALTER TABLE test ALTER COLUMN i SET DEFAULT $1

 while:

 EXECUTE 'SELECT $1' USING 42;

 works.
 In both cases the $1 variable/placeholder refers to a constant value.
 And actually, even if the thing defined after the USING lexeme was a
 variable, that should be evaluated and substituted *before* executing
 the command.

 The current documentation
 (http://www.postgresql.org/docs/8.4/static/plpgsql-statements.html#PLPGSQL-STATEMENTS-EXECUTING-DYN)
 doesn't say so and clearly describes how this feature is meant to
 work.
 Quoting:
 
 The command string can use parameter values, which are referenced in
 the command as $1, $2,
 etc. These symbols refer to values supplied in the USING clause. This
 method is often preferable to
 inserting data values into the command string as text: it avoids
 run-time overhead of converting the
 values to text and back, and it is much less prone to SQL-injection
 attacks since there is no need for
 quoting or escaping. An example is:
 
 (38.5.4. Executing Dynamic Commands)

 It talks about values, that is typed constants.
 Please, refer also to the following discussion on pgsql-general mailing list:
 http://archives.postgresql.org/pgsql-general/2010-01/msg00522.php

 My proposal is to relax that (clearly running but undocumented)
 constraint and allow any SQL command in the EXECUTE...USING predicate.
 I would leave the responsibility to the programmer to ensure whether
 the dynamic command makes any syntactic and semantic sense.

 --
 Vincenzo Romano
 NotOrAnd Information Technologies
 NON QVIETIS MARIBVS NAVTA PERITVS

Well, the very basic proposal could then be changed into:

Allow some kind of syntactic replacement of the placeholders found
into the command string with values taken from the USING clause
evaluated straight into the PLPGSQL function body environment. The
model could be the C language sprintf().

Maybe you can think about using different placeholders for static
(or local) evaluation, like #1, #2 ... #n.

For example, you could do something like this:

EXECUTE 'ALTER TABLE '||tablename||' ADD CHECK col1=#1 and col2=#2
and col2#3 USING currval1, currval2, currval3.

The execution, within the PLPGSQL interpreter, would proceed like this:

0. Concatenate the sub-strings to just 1.
1. Evaluate the variable list after the USING clause (currval1,
currval2, currval3) to their current values.
2. Replace the placeholders with the natural ordered references within
the command string
3. Send the final string to the execution.

This makes a lot of sense (in my opinion) for higher level functions
(functions which create functions which execute dynamic commands).
It's more like a string substitution but with knowledge of the syntac
of the expressions following the USING clause.

-- 
Vincenzo Romano
NotOrAnd Information Technologies
NON QVIETIS MARIBVS NAVTA PERITVS

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PL/PgSQL] EXECUTE...USING enhancement proposal

2010-01-15 Thread Vincenzo Romano
2010/1/15 Pavel Stehule pavel.steh...@gmail.com:
 I disagree with this functionality for USING clause. Main parser
 doesn't support some enhanced syntax. But we can discus about some
 function 'printf' or 'format' that can help with similar task.

 some like

 EXECUTE format('ALTER TABLE % ADD CHECK (col1=% AND col2=%)',
                              quote_ident(tablename), cval1, cval2, cval3);


 there was two proposals:
 a) based on syntax our RAISE statements
 b) based on printf syntax

 Personally prefer a) - it is simpler and enough

 Pavel



I do like the printf-like approach more than my proposal!
Do you think about a built-in implementation rather than the on in PLGSQL?

-- 
Vincenzo Romano
NotOrAnd Information Technologies
NON QVIETIS MARIBVS NAVTA PERITVS

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] [PL/PgSQL] EXECUTE...USING enhancement proposal

2010-01-14 Thread Vincenzo Romano
Hi all.
There's currently a limitation in the v8.4.2 implementation of the
EXECUTE...USING predicate in PL/PgSQL which prevents you from
exploiting the USING-supplied value list with DDL commands.
For example:

CREATE TABLE test ( i int );
...
EXECUTE 'ALTER TABLE test ALTER COLUMN i SET DEFAULT $1' USING 42;

complains with:

ERROR:  there is no parameter $1
CONTEXT:  SQL statement ALTER TABLE test ALTER COLUMN i SET DEFAULT $1

while:

EXECUTE 'SELECT $1' USING 42;

works.
In both cases the $1 variable/placeholder refers to a constant value.
And actually, even if the thing defined after the USING lexeme was a
variable, that should be evaluated and substituted *before* executing
the command.

The current documentation
(http://www.postgresql.org/docs/8.4/static/plpgsql-statements.html#PLPGSQL-STATEMENTS-EXECUTING-DYN)
doesn't say so and clearly describes how this feature is meant to
work.
Quoting:

The command string can use parameter values, which are referenced in
the command as $1, $2,
etc. These symbols refer to values supplied in the USING clause. This
method is often preferable to
inserting data values into the command string as text: it avoids
run-time overhead of converting the
values to text and back, and it is much less prone to SQL-injection
attacks since there is no need for
quoting or escaping. An example is:

(38.5.4. Executing Dynamic Commands)

It talks about values, that is typed constants.
Please, refer also to the following discussion on pgsql-general mailing list:
http://archives.postgresql.org/pgsql-general/2010-01/msg00522.php

My proposal is to relax that (clearly running but undocumented)
constraint and allow any SQL command in the EXECUTE...USING predicate.
I would leave the responsibility to the programmer to ensure whether
the dynamic command makes any syntactic and semantic sense.

-- 
Vincenzo Romano
NotOrAnd Information Technologies
NON QVIETIS MARIBVS NAVTA PERITVS

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers