[PERFORM] max partitions behind a view?

2017-09-18 Thread Rick Otten
I use materialized views to cache results from a foreign data wrapper to a
high-latency, fairly large (cloud) Hadoop instance.  In order to boost
refresh times, I split the FDW and materialized views up into partitions.

Note:  I can't use pg_partman or native partitioning because those don't
really work with this architecture - they are designed for "real" tables.
I can't really use citus because it isn't FDW/matview aware at this time
either.

I then combine the various materialized views with a regular view made up
of a bunch of 'union all' statements.
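For illustration, a minimal sketch of the pattern, with hypothetical object
names:

    CREATE MATERIALIZED VIEW events_p01 AS
        SELECT * FROM fdw_events WHERE bucket = 1;
    CREATE MATERIALIZED VIEW events_p02 AS
        SELECT * FROM fdw_events WHERE bucket = 2;

    -- replacing the view works because every arm has the same columns
    CREATE OR REPLACE VIEW events AS
        SELECT * FROM events_p01
        UNION ALL
        SELECT * FROM events_p02;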

I have a set of functions which automatically create the new partitions and
then replace the top level view to add them in on the fly.  At this time I
probably have about 60 partitions.

With that approach I can refresh individual chunks of data, or I can
refresh several chunks in parallel.  Generally this has been working pretty
well.  One side effect is that because these are not real partitions, the
planner has to check each partition whenever I run a query to see if it
has the data I need.  With appropriate indexes this is OK; checking the
partitions that don't have the data is very quick.  It does make for some
long explain outputs, though.
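Each chunk can then be refreshed on its own, for example (hypothetical
name; CONCURRENTLY needs a unique index on the matview):

    REFRESH MATERIALIZED VIEW CONCURRENTLY events_p01;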

The challenge is that because of an exponential rate of data growth, I
might have to significantly increase the number of partitions I'm working
with - to several hundred at a minimum and potentially more than 1000...

This leads me to the question: how many 'union all' statements can I have in
one view?  Should I create a hierarchy of views to gradually roll the data
up instead of putting them all in one top-level view?


Re: [PERFORM] max partitions behind a view?

2017-09-18 Thread Tom Lane
Rick Otten  writes:
> The challenge is that because of an exponential rate of data growth, I
> might have to significantly increase the number of partitions I'm working
> with - to several hundred at a minimum and potentially more than 1000...

> This leads me to the question: how many 'union all' statements can I have
> in one view?

I don't think there's a hard limit short of INT32_MAX or so, but I'd be
worried about whether there are any O(N^2) algorithms that would start
to be noticeable at the O(1000) level.

> Should I create a hierarchy of views to gradually roll the data
> up instead of putting them all in one top-level view?

That would likely make things worse, not better; the planner would flatten
them anyway and would expend extra cycles doing so.  You could perhaps
stop the flattening with optimization fences (OFFSET 0), but I really doubt
you want the side-effects of that.
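For reference, such a fence would look something like this in each arm of
the view (hypothetical names; OFFSET 0 in a subquery keeps the planner
from flattening it into the outer query):

    CREATE OR REPLACE VIEW events AS
        SELECT * FROM (SELECT * FROM events_p01 OFFSET 0) AS p01
        UNION ALL
        SELECT * FROM (SELECT * FROM events_p02 OFFSET 0) AS p02;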

regards, tom lane




Re: [PERFORM] Pageinspect bt_metap help

2017-09-18 Thread Neto pr
Very interesting information.
See if I'm right: for performance purposes, would it be better to consider
the fast_root and fast_level columns instead of the root and level columns?

I have read that even when records are deleted the B-tree is not rebuilt,
so it does not cause overhead in the DBMS, and it can have null pointers.

In my example, the values of fast_root and fast_level are equal to root and
level; I believe that is because the index was newly created and no delete
operations have occurred on the table.

Best Regards
Neto

2017-09-17 18:59 GMT-03:00 Peter Geoghegan :

> On Sun, Sep 17, 2017 at 2:52 PM, Neto pr  wrote:
> > I am using the PostgreSQL extension pageinspect.
> >
> > Could someone tell me the meaning of these columns of the bt_metap
> > function: magic, version, root, level, fastroot, fastlevel?
> >
> > This information is not present in the documentation.
>
> A magic number distinguishes the meta-page as a B-Tree meta-page. A
> version number is used for each major incompatible revision of the
> B-Tree code (these are very infrequent).
>
> The fast root can differ from the true root following a deletion
> pattern that leaves a "skinny index". The implementation can never
> remove a level, essentially because it's optimized for concurrency,
> though it can have a fast root, to just skip levels. This happens to
> levels that no longer contain any distinguishing information in their
> single internal page.
>
> I imagine that in practice the large majority of B-Trees never have a
> true root that differs from their fast root - you only see a difference
> with repeated large range deletions. Probably nothing to worry about.
>
> > Is the height of the b-tree (the distance from the root to the farthest
> > leaf) the level column?
>
> Yes.
>
> If you want to learn more about the B-Tree code, I suggest that you
> start by looking at the code for contrib/amcheck.
>
> --
> Peter Geoghegan
>
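For reference, these fields can be inspected like so (assuming the
pageinspect extension is available; the index name is hypothetical):

    CREATE EXTENSION IF NOT EXISTS pageinspect;
    SELECT magic, version, root, level, fastroot, fastlevel
      FROM bt_metap('my_index');

On a freshly built index, fastroot/fastlevel will normally equal
root/level, as observed above.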


[PERFORM] https://stackoverflow.com/questions/28844170/how-to-limit-the-memory-that-is-available-for-postgressql-server

2017-09-18 Thread 園田祥平
Hi experts,

For an academic experiment I need to *restrict the total amount of memory
that is available for a pgSQL server* to compute a given set of queries.

I know that I can do this through the postgresql.conf file, where I can
adjust some parameters related to Resource Management.

The problem is that it's not clear to me--given the several parameters
available in the config file--which parameter I should change.

When I first opened the config file I was expecting something like
max_server_memory. Instead I found a lot of parameters: shared_buffers,
temp_buffers, work_mem, and so on...

Given that, I consulted the pgSQL docs on Resource Consumption, and I came
up with shared_buffers as the best candidate for what I'm looking for: *the
parameter that restricts the total amount of memory that a pgSQL server can
use to perform its computation*. But I'm not completely sure about this.

Can you guys give me some insight about which parameters I should adjust to
restrict the pgSQL server's memory, please?
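For context, the parameters mentioned look like this in postgresql.conf
(the values shown here are the stock defaults):

    shared_buffers = 128MB   # shared page cache for the whole cluster
    temp_buffers = 8MB       # per session, for temporary tables
    work_mem = 4MB           # per sort/hash operation; several may run per query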


Re: [PERFORM] Pageinspect bt_metap help

2017-09-18 Thread Peter Geoghegan
On Mon, Sep 18, 2017 at 7:31 AM, Neto pr  wrote:
> In my example, the values of fast_root and fast_level are equal to root and
> level; I believe that is because the index was newly created and no delete
> operations have occurred on the table.

Fast root and true root will probably never be different, even when
there are many deletions, including page deletions by VACUUM. As I
understand it, the fast root thing is for a fairly rare, though still
important edge case. It's a way of working around the fact that a
B-Tree can never become shorter due to the locking protocols not
allowing it. We can instead just pretend that it's shorter, knowing
that upper levels don't contain useful information.


-- 
Peter Geoghegan




Re: [PERFORM] https://stackoverflow.com/questions/28844170/how-to-limit-the-memory-that-is-available-for-postgressql-server

2017-09-18 Thread George Neuner

On Tue, 19 Sep 2017 00:49:14 +,   wrote:


> For an academic experiment I need to *restrict the total amount of memory
> that is available for a pgSQL server* to compute a given set of queries.
>
> I know that I can do this through the postgresql.conf file, where I can
> adjust some parameters related to Resource Management.
>
> The problem is that it's not clear to me--given the several parameters
> available in the config file--which parameter I should change.
>
> When I first opened the config file I was expecting something like
> max_server_memory. Instead I found a lot of parameters: shared_buffers,
> temp_buffers, work_mem, and so on...
>
> Given that, I consulted the pgSQL docs on Resource Consumption


Re: [PERFORM] Pageinspect bt_metap help

2017-09-18 Thread Tom Lane
Peter Geoghegan  writes:
> On Mon, Sep 18, 2017 at 7:31 AM, Neto pr  wrote:
>> In my example, the values of fast_root and fast_level are equal to root and
>> level; I believe that is because the index was newly created and no delete
>> operations have occurred on the table.

> Fast root and true root will probably never be different, even when
> there are many deletions, including page deletions by VACUUM. As I
> understand it, the fast root thing is for a fairly rare, though still
> important edge case. It's a way of working around the fact that a
> B-Tree can never become shorter due to the locking protocols not
> allowing it. We can instead just pretend that it's shorter, knowing
> that upper levels don't contain useful information.

My (vague) recollection is that it's actually useful in cases where the
live key-space constantly migrates to the right, so that the original
upper-level key splits would become impossibly unbalanced.  This isn't
all that unusual a situation; consider timestamp keys for instance,
in a table where old data gets flushed regularly.

regards, tom lane




Re: [PERFORM] https://stackoverflow.com/questions/28844170/how-to-limit-the-memory-that-is-available-for-postgressql-server

2017-09-18 Thread Jonathan Rogers
On 09/18/2017 10:44 PM, George Neuner wrote:
> On Tue, 19 Sep 2017 00:49:14 +,   wrote:
> 
>> For an academic experiment I need to *restrict the total amount of memory
>> that is available for a pgSQL server* to compute a given set of queries.
>>
>> I know that I can do this through the postgresql.conf file, where I can
>> adjust some parameters related to Resource Management.
>>
>> The problem is that it's not clear to me--given the several parameters
>> available in the config file--which parameter I should change.
>>
>> When I first opened the config file I was expecting something like
>> max_server_memory. Instead I found a lot of parameters: shared_buffers,
>> temp_buffers, work_mem, and so on...
>>
>> Given that, I consulted the pgSQL docs on Resource Consumption, and I
>> came up with shared_buffers as the best candidate for what I'm looking
>> for: *the parameter that restricts the total amount of memory that a
>> pgSQL server can use to perform its computation*. But I'm not completely
>> sure about this.
>>
>> Can you guys give me some insight about which parameters I should adjust
>> to restrict the pgSQL server's memory, please?
> 
> What you are asking for - a way to configure Postgresql to a hard memory
> limit - effectively is impossible.  Shared memory isn't really a hard
> limit on anything - it's a cache for table and index pages, not a cap on
> total usage.  You can limit how much of it is available, but there isn't
> any way to limit how much a particular query [worker process] can take.
> Then, local [to the worker process] work buffers are allocated as needed
> to perform the joins, sorts, groupings, etc. as specified by the query.
> For any given query, you may be able to explain/analyze your way to a
> reasonable estimate of the maximum allocation, but there isn't any way
> via configuration to actually limit the worker process to that maximum.
> 
> The only way I can think of to impose such limits would be to sandbox
> the processes with ulimit.  If you set appropriate limits before
> starting the postmaster process, those limits will apply to every worker
> process it spawns afterwards.  The thing to remember is that limits on
> processes apply individually - e.g., if you say "ulimit -d 512000" and
> then start Postgresql, each individual worker process will be able to
> use up to 500MB.  And when you limit the data size or the address space,
> you need to consider and include the shared memory.
> See https://ss64.com/bash/ulimit.html
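>
> For illustration, a hypothetical session (ulimit -d takes kilobytes, and
> the data directory path is made up):
>
>     ulimit -d 512000              # ~500MB data segment per process
>     pg_ctl -D /path/to/data start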
> 
> If you want to place a global limit on the entire Postgresql "server"
> [i.e. the collection of worker processes], you can limit the user that
> owns the processes (in /etc/security/limits.conf) - which usually is
> "postgres" when Postgresql is run as a service.


The easiest way to impose a limit on the entire Postgres cluster is to
run it in a container using Docker. For example, you could use the image
from hub.docker.com and run it with the "--memory" argument.

https://hub.docker.com/_/postgres/
https://docs.docker.com/engine/reference/commandline/run/
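
For illustration, a hypothetical invocation capping the whole cluster at
512MB (the container name and password are made up):

    docker run --name pg-limited --memory=512m \
        -e POSTGRES_PASSWORD=secret -d postgres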

-- 
Jonathan Rogers
Socialserve.com by Emphasys Software
jrog...@emphasys-software.com


