Re: Proposal: Partitioning Advisor for PostgreSQL

2018-06-12 Thread Julien Rouhaud
Hi,

On Tue, Jun 12, 2018 at 11:14 AM, Dilip Kumar  wrote:
> On Thu, May 24, 2018 at 4:16 PM, Yuzuko Hosoya 
> wrote:
>>
>> This is why we are working on the partitioning advisor.  We plan to
>> release the first version of the partitioning advisor for PostgreSQL 11,
>> and then improve it for PostgreSQL 12.
>>
>
> Interesting.

Thanks!

>> - Estimating stats
>> It is complicated because a hypothetical partition has no data.  Currently,
>> we compute a hypothetical partition's size using clauselist_selectivity()
>> according to its partition bound and the original table's stats.  As a
>> result, the estimate has low accuracy, especially if there is a WHERE
>> clause.  We will improve this during development, but for now we don't
>> have good ideas.
>
>
> I haven't yet read the patch, but I'm curious to know.  Suppose we have a
> table which is already loaded with some data.  Now, if I create hypothetical
> partitions on it, will we create any stats data (MCV, histogram) for the
> hypothetical table?  Because in this case we already have the data from the
> main table, and we also have the partition boundaries for the hypothetical
> table.  I am not sure whether you are already doing this or it's an open
> item.


For now we're simply using the original table statistics, and
appending the partition bounds as quals on the hypothetical partition.
This gives good results if the query doesn't have quals on the table,
or for simple cases where the selectivity functions understand that
expressions such as

(id BETWEEN 1 AND 100) AND (id < 6)

will return only 5 rows, while they can't for expressions like

(id IN (x,y...)) AND (id < z)

In this second case, the estimates are therefore quite wrong for now.
I think we'd have no choice other than to generate hypothetical
statistics according to the partition bounds, and compute the
selectivity based only on the query quals.  It's definitely not simple
to do, but it should be doable with the hooks currently available.
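
To illustrate the difference outside of HypoPG (the table and the data
below are just an example, nothing advisor-specific), the stock
selectivity machinery behaves roughly like this:

-- a plain table with ids 1..100, analyzed so the planner has stats
CREATE TABLE est_demo (id int);
INSERT INTO est_demo SELECT g FROM generate_series(1, 100) g;
ANALYZE est_demo;

-- range quals on the same column are combined, so the estimate should
-- land close to the real 5 rows
EXPLAIN SELECT * FROM est_demo WHERE (id BETWEEN 1 AND 100) AND (id < 6);

-- an IN list and a range qual are treated as independent and their
-- selectivities multiplied, so even though the real answer is again
-- 5 rows, the estimate typically comes out much lower
EXPLAIN SELECT * FROM est_demo WHERE (id IN (1, 2, 3, 4, 5)) AND (id < 6);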



Re: Proposal: Partitioning Advisor for PostgreSQL

2018-06-12 Thread Dilip Kumar
On Thu, May 24, 2018 at 4:16 PM, Yuzuko Hosoya 
wrote:

> Hello,
>
> I'm Yuzuko Hosoya. This is my first PostgreSQL project participation.
>
> I have been developing a partitioning advisor prototype with Julien Rouhaud.
> It will be a new feature of HypoPG[1], which is a PostgreSQL extension, and
> will help with partitioning design tuning.  Currently, HypoPG only supports
> index design tuning; it allows users to define hypothetical indexes for real
> tables and shows the resulting queries' plan/cost with EXPLAIN as if they
> were actually constructed.  Since declarative partitioning will be greatly
> improved in PostgreSQL 11 and further versions, there is an emerging need to
> support partitioning design tuning.  This is why we are working on the
> partitioning advisor.  We plan to release the first version of the
> partitioning advisor for PostgreSQL 11, and then improve it for
> PostgreSQL 12.
>
>
Interesting.


>
> - Estimating stats
> It is complicated because a hypothetical partition has no data.  Currently,
> we compute a hypothetical partition's size using clauselist_selectivity()
> according to its partition bound and the original table's stats.  As a
> result, the estimate has low accuracy, especially if there is a WHERE
> clause.  We will improve this during development, but for now we don't
> have good ideas.
>

I haven't yet read the patch, but I'm curious to know.  Suppose we have a
table which is already loaded with some data.  Now, if I create hypothetical
partitions on it, will we create any stats data (MCV, histogram) for the
hypothetical table?  Because in this case we already have the data from the
main table, and we also have the partition boundaries for the hypothetical
table.  I am not sure whether you are already doing this or it's an open
item.
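
For example (just to illustrate what I mean, reusing the t1 table and
column 'a' from your usage example), the unpartitioned parent already
has per-column stats that could in principle be sliced per partition
bound:

SELECT attname, n_distinct, most_common_vals, histogram_bounds
FROM pg_stats
WHERE tablename = 't1' AND attname = 'a';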

>
>
> [1] https://github.com/HypoPG/hypopg
>
> Best regards,
> 
> Yuzuko Hosoya
> NTT Open Source Software Center
>


-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


Re: Proposal: Partitioning Advisor for PostgreSQL

2018-06-12 Thread Ashutosh Bapat
On Tue, Jun 12, 2018 at 12:21 AM, Julien Rouhaud  wrote:
>
> I both like and dislike this idea.  The good thing is that it's way
> less hacky than what we did in our prototype, and it also works
> out of the box.  However, the problem I have with this approach is
> that the generated plans will be quite different from real
> partitioning.  The main features such as partition pruning or
> partition-wise join will probably work, but you'll always have a
> ForeignScan as the primary path, and I think that will drastically
> limit the planner and the usability.

AFAIR, there is a hook with which we can change the EXPLAIN output,
so we could change the ForeignScan label. But I don't remember that
hook off the top of my head, and a brief look at the EXPLAIN code didn't
reveal anything. Maybe there isn't any hook. We may be able to add one
in that case, or use CustomScan or something like that. I agree that
seeing a ForeignScan in the plan is not a good thing.

Anyway, the work involved in my proposal may not be worth the utility
we get out of this extension, so it may not be worth pursuing further.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company



Re: Proposal: Partitioning Advisor for PostgreSQL

2018-06-11 Thread Julien Rouhaud
Hi Ashutosh,

Thanks for answering!  And I'm very sorry for how long it took me to reply.

On Wed, May 30, 2018 at 5:44 PM, Ashutosh Bapat
 wrote:
> On Thu, May 24, 2018 at 4:16 PM, Yuzuko Hosoya
>>
>> However, PostgreSQL isn't designed to have hypothetical tables,
>
> I agree. But there are light-weight tables like foreign tables, views
> and partitioned tables themselves. These kinds of tables do not have
> any storage associated with them. We could implement a semi-hypothetical
> partitioned table using these three. The reason I say it's
> semi-hypothetical is that we will need to create some real objects, but
> ones which do not take actual storage. The idea is to create a partitioned
> table with foreign table partitions which point to views simulating the
> partitions. The steps are:
> 1. Create views, one per partition, which select the data from the
> unpartitioned table that would fall in the partition simulated by that
> view. So something like SELECT * FROM unpartitioned_table WHERE
> <partition constraint for that partition>.
> 2. Create the partitioned table.
> 3. Create foreign table partitions that point to the views created in
> the first step.
> 4. ANALYZE the foreign tables and the partitioned table.
>
> Now if we EXPLAIN the query on the unpartitioned table by redirecting it
> to the partitioned table, we would get the EXPLAIN plans as if the
> query were running on the partitioned table. We will need to zero out
> the FDW costs, so that the cost of accessing a foreign table comes out
> to be the same as accessing a local table. That's mostly setting the
> right FDW GUCs.
>
> Since we are creating and dropping some real objects, maybe we want
> to create temporary objects (we don't have support for creating temporary
> foreign tables for now, but maybe that's a desirable feature) or create
> them in a different database to reduce catalog bloat. Similarly we
> won't be able to create indexes on the foreign tables, but maybe we
> could simulate those using the hypothetical indexes feature.
>
> This method doesn't need any core changes that would be useful only for
> this extension. Supporting temporary foreign tables and declarative
> indexes on foreign tables may be seen as separate features and be
> acceptable in the community.
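
Just to make sure I understand the proposal, a rough sketch of those
steps (using postgres_fdw pointed back at the same server, the t1 table
from Yuzuko's example, and placeholder object/connection names) would be
something like:

CREATE EXTENSION IF NOT EXISTS postgres_fdw;
CREATE SERVER loopback FOREIGN DATA WRAPPER postgres_fdw
    OPTIONS (dbname 'postgres');  -- placeholder, point back at this database
CREATE USER MAPPING FOR CURRENT_USER SERVER loopback;

-- 1. one view per simulated partition, selecting the rows that would
--    fall into it
CREATE VIEW t1_p1_v AS SELECT * FROM t1 WHERE a >= 1 AND a < 100;
CREATE VIEW t1_p2_v AS SELECT * FROM t1 WHERE a >= 100 AND a < 300;

-- 2. the partitioned table itself, which has no storage of its own
CREATE TABLE t1_part (a int, b text) PARTITION BY RANGE (a);

-- 3. foreign-table partitions pointing at the views
CREATE FOREIGN TABLE t1_p1 PARTITION OF t1_part
    FOR VALUES FROM (1) TO (100)
    SERVER loopback OPTIONS (table_name 't1_p1_v');
CREATE FOREIGN TABLE t1_p2 PARTITION OF t1_part
    FOR VALUES FROM (100) TO (300)
    SERVER loopback OPTIONS (table_name 't1_p2_v');

-- 4. gather statistics, and neutralize the FDW cost overhead so the
--    plans are comparable to local access
ANALYZE t1_p1;
ANALYZE t1_p2;
ANALYZE t1_part;
ALTER SERVER loopback
    OPTIONS (ADD fdw_startup_cost '0', ADD fdw_tuple_cost '0');

-- then EXPLAIN the query against t1_part instead of t1
EXPLAIN SELECT * FROM t1_part WHERE a < 50;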

I both like and dislike this idea.  The good thing is that it's way
less hacky than what we did in our prototype, and it also works
out of the box.  However, the problem I have with this approach is
that the generated plans will be quite different from real
partitioning.  The main features such as partition pruning or
partition-wise join will probably work, but you'll always have a
ForeignScan as the primary path, and I think that will drastically
limit the planner and the usability.

I'm also not a fan of making core changes for a single extension's
purpose only, but I think that many of the blockers could be solved
with only slight changes in the core code (for instance, don't use a
Relation as a function parameter just to get the underlying
PartitionKey, but pass the PartitionKey directly at the top level).  For
the rest, I'm not sure yet exactly what would need to be changed
(the partitioning code has moved around quite a lot lately, and it's
hard to stay up to date), and whether such changes could also be useful
for other purposes.



Re: Proposal: Partitioning Advisor for PostgreSQL

2018-05-30 Thread Ashutosh Bapat
On Thu, May 24, 2018 at 4:16 PM, Yuzuko Hosoya
 wrote:
> Hello,
>
> I'm Yuzuko Hosoya. This is my first PostgreSQL project participation.
>
> I have been developing a partitioning advisor prototype with Julien Rouhaud.
> It will be a new feature of HypoPG[1], which is a PostgreSQL extension, and
> will help with partitioning design tuning.  Currently, HypoPG only supports
> index design tuning; it allows users to define hypothetical indexes for real
> tables and shows the resulting queries' plan/cost with EXPLAIN as if they
> were actually constructed.  Since declarative partitioning will be greatly
> improved in PostgreSQL 11 and further versions, there is an emerging need to
> support partitioning design tuning.  This is why we are working on the
> partitioning advisor.  We plan to release the first version of the
> partitioning advisor for PostgreSQL 11, and then improve it for
> PostgreSQL 12.

+10 for this feature. Finding out that partitioning doesn't work as
expected after implementing it is rather painful, since partitioning
and unpartitioning are time-consuming right now.

>
>
> Overview of partitioning advisor
> --------------------------------
> - Partitioning advisor allows users to define multiple hypothetical
>   partitioning schemes on real tables and real data
> - PostgreSQL can show the resulting queries' plan/cost with EXPLAIN using
>   hypothetical partitioning schemes
> Users can quickly check how their queries would behave if some tables were
> partitioned, and try different partitioning schemes (for instance, to weigh
> query efficiency vs. maintenance efficiency).
>
>
> Partitioning advisor works as follows:
>
> Usage
> -----
> 0. Consider this target table, t1
> #= CREATE TABLE t1 (a int, b text);
> #= INSERT INTO t1 SELECT i, 'test' FROM generate_series(1,299) i;
> #= EXPLAIN SELECT * FROM t1;
>                       QUERY PLAN
> -----------------------------------------------------
>  Seq Scan on t1  (cost=0.00..4.99 rows=299 width=9)
> (1 row)
>
> 1. Partition the target table hypothetically
> #= SELECT * FROM hypopg_partition_table('t1','partition by range(a)');
>   hypopg_partition_table() defines the hypothetical range-partitioned
>   table 't1' with partition key 'a' and stores this information in
>   backend-local memory.
>
> 2. Create hypothetical partitions
> #= SELECT * FROM hypopg_add_partition('t1_1','partition of t1 for values from (1) to (100)');
> #= SELECT * FROM hypopg_add_partition('t1_2','partition of t1 for values from (100) to (300)');
>   hypopg_add_partition() defines the hypothetical partitions t1_1 and t1_2
>   according to their bounds 'from (1) to (100)' and 'from (100) to (300)'
>   respectively, and stores this information in backend-local memory.
>
> 3. PostgreSQL can show resulting queries' plan/cost with EXPLAIN
> #= EXPLAIN SELECT * FROM t1;
>                             QUERY PLAN
> ---------------------------------------------------------------
>  Append  (cost=0.00..7.49 rows=299 width=9)
>    ->  Seq Scan on t1 t1_1  (cost=0.00..1.99 rows=99 width=9)
>    ->  Seq Scan on t1 t1_2  (cost=0.00..4.00 rows=200 width=9)
> (3 rows)
>   PostgreSQL retrieves the hypothetical partitioning schemes from HypoPG.
>   Then, if the referenced table is defined as a hypothetical partitioned
>   table, PostgreSQL creates plans using them.
>
> This is a simple example. In addition, it enables us to simulate
> range/list/hash partitioning, partition pruning, N-way joins and
> partition-wise join/aggregation.  It is already helpful for users
> designing partitioning schemes.
>
>
> Current implementation
> ----------------------
> We mainly use get_relation_info_hook().  What we do in this hook is inject
> hypothetical partitioning schemes according to the user's definition.  First,
> we do all the processing that is done in expand_inherited_tables().
> Specifically, we expand root->simple_rte_array and root->parse->rtable,
> rewrite the target table's RangeTblEntry as a partitioned table, and create
> RangeTblEntries and AppendRelInfos for all hypothetical partitions.  Besides
> that, we set the hypothetical partition's name into rte->alias->aliasname at
> this time so that EXPLAIN displays the hypothetical partition's name.  And
> then, we rewrite the RelOptInfo as needed.  Specifically, we add the
> partition information that is normally set in set_relation_partition_info()
> to the hypothetical partitioned tables, and set rel->tuples and rel->pages
> for the hypothetical partitions.
>
>
> However, PostgreSQL isn't designed to have hypothetical tables,

I agree. But there are light-weight tables like foreign tables, views
and partitioned tables themselves. These kinds of tables do not have
any storage associated with them. We could implement a semi-hypothetical
partitioned table using these three. The reason I say it's
semi-hypothetical is that we will need to create some real objects, but
ones which do not take actual