Hi

2015-12-23 21:36 GMT+01:00 Daniel Verite <dan...@manitou-mail.org>:

>    Hi,
>
> Here's an updated patch that replaces sorted arrays by AVL binary trees
> when gathering distinct values for the columns involved in the pivot.
> The change is essential for large resultsets. For instance,
> it allows to process a query like this (10 million rows x 10 columns):
>
> select x,(random()*10)::int, (random()*1000)::int from
>   generate_series(1,10000000) as x
>  \crosstabview
>
> which takes about 30 seconds to run and display on my machine with the
> attached patch. That puts it seemingly in the same ballpark than
> the equivalent test with the server-side crosstab().
>
> With the previous iterations of the patch, this test would never end,
> even with much smaller sets, as the execution time of the 1st step
> grew exponentially with the number of distinct keys.
> The exponential effect starts to be felt at about 10k values on my low-end
> CPU,
> and from there quickly becomes problematic.
>
> As a client-side display feature, processing millions of rows like in
> the query above does not necessarily make sense, it's pushing the
> envelope, but stalling way below 100k rows felt lame, so I'm happy to get
> rid of that limitation.
>
> However, there is another one. The above example does not need or request
> an additional sort step, but if it did, sorting more than 65535 entries in
> the vertical header would error out, because values are shipped as
> parameters to PQexecParams(), which only accepts that much.
> To avoid the problem, when the rows in the output "grid" exceed 2^16 and
> they need to be sorted, the user must  let the sort being driven by ORDER
> BY
> beforehand in the query, knowing that the pivot will keep the original
> ordering intact in the vertical header.
>
> I'm still thinking about extending this based on Pavel's diff for the
> "label" column, so that
>  \crosstabview [+|-]colV[:colSortH] [+|-]colH[:colSortH]
> would mean to use colV/H as grid headers but sort them according
> to colSortV/H.
> I prefer that syntax over adding more parameters, and also I'd like
> to have it work in both V and H directions.
>
> Aside from the AVL trees, there are a few other minor changes in that
> patch:
> - move non-exportable structs from the .h to the .c
> - move code in common.c to respect alphabetical ordering
> - if vertical sort is requested, add explicit check against more than 65535
>   params instead of letting the sort query fail
> - report all failure cases of the sort query
> - rename sortColumns to serverSort and use less the term "columns" in
>   comments and variables.
>
>

I checked this version and it is looking well.

* all regress tests passed
* patch is clean, well documented, well formatted
* no objection related to code
* current limits 65K * 1600 is good enough, I don't see it as limiting

I am looking for next version

Regards

Pavel




>
> Best regards,
> --
> Daniel Vérité
> PostgreSQL-powered mailer: http://www.manitou-mail.org
> Twitter: @DanielVerite
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>
>

Reply via email to