crosstab/repivot...any interest?

Merlin Moncure Mon, 25 Feb 2019 17:35:39 -0800

On Fri, Jan 25, 2019 at 9:14 PM Morris de Oryx <morrisdeo...@gmail.com>
wrote:
>
> Hello, I'm not a C coder and can't help....but I love cross-tab/pivot
tables. They're the best, and just fantastic for preparing data to feed
into various analysis tools. The tablefunc module is helpful, but a bit
awkward to use (as I remember it.)
>
> From a user's point of view, I high-performance cross-tab generator would
be just fantastic.
>
> As I understand it, this is what's involved in a pivot:
>
> 1. Identify rows that should be grouped (consolidated.)
> 2. Distinguish the value that identifies each derived column.
> 3. Distinguish the value that identifies each row-column value.
> 4. Collapse the rows, build the columns, and populate the 'cells' with
data.
>
> In an ideal world, you would be able to perform different grouping
operations. Such as count, sum, avg, etc.
>
> If there's a way to do this in a system-wide and standards-pointing way,
so much the better.
>
> Apologies if I'm violating list etiquette by jumping in here. I've been
lurking on several Postgres lists for a bit and picking up interesting
details every day. If I've been Unintentionally and Cluelessly Off, I'm
find with being told.


No worries, sir! Apologies on the late reply.  I've made some headway on
this item.  Waiting for postgres to implement the SQL standard pivoting
(regardless if it implements the cases I need) is not an option for my
personal case. I can't use the SQL approach either as it's very slow and
imposing some scaling limits that need to work around in the short run.

My strategy is to borrow [steal] from crosstab_hash and make a new version
called repivot which takes an arleady pivoted data set and repivots it
against an identified column.   Most of the code can be shared with
tablefunc so ideally this could live as an amendment to that extension.
That might not happen though, so I'm going to package it as a separate
extension (removing the majority of tablefunc that I don't need) and submit
it to this group for consideration.

If we punt, it'll end up as a private extension or live the life of an
Orphan in pgxn.  If there's some interest here, we can consider a new
contrib extension (which I personally rate very unlikely) or recasting as
an extra routine to tablefunc.  Any way we slice it, huge thanks to Joe
Conway&co for giving us such an awesome function to work with all these
years (not to mention the strategic plr language).  SRF crosstab() is still
somewhat baroque, but it still fills a niche that nothing else implements.

The interface I'm looking at is:
SELECT repivot(
  query TEXT,
  static_attributes INT,  /* number of static attributes that are unchanged
around key; we need this in our usages */
  attribute_query  TEXT); /* query that draws up the pivoted attribute list
*/

The driving query is expected to return 0+ static attributes which are
essentially simply pasted to the output. The next two columns are the key
column and the pivoting column.   So if you had three attributes, the input
set would be:

a1, a2, a3, k1, p, v1...vn

Where the coordinates v and p would exchange.  I need to get this done
quickly and so am trying to avoid more abstracted designs (maybe multi part
keys should be supported through...this is big limitation of crosstab
albeit with some obnoxious work arounds).

merlin

crosstab/repivot...any interest?

Reply via email to