Re: [SQL] grouping/clustering query
At 11:28 AM 10/23/2008, Joe wrote:
Steve Midgley wrote:
# (invoiceid, txid)
(A, 1)
(A, 3)
(B, 1)
(B, 2)
(C, 5)
(D, 6)
(D, 7)
(E, 8)
(F, 8)
For journalling, I need to group/cluster this together. Is there a
SQL
query that can generate this output:
# (journal: invoiceids, txids)
[A,B] , [1,2,3]
[C], [5]
[D], [6,7]
[E,F], [8]
Hi Dave,
I'm not following the logic here. A has 1,3 and B has 1,2. So why
does the first line print:
[A,B] , [1,2,3]
What's the rule that tells the query to output this way? Is it that
all of B's values are between A's values?
From a purely accounting standpoint, since transaction 1 was applied
to both invoices A and B, you need to group the invoices so that you
can compare total invoiced against total paid.
I tinkered around briefly but didn't come up with a good idea, but I
bet someone on this list can. However, I did create a CREATE script for
your table design which, in my experience, makes it more likely that a
real expert will take on your problem..
Hope this helps,
Steve
DROP TABLE IF EXISTS trans;
CREATE TABLE trans
(
id serial NOT NULL,
inv_id character varying,
tx_id character varying,
CONSTRAINT pk_id PRIMARY KEY (id)
)
WITH (OIDS=FALSE);
insert into trans (inv_id, tx_id) values('A','1');
insert into trans (inv_id, tx_id) values('A','3');
insert into trans (inv_id, tx_id) values('B','1');
insert into trans (inv_id, tx_id) values('B','2');
insert into trans (inv_id, tx_id) values('C','5');
insert into trans (inv_id, tx_id) values('D','6');
insert into trans (inv_id, tx_id) values('D','7');
insert into trans (inv_id, tx_id) values('E','8');
insert into trans (inv_id, tx_id) values('F','8');
--
Sent via pgsql-sql mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-sql
Re: [SQL] grouping/clustering query
Your script is handy, Steve.
Spontaneously, This seems to be an array type problem, something I just have
vague notions about.
I'll take a look at this,
http://www.postgresql.org/docs/8.3/static/arrays.html to see if something
occurs...
Best,
Oliveiros
- Original Message -
From: "Steve Midgley" <[EMAIL PROTECTED]>
To: "Joe" <[EMAIL PROTECTED]>
Cc: ; "David Garamond" <[EMAIL PROTECTED]>
Sent: Friday, October 24, 2008 4:04 PM
Subject: Re: [SQL] grouping/clustering query
At 11:28 AM 10/23/2008, Joe wrote:
Steve Midgley wrote:
# (invoiceid, txid)
(A, 1)
(A, 3)
(B, 1)
(B, 2)
(C, 5)
(D, 6)
(D, 7)
(E, 8)
(F, 8)
For journalling, I need to group/cluster this together. Is there a SQL
query that can generate this output:
# (journal: invoiceids, txids)
[A,B] , [1,2,3]
[C], [5]
[D], [6,7]
[E,F], [8]
Hi Dave,
I'm not following the logic here. A has 1,3 and B has 1,2. So why does
the first line print:
[A,B] , [1,2,3]
What's the rule that tells the query to output this way? Is it that all
of B's values are between A's values?
From a purely accounting standpoint, since transaction 1 was applied to
both invoices A and B, you need to group the invoices so that you can
compare total invoiced against total paid.
I tinkered around briefly but didn't come up with a good idea, but I bet
someone on this list can. However, I did create a CREATE script for your
table design which, in my experience, makes it more likely that a real
expert will take on your problem..
Hope this helps,
Steve
DROP TABLE IF EXISTS trans;
CREATE TABLE trans
(
id serial NOT NULL,
inv_id character varying,
tx_id character varying,
CONSTRAINT pk_id PRIMARY KEY (id)
)
WITH (OIDS=FALSE);
insert into trans (inv_id, tx_id) values('A','1');
insert into trans (inv_id, tx_id) values('A','3');
insert into trans (inv_id, tx_id) values('B','1');
insert into trans (inv_id, tx_id) values('B','2');
insert into trans (inv_id, tx_id) values('C','5');
insert into trans (inv_id, tx_id) values('D','6');
insert into trans (inv_id, tx_id) values('D','7');
insert into trans (inv_id, tx_id) values('E','8');
insert into trans (inv_id, tx_id) values('F','8');
--
Sent via pgsql-sql mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-sql
--
Sent via pgsql-sql mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-sql
Re: [SQL] grouping/clustering query
On Fri, Oct 24, 2008 at 8:04 AM, Steve Midgley <[EMAIL PROTECTED]> wrote:
> At 11:28 AM 10/23/2008, Joe wrote:
>>
>> Steve Midgley wrote:
# (invoiceid, txid)
(A, 1)
(A, 3)
(B, 1)
(B, 2)
(C, 5)
(D, 6)
(D, 7)
(E, 8)
(F, 8)
For journalling, I need to group/cluster this together. Is there a SQL
query that can generate this output:
# (journal: invoiceids, txids)
[A,B] , [1,2,3]
[C], [5]
[D], [6,7]
[E,F], [8]
>>>
>>> Hi Dave,
>>>
>>> I'm not following the logic here. A has 1,3 and B has 1,2. So why does
>>> the first line print:
>>>
[A,B] , [1,2,3]
>>>
>>> What's the rule that tells the query to output this way? Is it that all
>>> of B's values are between A's values?
>>
>> From a purely accounting standpoint, since transaction 1 was applied to
>> both invoices A and B, you need to group the invoices so that you can
>> compare total invoiced against total paid.
>
> I tinkered around briefly but didn't come up with a good idea, but I bet
> someone on this list can. However, I did create a CREATE script for your
> table design which, in my experience, makes it more likely that a real
> expert will take on your problem..
>
> Hope this helps,
>
> Steve
>
> DROP TABLE IF EXISTS trans;
>
> CREATE TABLE trans
> (
> id serial NOT NULL,
> inv_id character varying,
> tx_id character varying,
> CONSTRAINT pk_id PRIMARY KEY (id)
> )
> WITH (OIDS=FALSE);
>
> insert into trans (inv_id, tx_id) values('A','1');
> insert into trans (inv_id, tx_id) values('A','3');
> insert into trans (inv_id, tx_id) values('B','1');
> insert into trans (inv_id, tx_id) values('B','2');
> insert into trans (inv_id, tx_id) values('C','5');
> insert into trans (inv_id, tx_id) values('D','6');
> insert into trans (inv_id, tx_id) values('D','7');
> insert into trans (inv_id, tx_id) values('E','8');
> insert into trans (inv_id, tx_id) values('F','8');
This is as close as I can get the data. I think I'd need a custom
array grouping aggregate to get the results to match completely.
Notice how ABC are on their own lines?
test=# SELECT inv_array, tx_array
FROM (
SELECT tx_id, array_accum(inv_id) AS inv_array
FROM trans
GROUP BY tx_id
ORDER BY tx_id
) AS t
JOIN (
SELECT inv_id, array_accum(tx_id) AS tx_array
FROM trans
GROUP BY inv_id
ORDER BY inv_id
) AS i ON (t.tx_id = ANY(i.tx_array) OR i.inv_id =ANY(t.inv_array))
GROUP BY tx_array,inv_array
;
inv_array | tx_array
---+--
{A,B} | {1,2}
{B} | {1,2}
{A} | {1,3}
{A,B} | {1,3}
{C} | {5}
{D} | {6,7}
{E,F} | {8}
(7 rows)
--
Sent via pgsql-sql mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-sql
Re: [SQL] grouping/clustering query
On Fri, Oct 24, 2008 at 10:24 AM, Tony Wasson <[EMAIL PROTECTED]> wrote:
> On Fri, Oct 24, 2008 at 8:04 AM, Steve Midgley <[EMAIL PROTECTED]> wrote:
>> At 11:28 AM 10/23/2008, Joe wrote:
>>>
>>> Steve Midgley wrote:
>
> # (invoiceid, txid)
> (A, 1)
> (A, 3)
> (B, 1)
> (B, 2)
> (C, 5)
> (D, 6)
> (D, 7)
> (E, 8)
> (F, 8)
>
> For journalling, I need to group/cluster this together. Is there a SQL
> query that can generate this output:
>
> # (journal: invoiceids, txids)
> [A,B] , [1,2,3]
> [C], [5]
> [D], [6,7]
> [E,F], [8]
Hi Dave,
I'm not following the logic here. A has 1,3 and B has 1,2. So why does
the first line print:
> [A,B] , [1,2,3]
What's the rule that tells the query to output this way? Is it that all
of B's values are between A's values?
>>>
>>> From a purely accounting standpoint, since transaction 1 was applied to
>>> both invoices A and B, you need to group the invoices so that you can
>>> compare total invoiced against total paid.
>>
>> I tinkered around briefly but didn't come up with a good idea, but I bet
>> someone on this list can. However, I did create a CREATE script for your
>> table design which, in my experience, makes it more likely that a real
>> expert will take on your problem..
>>
>> Hope this helps,
>>
>> Steve
>>
>> DROP TABLE IF EXISTS trans;
>>
>> CREATE TABLE trans
>> (
>> id serial NOT NULL,
>> inv_id character varying,
>> tx_id character varying,
>> CONSTRAINT pk_id PRIMARY KEY (id)
>> )
>> WITH (OIDS=FALSE);
>>
>> insert into trans (inv_id, tx_id) values('A','1');
>> insert into trans (inv_id, tx_id) values('A','3');
>> insert into trans (inv_id, tx_id) values('B','1');
>> insert into trans (inv_id, tx_id) values('B','2');
>> insert into trans (inv_id, tx_id) values('C','5');
>> insert into trans (inv_id, tx_id) values('D','6');
>> insert into trans (inv_id, tx_id) values('D','7');
>> insert into trans (inv_id, tx_id) values('E','8');
>> insert into trans (inv_id, tx_id) values('F','8');
Here's a stab at a custom aggregate attempting to explain what's going
on inside.
CREATE OR REPLACE FUNCTION varchar_array_accum_unique (x VARCHAR[], y
VARCHAR) RETURNS VARCHAR[] AS $$
DECLARE
res VARCHAR[];
BEGIN
RAISE NOTICE 'input state is %',x;
RAISE NOTICE 'input variable is %',y;
IF x = '{}' THEN
RAISE NOTICE 'x is empty, returning input variable %',y;
res[1] := y;
ELSE
RAISE NOTICE 'input array is not empty, checking if input
variable is a member %',y;
res := x;
IF y = ANY(res) THEN
RAISE NOTICE 'y is already in array %, skipping',res;
ELSE
res := array_append(res, y);
RAISE NOTICE 'appending input variable %',y;
END IF;
END IF;
RETURN res;
END
$$ LANGUAGE plpgsql STRICT;
DROP AGGREGATE array_accum_unique(VARCHAR);
CREATE AGGREGATE array_accum_unique
(
basetype = VARCHAR
, sfunc = varchar_array_accum_unique
, stype = VARCHAR[]
, initcond = '{}'
);
SELECT array_accum_unique(inv_id) AS invoiceids
, array_accum_unique(tx_id) AS transactionids
FROM (
SELECT tx_id, inv_id
FROM trans
WHERE inv_id IN (
SELECT inv_id
FROM trans
WHERE id IN (
SELECT id FROM trans
WHERE tx_id=1
)
)
ORDER BY tx_id, inv_id
) AS ss
;
Returns this result for transaction id 1.
invoiceids | transactionids
+
{A,B} | {1,2,3}
(1 row)
Hope this helps!
Tony
--
Sent via pgsql-sql mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-sql
Re: [SQL] grouping/clustering query
Tony, Joe, Steve,
Thanks for the follow-ups. Yes, the problem is related to double-entry
accounting, where one needs to balance total debit and credit
(payments and invoices) in each journal/transaction.
Due to time constraint, I ended up doing this in the client-side
programming language, since I am nowhere near fluent in PLs. The
algorithm should be simple (at least the "brute force" version), it's
basically checking if each element of the pair (txid, invoiceid) is
already mentioned in some journal and if it is, add the pair to the
journal, otherwise create a new journal with that pair as the first
entry. I believe this can easily be implemented in a PL. But still I
wonder if there is some SQL incantation that can do the same without
any PL.
Regards,
dave
On Sat, Oct 25, 2008 at 3:48 AM, Tony Wasson <[EMAIL PROTECTED]> wrote:
> On Fri, Oct 24, 2008 at 10:24 AM, Tony Wasson <[EMAIL PROTECTED]> wrote:
>> On Fri, Oct 24, 2008 at 8:04 AM, Steve Midgley <[EMAIL PROTECTED]> wrote:
>>> At 11:28 AM 10/23/2008, Joe wrote:
Steve Midgley wrote:
>>
>> # (invoiceid, txid)
>> (A, 1)
>> (A, 3)
>> (B, 1)
>> (B, 2)
>> (C, 5)
>> (D, 6)
>> (D, 7)
>> (E, 8)
>> (F, 8)
>>
>> For journalling, I need to group/cluster this together. Is there a SQL
>> query that can generate this output:
>>
>> # (journal: invoiceids, txids)
>> [A,B] , [1,2,3]
>> [C], [5]
>> [D], [6,7]
>> [E,F], [8]
>
> Hi Dave,
>
> I'm not following the logic here. A has 1,3 and B has 1,2. So why does
> the first line print:
>
>> [A,B] , [1,2,3]
>
> What's the rule that tells the query to output this way? Is it that all
> of B's values are between A's values?
From a purely accounting standpoint, since transaction 1 was applied to
both invoices A and B, you need to group the invoices so that you can
compare total invoiced against total paid.
>>>
>>> I tinkered around briefly but didn't come up with a good idea, but I bet
>>> someone on this list can. However, I did create a CREATE script for your
>>> table design which, in my experience, makes it more likely that a real
>>> expert will take on your problem..
>>>
>>> Hope this helps,
>>>
>>> Steve
>>>
>>> DROP TABLE IF EXISTS trans;
>>>
>>> CREATE TABLE trans
>>> (
>>> id serial NOT NULL,
>>> inv_id character varying,
>>> tx_id character varying,
>>> CONSTRAINT pk_id PRIMARY KEY (id)
>>> )
>>> WITH (OIDS=FALSE);
>>>
>>> insert into trans (inv_id, tx_id) values('A','1');
>>> insert into trans (inv_id, tx_id) values('A','3');
>>> insert into trans (inv_id, tx_id) values('B','1');
>>> insert into trans (inv_id, tx_id) values('B','2');
>>> insert into trans (inv_id, tx_id) values('C','5');
>>> insert into trans (inv_id, tx_id) values('D','6');
>>> insert into trans (inv_id, tx_id) values('D','7');
>>> insert into trans (inv_id, tx_id) values('E','8');
>>> insert into trans (inv_id, tx_id) values('F','8');
>
>
> Here's a stab at a custom aggregate attempting to explain what's going
> on inside.
>
> CREATE OR REPLACE FUNCTION varchar_array_accum_unique (x VARCHAR[], y
> VARCHAR) RETURNS VARCHAR[] AS $$
> DECLARE
> res VARCHAR[];
> BEGIN
> RAISE NOTICE 'input state is %',x;
> RAISE NOTICE 'input variable is %',y;
> IF x = '{}' THEN
>RAISE NOTICE 'x is empty, returning input variable %',y;
>res[1] := y;
> ELSE
>RAISE NOTICE 'input array is not empty, checking if input
> variable is a member %',y;
>res := x;
>IF y = ANY(res) THEN
>RAISE NOTICE 'y is already in array %, skipping',res;
>ELSE
>res := array_append(res, y);
>RAISE NOTICE 'appending input variable %',y;
>END IF;
> END IF;
> RETURN res;
> END
> $$ LANGUAGE plpgsql STRICT;
>
> DROP AGGREGATE array_accum_unique(VARCHAR);
> CREATE AGGREGATE array_accum_unique
> (
>basetype = VARCHAR
>, sfunc = varchar_array_accum_unique
>, stype = VARCHAR[]
>, initcond = '{}'
> );
>
> SELECT array_accum_unique(inv_id) AS invoiceids
> , array_accum_unique(tx_id) AS transactionids
> FROM (
>SELECT tx_id, inv_id
>FROM trans
>WHERE inv_id IN (
>SELECT inv_id
>FROM trans
>WHERE id IN (
>SELECT id FROM trans
> WHERE tx_id=1
>)
>)
>ORDER BY tx_id, inv_id
> ) AS ss
> ;
>
>
> Returns this result for transaction id 1.
>
> invoiceids | transactionids
> +
> {A,B} | {1,2,3}
> (1 row)
>
> Hope this helps!
> Tony
>
--
Sent via pgsql-sql mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-sql
[SQL] Truncate on pg_dump / pg_restore
Dear Postgres Gurus; Is there a way to truncate a table, at pg_dump time? I'm aware of various ways to exclude a table from a dump (>= 8.2), or to selectively pg_restore. What I'm seeking here is different. I've got tables with pretty disposable data... meaning I want to drop the data... but restore empty indexed tables at pg_restore time. Doable? -- Sent via pgsql-sql mailing list ([email protected]) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-sql
