[SQL] Problem with ORDER BY and DISTINCT ON
Hi,
I'm a little baffled. I'm trying to generate a SQL statement that
issues a DISTINCT ON using the same values as my ORDER BY statement.
I'm using a somewhat complex CASE statement in my ORDER BY clause. I'm
on Pg 8.2. Here is some SQL to get you started at seeing my problem:
--
drop table if exists property;
create table property
( id serial,
state varchar(255),
search_rate_max decimal(8,2),
data_priority_code varchar(255)
);
SELECT DISTINCT ON
("property"."state",
CASE WHEN ("search_rate_max" IS NOT NULL) THEN 1 ELSE 2
END,"search_rate_max",
CASE WHEN ("search_rate_max" IS NOT NULL) THEN 1 ELSE 2
END,"search_rate_max",
property.id)
property.id
FROM property WHERE (("property"."data_priority_code" IS NOT NULL))
ORDER BY
"property"."state",
CASE WHEN ("search_rate_max" IS NOT NULL) THEN 1 ELSE 2
END,"search_rate_max",
CASE WHEN ("search_rate_max" IS NOT NULL) THEN 1 ELSE 2
END,"search_rate_max",
property.id
LIMIT 10 OFFSET 0
RESULTS: ERROR: SELECT DISTINCT ON expressions must match initial ORDER
BY expressions
SQL state: 42P10
Now if you run this statement it works
--
SELECT DISTINCT ON
("property"."state",
property.id)
property.id
FROM property WHERE (("property"."data_priority_code" IS NOT NULL))
ORDER BY
"property"."state",
property.id
LIMIT 10 OFFSET 0
--
However if you run this statement it ALSO works, which tells me it's
not just my CASE statements that are messing things up (note in this
example, I just removed the primary key "property.id" from the ORDER BY
and DISTINCT ON clauses:
---
SELECT DISTINCT ON
("property"."state",
CASE WHEN ("search_rate_max" IS NOT NULL) THEN 1 ELSE 2
END,"search_rate_max",
CASE WHEN ("search_rate_max" IS NOT NULL) THEN 1 ELSE 2
END,"search_rate_max"
)
property.id
FROM property WHERE (("property"."data_priority_code" IS NOT NULL))
ORDER BY
"property"."state",
CASE WHEN ("search_rate_max" IS NOT NULL) THEN 1 ELSE 2
END,"search_rate_max",
CASE WHEN ("search_rate_max" IS NOT NULL) THEN 1 ELSE 2
END,"search_rate_max"
LIMIT 10 OFFSET 0
RESULTS: ERROR: SELECT DISTINCT ON expressions must match initial ORDER
BY expressions
SQL state: 42P10
Finally, if you run this statement it works fine (removing one of the
duplicate search_rate_max statements):
SELECT DISTINCT ON
("property"."state",
CASE WHEN ("search_rate_max" IS NOT NULL) THEN 1 ELSE 2
END,"search_rate_max",
property.id)
property.id
FROM property WHERE (("property"."data_priority_code" IS NOT NULL))
ORDER BY
"property"."state",
CASE WHEN ("search_rate_max" IS NOT NULL) THEN 1 ELSE 2
END,"search_rate_max",
property.id LIMIT 10 OFFSET 0
What's going on here? Am I doing something that isn't legitimate SQL? I
can't see why having a duplicate CASE statement should foul things up
like this? It's pretty clear (from additional testing not included in
this email) that the duplicate "search_rate_max" CASE is causing the
problem.
Thanks for any advice or suggestions on how to get this to run
correctly. Is this a bug?
Basically I'm doing this as an optimization - I can get much better
performance running the DISTINCT ON in some circumstances than using
DISTINCT, but the edge case above is breaking my tests and preventing
me from implementing the idea. The code is generated by an application
layer which is not really paying attention to whether or not the two
CASE statements apply to the same field or not (sometimes they do
sometimes they don't)..
Thanks!
Steve
--
Sent via pgsql-sql mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-sql
[SQL] integrity check and visibility was: COPY equivalent for updates
On Tue, 15 Jul 2008 22:13:42 -0500 Decibel! <[EMAIL PROTECTED]> wrote: > Depending on what you're doing, it might well be fastest to... > BEGIN; > COPY temp_table FROM 'file'; > DELETE FROM real_table WHERE real_table_id IN (SELECT > real_table_id FROM temp_table); > INSERT INTO real_table SELECT * FROM temp_table; > COMMIT; Thank you for making me reconsider this approach. I gave up since I have a lot of ON DELETE CASCADE... but you made me remind that there should be a way to delay the checks at the end of the transaction (I still couldn't find the syntax on the manual). Actually from my understanding an UPDATE in PostgreSQL is a DELETE + INSERT. But wouldn't the subquery DELETE FROM real_table WHERE real_table_id IN (SELECT real_table_id FROM temp_table); take longer to execute than the UPDATE version: update t1 set col1=temp_t1.col1, col2=temp_t1.col2 where t1.id=temp_t1.id; mutatis mutandis? I'd say that pg should be enough smart to take advantage of a unique key on real_table.real_table_id and stop searching as soon as it deleted the first occurrence of an ID found in real_table from temp_table. Is it? Would adding an unique index on temp_table make the process faster or slower? I know that temp_table.real_table_id will be unique so I'm not interested in the integrity check (that anyway should take place later). At first glance the unique index on real_table.real_table_id may be useful, the one on temp_table should not and will actually make the process slower adding index creation time. So the "ON DELETE CASCADE/SET NULL" problem should be solved once I find the syntax to delay checks Can it be solved this way? I'm going to check as soon as I find the syntax on the manual. What about visibility of the "going to be deleted rows"? If another transaction start between the DELETE and the INSERT statement are executed... what is it going to see? What if I don't have just a single INSERT statement since some columns will be moved to another table? Does this make necessary the UPDATE approach in spite of the DELETE/INSERT approach? If I had temp_table split into 2 tables eg. create temp_table( id int, col11 varchar(32), col21 varchar(32) ); create table table1 ( id int primary key, col11 varchar(32) ); create table table2 ( id int references table1(id) on delete cascade, col21 varchar(32) ); I wouldn't be concerned that data in table1 and table2 contains data updated in different times and "not coherent" rather I'd be concerned there is a row in table1 but there isn't in table2. Furthermore... if I delay checks and ON DELETE triggers rows in table2 won't be deleted when I delete rows in table1 unless I do it "manually". At this point I'm more concerned of maintainability over speed. Actually if I follow the UPDATE approach and temp_table is still split across several tables I'll have to write an update for each tableN anyway... so there shouldn't be too much difference between the UPDATE and the DELETE/INSERT approach. What about the effect of a longer transaction compared to a list of updates on speed? thanks -- Ivan Sergio Borgonovo http://www.webthatworks.it -- Sent via pgsql-sql mailing list ([email protected]) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-sql
Re: [SQL] Problem with ORDER BY and DISTINCT ON
Steve Midgley <[EMAIL PROTECTED]> writes:
> SELECT DISTINCT ON
> ("property"."state",
> CASE WHEN ("search_rate_max" IS NOT NULL) THEN 1 ELSE 2
> END,"search_rate_max",
> CASE WHEN ("search_rate_max" IS NOT NULL) THEN 1 ELSE 2
> END,"search_rate_max",
> property.id)
> property.id
> FROM property WHERE (("property"."data_priority_code" IS NOT NULL))
> ORDER BY
>"property"."state",
>CASE WHEN ("search_rate_max" IS NOT NULL) THEN 1 ELSE 2
> END,"search_rate_max",
>CASE WHEN ("search_rate_max" IS NOT NULL) THEN 1 ELSE 2
> END,"search_rate_max",
>property.id
> LIMIT 10 OFFSET 0
> RESULTS: ERROR: SELECT DISTINCT ON expressions must match initial ORDER
> BY expressions
Interesting. You realize of course that sorting by the same expression
twice is completely redundant? I haven't dug through the code yet but
I think what is happening is that ORDER BY knows that and gets rid of
the duplicate entries while DISTINCT ON fails to do so. Or some story
approximately like that. It should be fixed, but the immediate
workaround is just to get rid of the redundant sort keys:
SELECT DISTINCT ON
("property"."state",
CASE WHEN ("search_rate_max" IS NOT NULL) THEN 1 ELSE 2 END,
"search_rate_max",
-- CASE WHEN ("search_rate_max" IS NOT NULL) THEN 1 ELSE 2 END,
-- "search_rate_max",
property.id)
property.id
FROM property WHERE (("property"."data_priority_code" IS NOT NULL))
ORDER BY
"property"."state",
CASE WHEN ("search_rate_max" IS NOT NULL) THEN 1 ELSE 2 END,
"search_rate_max",
-- CASE WHEN ("search_rate_max" IS NOT NULL) THEN 1 ELSE 2 END,
-- "search_rate_max",
property.id
LIMIT 10 OFFSET 0
BTW, why are you bothering with the CASEs at all? Null values of
search_rate_max would sort high already.
regards, tom lane
--
Sent via pgsql-sql mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-sql
Re: [SQL] Rollback in Postgres
> No, they developed it for marketing. Perhaps, but towards whom? PostgreSQL wouldn't hurt if a lot of developers and DBA's was lured into the trap by this new feature. > Keep in mind that Oracle has six thousand full-time developers and an > already extremely mature database. Stuff that they see fit to add is > not necessarily going to be on our radar screen in the foreseeable > future. I wasn't proposing to add it in 8.4. Just to add it to the TODO. Perhaps someone would look at it some point in the future. -- Med venlig hilsen Kaare Rasmussen, Jasonic Jasonic Telefon: +45 3816 2582 Nordre Fasanvej 12 2000 Frederiksberg Email: [EMAIL PROTECTED] -- Sent via pgsql-sql mailing list ([email protected]) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-sql
Re: [SQL] How to GROUP results BY month
On Wed, 2008-07-16 at 07:39 +0200, A. Kretschmer wrote: > am Tue, dem 15.07.2008, um 18:15:07 -0700 mailte Mark Roberts > folgendes: > > > > On Tue, 2008-07-15 at 14:31 +0200, A. Kretschmer wrote: > > > am Tue, dem 15.07.2008, um 13:12:39 +0100 mailte Oliveiros > Cristina folgendes: > > > > Howdy, all, > > > > > > > > I have a problem. > > > > > > > > I have a table which one of the fields is of type date. > > > > > > > > I need to obtain the totals of the other fields in a by-month > basis > > > > IS there any easy way to do this using the GROUP BY or any other > construct? > > > > > > ... group by extract(month from date) > > > > > > > > > Andreas > > > > It's worth noting that extract(month from timestamp) returns a > month_no, and thus will not be suitable for grouping queries that span > years. > > Right, but that wasn't the question... Honestly, the way the question was phrased, I'd have assumed that it wanted to group by month (not group by a group of months). Jan 08 is distinct from Jan 07. Please accept my sincerest apologies if you you feel that I misinterpreted the question. I was merely trying to illustrate the difference between what each approach was. -Mark -- Sent via pgsql-sql mailing list ([email protected]) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-sql
Re: [SQL] Problem with ORDER BY and DISTINCT ON
At 07:29 AM 7/16/2008, Tom Lane wrote: Steve Midgley <[EMAIL PROTECTED]> writes: Interesting. You realize of course that sorting by the same expression twice is completely redundant? I haven't dug through the code yet but Thanks Tom. Yeah, I was a little embarrassed to throw this code up on the list b/c it's pretty weak. It's basically machine written - so sometimes the CASE features a different "THEN X ELSE Y" order. I think what is happening is that ORDER BY knows that and gets rid of the duplicate entries while DISTINCT ON fails to do so. Or some story approximately like that. It should be fixed, but the immediate workaround is just to get rid of the redundant sort keys: I don't know if this will help track down the problem, but I figured out last night that ORDER BY is totally fine with having TWO duplicate entries, so long as I only put ONE entry in the DISTINCT ON area. Of course removing the duplicate from both areas is the correct solution and I broke down and hacked that into the auto-sql-writing code and so my immediate problem is solved. I'm happy to file this as a ticket for Pg (please point me to your ticket tool as I've never used it). This is not a very big deal but Pg has such a high compliance with wacky-but-valid SQL it does seem like it should be fixed just because. Let me know if I can help on that. Best, Steve -- Sent via pgsql-sql mailing list ([email protected]) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-sql
[SQL] pg_advisory_lock(bigint) vs. LOCK TABLE
Hi, What's the difference between below two queue implementations? -- -- With advisory locks. -- BEGIN; SELECT pg_advisory_lock((SELECT oid FROM pg_class WHERE relname = 'queue')); DELETE FROM queue WHERE id = (SELECT MIN(id) FROM queue) RETURNING id, val; SELECT pg_advisory_unlock((SELECT oid FROM pg_class WHERE relname = 'queue')); COMMIT; -- -- With LOCK table command. -- BEGIN; LOCK TABLE queue; DELETE FROM queue WHERE id = (SELECT MIN(id) FROM queue) RETURNING id, val; COMMIT; Any helps will be appreciated. (BTW, yep, I'm aware of PGQ module of skyytools[1].) Regards. [1] http://skytools.projects.postgresql.org/ -- Sent via pgsql-sql mailing list ([email protected]) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-sql
Re: [SQL] PERSISTANT PREPARE (another point of view)
Milan Oparnica wrote: It's simply to complicated to return recordsets through server-side stored procedures. They are obviously designed to do complex data manipulation, returning few output variables informing the caller about final results. Returning records through sets of user-defined-types is memory and performance waste (please see my previous post as reply to Steve for more details). Plus it's hard to maintain and make improvements to such a system. I hate to see 800 user types made for every query we made as stored procedure. Is this topic completely out of scope in Postgre ? If I'm missing something too obvious or too important, please let me know what. I run over and over through internet and Postgre documentation and still found nothing. Is there a better place to communicate with Postgre developers ? Sincerely, Milan Oparnica -- Sent via pgsql-sql mailing list ([email protected]) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-sql
Re: [SQL] PERSISTANT PREPARE (another point of view)
Hello 2008/7/16 Milan Oparnica <[EMAIL PROTECTED]>: > Milan Oparnica wrote: >> >> It's simply to complicated to return recordsets through server-side stored >> procedures. They are obviously designed to do complex data manipulation, >> returning few output variables informing the caller about final results. >> Returning records through sets of user-defined-types is memory and >> performance waste (please see my previous post as reply to Steve for more >> details). Plus it's hard to maintain and make improvements to such a system. >> I hate to see 800 user types made for every query we made as stored >> procedure. > > Is this topic completely out of scope in Postgre ? > If I'm missing something too obvious or too important, please let me know > what. > > I run over and over through internet and Postgre documentation and still > found nothing. > try to write prototype and show advantages. I am able to undestand advantages of persistent prep. stamenents, but I see some disadvatage too. Mainly you have to manage some shared memory space for stored plans. It's not easy task - MySQL develepoers can talk. Implemenation on postgresql is little bit dificult - lot of structures that lives in processed memory have to be moved to shared memory. This feature is nice, but question is - who do write it? Actually this problem is solved from outside - with pooling. Regards Pavel Stehule > Is there a better place to communicate with Postgre developers ? > > Sincerely, > > Milan Oparnica > > -- > Sent via pgsql-sql mailing list ([email protected]) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-sql > -- Sent via pgsql-sql mailing list ([email protected]) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-sql
