date:20080716

[SQL] Problem with ORDER BY and DISTINCT ON

2008-07-16 Thread Steve Midgley


Hi,

I'm a little baffled. I'm trying to generate a SQL statement that 
issues a DISTINCT ON using the same values as my ORDER BY statement. 
I'm using a somewhat complex CASE statement in my ORDER BY clause. I'm 
on Pg 8.2. Here is some SQL to get you started at seeing my problem:


--

drop table if exists property;
create table property
( id serial,
  state varchar(255),
  search_rate_max decimal(8,2),
  data_priority_code varchar(255)
);

SELECT DISTINCT ON
("property"."state",
 CASE WHEN ("search_rate_max" IS NOT NULL) THEN 1 ELSE 2 
END,"search_rate_max",
 CASE WHEN ("search_rate_max" IS NOT NULL) THEN 1 ELSE 2 
END,"search_rate_max",

 property.id)
 property.id
FROM property WHERE (("property"."data_priority_code" IS NOT NULL))
ORDER BY
  "property"."state",
  CASE WHEN ("search_rate_max" IS NOT NULL) THEN 1 ELSE 2 
END,"search_rate_max",
  CASE WHEN ("search_rate_max" IS NOT NULL) THEN 1 ELSE 2 
END,"search_rate_max",

  property.id
LIMIT 10 OFFSET 0


RESULTS: ERROR: SELECT DISTINCT ON expressions must match initial ORDER 
BY expressions

SQL state: 42P10


Now if you run this statement it works

--

SELECT DISTINCT ON
("property"."state",
 property.id)
 property.id
FROM property WHERE (("property"."data_priority_code" IS NOT NULL))
ORDER BY
  "property"."state",
  property.id
LIMIT 10 OFFSET 0

--

However if you run this statement it ALSO works, which tells me it's 
not just my CASE statements that are messing things up (note in this 
example, I just removed the primary key "property.id" from the ORDER BY 
and DISTINCT ON clauses:


---

SELECT DISTINCT ON
("property"."state",
 CASE WHEN ("search_rate_max" IS NOT NULL) THEN 1 ELSE 2 
END,"search_rate_max",
 CASE WHEN ("search_rate_max" IS NOT NULL) THEN 1 ELSE 2 
END,"search_rate_max"

 )
 property.id
FROM property WHERE (("property"."data_priority_code" IS NOT NULL))
ORDER BY
  "property"."state",
  CASE WHEN ("search_rate_max" IS NOT NULL) THEN 1 ELSE 2 
END,"search_rate_max",
  CASE WHEN ("search_rate_max" IS NOT NULL) THEN 1 ELSE 2 
END,"search_rate_max"

LIMIT 10 OFFSET 0


RESULTS: ERROR: SELECT DISTINCT ON expressions must match initial ORDER 
BY expressions

SQL state: 42P10


Finally, if you run this statement it works fine (removing one of the 
duplicate search_rate_max statements):




SELECT DISTINCT ON
("property"."state",
 CASE WHEN ("search_rate_max" IS NOT NULL) THEN 1 ELSE 2 
END,"search_rate_max",

 property.id)
 property.id
FROM property WHERE (("property"."data_priority_code" IS NOT NULL))
ORDER BY
  "property"."state",
  CASE WHEN ("search_rate_max" IS NOT NULL) THEN 1 ELSE 2 
END,"search_rate_max",

  property.id LIMIT 10 OFFSET 0



What's going on here? Am I doing something that isn't legitimate SQL? I 
can't see why having a duplicate CASE statement should foul things up 
like this? It's pretty clear (from additional testing not included in 
this email) that the duplicate "search_rate_max" CASE is causing the 
problem.


Thanks for any advice or suggestions on how to get this to run 
correctly. Is this a bug?


Basically I'm doing this as an optimization - I can get much better 
performance running the DISTINCT ON in some circumstances than using 
DISTINCT, but the edge case above is breaking my tests and preventing 
me from implementing the idea. The code is generated by an application 
layer which is not really paying attention to whether or not the two 
CASE statements apply to the same field or not (sometimes they do 
sometimes they don't)..


Thanks!

Steve


--
Sent via pgsql-sql mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-sql

[SQL] integrity check and visibility was: COPY equivalent for updates

2008-07-16 Thread Ivan Sergio Borgonovo

On Tue, 15 Jul 2008 22:13:42 -0500
Decibel! <[EMAIL PROTECTED]> wrote:

> Depending on what you're doing, it might well be fastest to...

> BEGIN;
> COPY temp_table FROM 'file';
> DELETE FROM real_table WHERE real_table_id IN (SELECT
> real_table_id FROM temp_table);
> INSERT INTO real_table SELECT * FROM temp_table;
> COMMIT;

Thank you for making me reconsider this approach.
I gave up since I have a lot of ON DELETE CASCADE... but you made me
remind that there should be a way to delay the checks at the end of
the transaction (I still couldn't find the syntax on the manual).

Actually from my understanding an UPDATE in PostgreSQL is a DELETE +
INSERT. But wouldn't the subquery
DELETE FROM real_table WHERE real_table_id IN (SELECT real_table_id
FROM temp_table);

take longer to execute than the UPDATE version:
update t1 set col1=temp_t1.col1, col2=temp_t1.col2
  where t1.id=temp_t1.id;
mutatis mutandis?

I'd say that pg should be enough smart to take advantage of a unique
key on real_table.real_table_id and stop searching as soon as it
deleted the first occurrence of an ID found in real_table from
temp_table. Is it?
Would adding an unique index on temp_table make the process faster
or slower? I know that temp_table.real_table_id will be unique so
I'm not interested in the integrity check (that anyway should take
place later). At first glance the unique index on
real_table.real_table_id may be useful, the one on temp_table should
not and will actually make the process slower adding index creation
time.

So the "ON DELETE CASCADE/SET NULL" problem should be solved once I
find the syntax to delay checks
Can it be solved this way? I'm going to check as soon as I find the
syntax on the manual.

What about visibility of the "going to be deleted rows"?

If another transaction start between the DELETE and the INSERT
statement are executed... what is it going to see?
What if I don't have just a single INSERT statement since some
columns will be moved to another table?

Does this make necessary the UPDATE approach in spite of the
DELETE/INSERT approach?
If I had temp_table split into 2 tables eg.

create temp_table(
  id int,
  col11 varchar(32),
  col21 varchar(32)
);
create table table1 (
  id int primary key,
  col11 varchar(32)
);
create table table2 (
  id int references table1(id) on delete cascade,
  col21 varchar(32)
);

I wouldn't be concerned that data in table1 and table2 contains data
updated in different times and "not coherent" rather I'd be
concerned there is a row in table1 but there isn't in table2.
Furthermore... if I delay checks and ON DELETE triggers rows in
table2 won't be deleted when I delete rows in table1 unless I do it
"manually".

At this point I'm more concerned of maintainability over speed.
Actually if I follow the UPDATE approach and temp_table is still
split across several tables I'll have to write an update for each
tableN anyway... so there shouldn't be too much difference between
the UPDATE and the DELETE/INSERT approach.

What about the effect of a longer transaction compared to a list of
updates on speed?

thanks

-- 
Ivan Sergio Borgonovo
http://www.webthatworks.it

-- 
Sent via pgsql-sql mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-sql

Re: [SQL] Problem with ORDER BY and DISTINCT ON

2008-07-16 Thread Tom Lane

Steve Midgley <[EMAIL PROTECTED]> writes:
> SELECT DISTINCT ON
> ("property"."state",
>   CASE WHEN ("search_rate_max" IS NOT NULL) THEN 1 ELSE 2 
> END,"search_rate_max",
>   CASE WHEN ("search_rate_max" IS NOT NULL) THEN 1 ELSE 2 
> END,"search_rate_max",
>   property.id)
>   property.id
> FROM property WHERE (("property"."data_priority_code" IS NOT NULL))
> ORDER BY
>"property"."state",
>CASE WHEN ("search_rate_max" IS NOT NULL) THEN 1 ELSE 2 
> END,"search_rate_max",
>CASE WHEN ("search_rate_max" IS NOT NULL) THEN 1 ELSE 2 
> END,"search_rate_max",
>property.id
> LIMIT 10 OFFSET 0

> RESULTS: ERROR: SELECT DISTINCT ON expressions must match initial ORDER 
> BY expressions

Interesting.  You realize of course that sorting by the same expression
twice is completely redundant?  I haven't dug through the code yet but
I think what is happening is that ORDER BY knows that and gets rid of
the duplicate entries while DISTINCT ON fails to do so.  Or some story
approximately like that.  It should be fixed, but the immediate
workaround is just to get rid of the redundant sort keys:

SELECT DISTINCT ON
("property"."state",
  CASE WHEN ("search_rate_max" IS NOT NULL) THEN 1 ELSE 2 END,
  "search_rate_max",
--  CASE WHEN ("search_rate_max" IS NOT NULL) THEN 1 ELSE 2 END,
--  "search_rate_max",
  property.id)
  property.id
FROM property WHERE (("property"."data_priority_code" IS NOT NULL))
ORDER BY
   "property"."state",
   CASE WHEN ("search_rate_max" IS NOT NULL) THEN 1 ELSE 2 END,
   "search_rate_max",
--   CASE WHEN ("search_rate_max" IS NOT NULL) THEN 1 ELSE 2 END,
--   "search_rate_max",
   property.id
LIMIT 10 OFFSET 0

BTW, why are you bothering with the CASEs at all?  Null values of
search_rate_max would sort high already.

regards, tom lane

-- 
Sent via pgsql-sql mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-sql

Re: [SQL] Rollback in Postgres

2008-07-16 Thread Kaare Rasmussen

> No, they developed it for marketing.

Perhaps, but towards whom? PostgreSQL wouldn't hurt if a lot of developers and 
DBA's was lured into the trap by this new feature.

> Keep in mind that Oracle has six thousand full-time developers and an
> already extremely mature database.  Stuff that they see fit to add is
> not necessarily going to be on our radar screen in the foreseeable
> future.

I wasn't proposing to add it in 8.4. Just to add it to the TODO. Perhaps 
someone would look at it some point in the future.

-- 

Med venlig hilsen
Kaare Rasmussen, Jasonic

Jasonic Telefon: +45 3816 2582
Nordre Fasanvej 12
2000 Frederiksberg  Email: [EMAIL PROTECTED]

-- 
Sent via pgsql-sql mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-sql

Re: [SQL] How to GROUP results BY month

2008-07-16 Thread Mark Roberts


On Wed, 2008-07-16 at 07:39 +0200, A. Kretschmer wrote:
> am  Tue, dem 15.07.2008, um 18:15:07 -0700 mailte Mark Roberts
> folgendes:
> > 
> > On Tue, 2008-07-15 at 14:31 +0200, A. Kretschmer wrote:
> > > am  Tue, dem 15.07.2008, um 13:12:39 +0100 mailte Oliveiros
> Cristina folgendes:
> > > > Howdy, all,
> > > > 
> > > > I have a problem.
> > > > 
> > > > I have a table which one of the fields is of type date.
> > > > 
> > > > I need to obtain the totals of the other fields in a  by-month
> basis
> > > > IS there any easy way to do this using the GROUP BY or any other
> construct?
> > > 
> > > ... group by extract(month from date)
> > > 
> > > 
> > > Andreas
> > 
> > It's worth noting that extract(month from timestamp) returns a
> month_no, and thus will not be suitable for grouping queries that span
> years.
> 
> Right, but that wasn't the question...

Honestly, the way the question was phrased, I'd have assumed that it
wanted to group by month (not group by a group of months).  Jan 08 is
distinct from Jan 07.

Please accept my sincerest apologies if you you feel that I
misinterpreted the question.  I was merely trying to illustrate the
difference between what each approach was.

-Mark


-- 
Sent via pgsql-sql mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-sql

Re: [SQL] Problem with ORDER BY and DISTINCT ON

2008-07-16 Thread Steve Midgley


At 07:29 AM 7/16/2008, Tom Lane wrote:

Steve Midgley <[EMAIL PROTECTED]> writes:
Interesting.  You realize of course that sorting by the same 
expression

twice is completely redundant?  I haven't dug through the code yet but
Thanks Tom. Yeah, I was a little embarrassed to throw this code up on 
the list b/c it's pretty weak. It's basically machine written - so 
sometimes the CASE features a different "THEN X ELSE Y" order.



I think what is happening is that ORDER BY knows that and gets rid of
the duplicate entries while DISTINCT ON fails to do so.  Or some story
approximately like that.  It should be fixed, but the immediate
workaround is just to get rid of the redundant sort keys:
I don't know if this will help track down the problem, but I figured 
out last night that ORDER BY is totally fine with having TWO duplicate 
entries, so long as I only put ONE entry in the DISTINCT ON area.


Of course removing the duplicate from both areas is the correct 
solution and I broke down and hacked that into the auto-sql-writing 
code and so my immediate problem is solved. I'm happy to file this as a 
ticket for Pg (please point me to your ticket tool as I've never used 
it). This is not a very big deal but Pg has such a high compliance with 
wacky-but-valid SQL it does seem like it should be fixed just because. 
Let me know if I can help on that.


Best,

Steve


--
Sent via pgsql-sql mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-sql

[SQL] pg_advisory_lock(bigint) vs. LOCK TABLE

2008-07-16 Thread Volkan YAZICI

Hi,

What's the difference between below two queue implementations?

  --
  -- With advisory locks.
  --
  BEGIN;
  SELECT pg_advisory_lock((SELECT oid
 FROM pg_class
WHERE relname = 'queue'));
  DELETE FROM queue
WHERE id = (SELECT MIN(id) FROM queue)
RETURNING id, val;
  SELECT pg_advisory_unlock((SELECT oid
   FROM pg_class
  WHERE relname = 'queue'));
  COMMIT;
  
  --
  -- With LOCK table command.
  --
  BEGIN;
  LOCK TABLE queue;
  DELETE FROM queue
WHERE id = (SELECT MIN(id) FROM queue)
RETURNING id, val;
  COMMIT;

Any helps will be appreciated. (BTW, yep, I'm aware of PGQ module of
skyytools[1].)


Regards.

[1] http://skytools.projects.postgresql.org/

-- 
Sent via pgsql-sql mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-sql

Re: [SQL] PERSISTANT PREPARE (another point of view)

2008-07-16 Thread Milan Oparnica


Milan Oparnica wrote:


It's simply to complicated to return recordsets through server-side 
stored procedures. They are obviously designed to do complex data 
manipulation, returning few output variables informing the caller about 
final results. Returning records through sets of user-defined-types is 
memory and performance waste (please see my previous post as reply to 
Steve for more details). Plus it's hard to maintain and make 
improvements to such a system. I hate to see 800 user types made for 
every query we made as stored procedure.


Is this topic completely out of scope in Postgre ?
If I'm missing something too obvious or too important, please let me 
know what.


I run over and over through internet and Postgre documentation and still 
found nothing.


Is there a better place to communicate with Postgre developers ?

Sincerely,

Milan Oparnica

--
Sent via pgsql-sql mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-sql

Re: [SQL] PERSISTANT PREPARE (another point of view)

2008-07-16 Thread Pavel Stehule

Hello

2008/7/16 Milan Oparnica <[EMAIL PROTECTED]>:
> Milan Oparnica wrote:
>>
>> It's simply to complicated to return recordsets through server-side stored
>> procedures. They are obviously designed to do complex data manipulation,
>> returning few output variables informing the caller about final results.
>> Returning records through sets of user-defined-types is memory and
>> performance waste (please see my previous post as reply to Steve for more
>> details). Plus it's hard to maintain and make improvements to such a system.
>> I hate to see 800 user types made for every query we made as stored
>> procedure.
>
> Is this topic completely out of scope in Postgre ?
> If I'm missing something too obvious or too important, please let me know
> what.
>
> I run over and over through internet and Postgre documentation and still
> found nothing.
>

try to write prototype and show advantages. I am able to undestand
advantages of persistent prep. stamenents, but I see some disadvatage
too. Mainly you have to manage some shared memory space for stored
plans. It's not easy task - MySQL develepoers can talk. Implemenation
on postgresql is little bit dificult - lot of structures that lives in
processed memory have to be moved to shared memory.

This feature is nice, but question is - who do write it? Actually this
problem is solved from outside - with pooling.

Regards
Pavel Stehule

> Is there a better place to communicate with Postgre developers ?
>
> Sincerely,
>
> Milan Oparnica
>
> --
> Sent via pgsql-sql mailing list ([email protected])
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-sql
>

-- 
Sent via pgsql-sql mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-sql

[SQL] Problem with ORDER BY and DISTINCT ON

[SQL] integrity check and visibility was: COPY equivalent for updates

Re: [SQL] Problem with ORDER BY and DISTINCT ON

Re: [SQL] Rollback in Postgres

Re: [SQL] How to GROUP results BY month

Re: [SQL] Problem with ORDER BY and DISTINCT ON

[SQL] pg_advisory_lock(bigint) vs. LOCK TABLE

Re: [SQL] PERSISTANT PREPARE (another point of view)

Re: [SQL] PERSISTANT PREPARE (another point of view)

9 matches

Site Navigation

Mail list logo

Footer information