[SQL] select is fast, update based on same where clause is slow

2001-09-21 Thread Jeff Barrett

I have a select statement that returns a set of 74,000+ results back in
under a minute as follows:

select s.sessid, s.membid, s.datetime
from sessions2 s, (select min(datetime) as datetime, membid
  from sessions2
  where membid is not null
  group by membid) as minsess
where s.membid = minsess.membid
and s.datetime = minsess.datetime;

The final cost from explain for the above select is 22199.15 ... 24318.40
with rows = 5 and width = 28.

Then I issue an update as follows (to update those 74,000+ rows):
update sessions2 set sinceinception = 0
from sessions2 s, (select min(datetime) as datetime, membid from sessions2
group by membid) as mindate
where s.membid = mindate.membid
and s.datetime = mindate.datetime;

The final cost from explain for the above update is 31112.11...98869.91 with
rows = 2013209 and width=87.

This update statement has been left running over night and does not
complete. The ram usage on the machine is at about 3/4 capacity (800mb)
during this time and CPU usage is near 100%. The machine has the -F option
set and memory segments of 200mb and is running 7.1.2.

What could be causing this update statement to not complete?
Why are the costs so different since it seems to me that besides the cost of
the update they are the same query?

Any help would be great!

Jeff Barrett



---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your
message can get through to the mailing list cleanly



Re: [SQL] select is fast, update based on same where clause is slow

2001-09-21 Thread Stephan Szabo

On Fri, 21 Sep 2001, Jeff Barrett wrote:

> I have a select statement that returns a set of 74,000+ results back in
> under a minute as follows:
> 
> select s.sessid, s.membid, s.datetime
> from sessions2 s, (select min(datetime) as datetime, membid
>   from sessions2
>   where membid is not null
>   group by membid) as minsess
> where s.membid = minsess.membid
> and s.datetime = minsess.datetime;
> 
> The final cost from explain for the above select is 22199.15 ... 24318.40
> with rows = 5 and width = 28.
> 
> Then I issue an update as follows (to update those 74,000+ rows):
> update sessions2 set sinceinception = 0
> from sessions2 s, (select min(datetime) as datetime, membid from sessions2
> group by membid) as mindate
> where s.membid = mindate.membid
> and s.datetime = mindate.datetime;
> 
> The final cost from explain for the above update is 31112.11...98869.91 with
> rows = 2013209 and width=87.
> 
> This update statement has been left running over night and does not
> complete. The ram usage on the machine is at about 3/4 capacity (800mb)
> during this time and CPU usage is near 100%. The machine has the -F option
> set and memory segments of 200mb and is running 7.1.2.
> 
> What could be causing this update statement to not complete?
> Why are the costs so different since it seems to me that besides the cost of
> the update they are the same query?

I thought that the updated table is always in your from list (implicitly),
so you'd want:
update sessions2 set sinceinception = 0
from (select min(datetime) as datetime, membid from sessions2 group by
 membid) as mindate
where sessions2.membid=mindate.membid and 
sessions2.datetime=mindate.datetime;

I think your query would be joining the s/mindate results against
sessions2.



---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your
message can get through to the mailing list cleanly



Re: [SQL] select is fast, update based on same where clause is slow

2001-09-21 Thread Josh Berkus

Jeff,

I think that you're running into a syntactical problem here:

> Then I issue an update as follows (to update those 74,000+ rows):
> update sessions2 set sinceinception = 0
> from sessions2 s, (select min(datetime) as datetime, membid from
> sessions2
> group by membid) as mindate
> where s.membid = mindate.membid
> and s.datetime = mindate.datetime;

In the query above, there is no linkage between the instance of
sessions2 (sessions2) you are updating and the instance you are limiting
(s).  As a result, you are running an update on 74,000^2 rows (about 55
trillion), and if the query ever completed you would find that
sinceinception would be set to 0 for all rows in sessions2.  Your query
should read:

 update sessions2 set sinceinception = 0
 from (select min(datetime) as datetime, membid from
   sessions2 group by membid) as mindate
 where sessions2.membid = mindate.membid
   and sessions2.datetime = mindate.datetime;

It's a peculiarity of Update queries that the name of the updated table
cannot be aliased anywhere in the query.  To add to the confusion, in MS
Transact SQL, you are expected to repeat the name of the updated table
in the FROM clause, while in Postgres such repetition is prohibited.

This is mainly due to the fact that UPDATE ... FROM is not well-defined
in the SQL 92 standard.

-Josh


__AGLIO DATABASE SOLUTIONS___
   Josh Berkus
  Complete information technology  [EMAIL PROTECTED]
   and data management solutions   (415) 565-7293
  for law firms, small businessesfax 621-2533
and non-profit organizations.  San Francisco

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster



Re: [SQL] Selecting latest value II

2001-09-21 Thread Thurstan R. McDougle

Hi back

Carl van Tast wrote:
> 
> Hi, Thurstan
> 
> On Thu, 20 Sep 2001 17:30:46 +0100, "Thurstan R. McDougle"
> <[EMAIL PROTECTED]> wrote:
> 
> > [...]
> >Carl van Tast had 2 good methods as follows
> >
> >SELECT userid, val
> >FROM tbl
> >WHERE NOT EXISTS (SELECT * FROM tbl AS t2
> >  WHERE tbl.userid=t2.userid AND t2.ts > tbl.ts);
> >
> >or
> >
> >SELECT tbl.userid, tbl.val
> >FROM tbl
> > INNER JOIN
> > (SELECT userid, max(ts) AS maxts
> >  FROM tbl
> >  GROUP BY userid) AS t2
> > ON (tbl.userid=t2.userid AND tbl.ts=t2.maxts);
> 
> ... although I like Tom Lane's even better. This one should outperform
> all others, especially my first one:
> 
> > SELECT DISTINCT ON (userid) userid, val, ts FROM table
> > ORDER BY userid, ts DESC;
Indeed.  I was looking for a 1 pass method but had not found one.  I had
thought of DISTINCT, but not DISTINCT ON as it I have not used it yet...
Even though I had just been looking at the code for the
DISTINCT/DISTINCT ON processing!

> 
> That's the reason I'm here: learning by helping :-)
Well, I think of helping more as paying my dues for the help received.

> 
> Kind regards
>  Carl van Tast

-- 
This is the identity that I use for NewsGroups. Email to 
this will just sit there. If you wish to email me replace
the domain with knightpiesold . co . uk (no spaces).

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your
message can get through to the mailing list cleanly



Re: [SQL] select is fast, update based on same where clause is slow

2001-09-21 Thread Jeff Barrett

That did the trick. Thank you for the quick detailed answer. It runs in
about a minute now.

Jeff Barrett

"Stephan Szabo" <[EMAIL PROTECTED]> wrote in message
[EMAIL PROTECTED]">news:[EMAIL PROTECTED]...
> On Fri, 21 Sep 2001, Jeff Barrett wrote:
>
> > I have a select statement that returns a set of 74,000+ results back in
> > under a minute as follows:
> >
> > select s.sessid, s.membid, s.datetime
> > from sessions2 s, (select min(datetime) as datetime, membid
> >   from sessions2
> >   where membid is not null
> >   group by membid) as minsess
> > where s.membid = minsess.membid
> > and s.datetime = minsess.datetime;
> >
> > The final cost from explain for the above select is 22199.15 ...
24318.40
> > with rows = 5 and width = 28.
> >
> > Then I issue an update as follows (to update those 74,000+ rows):
> > update sessions2 set sinceinception = 0
> > from sessions2 s, (select min(datetime) as datetime, membid from
sessions2
> > group by membid) as mindate
> > where s.membid = mindate.membid
> > and s.datetime = mindate.datetime;
> >
> > The final cost from explain for the above update is 31112.11...98869.91
with
> > rows = 2013209 and width=87.
> >
> > This update statement has been left running over night and does not
> > complete. The ram usage on the machine is at about 3/4 capacity (800mb)
> > during this time and CPU usage is near 100%. The machine has the -F
option
> > set and memory segments of 200mb and is running 7.1.2.
> >
> > What could be causing this update statement to not complete?
> > Why are the costs so different since it seems to me that besides the
cost of
> > the update they are the same query?
>
> I thought that the updated table is always in your from list (implicitly),
> so you'd want:
> update sessions2 set sinceinception = 0
> from (select min(datetime) as datetime, membid from sessions2 group by
>  membid) as mindate
> where sessions2.membid=mindate.membid and
> sessions2.datetime=mindate.datetime;
>
> I think your query would be joining the s/mindate results against
> sessions2.
>
>
>
> ---(end of broadcast)---
> TIP 3: if posting/reading through Usenet, please send an appropriate
> subscribe-nomail command to [EMAIL PROTECTED] so that your
> message can get through to the mailing list cleanly



---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])



Re: Q on "Re: [SQL] select is fast, update based on same where clause is slow "

2001-09-21 Thread Josh Berkus

Bladvin,

> I understand why did Jeff's original solution not work. I understand
> why yours do.
> 
> But could you please explain me how the 74000^2 can be calculated
> for this original query?:

Sure.  In Jeff's original query, the updated instance of the table
(sessions2) and the referenced instance (s) are seperate, without any
JOIN or WHERE conditions to link them.  Under these conditions, SQL
engines do a CROSS JOIN, where every single one of the rows in the first
table is matched against every single one of the rows in the second
table.  This gives you a result set of the number of rows in sessions2 *
the number of rows in s.  74,000^2  Get it?

Example:

TableA:
1   A
2   B

Table B
7   H
8   G

SELECT * FROM TableA, TableB
1   A   7   H
1   A   8   G
2   B   7   H
2   B   8   G

-Josh

-Josh


__AGLIO DATABASE SOLUTIONS___
   Josh Berkus
  Complete information technology  [EMAIL PROTECTED]
   and data management solutions   (415) 565-7293
  for law firms, small businessesfax 621-2533
and non-profit organizations.  San Francisco

---(end of broadcast)---
TIP 6: Have you searched our list archives?

http://archives.postgresql.org