Thanks Dennis! That's exactly what I was looking for.
The column I'm updating _is_ indexed and I can understand why it would
take long to update. I'll double check to see if the index is really
needed.
However, I did go ahead and implement your other suggestions and tested
it on my system which is a recent copy of the actual db.
Holy carp! Processing time went from about 30 seconds down to about 1
second!
I have one question on your point #2: I assume TxnHist should be listed
first in the FROM clause because it is the table being updated and not
because it is the larger (more rows) of the two tables.
Thanks again and best wishes for continued recovery from your auto accident.
Doug
On 2/3/2015 9:48 AM, Dennis McGrath wrote:
Doug,
First of all, is the column you are updating indexed? That would slow updating
it tremendously on a table this long.
If that is not the case I would do this:
1. Make a multi column index on your temp table for the 3 columns in the order
that is used in your joining where clause.
2. Make the temp table the second table, not the first.
3. Set manopt on to make sure R:BASE follows your optimization.
4. Use this syntax (no parenthesis around the where clause):
UPDATE TxnHist +
SET ChryInvNbr = INV.ChryInvNbr +
FROM TxnHist TXN, ChryInvDtlTmp INV +
WHERE +
TXN.VPlNmbr = INV.VPlNmbr AND +
TXN.CusPnbr = INV.CusPnbr AND +
TXN.TxDate = INV.InvoiceDate
This will avoid trying to use any of the single indexes in TxnHist, and use a
very efficient multi-column index to get the update value from the temp table.
Further optimization can be done by changing the where clause (and temp index)
clause so the most unique column is first.
I suspect InvoiceDate would be the most unique, but only you can answer that
question.
Dennis McGrath
-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Doug
Hamilton
Sent: Monday, February 02, 2015 8:15 PM
To: RBASE-L Mailing List
Subject: [RBASE-L] - WHERE Clause, indices and parens questions
Hi All -
I'm updating a column in a 1.2 megarow table (TxnHist) with values from
a column in a temp table (ChryPaid) that may have a few thousand rows,
based on three columns matching between the two tables.
I remember previous posts about use of parenthesis to force (or ignore?)
use of indexed columns and the best order in which to list the columns.
Unfortunately I can't find that post 'cause I'm asking those questions
again:
How should I use parenthesis to optimize the WHERE clause?
Does the order of the columns matter? i.e. most unique listed first
or last?
Are these even a concern if MANOPT is OFF?
The reason I'm asking is that it took about an hour run the following
command:
UPDATE TxnHist +
SET ChryInvNbr = INV.ChryInvNbr +
FROM ChryInvDtlTmp INV, TxnHist TXN +
WHERE +
(INV.VPlNmbr = TXN.VPlNmbr AND +
INV.CusPnbr = TXN.CusPnbr AND +
INV.InvoiceDate = TXN.TxDate)
Column CusPnbr in table TxnHist is an FK, the other five columns in the
WHERE clause are single-column indices.
The most unique column in TxnHist is VPlNmbr.
Settings in the RBASE.DAT file are:
SET STATICDB ON
SET FASTLOCK ON
SET QUALCOLS 2
MANOPT is not set in RBASE.DAT or in code, I believe, so it is OFF by
default.
Per Taskmanager, Network utilization is 17% on a 100 mbps connection
(high?). CPU usage is about 10% to 15%.
This is on an XP machine with 2 Gb RAM.
R:Base is version 9.5.5.20116.
Thanks much for your help and my apologies for the repeat questions.
Doug
---
This email has been checked for viruses by Avast antivirus software.
http://www.avast.com
---
This email has been checked for viruses by Avast antivirus software.
http://www.avast.com