Re: Using indexes
On Tue, 28 Sep 2021, 17:52 Rick Hillegas, wrote: > Glad that you have made progress on this puzzle. One comment inline... > > On 9/28/21 4:10 AM, John English wrote: > > The WHERE clause seems to prevent the index from being used. > The index is not usable in this query because username is not the > leading column in the index. > Ah, right. >
Re: Using indexes
Glad that you have made progress on this puzzle. One comment inline... On 9/28/21 4:10 AM, John English wrote: A couple more data points, from testing different variants of the inner select: 1) SELECT time FROM system_log ORDER BY time DESC NULLS LAST FETCH NEXT 20 ROWS ONLY; 419ms, "Index Scan ResultSet for SYSTEM_LOG using index LOG_INDEX at read uncommitted isolation level using share row locking chosen by the optimizer" 2) SELECT time FROM system_log WHERE username='foo' ORDER BY time DESC NULLS LAST FETCH NEXT 20 ROWS ONLY; 2245ms, "Table Scan ResultSet for SYSTEM_LOG at read uncommitted isolation level using share row locking chosen by the optimizer" Removing "FETCH NEXT 20 ROWS ONLY": 1) 388,237 rows in 3859ms, otherwise same as above (uses index). 2) 23,211 rows in 2199ms, otherwise same as above (doesn't use index). The WHERE clause seems to prevent the index from being used. The index is not usable in this query because username is not the leading column in the index.
Re: Using indexes
A couple more data points, from testing different variants of the inner select: 1) SELECT time FROM system_log ORDER BY time DESC NULLS LAST FETCH NEXT 20 ROWS ONLY; 419ms, "Index Scan ResultSet for SYSTEM_LOG using index LOG_INDEX at read uncommitted isolation level using share row locking chosen by the optimizer" 2) SELECT time FROM system_log WHERE username='foo' ORDER BY time DESC NULLS LAST FETCH NEXT 20 ROWS ONLY; 2245ms, "Table Scan ResultSet for SYSTEM_LOG at read uncommitted isolation level using share row locking chosen by the optimizer" Removing "FETCH NEXT 20 ROWS ONLY": 1) 388,237 rows in 3859ms, otherwise same as above (uses index). 2) 23,211 rows in 2199ms, otherwise same as above (doesn't use index). The WHERE clause seems to prevent the index from being used. -- John English -- This email has been checked for viruses by AVG. https://www.avg.com
Re: Using indexes
On 27/09/2021 18:23, John English wrote: I'll try the temporary table approach as soon as I get a minute, and will let you know what happens. I'm finally getting somewhere. Having (a) moved the row counting out to a separate query, (b) renamed columns in the table to match what my code requires so I can access the table directly instead of using a view, and (c) using your (Rick's) original idea of a nested select (which I decided to try again before going for a temporary table), I get an execution time of 455ms (woo-hoo!) for this query: SELECT DateTimeFormat(t_time,null) AS t_time,facility,event,details FROM system_log, (SELECT time FROM system_log ORDER BY time DESC NULLS LAST FETCH NEXT 20 ROWS ONLY) AS x WHERE system_log.time=x.time; However, if I add in a WHERE clause: SELECT DateTimeFormat(t_time,null) AS t_time,facility,event,details FROM system_log, (SELECT time FROM system_log WHERE username='foo' ORDER BY time DESC NULLS LAST FETCH NEXT 20 ROWS ONLY) AS x WHERE system_log.time=x.time; the execution time goes up to 2 seconds (I'm still happy!). The left result set (the nested select, by the look of it) still doesn't use the index: Left result set: Row Count (2): Number of opens = 1 Rows seen = 20 Rows filtered = 0 ... Source result set: Sort ResultSet: Number of opens = 1 Rows input = 23211 Rows returned = 20 Eliminate duplicates = false In sorted order = false Sort information: Number of rows input=23211 Number of rows output=23211 Sort type=internal ... optimizer estimated row count: 38824.20 optimizer estimated cost: 399695.58 Source result set: Project-Restrict ResultSet (4): Number of opens = 1 Rows seen = 23211 Rows filtered = 0 restriction = false projection = true ... Source result set: Table Scan ResultSet for SYSTEM_LOG at read uncommitted isolation level using share row locking chosen by the optimizer Number of opens = 1 Rows seen = 23211 but the right result set uses the PK index. So, I'm in a much better position as the result of Rick's advice, but I still have no idea why the descending PK index isn't ever used!!! -- John English -- This email has been checked for viruses by AVG. https://www.avg.com
Re: Using indexes
On 26/09/2021 22:57, Rick Hillegas wrote: The support for FETCH/OFFSET is pretty minimal. Probably, the optimizer isn't smart enough to know that the subquery returns only 20 small rows. What happens if you dump the results of the subquery into a temporary table and then join that with system_log? Thanks, I'll give it a whirl. Meanwhile,I've got it down to about 5 or 6 seconds by (a) eliminating the view in favour of a direct table access, and (b) using a separate SELECT COUNT(*) to get the row count, again using the table directly and without using an ORDER BY clause. (I renamed the id and time columns to time and t_time respectively, so the names match what the code expects to get back from the view, and then recreated the index.) It still doesn't seem to be using the index (see trace below) but it's an improvement. Sorting by time ASC use the PK index and takes about 200ms, but time DESC doesn't use an index and takes about 3s. And I still don't understand why it's ignoring the index. I'll try the temporary table approach as soon as I get a minute, and will let you know what happens. Thanks for the help, -- John English Statement Text: SELECT DateTimeFormat(t_time,null) AS t_time,facility,event,details,name,username FROM system_log ORDER BY time DESC NULLS LAST FETCH NEXT 20 ROWS ONLY Parse Time: 19 Bind Time: 1 Optimize Time: 1 Generate Time: 0 Compile Time: 21 Execute Time: 4474 Begin Compilation Timestamp : 2021-09-27 17:52:23.802 End Compilation Timestamp : 2021-09-27 17:52:23.823 Begin Execution Timestamp : 2021-09-27 17:52:23.824 End Execution Timestamp : 2021-09-27 17:52:28.298 Statement Execution Plan Text: Scroll Insensitive ResultSet: Number of opens = 1 Rows seen = 20 Number of reads from hash table = 20 Number of writes to hash table = 20 constructor time (milliseconds) = 0 open time (milliseconds) = 4474 next time (milliseconds) = 0 close time (milliseconds) = 0 optimizer estimated row count: 388236.00 optimizer estimated cost: 399685.32 Source result set: Row Count (1): Number of opens = 1 Rows seen = 20 Rows filtered = 0 constructor time (milliseconds) = 0 open time (milliseconds) = 4474 next time (milliseconds) = 0 close time (milliseconds) = 0 optimizer estimated row count: 388236.00 optimizer estimated cost: 399685.32 Source result set: Project-Restrict ResultSet (5): Number of opens = 1 Rows seen = 20 Rows filtered = 0 restriction = false projection = true constructor time (milliseconds) = 0 open time (milliseconds) = 4474 next time (milliseconds) = 0 close time (milliseconds) = 0 restriction time (milliseconds) = 0 projection time (milliseconds) = 0 optimizer estimated row count: 388236.00 optimizer estimated cost: 399685.32 Source result set: Sort ResultSet: Number of opens = 1 Rows input = 388231 Rows returned = 20 Eliminate duplicates = false In sorted order = false Sort information: Number of rows input=388231 Number of rows output=388231 Sort type=internal constructor time (milliseconds) = 0 open time (milliseconds) = 4474 next time (milliseconds) = 0 close time (milliseconds) = 0 optimizer estimated row count: 388236.00 optimizer estimated cost: 399685.32 Source result set: Project-Restrict ResultSet (3): Number of opens = 1 Rows seen = 388231 Rows filtered = 0 restriction = false projection = true constructor time (milliseconds) = 0 open time (milliseconds) = 0 next time (milliseconds) = 4232 close time (milliseconds) = 3 restriction time (milliseconds) = 0 projection time (milliseconds) = 1807
Re: Using indexes
On 9/25/21 11:59 AM, John English wrote: On 25/09/2021 21:14, Rick Hillegas wrote: On 9/25/21 7:39 AM, John English wrote: SELECT id,DateTimeFormat(time,null) AS t_time,name,username,facility,event,details FROM system_log ORDER BY id DESC NULLS LAST FETCH FIRST 20 ROWS ONLY; I can remember whether you tried to rewrite the query to use a subquery. Something like this: SELECT id, time AS t_time,name,username,facility,event,details FROM system_log s, ( SELECT id AS log_id FROM system_log ORDER BY id DESC NULLS LAST FETCH FIRST 20 ROWS ONLY ) t WHERE s.id = t.log_id ; Does that help? -Rick No, when I tried it, it made it even worse if anything. The subquery still fetches all 400,000 rows, sorts them without using the index, and then uses the results to select again in the outer query. I really don't understand why the index is ignored. The support for FETCH/OFFSET is pretty minimal. Probably, the optimizer isn't smart enough to know that the subquery returns only 20 small rows. What happens if you dump the results of the subquery into a temporary table and then join that with system_log?
Re: Using indexes
On 25/09/2021 21:14, Rick Hillegas wrote: SELECT id, time AS t_time,name,username,facility,event,details FROM system_log s, ( SELECT id AS log_id FROM system_log ORDER BY id DESC NULLS LAST FETCH FIRST 20 ROWS ONLY ) t WHERE s.id = t.log_id ; Ideally I want to access the table through a view, defined like this: CREATE VIEW system_log_view AS SELECT time AS t_time, facility, event, details, name, username, id AS time FROMsystem_log (The renamings may seem strange but they serve a purpose!) Selecting the ids directly from the table takes 0.4s: SELECT id FROM system_log ORDER BY id DESC NULLS LAST FETCH FIRST 20 ROWS ONLY; Going through the view takes much longer (2.9s): SELECT time FROM system_log_view ORDER BY time DESC NULLS LAST FETCH FIRST 20 ROWS ONLY; It seems like the view is materializing all the rows, and then 20 rows are being selected from the 400,000: Source result set: Sort ResultSet: Number of opens = 1 Rows input = 388226 Rows returned = 20 In any case I want to know the total number of rows, even though I display them in pages of 20 at a time. I thought I might get away with using a separate query for this, but no: SELECT COUNT(*) FROM system_log; -- 0.271s SELECT COUNT(*) FROM system_log_view; -- 2.184s I don't understand why the view is so much slower than accessing the table directly. The direct table access says this: Index Scan ResultSet for SYSTEM_LOG using index LOG_INDEX at read uncommitted isolation level using share row locking chosen by the optimizer The corresponding line from executing the view doesn't mention the index: Table Scan ResultSet for SYSTEM_LOG at read uncommitted isolation level using share row locking chosen by the optimizer What am I not understanding about views here? -- John English -- This email has been checked for viruses by AVG. https://www.avg.com
Re: Using indexes
On 25/09/2021 21:14, Rick Hillegas wrote: On 9/25/21 7:39 AM, John English wrote: SELECT id,DateTimeFormat(time,null) AS t_time,name,username,facility,event,details FROM system_log ORDER BY id DESC NULLS LAST FETCH FIRST 20 ROWS ONLY; I can remember whether you tried to rewrite the query to use a subquery. Something like this: SELECT id, time AS t_time,name,username,facility,event,details FROM system_log s, ( SELECT id AS log_id FROM system_log ORDER BY id DESC NULLS LAST FETCH FIRST 20 ROWS ONLY ) t WHERE s.id = t.log_id ; Does that help? -Rick No, when I tried it, it made it even worse if anything. The subquery still fetches all 400,000 rows, sorts them without using the index, and then uses the results to select again in the outer query. I really don't understand why the index is ignored. -- John English -- This email has been checked for viruses by AVG. https://www.avg.com
Re: Using indexes
On 9/25/21 7:39 AM, John English wrote: SELECT id,DateTimeFormat(time,null) AS t_time,name,username,facility,event,details FROM system_log ORDER BY id DESC NULLS LAST FETCH FIRST 20 ROWS ONLY; I can remember whether you tried to rewrite the query to use a subquery. Something like this: SELECT id, time AS t_time,name,username,facility,event,details FROM system_log s, ( SELECT id AS log_id FROM system_log ORDER BY id DESC NULLS LAST FETCH FIRST 20 ROWS ONLY ) t WHERE s.id = t.log_id ; Does that help? -Rick