On May 8, 2005, at 6:51 PM, Russell Smith wrote:

On Mon, 9 May 2005 09:20 am, Dan Harris wrote:
You cannot use an index for %CORVETTE%, or %RED%. There is no way
for the index to know if a row had that in the middle without scanning the whole
index. So it's much cheaper to do a sequence scan.



While I believe you, I'm confused by this line in my original EXPLAIN ANALYZE:


-> Index Scan using ea1 on ea (cost=0.00..2736.43 rows=42 width=47) (actual time=2.085..2.309 rows=2 loops=473)
Index Cond: ((ea.incidentid)::text = ("outer".incidentid)::text)
Filter: (((recordtext)::text ~~ '%RED%'::text) OR ((recordtext)::text ~~ '%CORVETTE%'::text))

Doesn't that mean it was using an index to filter? Along those lines, before I created index 'ea1', the query was much much slower. So, it seemed like creating this index made a difference.


One possible way to make the query faster is to limit based on date, as you will only get about 700 rows.
And then don't use subselects, as they are doing full sequence scans. I think this query does what you do
above, and I think it will be faster, but I don't know.



I REALLY like this idea! If I could just filter by date first and then sequential scan through those, it should be very manageable. Hopefully I can keep this goal while still accommodating the requirement listed in my next paragraph.


select distinct em.incidentid, ea.recordtext as retdata, eg.long, eg.lat
FROM em JOIN ea ON (em.incidentid = ea.incidentid AND em.entrydate >= '2005-1-1 00:00'
AND em.entrydate <= '2005-5-9 00:00' AND ea.recordtext like '%RED%' AND ea.recordtext like '%CORVETTE%')
JOIN eg ON em.incidentid = eg.incidentid WHERE (recordtext like '%RED%' or recordtext like '%CORVETTE%' );



I have run this, and while it is very fast, I'm concerned it's not doing what I need. Here's the situation:


Due to the format of the systems with which I integrate ( I have no control over these formats ), we will get these 'recordtext' values one line at a time, accumulating over time. The only way I can find to make this work is to insert a new record for each line. The problem is, that when someone wants to search multiple keywords, they expect these words to be matched across multiple records with a given incident number.

 For a very simple example:

IncidentID Date Recordtext
-------------- ------------- -------------------------------------------------------
11111 2005-05-01 14:21 blah blah blah RED blah blah
2222 2005-05-01 14:23 not what we are looking for
11111 2005-05-02 02:05 blah CORVETTE blah blah


So, doing a search with an 'and' condition, e.g. WHERE RECORDTEXT LIKE '%RED%' AND RECORDTEXT LIKE '%CORVETTE%' , will not match because the condition will only be applied to a single row of recordtext at a time, not a whole group with the same incident number.

If I were to use tsearch2 for full-text indexing, would I need to create another table that merges all of my recordtext rows into a single 'text' field type? If so, this is where I run into problems, as my logic also needs to match multiple words in their original order. I may also receive additional updates to the previous data. In that case, I need to replace the original record with the latest version of it. If I have already concatenated these rows into a single field, the logic to in-line replace only the old text that has changed is very very difficult at best. So, that's the reason I had to do two subqueries in my example. Please tell me if I misunderstood your logic and it really will match given my condition above, but it didn't seem like it would.

Thanks again for the quick responses! This list has been a great resource for me.

-Dan


---------------------------(end of broadcast)--------------------------- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match

Reply via email to