They are all 'not null' and I am trying to do exactly the kind of task you described. I tried the first example on my DB and got a syntax error:


tle-bu=> SELECT a.file_name, a.file_parent_dir, a.file_type FROM file_info_1 a WHERE NOT EXIST (SELECT NULL FROM file_set_1 b WHERE b.fs_name=a.file_name, b.fs_parent_dir=a.file_parent_dir, b.fs_type=a.file_type);
ERROR: syntax error at or near "SELECT" at character 88


I've quickly read the thread and I don't think you got an answer as to why you are getting a syntax error here. Your query shows something line

WHERE b.fs_name=a.file_name, b.fs_parent_dir=a.file_parent_dir

you need to put an AND instead of a comma:

WHERE b.fs_name=a.file_name AND b.fs_parent_dir=a.file_parent_dir



As for which of the queries is best I don't know. My background is as an Oracle developer. I think that Bruno already suggested testing the three queries. There is a trace utility which shows some of what happens under the covers of a query. I've used it extensively in Oracle but have never used it in Postgresql.

If I understand what you said, the NOT IN was significantly slower. That has been my experience in Oracle long time ago so I've tended to shy away from that syntax. I'm sure optimizers are much better now then when I experimented with NOT IN but my coworker who tried it in Oracle was getting a slower response than with a subselect about a year ago. Theoretically if 3 queries are logically equivalent as the three queries you've been given, an optimizer should find the same best query plan to execute it. I don't think that optimizers are that smart yet.

The outer join is probably doing either a sort merge or a hash join. In your application this should be the best option. (A sort merge sorts both tables first or at least the key columns and then merges the tables together.)

Bruno said that the subselect would be slower. It may be that he thinks it will do a nested loop. That is that it will read each row in table A and try to find that concatenated key in table B's index. I don't think that a nested loop would be very good in your particular application.

As for the indexes you set up, I think they are correct indexes.

Vincent






---------------------------(end of broadcast)--------------------------- TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faqs/FAQ.html

Reply via email to