[PERFORM] How to speed up this translation query?

2006-08-01 Thread tlm
I need some expert advice on how to optimize a translation query (this word choice will become clear shortly, I hope).

Say I have aHUMONGOUS table of foreign language translations (call it TRANS) with records like these:

meaning_id: 1
language_id: 5
translation: jidoosha

meaning_id: 1
language_id: 2
translation: voiture


meaning_id: 1
language_id: 5
translation: kuruma

meaning_id: 2
language_id: 2
translation: chat

meaning_id: 2
language_id: 5
translation: neko

meaning_id: 2
language_id: 3
translation: katz

meaning_id: 3
language_id: 4
translation: pesce


meaning_id: 3
language_id: 2
translation: poisson


meaning_id: 3
language_id: 5
translation: sakana

For the sake of this description, let's assume that the records above are all the records in TRANS (though in fact the number of records in TRANS is really abouttenmillion times greater).

Now suppose I have a tiny table calledINPUT consisting of single textfield (say, word). E.g. suppose thatINPUT looks like this:

katz

voiturepesce

Now, let's fix a language_id, say 5. This is the target language_id. Given this target language_id, and this particular INPUT table, I want the results of the query to be something like this:

neko
jidoosha
kuruma
sakana

I.e. for each word W in INPUT, the query must first findeach record Rin TRANS that hasW as its translation field; then find each record Q in TRANSwhose language_id is 5 (the target language_id) AND has the same meaning_id as R does. 
E.g.if W is 'katz', then R is


meaning_id: 2
language_id: 3
translation: katz

and therefore the desired Q is


meaning_id: 2
language_id: 5
translation: neko

...and so on.

The only difficulty here is that performance is critical, and in real life,TRANS has around 50M records (and growing),while INPUT has typicallybetween 500and 1000 records.

Any advice on how to make this as fast as possible would be much appreciated.

Thanks!

G.

P.S. Just to show that this post is not just from a college student trying to get around doing homework, below I post my most successful query so far. It works, but it's performance isn't great. And it is annoyingly complex, to boot; I'mvery much the SQLnoob, and if nothing else, at least I'd like to learn to writebetter (
i.e. more elegant,morelegible, more clueful)SQL that this:

SELECT q3.translation, q2.otherstuffFROM( SELECT INPUT.word, q1.meaning_id, INPUT.otherstuff FROM INPUT INNER JOIN ( SELECT translation, meaning_id FROM TRANS WHERE translation IN (SELECT word FROM INPUT)
 ) AS q1 ON INPUT.word = q1.translation) AS q2LEFT JOIN( SELECT translation, meaning_id FROM TRANS WHERE language_id=5) AS q3ON q2.meaning_id=q3.meaning_id;
As you can see, there are additional fields that I didn't mention in my original description (e.g. INPUT.otherstuff). Also the above is actually a subquery in a larger query, but it is by far, the worst bottleneck. Last, there's an index on TRANS(translation).




Re: [PERFORM] How to speed up this translation query?

2006-08-01 Thread Niklas Johansson


On 1 aug 2006, at 20.09, tlm wrote:

SELECT q3.translation, q2.otherstuff
FROM
(
  SELECT INPUT.word, q1.meaning_id, INPUT.otherstuff
  FROM
  INPUT
  INNER JOIN
  (
SELECT translation, meaning_id
FROM TRANS
WHERE translation IN (SELECT word FROM INPUT)
  ) AS q1
  ON INPUT.word = q1.translation
) AS q2
LEFT JOIN
(
  SELECT translation, meaning_id
  FROM TRANS
  WHERE language_id=5
) AS q3
ON q2.meaning_id=q3.meaning_id;


Maybe I'm not following you properly, but I think you've made things  
a little bit more complicated than they need to be. The nested sub- 
selects look a little nasty.


Now, you didn't provide any explain output but I think the following  
SQL will achieve the same result, and hopefully produce a better plan:


SELECT t2.translation, i.otherstuff
FROM input i INNER JOIN trans t ON i.word=t.translation
INNER JOIN trans t2 ON t.meaning_id=t2.meaning_id
WHERE t2.language_id=5;

The query will also benefit from indices on trans.meaning_id and  
trans.language_id. Also make sure the tables are vacuumed and  
analyzed, to allow the planner to make good estimates.




Sincerely,

Niklas Johansson





---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings