Greg Stark wrote:
However that's not enough to explain what you've shown. How about you show the
actual query and actual plan you're working with? The plan you've shown can't
result from the query you sent.

Mea culpa, sort of.  But ... in fact, the plan I sent *was* from query I sent, with the 
table/column names changed for clarity.  This time I'll send the plan "raw".  
(This is PG 8.0.1.)

chm=> explain select count(1) from (select normalize_add_salt(smiles) from
chm(>     salt_smiles order by db_no) as foo;
QUERY PLAN ----------------------------------------------------------------------------------------------------
Aggregate  (cost=69.95..69.95 rows=1 width=0)
  ->  Subquery Scan foo  (cost=0.00..67.93 rows=806 width=0)
        ->  Index Scan using salt_smiles_pkey on salt_smiles  (cost=0.00..59.87 
rows=806 width=30)
(3 rows)

As pointed out by Tom and others, this query DOES in fact call the 
normalize_add_salt() function.

Now here's the weird part. (And where my original posting went wrong -- sorry 
for the error!  I got the two queries mixed up.)

I originally had a more complex query, the purpose being to guarantee that the 
function was called on the strings in the order specified.  (More on this 
below.)  Here is the original query I used:

chm=> explain select count(1) from (select normalize_add_salt(smiles)
chm(>   from (select smiles from salt_smiles order by db_no) as foo) as bar;
QUERY PLAN ----------------------------------------------------------------------------------------------------
Aggregate  (cost=67.94..67.94 rows=1 width=0)
  ->  Subquery Scan foo  (cost=0.00..65.92 rows=806 width=0)
        ->  Index Scan using salt_smiles_pkey on salt_smiles  (cost=0.00..57.86 
rows=806 width=30)
(3 rows)

Notice that the plans are essentially identical, yet in this one the function does NOT get called.  
I proved this by brute force, inserting "char **p = NULL; *p = "foo";" into the 
C code to guarantee a segmentation violation if the function gets called.  In the first case it 
does SIGSEGV, and in the second case it does not.

Now the reason for this more-complex query with an additional subselect is that 
the SMILES (which, by the way, are a lexical way of representing chemical 
structures - see www.daylight.com), must be passed to the function in a 
particular order (hence the ORDER BY).  In retrospect I realize the optimizer 
apparently flattens this query anyway (hence the identical plans, above).

But the weird thing is that, in spite of flattening, which would appear to make 
the queries equivalent, the function gets called in one case, and not in the 
other.

Steinar H. Gunderson asked:
select count(1) from (select foo_init(value) from foo_init_table order by value_id) as foo;
Why not just count(foo_init(value))?

Because the SMILES must be processed in a specific order, hence the more 
complex queries.

The simple answer to this whole problem is what Steinar wrote:
This works well, but it requires me to actually retrieve the function's value 800 times.

Is this actually a problem?

No, it's just a nuisance.  It occurs to me that in spite of the ORDER BY 
expression, Postgres is free to evaluate the function first, THEN sort the 
results, which means the SMILES would be processed in random order anyway.  
I.e. my ORDER BY clause is useless for the intended purpose.

So the only way I can see to get this right is to pull the SMILES into my application with the ORDER BY to ensure I have them in the correct order, then send them back one at a time via a "select normalize_add_salt(smiles)", meaning I'll retrieve 800 strings and then send them back.
I just thought there ought to be a way to do this all on the PG server instead of sending 
all these strings back and forth.  I'd like to say to Postgres, "Just do it this 
way, OK?"  But the optimizer can't be turned off, so I guess I have to do it the 
slow way.  The good news is that this is just an initialization step, after which I 
typically process thousands of molecules, so the extra overhead won't kill me.

Thanks to all for your help.

Craig

---------------------------(end of broadcast)---------------------------
TIP 1: if posting/reading through Usenet, please send an appropriate
      subscribe-nomail command to [EMAIL PROTECTED] so that your
      message can get through to the mailing list cleanly

Reply via email to