[PERFORM] Get master-detail relationship metadata

Laszlo Nagy Thu, 03 Feb 2011 03:56:46 -0800


  Hi All,

I'm working on a client program that iterates over master-detail relationships in a chain of nested loops.


Pseudo code:

for row_1 in table_1:
    table_2 = get_details(row_1, "table2")
    for row_2 in table_2:
        table_3 = get_details(row_2, "table3")
        for row_3 in table_3:
            .... etc.
                process_data(row_1, row_2, row_3, ....)

My task is to write the "get_details" iterator efficiently. The obvious way to do it is to query the details in every get_details() call, but that is not efficient. We have relationships where one master only has a few details. For 1 million master rows, that would result in the execution of millions of SQL SELECT commands, degrading performance by orders of magnitude.

My idea was that the iterator should pre-fetch and cache data for many master records at once. get_details() would then use the cached rows, reducing the number of SQL SELECT statements needed. I actually wrote the iterator, and it works fine in some cases. For example:


producers = get_rows("producer")
for producer in producers:
    products = get_details(producer,"product")
    for product in products:
        prices = get_details(product,"prices")
        for price in prices:
            process_product_price(producer,product,price)

This works fine if one producer has no more than 1000 products and one product has no more than 10 prices. I can easily keep 10 000 records in memory. The actual code executes about 15 SQL queries while iterating over 1 million rows. Compared to the original "obvious" method, performance is increased to 1500%.
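A minimal sketch of the pre-fetching idea described above, not the actual iterator. It assumes master rows are dicts with an "id" key, detail rows are dicts with a "master_id" key, and fetch_details is a hypothetical callback that loads the details for a whole list of master ids in one query (for example with WHERE master_id IN (...)):

```python
from collections import defaultdict
from itertools import islice


def iter_with_details(masters, fetch_details, batch_size=1000):
    """Yield (master, details) pairs, fetching details one batch at a time.

    masters       -- iterable of dicts that each have an "id" key (assumed)
    fetch_details -- hypothetical callback: takes a list of master ids and
                     returns detail dicts that each have a "master_id" key
    batch_size    -- number of master rows covered by a single query
    """
    masters = iter(masters)
    while True:
        batch = list(islice(masters, batch_size))
        if not batch:
            return
        # One query for the whole batch instead of one query per master row.
        by_master = defaultdict(list)
        for detail in fetch_details([m["id"] for m in batch]):
            by_master[detail["master_id"]].append(detail)
        # Masters without any details get an empty list.
        for master in batch:
            yield master, by_master.get(master["id"], [])
```

With this shape the number of queries is roughly the number of master rows divided by batch_size, but every batch must fit in memory together with all of its detail rows.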

But sometimes it just doesn't work. If a producer has 1 million products, and one product has 100 prices, then it won't work, because I cannot keep 100 million prices in memory. My program should somehow figure out how many rows it will get for one master, and choose between the cached and non-cached methods.
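The decision itself could look roughly like the sketch below. It assumes that an estimate of detail rows per parent is already available for each level, and that the deepest level dominates memory use, so the product of the estimates is compared against a row budget. The function name and both parameters are hypothetical:

```python
def should_cache(estimated_rows_per_level, memory_budget_rows):
    """Return True if the detail rows of one master are expected to fit.

    estimated_rows_per_level -- estimated detail rows per parent at each
                                level, e.g. [1000, 10] for products, prices
    memory_budget_rows       -- how many rows we are willing to hold at once
    """
    total = 1
    for rows in estimated_rows_per_level:
        # Treat an estimate of 0 as 1 so one empty level does not hide
        # a large level further down.
        total *= max(rows, 1)
        if total > memory_budget_rows:
            return False
    return True
```

With the numbers above, should_cache([1000, 10], 10000) selects the cached method, while should_cache([1000000, 100], 10000) falls back to the non-cached one.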

So here is the question: is there a way to get this information from PostgreSQL itself? I know that the query plan contains information about this, but I'm not sure how to extract it. Should I run an ANALYZE command of some kind, and parse the result as a string? For example:


EXPLAIN select * from product where producer_id=1008;
                              QUERY PLAN
----------------------------------------------------------------------
 Seq Scan on product  (cost=0.00..1018914.74 rows=4727498 width=1400)
   Filter: (producer_id = 1008)
(2 rows)

Then I could extract "rows=4727498" to get an idea of how many detail rows I'll get for the master.
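A sketch of that extraction, assuming the plan has already been fetched from the server as a single string. The first function parses the text format shown above; the second assumes PostgreSQL 9.0 or later, where EXPLAIN (FORMAT JSON) returns the plan as JSON and the estimate appears as "Plan Rows". Both function names are hypothetical. Either way the number is the planner's estimate based on the table statistics, not an exact count:

```python
import json
import re

# Matches the planner's row estimate, e.g. "rows=4727498".
_ROWS_RE = re.compile(r"\brows=(\d+)")


def estimated_rows_from_text(plan_text):
    """Return the rows= estimate of the top plan node, or None if absent."""
    # The top node is printed first, so the first match is the one we want.
    match = _ROWS_RE.search(plan_text)
    return int(match.group(1)) if match else None


def estimated_rows_from_json(plan_json):
    """Return "Plan Rows" of the top node from EXPLAIN (FORMAT JSON) output."""
    return json.loads(plan_json)[0]["Plan"]["Plan Rows"]
```

The JSON variant avoids depending on the exact layout of the text output.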


Is there any better way to do it? And how reliable is this?


Thanks,

   Laszlo


--
Sent via pgsql-performance mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
