Hi, my answer may be out of topic since you might be looking for a postgres-only solution.. But just in case....
What are you trying to achieve exactly ? Is there any way you could re-work your algorithms to avoid selects and use a sequential scan (consider your postgres data as one big file) to retrieve each of the rows, analyze / compute them (possibly in a distributed manner), and join the results at the end ? A few pointers : http://lucene.apache.org/hadoop/ http://www.gridgain.com/ Regards, Sami Dalouche On Sun, 2007-12-02 at 12:26 +0200, Mindaugas wrote: > Hello, > > Started to work with big tables (like 300GB) and performance problems > started to appear. :( > > To simplify things - table has an index on From an index on To columns. And > it also have several other not indexed columns. There are 100000+ of > different values for From and the same for To. > > I execute simple query "select * from bigtable where From='something'". > Query returns like 1000 rows and takes 5++ seconds to complete. As far as I > understand the query is slow because: > - first it has to retrieve pointers to rows with data from index. That goes > fast. > - retrieve all the rows one by one. There we have 100% random read because > rows with the same From is distributed evenly through all the 300GB and most > probably nothing is cached. So there we are limited by _one_ disk performance > independently of how many disks we have in storage? And in case storage > consists of 15k rpm Cheetahs with 3.5ms average read seek time we should > expect not more than ~285 rows per second? > > I feel that I'm overlooking something here. But I'm new into data > warehousing. :) > > Also this query should greatly benefit from parallel execution or async IO. > Storage (seeks/second) scales almost linearly when it has a lot of disks. And > query is completely IO bound so it should scale well on single server. > > And I cannot use some index organized table or table partitioned by From :) > because there are at least 2 similar indexes by which queries can be executed > - From and To. > > Ideas for improvement? Greenplum or EnterpriseDB? Or I forgot something > from PostgreSQL features. > > Thanks, > > Mindaugas > > ---------------------------(end of broadcast)--------------------------- > TIP 7: You can help support the PostgreSQL project by donating at > > http://www.postgresql.org/about/donate ---------------------------(end of broadcast)--------------------------- TIP 6: explain analyze is your friend