Re: [PERFORM] Slow query with a lot of data

2008-08-21 Thread Moritz Onken
Am 21.08.2008 um 19:08 schrieb Merlin Moncure: On Thu, Aug 21, 2008 at 11:07 AM, Moritz Onken <[EMAIL PROTECTED]> wrote: Am 21.08.2008 um 16:39 schrieb Scott Carey: It looks to me like the work_mem did have an effect. Your earlier queries had a sort followed by group aggregate

Re: [PERFORM] Slow query with a lot of data

2008-08-21 Thread Moritz Onken
Am 21.08.2008 um 16:39 schrieb Scott Carey: It looks to me like the work_mem did have an effect. Your earlier queries had a sort followed by group aggregate at the top, and now its a hash-aggregate. So the query plan DID change. That is likely where the first 10x performance gain came fr

Re: [PERFORM] Slow query with a lot of data

2008-08-21 Thread Moritz Onken
Am 21.08.2008 um 09:04 schrieb Moritz Onken: Am 20.08.2008 um 20:28 schrieb Tom Lane: "Scott Carey" <[EMAIL PROTECTED]> writes: The planner actually thinks there will only be 28704 rows returned of width 12. But it chooses to sort 53 million rows before aggregating. Th

Re: [PERFORM] Slow query with a lot of data

2008-08-21 Thread Moritz Onken
Am 20.08.2008 um 20:28 schrieb Tom Lane: "Scott Carey" <[EMAIL PROTECTED]> writes: The planner actually thinks there will only be 28704 rows returned of width 12. But it chooses to sort 53 million rows before aggregating. Thats either a bug or there's something else wrong here. That is

Re: [PERFORM] Slow query with a lot of data

2008-08-21 Thread Moritz Onken
Am 20.08.2008 um 20:06 schrieb Scott Carey: Ok, so the problem boils down to the sort at the end. The query up through the merge join on domain is as fast as its going to get. The sort at the end however, should not happen ideally. There are not that many rows returned, and it should h

Re: [PERFORM] Slow query with a lot of data

2008-08-20 Thread Moritz Onken
Am 19.08.2008 um 17:23 schrieb Moritz Onken: Am 19.08.2008 um 16:49 schrieb Scott Carey: What is your work_mem set to? The default? Try increasing it significantly if you have the RAM and seeing if that affects the explain plan. You may even want to set it to a number larger than the

Re: [PERFORM] Slow query with a lot of data

2008-08-19 Thread Moritz Onken
Am 19.08.2008 um 16:49 schrieb Scott Carey: What is your work_mem set to? The default? Try increasing it significantly if you have the RAM and seeing if that affects the explain plan. You may even want to set it to a number larger than the RAM you have just to see what happens. In all

Re: [PERFORM] Slow query with a lot of data

2008-08-19 Thread Moritz Onken
Am 19.08.2008 um 14:17 schrieb Matthew Wakeling: On Tue, 19 Aug 2008, Moritz Onken wrote: tablename| attname | n_distinct | correlation result | domain | 1642 | 1 Well, the important thing is the correlation, which is 1, indicating that

Re: [PERFORM] Slow query with a lot of data

2008-08-19 Thread Moritz Onken
As far as I can tell, it should. If it is clustered on an index on domain, and then analysed, it should no longer have to sort on domain. Could you post here the results of running: select * from pg_stats where attname = 'domain'; schemaname | tablename| attname | null_fra

Re: [PERFORM] Slow query with a lot of data

2008-08-19 Thread Moritz Onken
Am 18.08.2008 um 18:05 schrieb Matthew Wakeling: On Mon, 18 Aug 2008, Moritz Onken wrote: Running the query for more than one user is indeed not much slower. That's what I need. I'm clustering the results table on domain right now. But why is this better than clustering it on &qu

Re: [PERFORM] Slow query with a lot of data

2008-08-18 Thread Moritz Onken
Well, you're getting the database to read the entire contents of the domain_categories table in order. That's 12 million rows - a fair amount of work. You may find that removing the "user = 1337" constraint doesn't make the query much slower - that's where you get a big win by clusterin

Re: [PERFORM] Slow query with a lot of data

2008-08-18 Thread Moritz Onken
Am 18.08.2008 um 16:30 schrieb Matthew Wakeling: On Mon, 18 Aug 2008, Moritz Onken wrote: I have indexes on result.domain, domain_categories.domain, result.user, domain_categories.category. Clustered result on user and domain_categories on domain. "-> Materializ

[PERFORM] Slow query with a lot of data

2008-08-18 Thread Moritz Onken
Hi, I run this query: select max(a."user"), b.category, count(1) from result a, domain_categories b where a."domain" = b."domain" group by b.category; the table result contains all websites a user visited. And the table domain_categories contains all categories a domain is in. result has 2

Re: [PERFORM] Using PK value as a String

2008-08-12 Thread Moritz Onken
Am 12.08.2008 um 17:21 schrieb Bill Moran: In response to Moritz Onken <[EMAIL PROTECTED]>: Am 12.08.2008 um 17:04 schrieb Bill Moran: In response to Moritz Onken <[EMAIL PROTECTED]>: We chose UUID as PK because there is still some information in an integer key. You can s

Re: [PERFORM] Using PK value as a String

2008-08-12 Thread Moritz Onken
Am 12.08.2008 um 17:04 schrieb Bill Moran: In response to Moritz Onken <[EMAIL PROTECTED]>: We chose UUID as PK because there is still some information in an integer key. You can see if a user has registered before someone else (user1.id < user2.id) or you can see how many

Re: [PERFORM] Using PK value as a String

2008-08-12 Thread Moritz Onken
We chose UUID as PK because there is still some information in an integer key. You can see if a user has registered before someone else (user1.id < user2.id) or you can see how many new users registered in a specific period of time (compare the id of the newest user to the id a week ago). Thi

Re: [PERFORM] Planner should use index on a LIKE 'foo%' query

2008-06-30 Thread Moritz Onken
The thing here is that you are effectively causing Postgres to run a sub-select for each row of the "result" table, each time generating either an empty list or a list with one or more identical URLs. This is effectively forcing a nested loop. In a way, you have two constraints where you

Re: [PERFORM] Planner should use index on a LIKE 'foo%' query

2008-06-30 Thread Moritz Onken
Am 30.06.2008 um 16:59 schrieb Steinar H. Gunderson: On Mon, Jun 30, 2008 at 09:16:06AM +0200, Moritz Onken wrote: the result table has 20.000.000 records and the item table 5.000.000. The query select count(1) from result where url in (select shorturl from item where shorturl = result.url

Re: [PERFORM] Planner should use index on a LIKE 'foo%' query

2008-06-30 Thread Moritz Onken
However there's a lot more scope for improving a query along these lines, like adding indexes, or CLUSTERing on an index. It depends what other queries you are wanting to run. I don't know how much update/insert activity there will be on your database. However, if you were to add an inde

Re: [PERFORM] Planner should use index on a LIKE 'foo%' query

2008-06-30 Thread Moritz Onken
Am 30.06.2008 um 12:19 schrieb Matthew Wakeling: select count(1) from result where url in (select shorturl from item where shorturl = result.url); What on earth is wrong with writing it like this? SELECT COUNT(*) FROM (SELECT DISTINCT result.url FROM result, item WHERE item.shorturl =

Re: [PERFORM] Planner should use index on a LIKE 'foo%' query

2008-06-30 Thread Moritz Onken
Am 28.06.2008 um 21:19 schrieb Steinar H. Gunderson: On Sat, Jun 28, 2008 at 06:24:42PM +0200, Moritz Onken wrote: SELECT distinct url from item where url like 'http://www.micro%' limit 10; Here, the planner knows the pattern beforehand, and can see that it's a simple

Re: [PERFORM] Planner should use index on a LIKE 'foo%' query

2008-06-30 Thread Moritz Onken
Anfang der weitergeleiteten E-Mail: Von: Moritz Onken <[EMAIL PROTECTED]> Datum: 30. Juni 2008 09:16:06 MESZ An: Steinar H. Gunderson <[EMAIL PROTECTED]> Betreff: Re: [PERFORM] Planner should use index on a LIKE 'foo%' query Am 28.06.2008 um 21:19 schrieb Steinar H. Gu

[PERFORM] Planner should use index on a LIKE 'foo%' query

2008-06-28 Thread Moritz Onken
Hi, I have a query select count(*) from result where exists (select * from item where item.url LIKE result.url || '%' limit 1); which basically returns the number of items which exist in table result and match a URL in table item by its prefix. I read all about idexes (http://www.postgre

Re: [PERFORM] PostgreSQL performance on a virtual host

2008-03-05 Thread Moritz Onken
We have very good experiences with openVZ as virtualizer. Since it's not a para virtualization like xen it's very fast. Almost as fast as the host. www.openvz.org Am 04.03.2008 um 16:43 schrieb Theo Kramer: Hi We are thinking of running a PostgreSQL instance on a virtual host under Xen.

Re: [PERFORM] store A LOT of 3-tuples for comparisons

2008-02-23 Thread Moritz Onken
I am also wondering about the ordering and whether that matters. Can you have "he", "can", "drink" as well as "drink", "he", "can" and should they be considered the same? If so you will need a different tactic. ordering matters. So the 3-column tactic should work. Thanks for your advice!

[PERFORM] store A LOT of 3-tuples for comparisons

2008-02-22 Thread Moritz Onken
Hi, I need to store a lot of 3-tuples of words (e.g. "he", "can", "drink"), order matters! The source is about 4 GB of these 3-tuples. I need to store them in a table and check whether one of them is already stored, and if that's the case to increment a column named "count" (or something).