Re: [PERFORM] Speeding up Aggregates

2003-10-03 Thread Dror Matalon

Hi Josh,

On Fri, Oct 03, 2003 at 02:07:10PM -0700, Josh Berkus wrote:
 Dror,
 
  select articlenumber, channel, description, title, link, dtstamp  from
  items, my_channels where items.channel = '2' and my_channels.id =
'2' and owner = 'drormata' and dtstamp > last_viewed and
  articlenumber not in (select item from viewed_items where channel
  ='2' and owner = 'drormata');
 
 the NOT IN is a bad idea unless the subselect never returns more than a 
 handful of rows.  If viewed_items can grow to dozens of rows, you should 
 use WHERE NOT EXISTS instead.  Unless you're using 7.4.
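
For reference, a NOT EXISTS version of that query would look roughly like
this (a sketch; the correlation on articlenumber is assumed from the
original query):

select articlenumber, channel, description, title, link, dtstamp
  from items, my_channels
 where items.channel = '2' and my_channels.id = '2'
   and owner = 'drormata' and dtstamp > last_viewed
   and not exists (select 1 from viewed_items
                    where channel = '2' and owner = 'drormata'
                      and item = items.articlenumber);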
 

I am using 7.4, and had tried NOT EXISTS and didn't see any
improvements.

  item_max_date() looks like this:
 select max(dtstamp) from items where channel = $1 and link = $2;
 
 Change it to 
 
 SELECT dtstamp from items where channel = $1 and link = $2
 ORDER BY dtstamp DESC LIMIT 1
 

Didn't make a difference. And plugging real values into this query as
well as into  the original 
  select max(dtstamp) from items where channel = $1 and link = $2;

and doing an explain analyze shows that the cost is the same. The
strange thing is that when I run the above queries by hand they take
about .5 msec. Yet on a resultset that fetches 5 rows, I go up from 15
msec to 300 msec. It would seem like it should be something like 15 +
(0.5 * 5) + small overhead = 30 msec or so, rather than the 300 I'm
seeing.

 and possibly build an index on channel, link, dtstamp

Didn't make a difference either. Explain analyze shows that it didn't
use it.
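
(For reference, the suggested index would be built with something like the
following; the index name is illustrative:)

create index items_channel_link_dtstamp on items (channel, link, dtstamp);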

 
 -- 
 -Josh Berkus
  Aglio Database Solutions
  San Francisco
 

-- 
Dror Matalon
Zapatec Inc 
1700 MLK Way
Berkeley, CA 94709
http://www.zapatec.com

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [PERFORM] Speeding up Aggregates

2003-10-03 Thread Dror Matalon
On Fri, Oct 03, 2003 at 05:44:49PM -0400, Rod Taylor wrote:
  item_max_date() looks like this:
 select max(dtstamp) from items where channel = $1 and link = $2;
 
 It is too bad the (channel, link) index doesn't have dtstamp at the end
 of it, otherwise the below query would be a gain (might be a small one
 anyway).
 
  select dtstamp
    from items
   where channel = $1
     and link = $2
   ORDER BY dtstamp DESC
   LIMIT 1;

Similar idea to what Josh suggested. I did create an additional index
with dtstamp at the end and it doesn't look like the planner used it.
Using the above query instead of max() didn't improve things either.

 
 
 Could you show us the exact specification of the function?  In
 particular, did you mark it VOLATILE, IMMUTABLE, or STABLE?
 
 I hope it isn't the first or second one ;)

CREATE or REPLACE FUNCTION item_max_date (int4, varchar) RETURNS
timestamptz AS '
select max(dtstamp) from items where channel = $1 and link = $2;
' LANGUAGE 'sql';




-- 
Dror Matalon
Zapatec Inc 
1700 MLK Way
Berkeley, CA 94709
http://www.zapatec.com

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])


Re: [PERFORM] Speeding up Aggregates

2003-10-03 Thread Dror Matalon
On Fri, Oct 03, 2003 at 02:35:46PM -0700, Josh Berkus wrote:
 Dror,
 
  I am using 7.4, and had tried NOT EXISTS and didn't see any
  improvements.
 
 It wouldn't if you're using 7.4, which has improved IN performance immensely.
 
 What happens if you stop using a function and instead use a subselect?

An improvement. Now I'm getting response times in the 200 msec range. 

And by the way, I tried not exists again and it actually runs slower
than not in.
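
A sketch of the subselect form, with the function call replaced inline
(the full query appears in a later message in this thread):

select articlenumber, channel, description, title, link, dtstamp
  from items i1, my_channels
 where i1.channel = '2' and my_channels.id = '2' and owner = 'drormata'
   and dtstamp > last_viewed
   and dtstamp = (select max(dtstamp) from items i2
                   where i2.channel = '2' and i1.link = i2.link);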

 
 -- 
 -Josh Berkus
  Aglio Database Solutions
  San Francisco
 
 
 ---(end of broadcast)---
 TIP 3: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly

-- 
Dror Matalon
Zapatec Inc 
1700 MLK Way
Berkeley, CA 94709
http://www.zapatec.com

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [PERFORM] Speeding up Aggregates

2003-10-03 Thread Dror Matalon
On Fri, Oct 03, 2003 at 06:10:29PM -0400, Rod Taylor wrote:
 On Fri, 2003-10-03 at 17:53, Dror Matalon wrote:
  On Fri, Oct 03, 2003 at 05:44:49PM -0400, Rod Taylor wrote:
item_max_date() looks like this:
   select max(dtstamp) from items where channel = $1 and link = $2;
   
   It is too bad the (channel, link) index doesn't have dtstamp at the end
   of it, otherwise the below query would be a gain (might be a small one
   anyway).
   
   select dtstamp
     from items
    where channel = $1
      and link = $2
    ORDER BY dtstamp DESC
    LIMIT 1;
 
 It didn't make a difference even with the 3 term index? I guess you
 don't have very many common values for channel / link combination.

There's no noticeable difference between two term and three term
indexes.

 
 
 
 How about the below? Note the word STABLE on the end.
 
 CREATE or REPLACE FUNCTION item_max_date (int4, varchar) RETURNS
 timestamptz AS '
 select max(dtstamp) from items where channel = $1 and link = $2;
 ' LANGUAGE 'sql' STABLE;

Made no difference.



-- 
Dror Matalon
Zapatec Inc 
1700 MLK Way
Berkeley, CA 94709
http://www.zapatec.com

---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] Speeding up Aggregates

2003-10-08 Thread Dror Matalon

Actually what finally solved the problem is repeating the 
dtstamp > last_viewed
in the sub select, changing

select articlenumber, channel, description, title, link, dtstamp  from items i1, 
my_channels where ((i1.channel = '2' and
my_channels.id = '2' and owner = 'drormata'  and (dtstamp > last_viewed)) ) and 
(dtstamp = (select max (dtstamp) from items i2 
where channel = '2' and i1.link = i2.link));

to
explain analyze select articlenumber, channel, description, title, link, dtstamp  from 
items i1, my_channels where ((i1.channel = '2' and
my_channels.id = '2' and owner = 'drormata'  and (dtstamp > last_viewed)) ) and 
(dtstamp = (select max (dtstamp) from items i2 where
channel = '2' and i1.link = i2.link and dtstamp > last_viewed));

Which in the stored procedure looks like this:
CREATE or REPLACE FUNCTION item_max_date (int4, varchar, timestamptz)
RETURNS
timestamptz AS '
select max(dtstamp) from items where channel = $1 and link = $2 and
dtstamp > $3;
' LANGUAGE 'sql';
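
In the main query, the subselect can then be replaced by a call along these
lines (a sketch; the argument values are taken from the example above):

select articlenumber, channel, description, title, link, dtstamp
  from items i1, my_channels
 where i1.channel = '2' and my_channels.id = '2' and owner = 'drormata'
   and dtstamp > last_viewed
   and dtstamp = item_max_date(i1.channel, i1.link, last_viewed);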


Basically I have hundreds or thousands of items but only a few that
satisfy dtstamp > last_viewed. Obviously I want to run the max() only
on those few items. Repeating dtstamp > last_viewed did the trick, but it
seems like there should be a more elegant/clear way to tell the planner
which constraint to apply first.

Dror



On Wed, Oct 08, 2003 at 10:54:24AM -0400, Greg Stark wrote:
 Rod Taylor [EMAIL PROTECTED] writes:
 
  On Fri, 2003-10-03 at 17:53, Dror Matalon wrote:
   On Fri, Oct 03, 2003 at 05:44:49PM -0400, Rod Taylor wrote:
   
It is too bad the (channel, link) index doesn't have dtstamp at the end
of it, otherwise the below query would be a gain (might be a small one
anyway).

    select dtstamp
      from items
     where channel = $1
       and link = $2
     ORDER BY dtstamp DESC
     LIMIT 1;
  
  It didn't make a difference even with the 3 term index? I guess you
  don't have very many common values for channel / link combination.
 
 You need to do:
 
  ORDER BY channel DESC, link DESC, dtstamp DESC
 
 This is an optimizer nit. It doesn't notice that since it selected on channel
 and link already the remaining tuples in the index will be ordered simply by
 dtstamp.
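 
 Putting that together with the earlier query, the sketch becomes:
 
  select dtstamp
    from items
   where channel = $1
     and link = $2
   ORDER BY channel DESC, link DESC, dtstamp DESC
   LIMIT 1;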
 
 (This is the thing i pointed out previously in
 [EMAIL PROTECTED] on Feb 13th 2003 on pgsql-general)
 
 
 -- 
 greg
 

-- 
Dror Matalon
Zapatec Inc 
1700 MLK Way
Berkeley, CA 94709
http://www.zapatec.com

---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] Compare rows

2003-10-08 Thread Dror Matalon
Greg,

On Wed, Oct 08, 2003 at 03:07:30PM -0400, Greg Spiegelberg wrote:
 Dror,
 
 I gave this some serious thought at first.  I only deal with
 int8, numeric(24,12) and varchar(32) columns which I could
 reduce to 3 different tables.  Problem was going from 1700-3000

I'm not sure how the data types come into play here. I was for the most
part following your examples.

 rows to around 300,000-1,000,000 rows per system per day that
 is sending data to our database.
 

Depending on the distribution of your data you can end up with more,
less or roughly the same amount of data in the end. It all depends on
how many of the 600+ columns change every time you insert a row. If only
a few of them do, then you'll clearly end up with less total data, since
you'll be writing several rows that are very short instead of one
huge row that contains all the information. In other words, you're
tracking changes better.

It also sounds like you feel that having a few thousand rows in a very
wide table is better than having 300,000 - 1,000,000 rows in a narrow
table. My gut feeling is that it's the other way around, but there are
plenty of people on this list who can provide a more informed answer.

Using the above example, and assuming that both tables have roughly the same
number of pages in them, would postgres deal better with a table with
3-4 columns and 300,000 - 1,000,000 rows, or with a table with several
hundred columns and only 3000 or so rows?
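
As a sketch, the narrow-table layout from my earlier example might be
declared like this (table name and column types are illustrative, not from
your actual schema):

create table system_attributes (
    day   date,
    name  varchar(32),   -- e.g. 'OS', 'Patch'
    value varchar(32)    -- e.g. 'Solaris 5.8', '108528-12'
);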

Regards,

Dror


 BTW, the int8 and numeric(24,12) are for future expansion.
 I hate limits.
 
 Greg
 
 
 Dror Matalon wrote:
 It's still not quite clear what you're trying to do. Many people's gut
 reaction is that you're doing something strange with so many columns in
 a table.
 
 Using your example, a different approach might be to do this instead:
 
  Day   | Name  | Value
  ------+-------+-------------
  Oct 1 | OS    | Solaris 5.8 
  Oct 1 | Patch | 108528-12
  Oct 3 | Patch | 108528-13
 
 
 You end up with lots more rows, fewer columns, but it might be
 harder to query the table. On the other hand, queries should run quite
 fast, since it's a much more normal table.
 
 But without knowing more, and seeing what the other columns look like,
 it's hard to tell.
 
 Dror
 
 
 -- 
 Greg Spiegelberg
  Sr. Product Development Engineer
  Cranel, Incorporated.
  Phone: 614.318.4314
  Fax:   614.431.8388
  Email: [EMAIL PROTECTED]
 Cranel. Technology. Integrity. Focus.
 
 
 
 ---(end of broadcast)---
 TIP 7: don't forget to increase your free space map settings

-- 
Dror Matalon
Zapatec Inc 
1700 MLK Way
Berkeley, CA 94709
http://www.zapatec.com

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [PERFORM] go for a script! / ex: PostgreSQL vs. MySQL

2003-10-09 Thread Dror Matalon

Yeah, I had a similar thought to Oliver's and suspected that this would be
the answer. 
Also, while it's not too hard to do this for a single platform, it gets
complicated once you start looking at different ones.

Josh, let me know when you're ready to do this. I'll try to help,
although my perl's kind of rusty. Also, can you even assume perl for a
postgres install? Does Solaris, for instance, come with perl?

Dror

On Thu, Oct 09, 2003 at 09:56:11AM -0700, Josh Berkus wrote:
 Oliver,
 
  I think instead of thinking about where to put the
  information about tuning, someone should provide a
  pgsql-autotune. Maybe even a shell script would do the
  trick.
 
 Well, you see, there's the issue.  I think "someone."  Lots of people have 
 spoken in favor of an auto-conf script; nobody so far has stepped forward 
 to get it done for 7.4, and I doubt we have time now.
 
 I'll probably create a Perl script in a month or so, but not before that 
 
 -- 
 Josh Berkus
 Aglio Database Solutions
 San Francisco
 
 ---(end of broadcast)---
 TIP 4: Don't 'kill -9' the postmaster

-- 
Dror Matalon
Zapatec Inc 
1700 MLK Way
Berkeley, CA 94709
http://www.zapatec.com

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html


Re: [PERFORM] Speeding up Aggregates

2003-10-09 Thread Dror Matalon
On Thu, Oct 09, 2003 at 07:07:00PM -0400, Greg Stark wrote:
 Dror Matalon [EMAIL PROTECTED] writes:
 
  Actually what finally solved the problem is repeating the 
  dtstamp > last_viewed
  in the sub select
 
 That will at least convince the optimizer to use an index range lookup. But it
 still will have to scan every record that matches channel==$1, link==$2, and
 dtstamp > $3.
 
 The trick of using limit 1 will be faster still as it only has to retrieve a
 single record using the index. But you have to be sure to convince it to use

How is doing order by limit 1 faster than doing max()? Seems like the
optimizer will need to sort or scan the data set either way. That part
didn't actually make a difference in my specific case.


 the index and the way to do that is to list exactly the same columns in the
 ORDER BY as are in the index definition. 
 
 Even if some of the leading columns are redundant because they'll be constant
 for all of the records retrieved. The optimizer doesn't know to ignore those.

The main problem in my case was that the optimizer was doing the max()
on all 700 rows, rather than the filtered rows. It's not until I put the
dtstamp > last_viewed in the sub select as well as in the main query
that it realized that it could first filter out the 696 rows and then do
the max() on the 4 rows that satisfied this constraint. 

That was the big saving.

Hope this all makes sense,

Dror
 
   (This is the thing i pointed out previously in
   [EMAIL PROTECTED] on Feb 13th 2003 on pgsql-general)
 
 -- 
 greg
 

-- 
Dror Matalon
Zapatec Inc 
1700 MLK Way
Berkeley, CA 94709
http://www.zapatec.com

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [PERFORM] Speeding up Aggregates

2003-10-10 Thread Dror Matalon
On Fri, Oct 10, 2003 at 10:32:32AM -0700, Josh Berkus wrote:
 Dror,
 
  Ouch. I just double checked and you're right. Is this considered a bug,
  or just an implementation issue?
 
 It's an implementation issue, which may be fixed by 7.5 but not sooner.  
 Basically, the free ability of PostgreSQL users to define their own 
 aggregates limits our ability to define query planner optimizations for 
 aggregates.   Only recently has anyone suggested a feasible way around this.
 
  While I've seen this hint a few times in the lists, it seems like it's
  one of those magic incantations that those in the know, know about, and
  that people new to postgres are going to be surprised by the need to use
  this idiom.
 
 It IS in the FAQ.

Might be a good idea to put it in its own section rather than under "My
queries are slow or don't make use of the indexes. Why?"

Also, for 7.4 you might want to take out:

4.22) Why are my subqueries using IN so slow?


 
 -- 
 Josh Berkus
 Aglio Database Solutions
 San Francisco

-- 
Dror Matalon
Zapatec Inc 
1700 MLK Way
Berkeley, CA 94709
http://www.zapatec.com

---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] Various performance questions

2003-10-26 Thread Dror Matalon
On Sun, Oct 26, 2003 at 10:49:29PM -0500, Greg Stark wrote:
 Dror Matalon [EMAIL PROTECTED] writes:
 
  explain analyze select count(*) from items where channel > 5000;
  QUERY PLAN
  --
   Aggregate  (cost=249141.54..249141.54 rows=1 width=0) (actual 
  time=26224.603..26224.608 rows=1 loops=1)
     ->  Seq Scan on items  (cost=0.00..245377.52 rows=1505605 width=0) (actual 
  time=7.599..17686.869 rows=1632057 loops=1)
   Filter: (channel > 5000)
   Total runtime: 26224.703 ms
  
  
  How can it do a sequential scan and apply a filter to it in less time
  than the full sequential scan? Is it actually using an index without
  really telling me? 
 
 It's not using the index and not telling you. 
 
 It's possible the count(*) operator itself is taking some time. Postgres

I find it hard to believe that the actual counting would take a
significant amount of time.

 doesn't have to call it on the rows that don't match the where clause. How
 long does explain analyze select 1 from items with and without the where
 clause take?
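 
 (Spelled out, that comparison is:)
 
 explain analyze select 1 from items;
 explain analyze select 1 from items where channel > 5000;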

Same as count(*). Around 55 secs with no where clause, around 25 secs
with.

 
 What version of postgres is this?  In 7.4 (and maybe 7.3?) count() uses an

This is 7.4.

 int8 to store its count so it's not limited to 4 billion records.
 Unfortunately int8 is somewhat inefficient as it has to be dynamically
 allocated repeatedly. It's possible it's making a noticeable difference,
 especially with all the pages in cache, though I'm a bit surprised. There's
 some thought about optimizing this in 7.5.
 
 -- 
 greg
 

-- 
Dror Matalon
Zapatec Inc 
1700 MLK Way
Berkeley, CA 94709
http://www.zapatec.com

---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [PERFORM] Various performance questions

2003-10-27 Thread Dror Matalon
On Mon, Oct 27, 2003 at 12:52:27PM +0530, Shridhar Daithankar wrote:
 Dror Matalon wrote:
 
 On Mon, Oct 27, 2003 at 01:04:49AM -0500, Christopher Browne wrote:
 Most of the time involves:
 
 a) Reading each page of the table, and
 b) Figuring out which records on those pages are still live.
 
 
 The table has been VACUUM ANALYZED so that there are no dead records.
 It's still not clear why select count() would be slower than select with
 a where clause.
 
 Do a vacuum full verbose and then everything should be within a small range 
 of each other.
 

I did vacuum full verbose and the results are the same as before, 55
seconds for count(*) and 26 seconds for count(*) where channel > 5000.

 Also in the where clause, does explicit typecasting help?
 
 Like 'where channel > 5000::int2;'

It makes no difference.

 
  HTH
 
  Shridhar
 

-- 
Dror Matalon
Zapatec Inc 
1700 MLK Way
Berkeley, CA 94709
http://www.zapatec.com

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [PERFORM] Various performance questions

2003-10-27 Thread Dror Matalon
On Mon, Oct 27, 2003 at 11:12:37AM -0500, Vivek Khera wrote:
  DM == Dror Matalon [EMAIL PROTECTED] writes:
 
 DM effective_cache_size = 25520  -- freebsd formula: vfs.hibufspace / 8192
 
 DM 1. While it seems to work correctly, I'm unclear on why this number is
 DM correct.  25520*8 = 204160 or 200 Megs. On a machine with 2 Gigs it
 DM seems like the number should be more like 1 - 1.5 Gigs.
 
 Nope, that's correct...

I know it's correct. I was asking why it's correct.

 
 
 -- 
 =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
 Vivek Khera, Ph.D.Khera Communications, Inc.
 Internet: [EMAIL PROTECTED]   Rockville, MD   +1-240-453-8497
 AIM: vivekkhera Y!: vivek_khera   http://www.khera.org/~vivek/
 
 ---(end of broadcast)---
 TIP 3: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly

-- 
Dror Matalon
Zapatec Inc 
1700 MLK Way
Berkeley, CA 94709
http://www.zapatec.com

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html


Re: [PERFORM] Various performance questions

2003-10-27 Thread Dror Matalon
On Mon, Oct 27, 2003 at 07:52:06AM -0500, Christopher Browne wrote:
 In the last exciting episode, [EMAIL PROTECTED] (Dror Matalon) wrote:
  I was answering an earlier response that suggested that maybe the actual
  counting took time so it would take quite a bit longer when there are
  more rows to count.
 
 Well, if a where clause allows the system to use an index to search
 for the subset of elements, that would reduce the number of pages that
 have to be examined, thereby diminishing the amount of work.
 
 Why don't you report what EXPLAIN ANALYZE returns as output for the
 query with WHERE clause?  That would allow us to get more of an idea
 of what is going on...


Here it is once again, and I've added another data point: channel >
1000, which takes even less time than channel > 5000. It almost seems
like the optimizer knows that it can skip certain rows (rows=4910762 vs
rows=1505605). But how can it do that without using an index or
actually looking at each row?

zp1936=> EXPLAIN ANALYZE select count(*) from items;
QUERY PLAN
--
 Aggregate  (cost=245044.53..245044.53 rows=1 width=0) (actual 
time=55806.893..55806.897 rows=1 loops=1)
   ->  Seq Scan on items  (cost=0.00..232767.62 rows=4910762 width=0)
(actual time=0.058..30481.482 rows=4910762 loops=1)
 Total runtime: 55806.992 ms
(3 rows)

zp1936=> EXPLAIN ANALYZE select count(*) from items where channel > 5000;
QUERY PLAN
--
 Aggregate  (cost=248808.54..248808.54 rows=1 width=0) (actual 
time=26071.264..26071.269 rows=1 loops=1)
   ->  Seq Scan on items  (cost=0.00..245044.52 rows=1505605 width=0)
(actual time=0.161..17623.033 rows=1632057 loops=1)
 Filter: (channel > 5000)
 Total runtime: 26071.361 ms
(4 rows)

zp1936=> EXPLAIN ANALYZE select count(*) from items where channel > 1000;
  QUERY PLAN
---
 Aggregate  (cost=245429.74..245429.74 rows=1 width=0) (actual 
time=10225.272..10225.276 rows=1 loops=1)
   ->  Seq Scan on items  (cost=0.00..245044.52 rows=154085 width=0) (actual 
time=7.633..10083.246 rows=25687 loops=1)
 Filter: (channel > 1000)
 Total runtime: 10225.373 ms
(4 rows)


 -- 
 (format nil [EMAIL PROTECTED] cbbrowne acm.org)
 http://www3.sympatico.ca/cbbrowne/spiritual.html
 When  replying, it  is  often possible  to cleverly edit  the original
 message in such a way  as to subtly alter  its meaning or tone to your
 advantage while  appearing that you are  taking pains  to preserve the
 author's intent.   As a   bonus,   it will   seem that your   superior
 intellect is cutting through all the excess verbiage to the very heart
 of the matter.  -- from the Symbolics Guidelines for Sending Mail
 
 ---(end of broadcast)---
 TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]

-- 
Dror Matalon
Zapatec Inc 
1700 MLK Way
Berkeley, CA 94709
http://www.zapatec.com

---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] why index scan not working when using 'like'?

2003-11-25 Thread Dror Matalon

Hi,

Searches with like or regexes often can't use the index. Think of the index as
a sorted list of your items. It's easy to find an item when you know it
starts with "mif" (so 'mif%' should use the index). But when you use a
'like' pattern that starts with '%', the index is useless and the search
needs to do a sequential scan.
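
For example, with a hypothetical genes table (whether the anchored form can
actually use a plain btree index also depends on the locale the database
was initialized with):

-- anchored prefix: can use an index on gene_symbol
select * from genes where gene_symbol like 'mif%';

-- leading wildcard: no plain btree index applies, so a sequential scan
select * from genes where gene_symbol like '%mif%';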

Regards,

Dror

On Tue, Nov 25, 2003 at 07:48:49PM +, LIANHE SHAO wrote:
 Hi all,
 
 I want to use an index on the gene_symbol column in my
 query, and gene_symbol is indexed, but when I use
 lower(gene_symbol) like lower('%mif%'), the index
 is not used. When I change it to
 lower(gene_symbol) = lower('mif'), the index is used
 and an index scan works, but this is not what I want. I
 want all the gene_symbols containing the substring
 'mif' to be pulled out, not only exact matches.
 
 Could anybody give me some hints on how to deal with 
 this? If I do not use the index, the query takes too long.
 

 PGA=> explain select distinct probeset_id, chip, gene_symbol, title,
 sequence_description, pfam from affy_array_annotation
 where lower(gene_symbol) like upper('%mif%');
                              QUERY PLAN
 -----------------------------------------------------------------
  Unique  (cost=29576.44..29591.44 rows=86 width=265)
    ->  Sort  (cost=29576.44..29578.59 rows=857 width=265)
          Sort Key: probeset_id, chip, gene_symbol, title,
                    sequence_description, pfam
          ->  Seq Scan on affy_array_annotation 
              (cost=0.00..29534.70 rows=857 width=265)
              Filter: (lower((gene_symbol)::text) ~~ 'MIF%'::text)
 (5 rows)
 
 
 PGA=> explain select distinct probeset_id, chip, gene_symbol, title,
 sequence_description, pfam from affy_array_annotation
 where lower(gene_symbol) = upper('%mif%');
                              QUERY PLAN
 -----------------------------------------------------------------
  Unique  (cost=3433.44..3448.44 rows=86 width=265)
    ->  Sort  (cost=3433.44..3435.58 rows=857 width=265)
          Sort Key: probeset_id, chip, gene_symbol, title,
                    sequence_description, pfam
          ->  Index Scan using gene_symbol_idx_fun1
              on affy_array_annotation  (cost=0.00..3391.70 rows=857 width=265)
              Index Cond: (lower((gene_symbol)::text) = '%MIF%'::text)
 (5 rows)
 
 
 
 
 
 Regards,
 William
 
 
 ---(end of broadcast)---
 TIP 8: explain analyze is your friend

-- 
Dror Matalon
Zapatec Inc 
1700 MLK Way
Berkeley, CA 94709
http://www.fastbuzz.com
http://www.zapatec.com

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [PERFORM] Maximum Possible Insert Performance?

2003-11-26 Thread Dror Matalon

But the permissions of the base ramdisk might be wrong. I'd su to the
user you run postgres as (probably postgres), go to the directory where
the log and the database files are, and make sure you can see the files.

On Wed, Nov 26, 2003 at 10:03:47AM -0800, William Yu wrote:
 Tom Lane wrote:
 William Yu [EMAIL PROTECTED] writes:
 
 I then tried to put the WAL directory onto a ramdisk. I turned off 
 swapping, created a tmpfs mount point and copied the pg_xlog directory 
 over. Everything looked fine as far as I could tell but Postgres just 
 panic'd with a file permissions error. Anybody have thoughts to why 
 tmpfs would not work?
 
 
 I'd say you got the file or directory ownership or permissions wrong.
 
 I did a mv instead of a cp which duplicates ownership & permissions exactly.
 
 
 ---(end of broadcast)---
 TIP 4: Don't 'kill -9' the postmaster

-- 
Dror Matalon
Zapatec Inc 
1700 MLK Way
Berkeley, CA 94709
http://www.fastbuzz.com
http://www.zapatec.com

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [PERFORM] tuning questions

2003-12-04 Thread Dror Matalon
On Thu, Dec 04, 2003 at 11:59:32AM -0500, Jeff wrote:
 On Thu, 04 Dec 2003 08:06:23 -0800
 Jack Coates [EMAIL PROTECTED] wrote:
 
  testbed:
  dual P3 1.3 GHz box with 2GB RAM
  two IDE 120G drives on separate channels (DMA on), OS on one, DB on
  the other, some swap on each (totalling 2.8G).
  RH Linux 8.
 
 Side Note: be sure to turn off write caching on those disks or you may
 have data corruption in the event of a failure

I've seen this comment several times from different people.
Would someone care to explain how you would get data corruption? I
thought that the whole idea of the log is to provide a journal similar
to what you get in a journaling file system. 

In other words, the db writes a series of transactions to the log and marks 
that log entry (don't know the right nomenclature) as valid. When the db
crashes, it reads the log, discards the last log entry if it wasn't
marked as valid, and replays any transactions that haven't been
committed to the db. The end result being that you might lose your last
transaction(s) if the db crashes, but nothing ever gets corrupted.

So what am I missing in this picture?

Regards,

Dror

-- 
Dror Matalon
Zapatec Inc 
1700 MLK Way
Berkeley, CA 94709
http://www.fastbuzz.com
http://www.zapatec.com

---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [PERFORM] FreeBSD config

2004-02-26 Thread Dror Matalon
On Thu, Feb 26, 2004 at 11:55:31AM -0700, scott.marlowe wrote:
 On Thu, 26 Feb 2004, Dror Matalon wrote:
 
  Hi,
  
  We have postgres running on freebsd 4.9 with 2 Gigs of memory. As per
  repeated advice on the mailing lists we configured effective_cache_size
  = 25520 which you get by doing `sysctl -n vfs.hibufspace` / 8192
  
  Which results in using 200Megs for disk caching. 
  
  Is there a reason not to increase the hibufspace beyond the 200 megs and
  provide a bigger cache to postgres? I looked both on the postgres and
  freebsd mailing lists and couldn't find a good answer to this.
 
 Actually, I think you're confusing effective_cache_size with 
 shared_buffers.

No, I'm not.

 
 effective_cache_size changes no cache settings for postgresql, it simply 
 acts as a hint to the planner about how much of the dataset your OS / 
 Kernel / Disk cache can hold.

I understand that. The question is why have the OS, in this case FreeBSD,
use only 200 Megs for disk cache and not more. Why not double
vfs.hibufspace to 418119680 and double the effective_cache_size to 51040?

 
 Making it bigger only tells the query planner it's more likely the data 
 it's looking for will be in cache.
 
 shared_buffers, OTOH, sets the amount of cache that postgresql uses.  It's 
 generally considered that 256 Megs or 1/4 of memory, whichever is LESS, is 
 a good setting for production database servers.
 

Actually, last I looked, I thought that the recommended max shared
buffers was 10,000 (80MB), even on machines with large amounts of memory.
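
As a sketch, the kind of postgresql.conf values being discussed for this
2 Gig FreeBSD box (numbers from this thread, not a recommendation):

shared_buffers = 10000          # ~80 MB, the conservative ceiling above
effective_cache_size = 25520    # vfs.hibufspace / 8192, i.e. ~200 MB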

Regards,

Dror

-- 
Dror Matalon
Zapatec Inc 
1700 MLK Way
Berkeley, CA 94709
http://www.fastbuzz.com
http://www.zapatec.com

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [PERFORM] FreeBSD config

2004-02-26 Thread Dror Matalon

Thanks for the pointer. So 

maxbufspace = nbuf * BKVASIZE;

Which is confirmed in 
http://unix.derkeiler.com/Mailing-Lists/FreeBSD/performance/2003-09/0045.html

and it looks like there's a patch by Sean Chittenden at
http://people.freebsd.org/~seanc/patches/patch-HEAD-kern.nbuf

that does what I was asking. Seems a little on the bleeding edge. Has
anyone tried this?


On Thu, Feb 26, 2004 at 04:36:01PM -0600, Kevin Barnard wrote:
 On 26 Feb 2004 at 13:58, Dror Matalon wrote:
 
  
  which brings me back to my question: why not make FreeBSD use more of its
  memory for disk caching and then tell postgres about it? 
  
 
 I think there is some confusion about maxbufspace and hibufspace.  I'm looking at a 
 comment in the FreeBSD 4.9 source that explains this.  I think you will want to 
 increase effective_cache to match maxbufspace, not hibufspace, but I could be wrong.
 
 $FreeBSD: src/sys/kern/vfs_bio.c,v 1.242.2.21 line 363
 

-- 
Dror Matalon
Zapatec Inc 
1700 MLK Way
Berkeley, CA 94709
http://www.fastbuzz.com
http://www.zapatec.com

---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] FreeBSD config

2004-02-26 Thread Dror Matalon
On Fri, Feb 27, 2004 at 05:47:47AM +0800, Christopher Kings-Lynne wrote:
  We have postgres running on freebsd 4.9 with 2 Gigs of memory. As per
  repeated advice on the mailing lists we configured effective_cache_size
  = 25520 which you get by doing `sysctl -n vfs.hibufspace` / 8192
 
  Which results in using 200Megs for disk caching.
 
 effective_cache_size does nothing of the sort.  Check your
 shared_buffers value...

Sigh.

http://www.varlena.com/varlena/GeneralBits/Tidbits/annotated_conf_e.html

effective_cache_size
Sets the optimizer's assumption about the effective size of the disk
cache (that is, the portion of the kernel's disk cache that will be
used for PostgreSQL data files). This is measured in disk pages, which
are normally 8 kB each.


http://archives.postgresql.org/pgsql-performance/2003-07/msg00159.php
talks about how to programmatically determine the right setting for
effective_cache_size:
case `uname` in
FreeBSD)
echo "effective_cache_size = $((`sysctl -n vfs.hibufspace` / 8192))"
;;
*)
echo "Unable to automatically determine the effective cache size" >> /dev/stderr
;;
esac

which brings me back to my question: why not make FreeBSD use more of its
memory for disk caching and then tell postgres about it? 



 
  Is there a reason not to increase the hibufspace beyond the 200 megs and
  provide a bigger cache to postgres? I looked both on the postgres and
  freebsd mailing lists and couldn't find a good answer to this.
 
 Well, maybe, but not necessarily.  It's better to leave the OS to look after
 most of your RAM.
 
 Chris
 

-- 
Dror Matalon
Zapatec Inc 
1700 MLK Way
Berkeley, CA 94709
http://www.fastbuzz.com
http://www.zapatec.com

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html


[PERFORM] FreeBSD config

2004-02-26 Thread Dror Matalon
Hi,

We have postgres running on freebsd 4.9 with 2 Gigs of memory. As per
repeated advice on the mailing lists we configured effective_cache_size
= 25520 which you get by doing `sysctl -n vfs.hibufspace` / 8192

Which results in using 200Megs for disk caching. 

Is there a reason not to increase the hibufspace beyond the 200 megs and
provide a bigger cache to postgres? I looked both on the postgres and
freebsd mailing lists and couldn't find a good answer to this.

If yes, any suggestions on what would be a good size on a 2 Gig machine?

Regards,

Dror


-- 
Dror Matalon
Zapatec Inc 
1700 MLK Way
Berkeley, CA 94709
http://www.fastbuzz.com
http://www.zapatec.com

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [PERFORM] FreeBSD config

2004-02-27 Thread Dror Matalon
I guess the thing to do is to move this topic over to a freebsd list
where we can get more definitive answers on how disk caching is handled.
I asked here since I know that FreeBSD is often recommended as a good
platform for postgres,
http://www.varlena.com/varlena/GeneralBits/Tidbits/perf.html#
and with modern machines often having gigabytes of memory, the question
of possibly being limited to a 200MB disk cache seems like one that would
come up often.

On Fri, Feb 27, 2004 at 12:46:08PM +0530, Shridhar Daithankar wrote:
 Dror Matalon wrote:
 
 Let me try and say it again. I know that setting effective_cache_size
 doesn't affect the OS' cache. I know it just gives Postgres the *idea*
 of how much cache the OS is using. I know that. I also know that a
 correct hint helps performance.
 
 I've read Matt Dillon's discussion about the freebsd VM at
 http://www.daemonnews.org/21/freebsd_vm.html and I didn't see him
 saying that Freebsd uses all the free RAM for disk cache. Would you care
 to provide a URL pointing to that?
 
 I don't believe FreeBSD uses everything available, unlike linux. It is 
 actually a good thing. If you have 1GB RAM and kernel buffers set at 600MB, 
 you are guaranteed to have some memory in crunch situations.
 
 As for your original question, I think you can increase the kernel buffer 
 sizes for VFS safely. However, remember that the more you dedicate to kernel 
 buffers, the less space you have in case of a crunch, for whatever reason.
 
 FreeBSD gives you a control which linux does not. Use it to the best of your 
 advantage.
 
  Shridhar

-- 
Dror Matalon
Zapatec Inc 
1700 MLK Way
Berkeley, CA 94709
http://www.fastbuzz.com
http://www.zapatec.com

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html