Re: [PERFORM] performance of implicit join vs. explicit conditions on inet queries?
"Robert Edmonds" <[EMAIL PROTECTED]> wrote > > EXPLAIN ANALYZE > SELECT * > FROM inet_addresses > WHERE addr << inet('10.2.0.0/24') >OR addr << inet('10.4.0.0/24') >OR addr << inet('10.8.0.0/24'); > > Bitmap Heap Scan on inet_addresses (cost=6.51..324.48 rows=1792335 > width=11) (actual time=0.350..1.104 rows=381 loops=1) > Recheck Cond: ((addr << '10.2.0.0/24'::inet) OR (addr << > '10.4.0.0/24'::inet) OR (addr << '10.8.0.0/24'::inet)) > Filter: ((addr << '10.2.0.0/24'::inet) OR (addr << '10.4.0.0/24'::inet) > OR (addr << '10.8.0.0/24'::inet)) > -> BitmapOr (cost=6.51..6.51 rows=85 width=0) (actual > time=0.336..0.336 rows=0 loops=1) > -> Bitmap Index Scan on inet_addresses_pkey (cost=0.00..2.17 > rows=28 width=0) (actual time=0.127..0.127 rows=127 loops=1) > Index Cond: ((addr > '10.2.0.0/24'::inet) AND (addr <= > '10.2.0.255'::inet)) > -> Bitmap Index Scan on inet_addresses_pkey (cost=0.00..2.17 > rows=28 width=0) (actual time=0.109..0.109 rows=127 loops=1) > Index Cond: ((addr > '10.4.0.0/24'::inet) AND (addr <= > '10.4.0.255'::inet)) > -> Bitmap Index Scan on inet_addresses_pkey (cost=0.00..2.17 > rows=28 width=0) (actual time=0.096..0.096 rows=127 loops=1) > Index Cond: ((addr > '10.8.0.0/24'::inet) AND (addr <= > '10.8.0.255'::inet)) > Total runtime: 1.613 ms > > >Instead of specifying explicit address ranges in the query, I'd like >to store the ranges in a table: > > > inet_test_db=# \d inet_ranges > Table "public.inet_ranges" > Column | Type | Modifiers > --+-+--- > range| inet| not null > range_id | integer | > Indexes: >"inet_ranges_pkey" PRIMARY KEY, btree (range) >"inet_ranges_range_id_idx" btree (range_id) > > inet_test_db=# SELECT * FROM inet_ranges; >range | range_id > --+-- > 10.2.0.0/24 |1 > 10.4.0.0/24 |1 > 10.8.0.0/24 |1 > 10.16.0.0/24 |2 > 10.32.0.0/24 |2 > 10.64.0.0/24 |2 > (6 rows) > > > > EXPLAIN ANALYZE > SELECT * > FROM inet_addresses as ia, inet_ranges as ir > WHERE ia.addr << ir.range > AND ir.range_id=1; > > Nested Loop 
(cost=0.00..171485.93 rows=3072574 width=26) (actual > time=1465.803..16922.979 rows=381 loops=1) > Join Filter: ("inner".addr << "outer".range) > -> Seq Scan on inet_ranges ir (cost=0.00..1.07 rows=3 width=15) > (actual time=0.008..0.021 rows=3 loops=1) > Filter: (range_id = 1) > -> Seq Scan on inet_addresses ia (cost=0.00..31556.83 rows=2048383 > width=11) (actual time=0.003..2919.405 rows=2048383 loops=3) > Total runtime: 16923.457 ms > Good illustration. I guess we have a problem of the historgram statistical information. That is, the historgrams we used can effectively record the linear space ranges(like ordinary <, >, =), but failed to do it for nonlinear ranges like inet data type. So the Nested Loop node make an error in estmating number of rows (est: 3072574, real: 381), thus a sequential scan is obviously better under this estimation. I am thinking the historgram problem is not easy to fix, but is there a way to change Inet type a little bit to make it linear for your range operators? (for example, align the length to 000.000.000.000/00?) Regards, Qingqing ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] Best way to check for new data.
Rodrigo Madera wrote:
> I have a table that holds entries as in a fictitious table Log(id
> integer, msg text). Let's say then that I have a program log_tail whose
> sole purpose is to print newly added data elements. What is the best
> solution in terms of performance?

I have a system that does this. We do it by PK; the PK is bigint and always increases, and the client remembers the last key seen and queries based on that key:

select ... where events.event_id > ?::bigint order by events.event_id limit 2000

It works, but when a lot of data is added it can become sensitive to the index statistics getting out of sync with the data. Best to insert, then update the statistics, then read the data. For us these three activities are independent, but it still seems to work.

I'd investigate the notify mechanism suggested by Otto if you can afford to use a postgres-specific mechanism like that.

David
Re: [PERFORM] Best way to check for new data.
There are a few ways to do this... Thinking about it a bit, I would add a timestamp column to your log table (indexed) and keep a control table which keeps track of the last log print sweep operation. The print operation would just do:

select * from log where logtime > (select lastlogtime());

The idea here is not to have to keep track of anything on the log table itself, like a flag indicating print status, which would cause some bloat issues. All you have to do is reindex once in a while. lastlogtime() is a function which returns the time of the last log sweep from the control table. We use a function declared immutable to force the planner to treat it as a constant (others might tell you to do things differently here).

Merlin

> From: Rodrigo Madera
> Sent: Friday, October 28, 2005 5:39 PM
> To: pgsql-performance@postgresql.org
> Subject: [PERFORM] Best way to check for new data.
>
> I have a table that holds entries as in a fictitious table Log(id
> integer, msg text). Let's say then that I have a program log_tail whose
> sole purpose is to print newly added data elements. What is the best
> solution in terms of performance?
>
> Thank you for your time,
> Rodrigo
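A minimal sketch of that control-table scheme (every name here is made up for illustration; the thread itself never shows the schema):

```sql
-- Hypothetical control table holding the timestamp of the last sweep.
create table log_sweep (lastlogtime timestamptz not null);
insert into log_sweep values ('-infinity');

-- Declared immutable, as described above, so the planner folds the call
-- into a constant rather than re-evaluating it per row.
create function lastlogtime() returns timestamptz as
  'select lastlogtime from log_sweep'
  language sql immutable;

-- One print sweep: read everything newer than the last sweep, then
-- advance the watermark.
begin;
select * from log where logtime > (select lastlogtime());
update log_sweep set lastlogtime = now();
commit;
```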
Re: [PERFORM] performance of implicit join vs. explicit conditions on inet queries?
"Qingqing Zhou" <[EMAIL PROTECTED]> writes: > "Robert Edmonds" <[EMAIL PROTECTED]> wrote >> Instead of specifying explicit address ranges in the query, I'd like >> to store the ranges in a table: > Good illustration. I guess we have a problem of the historgram statistical > information. No, that's completely irrelevant to his problem. The reason we can't do this is that the transformation from "x << const" to a range check on x is a plan-time transformation; there's no mechanism in place to do it at runtime. This is not easy to fix, because the mechanism that's doing it is primarily intended for LIKE/regex index optimization, and in that case a runtime pattern might well not be optimizable at all. regards, tom lane ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] SOLVED: insertion of bytea
> I'm CPU-bound with an I/O rate well below what my disks could do :(
> [...]
>
> PS1: someone off-list suggested using oprofile, which I will do.

I've used oprofile and found that with my test client (lots of bytea inserts) the server burns a lot of CPU time in pglz_compress. I'm using random data, and my production data will be close to random (due to noise!), so compression is of course pointless. By using

alter table dbtest alter img set storage external;

I can tell the server not to compress. On a test box this brought the net insert rate up by 50%, which is enough to meet the requirements. Thanks again :)

Bye, Chris.
[PERFORM] 8.x index insert performance
Greetings,

We are running some performance tests in which we are attempting to insert about 100,000,000 rows in a database at a sustained rate. About 50M rows in, our performance drops dramatically.

This test is with data that we believe to be close to what we will encounter in production. However, in tests with purely generated, sequential data, we did not notice this slowdown. I'm trying to figure out what patterns in the "real" data may be causing us problems.

I have log, data, and indexes on separate LUNs on an EMC SAN. Prior to the slowdown, each partition is writing at a consistent rate. The index partition is reading at a much lower rate. At the time of the slowdown, the index partition read rate increases and all write rates decrease. CPU utilization drops.

The server is doing nothing aside from running the DB. It is a dual opteron (dual core, looks like 4 cpus) with 4GB RAM. shared_buffers = 32768. fsync = off. Postgres version is 8.1beta4. OS is SuSE Enterprise Server 9.

My leading hypothesis is that one indexed column may be leading to our issue. The column in question is a varchar(12) column which is non-null in about 2% of the rows. The value of this column is 5 characters which are the same for every row, followed by a 7 character zero-filled base 36 integer. Thus, every value of this field will be exactly 12 bytes long, and will be substantially the same down to the last bytes.

Could this pattern be pessimal for a postgresql btree index? I'm running a test now to see if I can verify, but my runs take quite a long time...

If this sounds like an unlikely culprit, how can I go about tracking down the issue?

Thanks,
-K
Re: [PERFORM] 8.x index insert performance
Kelly wrote: > We are running some performance tests in which we are attempting to > insert about 100,000,000 rows in a database at a sustained rate. About > 50M rows in, our performance drops dramatically. > > This test is with data that we believe to be close to what we will > encounter in production. However in tests with purely generated, > sequential data, we did not notice this slowdown. I'm trying to figure > out what patterns in the "real" data may be causing us problems. > > I have log,data and indexes on separate LUNs on an EMC SAN. Prior to > slowdown, each partition is writing at a consistent rate. Index > partition is reading at a much lower rate. At the time of slowdown, > index partition read rate increases, all write rates decrease. CPU > utilization drops. > > The server is doing nothing aside from running the DB. It is a dual > opteron (dual core, looks like 4 cpus) with 4GB RAM. shared_buffers = > 32768. fsync = off. Postgres version is 8.1.b4. OS is SuSE Enterprise > server 9. > > My leading hypothesis is that one indexed column may be leading to our > issue. The column in question is a varchar(12) column which is non-null > in about 2% of the rows. The value of this column is 5 characters which > are the same for every row, followed by a 7 character zero filled base > 36 integer. Thus, every value of this field will be exactly 12 bytes > long, and will be substantially the same down to the last bytes. > > Could this pattern be pessimal for a postgresql btree index? I'm > running a test now to see if I can verify, but my runs take quite a long > time... > > If this sounds like an unlikely culprit how can I go about tracking down > the issue? well, can you defer index generation until after loading the set (or use COPY?) if that index is causing the problem, you may want to consider setting up partial index to exclude null values. 
One interesting thing to do would be to run your inserting process until the slowdown happens, stop the process, reindex the table, and then resume it, and see if this helps.

Merlin
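That experiment is straightforward to script; the index and table names below are placeholders, since the thread never names them:

```sql
-- Hypothetical names: rebuild the suspect index mid-load to test whether
-- index bloat or fragmentation explains the slowdown.
REINDEX INDEX test_table_key_idx;

-- Or rebuild every index on the table at once:
REINDEX TABLE test_table;
```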
Re: [PERFORM] 8.x index insert performance
On Mon, 2005-10-31 at 12:32 -0500, Merlin Moncure wrote:
> well, can you defer index generation until after loading the set (or use
> COPY?)

I cannot defer index generation. We are using the COPY API. Copying 1 rows in a batch.

> if that index is causing the problem, you may want to consider setting
> up partial index to exclude null values.

This is a single-column index. I assumed that null column values were not indexed. Is my assumption incorrect?

-K
Re: [PERFORM] 8.x index insert performance
> We are running some performance tests in which we are attempting to
> insert about 100,000,000 rows in a database at a sustained rate. About
> 50M rows in, our performance drops dramatically.
>
> This test is with data that we believe to be close to what we will
> encounter in production. However in tests with purely generated,
> sequential data, we did not notice this slowdown. I'm trying to figure
> out what patterns in the "real" data may be causing us problems.
>
> I have log, data and indexes on separate LUNs on an EMC SAN. Prior to
> slowdown, each partition is writing at a consistent rate. Index
> partition is reading at a much lower rate. At the time of slowdown,
> index partition read rate increases, all write rates decrease. CPU
> utilization drops.

I'm doing some test inserts (albeit with much fewer records) into 8.0.4 (on FreeBSD 6.0 RC1), and the import time decreased three-fold when I increased the values mentioned below:

shared_buffers = 8192
commit_delay = 10
commit_siblings = 1000

When I increased shared_buffers, the kernel needed minor tweaking.

regards
Claus
Re: [PERFORM] 8.x index insert performance
> > if that index is causing the problem, you may want to consider setting
> > up partial index to exclude null values.
>
> This is a single column index. I assumed that null column values were
> not indexed. Is my assumption incorrect?
>
> -K

It turns out it is, or it certainly seems to be. I didn't know that :). So a partial index will probably not help for null exclusion...

It would be interesting to see if you are getting swaps (check pgsql_tmp) when performance breaks down. That is an easy fix: bump work_mem.

Merlin
Re: [PERFORM] 8.x index insert performance
On Mon, Oct 31, 2005 at 12:32:03PM -0500, Merlin Moncure wrote:
> if that index is causing the problem, you may want to consider setting
> up partial index to exclude null values.

Hey all.

Pardon my ignorance. :-)

I've been trying to figure out whether null values are indexed or not from the documentation. I was under the impression that null values are not stored in the index. Occasionally, though, I then see a suggestion such as the above, which seems to indicate to me that null values *are* stored in the index, allowing the 'exclude null values' to have effect?

Which is it? :-)

Thanks,
mark

--
[EMAIL PROTECTED] / [EMAIL PROTECTED] / [EMAIL PROTECTED]
Neighbourhood Coder, Ottawa, Ontario, Canada
One ring to rule them all, one ring to find them, one ring to bring them all and in the darkness bind them...
http://mark.mielke.cc/
Re: [PERFORM] 8.x index insert performance
> On Mon, Oct 31, 2005 at 12:32:03PM -0500, Merlin Moncure wrote:
> > if that index is causing the problem, you may want to consider setting
> > up partial index to exclude null values.
>
> I've been trying to figure out whether null values are indexed or not from
> the documentation. I was under the impression that null values are not
> stored in the index. Occasionally, though, I then see a suggestion such
> as the above, which seems to indicate to me that null values *are* stored
> in the index, allowing the 'exclude null values' to have effect?
>
> Which is it? :-)

I think I'm the ignorant one... do EXPLAIN on any lookup on an indexed field where the field value is null and you get a seqscan.

Merlin
Re: [PERFORM] 8.x index insert performance
[EMAIL PROTECTED] writes:
> I've been trying to figure out whether null values are indexed or not from
> the documentation. I was under the impression that null values are not
> stored in the index.

You're mistaken, at least with regard to btree indexes.

regards, tom lane
Re: [PERFORM] 8.x index insert performance
On Mon, 2005-10-31 at 15:30 -0500, Tom Lane wrote:
> [EMAIL PROTECTED] writes:
> > I've been trying to figure out whether null values are indexed or not from
> > the documentation. I was under the impression that null values are not
> > stored in the index.
>
> You're mistaken, at least with regard to btree indexes.

Ha! So I'm creating an index 98% full of nulls! Looks like this is easily fixed with partial indexes.

-K
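A minimal sketch of that fix, with made-up names standing in for the real table and column from the tests:

```sql
-- Index only the ~2% of rows that actually carry a value; the nulls are
-- simply left out, shrinking the index (and its insert cost) dramatically.
CREATE INDEX big_table_special_key_idx
    ON big_table (special_key)
    WHERE special_key IS NOT NULL;
```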
Re: [PERFORM] 8.x index insert performance
> [EMAIL PROTECTED] writes:
> > I've been trying to figure out whether null values are indexed or not
> > from the documentation. I was under the impression that null values
> > are not stored in the index.
>
> You're mistaken, at least with regard to btree indexes.

Hmm. I tried several different ways to filter/extract null values from an indexed key and got a seq scan every time. The only way I could query for/against null values was to convert to bool via a function. However, I did a partial exclusion on a really big table with 1% non-null values, and the index size dropped as expected.

Merlin
Re: [PERFORM] 8.x index insert performance
On Mon, 2005-10-31 at 13:13, Merlin Moncure wrote:
> > > if that index is causing the problem, you may want to consider
> > > setting up partial index to exclude null values.
> >
> > This is a single column index. I assumed that null column values were
> > not indexed. Is my assumption incorrect?
> >
> > -K
> It turns out it is, or it certainly seems to be. I didn't know that :).
> So partial index will probably not help for null exclusion...
>
> would be interesting to see if you are getting swaps (check pg_tmp) when
> performance breaks down. That is an easy fix, bump work_mem.

OK, here's the issue in a nutshell. NULLs, like everything else, are indexed. HOWEVER, there's no way for them to be used by a normal query, since =NULL is not a legal construct. So, you can't do something like:

select * from sometable where somefield = NULL

because you won't get any answers, since nothing can equal NULL, and

select * from sometable where somefield IS NULL

won't work because IS is not a normally indexable operator.

Which is why you can create two indexes on a table to get around this, like so:

create index iname1 on table (field) where field IS NULL

and

create index iname2 on table (field) where field IS NOT NULL

And then the nulls are indexable by IS / IS NOT NULL.
Re: [PERFORM] 8.x index insert performance
> select * from sometable where somefield IS NULL won't work because IS is
> not a normally indexable operator.

Ah, I didn't know that. So there is no real reason not to exclude null values from all your indexes :). Reading Tom's recent comments, everything is clear now.

Instead of using your two-index approach I prefer to:

create function nullidx(anyelement) returns boolean as $$ select $1 is null; $$ language sql immutable;

create index t_f_nullidx on t (nullidx(f)); -- etc

Merlin
Re: [PERFORM] 8.x index insert performance
"Merlin Moncure" <[EMAIL PROTECTED]> writes: >> You're mistaken, at least with regard to btree indexes. > hmm. I tried several different ways to filter/extract null values from > an indexed key and got a seq scan every time. I said they were stored, not that you could query against them ;-) IS NULL isn't considered an indexable operator, mainly because it's not an operator at all in the strict sense of the word; and our index access APIs only support querying on indexable operators. The reason they're stored is that they have to be in order to make multi-column indexes work right. I suppose we could special-case single-column indexes, but we don't. In any case, it's more likely that someone would one day get around to making IS NULL an indexable operator than that we'd insert a special case like that. regards, tom lane ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] 8.x index insert performance
Kelly Burkhart <[EMAIL PROTECTED]> writes:
> Ha! So I'm creating an index 98% full of nulls! Looks like this is
> easily fixed with partial indexes.

Still, though, it's not immediately clear why you'd be seeing a severe dropoff in insert performance after 50M rows. Even though there are lots of nulls, I don't see why they'd behave any worse for insert speed than real data. One would like to think that the insert speed would follow a nice O(log N) rule.

Are you doing the inserts all in one transaction, or several? If several, could you get a gprof profile of inserting the same number of rows (say a million or so) both before and after the unexpected dropoff occurs?

regards, tom lane
[PERFORM] 8.1beta3 performance
We're running 8.1beta3 on one server and are having ridiculous performance issues. This is a 2-cpu Opteron box, and both processors are staying at 98 or 99% utilization processing not-that-complex queries. Prior to the upgrade, our I/O wait time was about 60% and cpu utilization rarely got very high; now I/O wait time is at or near zero. I'm planning to go back to 8.0 tonight or tomorrow night but thought I'd check the pgsql-performance prophets before I gave it up.
Re: [PERFORM] 8.x index insert performance
On Mon, Oct 31, 2005 at 03:27:31PM -0500, Merlin Moncure wrote:
> I think I'm the ignorant one... do explain on any lookup on an indexed
> field where the field value is null and you get a seqscan.

Nahhh... I think the documentation could use a more explicit or obvious explanation. Or, I could have checked the source code to see. In any case, I expect we aren't the only ones that lacked confidence.

Tom was kind enough to point out that null values are stored. I expect that the seqscan is used if the null values are not selective enough, the same as any other value that isn't selective enough.

Now we can both have a little more confidence! :-)

Cheers,
mark

--
[EMAIL PROTECTED] / [EMAIL PROTECTED] / [EMAIL PROTECTED]
Neighbourhood Coder, Ottawa, Ontario, Canada
http://mark.mielke.cc/
Re: [PERFORM] 8.1beta3 performance
On Mon, Oct 31, 2005 at 05:16:46PM -0600, PostgreSQL wrote:
> We're running 8.1beta3 on one server and are having ridiculous performance
> issues. This is a 2 cpu Opteron box and both processors are staying at 98
> or 99% utilization processing not-that-complex queries. Prior to the
> upgrade, our I/O wait time was about 60% and cpu utilization rarely got
> very high, now I/O wait time is at or near zero.

It sounds like some query got planned a different way that happened to be really suboptimal -- I've seen really bad queries be quick on earlier versions "by accident" and then not have the same luck on later versions. Could you find out what queries are taking so long (use log_min_duration_statement), and post table definitions and EXPLAIN ANALYZE output here?

/* Steinar */
--
Homepage: http://www.sesse.net/
Re: [PERFORM] 8.1beta3 performance
On Mon, 2005-31-10 at 17:16 -0600, PostgreSQL wrote:
> We're running 8.1beta3 on one server and are having ridiculous performance
> issues. This is a 2 cpu Opteron box and both processors are staying at 98
> or 99% utilization processing not-that-complex queries. Prior to the
> upgrade, our I/O wait time was about 60% and cpu utilization rarely got
> very high, now I/O wait time is at or near zero.

Have you done anything to verify that this is actually a problem with 8.1, and not some other change that was made as part of the upgrade process? For example, if ANALYZE hasn't been re-run, that could cause the plans chosen by the optimizer to be completely different.

-Neil