I cannot understand why the index is not being used when in the plpgsql
function?
I even tried to make a test function containing nothing more than the
single query. Still the index is not being used.
When running the same query in the sql prompt, the index is in use
though.
Please post
Actually, planner was smart in using a bitmap index scan in the prepared
query. Suppose you later EXECUTE that canned plan with a date range which
covers say half of the table : the indexscan would be a pretty bad choice
since it would have to access half the rows in the table in index
Note that's power draw per bit. dram is usually much more densely
packed (it can be with fewer transistors per cell) so the individual
chips for each may have similar power draws while the dram will be 10
times as densely packed as the sram.
Differences between SRAM and DRAM :
- price per
-- More explicit
select aggregate_using(max(date), city, temp, date) from bar group by
city, temp order by city;
select city, max(ROW(temp, date)) from bar group by city;
Does not work (alas) for lack of a default comparison for record type.
Another solution, which works wonders if you've
I always assumed SCSI disks had a write-through cache and therefore
didn't need a drive cache flush comment.
Maximum performance can only be reached with a writeback cache so the
drive can reorder and cluster writes, according to the realtime position
of the heads and platter rotation.
With 24 drives it'll probably be the controller that is the limiting
factor of bandwidth. Our HP SAN controller with 28 15K drives delivers
170MB/s at maximum with raid 0 and about 155MB/s with raid 1+0.
I get about 150-200 MB/s on a linux software RAID of 3 cheap Samsung
SATA 1TB
On Tue, 09 Mar 2010 08:00:50 +0100, Greg Smith g...@2ndquadrant.com
wrote:
Scott Carey wrote:
For high sequential throughput, nothing is as optimized as XFS on Linux
yet. It has weaknesses elsewhere however.
When files are extended one page at a time (as postgres does)
fragmentation
-My warnings about downsides related to checkpoint issues with larger
buffer pools isn't an opinion at all; that's a fact based on limitations
in how Postgres does its checkpoints. If we get something more like
Oracle's incremental checkpoint logic, this particular concern might go
I think the logic you are referring to is the clock sweep buffer
accounting
scheme. That just makes sure that the most popular pages stay in the
buffers. If your entire db fits in the buffer pool then it'll all get in
there real fast.
Actually, I meant that in the case of a seq scan, PG
I was thinking in something like that, except that the factor I'd use
would be something like 50% or 100% of current size, capped at (say) 1
GB.
Using fallocate() ?
--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
I also wonder why the reported runtime of 5.847 ms is so much different
to the runtime reported of my scripts (both php and ruby are almost the
same). What's the best tool to time queries in postgresql? Can this be
done from pgadmin?
I've seen differences like that. Benchmarking isn't
On Mon, 22 Mar 2010 12:14:51 +0100, Merlin Moncure mmonc...@gmail.com
wrote:
On Sun, Mar 21, 2010 at 9:14 PM, Dave Crooke dcro...@gmail.com wrote:
Note however that Oracle offeres full transactionality and does in
place row
updates. There is more than one way to do it.
There's no free
This is one of the more-or-less solved problems in Unix/Linux. Ext*
file systems have a reserve usually of 10% of the disk space that
nobody except root can use. It's not for root, it's because with 10%
of the disk free, you can almost always do a decent job of allocating
contiguous
If you do large transactions, which emits large quantities of xlog, be
aware that while the previous xlog segment is being fsynced, no new writes
happen to the next segment. If you use large wal_buffers (more than 16 MB)
these buffers can absorb xlog data while the previous segment is
Hannu Krosing wrote:
Pulling the plug should not corrupt a postgreSQL database, unless it was
using disks which lie about write caching.
Didn't we recently put the old wife's 'the disks lied' tale to bed in
favour of actually admiting that some well known filesystems and
saftware raid
After fsync/syncronous_commit off
Do not use fsync off, it is not safe. Who cares about the performance of
fsync=off, when in practice you'd never use it with real data.
synchronnous_commit=off is fine for some applications, though.
More info is needed about your configuration (hardware,
1. VACUUM FULL ANALYZE once in a week during low-usage time and
VACUUM FULL compacts tables, but tends to bloat indexes. Running it weekly
is NOT RECOMMENDED.
A correctly configured autovacuum (or manual vacuum in some circumstances)
should maintain your DB healthy and you shouldn't
So am I to understand I don't need to do daily reindexing as a
maintenance measure with 8.3.7 on FreeBSD.
Sometimes it's better to have indexes with some space in them so every
insert doesn't hit a full index page and triggers a page split to make
some space.
Of course if the index is
Does the psql executable have any ability to do a fetch many, using a
server-side named cursor, when returning results? It seems like it tries
to retrieve the query entirely to local memory before printing to
standard out.
I think it grabs the whole result set to calculate the display
explain analyze SELECT * FROM telegrams WHERE ((recipient_id=508933
AND recipient_deleted=FALSE) OR (user_id=508933 AND user_deleted=FALSE))
ORDER BY id DESC LIMIT 10 OFFSET 0
If you need very fast performance on this query, you need to be able to
use the index for ordering.
Note that
How do you explain the cost is about ten times lower in the 2nd query
than the first ?
Function call cost ?
Can you EXPLAIN ANALYZE ?
--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
FYI, I had a query like this :
(complex search query ORDER BY foo LIMIT X)
LEFT JOIN objects_categories oc
LEFT JOIN categories c
GROUP BY ...
(more joins)
ORDER BY foo LIMIT X
Here, we do a search on objects (i'm not gonna give all the details,
they're not interesting for the problem at
DELETE FROM table1 WHERE table2_id = ?
For bulk deletes, try :
DELETE FROM table1 WHERE table2_id IN (list of a few thousands ids)
- or use a JOIN delete with a virtual VALUES table
- or fill a temp table with ids and use a JOIN DELETE
This will save cliet/server roundtrips.
Now, something
As promised, I did a tiny benchmark - basically, 8 empty tables are
filled with 100k rows each within 8 transactions (somewhat typically for
my application). The test machine has 4 cores, 64G RAM and RAID1 10k
drives for data.
# INSERTs into a TEMPORARY table:
[joac...@testsrv scaling]$
Since you have lots of data you can use parallel loading.
Split your data in several files and then do :
CREATE TEMPORARY TABLE loader1 ( ... )
COPY loader1 FROM ...
Use a TEMPORARY TABLE for this : you don't need crash-recovery since if
something blows up, you can COPY it again... and it
Within the data to import most rows have 20 till 50 duplicates. Sometime
much more, sometimes less.
In that case (source data has lots of redundancy), after importing the
data chunks in parallel, you can run a first pass of de-duplication on the
chunks, also in parallel, something like :
Within the data to import most rows have 20 till 50 duplicates.
Sometime much more, sometimes less.
In that case (source data has lots of redundancy), after importing the
data chunks in parallel, you can run a first pass of de-duplication on
the chunks, also in parallel, something like :
Can you give the config params for those :
fsync =
synchronous_commit =
wal_sync_method =
Also, some vmstat 1 output during the runs would be interesting.
--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
Have you tried connecting using a UNIX socket instead of a TCP socket on
localhost ? On such very short queries, the TCP overhead is significant.
Actually UNIX sockets are the default for psycopg2, had forgotten that.
I get 7400 using UNIX sockets and 3000 using TCP (host=localhost)
--
Sent
FYI I've tweaked this program a bit :
import psycopg2
from time import time
conn = psycopg2.connect(database='peufeu')
cursor = conn.cursor()
cursor.execute(CREATE TEMPORARY TABLE test (data int not null))
conn.commit()
cursor.execute(PREPARE ins AS INSERT INTO test VALUES ($1))
I'm not surprised that Python add is so slow, but I am surprised that
I didn't remember it was... ;-)
it's not the add(), it's the time.time()...
--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
When you set up a server that has high throughput requirements, the last
thing you want to do is use it in a manner that cripples its throughput.
Don't try and have 1000 parallel Postgres backends - it will process
those queries slower than the optimal setup. You should aim to have
Especially as, in repeated tests, PostgreSQL with persistence turned off
is just as fast as the fastest nondurable NoSQL database. And it has a
LOT more features.
An option to completely disable WAL for such use cases would make it a lot
faster, especially in the case of heavy
Well I guess I'd prefer a per-transaction setting, allowing to bypass
WAL logging and checkpointing. Forcing the backend to care itself for
writing the data I'm not sure is a good thing, but if you say so.
Well if the transaction touches a system catalog it better be WAL-logged...
A
I'd like to point out the costs involved in having a whole separate
version
It must be a setting, not a version.
For instance suppose you have a session table for your website and a users
table.
- Having ACID on the users table is of course a must ;
- for the sessions table you can drop
On the new system the bulk loads are extremely slower than on the
previous
machine and so are the more complex queries. The smaller transactional
queries seem comparable but i had expected an improvement. Performing a
db
import via psql -d databas -f dbfile illustrates this problem.
If
Two problems to recognize. First is that building something in has the
potential to significantly limit use and therefore advancement of work
on external pools, because of the let's use the built in one instead of
installing something extra mentality. I'd rather have a great external
Essentially, we insert a set of columns into a table, and each row fires
a trigger function which calls a very large stored procedure
For inserting lots of rows, COPY is much faster than INSERT because it
parses data (a lot) faster and is more data-stream-friendly. However the
actual
Most (all?) hard drives have cache built into them. Many raid cards have
cache built into them. When the power dies, all the data in any cache is
lost, which is why it's dangerous to use it for write caching. For that
reason, you can attach a BBU to a raid card which keeps the cache alive
The bitmap heap scan is 3% faster,
3% isn't really significant. Especially if the new setting makes one query
100 times slower... Like a query which will, by bad luck, get turned into
a nested loop index scan for a lot of rows, on a huge table which isn't in
cache...
--
Sent via
It sounds horrendously complicated to keep track of to me, and in the
end it won't make query execution any faster, it'll just potentially
help the planner pick a better plan. I wonder if that'd be worth the
extra CPU time spent managing the cache and cache content stats, and
using those
I ran into a fine example of this when I was searching this mailing list,
Searching in 856,646 pages took 13.48202 seconds. Site search powered by
PostgreSQL 8.3. Obviously at some point count(*) came into play here
Well, tsearch full text search is excellent, but it has to work inside the
suggest that 99% instances of the select count(*) idiom are probably
bad use of the SQL language.
Well, suppose you paginate results. If the user sees that the search query
returns 500 pages, there are two options :
- you're google, and your sorting algorithms are so good that the answer
I guess I have to comment here again and point out that while I am
having this
issue with text searches, I avoid using count(*) in such cases, I just
use
next and previous links.
Unfortunately sometimes you got to do an ORDER BY on search results, and
then all the rows got to be read...
emailok | numeric(2,0)|
Note that NUMERIC is meant for
- really large numbers with lots of digits
- or controlled precision and rounding (ie, order total isn't
99. $)
Accordingly, NUMERIC is a lot slower in all operations, and uses a lot
more
Even if somebody had a
great idea that would make things smaller without any other penalty,
which I'm not sure I believe either.
I'd say that the only things likely to bring an improvement significant
enough to warrant the (quite large) hassle of implementation would be :
- read-only /
Pierre C li...@peufeu.com wrote:
in-page compression
How would that be different from the in-page compression done by
TOAST now? Or are you just talking about being able to make it
more aggressive?
-Kevin
Well, I suppose lzo-style compression would be better used on data that is
written
My questions are: (1) Does the MVCC architecture introduce significant
delays between insert by a thread and visibility by other threads
As said by others, once commited it is immediately visible to all
(2) Are there any available benchmarks that can measure this delay?
Since you will
Is there any way using stored procedures (maybe C code that calls
SPI directly) or some other approach to get close to the expected 35
MB/s doing these bulk reads? Or is this the price we have to pay for
using SQL instead of some NoSQL solution. (We actually tried Tokyo
Cabinet and found it
On Thu, 04 Nov 2010 15:42:08 +0100, Nick Matheson
nick.d.mathe...@noaa.gov wrote:
I think your comments really get at what our working hypothesis was, but
given that our experience is limited compared to you all here on the
mailing lists we really wanted to make sure we weren't missing any
Fair enough; I'm so used to bumping wal_buffers up to 16MB nowadays that
I forget sometimes that people actually run with the default where this
becomes an important consideration.
Do you have any testing in favor of 16mb vs. lower/higher?
From some tests I had done some time ago, using
2. Why do both HashAggregate and GroupAggregate say the cost estimate
is 4 rows?
I've reproduced this :
CREATE TABLE popo AS SELECT (x%1000) AS a,(x%1001) AS b FROM
generate_series( 1,100 ) AS x;
VACUUM ANALYZE popo;
EXPLAIN ANALYZE SELECT a,b,count(*) FROM (SELECT * FROM popo
The table have 200 records now.
Select * from employee takes 15 seconds to fetch the data!!!
Which seems to be very slow.
But when I say select id,name from empoyee it executes in 30ms.
30 ms is also amazingly slow for so few records and so little data.
- please provide results of EXPLAIN
Note that your LEFT JOIN condition is probably quite slow...
Please post EXPLAIN ANALYZE for this simplified version :
SELECT
R.Osoba weryfikuj?ca,
R.LP,
A.NKA,
A.NTA,
Sum(Ile)
FROMNumeryA A
LEFT JOIN Rejestr stacji do naprawy R ON (
When I remove foreign constraints (drones_history_fk__samples and
drones_history_fk__drones) (I leave the primary key on drones_history)
than that INSERT, even for 50k rows, takes no more than a second.
So, my question is - is there anything I can do to make INSERTS with PK
faster? Or,
I pasted DDL at the begining of my post.
Ah, sorry, didn't see it ;)
The only indexes tables have are the ones created because of PK
constraints. Table drones has around 100k rows. Table drones_history has
around 30M rows. I'm not sure what additional info you'd want but I'll
be more
realm_51=# vacuum analyze verbose drones;
INFO: vacuuming public.drones
INFO: scanned index drones_pk to remove 242235 row versions
DETAIL: CPU 0.02s/0.11u sec elapsed 0.28 sec.
INFO: drones: removed 242235 row versions in 1952 pages
DETAIL: CPU 0.01s/0.02u sec elapsed 0.03 sec.
INFO: index
Yes, since (sample_id, drone_id) is primary key, postgres created
composite index on those columns. Are you suggesting I add two more
indexes, one for drone_id and one for sample_id?
(sample_id,drone_id) covers sample_id but if you make searches on drone_id
alone it is likely to be very
Having that many instances is not practical at all, so I'll have as many
databases as I have 'realms'. I'll use pg_dump | nc and nc | psql to
move databases
Mario
Then you can use schemas, too, it'll be easier.
--
Sent via pgsql-performance mailing list
Now I tried removing the constraints from the history table (including
the PK) and the inserts were fast. After few 'rounds' of inserts I added
constraints back, and several round after that were fast again. But then
all the same. Insert of some 11k rows took 4 seconds (with all
Is that true? I have no idea. I thought everything was done at the
512-byte block level.
Newer disks (2TB and up) can have 4k sectors, but this still means a page
spans several sectors.
--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your
Just once.
OK, another potential problem eliminated, it gets strange...
If I have 5000 lines in CSV file (that I load into 'temporary' table
using COPY) i can be sure that drone_id there is PK. That is because CSV
file contains measurements from all the drones, one measurement per
drone. I
So, I did. I run the whole script in psql, and here is the result for
the INSERT:
realm_51=# explain analyze INSERT INTO drones_history (2771, drone_id,
drone_log_notice, drone_temperature, drone_pressure) SELECT * FROM
tmp_drones_history;
The hardware it
is running on is fairly good, dual Xeon CPUs, 4 GB of RAM, Raid 5.
For a database you'd want to consider replacing the RAID1 with a RAID1 (or
RAID10). RAID5 is slow for small random updates, which are common in
databases. Since you probably have enough harddisks anyway, this
On Thu, 09 Dec 2010 06:51:26 +0100, Alex Goncharov
alex-goncha...@comcast.net wrote:
,--- You/Divakar (Wed, 8 Dec 2010 21:17:22 -0800 (PST)) *
| So it means there will be visible impact if the nature of DB
interaction is DB
| insert/select. We do that mostly in my app.
You can't say a
If the data are stored as a byte array but retrieve into a ResultSet,
the unpacking time goes up by an order of magnitude and the
observed total throughput is 25 MB/s. If the data are stored in a
Postgres float array and unpacked into a byte stream, the
observed throughput is 20 MB/s.
Dear Friends,
I have a requirement for running more that 15000 queries per
second.
Can you please tell what all are the postgres parameters needs to be
changed
to achieve this.
Already I have 17GB RAM and dual core processor and this machine
is dedicated for database
fc=# explain analyse select collection, period, tariff, sum(bytesSent),
sum(bytesReceived), sum(packets), max(sample), (starttime / 3600) * 3600
as startchunk from sample_20101001 where starttime between 1287493200
and 1290171599 and collection=128and ip = '10.9.125.207' group by
The real performance problem with RAID 5 won't show up until a drive
dies and it starts rebuilding
I don't agree with that. RAID5 is very slow for random writes, since
it needs to :
The real problem is when RAID5 loses a drive and goes from acceptable
kind of slow, to someone's
Do you mean I should use PREPARE?
Currently I use PHP to access the DB which use libpq. Is that cosidered a
fast call API ? if not, can you please refer me to the right info.
PHP pg_pconnect command open a persistent PostgreSQL connection. Is it
enough or I better use PgPool2 or something
On Wed, 22 Dec 2010 14:17:21 +0100, Michael Ben-Nes mich...@epoch.co.il
wrote:
Thanks, it is most interesting
--
Michael Ben-Nes - Internet Consultant and Director.
http://www.epoch.co.il - weaving the Net.
Cellular: 054-4848113
I wonder how the OP configured effective_io_concurrency ; even on a single
drive with command queuing the fadvise() calls that result do make a
difference...
--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
And the risks are rather asymmetric. I don't know of any problem from
too large a buffer until it starts crowding out shared_buffers, while
under-sizing leads to the rather drastic performance consequences of
AdvanceXLInsertBuffer having to wait on the WALWriteLock while holding
the
On Fri, 04 Feb 2011 21:37:56 +0100, Shaun Thomas stho...@peak6.com wrote:
On 02/04/2011 02:14 PM, felix wrote:
oh and there in the footnotes to django they say dont' forget to run
the delete expired sessions management every once in a while.
thanks guys.
Oh Django... :)
it won't run now
I have clustered that table, its still unbelievably slow.
Did you actually delete the old entries before clustering it? if it's
still got 4G of old sessions or whatever in it, clustering ain't gonna
help.
Also, IMHO it is a lot better to store sessions in something like
memcached, rather
On Sun, 06 Feb 2011 19:16:23 +0100, Linos i...@linos.es wrote:
I am searching what would be the best hardware combination to a new
server i have to install, the new server will have a Postgresql 9.0 with
a database of about 10gb, the database part it is not the problem for
me, in this size
select * from account_transaction where trans_type_id in ( ...
long, hard-coded list ...) and account_id=? order by created desc
limit 25;
You could use an index on (account_id, created, trans_type), in
replacement of your index on (account_id, created). This will not prevent
the Index
Thomas Pöhler wrote:
I remember you said you were using nginx and php-fastcgi, how many web
server boxes do you have, and what are the specs ?
--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
I have created two tables. In the first table i am using many fields to
store 3 address.
as well as in b table, i am using array data type to store 3 address.
is
there any issue would face in performance related things which one
will
cause the performance issue.
The array is
Any time the volume tables are queried it is to calculate the deltas
between each in_octets and out_octets from the previous row (ordered
by timestamp). The delta is used because the external system, where
the data is retrieved from, will roll over the value sometimes. I
have a function to do
What's more, this is already a new controller. It replaced the previous
one because of exactly the same persisting problem. I think tech support
people not knowing a solution just buy some time for them and say flash
this beta firmware maybe it helps or replace your hardware.
We had a
While reading about NoSQL,
MongoDB let's you store and search JSON objects.In that case, you don't
need to have the same columns in each row
The following ensued. Isn't it cute ?
CREATE TABLE mongo ( id SERIAL PRIMARY KEY, obj hstore NOT NULL );
INSERT INTO mongo (obj) SELECT
I suspect your app is doing lots of tiny single-row queries instead of
efficiently batching things. It'll be wasting huge amounts of time
waiting for results. Even if every query is individually incredibly
fast, with the number of them you seem to be doing you'll lose a LOT of
time if you loop
why even have multiple rows? just jam it all it there! :-D
LOL
But seriously, when using an ORM to stuff an object hierarchy into a
database, you usually get problems with class inheritance, and all
solutions suck more or less (ie, you get a zillion tables, with assorted
pile of
This is a problem I encounter constantly wherever I go. Programmer
selects millions of rows from giant table. Programmer loops through
results one by one doing some magic on them. Programmer submits queries
back to the database. Even in batches, that's going to take ages.
Reminds me of a
Hi Merlin
The analyze command gave the following result:
On the KVP table:
Index Scan using kvpidx on bench_kvp (cost=0.00..8.53 rows=1 width=180)
(actual time=0.037..0.038 rows=1 loops=1)
Index Cond: (bench_id = '20_20'::text)
Total runtime: 0.057 ms
And on the Hstore table:
You wrote
Try to create a btree index on (bench_hstore-bench_id) WHERE
(bench_hstore-bench_id) IS NOT NULL.
What do you mean exactly?
= CREATE INDEX myhstore_kps_gin_idx ON myhstore USING gin(kvps) WHERE
??? IS NOT NULL;
My table's def is:
CREATE TABLE myhstore ( id bigint PRIMARY KEY, kvps
If i run 30,000 prepared DELETE FROM xxx WHERE ID = ? commands it
takes close to 10 minutes.
Do you run those in a single transaction or do you use one transaction per
DELETE ?
In the latter case, postgres will ensure each transaction is commited to
disk, at each commit. Since this
Load testing of postgresql 8.4 for OLTP application
suitability showed that throughput of the
database significantly degraded over time from thousands of write
transactions per second to almost zero.
A typical postgres benchmarking gotcha is :
- you start with empty tables
-
My table is a statistics counters table, so I can live with a partial
data
loss, but not with a full data loss because many counters are weekly and
monthly.
Unlogged table can increase speed, this table has about 1.6 millions of
update per hour, but unlogged with a chance of loss all
It's a fairly tricky problem. I have a number of sensors producing
energy data about every 5 minutes, but at random times between 1 and
15 minutes. I can't change that as that's the way the hardware of the
sensors works. These feed into another unit, which accumulates them
and forwards them in
It's a fairly tricky problem. I have a number of sensors producing
energy data about every 5 minutes, but at random times between 1 and
15 minutes. I can't change that as that's the way the hardware of the
sensors works. These feed into another unit, which accumulates them
and forwards them in
That's almost identical to my tables.
You explained your problem very well ;)
I certainly will. Many thanks for those great lines of SQL!
You're welcome !
Strangely I didn't receive the mail I posted to the list (received yours
though).
--
Sent via pgsql-performance mailing list
93 matches
Mail list logo