I'm curious, why do you think that's serious ? What do you really expect
to do in the stored procedure ? Anything of consequence will seriously
degrade performance if you select it in
Simply because I don't like non-.NET VB, but C# is a much, much better
language, and even VB.NET is decent.
Hello,
Here I'm implementing session management, which has a connections table
partitioned between active and archived connections. A connection
represents a connection between a user and a chatroom.
I use partitioning for performance reasons.
The active table contains all the
Try :
EXPLAIN SELECT get_year_trunc(pub_date) as foo FROM ... GROUP BY foo
Apart from that, you could use a materialized view...
db=# EXPLAIN SELECT DISTINCT get_year_trunc(pub_date) FROM
Ah, that makes sense. So is there a way to optimize SELECT DISTINCT
queries
that have no
Supposing your searches display results which are rows coming from one
specific table, you could create a cache table :
search_id serial primary key
index_n position of this result in the global result set
result_id id of the resulting row.
Then, making a search with 50k results
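A minimal sketch of such a cache table (all names below are invented, and a
page of results is then fetched by position) :

-- one row per result, keyed by the search it belongs to and its position
CREATE TABLE search_cache (
    search_id  integer NOT NULL,
    index_n    integer NOT NULL,   -- position of this result in the global result set
    result_id  integer NOT NULL,   -- id of the resulting row in the real table
    PRIMARY KEY (search_id, index_n)
);

-- fetch rows 100..149 (page 3) of search 42 without re-running the search
SELECT r.*
FROM search_cache c
JOIN results r ON r.id = c.result_id
WHERE c.search_id = 42 AND c.index_n BETWEEN 100 AND 149
ORDER BY c.index_n;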
http://borg.postgresql.org/docs/8.0/interactive/storage-page-layout.html
If you vacuum as part of the transaction it's going to be a more efficient
use of resources, because you have more of what you need right there (ie:
odds are that you're on the same page as the old tuple). In cases like
that it
Every time I tested an idea to speed it up, I got exactly the same
loading time on a Athlon 1800+, 256Mb RAM, 20Gb PATA computer as
compared to a Dual Opteron 246, 1Gb RAM, 70Gb WD Raptor SATA server.
Now, why a dual opteron machine can't perform any faster than a lowly
1800+ athlon in
with about 8000 rows. For this table, the query:
SELECT MAX(MsgSeqNum),MAX(LogTimestamp) FROM ELT_tcli_MessageLog
WHERE LogTimestamp = '0' AND IsFromCounterParty = 'Y' AND
IsOutOfSequence = 'N'
AND ConnectionName = 'DB_BENCHMARK'
AND LogTimestamp IN (SELECT MAX(LogTimestamp)
The problem with this approach is TTFB (Time to first Byte). The
initial query is very slow, but additional requests are fast. In most
situations we do not want the user to have to wait a disproportionate
amount of time for the initial query. If this is the first time using
the system this will
However, it seems that integer integer[] does not exist :
Try intset(id) int[]. intset is an undocumented function :)
I'm going to add intset() to README.
SELECT * FROM table WHERE id int[]
Mm.
intset(x) seems to be like array[x] ?
Actually what I want is the opposite. I have a btree
What you want is called UNNEST. It didn't get done in time for 8.0. But if
what you have is an array of integers, the int_array_enum() function I quoted
in the other post is basically that.
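For reference, int_array_enum() comes from contrib/intagg and expands an
int[] into a set of rows, so it can play the role of UNNEST ; a hypothetical
use (table and column names made up) :

SELECT t.*
FROM my_table t
WHERE t.id IN (SELECT int_array_enum(l.ids) FROM lists l WHERE l.list_id = 42);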
Yes, I used it, thanks. That's what I wanted. The query plans are good.
You don't really need the
So are you suggesting as a general rule then that sub-queries are the
way to force a specific join order in postgres? If that is the case, I
will do this from now on.
I'll try to explain a bit better...
Here's your original query :
select s.*, ss.*
from shipment s,
SELECT * FROM shipment_status WHERE release_code_id = constant ORDER BY
release_code_id DESC, date DESC LIMIT 100;
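For that form to stay fast, a composite index matching the ORDER BY does the
work (index name below is invented ; scanned backwards it satisfies the
DESC, DESC ordering) :

CREATE INDEX shipment_status_rc_date_idx
    ON shipment_status (release_code_id, date);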
I have done this in other queries where sorting by both release code and
date were important. You are right, it is very fast and I do have this
index in play. However, most of
Do you have anything performing any updates or inserts to this table,
even if it does not update the gist column, even if it does not update
anything ?
As a side note, I learned something very interesting for our developers
here.
We had been doing a drop database and then a reload off a db dump from
our
live server for test data. This takes 8-15 minutes depending on the
server
(the one above takes about 8 minutes). I learned through
For example, I am a developer of Mambo, a PHP-based CMS application,
and am porting the mysql functions to ADOdb so I can use grown-up
databases ;-)
Just yesterday I optimized a query for a website running MySQL. It's
the 'new products' type query :
SELECT product_id, pd.product_name,
In terms of performance, Oracle is to Postgres as Postgres is to Mysql:
More
complexity, more overhead, more layers of abstraction, but in the long
run it
pays off when you need it. (Only without the user-friendliness of either
open-source offering.)
I don't find postgres complex... I find
normally you shouldn't have to do anything, it should just work :
select field from table where field like 'abc%'
CREATE INDEX ... ON table( field );
that's all
If it does not use the index, I saw on the mailing list that the locale
could be an issue.
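If the locale is the problem, the usual workaround is to declare the index
with the pattern operator class (table and column names below are
placeholders) :

-- with a non-C locale a plain index is not used for LIKE 'abc%' ;
-- the *_pattern_ops operator class makes prefix searches indexable
CREATE INDEX my_table_field_pattern_idx ON my_table (field varchar_pattern_ops);
SELECT field FROM my_table WHERE field LIKE 'abc%';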
(For those not knowing - it's ReadFile/WriteFile where you pass an array
of this many bytes to this address as parameters)
Isn't that like the BSD writev()/readv() that Linux supports also? Is
that something we should be using on Unix if it is supported by the OS?
Nope, readv()/writev()
From the Linux Kernel (make menuconfig) there seem to be two new reliable
sources for timing information. Note the remark about Time Stamp Counter
below. Question is, which one of these (or others) are your API functions
using ? I have absolutely no idea !
CONFIG_HPET_TIMER:
My web app does lots of inserts that aren't read until a session is
complete. The plan is to put the heavy insert session onto a ramdisk
based pg-db and transfer the relevant data to the master pg-db upon
session completion. Currently running 7.4.6.
From what you say I'd think you want to
Instead of a varchar(1) containing 'y' or 'n' you could use a BOOL or an
integer.
Your query seems of the form :
SELECT FROM main_table LEFT JOIN a lot of tables ORDER BY sort_key LIMIT
N OFFSET M;
I would suggest rewriting it in a simpler way : instead of generating
the whole result
I have asked him for the data and played with his queries, and obtained
massive speedups with the following queries :
http://boutiquenumerique.com/pf/miroslav/query.sql
http://boutiquenumerique.com/pf/miroslav/query2.sql
http://boutiquenumerique.com/pf/miroslav/materialize.sql
Note that my
On my machine (Laptop with Pentium-M 1.6 GHz and 512MB DDR333) I get the
following timings :
Big Joins Query with all the fields and no order by (I just put a SELECT
* in the first table) yielding about 6k rows :
= 12136.338 ms
Replacing the SELECT * from the table with many fields by
Try :
SELECT owner from pictures group by owner;
Any ideas, apart from more or less manually maintaining a list of
distinct owners in another table ?
That would be a good idea too for normalizing your database.
This is really great !
Think about altering the partitioning (this is quite complex) : imagine a
table split into several partitions, archive and current, where a row is
moved from current to archive when it will not be updated anymore.
Sometimes you can partition on a simple numeric value,
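A very rough sketch of such a manual current/archive split, using inheritance
so that both parts can still be queried as one table (all names are invented) :

-- the parent holds the active rows, the child holds the archive
CREATE TABLE connections (
    user_id   integer NOT NULL,
    room_id   integer NOT NULL,
    closed_at timestamp          -- NULL while the connection is still active
);
CREATE TABLE connections_archive () INHERITS (connections);

-- periodically move finished rows from current to archive in one transaction
BEGIN;
INSERT INTO connections_archive
    SELECT * FROM ONLY connections WHERE closed_at IS NOT NULL;
DELETE FROM ONLY connections WHERE closed_at IS NOT NULL;
COMMIT;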
tableoid would accomplish that already, assuming that the partitioned
table is effectively a view on separate physical tables.
regards, tom lane
Very good.
Also note the possibility to mark a partition READ ONLY. Or even a table.
It does not seem very useful but just think that for
It would also be reasonable to allow clustering individual partitions;
creating table or column constraints on some partitions and not others;
I have session management which works like that, using views now.
sessions.online is a table of the online sessions. It has a UNIQUE on
user_id.
Can I use an index on a varchar column to optimize the SELECT queries
that
use column LIKE 'header%' ?
Yes
If yes what is the best tree algorithm to use ?
Btree
Note that if you want case insensitive matching you need to make an index
on lower(column) and SELECT WHERE lower(column)
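For example (table and column names invented ; if the locale isn't C, the
pattern operator class mentioned above applies here too) :

CREATE INDEX articles_title_lower_idx ON articles (lower(title));
SELECT * FROM articles WHERE lower(title) LIKE 'header%';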
With hardware tuning, I am sure we can do better than 35Mb per sec. Also
WTF ?
My Laptop does 19 MB/s (reading 10 KB files, reiser4) !
A recent desktop 7200rpm IDE drive
# hdparm -t /dev/hdc1
/dev/hdc1:
Timing buffered disk reads: 148 MB in 3.02 seconds = 49.01 MB/sec
# ll
which is evaluated and replanned on each entry to the
FOR loop. This allows the programmer to choose the speed
On each entry is not the same as on each iteration. It would mean every
time the loop is started...
Regards,
PFC
The reason: if the power cord is yanked, the OS _must_ boot back up in
good condition. If the DB is corrupted, whatever, nuke it then re-
initialize it. But the OS must survive act-of-god events.
Well, in that case :
- Use reiserfs3 for your disks
- Use MySQL with MyISAM
So I need to either find an Index that will work with regexes or
re-write my code to update each subdirectory separately and use simpler
UPDATE statement for each.
Why don't you use a LTREE type to model your directory tree ? It's been
designed specifically for this purpose and has
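A small illustration of what that could look like with contrib/ltree (schema
and data are invented) :

-- ltree stores a path as a label tree and indexes it with GiST
CREATE TABLE directories (
    id    serial PRIMARY KEY,
    path  ltree NOT NULL,         -- e.g. 'home.user.docs'
    owner text
);
CREATE INDEX directories_path_gist ON directories USING gist (path);

-- everything under home.user.docs in one indexed statement,
-- instead of one UPDATE per subdirectory
UPDATE directories SET owner = 'bob' WHERE path <@ 'home.user.docs';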
Noticed this problem, too.
You can always make the calculation you want done once inside a set
returning function so it'll behave like a table, but that's ugly.
On Mon, 28 Mar 2005 16:14:44 +0200, Hannes Dorbath
[EMAIL PROTECTED] wrote:
hm, a few days and not a single reply :|
any more
This is done using COPY syntax, not INSERT syntax. So I suppose yes
I do. The file that is being used for COPY is kept on a ramdisk.
COPY or psql \copy ?
If you wanna be sure you commit after each COPY, launch a psql in a shell
and check if the inserted rows are visible (watching SELECT
But with all due respect to Joe, I think the reason that stuff got
trimmed is that it didn't work very well. In most cases it's
*hard* to write an estimator for a SRF. Let's see you produce
one for dblink() for instance ...
Good one...
Well in some cases it'll be impossible, but suppose I
My solution would be a lot simpler, since we could simply populate
pg_proc.proestrows with 1000 by default if not changed by the DBA. In
an
even better world, we could tie it to a table, saying that, for example,
proestrows = my_table*0.02.
What if the estimated row is a function of a
I have a table A with an int column ID that references table B column
ID. Table B has about 150k rows, and has an index on B.ID. When trying
to copy 1 million rows into A, I get the following \timings:
You're using 7.4.5. It's possible that you have a type mismatch in your
foreign keys which
My argument is that a sufficiently smart kernel scheduler *should*
yield performance results that are reasonably close to what you can
get with that feature. Perhaps not quite as good, but reasonably
close. It shouldn't be an orders-of-magnitude type difference.
And a controller card (or
platter compared to the rotational speed, which would agree with the
fact that you can read 70MB/sec, but it takes up to 13ms to seek.
Actually :
- the head has to be moved
this time depends on the distance, for instance moving from a cylinder to
the next is very fast (it needs to, to get
Index Scan using ix_B on B (cost=0.04..3.06 rows=1 width=329) (actual
time=93.824..93.826 rows=1 loops=1)
  Index Cond: (id = $0)
  InitPlan
    ->  Limit (cost=0.00..0.04 rows=1 width=4) (actual
        time=15.128..15.129 rows=1 loops=1)
          ->  Seq Scan on A (cost=0.00..47569.70
My laptop reads an entire compiled linux kernel (23000 files totalling
250 MBytes) in about 1.5 seconds if they're in cache. It's about 15.000
files/second. You think it's slow ? If you want to read them in random
order, you'll probably use something else than a laptop drive, but you get
SELECT row1, row2 FROM table1_on_machine_a NATURAL JOIN
table2_on_machine_b
WHERE restrict_table_1 AND restrict_table_2
AND restrict_1_based_on_2;
I don't think that's ever going to be efficient...
What would be efficient would be, for instance, a Join of a part of a
table against another
However, memcached (and for us, pg_memcached) is an excellent way to
improve
horizontal scalability by taking disposable data (like session
information)
out of the database and putting it in protected RAM.
So, what is the advantage of such a system versus, say, a sticky
sessions system
Filesystems with many filesystem objects can slow down performance
when opening and reading data.
On my laptop, lighttpd takes up to 15000 hits PER SECOND on static 2-3 KB
files (tested with apachebench 2).
Apache is slower, of course : 3-4000 hits per second which is not that
bad.
Using
machines. Which has its own set of issues entirely. I am not entirely
sure that memcached actually does serialize data when it's committed into
memcached either, although I could be wrong, I have not looked at the
I think it does, i.e. it's a simple mapping of [string key] = [string
value].
100 hits a second = 8,640,000 hits a day. I work on a site which does
100 million dynamic pages a day. In comparison Yahoo probably does
100,000,000,000 (100 billion) views a day
if I am interpreting Alexa's charts correctly. Which is about
1,150,000 a second.
Read the help on
If you make the assertion that you are transferring equal or less
session data between your session server (let's say an RDBMS) and the
app server than you are between the app server and the client, an out
of band 100Mb network for session information is plenty of bandwidth.
So if you count on a
It's common knowledge, it seems, that MySQL without transactions will be
a lot faster than Postgres on Inserts. And on Updates too, that is, unless
you have more than a few concurrent connections, at which point
the MySQL full table lock will just kill everything. And you don't
Pretty much. There has been discussion about allowing index-only access
to
frozen tables, i.e. archive partitions. But it all sort of hinges on
someone implementing it and testing
Would be interesting as a parameter to set at index creation (ie. if you
know this table will have a
The OID column is an extra few bytes on each row. If you don't have any
use for it (and let's face it: most of us don't), then create your
tables without OID.
Also there are some useful hacks using the oid which don't work if it
wraps around, thus preventing it from wrapping around by
I am using PostgreSQL (7.4) with a schema that was generated
automatically (using hibernate).
The schema consists of about 650 relations. One particular query (also
generated
automatically) consists of left joining approximately 350 tables. At this
Just out of curiosity, what
select advert_id from acr_cache where category_id = ? and region_id = ?
order by XXX {asc|desc} limit 20;
where XXX is one of 5 possible fields,
timestamp,
timestamp,
text,
text,
numeric
Create 5 indexes on ( category_id, region_id, a field )
where a field is one of your 5
select advert_id from acr_cache where category_id = ? and region_id = ?
order by category_id, region_id, XXX limit 20;
don't forget to mention all the index columns in the order by, or the
planner won't use it.
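Spelled out for one of the five fields (the field name "price" here is just
an illustration) :

CREATE INDEX acr_cache_cat_reg_price_idx
    ON acr_cache (category_id, region_id, price);

select advert_id from acr_cache where category_id = ? and region_id = ?
order by category_id, region_id, price limit 20;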
Previously, we have also tried to use LIMIT 1 instead of DISTINCT, but
the performance was no better:
select PlayerID,AtDate from Player where PlayerID='0' order by
PlayerID desc, AtDate desc LIMIT 1
The DISTINCT query will pull out all the rows and keep only one, so the
one with
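For the LIMIT 1 form to be fast it needs an index matching the sort, roughly
(index name invented) :

CREATE INDEX player_pid_atdate_idx ON Player (PlayerID, AtDate);

-- with that index the query above can stop after reading a single entry
-- from the end of the index
SELECT PlayerID, AtDate FROM Player
WHERE PlayerID = '0'
ORDER BY PlayerID DESC, AtDate DESC
LIMIT 1;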
If you want something more embedded in your application, you could
consider :
http://firebird.sourceforge.net/
http://hsqldb.sourceforge.net/
http://sqlite.org/
PostgreSQL and say to use MySQL if you want reasonable performance.
If you want MySQL performance and reliability with postgres, simply run
it with fsync deactivated ;)
I'd suggest a controller with battery backed up cache to get rid of the 1
commit = 1 seek boundary.
Makes it
real
postgres - returns results in 2.8 seconds
What kind of plan does it do ? seq scan on the big tables and hash join
on the small tables ?
mysql - takes around 16 seconds (This is with myisam ... with
innodb it takes 220 seconds)
I'm not surprised at all.
Try the same
Please pardon my ignorance, but from whatever I had heard, mysql was
supposedly always faster than postgres. That's why I was so surprised !!
I heard a lot of this too, so much it seems common wisdom that postgres
is slow... well maybe some old version was, but it's getting better at
My tests included using aqua studios connection to both databases and
.asp
page using odbc connections.
Performance also depends a lot on the driver.
For instance, the PHP driver for MySQL is very very fast. It is also very
dumb, as it returns everything as a string and doesn't
transaction, delete everything and then just dump new data in (copy
perhaps). The old data would be usable to other transactions until I
commit my insert. This would be the fastest way, but how much memory
would this use? Will this cause performance issues on a heavily loaded
server with too
I do not know what clustering would do for you. But striping will
provide a
high level of assurance that each of your hard drives will process
equivalent
amounts of IO operations.
I don't know what I'm talking about, but wouldn't mirroring be faster
than striping for random reads
However, I do not know if the query that creates the view is executed
every time I select something from the view. Because if that is the case,
then I think my queries will again be slow. But if that is the way views
work, then what would be the point in creating them ..
Views are more for
From what you say I understand that you have a huge table like this :
( name, value, id )
And you want to make statistics on (value) according to (name,id).
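In that case a small hand-maintained summary table (a poor man's materialized
view ; all names below are invented) often does the trick :

-- rebuilt periodically, or maintained by triggers, instead of being
-- recomputed on every query
CREATE TABLE stats_summary AS
SELECT name, id,
       count(*)   AS n_values,
       sum(value) AS total_value,
       avg(value) AS avg_value
FROM huge_table
GROUP BY name, id;

CREATE INDEX stats_summary_name_id_idx ON stats_summary (name, id);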
***
First of all a materialized view doesn't exist in postgres, it's just a
word to
use CURRENT_TIME which is a constant instead of now() which is not
considered constant... (I think)
database=> explain select date_trunc('hour', time),count(*) as total from
test where p1=53 and time > now() - interval '24 hours' group by
date_trunc order by date_trunc ;
1. Use CURRENT_TIMESTAMP (which is considered a constant by the planner)
instead of now()
2. Create a multicolumn
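Put together, that might look like this (index name invented ; assuming time
is a timestamp column) :

CREATE INDEX test_p1_time_idx ON test (p1, time);

SELECT date_trunc('hour', time) AS hour_slot, count(*) AS total
FROM test
WHERE p1 = 53
  AND time > CURRENT_TIMESTAMP - interval '24 hours'
GROUP BY hour_slot
ORDER BY hour_slot;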
So, it seems that for my application (database in memory, 14 million
very small requests), the Centrino (aka Pentium M) has built-in
hardware to boost Postgres performance :-)
Any experience to confirm this fact ?
On my Centrino, Python flies. This might be due to the very large
Hello,
I once upon a time worked in a company doing backup software and I
remember these problems, we had exactly the same !
The file tree was all in memory and every time the user clicked on
something it haaad to update everything. Being C++ it was very fast, but
to backup a
Ok, I tried this one. My ssh keeps getting cut off by a router
somewhere between me and the server due to inactivity timeouts, so
all I know is that both the select and explain analyze are taking
over an hour to run. Here's the explain select for that one, since
that's the best I can get.
What happens if, say at iteration 6000 (a bit after the mess starts), you
pause it for a few minutes and resume. Will it restart with a plateau like
at the beginning of the test ? or not ?
What if, during this pause, you disconnect and reconnect, or restart the
postmaster, or vacuum, or
total. If the insertion pattern is sufficiently random that the entire
index ranges are hot then you might not have enough RAM.
Try doing the test dropping some of your indexes and see if it moves the
number of iterations after which it becomes slow.
I think PFC's question was not directed towards modeling your
application, but about helping us understand what is going wrong
(so we can fix it).
Exactly, I was wondering if this delay would allow things to get flushed,
for instance, which would give information about the problem (if
What programming language are these scripts written in ?
It is a subprocess of a Python process, driven using a pexpect
interchange. I send the COPY command, then wait for the '=#' to come
back.
did you try sending the COPY as a normal query through psycopg ?
I can't say whether MySQL is faster for very small queries (like
SELECT'ing one row based on an indexed field).
That's why I was asking you about the language...
I assume you're using a persistent connection.
For simple queries like this, PG 8.x seemed to be a lot faster
PS: Where can I find benchmarks comparing PHP vs Perl vs Python in
terms of speed of executing prepared statements?
I'm afraid you'll have to do these yourself !
And, I don't think the Python drivers support real prepared statements
(the speed of psycopg is really good though).
I
You could have a program pre-parse your log and put it in a format
understandable by COPY, then load it in a temporary table and write a part
of your application simply as a plpgsql function, reading from this table
and doing queries (or a plperl function)...
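A rough sketch of that pipeline (file path, table and function names are
made up) :

-- 1. bulk-load the pre-parsed log into a temporary table
CREATE TEMP TABLE raw_log (ts timestamp, level text, msg text);
COPY raw_log FROM '/tmp/parsed_log.tsv';

-- 2. process it server-side instead of issuing one query per line
CREATE OR REPLACE FUNCTION process_raw_log() RETURNS integer AS '
DECLARE
    n integer := 0;
    r record;
BEGIN
    FOR r IN SELECT * FROM raw_log LOOP
        -- whatever per-line work is needed goes here
        n := n + 1;
    END LOOP;
    RETURN n;
END;
' LANGUAGE plpgsql;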
So... (bear with me here..
Which row do you want ? Do you want 'a row' at random ?
I presume you want the N latest rows ?
In that case you should use an ORDER BY on an indexed field, the serial
primary key will do nicely (ORDER BY id DESC) ; it's indexed so it will
use the index and it will fly.
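For instance (table name invented) :

-- the N latest rows, straight off the primary key index
SELECT * FROM messages ORDER BY id DESC LIMIT 10;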
I'm a little leery as it is definitely a version 1.0 product (it is
still using an FPGA as the controller, so they were obviously pushing to
get the card into production).
Not necessarily. FPGAs have become a sensible choice now. My RME studio
soundcard uses a big FPGA.
The performance
I had in mind the extra tables that an application sometimes needs to
operate faster. Denormalisations, pre-joined tables, pre-calculated
results, aggregated data. These are not temporary tables, just part of
the application - multi-user tables that stay across shutdown/restart.
You could
Also, this test goes a bit faster with sync turned off; if mysql isn't
using sync, that would be why it's so much faster. Anyone know what the
default for mysql is?
For InnoDB I think it's like Postgres (only slower) ; for MyISAM it's no
fsync, no transactions, no crash tolerance of any
What I really want to do is have the token array available as a record
so that I can query against it, but not have it take up the resources of
a real table. If I could copy from an array into a record then I can
even get rid of the loop. Anyone have any thoughts on how to do this?
You
why not simply create an index on (game_end, state) ?
You could lock the count table to prevent the problem
where 2 competing transactions do an insert, read the
start value and add 1 to it and then write the result
- which is n+1 rather than n+2 - so you are off by one.
Think of the same when one transaction inserts 100
and the other 120. Then
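A hedged sketch of that locking approach (table and column names are
placeholders) :

BEGIN;
-- take the lock first, so a competing transaction cannot read the old
-- value between our read and our write
LOCK TABLE row_counts IN EXCLUSIVE MODE;
SELECT n FROM row_counts WHERE table_name = 'my_table';
-- ... insert the 100 new rows ...
UPDATE row_counts SET n = n + 100 WHERE table_name = 'my_table';
COMMIT;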
Josh Berkus has already mentioned this as conventional wisdom as written
by Oracle. This may also be legacy wisdom. Oracle/Sybase/etc has been
around for a long time; it was probably a clear performance win way back
when. Nowadays with how far open-source OS's have advanced, I'd take it
of effort reinventing the wheel ... but our time will be repaid much
more if we work at levels that the OS cannot have knowledge of, such as
join planning and data statistics.
Considering a global budget of man-hours, which is the best ?
1- Spend it on reimplementing half of VFS in
Really? Cool, I'd like to see that. Could you follow up with Hans?
Or give
me his e-mail?
You can subscribe to the Reiser mailing list on namesys.com or :
[EMAIL PROTECTED]
At least on Sparc processors, v8 and newer, any double precision math
(including longs) is performed with a single instruction, just like for
a 32 bit datum. Loads and stores of 8 byte datums are also handled via
a single instruction. The urban myth that 64bit math is
DELETE FROM statistics_sasme WHERE statistic_id = 9832;
As Michael said, why use a NUMERIC when a bigint is faster and better for
your use case, as you only need an integer and not a fixed precision
decimal ?
Also if you use postgres 8, the index will not be used if you search on
I know I haven't provided a whole lot of application-level detail here,
You did !
What about :
- using COPY instead of INSERT ?
(should be easy to do from the aggregators)
- using Bizgres ?
(which was designed for your
Hello fellow Postgresql'ers.
I've stumbled on this RAID card which looks nice. It is a PCI-X SATA
Raid card with 6 channels, and does RAID 0,1,5,10,50.
It is a HP card with an Adaptec chip on it, and 64 MB cache.
HP Part # : 372953-B21
Adaptec Part # :
It looks like a rebranded low end Adaptec 64MB PCI-X - SATA RAID card.
Looks like the 64MB buffer is not upgradable.
Looks like it's SATA, not SATA II
Yeah, that's exactly what it is. I can get one for 150 Euro, the Areca is
at least 600. This is for a budget server so while it would be
The common explanation is that CPUs are so fast now that it doesn't make
a difference.
From my experience software raid works very, very well. However I have
never put
software raid on anything that is very heavily loaded.
Even for RAID5 ? it uses a bit more CPU for the parity
It appears that PostgreSQL is two to three times slower than MySQL. For
example, some pages that have some 30,000 characters (when saved as
HTML) take 1 to 1 1/2 seconds with MySQL but 3 to 4 seconds with
PostgreSQL. I had read that the former was generally faster than the
latter,
Just to add a little anarchy in your nice debate...
Who really needs all the results of a sort on your terabyte table ?
I guess not many people do a SELECT from such a table and want all the
results. So, this leaves :
- Really wanting all the results, to fetch using
Total runtime: 16.000 ms
Even though this query isn't that optimized, it's still only 16
milliseconds.
Why does it take this long for PHP to get the results ?
Can you try pg_query'ing this exact same query, FROM PHP, and timing it
with getmicrotime() ?
You can even do an
Bulk loading speed is irrelevant here - that is dominated by parsing,
which
we have covered copiously (har har) previously and have sped up by 500%,
which still makes Postgres 1/2 the loading speed of MySQL.
Let's ask MySQL 4.0
LOAD DATA INFILE blah
0 errors, 666 warnings
SHOW
It's more understandable if the table names are in front of the column
names :
SELECT relationship.topic_id1, relationship.topic_id2, topic.topic_name,
topic.categ_id, topic.list_name, topic.title,
topic.url, topic.page_type, relationship.rel_type, entry_type.inverse_id,
I am running Postgres 7.4 on FreeBSD. The main table has 2 million records
(we would like to do at least 10 mil or more). It is mainly a FIFO
structure
with maybe 200,000 new records coming in each day that displace the older
records.
I'm so sorry, but I have to rant XDDD
People
I've done it...
First of all I totally agree with PFC's rant regarding absolute
positioning while browsing datasets. Among other things, it has serious
problems if you have multiple sessions updating your table. Also it's kind of
silly to be doing this in a set based data paradigm.
Recently I've
create view v1 (code,num) as
select 'AAA',id from table1
union
select 'BBB',id from table2;
As your rows are, by definition, distinct between each subquery, you
should use UNION ALL instead of UNION to save postgres the trouble of
hunting for non-existent duplicates. This will save you a few
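That is, the view simply becomes :

create view v1 (code,num) as
select 'AAA',id from table1
union all
select 'BBB',id from table2;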