Therefore, I propose defining new data types like "CRC32", "CRC64",
"RIPEMD", whatever (rather than pluggable arbitrary CRCs).
I suspect that you are really looking at the problem from the wrong end.
CRC checking should not need to be done by the database user, with a fancy
type. The postgres server itself should guarantee data integrity - you
shouldn't have to worry about it in userland.
I agree in principle.
"Mikheev, Vadim" wrote:
recently I downloaded a pre-beta PostgreSQL, and I found that
insert and update speed is slower than 7.0.3;
even if I turn off the sync flag, it is still slower than 7.0. Why?
How much slower do you see it to be?
how can I make it faster?
Try to compare 7.0.3 and 7.1beta
Hi,
We've done some work with GiST indices and found a little problem
with the optimizer. The problem can be reproduced with Gene's code
(the link is in the original message below). I could send the test data
and SQL - it's just a 52KB gzipped file. What is the reason for the
optimizer to decide that a sequential scan
Oleg Bartunov [EMAIL PROTECTED] writes:
We've done some work with GiST indices and found a little problem
with optimizer.
test=# set enable_seqscan = off;
SET VARIABLE
test=# explain select * from test where s @ '1.05 .. 3.95';
NOTICE: QUERY PLAN:
Index Scan using test_seg_ix on test
I have a working version of a text search engine. I want to make it work
for Postgres (I will be releasing it under the GPL). It can literally find the
occurrence of a string of words within 5 million records in a few
milliseconds. It is very fast, it works similarly to many web search
engines.
I have
"Horst Herb" [EMAIL PROTECTED] writes:
AFAIK the thread for "built in" CRCs referred only to CRCs in the
transaction log. This here is a different thing. CRCs in the transaction log
are crucial to prove the integrity of the log; CRCs as a datatype are necessary
to prove the integrity of the database
I have just returned from a seven-day trip to Japan. I spoke for seven
hours to three separate groups, totalling 200 people. I spoke to a
Linux Conference, a PostgreSQL user's group, and to SRA, a PostgreSQL
support company. You can get more information on my home page under
"Writings". Here
On Fri, 8 Dec 2000, Tom Lane wrote:
Date: Fri, 08 Dec 2000 10:47:37 -0500
From: Tom Lane [EMAIL PROTECTED]
To: Oleg Bartunov [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED], 'pgsql-hackers ' [EMAIL PROTECTED]
Subject: Re: [HACKERS] Indexing for geographic objects?
Oleg Bartunov [EMAIL
Hi all
A small intrusion into the threads discussion:
1) don't forget other OSes... Linux is not the only one (for now :-))
For example, check the performance under Solaris:
http://www2.linuxjournal.com/lj-issues/issue70/3184.html
2) unfortunately, some platforms that had no threads
"Horst Herb" [EMAIL PROTECTED] writes:
Surely you don't trust your TCP connection to the
server, either?
TCP _IS_ heavily checksummed.
Yes, and so are the disk drives that you are asserting you don't trust.
My point is that in both cases, there are lots and lots of failure
mechanisms that
Oleg Bartunov [EMAIL PROTECTED] writes:
5000 looks like a suspiciously round number ... how many rows are in
the table? Have you done a vacuum analyze on it?
about 10,000 rows,
So the thing is estimating 0.5 selectivity, which is a fallback for
operators it knows nothing whatever about.
[
On Fri, 8 Dec 2000, Oleg Bartunov wrote:
On Fri, 8 Dec 2000, The Hermit Hacker wrote:
Date: Fri, 8 Dec 2000 12:19:56 -0400 (AST)
From: The Hermit Hacker [EMAIL PROTECTED]
To: Tom Lane [EMAIL PROTECTED]
Cc: Oleg Bartunov [EMAIL PROTECTED], [EMAIL PROTECTED],
'pgsql-hackers '
Try to compare 7.0.3 and 7.1beta in a multi-user environment.
As I understand it, you claim it to be faster in a multi-user
environment?
Could you give some brief technical background on why this is so,
and why it must make single-user slower?
Because commit in 7.1 does fsync, with or without
"Mikheev, Vadim" [EMAIL PROTECTED] writes:
Because commit in 7.1 does fsync, with or without -F
(we can discuss and change this), but in a multi-user env
a number of commits can be made with a single fsync.
I was planning to ask why you disabled the -F switch. Seems to me that
people who
On Fri, Dec 08, 2000 at 01:58:12PM -0500, Tom Lane wrote:
Bruce Guenter [EMAIL PROTECTED] writes:
... Taking an
arbitrary 32 bits of a MD5 would likely be less collision prone than
using a 32-bit CRC, and it appears faster as well.
... but that would be an algorithm that you know
On Fri, 8 Dec 2000, Tom Lane wrote:
Date: Fri, 08 Dec 2000 12:59:27 -0500
From: Tom Lane [EMAIL PROTECTED]
To: Oleg Bartunov [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED], 'pgsql-hackers ' [EMAIL PROTECTED]
Subject: Re: [HACKERS] Indexing for geographic objects?
Oleg Bartunov [EMAIL
Because commit in 7.1 does fsync, with or without -F
(we can discuss and change this), but in a multi-user env
a number of commits can be made with a single fsync.
I was planning to ask why you disabled the -F switch. Seems
to me that people who trusted their OS+hardware before would
Bruce Guenter [EMAIL PROTECTED] writes:
MD5 is a cryptographic hash, which means (AFAIK) that ideally it is
impossible to produce a collision using any other method than brute
force attempts.
True but irrelevant. What we need to worry about is the probability
that a random error will be
I just didn't care about -F functionality, sorry.
I agreed that we should resurrect it.
OK. Do you want to work on that, or shall I?
In near future I'll be busy doing CRC + "physical log"
things...
Vadim
On Fri, Dec 08, 2000 at 11:10:19AM -0800, Nathan Myers wrote:
This is very interesting. MD4 is faster than MD5. (MD5, described as
"MD4 with suspenders on", does some extra stuff to protect against
more-obscure attacks, of no interest to us.) Which 64-bit CRC code did you
use, Mark
Bruce Guenter [EMAIL PROTECTED] writes:
Are you really saying MD5 was faster than CRC-32?
Yes. I expect it's because the operations used in MD5 are easily
parallelized, and operate on blocks of 64-bytes at a time, while the CRC
is mostly non-parallelizable, uses a table lookup, and operates
It was just pointed out on pggeneral that hash indexes on macaddr
columns don't work. Looking into it, I find that someone (me :-()
made a booboo: pg_amproc claims that hashvarlena is the appropriate
hash function for macaddr --- but macaddr isn't a varlena type,
it's a fixed-length
We could fix this either by adding a new hash function to support
macaddr, or by removing the pg_amXXX entries that claim macaddr is
hashable. Either change will not take effect without an initdb,
however, and I'm loath to force one now that we've started beta.
If we're going to add CRC to
We could fix this either by adding a new hash function to support
macaddr, or by removing the pg_amXXX entries that claim macaddr is
hashable. Either change will not take effect without an initdb,
however, and I'm loath to force one now that we've started beta.
How about creating an SQL
On Fri, Dec 08, 2000 at 04:30:58PM -0500, Tom Lane wrote:
Bruce Guenter [EMAIL PROTECTED] writes:
Are you really saying MD5 was faster than CRC-32?
Yes. I expect it's because the operations used in MD5 are easily
parallelized, and operate on blocks of 64-bytes at a time, while the CRC
Hello,
Why is this happening ?
ctonet=# show datestyle;
NOTICE: DateStyle is ISO with European conventions
SHOW VARIABLE
ctonet=# select creation_date from users limit 1;
creation_date
2000-12-07 04:40:23+01
^^
Datestyle has been set either with
Daniele Orlandi [EMAIL PROTECTED] writes:
Hello,
Why is this happening ?
ctonet=# show datestyle;
NOTICE: DateStyle is ISO with European conventions
SHOW VARIABLE
ctonet=# select creation_date from users limit 1;
creation_date
2000-12-07
Bruce Guenter [EMAIL PROTECTED] writes:
Would you like to see the simple benchmarking setup I used? The amount
of code involved (once all the hashes are factored in) is fairly large,
so I'm somewhat hesitant to just send it to the mailing list.
I agree, don't send it to the whole list. But
I've run tests (with 50 .. 250 simultaneous users) for some PG project
of my company. 7.1 was 3 times faster than 7.0.3 (fsync) but nearly
3 times slower than 7.0.3 (nofsync). It was not the best day of
my life - WAL looked like a big bottleneck -:(
But finally I realized that this test makes ~3 FK
"Mikheev, Vadim" [EMAIL PROTECTED] writes:
So, I've run simple test (below) to check this. Seems that 7.1
is faster than 7.0.3 (nofsync), and that SELECT FOR UPDATE in RI
triggers is quite bad for performance.
Also, we should add a new TODO item: implement dirty reads
and use them in RI
Tom Lane wrote:
mlw [EMAIL PROTECTED] writes:
I have a working version of a text search engine. I want to make it work
for Postgres (I will be releasing it under the GPL). It can literally find the
occurrence of a string of words within 5 million records in a few
milliseconds.
Where are the
So, I've run simple test (below) to check this. Seems that 7.1
is faster than 7.0.3 (nofsync), and that SELECT FOR UPDATE in RI
triggers is quite bad for performance.
Also, we should add a new TODO item: implement dirty reads
and use them in RI triggers.
That would fix RI triggers, I
Barry Lind [EMAIL PROTECTED] writes:
Now the real problem comes in when either the like or regex operators
are used in a sql statement.
Right. As of 7.1beta1 we are dealing with this by suppressing LIKE/regex
index optimization in all locales other than "C". That's a pretty crude
answer but
In researching a problem I have uncovered the following bug in index
scans when Locale support is enabled.
Given a 7.0.3 postgres installation built with Locale support enabled
and a default US RedHat 7.0 Linux installation (meaning that the LANG
environment variable is set to en_US) to enable
Here is a late report on the OSDN database summit.
It was great to meet so many PostgreSQL users, and to meet the major
developers of MySQL, Interbase, and Sleepycat. We clearly have many of
the same hopes and concerns for open-source databases.
PostgreSQL had half of all attendees. Seems we
Bruce,
what was the camera ?
No idea. It was not mine. I brought a video camera, and made 30
minutes of video for my family and company. I don't know how to make an
MP3 of that.
My wife wants a digital camera now, so it looks like I will have one
soon. :-)
Mine is a Nikon
I have only one explanation: it reduces the number of transactions ready
to commit (because of the same FK, writers will wait till the first one
is committed - i.e. log fsynced), and WAL commit performance
greatly depends on how many commits were done by a single log fsync.
7.0.3+nofsync commit
Could you perhaps post the code you have for splitting a text field up into
keys? Then I could work on turning it into a new type of index with a new
operator, as Tom suggested.
(Or is this already what the text search code in contrib does??)
- Andrew
I would like to mention that I met Tatsuo Ishii and Hiroshi Inoue while
in Japan. This was the first time I met them, though I have worked with
them on PostgreSQL for many years. Tatsuo is really the voice of
PostgreSQL in Japan. It was a real thrill.
There is a picture somewhere of the
"Mikheev, Vadim" [EMAIL PROTECTED] writes:
And we will always have to enable fsync when comparing our
performance with other DBs.
Of course, but when people say "it's slower than 7.0.3+nofsync"
I think that turning off fsync makes a fairer comparison there.
also reduce the WAL commit delay
Bruce Guenter [EMAIL PROTECTED] writes:
I agree, don't send it to the whole list. But I'd like a copy.
Here you go.
As near as I could tell, the test as you have it (one CRC computation per
fread) is purely I/O bound. I changed the main loop to this:
int main() {
static char buf[8192];
Andrew Snow wrote:
Could you perhaps post the code you have for splitting a text field up into
keys? Then I could work on turning it into a new type of index with a new
operator, as Tom suggested.
(Or is this already what the text search code in contrib does??)
Go to a search engine
There have been some misconceptions in previous mails.
1.) A CRC is _not_ stronger than a hash. CRC is a subset of the hash domain,
defined as "a fast error-check hash based on mod 2 polynomial operations",
which typically has no crypto strength (and does not need it for most
purposes).
Andrew Snow wrote:
Could you perhaps post the code you have for splitting a text field up into
keys? Then I could work on turning it into a new type of index with a new
operator, as Tom suggested.
(Or is this already what the text search code in contrib does??)
- Andrew
OK, I
Trond Eivind Glomsrød wrote:
2000-12-07 04:40:23+01
^^
That is the ISO-style, isn't it?
Yes, it is; but according to the documentation (and how it used to be on
other machines running PG 6.x) it should be ordered in European format.
I don't know if I'm missing something.
Please apply the following one-line patch and test it again. If it's OK, I
will commit it to both current and stable trees.
! return (b2c3[i][1] | 0x8080U);
Yes, it's OK. Thank you!
But I wonder why we need to "| 0x8080U"?
b2c3[][] and BIG5toCNS()'s return
Tom Lane wrote:
Oleg Bartunov [EMAIL PROTECTED] writes:
We've done some work with GiST indices and found a little problem
with optimizer.
test=# set enable_seqscan = off;
SET VARIABLE
test=# explain select * from test where s @ '1.05 .. 3.95';
NOTICE: QUERY PLAN:
Index Scan
I would like to mention that I met Tatsuo Ishii and Hiroshi Inoue while
in Japan. This was the first time I met them, though I have worked with
them on PostgreSQL for many years. Tatsuo is really the voice of
PostgreSQL in Japan. It was a real thrill.
I was thrilled to meet with you
Please apply the following one-line patch and test it again. If it's OK, I
will commit it to both current and stable trees.
! return (b2c3[i][1] | 0x8080U);
Yes, it's OK. Thank you!
Thanks for the testing. I will commit soon.
But I wonder why we need to "|
Adam Haberlach writes:
Typically (on a well-written OS, at least), spawning a thread
is much cheaper than creating a new process (via fork()).
This would be well worth testing on some representative sample
systems.
Within the past year and a half at one of my gigs some
Bruce Guenter [EMAIL PROTECTED] writes:
[ some very interesting datapoints ]
So, forking a process with lots of data is expensive. However, most of
the PostgreSQL data is in a SysV IPC shared memory segment, which
shouldn't affect the fork numbers.
I believe (but don't have numbers
On Fri, Dec 08, 2000 at 09:28:38PM -0500, Tom Lane wrote:
Bruce Guenter [EMAIL PROTECTED] writes:
I agree, don't send it to the whole list. But I'd like a copy.
Here you go.
As near as I could tell, the test as you have it (one CRC computation per
fread) is purely I/O bound.
Nope. They
I think a row-level CRC is rather pointless. Perhaps it'd be a good
idea to have a disk-page-level CRC, though. That would check the rows
on the page *and* allow catching errors in the page and tuple overhead
structures, which row-level CRCs would not cover.
row level is necessary to be
Bruce Guenter [EMAIL PROTECTED] writes:
... Taking an
arbitrary 32 bits of a MD5 would likely be less collision prone than
using a 32-bit CRC, and it appears faster as well.
... but that would be an algorithm that you know NOTHING about the
properties of. What is your basis for asserting
Bruce Guenter wrote:
CRCs are designed to catch N-bit errors (i.e. N bits in a row with their
values flipped). N is (IIRC) the number of bits in the CRC minus one.
So, a 32-bit CRC can catch all 31-bit errors. That's the only guarantee
a CRC gives. Everything else has a 1 in 2^32-1 chance