Re: [PERFORM] Best way to index IP data?

2008-01-11 Thread Pomarede Nicolas
On Thu, 10 Jan 2008, Jonah H. Harris wrote: On Jan 10, 2008 6:25 PM, Steve Atkins [EMAIL PROTECTED] wrote: http://pgfoundry.org/projects/ip4r/ That has the advantage over using integers, or the built-in inet type, of being indexable for range and overlap queries. Agreed. ip4r is da bomb.

Re: [PERFORM] Linux/PostgreSQL scalability issue - problem with 8 cores

2008-01-11 Thread Jakub Ouhrabka
Hi Tom, I doubt we'd risk destabilizing 8.3 at this point, for a problem that affects so few people; let alone back-patching into 8.2. understand. OK, that confirms the theory that it's sinval-queue contention. We'we tried hard to identify what's the cause of filling sinval-queue. We went

Re: [PERFORM] Best way to index IP data?

2008-01-11 Thread Tom Lane
Pomarede Nicolas [EMAIL PROTECTED] writes: As ip4r seems to work very well with postgresql, is there a possibility to see it merged in postgresql, to have a native 4 bytes IPv4 address date type ? Given that the world is going to IPv6 in a few years whether you like it or not, that seems

Re: [PERFORM] Linux/PostgreSQL scalability issue - problem with 8 cores

2008-01-11 Thread Tom Lane
Jakub Ouhrabka [EMAIL PROTECTED] writes: We'we tried hard to identify what's the cause of filling sinval-queue. We went through query logs as well as function bodies stored in the database. We were not able to find any DDL, temp table creations etc. Strange. The best idea that comes to mind

Re: [PERFORM] Best way to index IP data?

2008-01-11 Thread Pomarede Nicolas
On Fri, 11 Jan 2008, Tom Lane wrote: Pomarede Nicolas [EMAIL PROTECTED] writes: As ip4r seems to work very well with postgresql, is there a possibility to see it merged in postgresql, to have a native 4 bytes IPv4 address date type ? Given that the world is going to IPv6 in a few years

Re: [PERFORM] Best way to index IP data?

2008-01-11 Thread Andrew Sullivan
On Fri, Jan 11, 2008 at 10:19:51AM -0500, Tom Lane wrote: Given that the world is going to IPv6 in a few years whether you like it or not, that seems pretty darn short-sighted to me. Indeed. Even ARIN has finally started to tell people that IPv4 is running out. There are currently significant

Re: [PERFORM] Best way to index IP data?

2008-01-11 Thread Michael Stone
On Fri, Jan 11, 2008 at 10:19:51AM -0500, Tom Lane wrote: Given that the world is going to IPv6 in a few years whether you like it or not, that seems pretty darn short-sighted to me. Well, a native IPv6 type would also be nice; inet is ridiculously bloated for both IPv4 *and* IPv6. Mike

Re: [PERFORM] Best way to index IP data?

2008-01-11 Thread Joshua D. Drake
Steve Atkins wrote: I'd actually support removing inet/cidr from core completely in the longer run. Postgresql is extensible, so we really don't need types used only by niche users in core, once we have pgfoundry and something like mysqludf.org/CPAN. But that's a longer term thought. I

Re: [PERFORM] Evaluating boolean formula: slow performance

2008-01-11 Thread Jörg Kiegeland
I have to correct my statement, a query seems to be only linear when using OR. This however is strange since some weeks ago, I could get good response times when using exclusively AND. I will investigate on this.. ---(end of broadcast)--- TIP

Re: [PERFORM] Best way to index IP data?

2008-01-11 Thread Steve Atkins
On Jan 11, 2008, at 7:19 AM, Tom Lane wrote: Pomarede Nicolas [EMAIL PROTECTED] writes: As ip4r seems to work very well with postgresql, is there a possibility to see it merged in postgresql, to have a native 4 bytes IPv4 address date type ? Given that the world is going to IPv6 in a

[PERFORM] Evaluating boolean formula: slow performance

2008-01-11 Thread Jörg Kiegeland
I have the following problem: In my table T, there are a fixed number of boolean columns, C1, .., Cn. Now, a common query is to search in this table for tuples fullfilling an arbitrary boolean condition (however only using AND and OR), e.g. SELECT * FROM T WHERE (C1 OR C2) AND (C3 OR C4 AND

Re: [PERFORM] Best way to index IP data?

2008-01-11 Thread D'Arcy J.M. Cain
On Fri, 11 Jan 2008 15:07:38 -0500 Tom Lane [EMAIL PROTECTED] wrote: Michael Stone [EMAIL PROTECTED] writes: Well, a native IPv6 type would also be nice; inet is ridiculously bloated for both IPv4 *and* IPv6. Nonsense. 3 bytes overhead on a 16-byte address is not ridiculously bloated,

Re: [PERFORM] Best way to index IP data?

2008-01-11 Thread Michael Stone
On Fri, Jan 11, 2008 at 03:19:35PM -0500, D'Arcy J.M. Cain wrote: Besides, there are many cases where you want to track both ipv4 and ipv6 for the same purpose and requiring two different fields would be less than ideal. And, there are many cases where you don't. I've got two kinds of db's

Re: [PERFORM] Best way to index IP data?

2008-01-11 Thread Alvaro Herrera
Michael Stone wrote: On Fri, Jan 11, 2008 at 03:07:38PM -0500, Tom Lane wrote: Michael Stone [EMAIL PROTECTED] writes: Well, a native IPv6 type would also be nice; inet is ridiculously bloated for both IPv4 *and* IPv6. Nonsense. 3 bytes overhead on a 16-byte address is not ridiculously

Re: [PERFORM] Best way to index IP data?

2008-01-11 Thread Tom Lane
Michael Stone [EMAIL PROTECTED] writes: Well, a native IPv6 type would also be nice; inet is ridiculously bloated for both IPv4 *and* IPv6. Nonsense. 3 bytes overhead on a 16-byte address is not ridiculously bloated, especially if you want a netmask with it. regards,

Re: [PERFORM] Best way to index IP data?

2008-01-11 Thread Michael Stone
On Fri, Jan 11, 2008 at 04:32:05PM -0500, Michael Stone wrote: On Fri, Jan 11, 2008 at 03:19:35PM -0500, D'Arcy J.M. Cain wrote: Besides, there are many cases where you want to track both ipv4 and ipv6 for the same purpose and requiring two different fields would be less than ideal. And,

Re: [PERFORM] Best way to index IP data?

2008-01-11 Thread Michael Stone
On Fri, Jan 11, 2008 at 03:07:38PM -0500, Tom Lane wrote: Michael Stone [EMAIL PROTECTED] writes: Well, a native IPv6 type would also be nice; inet is ridiculously bloated for both IPv4 *and* IPv6. Nonsense. 3 bytes overhead on a 16-byte address is not ridiculously bloated, especially if

Re: [PERFORM] Best way to index IP data?

2008-01-11 Thread Pomarede Nicolas
On Fri, 11 Jan 2008, Andrew Sullivan wrote: On Fri, Jan 11, 2008 at 05:02:36PM -0500, Michael Stone wrote: networks), but there's a conspicuous lack of a type for (hosts). I suppose if you really are sure that you want to store hosts and not networks Well, part of the trouble is that in the

Re: [PERFORM] Best way to index IP data?

2008-01-11 Thread Michael Stone
On Fri, Jan 11, 2008 at 06:37:10PM -0300, Alvaro Herrera wrote: So what this means is that our type oughta be optimized. How about having a separate bit to indicate whether there is a netmask or not, and chop the storage earlier. (I dunno if this already done) Why not just have a type that

Re: [PERFORM] Best way to index IP data?

2008-01-11 Thread Andrew Sullivan
On Fri, Jan 11, 2008 at 05:02:36PM -0500, Michael Stone wrote: networks), but there's a conspicuous lack of a type for (hosts). I suppose if you really are sure that you want to store hosts and not networks Well, part of the trouble is that in the CIDR world, an IP without a netmask can be

Re: [PERFORM] Best way to index IP data?

2008-01-11 Thread Steve Atkins
On Jan 11, 2008, at 2:24 PM, Andrew Sullivan wrote: On Fri, Jan 11, 2008 at 05:02:36PM -0500, Michael Stone wrote: networks), but there's a conspicuous lack of a type for (hosts). I suppose if you really are sure that you want to store hosts and not networks Well, part of the trouble is

Re: [PERFORM] Best way to index IP data?

2008-01-11 Thread Andrew Sullivan
On Fri, Jan 11, 2008 at 02:38:27PM -0800, Steve Atkins wrote: I don't think there's ambiguity about what an dotted-quad without a netmask means, and hasn't been for a long time. Am I missing something? Well, maybe. The problem is actually that, without a netmask under CIDR, the address

Re: [PERFORM] Best way to index IP data?

2008-01-11 Thread Steve Atkins
On Jan 11, 2008, at 3:00 PM, Andrew Sullivan wrote: On Fri, Jan 11, 2008 at 02:38:27PM -0800, Steve Atkins wrote: I don't think there's ambiguity about what an dotted-quad without a netmask means, and hasn't been for a long time. Am I missing something? Well, maybe. The problem is actually

Re: [PERFORM] Best way to index IP data?

2008-01-11 Thread Joshua D. Drake
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 On Fri, 11 Jan 2008 15:37:37 -0800 Steve Atkins [EMAIL PROTECTED] wrote: Well, maybe. The problem is actually that, without a netmask under CIDR, the address alone isn't really enough. You have to have a netmask to get the packets to

Re: [PERFORM] Best way to index IP data?

2008-01-11 Thread Steve Atkins
On Jan 11, 2008, at 3:52 PM, Joshua D. Drake wrote: -BEGIN PGP SIGNED MESSAGE- Hash: SHA1 On Fri, 11 Jan 2008 15:37:37 -0800 Steve Atkins [EMAIL PROTECTED] wrote: Well, maybe. The problem is actually that, without a netmask under CIDR, the address alone isn't really enough. You

Re: [PERFORM] Best way to index IP data?

2008-01-11 Thread Michael Stone
On Fri, Jan 11, 2008 at 06:00:55PM -0500, Andrew Sullivan wrote: another way, without context, a dotted-quad is insufficient on its own. What you're really arguing is that the context ought to be storable somewhere else (maybe in a human's brain) Or, say, in a database schema, where you say

Re: [PERFORM] Best way to index IP data?

2008-01-11 Thread Michael Stone
On Fri, Jan 11, 2008 at 03:52:11PM -0800, Joshua D. Drake wrote: Steve I think you are speaking of practicality and implementation versus RFC compliance. I believe per the RFC Andrew is correct. There's an RFC for storing IPs in a database? Mike Stone ---(end of

Re: [PERFORM] Best way to index IP data?

2008-01-11 Thread Tom Lane
Michael Stone [EMAIL PROTECTED] writes: On Fri, Jan 11, 2008 at 03:07:38PM -0500, Tom Lane wrote: Nonsense. 3 bytes overhead on a 16-byte address is not ridiculously bloated, especially if you want a netmask with it. Big if, no? There's a very large set of users that *don't* want/need a

Re: [PERFORM] Best way to index IP data?

2008-01-11 Thread Michael Stone
On Fri, Jan 11, 2008 at 07:19:10PM -0500, Tom Lane wrote: It was correct, but not as of 8.3. Considering you could save a whole one byte by not storing the netmask Hmm. One for the netmask, plus the other two mystery bytes. :-) A byte here and a byte there is fine. 20% of a few billion IPs

Re: [PERFORM] Best way to index IP data?

2008-01-11 Thread Joshua D. Drake
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 On Fri, 11 Jan 2008 19:21:29 -0500 Michael Stone [EMAIL PROTECTED] wrote: On Fri, Jan 11, 2008 at 03:52:11PM -0800, Joshua D. Drake wrote: Steve I think you are speaking of practicality and implementation versus RFC compliance. I believe per the

Re: [PERFORM] Best way to index IP data?

2008-01-11 Thread Tom Lane
Pomarede Nicolas [EMAIL PROTECTED] writes: And it's true for IPv6 too, storing an IP that refer to an end point and not a subnet is requiring twice as much data as needed, because the netmask would always be ff:ff:ff:..:ff So, for people dealing with large database of IPs, it would be nice

Re: [PERFORM] Best way to index IP data?

2008-01-11 Thread Joshua D. Drake
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 On Fri, 11 Jan 2008 19:36:02 -0500 Michael Stone [EMAIL PROTECTED] wrote: On Fri, Jan 11, 2008 at 07:19:10PM -0500, Tom Lane wrote: It was correct, but not as of 8.3. Considering you could save a whole one byte by not storing the netmask Hmm.

Re: [PERFORM] Best way to index IP data?

2008-01-11 Thread Michael Stone
On Fri, Jan 11, 2008 at 04:43:18PM -0800, Joshua D. Drake wrote: Sigh. No. But there is an RFC that declare how IPs are denoted and used. Sigh, I'm honestly curious about what RFC says that endpoint IPs must have netmasks associated with them. Mike Stone ---(end of

Re: [PERFORM] Best way to index IP data?

2008-01-11 Thread Michael Stone
On Fri, Jan 11, 2008 at 04:57:46PM -0800, Joshua D. Drake wrote: Question is.. are you going to have a few billion IPs in your database? Doubtful. Really depends on what you're using the database for, doesn't it? Mike Stone ---(end of

[PERFORM] Simple select, but takes long time

2008-01-11 Thread James DeMichele
Hi, I am having a really hard time trying to figure out why my simple count(*) query is taking so long. I have a table with 1,296,070 rows in it. There are 2 different types of information that each row has that I care about: status : character(1) source_id : bigint Then, I have the

Re: [PERFORM] Best way to index IP data?

2008-01-11 Thread Michael Stone
Tom wrote: There seem to be a number of people in this thread laboring under the illusion that we store a netmask as a mask. It's a bit count (think /32 or /128) and occupies a whole one byte on disk. Killer overhead, for sure. There's no need to be quite so snarky. The netmask isn't the

Re: [PERFORM] Simple select, but takes long time

2008-01-11 Thread Tom Lane
James DeMichele [EMAIL PROTECTED] writes: I am having a really hard time trying to figure out why my simple count(*) query is taking so long. I have a table with 1,296,070 rows in it. There are 2 different types of information that each row has that I care about: Hmm, the EXPLAIN output works

Re: [PERFORM] Best way to index IP data?

2008-01-11 Thread Greg Smith
On Fri, 11 Jan 2008, Steve Atkins wrote: You may well need netmasks to configure your interface, but there's absolutely no need for them to identify an IP endpoint, which is all you need to identify the destination the packet is going to, and that is the most common use of IP addresses.

Re: [PERFORM] Best way to index IP data?

2008-01-11 Thread david
On Fri, 11 Jan 2008, Greg Smith wrote: You may well need netmasks to configure your interface, but there's absolutely no need for them to identify an IP endpoint, which is all you need to identify the destination the packet is going to, and that is the most common use of IP addresses.

Re: [PERFORM] Simple select, but takes long time

2008-01-11 Thread andrew
James DeMichele [EMAIL PROTECTED] wrote .. Then, I have the following index on the table: this_index (status, source_id, another_column) If you have many queries of this type, do CLUSTER this_index ON tablename; and retry the SELECT. MySQL is using some sort of auto-clustering ISAM on the

Re: [PERFORM] Simple select, but takes long time

2008-01-11 Thread Tom Lane
[EMAIL PROTECTED] writes: MySQL is using some sort of auto-clustering ISAM on the other box mayhaps? I think they probably count the index entries without ever visiting the table proper. Works great for mainly-static data ... regards, tom lane

Re: [PERFORM] Best way to index IP data?

2008-01-11 Thread D'Arcy J.M. Cain
On Fri, 11 Jan 2008 21:56:38 -0500 (EST) Greg Smith [EMAIL PROTECTED] wrote: On Fri, 11 Jan 2008, Steve Atkins wrote: You may well need netmasks to configure your interface, but there's absolutely no need for them to identify an IP endpoint, which is all you need to identify the