Re: [PERFORM] Best way to index IP data?

2008-01-14 Thread Florian Weimer
* Steve Atkins:

 I don't think there's ambiguity about what an dotted-quad without a
 netmask means, and hasn't been for a long time. Am I missing
 something?

Classful addressing is still part of many user interfaces, for
instance Cisco's IOS.  It's not just that the CLI accepts it, it's
also the standard[*] output format (that is, 192.0.2.0 instead of
192.0.2.0/24; if no prefix length is given, the one based on the
class is used).

[*] I don't think you can switch it off.  Obviously, for backwards
compatibility reasons, the default has to stay anyway.

-- 
Florian Weimer[EMAIL PROTECTED]
BFK edv-consulting GmbH   http://www.bfk.de/
Kriegsstraße 100  tel: +49-721-96201-1
D-76133 Karlsruhe fax: +49-721-96201-99

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] Best way to index IP data?

2008-01-14 Thread Michael Stone

On Mon, Jan 14, 2008 at 09:27:52AM +0100, Florian Weimer wrote:

* Steve Atkins:

I don't think there's ambiguity about what an dotted-quad without a
netmask means, and hasn't been for a long time. Am I missing
something?


Classful addressing is still part of many user interfaces, for
instance Cisco's IOS.  It's not just that the CLI accepts it, it's
also the standard[*] output format (that is, 192.0.2.0 instead of
192.0.2.0/24; if no prefix length is given, the one based on the
class is used).


Again, is this at all ambiguous *in the context of a host IP*? (IOW, if 
I say that I recieved an IP packet from 10.0.0.1, are you really going 
to ask me to clarify the netmask?)


Mike Stone

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] Best way to index IP data?

2008-01-14 Thread Michael Stone

On Mon, Jan 14, 2008 at 03:05:27PM +0100, Florian Weimer wrote:

Hmm.  It's an argument for a separate CIDR type, not against a host
type.


I don't think anyone argued against the CIDR type. :-)

Mike Stone

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] Best way to index IP data?

2008-01-11 Thread Pomarede Nicolas

On Thu, 10 Jan 2008, Jonah H. Harris wrote:


On Jan 10, 2008 6:25 PM, Steve Atkins [EMAIL PROTECTED] wrote:

http://pgfoundry.org/projects/ip4r/

That has the advantage over using integers, or the built-in inet type,
of being indexable for range and overlap queries.


Agreed.  ip4r is da bomb.


Hello to all,

I also have to store a lot of IP v4 addresses, and I think the internal 
inet type is somewhat overkill for that, since it always require 8 bytes, 
even if you don't need to store a netmask.
When storing millions of IP add, this means MB of space used for nothing 
in that case.


As ip4r seems to work very well with postgresql, is there a possibility to 
see it merged in postgresql, to have a native 4 bytes IPv4 address date 
type ?


Nicolas




---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] Best way to index IP data?

2008-01-11 Thread Tom Lane
Pomarede Nicolas [EMAIL PROTECTED] writes:
 As ip4r seems to work very well with postgresql, is there a possibility to 
 see it merged in postgresql, to have a native 4 bytes IPv4 address date 
 type ?

Given that the world is going to IPv6 in a few years whether you like it
or not, that seems pretty darn short-sighted to me.

What would make sense IMHO is to adapt the improved indexing support in
ip4r to work on the native inet/cidr types.

regards, tom lane

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: [PERFORM] Best way to index IP data?

2008-01-11 Thread Pomarede Nicolas

On Fri, 11 Jan 2008, Tom Lane wrote:


Pomarede Nicolas [EMAIL PROTECTED] writes:

As ip4r seems to work very well with postgresql, is there a possibility to
see it merged in postgresql, to have a native 4 bytes IPv4 address date
type ?


Given that the world is going to IPv6 in a few years whether you like it
or not, that seems pretty darn short-sighted to me.

What would make sense IMHO is to adapt the improved indexing support in
ip4r to work on the native inet/cidr types.



I understand your point on IPv6, but still being able to store IPv4 
addresses with as little overhead as possible is important.


IPv6 will certainly grow in the year to come, but if you consider the case 
of a very large private lan, with ip in the form 10.x.y.z, the fact that 
the outside world is now ipv6 doesn't necessarily imply you will rename 
all your internal equipments to be ipv6 if you don't need more addresses 
(you can do the translation when packets cross the ipv6/ipv4 gateway in 
your network).


To be more concret, I'm working for a large french ISP, so far we have 2+ 
millions boxes (triple play adsl equipments) at our customers' home.


All theses boxes have a private IPv4 address in 10.x.y.z as well as a 
public IPV4 address, and although we recently activated a public IPv6 addr 
on these boxes too (which certainly gives one of the biggest IPv6 network 
so far), we still need to store one ipv4 and one ipv6 addr for each box.


So, my point was not to be short-sighted, we will go IPv6 for sure, it's 
just that there're a lot of applications where storing ipv4 addr could be 
needed (whether ipv6 is required for other applications or not), and in 
this regard, I think that being able to store ipv4 addr with 4 bytes 
instead of 8 could be appreciated.


Or perhaps another solution would be to have built-in inet_aton / 
inet_ntoa functions in postgres, to store the result using an integer 
(unsigned) ?



Nicolas

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [PERFORM] Best way to index IP data?

2008-01-11 Thread Andrew Sullivan
On Fri, Jan 11, 2008 at 10:19:51AM -0500, Tom Lane wrote:
 Given that the world is going to IPv6 in a few years whether you like it
 or not, that seems pretty darn short-sighted to me.

Indeed.  Even ARIN has finally started to tell people that IPv4 is running
out.  There are currently significant deployments of IPv6 in the
Asia-Pacific region.  And it appears that Comcast is planning to move to
IPv6 for its own network deployment, which may mean that many U.S. homes
will have native v6 in the near future (the upshot of their plans aren't
actually clear to me yet, but if you're interested in some of what they're
telling people they're doing, look for Alain Durand's presentation to the
v6ops working group at the last IETF meeting).  

 What would make sense IMHO is to adapt the improved indexing support in
 ip4r to work on the native inet/cidr types.

This seems like a good idea to me.


---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: [PERFORM] Best way to index IP data?

2008-01-11 Thread Michael Stone

On Fri, Jan 11, 2008 at 10:19:51AM -0500, Tom Lane wrote:

Given that the world is going to IPv6 in a few years whether you like it
or not, that seems pretty darn short-sighted to me.


Well, a native IPv6 type would also be nice; inet is ridiculously 
bloated for both IPv4 *and* IPv6. 


Mike Stone

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [PERFORM] Best way to index IP data?

2008-01-11 Thread Joshua D. Drake

Steve Atkins wrote:


I'd actually support removing inet/cidr from core completely in the longer
run. Postgresql is extensible, so we really don't need types used only
by niche users in core, once we have pgfoundry and something like
mysqludf.org/CPAN. But that's a longer term thought.


I believe this is going to be solved by postgresqlpackages.org.

Joshua D. Drake



Cheers,
  Steve


---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

   http://www.postgresql.org/about/donate




---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] Best way to index IP data?

2008-01-11 Thread Steve Atkins


On Jan 11, 2008, at 7:19 AM, Tom Lane wrote:


Pomarede Nicolas [EMAIL PROTECTED] writes:
As ip4r seems to work very well with postgresql, is there a  
possibility to
see it merged in postgresql, to have a native 4 bytes IPv4 address  
date

type ?


Given that the world is going to IPv6 in a few years whether you  
like it

or not, that seems pretty darn short-sighted to me.

What would make sense IMHO is to adapt the improved indexing support  
in

ip4r to work on the native inet/cidr types.


Can't be done. The native types are too limited to be effectively  
indexed
in that  way - they cannot represent arbitrary ranges. ip4r started  
with me
trying to retrofit decent indexing onto the cidr type and failing  
miserably.


I'll likely be rolling out ip6r/ipr sometime in 2008, as users are  
beginning to

express an interest. But even then I don't expect it to replace the inet
and cidr types in core, because it isn't compatible with them.

I'd actually support removing inet/cidr from core completely in the  
longer

run. Postgresql is extensible, so we really don't need types used only
by niche users in core, once we have pgfoundry and something like
mysqludf.org/CPAN. But that's a longer term thought.

Cheers,
  Steve


---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

   http://www.postgresql.org/about/donate


Re: [PERFORM] Best way to index IP data?

2008-01-11 Thread D'Arcy J.M. Cain
On Fri, 11 Jan 2008 15:07:38 -0500
Tom Lane [EMAIL PROTECTED] wrote:
 Michael Stone [EMAIL PROTECTED] writes:
  Well, a native IPv6 type would also be nice; inet is ridiculously 
  bloated for both IPv4 *and* IPv6. 
 
 Nonsense.  3 bytes overhead on a 16-byte address is not ridiculously
 bloated, especially if you want a netmask with it.

Besides, there are many cases where you want to track both ipv4 and
ipv6 for the same purpose and requiring two different fields would be
less than ideal.

-- 
D'Arcy J.M. Cain [EMAIL PROTECTED] |  Democracy is three wolves
http://www.druid.net/darcy/|  and a sheep voting on
+1 416 425 1212 (DoD#0082)(eNTP)   |  what's for dinner.

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] Best way to index IP data?

2008-01-11 Thread Michael Stone

On Fri, Jan 11, 2008 at 03:19:35PM -0500, D'Arcy J.M. Cain wrote:

Besides, there are many cases where you want to track both ipv4 and
ipv6 for the same purpose and requiring two different fields would be
less than ideal.


And, there are many cases where you don't. I've got two kinds of db's 
that have IPs in them. In some, the IP is a small part of a table which 
is focused on something else. For those I use inet, which provides a 
nice bit of future-proofing. In other db's the IPs are the primary 
focus. There are lots and lots of IPs, and the space used by IPs may be 
the largest chunk of a particular table. For those tables, I don't use 
inet because the overhead really is a significant fraction of the space.


Mike Stone

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

   http://www.postgresql.org/about/donate


Re: [PERFORM] Best way to index IP data?

2008-01-11 Thread Alvaro Herrera
Michael Stone wrote:
 On Fri, Jan 11, 2008 at 03:07:38PM -0500, Tom Lane wrote:
 Michael Stone [EMAIL PROTECTED] writes:
 Well, a native IPv6 type would also be nice; inet is ridiculously bloated 
 for both IPv4 *and* IPv6. 

 Nonsense.  3 bytes overhead on a 16-byte address is not ridiculously
 bloated, especially if you want a netmask with it.

 Big if, no? There's a very large set of users that *don't* want/need a 
 netmask, which is why the topic keeps coming back. (Also, according to the 
 docs, inet requires 24 bytes, which is 50% more than needed; is that not 
 correct?)

So what this means is that our type oughta be optimized.  How about
having a separate bit to indicate whether there is a netmask or not, and
chop the storage earlier.  (I dunno if this already done)

Also, with packed varlenas the overhead is reduced AFAIK.

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PERFORM] Best way to index IP data?

2008-01-11 Thread Tom Lane
Michael Stone [EMAIL PROTECTED] writes:
 Well, a native IPv6 type would also be nice; inet is ridiculously 
 bloated for both IPv4 *and* IPv6. 

Nonsense.  3 bytes overhead on a 16-byte address is not ridiculously
bloated, especially if you want a netmask with it.

regards, tom lane

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: [PERFORM] Best way to index IP data?

2008-01-11 Thread Michael Stone

On Fri, Jan 11, 2008 at 04:32:05PM -0500, Michael Stone wrote:

On Fri, Jan 11, 2008 at 03:19:35PM -0500, D'Arcy J.M. Cain wrote:

Besides, there are many cases where you want to track both ipv4 and
ipv6 for the same purpose and requiring two different fields would be
less than ideal.


And, there are many cases where you don't. I've got two kinds of db's 
that have IPs in them. In some, the IP is a small part of a table which 
is focused on something else. For those I use inet, which provides a 
nice bit of future-proofing. In other db's the IPs are the primary 
focus. There are lots and lots of IPs, and the space used by IPs may be 
the largest chunk of a particular table. For those tables, I don't use 
inet because the overhead really is a significant fraction of the space.


Oh, yeah, the latter type also has seperate IPv4 and IPv6 tables, 
because there's no point in bloating 99% of the data for the 1% that's 
IPv6. Is that a niche requirement? Maybe--but I think that storing 
netmasks is even *more* of a niche...


I'm not arguing for the removal of inet, but I do think there's room for 
more than one type--and I certainly think its nuts to pretend that inet 
can meet every requirement well.


Mike Stone

---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org


Re: [PERFORM] Best way to index IP data?

2008-01-11 Thread Michael Stone

On Fri, Jan 11, 2008 at 03:07:38PM -0500, Tom Lane wrote:

Michael Stone [EMAIL PROTECTED] writes:
Well, a native IPv6 type would also be nice; inet is ridiculously 
bloated for both IPv4 *and* IPv6. 


Nonsense.  3 bytes overhead on a 16-byte address is not ridiculously
bloated, especially if you want a netmask with it.


Big if, no? There's a very large set of users that *don't* want/need a 
netmask, which is why the topic keeps coming back. (Also, according to 
the docs, inet requires 24 bytes, which is 50% more than needed; is that 
not correct?)


Mike Stone

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] Best way to index IP data?

2008-01-11 Thread Pomarede Nicolas

On Fri, 11 Jan 2008, Andrew Sullivan wrote:


On Fri, Jan 11, 2008 at 05:02:36PM -0500, Michael Stone wrote:


networks), but there's a conspicuous lack of a type for (hosts). I
suppose if you really are sure that you want to store hosts and not
networks


Well, part of the trouble is that in the CIDR world, an IP without a netmask
can be dangerously ambiguous.  I can see why the design is as it is for that
reason.  (But I understand the problem.)

A



Yes, in fact it all depends on the meaning you give to an IP.

If you want to store subnets, then you need an IP and a netmask, but if 
you just want to store the IP of a particular equipment (that is, the IP 
that will be refered to in the TCP/IP header), then there's no ambiguity, 
you just need 4 bytes to describe this IP.


And it's true for IPv6 too, storing an IP that refer to an end point and 
not a subnet is requiring twice as much data as needed, because the 
netmask would always be ff:ff:ff:..:ff


So, for people dealing with large database of IPs, it would be nice to be 
able to save 50% of the corresponding disk/cache/ram space for these IPs.



Nicolas

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] Best way to index IP data?

2008-01-11 Thread Michael Stone

On Fri, Jan 11, 2008 at 06:37:10PM -0300, Alvaro Herrera wrote:

So what this means is that our type oughta be optimized.  How about
having a separate bit to indicate whether there is a netmask or not, and
chop the storage earlier.  (I dunno if this already done)


Why not just have a type that indicates whether there is a netmask or 
not? We currently have this (8.3 docs, which I see reflects the 3 byte 
overhead--down to 20% rather than 50% for IPv6):


cidr7 or 19 bytes   IPv4 and IPv6 networks
inet7 or 19 bytes   IPv4 and IPv6 hosts and networks

Note that there's a type for (networks), and there's a type for (hosts and 
networks), but there's a conspicuous lack of a type for (hosts). I 
suppose if you really are sure that you want to store hosts and not 
networks you should use inet and then set a constraint like

 if (family() == 4  masklen() == 32)
 elsif (family() == 6  masklen() == 128)

(For people whose databases don't resolve around network data, this 
probably seems like not a big deal. OTOH, I can only imagine the outcry 
if the only available arithmetic type was an intfloat, which can be 
either an integer or a real number, has very low overhead to keep track 
of whether there's a decimal point, and can easily be made to behave 
like an integer if you set a constraint forbidding fractional parts.
Because, hey, you *never know* when you might need a real number, and 
wouldn't want to paint yourself into a corner by stupidly specifying an 
integer-only type.)


Mike Stone

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

   http://www.postgresql.org/about/donate


Re: [PERFORM] Best way to index IP data?

2008-01-11 Thread Andrew Sullivan
On Fri, Jan 11, 2008 at 05:02:36PM -0500, Michael Stone wrote:

 networks), but there's a conspicuous lack of a type for (hosts). I 
 suppose if you really are sure that you want to store hosts and not 
 networks 

Well, part of the trouble is that in the CIDR world, an IP without a netmask
can be dangerously ambiguous.  I can see why the design is as it is for that
reason.  (But I understand the problem.)

A


---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: [PERFORM] Best way to index IP data?

2008-01-11 Thread Steve Atkins


On Jan 11, 2008, at 2:24 PM, Andrew Sullivan wrote:


On Fri, Jan 11, 2008 at 05:02:36PM -0500, Michael Stone wrote:


networks), but there's a conspicuous lack of a type for (hosts). I
suppose if you really are sure that you want to store hosts and not
networks


Well, part of the trouble is that in the CIDR world, an IP without a  
netmask
can be dangerously ambiguous.  I can see why the design is as it is  
for that

reason.  (But I understand the problem.)


I don't think there's ambiguity about what an dotted-quad without a  
netmask

means, and hasn't been for a long time. Am I missing something?

There is ambiguity when you feed non dotted-quads into the
existing cidr I/O functions[1], but that's both a dead horse,
and not something likely to actually affect users negatively.

Cheers,
  Steve

[1] Because postgresql copied obsolete pre-CIDR code from libbind.

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] Best way to index IP data?

2008-01-11 Thread Andrew Sullivan
On Fri, Jan 11, 2008 at 02:38:27PM -0800, Steve Atkins wrote:
 I don't think there's ambiguity about what an dotted-quad without a  
 netmask
 means, and hasn't been for a long time. Am I missing something?

Well, maybe.  The problem is actually that, without a netmask under CIDR,
the address alone isn't really enough.  You have to have a netmask to get
the packets to the destination.  As it happens, we have some nice
conventions, defined in the RFCs, for how to interpret hosts with no
netmask; note though that some of those are only for humans.  Or, to put it
another way, without context, a dotted-quad is insufficient on its own. 
What you're really arguing is that the context ought to be storable
somewhere else (maybe in a human's brain).  I'm not suggesting that's wrong,
but I can see the correctness argument that someone might have made to get
to the datatype as it exists.  I think calling it needless bloat is just
holding it to the wrong criteria.

If you look at the binary wire data, that netmask is always represented in
some sense.  It can sometimes be more compact than the general-purpose data
type, though, no question.  This is why somewhere in this thread someone
talked about optimisation: there certainly are ways to make these things
more compact.

A

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] Best way to index IP data?

2008-01-11 Thread Steve Atkins


On Jan 11, 2008, at 3:00 PM, Andrew Sullivan wrote:


On Fri, Jan 11, 2008 at 02:38:27PM -0800, Steve Atkins wrote:

I don't think there's ambiguity about what an dotted-quad without a
netmask
means, and hasn't been for a long time. Am I missing something?


Well, maybe.  The problem is actually that, without a netmask under  
CIDR,
the address alone isn't really enough.  You have to have a netmask  
to get

the packets to the destination.


Not really. You may well need netmasks to configure your interface, but
there's absolutely no need for them to identify an IP endpoint, which  
is all
you need to identify the destination the packet is going to, and that  
is the

most common use of IP addresses.


As it happens, we have some nice
conventions, defined in the RFCs, for how to interpret hosts with no
netmask; note though that some of those are only for humans.  Or, to  
put it
another way, without context, a dotted-quad is insufficient on its  
own.

What you're really arguing is that the context ought to be storable
somewhere else (maybe in a human's brain).


A dotted quad without any additional information is an IPv4 address,
the same as it would be if followed by /32.

Netmasks are rarely needed at any level above routing
or (some forms of) address assignment, and that's where an awful lot  
of use of IP

addresses happens. When you do need netmasks then the cidr type is
great, but that's not the common case.

(And the inet-with-a-netmask is an even odder duck, as it's packing  
two mostly

unrelated bits of information into a single type).


I'm not suggesting that's wrong,
but I can see the correctness argument that someone might have  
made to get
to the datatype as it exists.  I think calling it needless bloat  
is just

holding it to the wrong criteria.


If you look at the binary wire data, that netmask is always  
represented in
some sense.  It can sometimes be more compact than the general- 
purpose data
type, though, no question.  This is why somewhere in this thread  
someone
talked about optimisation: there certainly are ways to make these  
things

more compact.


I think we're drifting a long way away from a -performance topic here,  
as
we're agreed that inet or cidr are likely not the best types for the  
original

poster to use.

Cheers,
  Steve


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [PERFORM] Best way to index IP data?

2008-01-11 Thread Joshua D. Drake
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

On Fri, 11 Jan 2008 15:37:37 -0800
Steve Atkins [EMAIL PROTECTED] wrote:

  Well, maybe.  The problem is actually that, without a netmask
  under CIDR,
  the address alone isn't really enough.  You have to have a netmask  
  to get
  the packets to the destination.
 
 Not really. You may well need netmasks to configure your interface,
 but there's absolutely no need for them to identify an IP endpoint,
 which is all
 you need to identify the destination the packet is going to, and
 that is the
 most common use of IP addresses.

Steve I think you are speaking of practicality and implementation
versus RFC compliance. I believe per the RFC Andrew is correct.

Sincerely,

Joshua D. Drake




- -- 
The PostgreSQL Company: Since 1997, http://www.commandprompt.com/ 
Sales/Support: +1.503.667.4564   24x7/Emergency: +1.800.492.2240
Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
SELECT 'Training', 'Consulting' FROM vendor WHERE name = 'CMD'


-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.6 (GNU/Linux)

iD8DBQFHiAEtATb/zqfZUUQRAqVjAJ9cXrRmDyCYy1vwP6nYI2kbOlYxKgCgga9q
jIMuXCy8LKquevyPdehaQaA=
=FNIf
-END PGP SIGNATURE-

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] Best way to index IP data?

2008-01-11 Thread Steve Atkins


On Jan 11, 2008, at 3:52 PM, Joshua D. Drake wrote:


-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

On Fri, 11 Jan 2008 15:37:37 -0800
Steve Atkins [EMAIL PROTECTED] wrote:


Well, maybe.  The problem is actually that, without a netmask
under CIDR,
the address alone isn't really enough.  You have to have a netmask
to get
the packets to the destination.


Not really. You may well need netmasks to configure your interface,
but there's absolutely no need for them to identify an IP endpoint,
which is all
you need to identify the destination the packet is going to, and
that is the
most common use of IP addresses.


Steve I think you are speaking of practicality and implementation
versus RFC compliance. I believe per the RFC Andrew is correct.


I don't believe that's the case, but really we're at how many angels
dance on the head of a pin level quibbling by this point. :)

Cheers,
  Steve


---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org


Re: [PERFORM] Best way to index IP data?

2008-01-11 Thread Michael Stone

On Fri, Jan 11, 2008 at 06:00:55PM -0500, Andrew Sullivan wrote:
another way, without context, a dotted-quad is insufficient on its own. 
What you're really arguing is that the context ought to be storable

somewhere else (maybe in a human's brain)


Or, say, in a database schema, where you say this column contains host 
IPs? Are you suggesting that the IP associated with the host can 
somehow be ambiguous? IMO, the only ambiguity is that the type is 
defined as network or host.


Mike Stone

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] Best way to index IP data?

2008-01-11 Thread Michael Stone

On Fri, Jan 11, 2008 at 03:52:11PM -0800, Joshua D. Drake wrote:

Steve I think you are speaking of practicality and implementation
versus RFC compliance. I believe per the RFC Andrew is correct.


There's an RFC for storing IPs in a database?

Mike Stone

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] Best way to index IP data?

2008-01-11 Thread Tom Lane
Michael Stone [EMAIL PROTECTED] writes:
 On Fri, Jan 11, 2008 at 03:07:38PM -0500, Tom Lane wrote:
 Nonsense.  3 bytes overhead on a 16-byte address is not ridiculously
 bloated, especially if you want a netmask with it.

 Big if, no? There's a very large set of users that *don't* want/need a 
 netmask, which is why the topic keeps coming back. (Also, according to 
 the docs, inet requires 24 bytes, which is 50% more than needed; is that 
 not correct?)

It was correct, but not as of 8.3.  Considering you could save a whole
one byte by not storing the netmask (well, maybe more depending on
alignment considerations), the complaint level is unjustified.

regards, tom lane

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] Best way to index IP data?

2008-01-11 Thread Michael Stone

On Fri, Jan 11, 2008 at 07:19:10PM -0500, Tom Lane wrote:

It was correct, but not as of 8.3.  Considering you could save a whole
one byte by not storing the netmask


Hmm. One for the netmask, plus the other two mystery bytes. :-) A byte 
here and a byte there is fine. 20% of a few billion IPs does start to

add up.

Mike Stone

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [PERFORM] Best way to index IP data?

2008-01-11 Thread Joshua D. Drake
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

On Fri, 11 Jan 2008 19:21:29 -0500
Michael Stone [EMAIL PROTECTED] wrote:

 On Fri, Jan 11, 2008 at 03:52:11PM -0800, Joshua D. Drake wrote:
 Steve I think you are speaking of practicality and implementation
 versus RFC compliance. I believe per the RFC Andrew is correct.
 
 There's an RFC for storing IPs in a database?

Sigh. No. But there is an RFC that declare how IPs are denoted and used.

Joshua D. Drake

 
 Mike Stone
 
 ---(end of
 broadcast)--- TIP 1: if posting/reading
 through Usenet, please send an appropriate subscribe-nomail command
 to [EMAIL PROTECTED] so that your message can get through to
 the mailing list cleanly
 


- -- 
The PostgreSQL Company: Since 1997, http://www.commandprompt.com/ 
Sales/Support: +1.503.667.4564   24x7/Emergency: +1.800.492.2240
Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
SELECT 'Training', 'Consulting' FROM vendor WHERE name = 'CMD'


-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.6 (GNU/Linux)

iD8DBQFHiA0oATb/zqfZUUQRAgWNAJ9F+9Kxxazh0QK0z9zcskkG1UhFnQCgpqA1
hGiNsL5wjqxM7bzW6qNJfrE=
=/FxX
-END PGP SIGNATURE-

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: [PERFORM] Best way to index IP data?

2008-01-11 Thread Tom Lane
Pomarede Nicolas [EMAIL PROTECTED] writes:
 And it's true for IPv6 too, storing an IP that refer to an end point and 
 not a subnet is requiring twice as much data as needed, because the 
 netmask would always be ff:ff:ff:..:ff
 So, for people dealing with large database of IPs, it would be nice to be 
 able to save 50% of the corresponding disk/cache/ram space for these IPs.

There seem to be a number of people in this thread laboring under the
illusion that we store a netmask as a mask.  It's a bit count (think
/32 or /128) and occupies a whole one byte on disk.  Killer overhead,
for sure.

regards, tom lane

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] Best way to index IP data?

2008-01-11 Thread Joshua D. Drake
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

On Fri, 11 Jan 2008 19:36:02 -0500
Michael Stone [EMAIL PROTECTED] wrote:

 On Fri, Jan 11, 2008 at 07:19:10PM -0500, Tom Lane wrote:
 It was correct, but not as of 8.3.  Considering you could save a
 whole one byte by not storing the netmask
 
 Hmm. One for the netmask, plus the other two mystery bytes. :-) A
 byte here and a byte there is fine. 20% of a few billion IPs does
 start to add up.

About a 65.00 hard disk:

select pg_size_pretty((5000 * 0.2)::bigint);
 pg_size_pretty 
- 
 93 GB

But wumpf... that is a lot of ip addresses. Question is.. are you going
to have a few billion IPs in your database? Doubtful.

Sincerely,

Joshua D. Drake


 
 Mike Stone
 
 ---(end of
 broadcast)--- TIP 9: In versions below 8.0,
 the planner will ignore your desire to choose an index scan if your
 joining column's datatypes do not match
 


- -- 
The PostgreSQL Company: Since 1997, http://www.commandprompt.com/ 
Sales/Support: +1.503.667.4564   24x7/Emergency: +1.800.492.2240
Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
SELECT 'Training', 'Consulting' FROM vendor WHERE name = 'CMD'


-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.6 (GNU/Linux)

iD8DBQFHiBCMATb/zqfZUUQRAotjAJ4r6kjuO8pOZzD316Va1AE8VNt6TgCggQcT
lI/kT2DF59Zuu7cbipdBpPI=
=/xl5
-END PGP SIGNATURE-

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: [PERFORM] Best way to index IP data?

2008-01-11 Thread Michael Stone

On Fri, Jan 11, 2008 at 04:43:18PM -0800, Joshua D. Drake wrote:

Sigh. No. But there is an RFC that declare how IPs are denoted and used.


Sigh, I'm honestly curious about what RFC says that endpoint IPs must 
have netmasks associated with them.


Mike Stone

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

   http://www.postgresql.org/about/donate


Re: [PERFORM] Best way to index IP data?

2008-01-11 Thread Michael Stone

On Fri, Jan 11, 2008 at 04:57:46PM -0800, Joshua D. Drake wrote:

Question is.. are you going
to have a few billion IPs in your database? Doubtful.


Really depends on what you're using the database for, doesn't it?

Mike Stone

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

   http://www.postgresql.org/about/donate


Re: [PERFORM] Best way to index IP data?

2008-01-11 Thread Michael Stone

Tom wrote:
There seem to be a number of people in this thread laboring under the 
illusion that we store a netmask as a mask.  It's a bit count (think 
/32 or /128) and occupies a whole one byte on disk.  Killer overhead, 
for sure.


There's no need to be quite so snarky. The netmask isn't the only part 
of the overhead, but you did invite discussion of the netmask in 
particular when you said 3 bytes overhead on a 16-byte address is not 
'ridiculously bloated', *especially if you want a netmask with it*. You 
might be hearing less about netmasks if you hadn't used them to justify 
the size of the inet type. :-) There's also a number of issues being 
conflated, as tends to happen when the pent up displeasure about inet 
erupts on its semi-annual schedule. For myself, I mentioned two distinct 
issues:


1) overhead (over absolute minimum required): 20% for IPv6 and *75%* for 
IPv4. (In fairness, I actually am testing using inet for ipv6 tables, on 
the assumption that I'll get another order of magnitude out of the 
hardware before I really need high-volume ipv6 storage, and then it 
really won't matter. But today, for the kind of addresses seen in the 
real world, it really does matter. Also, recall that while you live in 
the development version, those of us in the release world are dealing 
with overheads of 50% for IPv6 and *200%* for IPv4. It'll take us a 
while to recalibrate. :-)


2) ambiguity/completeness of data types (is it a host? is it a network? 
what data type do I use if I really want to ensure that people don't 
stick routing information into my host column?)


netmasks are kinda part of both, but they aren't the main point of 
either. 

As far as the hostility surrounding discussions of inet, I understand 
and appreciate that there are reasons inet  cidr work the way they do, 
and I find inet to be very useful in certain cases. But there are also 
cases where they suck, and responses along the lines of you should be 
using ipv6 anyway don't ease the pain any. :-) Responses like it would 
just be too much work to support seperate ipv4  ipv6 data types would 
still suck :-) but at least they wouldn't be telling people that they're 
imagining the problems that inet has had meeting their particular 
requirements. And regardless of whether postgres gets a seperate ipv4 
data type, I'd still like to have an inethost data type as a 
complement to the cidr data type. :-)


Mike Stone

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] Best way to index IP data?

2008-01-11 Thread Greg Smith

On Fri, 11 Jan 2008, Steve Atkins wrote:

You may well need netmasks to configure your interface, but there's 
absolutely no need for them to identify an IP endpoint, which is all you 
need to identify the destination the packet is going to, and that is the 
most common use of IP addresses.


Technically you can't ever send a packet unless you know both the endpoint 
and your local netmask.  As the sender, you're obligated to determine if 
the destination is on your local LAN (in which case you send it there) or 
if it goes to the gateway.  That's similar to a routing decision, but it's 
not quite--if you don't have to look in a routing table, it's not actually 
part of routing.


I believe this sort of detail is why subnet masks are considered required 
for some things even though it doesn't seem like they are needed. 
Regardless, the details of how the packets move aren't important to some 
applications, and arguing over what the RFCs do and don't require doesn't 
change that.


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [PERFORM] Best way to index IP data?

2008-01-11 Thread david

On Fri, 11 Jan 2008, Greg Smith wrote:

You may well need netmasks to configure your interface, but there's 
absolutely no need for them to identify an IP endpoint, which is all you 
need to identify the destination the packet is going to, and that is the 
most common use of IP addresses.


Technically you can't ever send a packet unless you know both the endpoint 
and your local netmask.  As the sender, you're obligated to determine if the 
destination is on your local LAN (in which case you send it there) or if it 
goes to the gateway.  That's similar to a routing decision, but it's not 
quite--if you don't have to look in a routing table, it's not actually part 
of routing.


you also need to know your local IP address, but there is no reason to 
need the netmask of the other end


my IP address is 64.81.33.126 why do you need to know my netmaask? how 
would you find out what it is?


DNS doesn't report the netmask,and it's arguably the biggest database of 
IP addresses around ;-)


one of the biggest reasons for storing IP addresses in a SQL database is 
as part of log analysis.


David Lang

I believe this sort of detail is why subnet masks are considered required for 
some things even though it doesn't seem like they are needed. Regardless, the 
details of how the packets move aren't important to some applications, and 
arguing over what the RFCs do and don't require doesn't change that.


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

 http://www.postgresql.org/docs/faq



---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] Best way to index IP data?

2008-01-11 Thread D'Arcy J.M. Cain
On Fri, 11 Jan 2008 21:56:38 -0500 (EST)
Greg Smith [EMAIL PROTECTED] wrote:
 On Fri, 11 Jan 2008, Steve Atkins wrote:
 
  You may well need netmasks to configure your interface, but there's 
  absolutely no need for them to identify an IP endpoint, which is all you 
  need to identify the destination the packet is going to, and that is the 
  most common use of IP addresses.
 
 Technically you can't ever send a packet unless you know both the endpoint 
 and your local netmask.  As the sender, you're obligated to determine if 
 the destination is on your local LAN (in which case you send it there) or 
 if it goes to the gateway.  That's similar to a routing decision, but it's 
 not quite--if you don't have to look in a routing table, it's not actually 
 part of routing.

Not sure what your point is here.  Sure, you need the netmask but not
of every IP address you send to, only for the IP/network that you are
on.  That's a grand total of one netmask per interface that you need to
know.  And you don't store it in your database.

-- 
D'Arcy J.M. Cain [EMAIL PROTECTED] |  Democracy is three wolves
http://www.druid.net/darcy/|  and a sheep voting on
+1 416 425 1212 (DoD#0082)(eNTP)   |  what's for dinner.

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq