Re: [vox-tech] Memory addressing?
On Wed, Jun 23, 2010 at 06:37:41PM -0700, Bill Broadley wrote: > On 06/23/2010 10:42 AM, timri...@appahost.com wrote: > > The reason for the discussion was Brian's intrusion detection > > implementation stored the incoming packets in a hash > > table. The key to the hash table was quite > > large -- inbound IP address, outbound IP address, inbound port, and > > outbound port. I thought to my self, on a very large implementation (say > > Google) the table could grow to a billion entries. Could a hash table > > store this amount in memory? Could you allocate an array of half the > > total memory? Could you allocate an array of a billion integers? Brian, > > on his laptop, couldn't allocate a billion integers. But he could > > allocate a billion characters (bytes). Since I thought both bytes and > > integers were words, and since I thought memory stored words > > like registers stored words, we had our discussion. > > Sounds like an interesting discussion, sorry I missed it. Kinda of > amusing trying to handle such a hash table on a (older I assume) single > 32 bit laptop. So, I am about to put up the code for my mods to nProbe. I believe it runs on 64 bit. I really haven't done any work on the hashing part, but I am sure that the work that Luca Deri has done works well. How do you put a public git repository online? Somewhat like the following. http://www.gtk.org/download.html I found the following link. Does it look like it has the correct details? http://toolmantim.com/thoughts/setting_up_a_new_remote_git_repository brian -- Brian Lavender http://www.brie.com/brian/ "There are two ways of constructing a software design. One way is to make it so simple that there are obviously no deficiencies. And the other way is to make it so complicated that there are no obvious deficiencies." Professor C. A. R. Hoare The 1980 Turing award lecture ___ vox-tech mailing list vox-tech@lists.lugod.org http://lists.lugod.org/mailman/listinfo/vox-tech
Re: [vox-tech] Memory addressing?
On 06/23/2010 12:39 PM, Brian Lavender wrote: > What started this is that having a hash provides ideally a O(1). So, > I say that to achieve this, one would ideally want to have a large array > to store your IP addresses and a hashing function that provides good > distribution. To which, Tim pointed to using a smaller array and using > chaining, because he initially thought that arrays were limited to smaller > sizes than was observed by doing some simple tests. To which I replied, > using a fixed size array and the chaining could cause the hash to decay > into a linked list which has a O(n). Indeed, the classic memory vs access time for hashes. > And to which Tim has noted using a > large fixed size array may just take up all your architectural limits > of the program. I imagine that having such a large array would push > the lower bound of the stack up in memory, or if allocated at runtime, > would push the heap way down from the top. I had what sounds like a similar problem for web logging. I wanted to take a URL which contained agent, IP, URL, referer. I posted the source on lugod quite awhile ago. The pseudo code was: for hits: agent_id=getid(agent,"agent string"); ip_id=getid(ip,"ip string"); url_id=getid(url,"url string"); refer_id=getid(refer,"refer string"); append_log(agent_id,ip_id,url_id,refer_id,datastamp,returncode) Of course getid would lookup the value and if it didn't exist it would add that string to the table. This resulted in several advantages: * Made it much easier to extract info from the logs, things like how many hits did a subset of the website have last year vs this year so we can evaluate advertising effectiveness. * Logs were denser (less disk) * Running reports like top 10 for things like hits, bandwidth, IPs, agents, and 404s was easy. While load testing on a rather old machine (PII-400 I believe) I could record 1000 hits/sec even when using mysql as the backend. With a memory resident database or something lighter like sqllite I'd expect much better. Of course hardware has come a long way since the PII-400 as well. Such a setup might work well with for an IDS as well. With some care it should be relatively straight forward to make it parallel friendly. With all that said, I used the big hash method in a simple little tool I wrote to instantly tell you where your bandwidth is going: http://broadley.org/bill/tcpdump.pl The output: tcpdump -n -c 1000 | ./tcpdump.pl | head -5 1000 packets captured 1000 packets received by filter 0 packets dropped by kernel lines=1000 lengths=760 613884 128.120.46.32.8649 -> 128.120.246.1.48876 153463 128.120.46.33.8649 -> 128.120.246.1.48609 153462 128.120.46.33.8649 -> 128.120.246.52.54912 2684 131.215.74.22.61733 -> 128.120.246.102.9996 ___ vox-tech mailing list vox-tech@lists.lugod.org http://lists.lugod.org/mailman/listinfo/vox-tech
Re: [vox-tech] Memory addressing?
On 06/23/2010 11:36 AM, Ken Bloom wrote: > On a 32-bit machine, this will eat up most of the computer's address > space, including *all* application space, and some kernel space (so you > can expect things to segfault). Assuming no PAE. With PAE often the kernel and other user processes would be in different memory ranges. So the kernel and other apps could be perfectly happy while you fill your particular segment. > On a 64-bit machine, it should work > though. Nope, at least with the normal defaults stack segments tend to be less than 4GB. On a 32GB ram 64-bit machine: $ ./a.out Segmentation fault > Stuff gets pushed on the stack after the end of the array, approximately > 1GB into memory. This means you have a stack that's 1GB long when you > get inside printf, which is seriously much more than the OS is prepared > to let you have for your stack. (This will overflow the stack whether > your machine is 32-bits or 64-bits.) Try mallocing your array instead. Ya, definitely should work since that won't be on the stack. #include #include int *table; #define big 10 int main( void ) { int i; table = (int *) malloc(sizeof(int)*big); for (i=0;i (Oh and change that printf to be printf("%px Hello world\n",table); so > you can see whether the allocation is working.) > > If you're looking at an IDS that big, you're going to need to find an > appropriate caching data structure that can write the infrequently used > parts out to disk. Or you're going to have to find some other way of > minimizing the number of packets you're keeping track of at a time. Exactly. I'd suggest: 1) Don't require contiguous memory (you get much more that way) 2) Plan on parallel access (not a single thread/core) 3) Don't ignore pages, cache lines, L1/L2/L3 cache, etc. So ignoring the cost of the hash calculation a memory lookup takes approximately 100ns. If you do 8 or 16 in parallel you could get an effective latency (or throughput if you prefer) of 10ns. Of course if you can get into L3 cache the latency and bandwidth will go up an order of magnitude. Of course getting into L1/L2 will get you another magnitude. So basically for optimal performance you'll end up with a multi-level datastructure that allocates chunks of memory much less than 4GB. After all for any IDS the number of active sessions in any short time is going to be MUCH MUCH MUCH less than 2^96. To keep that in perspective if your IDS handled a billion machines sending 1000 packets a second for 2 billion years it still would be less than 2^96. So, sure a hash of the 2^96 space is easy to write and plenty for smaller installations as you scale you are going to have to get significantly more efficient. ___ vox-tech mailing list vox-tech@lists.lugod.org http://lists.lugod.org/mailman/listinfo/vox-tech
Re: [vox-tech] Memory addressing?
On 06/23/2010 10:42 AM, timri...@appahost.com wrote: > The reason for the discussion was Brian's intrusion detection > implementation stored the incoming packets in a hash > table. The key to the hash table was quite > large -- inbound IP address, outbound IP address, inbound port, and > outbound port. I thought to my self, on a very large implementation (say > Google) the table could grow to a billion entries. Could a hash table > store this amount in memory? Could you allocate an array of half the > total memory? Could you allocate an array of a billion integers? Brian, > on his laptop, couldn't allocate a billion integers. But he could > allocate a billion characters (bytes). Since I thought both bytes and > integers were words, and since I thought memory stored words > like registers stored words, we had our discussion. Sounds like an interesting discussion, sorry I missed it. Kinda of amusing trying to handle such a hash table on a (older I assume) single 32 bit laptop. > The following failed to compile with an array-to-large error: > int main( void ) > { > int table[ 10 ]; > return 0; > } Keep in mind, that's allocating on the stack, you could easily run out even on a 64 bit machine. Moving the int table up before the int main would allocate it directly from memory. Also I believe that requires contiguous memory, which may well be significantly lower than the free memory. > The following succeeded to compile: > int main( void ) > { > char table[ 10 ]; > return 0; > } > > This compiles but core dumps: > #include > int main( void ) > { > char table[ 10 ]; > printf( "Hello world!\n" ); > } Keep in mind that compilers are getting smarter and I've seen problems with blind arrays. Also depending on the OS they may or may not allocate memory that you ask for but don't initialized. I'm not sure offhand if C initializes statically allocated memory like that. So if you want to insure that a memory allocation (generically is working) I'd use the memory. The code I used was: #include #include #define big 10 int table[ big ]; int main( void ) { int i; for (i=0;ihttp://lists.lugod.org/mailman/listinfo/vox-tech
Re: [vox-tech] Memory addressing?
On Wed, Jun 23, 2010 at 10:42:18AM -0700, timri...@appahost.com wrote: > > The reason for the discussion was Brian's intrusion detection > implementation stored the incoming packets in a hash > table. The key to the hash table was quite > large -- inbound IP address, outbound IP address, inbound port, and > outbound port. I thought to my self, on a very large implementation (say > Google) the table could grow to a billion entries. Could a hash table > store this amount in memory? Could you allocate an array of half the > total memory? Could you allocate an array of a billion integers? Brian, > on his laptop, couldn't allocate a billion integers. But he could > allocate a billion characters (bytes). Since I thought both bytes and > integers were words, and since I thought memory stored words > like registers stored words, we had our discussion. What started this is that having a hash provides ideally a O(1). So, I say that to achieve this, one would ideally want to have a large array to store your IP addresses and a hashing function that provides good distribution. To which, Tim pointed to using a smaller array and using chaining, because he initially thought that arrays were limited to smaller sizes than was observed by doing some simple tests. To which I replied, using a fixed size array and the chaining could cause the hash to decay into a linked list which has a O(n). And to which Tim has noted using a large fixed size array may just take up all your architectural limits of the program. I imagine that having such a large array would push the lower bound of the stack up in memory, or if allocated at runtime, would push the heap way down from the top. In my Genetic Algorithm, I used Gnome's glib to do the hashing for me. I figured that someone else already looked at this problem. In my Genetic Algorithm, I am only storing a maximum of 32 bits in the hash key. In the nProbe code, Luca Deri wrote the hash, using the key values as Tim described. I have not figured out exactly how Luca does it, but I will assume that he used a suitable method that works well enough for me. ;-) If I don't contain the scope on my Master's Project, I won't finish! Thus, this was the impetus for our discussion. ;-) brian -- Brian Lavender http://www.brie.com/brian/ "There are two ways of constructing a software design. One way is to make it so simple that there are obviously no deficiencies. And the other way is to make it so complicated that there are no obvious deficiencies." Professor C. A. R. Hoare The 1980 Turing award lecture ___ vox-tech mailing list vox-tech@lists.lugod.org http://lists.lugod.org/mailman/listinfo/vox-tech
Re: [vox-tech] Memory addressing?
On Wed, 2010-06-23 at 10:42 -0700, timri...@appahost.com wrote: > > Original Message > > Subject: Re: [vox-tech] Memory addressing? > > From: "Chanoch (Ken) Bloom" > > Date: Tue, June 22, 2010 9:46 am > > To: lugod's technical discussion forum > > > > > > On Tue, Jun 22, 2010 at 09:11:44AM -0700, Brian Lavender wrote: > > > Can someone confirm what is correct? > > > > > > Tim and I were discussing memory addressing at Crepeville last night > > > and we had a disagreement about how memory is addressable. I say that > > > on today's common intel i386 32 bit architecture (in case you are one of > > > those souls who builds your hardware from scratch), that memory is byte > > > (octet) addressable. You can load a byte from memory into the lower 8 > > > bits of a register. Tim says that memory is only addressable on 32 bit > > > word boundaries. > > > > > I stand corrected. Now that I have my hardware textbook open, > Tanenbaum 1990, I see that the smallest addressable unit on most > computers > is the byte -- 8 bits. Bytes are grouped into words. The word-length is > the size of the registers, not registers and memory. > > > > > Consider the following program. The fact that you can get pointers to > > arbitrary characers should be enough proof that the architecture is > > byte addressable. > > Tanenbaum calls the smallest addressable unit a cell. He then lists > examples > of cell lengths for 11 computers. Before cells were standardized > to 8 bits, they ranged from 1 bit to 60 bits per cell. Therefore, a 60 > bits per cell computer would store a single ASCII character in the > lowest > 7 bits and set the upper 53 bits to off (probably). You could then > allocate > a pointer to address that 60 bit cell. > > The reason for the discussion was Brian's intrusion detection > implementation stored the incoming packets in a hash > table. The key to the hash table was quite > large -- inbound IP address, outbound IP address, inbound port, and > outbound port. 12 bytes in IPv4, 36 bytes in IPv6. > I thought to my self, on a very large implementation (say > Google) the table could grow to a billion entries. Could a hash table > store this amount in memory? Could you allocate an array of half the > total memory? Could you allocate an array of a billion integers? Brian, > on his laptop, couldn't allocate a billion integers. But he could > allocate a billion characters (bytes). Since I thought both bytes and > integers were words, and since I thought memory stored words > like registers stored words, we had our discussion. > > The following failed to compile with an array-to-large error: > int main( void ) > { > int table[ 10 ]; > return 0; > } On a 32-bit machine, this will eat up most of the computer's address space, including *all* application space, and some kernel space (so you can expect things to segfault). On a 64-bit machine, it should work though. It in fact compiles perfectly on a 64-bit machine, so what your compiler's doing is detecting the OS and architecture's limits and warning you about them. Malloc should make this compile on a 32-bit machine, but the code will fail to run -- malloc should return an error. > The following succeeded to compile: > int main( void ) > { > char table[ 10 ]; > return 0; > } > This compiles but core dumps: > #include > int main( void ) > { > char table[ 10 ]; > printf( "Hello world!\n" ); > } Stuff gets pushed on the stack after the end of the array, approximately 1GB into memory. This means you have a stack that's 1GB long when you get inside printf, which is seriously much more than the OS is prepared to let you have for your stack. (This will overflow the stack whether your machine is 32-bits or 64-bits.) Try mallocing your array instead. (Oh and change that printf to be printf("%px Hello world\n",table); so you can see whether the allocation is working.) If you're looking at an IDS that big, you're going to need to find an appropriate caching data structure that can write the infrequently used parts out to disk. Or you're going to have to find some other way of minimizing the number of packets you're keeping track of at a time. ___ vox-tech mailing list vox-tech@lists.lugod.org http://lists.lugod.org/mailman/listinfo/vox-tech
Re: [vox-tech] Memory addressing?
On Tue, 2010-06-22 at 18:05 -0700, Bill Broadley wrote: > On 06/22/2010 09:46 AM, Chanoch (Ken) Bloom wrote: > > On Tue, Jun 22, 2010 at 09:11:44AM -0700, Brian Lavender wrote: > >> Can someone confirm what is correct? > >> > >> Tim and I were discussing memory addressing at Crepeville last night > >> and we had a disagreement about how memory is addressable. I say that > >> on today's common intel i386 32 bit architecture (in case you are one of > >> those souls who builds your hardware from scratch), that memory is byte > >> (octet) addressable. You can load a byte from memory into the lower 8 > >> bits of a register. Tim says that memory is only addressable on 32 bit > >> word boundaries. > > Heh, there are many tiny details that depending on your perspective could > justify either answer. > > Ken is of course correct that a pointer can address a byte. There are > performance penalties for unaligned byte accesses, but they work. > > On the other hand from the offchip view you can't load less than a cache line > from main memory and that's usually 64-128 bytes. Additionally those requests > have to be aligned on cache line boundaries. > > Kinda related to the question "is an i386 a 32 bit architecture?" Well it can > address 40 bits (with PAE), read/write 64 bit values from main memory > (doubles), and allow for machines with over 4GB ram. Granted PAE is gross, > the segment register bites again. > > > (I don't know of any modern architectures that aren't > > byte addressable, though MIPS takes some shortcuts in its various jump > > instructions because the instructions have to be word-aligned.) > > Well, depending on your definition of modern ;-). The alpha architecture is > IMO rather modern... even if dead. > > [SNIP the history of the Alpha platform] > > To allow for increased CPU performance they simplified the CPU so they could > not restrict the clock rate and spend the transistors where they would help > performance the most. The original didn't have byte addressing. To write a > byte on the original alpha you had to read 64 bits, modify the byte you wanted > and write 64 bits. > > BTW, I believe this wasn't that a pointer addresses 2^64 64 bit quantities, > but that the bottom 6 bits of a pointer were mandated by the architecture to > be zero. Alpha has the BWX instructions designed to make it easy to load 8-bit and 16-bit values into memory, but those instructions came to the architecture 4 years into the product line. MIPS (the first assembly language a lot of CS students learn) likewise has special instructions for reading characters, halfwords, and unaligned parts of words into memory as well. It also requires the lowest bits of a pointer to be 0 when you use the lw instruction, but can uses those for the (slower) instructions involved in smaller reads. All of these fall into the category of what are called alignment requirements -- you need to use special, slower instructions to read unaligned data. The Alpha sounds unique in omitting those features for the first 4 years, but even they had to bow to C's char in the end. --Ken ___ vox-tech mailing list vox-tech@lists.lugod.org http://lists.lugod.org/mailman/listinfo/vox-tech
Re: [vox-tech] Memory addressing?
On 06/22/2010 09:46 AM, Chanoch (Ken) Bloom wrote: > On Tue, Jun 22, 2010 at 09:11:44AM -0700, Brian Lavender wrote: >> Can someone confirm what is correct? >> >> Tim and I were discussing memory addressing at Crepeville last night >> and we had a disagreement about how memory is addressable. I say that >> on today's common intel i386 32 bit architecture (in case you are one of >> those souls who builds your hardware from scratch), that memory is byte >> (octet) addressable. You can load a byte from memory into the lower 8 >> bits of a register. Tim says that memory is only addressable on 32 bit >> word boundaries. Heh, there are many tiny details that depending on your perspective could justify either answer. Ken is of course correct that a pointer can address a byte. There are performance penalties for unaligned byte accesses, but they work. On the other hand from the offchip view you can't load less than a cache line from main memory and that's usually 64-128 bytes. Additionally those requests have to be aligned on cache line boundaries. Kinda related to the question "is an i386 a 32 bit architecture?" Well it can address 40 bits (with PAE), read/write 64 bit values from main memory (doubles), and allow for machines with over 4GB ram. Granted PAE is gross, the segment register bites again. > (I don't know of any modern architectures that aren't > byte addressable, though MIPS takes some shortcuts in its various jump > instructions because the instructions have to be word-aligned.) Well, depending on your definition of modern ;-). The alpha architecture is IMO rather modern... even if dead. It had 64 bits from the start, vector/SIMD operations, symmetric multithreading (before intel), out of order operation, multiple functional units, on chip memory controllers, multiple memory channels, etc. It didn't however have x86 compatibility, that and marketing killed it. Amusingly after many $B and many years of process improvements the Itanium never really bested the alpha even with a much smaller process, much better memory bandwidth, and embarrassing more transistors. Sad, seemed like the linux boom was a bit late for the alpha. So Dec died, sold alpha to compaq which died and sold alpha to HP which killed of pa-risc, alpha, and for the most part itanium. Amusingly arm is now looking at the same space (servers and hpc). Amusingly intel is trying to get some folks to migrate their very expensive mainframe based services to itanium, but it seems like the market is mostly interested in either staying where they are, of if they do bother to make the huge outlay to port their software once they want to use a standard commodity platform instead of a specialty CPU solution looking for a problem. The alpha originally designed to deliver a factor of 1000 in performance over it's lifetime (10 in clock, 10 in ipc, and 10 in parallel cores) with all of the latest greatest architectural features. They basically completely designed a CPU from scratch (something that hasn't been done in awhile) designed for general purpose use. To allow for increased CPU performance they simplified the CPU so they could not restrict the clock rate and spend the transistors where they would help performance the most. The original didn't have byte addressing. To write a byte on the original alpha you had to read 64 bits, modify the byte you wanted and write 64 bits. BTW, I believe this wasn't that a pointer addresses 2^64 64 bit quantities, but that the bottom 6 bits of a pointer were mandated by the architecture to be zero. [ Snipped Ken's proof of i386 byte addressability ] ___ vox-tech mailing list vox-tech@lists.lugod.org http://lists.lugod.org/mailman/listinfo/vox-tech
Re: [vox-tech] Memory addressing?
On Tue, Jun 22, 2010 at 09:11:44AM -0700, Brian Lavender wrote: > Can someone confirm what is correct? > > Tim and I were discussing memory addressing at Crepeville last night > and we had a disagreement about how memory is addressable. I say that > on today's common intel i386 32 bit architecture (in case you are one of > those souls who builds your hardware from scratch), that memory is byte > (octet) addressable. You can load a byte from memory into the lower 8 > bits of a register. Tim says that memory is only addressable on 32 bit > word boundaries. > > Say you look at memory in bits and then on the left is the memory > address. > > I say that memory is normally byte addressable and the addressing > corresponds to byte (octet) boundaries. > > Address bits > 00 7 15 23 31 > 30 7 15 23 31 > 70 7 15 23 31 > 11 0 7 15 23 31 > 15 0 7 15 23 31 > > Tim says that memory is only 32 bit word addressable > > Address bits > 00 7 15 23 31 > 10 7 15 23 31 > 20 7 15 23 31 > 30 7 15 23 31 > 40 7 15 23 31 Consider the following program. The fact that you can get pointers to arbitrary characers should be enough proof that the architecture is byte addressable. (I don't know of any modern architectures that aren't byte addressable, though MIPS takes some shortcuts in its various jump instructions because the instructions have to be word-aligned.) Now what can happen is that the computer crashes when dereferencing the integer pointer, complaining that the access is unaligned -- the addresses would still refer to individual bytes, but the computer would crash if the two least significant bits were nonzero. MIPS behaves this way, but since the following program works, you can see that Intel doesn't even require word-aligned integer accesses. (word-aligned integer accesses may be faster, but they don't require special instructions to perform them). And just to prove that the compiler isn't performing any funny business, I included an assembly dump from this program. #include int main(int argc, char** argv){ char* mystring="ABCDEFGHIJKLMNOPQRSTUVWXYZ"; int theint= *((int*)(mystring+1)); printf("%x\n",theint); } $ ./test | xxd -r -p EDCB AMD64 assembly language: .file "test.c" .section.rodata .LC0: .string "ABCDEFGHIJKLMNOPQRSTUVWXYZ" .LC1: .string "%x\n" .text .globl main .type main, @function main: .LFB0: .cfi_startproc pushq %rbp .cfi_def_cfa_offset 16 movq%rsp, %rbp .cfi_offset 6, -16 .cfi_def_cfa_register 6 subq$32, %rsp movl%edi, -20(%rbp) movq%rsi, -32(%rbp) movq$.LC0, -16(%rbp) movq-16(%rbp), %rax addq$1, %rax ; we're adding 1 to an address ; for one character movl(%rax), %eax ; and here we're dereferencing it ; successfully movl%eax, -4(%rbp) movl$.LC1, %eax movl-4(%rbp), %edx movl%edx, %esi movq%rax, %rdi movl$0, %eax callprintf leave ret .cfi_endproc .LFE0: .size main, .-main .ident "GCC: (Debian 4.4.4-5) 4.4.4" .section.note.GNU-stack,"",@progbits Intel 32-bit x86 assembly language: .file "test.c" .section.rodata .LC0: .string "ABCDEFGHIJKLMNOPQRSTUVWXYZ" .LC1: .string "%x\n" .text .globl main .type main, @function main: pushl %ebp movl%esp, %ebp andl$-16, %esp subl$32, %esp movl$.LC0, 24(%esp) movl24(%esp), %eax addl$1, %eax ; we're adding 1 to an address ; for one character movl(%eax), %eax ; and we're loading from that ; address without a special instruction movl%eax, 28(%esp) movl$.LC1, %eax movl28(%esp), %edx movl%edx, 4(%esp) movl%eax, (%esp) callprintf leave ret .size main, .-main .ident "GCC: (Debian 4.4.4-5) 4.4.4" .section.note.GNU-stack,"",@progbits -- Chanoch (Ken) Bloom. PhD candidate. Linguistic Cognition Laboratory. Department of Computer Science. Illinois Institute of Technology. http://www.iit.edu/~kbloom1/ ___ vox-tech mailing list vox-tech@lists.lugod.org http://lists.lugod.org/mailman/listinfo/vox-tech