jail network problems
Hi, I am running 4.1-STABLE, updated ca. 22.10.00. I set up a jail and started it, but I have no network at all. I made an alias for the IP address used, and I ran /etc/rc with the following output:

Skipping disk checks ...
dmesg: short read
dmesg: kvm_read:
Doing initial network setup:.
Additional routing options: tcp extensions=NO
sysctl: net.inet.tcp.rfc1323: Operation not permitted
 TCP keepalive=YES
sysctl: net.inet.tcp.always_keepalive: Operation not permitted
.
routing daemons:.
additional daemons: syslogd
syslogd: child pid 25355 exited with return code 1
.
Doing additional network setup:.
Starting final network daemons:.
setting ELF ldconfig path: /usr/lib /usr/lib/compat
setting a.out ldconfig path: /usr/lib/aout /usr/lib/compat/aout
starting standard daemons: inetd cron sendmail.
Initial rc.i386 initialization:.
rc.i386 configuring syscons: blank_time
/etc/rc.i386: cannot open /dev/ttyv0: no such file
.
additional ABI support:.
Local package initialization:.
Additional TCP options:.
Sun Oct 29 14:05:55 GMT 2000

resulting in the following ps aux:

USER    PID %CPU %MEM  VSZ  RSS  TT  STAT STARTED    TIME COMMAND
root  24510  0.0  1.5 1332  908 #C1  SJ   1:54PM  0:00.28 /bin/tcsh
root  25226  0.0  1.0  912  608  ??  IsJ  2:02PM  0:00.02 syslogd -s
root  25245  0.0  1.2 1032  760  ??  IsJ  2:02PM  0:00.04 inetd -wW
root  25247  0.0  1.1  952  696  ??  IsJ  2:02PM  0:00.02 cron
root  25250  0.0  2.2 1536 1356  ??  IsJ  2:02PM  0:00.02 sendmail: accepting
root  25280  0.0  0.4  392  240 #C1  R+J  2:02PM  0:00.00 ps aux

ping doesn't work from within the jail (I assume this is normal).

jail# telnet localhost      - doesn't work
jail# telnet some address   - doesn't work
some host# telnet jail      - doesn't work
some host# ping jail        - doesn't work (should it?)

Any hint?

Stefan Aeschbacher

To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-hackers" in the body of the message
Re: Filesystem holes
Luigi Rizzo wrote:
: Hi,
: how about using an indirect table of 64M 32-bit pointers into the
: actual blocks being used? For insertions you just allocate a new
: fixed-size block from the file.

Isn't this roughly what dbm does with the hash key method?

--
"Where am I, and what am I doing in this handbasket?"
Wes Peters                              Softweyr LLC
[EMAIL PROTECTED]                       http://softweyr.com/
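For what it's worth, the indirect-table idea can be sketched in a few lines of C. This is only an illustration; the names, the ~1.5KB block size, and the convention that block number 0 means "unallocated" are my assumptions, not anything from the thread:

```c
/*
 * Sketch of an indirect table: a 64M-entry index file of 32-bit block
 * numbers pointing into a data file of fixed-size blocks.  Layout and
 * names are hypothetical.
 */
#include <stdint.h>

#define NKEYS   (64L * 1024 * 1024)   /* 64M possible hash values  */
#define BLKSIZE 1536L                 /* fixed record size, ~1.5KB */

/* Byte offset of a key's 32-bit slot within the index file. */
long index_slot_offset(uint32_t key)
{
    return (long)key * (long)sizeof(uint32_t);
}

/* Byte offset of a record in the data file.  Block numbers start at 1
 * so that 0 can mean "no record allocated yet". */
long data_block_offset(uint32_t blkno)
{
    return ((long)blkno - 1) * BLKSIZE;
}

/* For an insertion, the new fixed-size block is simply appended, so
 * its number falls out of the current data file size. */
uint32_t next_block_number(long data_file_size)
{
    return (uint32_t)(data_file_size / BLKSIZE) + 1;
}
```

The index itself is dense: 64M x 4 bytes = 256MB of file, which is the trade-off this scheme makes for O(1) lookups with no holes required.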
Re: Need dotfiles for various L10N groups
** Original Subject: Re: Need dotfiles for various L10N groups
** Original Sender: Olexander Kunytsa [EMAIL PROTECTED]
** Original Date: Sat, 28 Oct 2000 19:29:01 -0400
** Original Message follows...

On Sun, 29 Oct 2000, Nik Clayton wrote:
: I am trying to collect various dotfiles (.cshrc, .profile, .Xresources,
: .Xdefaults, ~/.*) for various language localization groups. As I
: discussed with Nik Clayton, I hope to create
: /usr/share/skel/{chinese, japanese, french, russian, korean, vietnamese *}
:
: Shouldn't these be /usr/share/skel/{ja_JP.eucJP, zh_TW.Big5, ...} to
: cater for the same language/multiple encodings problem?

It would be better to use them in such a way, I think. I myself can make /usr/share/skel/uk_UA.KOI8-U/* for the Ukrainian language.

** - End Original Message ---

My suspicion is that all those configuration files use dot as the leading character, and that POSIX didn't change this? This is from old memory, so someone double-checking would help. As far as your proposal goes, it doesn't sound that bad, as long as it is consistent and someone supports it (well, lots of people). I automatically know that when I see .xxx on *nix it is a config file. The problem never seems to be with standards, but with getting people to agree on them, and they provide a wide consistency. Please correct me if I am wrong.

Thanks and have fun,
Steve
Signal 11 when compiling app with LinuxThreads
I think I've configured and compiled the application (latest ntop) correctly, but it crashes hard in the following way (see bottom). There's no problem when compiled without thread support. Another issue is whether I should use linuxthreads or not... What's the difference between lthread and pthread, in short? Anyone in for suggestions? What lib have I missed?

Cheers,
Nicolai Petri

--
This GDB was configured as "i386-unknown-freebsd"...
Core was generated by `intop'.
Program terminated with signal 11, Segmentation fault.
Reading symbols from /usr/local/lib/libntop-1.3.so.2...done.
Reading symbols from /usr/lib/libpcap.so.2...done.
Reading symbols from /usr/local/lib/libgdbm.so.2...done.
Reading symbols from /usr/local/lib/liblthread.so.2...done.
Reading symbols from /usr/lib/libcrypt.so.2...done.
Reading symbols from /usr/lib/libm.so.2...done.
Reading symbols from /usr/lib/libssl.so.1...done.
Reading symbols from /usr/lib/libcrypto.so.1...done.
Reading symbols from /usr/lib/libreadline.so.4...done.
Reading symbols from /usr/lib/libncurses.so.5...done.
Reading symbols from /usr/lib/libc.so.4...done.
Reading symbols from /usr/libexec/ld-elf.so.1...done.
#0 0x28639668 in fstat () from /usr/lib/libc.so.4
(gdb) bt
#0 0x28639668 in fstat () from /usr/lib/libc.so.4
#1 0x28673d43 in __swhatbuf () from /usr/lib/libc.so.4
#2 0x28673c9b in __smakebuf () from /usr/lib/libc.so.4
#3 0x286643d7 in __srefill () from /usr/lib/libc.so.4
#4 0x2865cd88 in fgets () from /usr/lib/libc.so.4
#5 0x286549e6 in gethostent () from /usr/lib/libc.so.4
#6 0x28654c3d in _gethostbyhtaddr () from /usr/lib/libc.so.4
#7 0x286545d8 in gethostbyaddr () from /usr/lib/libc.so.4
#8 0x2807fa9e in resolveAddress (symAddr=0x8204044 "*192.168.50.10*", hostAddr=0x81fc100, keepAddressNumeric=0) at address.c:125
#9 0x2807ffae in dequeueAddress () at address.c:263
#10 0x284769a1 in __pthread_manager_event () from /usr/local/lib/liblthread.so.2
#11 0x28477621 in _clone () from /usr/local/lib/liblthread.so.2
#12 0x10692c8 in ?? ()
#13 0x0 in ?? ()
(gdb)
Re: Filesystem holes
: Hi all...
:
: One of the tasks that I have undertaken lately is to improve the
: efficiency of a couple of storage facilities we use internally, here.
: Basically, they are moderate-size tables currently implemented in SQL,
: which is OK in terms of performance, but the hash function is breaking
: down and the storage is rather inefficient for our table of about
: 2,850,000 members (~2.1 GB total storage). There are 64M possible hash
: values in our current implementation, and our record size is variable,
: but could be safely fixed at about 1.5KB... So, total storage if all
: values were used would be about 96GB. (See where I'm going with this?)
:
: One of the options I am considering is actually using address
: calculation, and taking advantage of filesystem holes, to keep storage
: down to what is actually being used, while providing instant lookups.
:
: The single file would be about 96G addressable bytes... But the actual
: block count would be much lower. I suppose I will have to create a
: series of these files and divide the problem into 4GB chunks, but one
: lookup/store will still be done with one calculation and one disk seek
: (just more filehandles open).
:
: Deletes seem problematic. My question is, does the operating system
: provide any way to free blocks used in the middle of a file?
:
: Must I periodically re-create these files (long and slow process, but
: not so bad if done infrequently) to reclaim space, or is there a way
: to free arbitrary blocks in a file in userland (WITHOUT trashing the
: filesystem? :-)
:
: - Ryan
:
: --
: Ryan Thompson [EMAIL PROTECTED]
: Network Administrator, Accounts
: Phone: +1 (306) 664-1161
:
: SaskNow Technologies http://www.sasknow.com
: #106-380 3120 8th St E Saskatoon, SK S7H 0W2

I would strongly recommend using a B+Tree instead of a hash table. With a hash table, slightly different lookups will seek all over the place. With a B+Tree, lookups will stay more localized.

For example, if you insert a (nearly) sorted dictionary of words into an SQL table with a B+Tree, the memory working set required to insert efficiently stays constant whether you are inserting a thousand, a million, or a billion records. That is, the memory requirement is effectively O(log N) for a disk requirement of O(N). With a hash table, the memory working set required to insert efficiently is approximately O(N) for a disk requirement of O(N)... much, much worse.

A B+Tree will also scale with the size of the dataset being managed, so you do not have to preallocate or prereserve file space.

We are using an in-house SQL database for our product (which I can't really talk about right now) and, using B+Trees, I can consistently insert 300 records/sec on a cheap desktop PC (which I use for testing) with 64MB of ram (remember, an insert requires an internal select to check for key conflicts), even when the test database grows to several gigabytes.

In any case, a B+Tree is the way to go. Hash tables are useful only in very, very, very specialized circumstances. In all other circumstances they are no better than a pain in the rear.

-Matt
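The locality point ("slightly different lookups will seek all over the place") can be illustrated with a toy calculation. Everything here is made up for illustration: the stand-in hash, the 1536-byte record, and the 8K page size. It just counts how often consecutive key accesses cross a page boundary under ordered vs. hashed placement:

```c
/* Toy illustration of access locality: records placed in key order
 * (roughly how B+Tree leaves end up on disk) vs. records placed by a
 * hash.  Counts page switches over n consecutive keys; on a cold
 * cache, a page switch is a seek.  All constants and the hash are
 * made up for the demo. */
#include <stdint.h>

#define RECSIZE  1536L    /* bytes per record */
#define PAGESIZE 8192L    /* filesystem block size */
#define NREC     100000L  /* records in the table */

uint32_t toy_hash(uint32_t x)          /* stand-in mixing function */
{
    x ^= x >> 16;
    x *= 0x45d9f3bU;
    x ^= x >> 16;
    return x;
}

/* Page switches seen while touching keys 0..n-1 in order. */
long page_switches(long n, int hashed)
{
    long switches = 0, prev_page = -1;
    for (long i = 0; i < n; i++) {
        long slot = hashed ? (long)(toy_hash((uint32_t)i) % NREC) : i;
        long page = slot * RECSIZE / PAGESIZE;
        if (page != prev_page)         /* ordered: mostly same page */
            switches++;
        prev_page = page;
    }
    return switches;
}
```

With ordered placement, 1000 consecutive keys walk through a couple of hundred pages in sequence; with hashed placement, nearly every access lands on a different page.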
ulpt is broken.
I've tried to use ulpt instead of lpt and got a kernel panic. From the kernel stack trace I found that it happens due to a wrong pointer to the dev structure being passed to usbd_do_request_flags. I've made a PR for the problem (though its number isn't known to me yet) but would like to duplicate it here. In fact, the stack trace must contain all the needed information:

IdlePTD 4214784
initial pcb at 3681c0
panicstr: from debugger
panic messages:
---
Fatal trap 12: page fault while in kernel mode
fault virtual address = 0x20726568
fault code            = supervisor read, page not present
instruction pointer   = 0x8:0xc01811a7
stack pointer         = 0x10:0xc842ccb4
frame pointer         = 0x10:0xc842
code segment          = base 0x0, limit 0xf, type 0x1b
                      = DPL 0, pres 1, def32 1, gran 1
processor eflags      = interrupt enabled, resume, IOPL = 0
current process       = 2858 (lpd)
panic: from debugger
panic: from debugger
Uptime: 36m38s
dumping to dev #da/0x20001, offset 262144
dump 128 127 126 125 124 123 122 121 120 119 118 117 116 115 114 113 112 111 110 109 108 107 106 105 104 103 102 101 100 99 98 97 96 95 94 93 92 91 90 89 88 87 86 85 84 83 82 81 80 79 78 77 76 75 74 73 72 71 70 69 68 67 66 65 64 63 62 61 60 59 58 57 56 55 54 53 52 51 50 49 48 47 46 45 44 43 42 41 40 39 38 37 36 35 34 33 32 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1
---
#0 dumpsys () at /usr/src/sys/kern/kern_shutdown.c:475
475 if (dumping++) {
#0 dumpsys () at /usr/src/sys/kern/kern_shutdown.c:475
#1 0xc019e6ac in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:318
#2 0xc019eaad in panic (fmt=0xc02c5894 "from debugger") at /usr/src/sys/kern/kern_shutdown.c:566
#3 0xc012f945 in db_panic (addr=-1072164441, have_addr=0, count=-1, modif=0xc842cb2c "") at /usr/src/sys/ddb/db_command.c:433
#4 0xc012f8e3 in db_command (last_cmdp=0xc031cbdc, cmd_table=0xc031ca3c, aux_cmd_tablep=0xc03616ac) at /usr/src/sys/ddb/db_command.c:333
#5 0xc012f9aa in db_command_loop () at /usr/src/sys/ddb/db_command.c:455
#6 0xc0131bf7 in db_trap (type=12, code=0) at /usr/src/sys/ddb/db_trap.c:71
#7 0xc029c776 in kdb_trap (type=12, code=0, regs=0xc842cc74) at /usr/src/sys/i386/i386/db_interface.c:163
#8 0xc02a in trap_fatal (frame=0xc842cc74, eva=544367976) at /usr/src/sys/i386/i386/trap.c:934
#9 0xc02a8601 in trap_pfault (frame=0xc842cc74, usermode=0, eva=544367976) at /usr/src/sys/i386/i386/trap.c:853
#10 0xc02a814b in trap (frame={tf_fs = -1071972336, tf_es = 16, tf_ds = 16, tf_edi = 0, tf_esi = 375, tf_ebp = -935146292, tf_isp = -935146336, tf_ebx = -935146236, tf_edx = 544367976, tf_ecx = 544367976, tf_eax = -935146237, tf_trapno = 12, tf_err = 0, tf_eip = -1072164441, tf_cs = 8, tf_eflags = 66118, tf_esp = -1055754752, tf_ss = 375}) at /usr/src/sys/i386/i386/trap.c:436
#11 0xc01811a7 in usbd_do_request_flags (dev=0x20726568, req=0xc842cd04, data=0xc842cd03, flags=0, actlen=0x0) at /usr/src/sys/dev/usb/usbdi.c:938
#12 0xc018117c in usbd_do_request (dev=0x20726568, req=0xc842cd04, data=0xc842cd03) at /usr/src/sys/dev/usb/usbdi.c:919
#13 0xc017d540 in ulpt_status (sc=0xc1127600) at /usr/src/sys/dev/usb/ulpt.c:357
#14 0xc017d6e1 in ulptopen (dev=0xc103fc00, flag=2, mode=8192, p=0xc7fff100) at /usr/src/sys/dev/usb/ulpt.c:418
#15 0xc01e358a in spec_open (ap=0xc842cda0) at /usr/src/sys/miscfs/specfs/spec_vnops.c:200
#16 0xc01e3439 in spec_vnoperate (ap=0xc842cda0) at /usr/src/sys/miscfs/specfs/spec_vnops.c:117
#17 0xc0279071 in ufs_vnoperatespec (ap=0xc842cda0) at /usr/src/sys/ufs/ufs/ufs_vnops.c:2312
#18 0xc01dd467 in vn_open (ndp=0xc842ce74, flagp=0xc842ce40, cmode=48) at vnode_if.h:189
#19 0xc01d8ead in open (p=0xc7fff100, uap=0xc842cf80) at /usr/src/sys/kern/vfs_syscalls.c:999
#20 0xc02a8cf8 in syscall2 (frame={tf_fs = 47, tf_es = 47, tf_ds = 47, tf_edi = 134578534, tf_esi = -1077937784, tf_ebp = -1077938048, tf_isp = -935145516, tf_ebx = 2, tf_edx = -1077938104, tf_ecx = 2, tf_eax = 5, tf_trapno = 7, tf_err = 2, tf_eip = 269347636, tf_cs = 31, tf_eflags = 582, tf_esp = -1077938092, tf_ss = 47}) at /usr/src/sys/i386/i386/trap.c:1139
#21 0xc029d0df in Xint0x80_syscall ()
#22 0x804dbe9 in ?? ()
#23 0x804b5cd in ?? ()
#24 0x804db8e in ?? ()
#25 0x804acd5 in ?? ()
#26 0x804ab37 in ?? ()
#27 0x804a1e1 in ?? ()

--
/Voland
Vadim Belman
E-mail: [EMAIL PROTECTED]
PPP patch (CHAP81 + MPPE)
Hi!

Here is a patch attached (made from current from 2813). The patch makes ppp able to respond to and initiate MS-CHAP-v2 authentication, and it supports the MPPE encryption protocol. With all these, ppp can participate in an MS VPN. Please look at this and tell me what you think.

Bye!

ppp-mppe-patch.gz
disklabel madness, make it boot...
It should be easy. I'm trying to get a 24MB CF card to be a bootable drive under 4.0. I have it wired on the primary IDE as a slave drive, it is therefore ad1, and the following almost works:

fdisk -I ad1                                # take the whole MSDOS partition table
dd if=/dev/zero of=/dev/rad1 bs=1k count=1  # wipe the existing disk label
disklabel -rwB ad1 auto                     # write a disk label
newfs /dev/rad1c                            # create new file system

...when accompanied by copying over the kernel and other such things (hints taken from the Handbook section on making a rescue floppy for backups). The boot blocks run and complain because there's no kernel on ad(0,a), which is fair enough since it's on ad(0,c). So, giving it the hint, it boots the kernel, which then complains about failing to mount /sbin/init (which is also on 0,c).

Anyway, it became clear to me that 'c' is the 'complete' unix partition, and what I needed was an 'a' (or root) partition, and that such a thing is set using disklabel (correct?). Anyway, to get a first shot I used /stand/sysinstall and wrote the disklabel, which appeared to then bin out my real drive (which fixed itself on reboot), so this is a significant bug in itself. Anyway, I then got the disklabel, which is currently this:

# /dev/rad1c:
type: ESDI
disk: ad1s1
label:
flags:
bytes/sector: 512
sectors/track: 32
tracks/cylinder: 4
sectors/cylinder: 128
cylinders: 368
sectors/unit: 47200
rpm: 3600
interleave: 1
trackskew: 0
cylinderskew: 0
headswitch: 0           # milliseconds
track-to-track seek: 0  # milliseconds
drivedata: 0

8 partitions:
#        size   offset    fstype   [fsize bsize bps/cpg]
  a:    47200        0    4.2BSD     1024  8192    16    # (Cyl. 0 - 368*)
  c:    47200        0    unused        0     0          # (Cyl. 0 - 368*)

(sorry for the bloat), so we can see the whole disk is turned over to an 'a' partition.

I piped this to disk (file, disklabel.24meg) and modified my script to:

fdisk -I ad1
dd if=/dev/zero of=/dev/ad1 bs=1k count=1
disklabel -R -B ad1 disklabel.24meg
newfs /dev/rad1a    # create new file system

And while I can mount the filesystem with mount /dev/ad1a, if I now set this to be the master drive (and disconnect the other one), I get a 'Missing Operating System' from the BIOS - i.e., where are the boot blocks, then? It appears the -B in disklabel isn't working. Stunts I've tried include setting the disklabel to say 'disk: ad0s1' (for when it restarts being the master on the channel), doing a separate 'disklabel -B ad1' to force the boot blocks on, and doing the -B within fdisk (i.e. 'fdisk -IB ad1'). The usual combinations of sleep, coffee and sacrifices to the gods also seem to do nothing. Help, help, help. I have no idea what I'm doing wrong here, and any pointers would be much appreciated.

As an aside, and a point of view that will no doubt not prove very popular, is this: I'm not a great hacker, but I'm not crap either, and this is driving me up the wall. We are talking DCOM levels of frustration here, and I can't help feeling that the whole thing is just too damn complicated... Anyway, advice _please_, and flames if you really must.

Dave :(

BTW, the -B option in fdisk is ignored if -f is specified, yet there is no command in the configuration file to set boot blocks; what gives? See fdisk(8).
Re: Filesystem holes
: : storage is rather inefficient for our table of about 2,850,000 members
: : (~2.1 GB total storage). There are 64M possible hash values in our
: : current implementation, and our record size is variable, but could be
: : safely fixed at about 1.5KB... So, total storage if all values were used
: : would be about 96GB. (See where I'm going with this?)
:...
:
:Remember, though, I'm not proposing a hash table. I have been lucky
:enough to design a "hash" function that will crunch down all possible
:values into about 64M unique keys--so, no collisions. I propose to use
:this function with address calculation. So, O(1) everything, for single
:random lookups, which is (virtually) all this data structure must deal
:with. Sorting isn't necessary, and that could be accomplished in O(n)
:anyway.

Are you still talking about creating a 96GB file to manage 2.1GB worth of data? I gotta say, that sounds kinda nuts to me! Cpu cycles are cheap, I/O and memory is not.

Take the BTree example again -- if you think about it, all internal nodes in the tree will fit into memory trivially. It is only the leaf nodes that do not. So in regards to actual disk accesses you still wind up only doing *ONE* for the btree, even if it winds up being four or five levels deep.

The result is that you will be able to create an index to your data, which is only 2.8 million records, in around 32 bytes per record, or 89 Megabytes. Not 96GB. Since you can cache all the internal nodes of the btree, you are done. The machine will do a much better job caching an 89MB index than a 96GB index.

Do not make the mistake of equating cpu to I/O. The cpu required to iterate through 4 levels in the btree (assuming 64 entries per node, it only takes 4 to cover 16 million records) is *ZERO* ... as in, probably less than a microsecond, whereas accessing the disk is going to be milliseconds.

: A B+Tree will also scale with the size of the dataset being managed,
: so you do not have to preallocate or prereserve file space.

:So will address calculation + filesystem holes, and sufficiently large
:filesizes :-)

Uh, not with a 50:1 file size factor it won't.

-Matt
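The sparse-file behaviour being argued over in this thread is easy to see first-hand. A minimal sketch, with a made-up filename (st_blocks is in 512-byte units; exact counts depend on the filesystem and newfs parameters):

```c
/* Seek past EOF, write one byte, and compare the logical size
 * (st_size) with the space actually allocated (st_blocks), much as
 * ls -ls would.  The filename used here is made up for the demo. */
#include <stdio.h>
#include <sys/stat.h>

/* Write one byte at `offset`, leaving a hole before it. */
int make_sparse(const char *path, long offset)
{
    FILE *f = fopen(path, "w");
    if (f == NULL || fseek(f, offset, SEEK_SET) != 0)
        return -1;
    fputc('\n', f);
    return fclose(f);
}

/* Logical size in bytes, as reported by stat(2). */
long long logical_size(const char *path)
{
    struct stat st;
    return stat(path, &st) == 0 ? (long long)st.st_size : -1;
}

/* Allocated space in 512-byte blocks. */
long long allocated_blocks(const char *path)
{
    struct stat st;
    return stat(path, &st) == 0 ? (long long)st.st_blocks : -1;
}
```

On a filesystem that supports holes, writing one byte at a 1MB offset yields a ~1MB logical size but only a block or so actually allocated.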
Re: PPP patch (CHAP81 + MPPE)
Hi,

Thanks for the patches. I've committed the changes, although I'm having problems with MPPE. I suspect the problems are actually in the CCP stuff though - and I've suspected this for some time - something to do with running ppp back-to-back (and not over a tty). I'll look into this soon.

Anyway, thanks again for your work. It's really appreciated.

: Hi! Here is a patch attached (made from current from 2813). Patch makes
: ppp able to respond and initiate MS-CHAP-v2 authentication and supports
: MPPE encryption protocol. With all these ppp can participate in MS VPN.
: Please look at this, and tell me what you think? Bye!
[.]

--
Brian [EMAIL PROTECTED]        brian@[uk.]FreeBSD.org
http://www.Awfulhak.org        brian@[uk.]OpenBSD.org
Don't _EVER_ lose your sense of humour !
Re: Filesystem holes
Matt Dillon wrote to Ryan Thompson:
: : storage is rather inefficient for our table of about 2,850,000 members
: : (~2.1 GB total storage). There are 64M possible hash values in our
: : current implementation, and our record size is variable, but could be
: : safely fixed at about 1.5KB... So, total storage if all values were used
: : would be about 96GB. (See where I'm going with this?)
:...
:
:Remember, though, I'm not proposing a hash table. I have been lucky
:enough to design a "hash" function that will crunch down all possible
:values into about 64M unique keys--so, no collisions. I propose to use
:this function with address calculation. So, O(1) everything, for single
:random lookups, which is (virtually) all this data structure must deal
:with. Sorting isn't necessary, and that could be accomplished in O(n)
:anyway.

> Are you still talking about creating a 96GB file to manage 2.1GB worth
> of data? I gotta say, that sounds kinda nuts to me! Cpu cycles are
> cheap, I/O and memory is not.

Hi, Matt! Thanks for the replies. I'll try and keep you interested ;-)

Hmm... Perhaps you're still missing my original point? I'm talking about a file with 96GB in addressable bytes (well, probably a bunch of files, given logical filesize limitations, but let's say for simplicity's sake that we have a single file). Its actual "size" (in terms of allocated blocks) will be only a bit larger than 2.1GB. (Directly proportional to the used size of the dataset. Discrepancies only come into play when record size != block size, but that can be worked around somewhat.)

In other words, ls -ls will report the "size" as some ridiculously large number, but will show a much smaller block count. So, assuming four records are added to the file on block boundaries, the file will actually only use four blocks... nowhere near 96GB!

In the UNIX filesystem (ya, I know.. just pick one :-), size of file != space allocated for file. Thus, my original questions were centered around filesystem holes. I.e., non-allocated chunks in the middle of a file. When trying to READ from within a hole, the kernel just sends back a buffer of zeros... which is enough to show that the record is not initialized. Actually, something like an "exists" function for a record wouldn't touch the disk at all! When writing to a hole, the kernel simply allocates the necessary block(s). This is really fast for creation, too, as the empty set can be written to disk with touch(1), and it uses far less memory than virtual initialization or memory structures ;-)

As an example, try

fseek(f, 0x7fffffff, SEEK_SET);   /* i.e., just under 2GB */
fputc('\n', f);

and analyze the reported filesize, as well as the reported block count of the file. You should see a 2GB "size", and maybe 8K in allocated blocks (depends how you ran newfs ;-). This is behaviour that has been around since the original UFS.

> Take the BTree example again -- if you think about it, all internal
> nodes in the tree will fit into memory trivially. It is only the leaf
> nodes that do not. So in regards to actual disk accesses you still
> wind up only doing *ONE* for the btree, even if it winds up being four
> or five levels deep.

Right. And, actually, (without looking a bit more closely), I wouldn't be surprised if you could replace the four-line address calculation I have with your B+Tree structure and come up with the same result. Only difference would be a few hundred lines of code, much more testing, and quite a few megs of RAM... ;-)

What you referred to as "nuts", above, is just a logical way to provide a huge address space for a set of data, without actually allocating blocks in the filesystem for the entire address space until they are used.

> The result is that you will be able to create an index to your data,
> which is only 2.8 million records, in around 32 bytes per record or 89
> Megabytes. Not 96GB. Since you can cache all the internal nodes of the
> btree you are done. The machine will do a much better job caching a
> 89MB index than a 96GB index.
>
> Do not make the mistake of equating cpu to I/O. The cpu required to
> iterate through 4 levels in the btree (assuming 64 entries per node,
> it only takes 4 to cover 16 million records) is *ZERO* ... as in,
> probably less than a microsecond, whereas accessing the disk is going
> to be milliseconds.

CPU time for what I'm proposing is even closer to zero than for a tree... But, you're right, it doesn't make any real difference when compared to disk I/O... B-Trees are good for a lot of things. Address calculation can be really good, too, given a finite key set, and a way to represent that finite key set without wasting space.

> A B+Tree will also scale with the size of the dataset being managed,
> so you do not have to preallocate or prereserve file space.

So will address calculation + filesystem holes, and sufficiently large filesizes :-)
Re: Filesystem holes
Ryan Thompson wrote to Matt Dillon:
> Matt Dillon wrote to Ryan Thompson:
> : : storage is rather inefficient for our table of about 2,850,000 members
> : : (~2.1 GB total storage). There are 64M possible hash values in our
> : : current implementation, and our record size is variable, but could be
> : : safely fixed at about 1.5KB... So, total storage if all values were used
> : : would be about 96GB. (See where I'm going with this?)
> :...
> :
> :Remember, though, I'm not proposing a hash table. I have been lucky
> :enough to design a "hash" function that will crunch down all possible
> :values into about 64M unique keys--so, no collisions. I propose to use
> :this function with address calculation. So, O(1) everything, for single
> :random lookups, which is (virtually) all this data structure must deal
> :with. Sorting isn't necessary, and that could be accomplished in O(n)
> :anyway.
>
> > Are you still talking about creating a 96GB file to manage 2.1GB
> > worth of data? I gotta say, that sounds kinda nuts to me! Cpu cycles
> > are cheap, I/O and memory is not.
>
> Hi, Matt! Thanks for the replies. I'll try and keep you interested ;-)
>
> Hmm... Perhaps you're still missing my original point? I'm talking
> about a file with 96GB in addressable bytes (well, probably a bunch of
> files, given logical filesize limitations, but let's say for
> simplicity's sake that we have a single file). Its actual "size" (in
> terms of allocated blocks) will be only a bit larger than 2.1GB.
> (Directly proportional to the used size of the dataset. Discrepancies
> only come into play when record size != block size, but that can be
> worked around somewhat.)
>
> In other words, ls -ls will report the "size" as some ridiculously
> large number, but will show a much smaller block count. So, assuming
> four records are added to the file on block boundaries, the file will
> actually only use four blocks... nowhere near 96GB!
>
> In the UNIX filesystem (ya, I know.. just pick one :-), size of file !=
> space allocated for file. Thus, my original questions were centered
> around filesystem holes. I.e., non-allocated chunks in the middle of a
> file. When trying to READ from within a hole, the kernel just sends
> back a buffer of zeros... which is enough to show that the record is
> not initialized.

If you prefer to read system documentation instead of me, see lseek(2) :-)

     The lseek() function allows the file offset to be set beyond the end
     of the existing end-of-file of the file. If data is later written at
     this point, subsequent reads of the data in the gap return bytes of
     zeros (until data is actually written into the gap).

I suppose gap == hole. Silly semantics. :-)

> Actually, something like an "exists" function for a record wouldn't
> touch the disk at all! When writing to a hole, the kernel simply
> allocates the necessary block(s). This is really fast for creation,
> too, as the empty set can be written to disk with touch(1), and it
> uses far less memory than virtual initialization or memory structures
> ;-)
>
> As an example, try
>
> fseek(f, 0x7fffffff, SEEK_SET);   /* i.e., just under 2GB */
> fputc('\n', f);
>
> and analyze the reported filesize, as well as the reported block count
> of the file. You should see a 2GB "size", and maybe 8K in allocated
> blocks (depends how you ran newfs ;-). This is behaviour that has been
> around since the original UFS.
>
> > Take the BTree example again -- if you think about it, all internal
> > nodes in the tree will fit into memory trivially. It is only the
> > leaf nodes that do not. So in regards to actual disk accesses you
> > still wind up only doing *ONE* for the btree, even if it winds up
> > being four or five levels deep.
>
> Right. And, actually, (without looking a bit more closely), I wouldn't
> be surprised if you could replace the four-line address calculation I
> have with your B+Tree structure and come up with the same result. Only
> difference would be a few hundred lines of code, much more testing,
> and quite a few megs of RAM... ;-)
>
> What you referred to as "nuts", above, is just a logical way to
> provide a huge address space for a set of data, without actually
> allocating blocks in the filesystem for the entire address space until
> they are used.
>
> > The result is that you will be able to create an index to your data,
> > which is only 2.8 million records, in around 32 bytes per record or
> > 89 Megabytes. Not 96GB. Since you can cache all the internal nodes
> > of the btree you are done. The machine will do a much better job
> > caching a 89MB index than a 96GB index.
> >
> > Do not make the mistake of equating cpu to I/O. The cpu required to
> > iterate through 4 levels in the btree (assuming 64 entries per node,
> > it only takes 4 to cover 16 million records) is *ZERO* ... as in,
> > probably less than a microsecond, whereas accessing the disk is
> > going to be milliseconds.
>
> CPU time for what I'm proposing is even closer to zero than for a tree...
Re: Filesystem holes
:Hi, Matt! Thanks for the replies. I'll try and keep you interested ;-) : :Hmm... Perhaps you're still missing my original point? I'm talking about :a file with 96GB in addressable bytes (well, probably a bunch of files, :given logical filesize limitations, but let's say for simplicity's sake :that we have a single file). It's actual "size" (in terms of allocated :blocks) will be only a bit larger than 2.1GB. (Directly proportional to :the used size of the dataset. Discrepancies only come into play when :record size != block size, but that can be worked around somewhat) Ah, ok... that's better, though I think you will find yourself tuning it endlessly if the blocksize does not match-up. Remember, the filesystem allocates 8K per block, so if your record size is 1500 bytes and you have a random distribution, you are going to wind up eating 8-14GB. :In other words, ls -ls will report the "size" as some ridiculously large :number, will show a much smaller block count. So, assuming four records :are added to the file on block boundaries, the file will actually only use :four blocks... nowhere near 96GB! : :In the UNIX filesystem (ya, I know.. just pick one :-), size of file != :space allocated for file. Thus, my original questions were centered :around filesystem holes. I.e., non-allocated chunks in the middle of a :file. When trying to READ from within a hole, the kernel just sends back :a buffer of zeros... which is enough to show that the record is not :initialized. Actually, something like an "exists" function for a record :wouldn't touch the disk at all! When writing to a hole, the kernel simply :allocates the necessary block(s). This is really fast, too, for creation, :as the empty set can be written to disk with touch(1), and uses far less :memory than virtual initialization or memory structures ;-) : :As an example, try Ahh.. yes, I know. 
I'm a filesystem expert :-) However, that said, I will tell you quite frankly that virtually *nobody* depends on holes for efficient storage. There are only a few problems where it's practical: some forms of executables, and sparse matrices. That's pretty much it. :And analyze the reported filesize, as well as the reported block count of :the file. You should see a 2GB "size", and maybe 8K in allocated blocks :(depends how you ran newfs ;-). This is behaviour that has been around :since the original UFS. It's not a good idea to use a small block size in UFS. The minimum is pretty much 1K frag, 8K block. For a sparse file this means 8K is the minimum that will be allocated (frags are only allocated for the end of a file). If you think the btree idea is too complex to implement quickly, then try using a radix tree (essentially what you said in regards to breaking the hash you calculate into a node traversal). For your problem I would recommend 64 elements per node, which corresponds to 6 bits. 16 million records would fit in 6x4 = 24 bits, i.e. 4 levels. If you cache the first three levels in memory you eat 64^3 = 262144 x sizeof(record). Assuming a simple file offset for the record, you eat exactly 1MB of memory, 64MB for the file index, and your data can be placed sequentially in the file. :- Ryan -Matt To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-hackers" in the body of the message
DMA/66 not available for secondary IDE bus?
This may be intentional, but I've noticed that if you have a non-UDMA66 device on the primary IDE bus, FreeBSD 4.x does not allow you to have UDMA66 on the secondary bus. To wit, if you had a configuration similar to:

Primary bus:
  Older (UDMA33 or earlier) IDE boot disk
  CD-ROM
Secondary bus:
  15GB UDMA66 disk
  15GB UDMA66 disk

then even your 15GB drives cannot be UDMA66, and are limited to the UDMA33 (or less!) speed provided on the primary bus. This is particularly troubling if one were to use vinum to try to do a mirror, as you will not get optimum performance from the drives. One could argue that shuffling devices around and upgrading the boot drive would solve the problem on one of the 15GB disks, but it's still not really an optimal solution, as you'd have to move your CD-ROM onto the secondary bus, which would kill the other 15GB drive. Any possibility that we could 'unhook' the two PCI buses, and allow the secondary to go to UDMA66 if the first is down at UDMA33 or slower? -Brian
Logging users out
Hi, What is the best way to go about logging a user out given their tty? I had a couple of ideas: (a) open their tty and set the baud rate to 0 (b) use tcgetpgrp to get the process group id of the foreground process group on that tty, then with that info use libkvm to find the session leader's pid and send it a SIGHUP (c) use tcgetpgrp to get the process group id of the foreground process group on that tty, then use killpg to kill all the members of that process group. I would need to do this in a loop to catch background process groups that come to the foreground after a process group is killed. Whenever sending a signal I will have to verify the process exited, possibly sending TERM and KILL until it does. Problems: (a) doesn't seem to work... I'm guessing it only works on serial lines. (b) would be quite unportable, I would guess (although that's not a tragedy, I would like to avoid it if it isn't too hard). Also, if the session leader dies, is there any guarantee everything else in the session goes as well? Or would I have to go through and kill every process group in the session? (c) just seemed to be a bit of a hack (assuming you haven't just read b). Out of all of them it seems the best so far, however. Does anyone have any suggestions or comments? Is there a "proper" way to do this? Thanks, Andrew
Re: Filesystem holes
At 19:16 29/10/00 -0800, you wrote: Ahh.. yes, I know. I'm a filesystem expert :-) However, that said, I will tell you quite frankly that virtually *nobody* depends on holes for efficient storage. There are only a few problems where it's practical: some forms of executables, and sparse matrices. That's pretty much it. Presumably writing into these holes repeatedly is a none-too-efficient affair (requiring moving all the stuff on either side), or am I missing the point slightly? Dave :)
Re: Filesystem holes
Hmm... Perhaps you're still missing my original point? I'm talking about a file with 96GB in addressable bytes (well, probably a bunch of files, given logical filesize limitations, but let's say for simplicity's sake that we have a single file). Its actual "size" (in terms of allocated blocks) will be only a bit larger than 2.1GB. (Directly proportional to the used size of the dataset. Discrepancies only come into play when record size != block size, but that can be worked around somewhat) In other words, ls -ls will report the "size" as some ridiculously large number, but will show a much smaller block count. So, assuming four records are added to the file on block boundaries, the file will actually only use four blocks... nowhere near 96GB! In the UNIX filesystem (ya, I know.. just pick one :-), size of file != space allocated for file. Thus, my original questions were centered around filesystem holes. I.e., non-allocated chunks in the middle of a file. When trying to READ from within a hole, the kernel just sends back a buffer of zeros... which is enough to show that the record is not initialized. Actually, something like an "exists" function for a record wouldn't touch the disk at all! When writing to a hole, the kernel simply allocates the necessary block(s). This is really fast, too, for creation, as the empty set can be written to disk with touch(1), and uses far less memory than virtual initialization or memory structures ;-) What will happen if somebody (possibly you, as majordomo says) tries to make a backup of that file? Will the copy also be with holes, or would that file suddenly use all 96GB? It will at least do so, if one does cat file > file.bak Probably tar will do the same. I'd be afraid to create something which could easily blow up by having normal operations applied to it. Leif
Re: Filesystem holes
Matt Dillon wrote to Ryan Thompson: :Hi, Matt! Thanks for the replies. I'll try and keep you interested ;-) : :Hmm... Perhaps you're still missing my original point? I'm talking about :a file with 96GB in addressable bytes (well, probably a bunch of files, :given logical filesize limitations, but let's say for simplicity's sake :that we have a single file). Its actual "size" (in terms of allocated :blocks) will be only a bit larger than 2.1GB. (Directly proportional to :the used size of the dataset. Discrepancies only come into play when :record size != block size, but that can be worked around somewhat) Ah, ok... that's better, though I think you will find yourself tuning it endlessly if the blocksize does not match up. Remember, the filesystem allocates 8K per block, so if your record size is 1500 bytes and you have a random distribution, you are going to wind up eating 8-14GB. True.. As they say, "Disk is cheap", though... And "tuning" isn't so bad on a simple address calculation. I agree that if the record size doesn't closely match the blocksize, there will be a waste of space, but at least that waste is proportional to the dataset... Better than many data structures that I could call by name. It wouldn't be that difficult with many problem sets to optimize them to the point where their records align neatly with the blocksize. Granted, the waste space would hang you with some problem sets. I propose, though, that for some problems, it is easy enough to tune them to reduce (or, if you're really lucky, eliminate) waste space. Yes, this is a specialized solution. :In other words, ls -ls will report the "size" as some ridiculously large :number, but will show a much smaller block count. So, assuming four records :are added to the file on block boundaries, the file will actually only use :four blocks... nowhere near 96GB! : :In the UNIX filesystem (ya, I know.. just pick one :-), size of file != :space allocated for file.
Thus, my original questions were centered :around filesystem holes. I.e., non-allocated chunks in the middle of a :file. When trying to READ from within a hole, the kernel just sends back :a buffer of zeros... which is enough to show that the record is not :initialized. Actually, something like an "exists" function for a record :wouldn't touch the disk at all! When writing to a hole, the kernel simply :allocates the necessary block(s). This is really fast, too, for creation, :as the empty set can be written to disk with touch(1), and uses far less :memory than virtual initialization or memory structures ;-) : :As an example, try Ahh.. yes, I know. I'm a filesystem expert :-) I know, Matt, and that's why it's great to talk to you about this :-) I guess I needed an example to convey my point, though. I apologize if it came across as an insult to your intelligence; 'twas not intended that way in the least. However, that said, I will tell you quite frankly that virtually *nobody* depends on holes for efficient storage. Ahh. Ok. This is the kind of response I was looking for. Case studies :-) There are only a few problems where it's practical: some forms of executables, and sparse matrices. That's pretty much it. :And analyze the reported filesize, as well as the reported block count of :the file. You should see a 2GB "size", and maybe 8K in allocated blocks :(depends how you ran newfs ;-). This is behaviour that has been around :since the original UFS. It's not a good idea to use a small block size in UFS. The minimum is pretty much 1K frag, 8K block. For a sparse file this means 8K is the minimum that will be allocated (frags are only allocated for the end of a file). Right, which is why this strategy _does_ only work for a specific subset of problems, just as a directed graph works well for a path structure, but would really suck for (among other things) maintaining a sorted list of account numbers.
If you think the btree idea is too complex to implement quickly, then try using a radix tree (essentially what you said in regards to breaking the hash you calculate into a node traversal). For your problem I would recommend 64 elements per node, which corresponds to 6 bits. 16 million records would fit in 6x4 = 24 bits, i.e. 4 levels. If you cache the first three levels in memory you eat 64^3 = 262144 x sizeof(record). Assuming a simple file offset for the record, you eat exactly 1MB of memory, 64MB for the file index, and your data can be placed sequentially in the file. Right. One more way to skin the cat, and a pretty good one at that, if you have main memory to burn (I don't :-) Assuming, though, (yes, I'm going to be a bit stubborn, here... ;-) that I WANT to use this address calculation method in conjunction with the filesystem's ability to allocate blocks as they are used... I can do
Kernel tutorials
Hi, everybody, I am a newbie to FreeBSD, so I just wonder where I can find some kernel tutorials on the internet. Thanks for your help! Qian Feng
Re: Filesystem holes
: :What will happen, if somebody (possibly you, as majordomo says), tries to :make a backup of that file. That will depend on the backup program. dump/restore can handle holes just fine. tar can handle them in a 'fake' way, and you have to tell it. Programs like 'cp' cannot handle holes; they'll copy the zeros. :I'd be afraid to create something which could easily blow up by having :normal operations applied to it. : :Leif Yes, there's a high probability of that. It's one of the reasons why people typically use the feature, at least not for 'permanent' data sets. -Matt
Re: Filesystem holes
: :Presumably writing into these holes repeatedly is a none-too-efficient :affair (requiring moving all the stuff on either side), or am I missing the :point slightly? : :Dave :) It isn't quite that bad, but it isn't a walk in the park either. Since data blocks are referenced from a block table, inserting new blocks is cheap. However, this means that data may not be physically ordered in the file -- if your read crosses a block boundary it may require an extra seek. FreeBSD's FFS does have the reallocblks feature turned on and this will cause the filesystem to try to reorder nearby blocks, but it was never designed to handle sparse files so -Matt
Re: DMA/66 not available for secondary IDE bus?
It seems Brian McGovern wrote: This may be intentional, but I've noticed that if you have a non-UDMA66 device on the primary IDE bus, FreeBSD 4.x does not allow you to have UDMA66 on the secondary bus. Say what? There is NO such limitation in the ATA driver. What chipset are we talking about here? -Søren
Re: Kernel tutorials
* qianfeng [EMAIL PROTECTED] [001029 23:18] wrote: Hi,everybody, I am a newbie to the FreeBSD, so I just wander where can I find some kernel tutorials on the internet. Thanks for your help! Try browsing the mailing lists and searching the committers' websites (http://people.freebsd.org/~person) -- -Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]] "I have the heart of a child; I keep it in a jar on my desk."
Re: Filesystem holes
Leif Neland wrote to Ryan Thompson and Matt Dillon: What will happen, if somebody (possibly you, as majordomo says), tries to make a backup of that file. Make sure to use a program that can cope ;-) Will the copy also be with holes, or would that file suddenly use all 96GB? It will at least do so, if one does cat file > file.bak Probably tar will do the same. Actually, tar will handle holes elegantly (this I have tested), with the --sparse option. Older versions would not. cat and other similar "filters" are naive, as they simply block I/O. Backing up with tar and/or a filesystem dump would be just as effective as with any other storage strategy. cat file > file.bak on even a 2GB file is probably not something that would be popular, anyway. I'd be afraid to create something which could easily blow up by having normal operations applied to it. That's a valid concern. That's the biggest drawback I see to the overall strategy... But, specialized problems sometimes encourage specialized solutions. Leif -- Ryan Thompson [EMAIL PROTECTED] Network Administrator, Accounts Phone: +1 (306) 664-1161 SaskNow Technologies http://www.sasknow.com #106-380 3120 8th St E Saskatoon, SK S7H 0W2
Re: Filesystem holes
:Yes, there's a high probability of that. It's one of the reasons :why people typically use the feature, at least not for 'permanent' :data sets. Ahhh... of course I meant 'typically do not use the feature'. Heh. -Matt
Re: Really odd BTX halted problem booting FreeBSD on VALinuxhardware
I know I'm getting into this late but I can reliably reproduce this problem. I ran into it about 3 months ago when using a custom PXE-based installer for our SCSI boxes. I even annoyed -hackers and got John Baldwin to help me decode the register dumps. The IP does end up in the SCSI BIOS extension somewhere, which is really scary. ... The Adaptec BIOS is doing something really fugly when it doesn't find proper partition tables on the disks. It does it if ANY of the disks are done 'dangerously dedicated.' This also happens on IBM Netfinity 5600 servers (Adaptec 7890/91 on motherboard). Steinar Haug, Nethelp consulting, [EMAIL PROTECTED]
Re: Really odd BTX halted problem booting FreeBSD on VALinuxhardware
On Fri, 27 Oct 2000, Paul Saab wrote: Mike Smith ([EMAIL PROTECTED]) wrote: :I'm just curious. How many disks are in this box? We saw something :similar here at work and it turned out that there were multiple disklabels :on the other disks and for some reason it was confusing the loader. :We dd'd the bad sections off and everything worked. Are you sure it's confusing the loader? Matt's fault address puts it in the BIOS at 0xc800, which is probably the SCSI adapter's BIOS... I wasn't 100% involved with the problem. Peter looked into it and noticed the disks had bogus labels (sometimes up to 3 labels on 1 disk) and when he removed them, the machines were happy again. We never looked into it further because we just didn't have the time. I know I'm getting into this late but I can reliably reproduce this problem. I ran into it about 3 months ago when using a custom PXE-based installer for our SCSI boxes. I even annoyed -hackers and got John Baldwin to help me decode the register dumps. The IP does end up in the SCSI BIOS extension somewhere, which is really scary. To do it: 1. Pick a motherboard with built-in SCSI; the L440GX+ for instance. Put 2 or more disks on a controller. 2. Set the disks up dangerously dedicated, i.e. don't put proper MS partition tables on the disks. 3. Try to boot the box; watch BTX die in the same place every time. The Adaptec BIOS is doing something really fugly when it doesn't find proper partition tables on the disks. It does it if ANY of the disks are done 'dangerously dedicated.' The easy solution: always put proper partitions on your disks. The hard solution: figure out what nastiness Adaptec is doing and slap their hand. Doug White | FreeBSD: The Power to Serve [EMAIL PROTECTED] | www.FreeBSD.org
Re: Really odd BTX halted problem booting FreeBSD on VALinux
On 27-Oct-00 Robert Nordier wrote: around the broken bioses I use. I just might start using the program posted in this thread that lets you do labels right in lieu of anything else, or perhaps I'll fix disklabel to work right as was suggested elsewhere. Doesn't sysinstall work in a script mode? I've never used it, but I thought it did. Yes it does.. It's not totally automated (you have to press enter twice) but a) that is not very hard and b) even if it was you could fix it and roll your own sysinstall that just assumed 'yes'. I have sysinstall doing a custom installation from a CD in short order with the user only having to configure X (we ship different video cards so I can't just whack in a fixed XF86Config) --- Daniel O'Connor software and network engineer for Genesis Software - http://www.gsoft.com.au "The nice thing about standards is that there are so many of them to choose from." -- Andrew Tanenbaum
Re: Filesystem holes
Matt Dillon wrote to Ryan Thompson and [EMAIL PROTECTED]: : Hi all... : : One of the tasks that I have undertaken lately is to improve the efficiency : of a couple of storage facilities we use internally, here. Basically, : they are moderate-size tables currently implemented in SQL, which is OK in : terms of performance, but the hash function is breaking down and the : storage is rather inefficient for our table of about 2,850,000 members : (~2.1 GB total storage). There are 64M possible hash values in our : current implementation, and our record size is variable, but could be : safely fixed at about 1.5KB... So, total storage if all values were used : would be about 96GB. (See where I'm going with this?) : : One of the options I am considering is actually using address calculation, : and taking advantage of filesystem holes, to keep storage down to what is : actually being used, while providing instant lookups. : : The single file would be about 96G addressable bytes... But the actual : block count would be much lower. I suppose I will have to create a series : of these files and divide the problem into 4GB chunks, but one : lookup/store will still be done with one calculation and one disk seek : (just more filehandles open). : : Deletes seem problematic. My question is, does the operating system : provide any way to free blocks used in the middle of a file? : : Must I periodically re-create these files (long and slow process, but not : so bad if done infrequently) to reclaim space, or is there a way to free : arbitrary blocks in a file in userland (WITHOUT trashing the filesystem? : :-) : : - Ryan : : -- : Ryan Thompson [EMAIL PROTECTED] : Network Administrator, Accounts : Phone: +1 (306) 664-1161 : : SaskNow Technologies http://www.sasknow.com : #106-380 3120 8th St E Saskatoon, SK S7H 0W2 I would strongly recommend using a B+Tree instead of a hash table. With a hash table slightly different lookups will seek all over the place.
With a B+Tree lookups will stay more localized. Right... That's a good point, but (and, no, I really didn't mention this in my original post), "sequential" access is never important with the system I have described. Lookups are always random, and they are almost always done one at a time (multiple lookups only done for data swaps... very rare, like 1/1e7 accesses... and those are still O(1) (well, O(2), but who's counting ;-)))... For example, if you insert a (nearly) sorted dictionary of words into an SQL table with a B+Tree, the memory working set required to insert efficiently stays constant whether you are inserting a thousand, a million, or a billion records. That is, the memory requirement is effectively O(LogN) for a disk requirement of O(N). With a hash table, the memory working set required to insert efficiently is approximately O(N) for a disk requirement of O(N)... much much worse. Remember, though, I'm not proposing a hash table. I have been lucky enough to design a "hash" function that will crunch down all possible values into about 64M unique keys--so, no collisions. I propose to use this function with address calculation. So, O(1) everything, for single random lookups, which is (virtually) all this data structure must deal with. Sorting isn't necessary, and that could be accomplished in O(n) anyway. A B+Tree will also scale with the size of the dataset being managed, so you do not have to preallocate or prereserve file space.
So will address calculation + filesystem holes, and sufficiently large filesizes :-)

#include <stdio.h>

int main()
{
    FILE *f;

    f = fopen("bigfile", "w");
    fseek(f, 0x7fffffff, SEEK_SET);
    putc('\n', f);
    fclose(f);
    return 0;
}

$ cc -o hole hole.c
$ ./hole
$ ls -lsk bigfile
48 -rw-rw-r-- 1 ryan 2147483648 Oct 29 14:09 bigfile

We are using an in-house SQL database for our product (which I can't really talk about right now) and, using B+Tree's, I can consistently insert 300 records/sec on a cheap desktop PC (which I use for testing) with 64MB of ram (remember, insert requires an internal select to check for key conflicts), even when the test database grows to several gigabytes. I haven't even begun implementation, and I haven't had a chance to greatly experiment... So I don't know how this will perform. It would be directly dependent on disk seeks, though... Well... We could try a simple test... Add the following to hole.c:

char buf[1024];  /* 1K structure */
int i;
...
for (i = 0; i < 8192; i++)  /* Insert 8192 records */
{
    fseek(f, rand()/1024*1024, SEEK_SET);  /* Random access simulation */
    fwrite(buf, sizeof(buf), 1, f);
}

$ time ./hole
real    0m25.436s
user    0m0.180s
sys     0m4.912s

So, about 320 records/sec on the following test machine: Intel Pentium