Re: fsck segfault on a big partition, 4.6
Joe Gidi [...@entropicblur.com] wrote: Does this mean that amd64 can now handle 4G of RAM, or is that a separate issue? Separate issue. But if you have an iommu device and you set bigmem=1, then it might work for you.
Re: fsck segfault on a big partition, 4.6
On Thu, Jan 28, 2010 at 1:24 AM, Robert info...@die-optimisten.net wrote: nixlists wrote: The idea is to limit memory such that running out of RAM+swap is not possible, or unlikely. You can set the limit on the allowed number of processes as well. I do use ulimit / login.conf for some processes, but does anybody really use it for *all possible* processes on each production machine? I set memory limits on most daemons. Especially on the 'net-connected stuff for obvious reasons. Including the necessary research into what could be the max. memory they *might* need in a spike situation? I honestly doubt that... Better estimate/guesstimate and limit some services than not at all.
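For reference, per-class limits of the kind discussed above live in /etc/login.conf. A minimal sketch of such a class, assuming a hypothetical "backupsrv" login class; the class name and all the limit values here are illustrative, not recommendations:

```
# /etc/login.conf -- hypothetical class; values are illustrative only
backupsrv:\
	:datasize-max=1024M:\
	:datasize-cur=1024M:\
	:maxproc-max=64:\
	:maxproc-cur=32:\
	:tc=default:
```

After editing, rebuild the capability database with `cap_mkdb /etc/login.conf` and assign the class to the daemon's user (e.g. with chpass) so the limits apply at login.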
Re: fsck segfault on a big partition, 4.6
On Tue, 26 Jan 2010 19:10:47 -0600 (CST), L. V. Lammert l...@omnitec.net wrote: On Wed, 27 Jan 2010, Rob Sheldon wrote: Don't know if this is related to a problem I had on a machine recently, .. however I found that if I hung the 'bad' drive on ANOTHER machine, the fsck ran just fine! To be honest, I'm not sure how I'd set that up without a ton of effort. The 6TB are done through multiple drives (raid 6) through an Areca raid controller; without having an identical machine to swap the hardware into, I don't think I could pull that off. Even if I did have an identical system to do that with, I doubt it would gain me anything in this case. Thanks for the tip though. :-) - R. -- [__ Robert Sheldon [__ Founder, No Problem [__ Information technology support and services [__ Software and web design and development [__ (530) 575-0278 [__ You must be the change you wish to see in the world. -- Mahatma Gandhi
Re: fsck segfault on a big partition, 4.6
On Wed, 27 Jan 2010 07:42:42 +0100, Otto Moerbeek o...@drijf.net wrote: On Wed, Jan 27, 2010 at 12:38:47AM +, Rob Sheldon wrote: Hi,

These days, amd64 is the only platform that increases the limit (MAXDSIZE) to 8G. Though you venture into untested territory, we (myself at least) just do not have the hardware to test anything beyond 2T.

OK. I just went back and looked at the order sheet for this thing, and it looks like it shipped with enough RAM to require amd64, so it should be (had better be!) running that kernel. I'd like to help, if at all possible. I should be able to get on-site with the client for at least a couple of hours today, and I can probably draw this out for a few days before I have to get the server back on-line. I can provide a dmesg and any other system specs without too much trouble -- is there any way to help track down the exact source of the segfault?

The SEGVs may be related to not having swap. Running OpenBSD in overcommitted state is not what you want.

What do you mean by overcommitted state -- not enough resources? The only thing this machine is supposed to do is run backuppc, which is just rsync with some Perl scripts. The old backup server was doing the same job with less resources for quite a while. The old server did have a swap partition, but as near as I could tell it was rarely used. ...In fact, I just logged in to the old server; it has an 8G swap partition, and top says it's not using any of it. So here's something I don't understand then: in the generic kernel, will fsck allocate more than 1G if swap is available, or is it still limited to just 1G?

There's no dmesg attached because I'm not on-site with the server at the moment, and because AFAICT this is a known problem.

A pity, since it does matter what platform you run on. fsck needing a lot of memory is indeed a known problem, but the SEGVs are not. You might want to check if they still occur when you have enough swap.

OK.
I'll get that info to you, and anything else you need (that I can handle), and I'll futz around with it and see if I can cable in a spare drive for swap. - R.
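For anyone in the same spot, cabling in a spare drive for swap is straightforward. A sketch only, assuming the spare shows up as sd2; the device names here are hypothetical:

```
# label a swap partition on the spare, then enable it
disklabel -E sd2        # interactively add a 'b' partition, fstype swap
swapctl -a /dev/sd2b    # enable it immediately
swapctl -l              # verify it is listed
echo "/dev/sd2b none swap sw 0 0" >> /etc/fstab   # keep it across reboots
```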
Re: fsck segfault on a big partition, 4.6
On Wed, Jan 27, 2010 at 02:06:20PM +, Rob Sheldon wrote: On Wed, 27 Jan 2010 07:42:42 +0100, Otto Moerbeek o...@drijf.net wrote: On Wed, Jan 27, 2010 at 12:38:47AM +, Rob Sheldon wrote: Hi,

These days, amd64 is the only platform that increases the limit (MAXDSIZE) to 8G. Though you venture into untested territory, we (myself at least) just do not have the hardware to test anything beyond 2T.

OK. I just went back and looked at the order sheet for this thing, and it looks like it shipped with enough RAM to require amd64, so it should be (had better be!) running that kernel. I'd like to help, if at all possible. I should be able to get on-site with the client for at least a couple of hours today, and I can probably draw this out for a few days before I have to get the server back on-line. I can provide a dmesg and any other system specs without too much trouble -- is there any way to help track down the exact source of the segfault?

The SEGVs may be related to not having swap. Running OpenBSD in overcommitted state is not what you want.

What do you mean by overcommitted state -- not enough resources? The only thing this machine is supposed to do is run backuppc, which is just rsync with some Perl scripts. The old backup server was doing the same job with less resources for quite a while. The old server did have a swap partition, but as near as I could tell it was rarely used. ...In fact, I just logged in to the old server; it has an 8G swap partition, and top says it's not using any of it.

The point is that fsck_ffs needs loads of memory.

So here's something I don't understand then: in the generic kernel, will fsck allocate more than 1G if swap is available, or is it still limited to just 1G?

Depends on the arch. i386 is limited to 1G, amd64 is limited to 8G per process. What happens if more memory is allocated than the available swap is that the kernel will kill random processes to free swap. That might be what is going on in your case.
Also, in some cases a lack of physical memory might kill processes. -Otto

There's no dmesg attached because I'm not on-site with the server at the moment, and because AFAICT this is a known problem. A pity, since it does matter what platform you run on. fsck needing a lot of memory is indeed a known problem, but the SEGVs are not. You might want to check if they still occur when you have enough swap. OK. I'll get that info to you, and anything else you need (that I can handle), and I'll futz around with it and see if I can cable in a spare drive for swap. - R.
Re: fsck segfault on a big partition, 4.6
hmm, on Wed, Jan 27, 2010 at 03:28:12PM +0100, Otto Moerbeek said that Depends on the arch. i386 is limited to 1G, amd64 is limited to 8G per process. What happens if more memory is allocated than the available swap is that the kernel will kill random processes to free swap. That might be what is going on in your case. Also, in some cases a lack of physical memory might kill processes. the kernel will kill random processes? are we talking about linux's OOM here or openbsd? since when is this in openbsd? i seem to recall some debate where openbsd devs found that idea ridiculous. i know i do, and the machine should panic instead of starting to shoot down processes. -f -- to get a loan you must prove you don't need it.
Re: fsck segfault on a big partition, 4.6
On Wed, January 27, 2010 9:28 am, Otto Moerbeek wrote: Depends on the arch. i386 is limited to 1G, amd64 is limited to 8G per process. What happens if more memory is allocated than the available swap is that the kernel will kill random processes to free swap. That might be what is going on in your case. Also, in some cases a lack of physical memory might kill processes. -Otto Does this mean that amd64 can now handle 4G of RAM, or is that a separate issue? -- Joe Gidi j...@entropicblur.com
Re: Killing Random Processes [was: fsck segfault on a big partition, 4.6]
On Wed, 27 Jan 2010 16:00:32 +0100, frantisek holop min...@obiit.org wrote: hmm, on Wed, Jan 27, 2010 at 03:28:12PM +0100, Otto Moerbeek said that the kernel will kill random processes? are we talking about linux's OOM here or openbsd? since when is this in openbsd? i seem to recall some debate where openbsd devs found that idea ridiculous. i know i do, and the machine should panic instead of starting shooting down processes. I remember reading a thread here about killing random processes a long time ago, but I don't recall the results of that. I can't find it (quickly) in the archives. If you (and all) don't mind, if there's going to be any debate about this, I'd like to see it under a different thread instead. - R.
Re: fsck segfault on a big partition, 4.6
On Wed, Jan 27, 2010 at 10:11:57AM -0500, Joe Gidi wrote: On Wed, January 27, 2010 9:28 am, Otto Moerbeek wrote: Depends on the arch. i386 is limited to 1G, amd64 is limited to 8G per process. What happens if more memory is allocated than the available swap is that the kernel will kill random processes to free swap. That might be what is going on in your case. Also, in some cases a lack of physical memory might kill processes. -Otto Does this mean that amd64 can now handle 4G of RAM, or is that a separate issue? virtual mem != physical mem, so that's indeed a different issue. -Otto
Re: fsck segfault on a big partition, 4.6
On Wed, Jan 27, 2010 at 10:00 AM, frantisek holop min...@obiit.org wrote: hmm, on Wed, Jan 27, 2010 at 03:28:12PM +0100, Otto Moerbeek said that Depends on the arch. i386 is limited to 1G, amd64 is limited to 8G per process. What happens if more memory is allocated than the available swap is that the kernel will kill random processes to free swap. That might be what is going on in your case. Also, in some cases a lack of physical memory might kill processes. the kernel will kill random processes? are we talking about linux's OOM here or openbsd? since when is this in openbsd? i seem to recall some debate where openbsd devs found that idea ridiculous. i know i do, and the machine should panic instead of starting shooting down processes. Some archs will kill processes, some will panic. i386 and amd64 should both panic I believe.
Re: fsck segfault on a big partition, 4.6
frantisek holop wrote: the kernel will kill random processes? are we talking about linux's OOM here or openbsd? since when is this in openbsd? i seem to recall some debate where openbsd devs found that idea ridiculous. i know i do, and the machine should panic instead of starting shooting down processes. -f Am I missing something here? If the OS runs out of (any) memory then there is already a serious problem. In such a case I would prefer that the kernel kills some random applications but protects itself, so that I can login on the console and check what's going on. It might even be possible to make a clean reboot (avoiding a long fsck). A kernel panic is IMHO the worst option. ? Please explain your point of view, or why the devs consider it a bad idea (a quick search on the list didn't show anything). (I understand that in case of kernel development a panic would be useful as it shows information, but I consider the daily usage case) regards, Robert PS: What is the actual situation in OpenBSD? Does it have some OOM killer?
Re: fsck segfault on a big partition, 4.6
hmm, on Wed, Jan 27, 2010 at 04:35:19PM +0100, Robert said that If the OS runs out of (any) memory then there is already a serious there's plenty of discussion about the virtues/stupidity of the OOM killer approach, including various pardon policies. google for out of fuel linux for amusement. problem. In such a case I would prefer that the kernel kills some random applications but protects itself, so that I can login on the console and check what's going on. It might even be possible to make riiight. and how, pray, if that random process happens to be the ssh daemon or some other process supporting your infrastructure? if a process is out of control, i'd rather have the system complain loudly and angrily. i am not keen on seeing mysterious missing processes, user/customer complaints because of untraceable failures of transactions, tasks, jobs, whatever. -f -- fish and guests smell in three days.
Re: fsck segfault on a big partition, 4.6
On Wed, 27 Jan 2010 07:42:42 +0100, Otto Moerbeek o...@drijf.net wrote: On Wed, Jan 27, 2010 at 12:38:47AM +, Rob Sheldon wrote: There's no dmesg attached because I'm not on-site with the server at the moment, and because AFAICT this is a known problem. A pity, since it does matter what platform you run on. fsck needing a lot of memory is indeed a known problem, but the SEGVs are not. You might want to check if they still occur when you have enough swap.

OK, I was able to visit for a few minutes today, enough to get the machine answering ssh again. First, disklabel so you know what it actually has:

$ sudo disklabel sd1
# /dev/rsd1c:
type: SCSI
disk: SCSI disk
label: Transcend 4GB
flags:
bytes/sector: 512
sectors/track: 63
tracks/cylinder: 255
sectors/cylinder: 16065
cylinders: 488
total sectors: 7843840
rpm: 3600
interleave: 1
boundstart: 63
boundend: 7839720
drivedata: 0

16 partitions:
#           size      offset  fstype [fsize bsize  cpg]
  a:     7839657          63  4.2BSD   2048 16384    1 # /
  c:     7843840           0  unused

$ sudo disklabel sd0
# /dev/rsd0c:
type: SCSI
disk: SCSI disk
label: ARC-1220-VOL#00
flags:
bytes/sector: 512
sectors/track: 63
tracks/cylinder: 255
sectors/cylinder: 16065
cylinders: 729458
total sectors: 11718749184
rpm: 1
interleave: 1
boundstart: 63
boundend: 3128808178
drivedata: 0

16 partitions:
#           size      offset  fstype [fsize bsize  cpg]
  a: 11718749121          63  4.2BSD   2048 16384    1
  c: 11718749184           0  unused

...and the dmesg...

$ dmesg
OpenBSD 4.6 (GENERIC.MP) #81: Thu Jul 9 21:26:19 MDT 2009
dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
real mem = 3486973952 (3325MB)
avail mem = 3370655744 (3214MB)
mainbus0 at root
bios0 at mainbus0: SMBIOS rev. 2.5 @ 0xcfedf000 (39 entries)
bios0: vendor Phoenix Technologies LTD version 1.2a date 12/19/2008
bios0: Supermicro X7SB4/E
acpi0 at bios0: rev 2
acpi0: tables DSDT FACP _MAR MCFG APIC BOOT SPCR ERST HEST BERT EINJ SLIC SSDT SSDT SSDT SSDT SSDT SSDT SSDT SSDT SSDT
acpi0: wakeup devices PXHA(S5) PXHB(S5) PEX_(S5) LAN_(S5) USB4(S5) USB5(S5) USB7(S5) ESB2(S5) EXP1(S5) EXP5(S5) EXP6(S5) USB1(S5) USB2(S5) USB3(S5) USB6(S5) ESB1(S5) PCIB(S5) KBC0(S1) MSE0(S1) COM1(S5) COM2(S5) PWRB(S3)
acpitimer0 at acpi0: 3579545 Hz, 24 bits
acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
cpu0 at mainbus0: apid 0 (boot processor)
cpu0: Pentium(R) Dual-Core CPU E5200 @ 2.50GHz, 2494.07 MHz
cpu0: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,SBF,SSE3,MWAIT,DS-CPL,EST,TM2,CX16,xTPR,NXE,LONG
cpu0: 2MB 64b/line 8-way L2 cache
cpu0: apic clock running at 199MHz
cpu1 at mainbus0: apid 1 (application processor)
cpu1: Pentium(R) Dual-Core CPU E5200 @ 2.50GHz, 2493.75 MHz
cpu1: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,SBF,SSE3,MWAIT,DS-CPL,EST,TM2,CX16,xTPR,NXE,LONG
cpu1: 2MB 64b/line 8-way L2 cache
ioapic0 at mainbus0 apid 2 pa 0xfec0, version 20, 24 pins
ioapic1 at mainbus0 apid 3 pa 0xfecc, version 20, 24 pins
ioapic2 at mainbus0 apid 4 pa 0xfecc0400, version 20, 24 pins
acpiprt0 at acpi0: bus 0 (PCI0)
acpiprt1 at acpi0: bus 2 (PXHA)
acpiprt2 at acpi0: bus 3 (PXHB)
acpiprt3 at acpi0: bus 4 (PEX_)
acpiprt4 at acpi0: bus 7 (EXP1)
acpiprt5 at acpi0: bus 13 (EXP5)
acpiprt6 at acpi0: bus 15 (EXP6)
acpiprt7 at acpi0: bus 17 (PCIB)
acpicpu0 at acpi0: C3, PSS
acpicpu1 at acpi0: C3, PSS
acpibtn0 at acpi0: PWRB
acpivideo0 at acpi0: IGD0
ipmi at mainbus0 not configured
cpu0: Enhanced SpeedStep 2493 MHz: speeds: 2500, 2400, 2000, 1600, 1200 MHz
pci0 at mainbus0 bus 0
pchb0 at pci0 dev 0 function 0 Intel 3200/3210 Host rev 0x01
ppb0 at pci0 dev 1 function 0 Intel 3200/3210 PCIE rev 0x01: apic 2 int 16 (irq 5)
pci1 at ppb0 bus 1
ppb1 at pci1 dev 0 function 0 Intel PCIE-PCIE rev 0x09
pci2 at ppb1 bus 2
Intel IOxAPIC rev 0x09 at pci1 dev 0 function 1 not configured
ppb2 at pci1 dev 0 function 2 Intel PCIE-PCIE rev 0x09
pci3 at ppb2 bus 3
Intel IOxAPIC rev 0x09 at pci1 dev 0 function 3 not configured
ppb3 at pci0 dev 6 function 0 Intel 3210 PCIE rev 0x01: apic 2 int 16 (irq 5)
pci4 at ppb3 bus 4
ppb4 at pci4 dev 0 function 0 Intel IOP333 PCIE-PCIX rev 0x00
pci5 at ppb4 bus 5
arc0 at pci5 dev 14 function 0 Areca ARC-1220 rev 0x00: apic 2 int 18 (irq 11)
arc0: 8 ports, 256MB SDRAM, firmware V1.46 2009-01-06
scsibus0 at arc0: 16 targets
sd0 at scsibus0 targ 0 lun 0: Areca, ARC-1220-VOL#00, R001 SCSI3 0/direct fixed
sd0: 5722045MB, 512 bytes/sec, 11718749184 sec total
ppb5 at pci4 dev 0 function 2 Intel IOP333 PCIE-PCIX rev 0x00
pci6 at ppb5 bus 6
uhci0 at pci0 dev 26 function 0 Intel 82801I USB rev 0x02: apic 2 int 16 (irq 5)
uhci1 at pci0 dev 26 function 1 Intel 82801I USB rev 0x02: apic 2 int 17 (irq 10)
uhci2 at pci0 dev 26 function
Re: fsck segfault on a big partition, 4.6
On Wed, 27 Jan 2010 22:06:19 +0100, Otto Moerbeek o...@drijf.net wrote: No, currently the amount of physical memory an amd64 can address is limited. Well, F___. :-( The rule here then is, if you've got a partition bigger than 1TB, you *must* have swap? - R.
Re: fsck segfault on a big partition, 4.6
On Wed, 27 Jan 2010 20:43 +, Rob Sheldon r...@associatedtechs.com wrote: [snip] softraid0 at root root on sd1a swap on sd1b dump on sd1b ...that's odd, it's showing swap (and dump) on sd1b, but there's no such thing: $ sudo df /dev/sd1b df: /dev/sd1b: Device not configured ...maybe it really doesn't like running without swap? It's there. disklabel -vh sd1 and you'll see b is swap. Try swapctl as well... also dmesg | grep swap: root on sd1a swap on sd1b dump on sd1b Oh wait, it's showing only 3G of memory installed. I just physically checked the machine, and it has 4 full banks of 2G each. amd64 should be able to address that, right? I think you would need a bigmem enabled kernel. That could certainly explain why fsck is unhappy. Thanks, - R.
Re: fsck segfault on a big partition, 4.6
On 2010-01-27, Rob Sheldon r...@associatedtechs.com wrote: The longer version: this is a backup server running backuppc for a corporate client (large enough number of workstations) that does research work (some really big files). I _thought_ I had read the big filesystem FAQ carefully, but somehow missed that fsck simply couldn't handle anything over 1TB without doing funny things during the fs setup. The default is to create an inode for each 8192 bytes of data space. They aren't especially funny things; if you have a fairly large filesystem with files most people would now call medium or larger, you'll probably be rather surprised at the difference in fsck time if you lower the inode density a bit... If it's not essential data I don't think I'd waste time trying to fsck it. Force a read-only mount and copy any backuppc config you need off first, disklabel, allocate some swap, consider splitting into smaller chunks, and newfs with more appropriate settings, you'll still have the main OS install on the other partitions. Or, indeed, use a different OS if you prefer.
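To put rough numbers on the inode-density point: at the default of one inode per 8192 bytes, a 6 TB filesystem carries on the order of 800 million inodes, and fsck has to keep state for every one of them. A back-of-the-envelope sketch; the per-inode and per-fragment costs here are assumptions about fsck_ffs bookkeeping, so treat the results as order-of-magnitude only:

```python
# Rough fsck memory estimate. ASSUMPTION: ~32 bytes of fsck state per
# inode plus ~2 bits per fragment; real fsck_ffs costs vary by version.
def estimate_fsck_mem(fs_bytes, bytes_per_inode=8192,
                      per_inode_cost=32, frag_size=2048):
    inodes = fs_bytes // bytes_per_inode
    inode_state = inodes * per_inode_cost       # per-inode tables
    frag_bitmap = fs_bytes // frag_size // 4    # ~2 bits per fragment
    return inode_state + frag_bitmap

SIX_TB = 6 * 1024**4
GIB = 1024**3
# default density: well past amd64's 8G per-process MAXDSIZE
print(estimate_fsck_mem(SIX_TB) / GIB)
# one inode per 64k of data instead: back under that limit
print(estimate_fsck_mem(SIX_TB, bytes_per_inode=65536) / GIB)
```

Under these assumptions the default-density estimate lands in the tens of gigabytes, while dropping the density to one inode per 64k brings it back under the 8G amd64 ceiling mentioned elsewhere in the thread.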
Re: fsck segfault on a big partition, 4.6
Whoops... re-reading, I see that I missed your disklabel output... sorry. On Wed, 27 Jan 2010 17:25 -0500, Brad Tilley b...@16systems.com wrote: [snip] - R.
Re: fsck segfault on a big partition, 4.6
On 1/28/10, nixlists nixmli...@gmail.com wrote: Why kill random processes that may not be misbehaving and/or cause a kernel panic when you want to kill the process(es) that leak memory or are hungry in the first place? It's possible to avoid kernel panics in this case IMO, and not kill random processes. aren't you missing the point of original comment made by Otto? consider a situation, when all the processes in the system are behaving, none of them violates their rlimits, but they all together have allocated more memory than the box contains (RAM + swap). so the OS needs to do something. what should it do? should it just panic? or may be losing one process is better than losing them all? then, what are the criteria for choosing processes to be killed?.. wondering if random means the process with PID 1 could be one of them...
Re: fsck segfault on a big partition, 4.6
On Wed, Jan 27, 2010 at 8:14 PM, nixlists nixmli...@gmail.com wrote: On Wed, Jan 27, 2010 at 7:53 PM, Denis Doroshenko denis.doroshe...@gmail.com wrote: aren't you missing the point of original comment made by Otto? consider a situation, when all the processes in the system are behaving, none of them violates their rlimits, but they all together have allocated more memory than the box contains (RAM + swap). The idea is to limit memory such that running out of RAM+swap is not possible, or unlikely. You can set the limit on the allowed number of processes as well. $ ulimit -m 971876 $ dmesg | grep real\ mem real mem = 1039691776 (991MB) So... this box should run only one process? $ ps -auxww|wc 54 7134936 If I were to use the max memory usage of each process, I would need a 53Gig ram machine? -- http://www.glumbert.com/media/shift http://www.youtube.com/watch?v=tGvHNNOLnCk This officer's men seem to follow him merely out of idle curiosity. -- Sandhurst officer cadet evaluation. Securing an environment of Windows platforms from abuse - external or internal - is akin to trying to install sprinklers in a fireworks factory where smoking on the job is permitted. -- Gene Spafford learn french: http://www.youtube.com/watch?v=30v_g83VHK4
Re: fsck segfault on a big partition, 4.6
Obviously, as any competent sysadmin like nixlists knows, you should restrict all your processes to a max of 20 megs. On Jan 27, 2010, at 9:23 PM, bofh goodb...@gmail.com wrote: [snip]
fsck segfault on a big partition, 4.6
Hi, So, the short version is that I have a server with OpenBSD 4.6 that can't fsck its big partition; fsck fails with a segfault every time. If I ulimit -d unlimited before fsck'ing, it just takes a little longer to segfault. It produces no other output. IIRC, the partition is roughly 6 TB. Two questions then: is there any way through this that doesn't involve newfs'ing the partition, and is there a right way to do a partition of that size in OpenBSD given fsck's 1G hard limit? The longer version: this is a backup server running backuppc for a corporate client (large enough number of workstations) that does research work (some really big files). I _thought_ I had read the big filesystem FAQ carefully, but somehow missed that fsck simply couldn't handle anything over 1TB without doing funny things during the fs setup. So, this particular partition was backuppc's data directory, and it was set up with the default block sizes. Also possibly noteworthy: there's no swap, the OS and other partitions are all running off of a USB flash drive for various reasons. If I have to wipe the partition and start over, it's not a disaster. This was a newer server, the old backup server was still online and still had some disk left, so I get to keep my butt out of a sling. But, if I'm going to have to do that, then I also need to consider whether it might just be better to use a different OS. (No foul intended, I'm a big fan of OpenBSD, but it just might not be the right tool for this job.) There's no dmesg attached because I'm not on-site with the server at the moment, and because AFAICT this is a known problem. Thanks, - R.
Re: fsck segfault on a big partition, 4.6
On Wed, 27 Jan 2010, Rob Sheldon wrote: Hi, So, the short version is that I have a server with OpenBSD 4.6 that can't fsck its big partition; fsck fails with a segfault every time. If I ulimit -d unlimited before fsck'ing, it just takes a little longer to segfault. It produces no other output. IIRC, the partition is roughly 6 TB. Two questions then: is there any way through this that doesn't involve newfs'ing the partition, and is there a right way to do a partition of that size in OpenBSD given fsck's 1G hard limit? Don't know if this is related to a problem I had on a machine recently, .. however I found that if I hung the 'bad' drive on ANOTHER machine, the fsck ran just fine! Might be worth a try, .. Lee
Re: fsck segfault on a big partition, 4.6
On Wed, Jan 27, 2010 at 12:38:47AM +, Rob Sheldon wrote: Hi, So, the short version is that I have a server with OpenBSD 4.6 that can't fsck its big partition; fsck fails with a segfault every time. If I ulimit -d unlimited before fsck'ing, it just takes a little longer to segfault. It produces no other output. IIRC, the partition is roughly 6 TB. Two questions then: is there any way through this that doesn't involve newfs'ing the partition, and is there a right way to do a partition of that size in OpenBSD given fsck's 1G hard limit? Amd64 allows 8G. Increase newfs blocksize to 64k (make sure you don't run out of inodes), that should lessen the memory requirements a bit and make fsck runs a little faster. I have my doubts about OpenBSD as a (backup) file server with large filesystems, there might be a more appropriate OS for the job.
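Concretely, the suggestion above could look something like the following newfs invocation. A sketch only: it destroys the filesystem, the device name is taken from the disklabel quoted elsewhere in the thread, and the -i value is an illustrative assumption that should be sized against the expected file count so the filesystem does not run out of inodes:

```
# DESTRUCTIVE: re-creates the filesystem on sd0a.
# 64k blocks, 8k fragments, and one inode per 64k of data (instead of
# the default one per 8k), cutting fsck's per-inode state roughly 8x.
newfs -b 65536 -f 8192 -i 65536 /dev/rsd0a
```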
Re: fsck segfault on a big partition, 4.6
On Wed, Jan 27, 2010 at 12:38:47AM +, Rob Sheldon wrote: Hi, So, the short version is that I have a server with OpenBSD 4.6 that can't fsck its big partition; fsck fails with a segfault every time. If I ulimit -d unlimited before fsck'ing, it just takes a little longer to segfault. It produces no other output. IIRC, the partition is roughly 6 TB. Two questions then: is there any way through this that doesn't involve newfs'ing the partition, and is there a right way to do a partition of that size in OpenBSD given fsck's 1G hard limit?

No, there is no other way. I've posted a small piece of code some time ago that estimates the amount of mem needed for doing an fsck during newfs. These days, amd64 is the only platform that increases the limit (MAXDSIZE) to 8G. Though you venture into untested territory, we (myself at least) just do not have the hardware to test anything beyond 2T.

The longer version: this is a backup server running backuppc for a corporate client (large enough number of workstations) that does research work (some really big files). I _thought_ I had read the big filesystem FAQ carefully, but somehow missed that fsck simply couldn't handle anything over 1TB without doing funny things during the fs setup. So, this particular partition was backuppc's data directory, and it was set up with the default block sizes. Also possibly noteworthy: there's no swap, the OS and other partitions are all running off of a USB flash drive for various reasons.

The SEGVs may be related to not having swap. Running OpenBSD in overcommitted state is not what you want.

If I have to wipe the partition and start over, it's not a disaster. This was a newer server, the old backup server was still online and still had some disk left, so I get to keep my butt out of a sling. But, if I'm going to have to do that, then I also need to consider whether it might just be better to use a different OS.
(No foul intended, I'm a big fan of OpenBSD, but it just might not be the right tool for this job.) There's no dmesg attached because I'm not on-site with the server at the moment, and because AFAICT this is a known problem. A pity, since it does matter what platform you run on. fsck needing a lot of memory is indeed a known problem, but the SEGVs are not. You might want to check if they still occur when you have enough swap. -Otto