Re: fsck segfault on a big partition, 4.6

2010-02-19 Thread Chris Cappuccio
Joe Gidi [...@entropicblur.com] wrote:
 
 Does this mean that amd64 can now handle 4G of RAM, or is that a separate
 issue?

Separate issue.

But if you have an IOMMU device and you set bigmem=1, then it might work for you.



Re: fsck segfault on a big partition, 4.6

2010-01-28 Thread nixlists
On Thu, Jan 28, 2010 at 1:24 AM, Robert info...@die-optimisten.net wrote:
 nixlists wrote:

 The idea is to limit memory such that running out of RAM+swap is not
 possible, or unlikely. You can set the limit on the allowed number of
 processes as well.

 I do use ulimit / login.conf for some processes, but does anybody really use
 it for *all possible* processes on each production machine?

I set memory limits on most daemons. Especially on the 'net-connected
stuff for obvious reasons.

 Including the necessary research into what could be the max. memory they
 *might* need in a spike situation?
 I honestly doubt that...

Better to estimate/guesstimate and limit some services than to not limit at all.
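
For illustration, a login.conf(5) class along these lines could cap a
daemon's memory and process count (the class name and the numbers are
hypothetical, not tested recommendations):

  # hypothetical /etc/login.conf class for a memory-limited daemon
  limited-daemon:\
          :datasize-max=512M:\
          :datasize-cur=512M:\
          :maxproc-max=128:\
          :maxproc-cur=64:\
          :tc=daemon:

Rebuild the database with 'sudo cap_mkdb /etc/login.conf', then assign the
class to the daemon's user in master.passwd (or start it under su(1) with
its -c class option).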



Re: fsck segfault on a big partition, 4.6

2010-01-27 Thread Rob Sheldon
On Tue, 26 Jan 2010 19:10:47 -0600 (CST), L. V. Lammert
l...@omnitec.net
wrote:
 On Wed, 27 Jan 2010, Rob Sheldon wrote:
 
 Don't know if this is related to a problem I had on a machine recently,
..
 however I found that if I hung the 'bad' drive on ANOTHER machine, the
 fsck ran just fine!

To be honest, I'm not sure how I'd set that up without a ton of effort.
The 6TB is spread across multiple drives (RAID 6) behind an Areca RAID
controller; without having an identical machine to swap the hardware into,
I don't think I could pull that off. Even if I did have an identical system
to do that with, I doubt it would gain me anything in this case.

Thanks for the tip though. :-)

- R.

-- 
[__ Robert Sheldon
[__ Founder, No Problem
[__ Information technology support and services
[__ Software and web design and development
[__ (530) 575-0278
[__ You must be the change you wish to see in the world. -- Mahatma
Gandhi



Re: fsck segfault on a big partition, 4.6

2010-01-27 Thread Rob Sheldon
On Wed, 27 Jan 2010 07:42:42 +0100, Otto Moerbeek o...@drijf.net wrote:
 On Wed, Jan 27, 2010 at 12:38:47AM +0000, Rob Sheldon wrote:
 
 Hi,
 
 These days, amd64 is the only platform that increases the limit
 (MAXDSIZ) to 8G. Though you venture into untested territory, we
 (myself at least) just do not have the hardware to test anything
 beyond 2T.

OK. I just went back and looked at the order sheet for this thing, and it
looks like it shipped with enough RAM to require amd64, so it should be
(had better be!) running that kernel.

I'd like to help, if at all possible. I should be able to get on-site with
the client for at least a couple of hours today, and I can probably draw
this out for a few days before I have to get the server back on-line. I can
provide a dmesg and any other system specs without too much trouble -- is
there any way to help track down the exact source of the segfault?

 The SEGVs may be related to not having swap. Running OpenBSD in
 an overcommitted state is not what you want.

What do you mean by "overcommitted state" -- not enough resources? The
only thing this machine is supposed to do is run backuppc, which is just
rsync with some Perl scripts. The old backup server was doing the same job
with less resources for quite a while. The old server did have a swap
partition, but as near as I could tell it was rarely used. ...In fact, I
just logged in to the old server; it has an 8G swap partition, and top says
it's not using any of it.

So here's something I don't understand then: in the generic kernel, will
fsck allocate more than 1G if swap is available, or is it still limited to
just 1G?

 There's no dmesg attached because I'm not on-site with the server at
the
 moment, and because AFAICT this is a known problem.
 
 A pity, since it does matter what platform you run on. fsck needing a
 lot of memory is indeed a known problem, but the SEGVs are not. You
 might want to check if they still occur when you have enough swap.

OK. I'll get that info to you, and anything else you need (that I can
handle), and I'll futz around with it and see if I can cable in a spare
drive for swap.

- R.

-- 
[__ Robert Sheldon
[__ Founder, No Problem
[__ Information technology support and services
[__ Software and web design and development
[__ (530) 575-0278
[__ You must be the change you wish to see in the world. -- Mahatma
Gandhi



Re: fsck segfault on a big partition, 4.6

2010-01-27 Thread Otto Moerbeek
On Wed, Jan 27, 2010 at 02:06:20PM +0000, Rob Sheldon wrote:

 On Wed, 27 Jan 2010 07:42:42 +0100, Otto Moerbeek o...@drijf.net wrote:
  On Wed, Jan 27, 2010 at 12:38:47AM +0000, Rob Sheldon wrote:
  
  Hi,
  
  These days, amd64 is the only platform that increases the limit
  (MAXDSIZ) to 8G. Though you venture into untested territory, we
  (myself at least) just do not have the hardware to test anything
  beyond 2T.
 
 OK. I just went back and looked at the order sheet for this thing, and it
 looks like it shipped with enough RAM to require amd64, so it should be
 (had better be!) running that kernel.
 
 I'd like to help, if at all possible. I should be able to get on-site with
 the client for at least a couple of hours today, and I can probably draw
 this out for a few days before I have to get the server back on-line. I can
 provide a dmesg and any other system specs without too much trouble -- is
 there any way to help track down the exact source of the segfault?
 
  The SEGVs may be related to not having swap. Running OpenBSD in
  an overcommitted state is not what you want.
 
 What do you mean by "overcommitted state" -- not enough resources? The
 only thing this machine is supposed to do is run backuppc, which is just
 rsync with some Perl scripts. The old backup server was doing the same job
 with less resources for quite a while. The old server did have a swap
 partition, but as near as I could tell it was rarely used. ...In fact, I
 just logged in to the old server; it has an 8G swap partition, and top says
 it's not using any of it.

The point is that fsck_ffs needs loads of memory.

 
 So here's something I don't understand then: in the generic kernel, will
 fsck allocate more than 1G if swap is available, or is it still limited to
 just 1G?

Depends on the arch. i386 is limited to 1G, amd64 is limited to 8G per
process.  What happens if more memory is allocated than the available
swap is that the kernel will kill random processes to free swap. That
might be what is going on in your case. Also, in some cases a lack of
physical memory might kill processes. 

-Otto
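
For concreteness, the limit being described is the per-process data-segment
rlimit, which you can inspect from the shell; a sketch, with illustrative
output for an amd64 box (i386 would show about 1G):

  $ ulimit -Hd   # hard data-segment limit, in kilobytes
  8388608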

 
  There's no dmesg attached because I'm not on-site with the server at
 the
  moment, and because AFAICT this is a known problem.
  
  A pity, since it does matter what platform you run on. fsck needing a
  lot of memory is indeed a known problem, but the SEGVs are not. You
  might want to check if they still occur when you have enough swap.
 
 OK. I'll get that info to you, and anything else you need (that I can
 handle), and I'll futz around with it and see if I can cable in a spare
 drive for swap.
 
 - R.
 
 -- 
 [__ Robert Sheldon
 [__ Founder, No Problem
 [__ Information technology support and services
 [__ Software and web design and development
 [__ (530) 575-0278
 [__ You must be the change you wish to see in the world. -- Mahatma
 Gandhi



Re: fsck segfault on a big partition, 4.6

2010-01-27 Thread frantisek holop
hmm, on Wed, Jan 27, 2010 at 03:28:12PM +0100, Otto Moerbeek said that
 Depends on the arch. i386 is limited to 1G, amd64 is limited to 8G per
 process.  What happens if more memory is allocated than the available
 swap is that the kernel will kill random processes to free swap. That
 might be what is going on in your case. Also, in some cases a lack of
 physical memory might kill processes. 

the kernel will kill random processes?  are we talking about linux's OOM
here or openbsd?  since when is this in openbsd?  i seem to recall
some debate where openbsd devs found that idea ridiculous.  i know i do,
and the machine should panic instead of starting to shoot down processes.

-f
-- 
to get a loan you must prove you don't need it.



Re: fsck segfault on a big partition, 4.6

2010-01-27 Thread Joe Gidi
On Wed, January 27, 2010 9:28 am, Otto Moerbeek wrote:
 Depends on the arch. i386 is limited to 1G, amd64 is limited to 8G per
 process.  What happens if more memory is allocated than the available
 swap is that the kernel will kill random processes to free swap. That
 might be what is going on in your case. Also, in some cases a lack of
 physical memory might kill processes.

   -Otto

Does this mean that amd64 can now handle 4G of RAM, or is that a separate
issue?

-- 
Joe Gidi
j...@entropicblur.com



Re: Killing Random Processes [was: fsck segfault on a big partition, 4.6]

2010-01-27 Thread Rob Sheldon
On Wed, 27 Jan 2010 16:00:32 +0100, frantisek holop min...@obiit.org
wrote:
 hmm, on Wed, Jan 27, 2010 at 03:28:12PM +0100, Otto Moerbeek said that
 
 the kernel will kill random processes?  are we talking about linux's OOM
 here or openbsd?  since when is this in openbsd?  i seem to recall
 some debate where openbsd devs found that idea ridiculous.  i know i do,
 and the machine should panic instead of starting to shoot down
processes.

I remember reading a thread here about killing random processes a long
time ago, but I don't recall the results of that. I can't find it (quickly)
in the archives.

If you (and all) don't mind, if there's going to be any debate about this,
I'd like to see it under a different thread instead.

- R.

-- 
[__ Robert Sheldon
[__ Founder, No Problem
[__ Information technology support and services
[__ Software and web design and development
[__ (530) 575-0278
[__ You must be the change you wish to see in the world. -- Mahatma
Gandhi



Re: fsck segfault on a big partition, 4.6

2010-01-27 Thread Otto Moerbeek
On Wed, Jan 27, 2010 at 10:11:57AM -0500, Joe Gidi wrote:

 On Wed, January 27, 2010 9:28 am, Otto Moerbeek wrote:
  Depends on the arch. i386 is limited to 1G, amd64 is limited to 8G per
  process.  What happens if more memory is allocated than the available
  swap is that the kernel will kill random processes to free swap. That
  might be what is going on in your case. Also, in some cases a lack of
  physical memory might kill processes.
 
  -Otto
 
 Does this mean that amd64 can now handle 4G of RAM, or is that a separate
 issue?

virtual mem != physical mem, so that's indeed a different issue.

-Otto



Re: fsck segfault on a big partition, 4.6

2010-01-27 Thread Ted Unangst
On Wed, Jan 27, 2010 at 10:00 AM, frantisek holop min...@obiit.org wrote:
 hmm, on Wed, Jan 27, 2010 at 03:28:12PM +0100, Otto Moerbeek said that
 Depends on the arch. i386 is limited to 1G, amd64 is limited to 8G per
 process.  What happens if more memory is allocated than the available
 swap is that the kernel will kill random processes to free swap. That
 might be what is going on in your case. Also, in some cases a lack of
 physical memory might kill processes.

 the kernel will kill random processes?  are we talking about linux's OOM
 here or openbsd?  since when is this in openbsd?  i seem to recall
 some debate where openbsd devs found that idea ridiculous.  i know i do,
 and the machine should panic instead of starting to shoot down processes.

Some archs will kill processes, some will panic.  i386 and amd64
should both panic, I believe.



Re: fsck segfault on a big partition, 4.6

2010-01-27 Thread Robert

frantisek holop wrote:

the kernel will kill random processes?  are we talking about linux's OOM
here or openbsd?  since when is this in openbsd?  i seem to recall
some debate where openbsd devs found that idea ridiculous.  i know i do,
and the machine should panic instead of starting to shoot down processes.

-f


Am I missing something here?
If the OS runs out of (any) memory then there is already a serious 
problem. In such a case I would prefer that the kernel kills some random 
applications but protects itself, so that I can log in on the console and
check what's going on. It might even be possible to make a clean reboot 
(avoiding a long fsck).

A kernel panic is IMHO the worst option.

?
Please explain your point of view, or why the devs consider it a bad
idea (a quick search on the list didn't show anything).
(I understand that a panic would be useful during kernel development,
since it shows information, but I am considering the daily usage case.)


regards,
Robert

PS:
What is the actual situation in OpenBSD? Does it have some OOM killer?



Re: fsck segfault on a big partition, 4.6

2010-01-27 Thread frantisek holop
hmm, on Wed, Jan 27, 2010 at 04:35:19PM +0100, Robert said that
 If the OS runs out of (any) memory then there is already a serious

there's plenty of discussion about the virtues/stupidity
of the OOM killer approach, including various pardon policies.
google for "out of fuel linux" for amusement.

 problem. In such a case I would prefer that the kernel kills some
 random applications but protects itself, so that I can log in on the
 console and check what's going on. It might even be possible to make

riiight.  and how, pray, if that random process happens to be the
ssh daemon or some other process supporting your infrastructure?

if a process is out of control, i'd rather have the system complain
loudly and angrily.  i am not keen on seeing mysterious missing
processes, user/customer complaints because of untraceable failures
of transactions, tasks, jobs, whatever.

-f
-- 
fish and guests smell in three days.



Re: fsck segfault on a big partition, 4.6

2010-01-27 Thread Rob Sheldon
On Wed, 27 Jan 2010 07:42:42 +0100, Otto Moerbeek o...@drijf.net wrote:
 On Wed, Jan 27, 2010 at 12:38:47AM +0000, Rob Sheldon wrote:
 
 There's no dmesg attached because I'm not on-site with the server at
the
 moment, and because AFAICT this is a known problem.
 
 A pity, since it does matter what platform you run on. fsck needing a
 lot of memory is indeed a known problem, but the SEGVs are not. You
 might want to check if they still occur when you have enough swap.

OK, I was able to visit for a few minutes today, enough to get the machine
answering ssh again.

First, disklabel so you know what it actually has:

$ sudo disklabel sd1
# /dev/rsd1c:
type: SCSI
disk: SCSI disk
label: Transcend 4GB   
flags:
bytes/sector: 512
sectors/track: 63
tracks/cylinder: 255
sectors/cylinder: 16065
cylinders: 488
total sectors: 7843840
rpm: 3600
interleave: 1
boundstart: 63
boundend: 7839720
drivedata: 0 

16 partitions:
#                size           offset  fstype [fsize bsize  cpg]
  a:          7839657               63  4.2BSD   2048 16384    1 # /
  c:          7843840                0  unused

$ sudo disklabel sd0 
# /dev/rsd0c:
type: SCSI
disk: SCSI disk
label: ARC-1220-VOL#00 
flags:
bytes/sector: 512
sectors/track: 63
tracks/cylinder: 255
sectors/cylinder: 16065
cylinders: 729458
total sectors: 11718749184
rpm: 1
interleave: 1
boundstart: 63
boundend: 3128808178
drivedata: 0 

16 partitions:
#                size           offset  fstype [fsize bsize  cpg]
  a:      11718749121               63  4.2BSD   2048 16384    1
  c:      11718749184                0  unused

...and the dmesg...

$ dmesg
OpenBSD 4.6 (GENERIC.MP) #81: Thu Jul  9 21:26:19 MDT 2009
dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
real mem = 3486973952 (3325MB)
avail mem = 3370655744 (3214MB)
mainbus0 at root
bios0 at mainbus0: SMBIOS rev. 2.5 @ 0xcfedf000 (39 entries)
bios0: vendor Phoenix Technologies LTD version 1.2a date 12/19/2008
bios0: Supermicro X7SB4/E
acpi0 at bios0: rev 2
acpi0: tables DSDT FACP _MAR MCFG APIC BOOT SPCR ERST HEST BERT EINJ SLIC
SSDT SSDT SSDT SSDT SSDT SSDT SSDT SSDT SSDT
acpi0: wakeup devices PXHA(S5) PXHB(S5) PEX_(S5) LAN_(S5) USB4(S5)
USB5(S5) USB7(S5) ESB2(S5) EXP1(S5) EXP5(S5) EXP6(S5) USB1(S5) USB2(S5)
USB3(S5) USB6(S5) ESB1(S5) PCIB(S5) KBC0(S1) MSE0(S1) COM1(S5) COM2(S5)
PWRB(S3)
acpitimer0 at acpi0: 3579545 Hz, 24 bits
acpimadt0 at acpi0 addr 0xfee00000: PC-AT compat
cpu0 at mainbus0: apid 0 (boot processor)
cpu0: Pentium(R) Dual-Core CPU E5200 @ 2.50GHz, 2494.07 MHz
cpu0:
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,SBF,SSE3,MWAIT,DS-CPL,EST,TM2,CX16,xTPR,NXE,LONG
cpu0: 2MB 64b/line 8-way L2 cache
cpu0: apic clock running at 199MHz
cpu1 at mainbus0: apid 1 (application processor)
cpu1: Pentium(R) Dual-Core CPU E5200 @ 2.50GHz, 2493.75 MHz
cpu1:
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,SBF,SSE3,MWAIT,DS-CPL,EST,TM2,CX16,xTPR,NXE,LONG
cpu1: 2MB 64b/line 8-way L2 cache
ioapic0 at mainbus0 apid 2 pa 0xfec00000, version 20, 24 pins
ioapic1 at mainbus0 apid 3 pa 0xfecc0000, version 20, 24 pins
ioapic2 at mainbus0 apid 4 pa 0xfecc0400, version 20, 24 pins
acpiprt0 at acpi0: bus 0 (PCI0)
acpiprt1 at acpi0: bus 2 (PXHA)
acpiprt2 at acpi0: bus 3 (PXHB)
acpiprt3 at acpi0: bus 4 (PEX_)
acpiprt4 at acpi0: bus 7 (EXP1)
acpiprt5 at acpi0: bus 13 (EXP5)
acpiprt6 at acpi0: bus 15 (EXP6)
acpiprt7 at acpi0: bus 17 (PCIB)
acpicpu0 at acpi0: C3, PSS
acpicpu1 at acpi0: C3, PSS
acpibtn0 at acpi0: PWRB
acpivideo0 at acpi0: IGD0
ipmi at mainbus0 not configured
cpu0: Enhanced SpeedStep 2493 MHz: speeds: 2500, 2400, 2000, 1600, 1200
MHz
pci0 at mainbus0 bus 0
pchb0 at pci0 dev 0 function 0 Intel 3200/3210 Host rev 0x01
ppb0 at pci0 dev 1 function 0 Intel 3200/3210 PCIE rev 0x01: apic 2 int
16 (irq 5)
pci1 at ppb0 bus 1
ppb1 at pci1 dev 0 function 0 Intel PCIE-PCIE rev 0x09
pci2 at ppb1 bus 2
Intel IOxAPIC rev 0x09 at pci1 dev 0 function 1 not configured
ppb2 at pci1 dev 0 function 2 Intel PCIE-PCIE rev 0x09
pci3 at ppb2 bus 3
Intel IOxAPIC rev 0x09 at pci1 dev 0 function 3 not configured
ppb3 at pci0 dev 6 function 0 Intel 3210 PCIE rev 0x01: apic 2 int 16
(irq 5)
pci4 at ppb3 bus 4
ppb4 at pci4 dev 0 function 0 Intel IOP333 PCIE-PCIX rev 0x00
pci5 at ppb4 bus 5
arc0 at pci5 dev 14 function 0 Areca ARC-1220 rev 0x00: apic 2 int 18
(irq 11)
arc0: 8 ports, 256MB SDRAM, firmware V1.46 2009-01-06
scsibus0 at arc0: 16 targets
sd0 at scsibus0 targ 0 lun 0: Areca, ARC-1220-VOL#00, R001 SCSI3
0/direct fixed
sd0: 5722045MB, 512 bytes/sec, 11718749184 sec total
ppb5 at pci4 dev 0 function 2 Intel IOP333 PCIE-PCIX rev 0x00
pci6 at ppb5 bus 6
uhci0 at pci0 dev 26 function 0 Intel 82801I USB rev 0x02: apic 2 int 16
(irq 5)
uhci1 at pci0 dev 26 function 1 Intel 82801I USB rev 0x02: apic 2 int 17
(irq 10)
uhci2 at pci0 dev 26 function 

Re: fsck segfault on a big partition, 4.6

2010-01-27 Thread Rob Sheldon
On Wed, 27 Jan 2010 22:06:19 +0100, Otto Moerbeek o...@drijf.net wrote:

 No, currently the amount of physical memory an amd64 can address is
 limited.

Well, F___. :-(

The rule here, then, is: if you've got a partition bigger than 1TB, you
*must* have swap?

- R.

-- 
[__ Robert Sheldon
[__ Founder, No Problem
[__ Information technology support and services
[__ Software and web design and development
[__ (530) 575-0278
[__ You must be the change you wish to see in the world. -- Mahatma
Gandhi



Re: fsck segfault on a big partition, 4.6

2010-01-27 Thread Brad Tilley
On Wed, 27 Jan 2010 20:43 +0000, Rob Sheldon r...@associatedtechs.com wrote:

[snip]

 softraid0 at root
 root on sd1a swap on sd1b dump on sd1b
 
 ...that's odd, it's showing swap (and dump) on sd1b, but there's no such
 thing:
 
 $ sudo df /dev/sd1b
 df: /dev/sd1b: Device not configured

 ...maybe it really doesn't like running without swap?

It's there. Run disklabel -vh sd1 and you'll see that b is swap. Try
swapctl as well... also dmesg | grep swap:

root on sd1a swap on sd1b dump on sd1b
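
For illustration, checking and (if need be) activating it might look like
this (standard swapctl(8) usage; sd1b is taken from the dmesg above):

  $ swapctl -l                  # list configured swap devices and usage
  $ sudo swapctl -a /dev/sd1b   # add the partition as swap if it isn't listed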
 

 Oh wait, it's showing only 3G of memory installed. I just physically
 checked the machine, and it has 4 full banks of 2G each. amd64 should be
 able to address that, right?

I think you would need a bigmem-enabled kernel.
 
 That could certainly explain why fsck is unhappy.
 
 Thanks,
 
 - R.
 
 -- 
 [__ Robert Sheldon
 [__ Founder, No Problem
 [__ Information technology support and services
 [__ Software and web design and development
 [__ (530) 575-0278
 [__ You must be the change you wish to see in the world. -- Mahatma
 Gandhi



Re: fsck segfault on a big partition, 4.6

2010-01-27 Thread Stuart Henderson
On 2010-01-27, Rob Sheldon r...@associatedtechs.com wrote:
 The longer version: this is a backup server running backuppc for a
 corporate client (large enough number of workstations) that does research
 work (some really big files). I _thought_ I had read the big filesystem
 FAQ carefully, but somehow missed that fsck simply couldn't handle anything
 over 1TB without doing funny things during the fs setup.

The default is to create an inode for each 8192 bytes of data space.

They aren't especially funny things; if you have a fairly large
filesystem with files most people would now call medium or larger,
you'll probably be rather surprised at the difference in fsck time
if you lower the inode density a bit...

If it's not essential data I don't think I'd waste time trying
to fsck it. Force a read-only mount and copy off any backuppc config
you need first; then disklabel, allocate some swap, consider
splitting into smaller chunks, and newfs with more appropriate
settings. You'll still have the main OS install on the other
partitions. Or, indeed, use a different OS if you prefer.
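
A sketch of that sequence, using this thread's device names (sd0 is the
Areca volume; the mount point and the config path are hypothetical):

  $ sudo mount -o ro /dev/sd0a /mnt        # read-only mount, no fsck needed
  $ sudo cp -Rp /mnt/conf /root/bpc-conf   # copy the config off (path is a guess)
  $ sudo umount /mnt
  $ sudo disklabel -E sd0                  # re-label: swap plus smaller data chunks
  $ sudo swapctl -a /dev/sd0b              # enable the new swap partition
  # then newfs each new data partition with more appropriate settings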



Re: fsck segfault on a big partition, 4.6

2010-01-27 Thread Brad Tilley
Whoops... re-reading, I see that I missed your disklabel output... sorry.


On Wed, 27 Jan 2010 17:25 -0500, Brad Tilley b...@16systems.com wrote:
 On Wed, 27 Jan 2010 20:43 +0000, Rob Sheldon r...@associatedtechs.com
 wrote:
 
 [snip]
 
  softraid0 at root
  root on sd1a swap on sd1b dump on sd1b
  
  ...that's odd, it's showing swap (and dump) on sd1b, but there's no such
  thing:
  
  $ sudo df /dev/sd1b
  df: /dev/sd1b: Device not configured
 
  ...maybe it really doesn't like running without swap?
 
 It's there. disklabel -vh sd1 and you'll see b is swap. Try swapctl as
 well... also dmesg | grep swap:
 
 root on sd1a swap on sd1b dump on sd1b
  
 
  Oh wait, it's showing only 3G of memory installed. I just physically
  checked the machine, and it has 4 full banks of 2G each. amd64 should be
  able to address that, right?
 
 I think you would need a bigmem enabled kernel.
  
  That could certainly explain why fsck is unhappy.
  
  Thanks,
  
  - R.
  
  -- 
  [__ Robert Sheldon
  [__ Founder, No Problem
  [__ Information technology support and services
  [__ Software and web design and development
  [__ (530) 575-0278
  [__ You must be the change you wish to see in the world. -- Mahatma
  Gandhi



Re: fsck segfault on a big partition, 4.6

2010-01-27 Thread Denis Doroshenko
On 1/28/10, nixlists nixmli...@gmail.com wrote:
  Why kill random processes that may not be misbehaving, and/or cause a
  kernel panic, when you want to kill the process(es) that leak memory or
  are hungry in the first place? It's possible to avoid kernel panics in
  this case IMO, and not kill random processes.

aren't you missing the point of the original comment made by Otto?

consider a situation where all the processes in the system are
behaving, none of them violates its rlimits, but together they
have allocated more memory than the box contains (RAM + swap).

so the OS needs to do something. what should it do? should it just
panic? or maybe losing one process is better than losing them all?
then, what are the criteria for choosing processes to be killed?..

wondering if "random" means the process with PID 1 could be one of them...



Re: fsck segfault on a big partition, 4.6

2010-01-27 Thread bofh
On Wed, Jan 27, 2010 at 8:14 PM, nixlists nixmli...@gmail.com wrote:
 On Wed, Jan 27, 2010 at 7:53 PM, Denis Doroshenko
 denis.doroshe...@gmail.com wrote:
 aren't you missing the point of the original comment made by Otto?

 consider a situation where all the processes in the system are
 behaving, none of them violates its rlimits, but together they
 have allocated more memory than the box contains (RAM + swap).

 The idea is to limit memory such that running out of RAM+swap is not
 possible, or unlikely. You can set the limit on the allowed number of
 processes as well.


$ ulimit -m
971876
$ dmesg | grep real\ mem
real mem  = 1039691776 (991MB)

So... this box should run only one process?

$ ps -auxww|wc
      54     713    4936

If I were to use the max memory usage of each process, I would need a
53Gig ram machine?


-- 
http://www.glumbert.com/media/shift
http://www.youtube.com/watch?v=tGvHNNOLnCk
This officer's men seem to follow him merely out of idle curiosity.
-- Sandhurst officer cadet evaluation.
Securing an environment of Windows platforms from abuse - external or
internal - is akin to trying to install sprinklers in a fireworks
factory where smoking on the job is permitted.  -- Gene Spafford
learn french:  http://www.youtube.com/watch?v=30v_g83VHK4



Re: fsck segfault on a big partition, 4.6

2010-01-27 Thread Ted Unangst
Obviously, as any competent sysadmin like nixlists knows, you should  
restrict all your processes to a max of 20 megs.


On Jan 27, 2010, at 9:23 PM, bofh goodb...@gmail.com wrote:


On Wed, Jan 27, 2010 at 8:14 PM, nixlists nixmli...@gmail.com wrote:

On Wed, Jan 27, 2010 at 7:53 PM, Denis Doroshenko
denis.doroshe...@gmail.com wrote:

aren't you missing the point of the original comment made by Otto?

consider a situation where all the processes in the system are
behaving, none of them violates its rlimits, but together they
have allocated more memory than the box contains (RAM + swap).


The idea is to limit memory such that running out of RAM+swap is not
possible, or unlikely. You can set the limit on the allowed number of
processes as well.



$ ulimit -m
971876
$ dmesg | grep real\ mem
real mem  = 1039691776 (991MB)

So... this box should run only one process?

$ ps -auxww|wc
      54     713    4936

If I were to use the max memory usage of each process, I would need a
53Gig ram machine?


--
http://www.glumbert.com/media/shift
http://www.youtube.com/watch?v=tGvHNNOLnCk
This officer's men seem to follow him merely out of idle curiosity.
-- Sandhurst officer cadet evaluation.
Securing an environment of Windows platforms from abuse - external or
internal - is akin to trying to install sprinklers in a fireworks
factory where smoking on the job is permitted.  -- Gene Spafford
learn french:  http://www.youtube.com/watch?v=30v_g83VHK4




fsck segfault on a big partition, 4.6

2010-01-26 Thread Rob Sheldon
Hi,

So, the short version is that I have a server with OpenBSD 4.6 that can't
fsck its big partition; fsck fails with a segfault every time. If I ulimit
-d unlimited before fsck'ing, it just takes a little longer to segfault.
It produces no other output. IIRC, the partition is roughly 6 TB. Two
questions then: is there any way through this that doesn't involve
newfs'ing the partition, and is there a right way to do a partition of
that size in OpenBSD given fsck's 1G hard limit?

The longer version: this is a backup server running backuppc for a
corporate client (large enough number of workstations) that does research
work (some really big files). I _thought_ I had read the big filesystem
FAQ carefully, but somehow missed that fsck simply couldn't handle anything
over 1TB without doing funny things during the fs setup. So, this
particular partition was backuppc's data directory, and it was set up with
the default block sizes. Also possibly noteworthy: there's no swap, the OS
and other partitions are all running off of a USB flash drive for various
reasons.

If I have to wipe the partition and start over, it's not a disaster. This
was a newer server, the old backup server was still online and still had
some disk left, so I get to keep my butt out of a sling. But, if I'm going
to have to do that, then I also need to consider whether it might just be
better to use a different OS. (No foul intended, I'm a big fan of OpenBSD,
but it just might not be the right tool for this job.)

There's no dmesg attached because I'm not on-site with the server at the
moment, and because AFAICT this is a known problem.

Thanks,

- R.

-- 
[__ Robert Sheldon
[__ Founder, No Problem
[__ Information technology support and services
[__ Software and web design and development
[__ (530) 575-0278
[__ You must be the change you wish to see in the world. -- Mahatma
Gandhi



Re: fsck segfault on a big partition, 4.6

2010-01-26 Thread L. V. Lammert
On Wed, 27 Jan 2010, Rob Sheldon wrote:

 Hi,

 So, the short version is that I have a server with OpenBSD 4.6 that can't
 fsck its big partition; fsck fails with a segfault every time. If I ulimit
 -d unlimited before fsck'ing, it just takes a little longer to segfault.
 It produces no other output. IIRC, the partition is roughly 6 TB. Two
 questions then: is there any way through this that doesn't involve
 newfs'ing the partition, and is there a right way to do a partition of
 that size in OpenBSD given fsck's 1G hard limit?

Don't know if this is related to a problem I had on a machine recently, ..
however I found that if I hung the 'bad' drive on ANOTHER machine, the
fsck ran just fine!

Might be worth a try, ..

Lee



Re: fsck segfault on a big partition, 4.6

2010-01-26 Thread Tobias Ulmer
On Wed, Jan 27, 2010 at 12:38:47AM +0000, Rob Sheldon wrote:
 Hi,
 
 So, the short version is that I have a server with OpenBSD 4.6 that can't
 fsck its big partition; fsck fails with a segfault every time. If I ulimit
 -d unlimited before fsck'ing, it just takes a little longer to segfault.
 It produces no other output. IIRC, the partition is roughly 6 TB. Two
 questions then: is there any way through this that doesn't involve
 newfs'ing the partition, and is there a right way to do a partition of
 that size in OpenBSD given fsck's 1G hard limit?

Amd64 allows 8G. Increase the newfs block size to 64k (make sure you don't
run out of inodes); that should lessen the memory requirements a bit
and make fsck runs a little faster.

I have my doubts about OpenBSD as a (backup) file server with large
filesystems; there might be a more appropriate OS for the job.
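
As a sketch of what that might look like (standard newfs(8) flags; the
exact numbers are illustrative and should be tuned to the expected file
sizes):

  # 64k blocks, 8k frags, and one inode per 512k of data instead of the
  # default one per 8k -- far fewer inodes for fsck to track
  $ sudo newfs -b 65536 -f 8192 -i 524288 /dev/rsd0a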



Re: fsck segfault on a big partition, 4.6

2010-01-26 Thread Otto Moerbeek
On Wed, Jan 27, 2010 at 12:38:47AM +0000, Rob Sheldon wrote:

 Hi,
 
 So, the short version is that I have a server with OpenBSD 4.6 that can't
 fsck its big partition; fsck fails with a segfault every time. If I ulimit
 -d unlimited before fsck'ing, it just takes a little longer to segfault.
 It produces no other output. IIRC, the partition is roughly 6 TB. Two
 questions then: is there any way through this that doesn't involve
 newfs'ing the partition, and is there a right way to do a partition of
 that size in OpenBSD given fsck's 1G hard limit?

No, there is no other way. I've posted a small piece of code some time
ago that estimates, during newfs, the amount of mem needed for doing an fsck.

These days, amd64 is the only platform that increases the limit
(MAXDSIZ) to 8G. Though you venture into untested territory, we
(myself at least) just do not have the hardware to test anything
beyond 2T.
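
(A rough back-of-the-envelope illustration, not from Otto's code: at the
default density of one inode per 8192 bytes, a 6TB filesystem carries on
the order of 800 million inodes, and fsck_ffs keeps per-inode state in
memory, so even a few bytes per inode blows past i386's 1G data segment:

  $ echo $((6 * 1024 * 1024 * 1024 * 1024 / 8192))
  805306368

which is why lowering the inode density at newfs time helps so much.)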

 
 The longer version: this is a backup server running backuppc for a
 corporate client (large enough number of workstations) that does research
 work (some really big files). I _thought_ I had read the big filesystem
 FAQ carefully, but somehow missed that fsck simply couldn't handle anything
 over 1TB without doing funny things during the fs setup. So, this
 particular partition was backuppc's data directory, and it was set up with
 the default block sizes. Also possibly noteworthy: there's no swap, the OS
 and other partitions are all running off of a USB flash drive for various
 reasons.

The SEGVs may be related to not having swap. Running OpenBSD in
an overcommitted state is not what you want.

 
 If I have to wipe the partition and start over, it's not a disaster. This
 was a newer server, the old backup server was still online and still had
 some disk left, so I get to keep my butt out of a sling. But, if I'm going
 to have to do that, then I also need to consider whether it might just be
 better to use a different OS. (No foul intended, I'm a big fan of OpenBSD,
 but it just might not be the right tool for this job.)
 
 There's no dmesg attached because I'm not on-site with the server at the
 moment, and because AFAICT this is a known problem.

A pity, since it does matter what platform you run on. fsck needing a
lot of memory is indeed a known problem, but the SEGVs are not. You
might want to check if they still occur when you have enough swap.

-Otto