Re: [uml-devel] When /tmp is not tmpfs.

Rob Landley Sat, 26 Nov 2005 03:48:06 -0800

On Friday 25 November 2005 20:12, Nix wrote:
> If it's a problem you have both hostile users and no size limits on /tmp
> and you therefore have bigger problems anyway. :)


The size limits on /tmp aren't per-user.

> >> Yeah, true, if you think the OOM killer is worthwhile (I do: most of the
> >> MM hackers don't. I know who knows more about the Linux kernel's MM and
> >> it's not me!)
> >
> > Its heuristics are crap (many cases breaking them), and the concept is
> > crap: damn hell, a C programmer has been taught to check that malloc()
> > can return NULL, not that he should patch a kernel to get a meaningful
> > behaviour.
>
> Yeah, but it does sort of work. Personally I prefer to just never run out
> of memory :)

My laptop has 512 megs of ram, and 700 megs of swap.  I'm running QEMU to boot 
a knoppix image with 256 megs of ram, running UML to build gcc 4 (which has a 
high water mark of disk usage somewhere north of 128 megs).  I have two 
konqueror windows open with an average of 30 tabs in each.  I have kmail open 
with a threaded view of linux-kernel with 69,649 messages in that folder.  
Plus the general overhead for kde, two standalone pdf viewers, several 
terminal windows, a partridge in a pear tree, and so on.

It's been a few weeks since I've triggered the OOM killer, but I've done it.

> > However, the idea of an OOM could be made to work, if you can kill an app
> > based on the derivative of its memory usage (i.e. how fast usage has
> > increased over the last moments).
>
> ... and the VM appears to be growing things that might help in that area :)

We get better as time goes on.

My original point was that what UML wants here is shared memory semantics.  
It's trusting /tmp to provide different behavior than simply using ~, and 
this turns out to be a very unreliable assumption.  There is a directory 
(/dev/shm) whose entire definition is to provide those semantics, and which 
shouldn't even _exist_ if it doesn't.  I believe that would be a better 
directory to use.
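
To make the idea concrete, here is a rough sketch of how a caller could 
check whether a candidate directory really is tmpfs before trusting it.  
This is not the existing find_tempdir() code: is_tmpfs() and pick_shm_dir() 
are hypothetical names made up for illustration, and the fallback policy 
(use the last candidate if nothing is tmpfs) is just an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/vfs.h>

#ifndef TMPFS_MAGIC
#define TMPFS_MAGIC 0x01021994  /* same value as in <linux/magic.h> */
#endif

/* Return 1 if `path` sits on tmpfs, 0 if it does not or statfs() fails. */
int is_tmpfs(const char *path)
{
    struct statfs st;

    assert(path != NULL);
    if (statfs(path, &st) != 0)
        return 0;
    return st.f_type == TMPFS_MAGIC;
}

/* Hypothetical helper: return the first candidate that is backed by tmpfs,
 * or the last candidate as a fallback if none of them is. */
const char *pick_shm_dir(const char *const *candidates, size_t n)
{
    size_t i;

    assert(candidates != NULL && n > 0);
    for (i = 0; i < n; i++)
        if (is_tmpfs(candidates[i]))
            return candidates[i];
    return candidates[n - 1];
}
```

Called with something like { "/dev/shm", "/tmp" }, this prefers /dev/shm 
when it is mounted as tmpfs and only falls back to /tmp otherwise.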

I can submit a patch for this.  It's arch/um/os-Linux/mem.c, line 37, in 
find_tempdir().

And while I'm at it, os-Linux/start_up.c has a check_tmpexec() that has "/tmp" 
hardwired into its messages, even if that's not what find_tempdir() 
returned...

> >> > Using /tmp for anything has been kind of discouraged for a while,
> >> > because throwing any insufficiently randomized filename in there is a
> >> > security hole waiting to happen.
> >>
> >> Um, atomically create a directory,
> >
> > DoS-able if filenames are predictable...
>
> ... with a random name, obviously. :)

Like "/tmp/uml.ctl" in arch/um/drivers/daemon_kern.c, line 70?

(It's not obvious where this file is actually created, it's one of those funky 
callback things where data in a structure is used somewhere else...)
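
For comparison, a minimal sketch of the "atomically create a directory with 
a random name" approach using mkdtemp().  This is not what daemon_kern.c 
does; make_private_ctl_path() and the "uml-XXXXXX" / "ctl" names are 
hypothetical, picked only to illustrate the idea.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1       /* for mkdtemp() under strict -std= modes */
#endif
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper, not UML code.  Creates a fresh directory with an
 * unpredictable name (mode 0700) under `base` and writes the path of a
 * control socket inside it to `out`.  Returns 0 on success, -1 on error. */
int make_private_ctl_path(const char *base, char *out, size_t outlen)
{
    char dir[256];
    int n;

    assert(base != NULL && out != NULL);
    n = snprintf(dir, sizeof(dir), "%s/uml-XXXXXX", base);
    if (n < 0 || (size_t)n >= sizeof(dir))
        return -1;
    if (mkdtemp(dir) == NULL)   /* fails rather than reuse an existing name */
        return -1;
    n = snprintf(out, outlen, "%s/ctl", dir);
    if (n < 0 || (size_t)n >= outlen)
        return -1;
    return 0;
}
```

Since the directory is created 0700 and owned by the caller, a fixed name 
inside it (here "ctl") can no longer be squatted on by another user.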

> > Never seen anybody doing it, IIRC. Not even mkstemp() (even if today I
> > discover mkdtemp()).
>
> Oh. I do it all the time. I prefer not to work under the assumption that
> I'm more brilliant than thirty years of Unix hackers and spotted
> something none of them did, but so be it...

30 years ago the Unix hackers were working on a 16-bit PDP-11 with two RK05 
disk packs storing 2.5 megabytes each.  And the reason they duplicated /bin 
and /sbin and /lib under /usr is that they ran out of space on the root disk 
and had to leak the OS into the second disk pack which had previously held 
all the user home directories.  And people never revisited this decision for 
the next three decades, despite the fact the "needed for early boot" 
rationale was entirely a pragmatic thing of the moment, and makes _no_ sense 
on a modern system ever since the invention of the initial ramdisk, let alone 
initramfs.  I personally symlink /bin, /sbin, and /lib to the 
corresponding /usr directories and consolidate the whole mess.  Yes, 
you have to patch gcc's paths (in collect2) to not search _both_ /lib 
and /usr/lib because if gnu's linker finds the same symbols in two different 
libraries it statically links them in rather than trying to figure out which 
one is right, resulting in executables as big as if they're statically linked 
but still refusing to run if they can't find their shared libraries at run 
time.  That's a bug in ld.

The point is, it's important to know _what_ conclusions the 30 years of unix 
hackers came to, but keep in mind that the computing environment of 2005 is 
in some ways very different from the computing environments of 1976 or 1984.

> > - back in 2.4, tmpfs on /tmp broke mkinitrd since it tried to loop-mount
> > the new initrd, which was in /tmp. And loop-mount over tmpfs didn't work.
>
> Ah, well, I never use initrd if I can avoid it, and a bug in one tool is
> a reason to *fix that tool*, not rejig the whole damn system.

I agree initrd is kinda pointless, but initramfs isn't.  The kernel guys are 
moving towards initramfs being required someday.  These are still nebulous 
future plans with no actual deadline, but they include moving to dynamically 
assigned major/minor numbers (so you need something like udev to 
populate /dev), and having userspace find and mount the real root partition 
(so that when you're booting from a USB key but your root partition lives on 
an NFS server, userspace can dhcp itself an address, nslookup the server 
name, and then log in with a public key from said USB stick).  All the 
various partitioning schemes could be moved over to device mapper.  And so on.

They'd proposed a serious kernel crapectomy "for 2.7" back before 2.7 got put 
on indefinite hold.  How they're rolling it out now, we dunno.  They seem to 
be happy chewing their current mouthful, at the moment...

> (and `mount', of course, only lists mounts if you trust /proc/mounts to
> be accurate.

If the kernel doesn't know what's mounted, you have bigger problems.

> What does it look like in this brave new world of shared 
> subtrees?

I had this discussion on the kernel list a week or so back: namespaces are 
reference counted, so as soon as the last process that can see a mount goes 
away, umount happens.  This means that umount -a should only zap everything 
in your current namespace, so after init kills all sub-processes it can 
then run umount -a for pid 1, and life is good.

I had this discussion because I wanted to make sure busybox umount would be 
doing it right.

> Obviously /etc/mtab *must* be a symlink to /proc/mounts, now, 
> only oops that breaks the quota tools...)

I rewrote busybox mount so that things work properly with /proc/mounts.  And I 
vaguely remember coming up with an in-house patch to fix the quota tools 
(they were upset by rootfs) something like four years ago.

> > (Btw, the problem was that he added a new external disk, but labeled it
> > /boot, like an existing /boot partition , so mount -a choked with
> > "duplicate label '/boot'" and it stopped before mounting /home).
>
> I think now is an appropriate time to say
>
> I HATE FSCKING MTAB
>
> (in three-part harmony, probably)

Everybody hates /etc/mtab.  It doesn't work if you chroot.  It can't handle 
--bind or --move mounts...  Just symlink it to /proc/mounts and recognize 
that any tool that can't handle that is a buggy tool that needs to be fixed.
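
A tool that goes through the getmntent() interface doesn't much care which 
file it reads, so pointing it at /proc/mounts is cheap.  A rough sketch 
(is_mounted() is a hypothetical helper name, not taken from busybox or the 
quota tools):

```c
#include <assert.h>
#include <mntent.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: return 1 if `dir` is a mount point according to the
 * mount table in `table` (for example "/proc/mounts"), 0 if it is not, and
 * -1 if the table cannot be opened. */
int is_mounted(const char *table, const char *dir)
{
    FILE *fp;
    struct mntent *m;
    int found = 0;

    assert(table != NULL && dir != NULL);
    fp = setmntent(table, "r");
    if (fp == NULL)
        return -1;
    while ((m = getmntent(fp)) != NULL) {
        if (strcmp(m->mnt_dir, dir) == 0) {
            found = 1;
            break;
        }
    }
    endmntent(fp);
    return found;
}
```

The same function works on /etc/mtab or /proc/mounts, since both use the 
fstab-style line format that getmntent() parses.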

> >> You've never used dar in infinint mode or watched large matrix maths
> >> stuff churn through to completion :/ there really are things with insane
> >> memory requirements and good locality of reference. (I think the most I
> >> ever saw dar eat was 15Gb of swap. *gah*)
> >
> > Boy, be serious - we are talking about normal systems, and you know that
> > you'd better run dar on properly sized systems...
>
> I still boggle that infinint mode is the default for that tool.

First time I've heard of the tool, but then back under 2.4.7 I remember I had 
rsync regularly triggering the OOM killer.  Not because rsync was leaking, 
but because the servers backing up only had 128 megs of memory and the 
balancing was _terrible_ so the dentry cache and page cache would squeeze out 
anonymous pages to the point where rsync itself got OOM killed...

People who want truly insane amounts of memory these days (often for graphics 
or video editing) tend to mmap their data files directly and work in there.  
Once again rendering insane amounts of swap less useful...

I'm under the vague impression there's some kind of madvise you can do that 
says "don't flush this before close unless you're responding to memory 
pressure".  Hmmm...  Closest I can find is MADV_RANDOM...
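
For reference, a minimal sketch of the "mmap the data file and work in 
there" pattern with a madvise() hint attached.  map_data_file() is a 
hypothetical helper name.  Note that MADV_RANDOM is documented as a hint 
about access order (it mainly makes readahead less aggressive), so as far 
as I can tell it says nothing about when dirty pages get written back.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1       /* for madvise() under strict -std= modes */
#endif
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: set the size of `path` to `len` bytes (creating it
 * if necessary) and map it read/write and shared.  Returns NULL on failure. */
void *map_data_file(const char *path, size_t len)
{
    int fd;
    void *p;

    assert(path != NULL && len > 0);
    fd = open(path, O_RDWR | O_CREAT, 0600);
    if (fd < 0)
        return NULL;
    if (ftruncate(fd, (off_t)len) != 0) {
        close(fd);
        return NULL;
    }
    p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);                  /* the mapping outlives the descriptor */
    if (p == MAP_FAILED)
        return NULL;
    /* Advisory only, so a failure here is deliberately ignored. */
    (void)madvise(p, len, MADV_RANDOM);
    return p;
}
```

The caller works on the returned pointer like ordinary memory and calls 
munmap() when done; the page cache, not swap, backs the data.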

If we had a "treat this like it's on tmpfs" madvise, that would be ideal...

Rob
-- 
Steve Ballmer: Innovation!  Inigo Montoya: You keep using that word.
I do not think it means what you think it means.


_______________________________________________
User-mode-linux-devel mailing list
User-mode-linux-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/user-mode-linux-devel
