sed on
tradition, are more convincing.
--
Bryan Henderson IBM Almaden Research Center
San Jose CA Filesystems
-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo
benefits from late
allocation: one that creates a lot more virtual memory than it ever
touches. For example, a sparse array. Or am I missing something?
--
Bryan Henderson IBM Almaden Research Center
San Jose CA Filesystems
efore c. 2000, this was
the only mode. Now the default is late allocation mode, which is similar
to Linux.
--
Bryan Henderson IBM Almaden Research Center
San Jose CA Filesystems
thinks it's entitled to and it
can reduce its memory footprint to improve its speed. It can even check
whether an access to readahead data caused a page fault; if so, it knows
reading ahead is actually making things worse and can therefore reduce
readahead until the page faults stop happening.
as allocation maps and pointer blocks from mtime,
permissions, etc.
--
Bryan Henderson IBM Almaden Research Center
San Jose CA Filesystems
the same time)?
--
Bryan Henderson IBM Almaden Research Center
San Jose CA Filesystems
-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
committing of the size change
without O_SYNC? That seems wrong to me.
This does need to be documented carefully, because a person could easily
believe, even subconsciously, that O_DIRECT makes the entire file write
direct, and sloppy documentation might actually use words t
>the
>maintainers of util-linux have well versed autotool people at their disposal,
>so i really dont see this as being worrisome.
As long as that is true, I agree that the fact that so many autotool
packages don't work well is irrelevant.
However, I think the difficulty of using autotools (I
>i dont see how blaming autotools for other people's misuse is relevant
Here's how other people's misuse of the tool can be relevant to the choice
of the tool: some tools are easier to use right than others. Probably the
easiest thing to use right is the system you designed and built yourself.
anything
and then tell you it's there. Indeed, whether you use open/close or some
other kind of transaction, just pausing the application doesn't help. If
you were to implement open/close transactions, the filesystem driver would
just wait for the application to close and in the meant
pplication for a few seconds. During that time it delays
processing of service requests, but every request ultimately goes through,
with the requester probably not noticing any difference.
If a system claims that snapshot function in the filesystem alone gets you
consistent backups, it'
"single" update. Also, data and metadata updates
remain buffered at the kernel level after a close. And don't forget that
a single update may span multiple files.
--
Bryan Henderson IBM Almaden Research Center
San Jose CA Filesystems
>md/raid already works happily with different sized drives from
>different manufacturers ...
>So I still cannot see anything particularly new.
As compared to md of conventional disk partitions, it brings the ability
to create and delete arrays without shutting down all use of the physical
disks
>If your only purpose is to try generate a defensive patent, then just
>dumping the idea in the public domain serves the same purpose, probably
>better.
>
>I have a few patents, some of which are defensive. That has not prevented
>the USPTO issuing quite a few patents that are in clear violation of
escribed, wildly successful, that means
users are willing to tolerate this level of breakage, so it could be used
for versioning too.
But I think I'd rather see a truly hidden directory for this (visible only
when looked up explicitly).
--
Bryan Henderson IBM Alm
w the dot files by default, having been
burned too many times by invisible files).
I assume NetApp flags the directory specially so that a POSIX directory
read doesn't get it. I've seen that done elsewhere.
The same thing, by the way, is possible with Jack's filename:version idea,
and
feeling that
I compromised in order not to involve the kernel.
Of course, if you want to do it with snapshots and COW, you'll have to ask
where in the kernel to put that, but that's not a file versioning
question; it's the larger snapshot question.
--
Bryan Henderson
keep one
of those for a week, and keep one of those for a month, etc. This works
even without snapshot technology and even without sub-file deltas. But of
course, it's better with those.
--
Bryan Henderson IBM Almaden Research Center
San Jose CA
bc's fread/fwrite, which
don't have partial transfers. GNU libc does handle partial (kernel) reads
and writes correctly. I'd be surprised if someone can name a major
application that doesn't.
--
Bryan Henderson IBM Almaden Research Center
San Jose CA
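As a rough illustration of what handling a partial (kernel) write means, a caller of write() can loop until every byte has been accepted. This is only a sketch under stated assumptions: write_all is a hypothetical helper name, not a libc function, and it is not taken from GNU libc's implementation.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: keep calling write() until all len bytes have
 * been written or a real error occurs. Returns 0 on success, -1 on
 * error (errno is left as set by write()). */
static int write_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;

    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted before any data moved: retry */
            return -1;
        }
        p += n;                 /* partial write: skip what was accepted */
        len -= (size_t)n;
    }
    return 0;
}
```

A stream function such as fwrite() hides this kind of loop from its caller, which is why the partial transfer is only visible at the read()/write() level.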
>> I don't get this. If you mean partitions defined by the classic DOS
>> partition table format, then AFAICS, such a partition can start in any
>> sector.
>
>Only at "logical cylinder boundary" (except for the first partition).
That's a requirement in ancient DOS systems that use CHS addre
>DOS partitions start partitions on odd-numbered sectors
I don't get this. If you mean partitions defined by the classic DOS
partition table format, then AFAICS, such a partition can start in any
sector.
>so presuming you have odd-aligned disks, life is good.
What is an odd-aligned disk?
>On Wed, Jan 10, 2007 at 09:38:11AM -0800, Bryan Henderson wrote:
>> >Other people are of the opinion that the invention of the symbolic link
>> >was a huge mistake.
>>
>> I guess I haven't heard that one. What is the argument that we were
>> bet
>Other people are of the opinion that the invention of the symbolic link
>was a huge mistake.
I guess I haven't heard that one. What is the argument that we were
better off without symbolic links?
--
Bryan Henderson San Jose California
IBM Almaden Rese
is
explains why the diff optimization is so important.
--
Bryan Henderson San Jose California
IBM Almaden Research Center Filesystems
t would be
good to see what the positives are.
--
Bryan Henderson IBM Almaden Research Center
San Jose CA Filesystems
, in which case there's no practical way for the reading client to
know his cache is stale. When the updater and reader use the same client,
we can do better, but if I'm not mistaken, the NFS protocol does not
require us to do so. And probably more relevant: the user wouldn't exp
ble inode number might fall in that
category.
I fully agree that much effort should be put into making inode numbers
work the way POSIX demands, but I also know that that sometimes requires
more than just writing some code.
--
Bryan Henderson San Jose Californi
>On Fri, 2006-12-29 at 10:08 -0800, Bryan Henderson wrote:
>> >On Thu, 2006-12-28 at 16:44 -0800, Bryan Henderson wrote:
>> >> >Statement 1:
>> >> >If two files have identical st_dev and st_ino, they MUST be hardlinks of
>> >> >
>On Thu, 2006-12-28 at 16:44 -0800, Bryan Henderson wrote:
>> >Statement 1:
>> >If two files have identical st_dev and st_ino, they MUST be hardlinks of
>> >each other/the same file.
>> >
>> >Statement 2:
>> >If two "files" are a
st insignificant. And I think
infinitesimally small must mean infinite.
--
Bryan Henderson IBM Almaden Research Center
San Jose CA Filesystems
to impractical.
People tend to demand that restore programs faithfully restore what was
backed up. (I've even seen requirements that the inode numbers upon
restore be the same). Given the difficulty of dealing with multi-linked
files, not to mention various nonstandard file at
he object you're looking for, but often can't be
used to _find_ that object. For the latter, you need an address. There are
lots of examples where you can't practically use the same value for both
an identifier and an address.
--
Bryan Henderson IBM A
s long as the filesystems
are mounted, etc.).
--
Bryan Henderson IBM Almaden Research Center
San Jose CA Filesystems
mple policy change might solve the problem as well as the more complex
approach of getting an individual filesystem driver more involved in
memory management.
--
Bryan Henderson IBM Almaden Research Center
San Jose CA Filesystems
u additional data transfer capacity.
I just don't see the panacea so far.
--
Bryan Henderson IBM Almaden Research Center
San Jose CA Filesystems
the on-line recovery," so maybe you're
talking about something different. I would think that if you fail the
only branch that has current data on it (the "newer branch"?) that
recovery would be pretty much over.
--
Bryan Henderson IBM Almaden Research Cent
cal filesystems,
members of a RAIF set. I change one file. One file in each member
filesystem gets updated, and I again have two identical filesystems.
How would a cloneset work differently, and how would it be better?
>This type of logic is great for backups.
Can you give an example of
hem unreliable (or unscalable) in the same ways as an NFS
server.
--
Bryan Henderson IBM Almaden Research Center
San Jose CA Filesystems
>Have you looked at how we're dealing with this in NFSv4?
No.
--
Bryan Henderson IBM Almaden Research Center
San Jose CA Filesystems
r open when the
filesystem driver has said the entire operation is complete.
--
Bryan Henderson IBM Almaden Research Center
San Jose CA Filesystems
er tidbit of information I just verified: umount of "."
unmounts from the top of the stack, as opposed to unmounting the stuff you
would see if you did "ls .". So this is all consistent.
--
Bryan Henderson IBM Almaden Research Center
San Jose CA
possible with one of the two behaviors we've been discussing, I'd say
that "." stands for the name by which you looked up that directory in the
first place (so in this case, it's equivalent to mount ... /mnt). And
that means I would expect the new mount to obscure the a
ctory that's already been
mounted over (such that the stacking behavior is relevant). That seems
really eccentric.
--
Bryan Henderson IBM Almaden Research Center
San Jose CA Filesystems
t into the stack,
since you have no way to refer to the covered directory -- it's no longer
in the namespace.
I have no idea if that clarifies the shared subtree dilemma, but you ask
if there's any pressing need for the current behavior, and I would have to
say no, beca
>On Thu, 28 Jul 2005, Andrew Morton wrote:
>> Martin Jambor <[EMAIL PROTECTED]> wrote:
>> >
>> > Do filesystems try to relocate the data from bad blocks of the
>> > device?
>
>Only Windows NTFS, not others AFAIK (most filesystems can mark them during
>mkfs, that's all).
>
>> Nope. Disks will do tha
>I don't see how the following is tortured:
>
>enum {
> PNODE_MEMBER_VFS = 0x01,
> PNODE_SLAVE_VFS = 0x02
>};
Only because it's using a facility that's supposed to be for enumerated
types for something that isn't. If it were a true enumerated type, the
codes for the enumeration
>If it's really enumerated data types, that's fine, but this example was
>about bitfield masks.
Ah. In that case, enum is a pretty tortured way to declare it, though it
does have the practical advantages over define that have been mentioned
because the syntax is more rigorous.
The proper way
I wasn't aware anyone preferred defines to enums for declaring enumerated
data types. The practical advantages of enums are slight, but as far as I
know, the practical advantages of defines are zero. Isn't the only
argument for defines, "that's what I'm used to."?
Two advantages of the enum d
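The two spellings being compared in this exchange can be put side by side. The PNODE_* enumerators come from the quoted example; the DEF_* macros and the two helper functions are hypothetical, added only to make the comparison concrete.

```c
/* Bitmask constants declared with an anonymous enum, as in the quoted
 * example. The enumerators are real C identifiers that the compiler
 * sees and scopes. */
enum {
    PNODE_MEMBER_VFS = 0x01,
    PNODE_SLAVE_VFS  = 0x02
};

/* The same constants as macros (hypothetical names). The preprocessor
 * replaces these textually before the compiler proper runs. */
#define DEF_PNODE_MEMBER_VFS 0x01
#define DEF_PNODE_SLAVE_VFS  0x02

/* Flags are combined with bitwise OR, so the result (0x03 here) is not
 * one of the enumerators. That is why the value is carried in a plain
 * unsigned int rather than in a variable of an enumerated type. */
static unsigned int both_flags(void)
{
    return PNODE_MEMBER_VFS | PNODE_SLAVE_VFS;
}

/* Test a single bit in a flag word. */
static int is_slave(unsigned int flags)
{
    return (flags & PNODE_SLAVE_VFS) != 0;
}
```

The combined value 0x03 is the point of contention: it is a legal flag word but not a member of the enumeration, which is what makes enum an awkward fit for masks even where it is practical.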
s
have their simple U-G-O and the more creative ones do something more
complex.
I'm not opposed, by the way, to an implementation that just does U-G-O (or
even just U) if it's done in a way amenable to future extension.
--
Bryan Henderson IBM Almaden Research
, it's
technically possible for two processes to see different files as the same
name, if one opened the directory before a mount and the other after.
"Mounting over" is a curse.
--
Bryan Henderson IBM Almaden Research Center
San Jose CA
thname for the
lifetime of the mount. But as between multiple processes on the same
system at the same time, yeah, the directory has one name.
(statements above have to be modified for chroot, btw).
--
Bryan Henderson IBM Almaden Research Center
San Jose CA
ed systems don't even use them for file permissions.
So I hesitate to tie anything else to them.
--
Bryan Henderson IBM Almaden Research Center
San Jose CA Filesystems
multiple processes, with or without the same uid, can see user-mounted
files if they want.
- a process can opt not to see user-mounted files, even if it has the
same uid as processes that do.
I'm not saying how I would implement this; there's enough discussion over
the desired r
ould take a
major new concept to have a different kind of group of processes for
namespace purposes, and then we probably wouldn't want to base it on uid,
because uid means other things already. Why tie them together?
--
Bryan Henderson IBM Almaden Research Cente
ge
without ever mounting it, and access a file in it without ever adding it
to the master file namespace.
>bringing up 2) completely in the userspace.
That part's another issue. The user-controls-his-namespace aspect of it
has been commented on at length in this and another current thread
de. I know these
are trivial connections, because I work around them by supplying
a dummy inode (and sometimes a dummy superblock) with a few
fields filled in.
(Incidentally, _I_ am actually using address spaces for file caches; I
just can't tie them to the files in the t
ernel parsing code small
I personally almost never worry about the number of bytes of code, but I
worry a lot about its simplicity. User space code is less costly to
develop and less risky to make a mistake in. I would add,
3) Keeping the kernel parsing code simple
ed the errno
issue, I wanted to comment on that one independently.
--
Bryan Henderson IBM Almaden Research Center
San Jose CA Filesystems
that are dependent on local word size and endianness. Lots of them do.
--
Bryan Henderson IBM Almaden Research Center
San Jose CA Filesystems
ns as for Reiserfs, of course.
To be really exact, it's OK for the blocks to move, as long as it doesn't
do so so subtly that the user doesn't know to rerun the LILO installer.
E.g. you can move the blocks of the kernel file if someone overwrites it.
--
Bryan Henderson
options to
be private to one particular user space program. Especially one that
isn't even packaged with the driver.
--
Bryan Henderson IBM Almaden Research Center
San Jose CA Filesystems
courage filesystem writers to do such stupid
>things as ncpfs/smbfs do. In fact I'm totally unhappy that nfs4 went
>down that road.
Which road is that?
--
Bryan Henderson IBM Almaden Research Center
San Jose CA Filesystems
>Make a ->compat_read_super() just like we have a ->compat_ioctl()
>method for files, if you want to suggest a solution like what
>you describe.
Even better.
--
Bryan Henderson IBM Almaden Research Center
San Jose CA F
of which have an
argument which is the address of a block of memory that contains other
addresses. fs/compat.c approaches these in a more
filesystem-type-independent way than it does mount(), but still not
independent enough.
--
Bryan Henderson IBM Almaden R
ocks used, blocks free, inodes used, inodes
free. They make sense for the original Unix File System, but get harder
to give meaning with every new generation.
--
Bryan Henderson IBM Almaden Research Center
San Jose CA Filesystems
u can consider
at this layer for getting at file data is VFS ->read.
--
Bryan Henderson San Jose California
IBM Almaden Research Center Filesystems
"via inodes." I don't know what "via blocks" means.
--
Bryan Henderson IBM Almaden Research Center
San Jose CA Filesystems
re in more eclectic places.
>the NFS client itself had to defer actually
>putting reads on the wire until someone requested the lock
But really, you mean the client had to defer putting reads on the wire
until someone was ready to use the data. That suggests a call to
->sync_page in fil
>what it
>*really* means to be called in sync_page() is that you're being told
>that some process is about to block on that page. For what reason, you
>can't know from the call alone.
Ugh. IOW it barely means anything.
--
Bryan Henderson IBM Almad
appropriate to do given that information.
I agree that for the conventional filesystem and device types for which
this interface was designed, the appropriate response would be to start
any queued I/O.
--
Bryan Henderson IBM Almaden Research Center
San Jose CA
I forgot you were talking about code inside the kernel. In that case,
filemap_sync().
>Is there an existing interface to force it to check if the page is
>dirty
The msync() system call and libc function does that. And then it does the
same thing as fsync().
--
Bryan Henderson IBM Almaden Research Center
San J
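For a user-space reader, the msync() call mentioned in the reply above could be used roughly as follows. This is a minimal sketch, assuming a regular file opened read/write and a non-zero length; write_via_mmap is a hypothetical helper name.

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: size the file, store data through a shared
 * mapping, then use msync(MS_SYNC) to ask the kernel to write the
 * dirty pages back before returning. len must be greater than zero.
 * Returns 0 on success, -1 on failure. */
static int write_via_mmap(int fd, const char *data, size_t len)
{
    void *map;
    int rc;

    if (ftruncate(fd, (off_t)len) != 0)
        return -1;

    map = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (map == MAP_FAILED)
        return -1;

    memcpy(map, data, len);             /* dirties the mapped page(s) */
    rc = msync(map, len, MS_SYNC);      /* blocks until written back */

    munmap(map, len);
    return rc;
}
```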
>well, we *could* know ... never map this page writable. have a per-vma
>flag that says "emulate writes", and call the filesystem to update
>backing storage before returning to the application.
Ah yes, you mean, I take it, that the page fault handler would look at the
user's program and emulate
t_page_dirty.
Without knowing what properties of not having a cache you were hoping for,
I couldn't say what alternative would be closest to this.
Hypothetically, if you had a backing storage device that could do memory
mapped I/O, you could have mmapped direct I/O.
--
Bryan Henderson
>The problem appears to be mixing calls to lseek64 with calls to fread
>and fwrite.
Oh, of course. I didn't see that. You can't use the file descriptor of a
file that is opened as a stream. This test case uses the fileno()
function to mess with the internals of the stream.
fseeko64() is the
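To illustrate the point about not mixing descriptor-level seeks with stream I/O, here is a small sketch that stays entirely within the stream API. It assumes a POSIX system and uses fseeko() with a 64-bit off_t rather than the fseeko64() spelling; put_byte_at is a hypothetical helper name, not part of the test case under discussion.

```c
#define _POSIX_C_SOURCE 200809L
#define _FILE_OFFSET_BITS 64    /* 64-bit off_t on 32-bit glibc targets */
#include <stdio.h>
#include <sys/types.h>

/* Hypothetical helper: write one byte at the given offset using only
 * stream calls, so the stream's buffer and file position stay
 * consistent. Returns 0 on success, -1 on failure. */
static int put_byte_at(FILE *fp, off_t offset, unsigned char byte)
{
    if (fseeko(fp, offset, SEEK_SET) != 0)
        return -1;
    if (fwrite(&byte, 1, 1, fp) != 1)
        return -1;
    return fflush(fp) == 0 ? 0 : -1;
}
```

Calling lseek() on fileno(fp) instead would move the kernel's file offset behind the stream's back, which is the kind of interference the reply describes.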
>I found
>that for larger values, your test program is returning -1, but unsigned
>it appears as 18446744073709551615.
You mean you ran it? Then what about the more interesting question of
what your filesize ends up to be? You say JFS allows files up to 2**52
bytes, so I expect the test case w
e allowed by the filesystem in some cases, so
the write isn't happening, which means you should get a failure return
code.
In the results you showed, the filesize ends up being a little less than
2^48, which is not a place that you wrote ever.
--
Bryan Henderson IBM Al
>Sounds reasonable. The thing with "reservation" is that people use
>it in daily life with all kinds of meanings,
That's the way it is all over. Normal people are very sloppy in their
language. Engineers have to try to narrow the meanings of the common
words to avoid totally confusing each oth
es, while "reserve" just means to make arrangements so
that a future allocate will succeed. For example, if you know you need up
to 10 blocks of memory to complete a task without deadlocking, but you
don't yet know exactly how many, you would reserve 10 blocks and
later,
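The reserve/allocate distinction drawn in this snippet can be sketched with a toy block counter. Everything here is an assumption made for illustration: the struct pool type and the pool_* function names are hypothetical and do not correspond to any kernel interface.

```c
#include <stddef.h>

/* Toy accounting for a pool of blocks. Invariant: reserved <= free_blocks. */
struct pool {
    size_t free_blocks;   /* blocks not yet allocated */
    size_t reserved;      /* free blocks promised to some task */
};

/* Reserve: make arrangements so that n future allocations will succeed.
 * Nothing is handed out yet. Returns 0 on success, -1 if the pool
 * cannot guarantee n more blocks. */
static int pool_reserve(struct pool *p, size_t n)
{
    if (p->free_blocks - p->reserved < n)
        return -1;
    p->reserved += n;
    return 0;
}

/* Allocate one block against an existing reservation. This cannot fail
 * for lack of space as long as the caller still holds a reservation. */
static int pool_alloc_reserved(struct pool *p)
{
    if (p->reserved == 0)
        return -1;
    p->reserved--;
    p->free_blocks--;
    return 0;
}

/* Give back the part of a reservation that turned out to be unneeded. */
static void pool_unreserve(struct pool *p, size_t n)
{
    if (n > p->reserved)
        n = p->reserved;
    p->reserved -= n;
}
```

In the example from the text, a task would call pool_reserve(&p, 10) up front, allocate only as many blocks as it ends up needing, and release the remainder of the reservation afterwards.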
>>>118 total. When I attempt to mount the 57th one, I
>>>get "Too many mounted Filesystems"
>>
>>
>> Sorry I don't know what the limitations are for non-anonymous filesystems.
>> 57 seems a bit unusual though.
>
>Is that the exact error message?
>Can you post the kernel message log with that mes
>I cannot seem to increase the maximum number of
>filesystems on my Red hat system...
What is your evidence of the maximum that you can't increase? (E.g. does
something fail? How?)
--
Bryan Henderson San Jose California
IBM Almaden Rese
them or what the filesystem does with them.
In particular, there's no reason to give up the character stream notion of
a file and start talking about blocks just to have visible cleared regions
(holes).
--
Bryan Henderson IBM Almaden Resear
ending development effort on exploiting file sparseness,
I'd rather see it spent implementing a clear (aka punch) system call
first. Or has that been done when I wasn't looking?
--
Bryan Henderson IBM Almaden Research Center
San Jose CA
e) cause lower throughput in the
non-writepages case (it seems more likely that the lower throughput causes
the smaller I/Os).
--
Bryan Henderson IBM Almaden Research Center
San Jose CA Filesystems
and syncs to a single file?
--
Bryan Henderson IBM Almaden Research Center
San Jose CA Filesystems
/O above the I/O scheduler as
inside it.
--
Bryan Henderson IBM Almaden Research Center
San Jose CA Filesystems
ys sending I/O to
the device even though the device is idle) going on?
--
Bryan Henderson IBM Almaden Research Center
San Jose CA Filesystems
he I/O
scheduler not building large I/Os out of small requests? Is the queue
running dry while the device is actually busy?
--
Bryan Henderson San Jose California
IBM Almaden Research Center Filesystems
>Or actually we wouldn't
>even care if stale pages are added as they would still be cleared in
>readpage(). And pages found and uptodate and locked simply need to be
>marked dirty and released again and if not uptodate they need to be
>cleared first.
You do need some form of locking to make sure
arge-block filesystem driver does the nopage thing, and does in fact
fill in files unnecessarily in this scenario. :-( The driver for the
same filesystems on AIX does not, though. It has the write protection
thing.
--
Bryan Henderson IBM Almaden Research Center
San Jose CA
he and generic writer).
--
Bryan Henderson IBM Almaden Research Center
San Jose CA Filesystems
sec/s wsec/s rkB/s wkB/s avgrq-sz avgqu-sz await svctm %util
>sdc 0.00 1110.58 0.00 97.80 0.00 201821.96 0.00 100910.98 2063.53 117.09 1054.11 10.21 99.84
>
>So, in this case I think it is making a difference 1k merges and a big difference in
>t
ny natural way), you need to throw out not
only the generic file read/write routines, but the page cache as well.
Every time I've looked at multi-page bios, I've been unable to see any
reason that they would be faster than multiple single-page bios. But I
haven't seen any experim
very filesystem driver having its own code.
--
Bryan Henderson IBM Almaden Research Center
San Jose CA Filesystems
A filesystem driver is supposed to be able to use the page cache for file
caching without involving the buffer cache, isn't it? I can't find any
examples of it, but I heard that was the case.
I had a filesystem driver doing just that with Linux 2.4.2, but now the
interface it used is no longe
I posted this earlier, but it was right at the time that linux-fsdevel
got swamped with a linux-kernel discussion, so I don't think anyone saw
it.
I have discovered, looking at Linux 2.4.2, that the read-only status of a
mount is considered in some places to be a matter of file permissions, and
>What we ought to do in 2.5.early (possibly - in 2.4) is to
>add ->max_page to address_space. I.e. ->i_size in pages
I don't get it. What would address_space.max_page mean and how would you
use it? Obviously, you don't really mean for it to be defined as
inode.i_size in pages, since then it wo
>IMO preemptive kernel patches are an
>exercise in masturbation (bad algorithm that can be preempted at any point
>is still a bad algorithm and should be fixed, not hidden)
What does this mean? What is a preemptive kernel patch and what kind of
bad algorithm are you contemplating, and what doe
Bryan:
>> introduced, it was not practical to update every filesystem driver, so the
>> Big Kernel Lock (BKL) was added to give those drivers the uninterrupted
>> access they (may) expect. You may surmise that a "lookup" routine doesn't
>> need such uninterrupted access, but you can never r
I think what may have gotten lost in Alexander's detailed reply is the big
picture on the BKL in VFS. The issue of the BKL protecting ->lookup is
the same as for every other VFS call:
A whole bunch of filesystem drivers were designed in a time when there
could be only one CPU, and coupled wit