Re: the 'official' point of view expressed by kernelnewbies.org regarding reiser4 inclusion

2006-08-01 Thread Toby Thain


On 1-Aug-06, at 4:15 AM, Jeffrey V. Merkey wrote:


...I was and have remained loyal to Linux through it all.


Except for that little fling with SCO, eh?

Off topic, but no more so than your self-aggrandising.

--T



Re: the 'official' point of view expressed by kernelnewbies.org regarding reiser4 inclusion

2006-08-01 Thread Jan Engelhardt

 A filesystem with a fixed number of inodes (= not readjustable while
 mounted) is ehr.. somewhat unuseable for a lot of people with
 big and *flexible* storage needs (Talking about NetApp/EMC owners)

Which is untrue at least for Solaris, which allows resizing a live file
system. FreeBSD and Linux require an unmount.

Only for shrinking.


Jan Engelhardt
-- 


Re: Solaris ZFS on Linux [Was: Re: the 'official' point of view expressed by kernelnewbies.org regarding reiser4 inclusion]

2006-08-01 Thread Theodore Tso
On Mon, Jul 31, 2006 at 09:41:02PM -0700, David Lang wrote:
 just because you have redundancy doesn't mean that your data is idle enough 
 for you to run a repacker with your spare cycles. to run a repacker you 
 need a time when the chunk of the filesystem that you are repacking is not 
 being accessed or written to. it doesn't matter if that data lives on one 
 disk or 9 disks all mirroring the same data, you can't just break off 1 of 
 the copies and repack that because by the time you finish it won't match 
 the live drives anymore.
 
 database servers have a repacker (vacuum), and they are under tremendous 
 pressure from their users to avoid having to use it because of the 
 performance hit that it generates. (the theory in the past is exactly what 
 was presented in this thread, make things run faster most of the time and 
 accept the performance hit when you repack). the trend seems to be for a 
 repacker thread that runs continuously, causing a small impact all the time 
 (that can be calculated into the capacity planning) instead of a large 
 impact once in a while.

Ah, but as soon as the repacker thread runs continuously, then you
lose all or most of the claimed advantage of wandering logs.
Specifically, the claim of the wandering log is that you don't have
to write your data twice --- once to the log, and once to the final
location on disk (whereas with ext3 you end up having to do double
writes).  But if the repacker is running continuously, you end up
doing double writes anyway, as the repacker moves things from a
location that is convenient for the log, to a location which is
efficient for reading.  Worse yet, if the repacker is moving disk
blocks or objects which are no longer in cache, it may end up having
to read objects in before writing them to a final location on disk.
So instead of a write-write overhead, you end up with a
write-read-write overhead.

But of course, people tend to disable the repacker when doing
benchmarks because they're trying to play the "my filesystem/database
has bigger performance numbers than yours" game.

- Ted


Re: Solaris ZFS on Linux [Was: Re: the 'official' point of view expressed by kernelnewbies.org regarding reiser4 inclusion]

2006-08-01 Thread Avi Kivity

Theodore Tso wrote:


Ah, but as soon as the repacker thread runs continuously, then you
lose all or most of the claimed advantage of wandering logs.
Specifically, the claim of the wandering log is that you don't have
to write your data twice --- once to the log, and once to the final
location on disk (whereas with ext3 you end up having to do double
writes).  But if the repacker is running continuously, you end up
doing double writes anyway, as the repacker moves things from a
location that is convenient for the log, to a location which is
efficient for reading.  Worse yet, if the repacker is moving disk
blocks or objects which are no longer in cache, it may end up having
to read objects in before writing them to a final location on disk.
So instead of a write-write overhead, you end up with a
write-read-write overhead.



There's no reason to repack *all* of the data.  Many workloads write and 
delete whole files, so file data should be contiguous.  The repacker 
would only need to move metadata and small files.


--
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.



Re: Solaris ZFS on Linux [Was: Re: the 'official' point of view expressed by kernelnewbies.org regarding reiser4 inclusion]

2006-08-01 Thread Adrian Ulrich

 So ZFS isn't state-of-the-art?

Of course it's state-of-the-art (on Solaris ;-) )

 
 WAFL is for high-turnover filesystems on RAID-5 (and assumes flash memory
 staging areas). 

 s/RAID-5/RAID-4/

 Not your run-of-the-mill desktop...

The WAFL-Thing was just a joke ;-)


Regards,
 Adrian


Re: Solaris ZFS on Linux [Was: Re: the 'official' point of view expressed by kernelnewbies.org regarding reiser4 inclusion]

2006-08-01 Thread Adrian Ulrich
 suspect, particularly with 7200/min (s)ATA crap. 

Quoting myself (again):
 A quick'n'dirty ZFS-vs-UFS-vs-Reiser3-vs-Reiser4-vs-Ext3 'benchmark'

Yeah, the test ran on a single SATA hard disk (quick'n'dirty).
I'm so sorry, but I don't have access to a $$$ RAID system at home. 

Anyway: the test shows us that Reiser4 performed very well on my
(common, run-of-the-mill '08/15') hardware.


 sdparm --clear=WCE /dev/sda   # please.

How about using /dev/emcpower* for the next benchmark?

I might be able to re-run it in a few weeks if people are interested
and if I receive constructive suggestions (e.g. Postmark parameters,
mkfs options, etc.).


Regards,
 Adrian



Re: metadata plugins (was Re: the 'official' point of view expressed by kernelnewbies.org regarding reiser4 inclusion)

2006-08-01 Thread Christian Trefzer
On Mon, Jul 31, 2006 at 10:57:35AM -0500, David Masover wrote:

 Wil Reichert wrote:

 Any idea how the fragmentation resulting from re-syncing the tree
 affects performance over time?
 
 Yes, it does affect it a lot.  I have no idea how much, and I've never 
 benchmarked it, but purely subjectively, my portage has gotten slower 
 over time.

Delayed allocation still performs a lot better here than the v3
immediate allocation. In addition, tree balancing operations are
performed on flush as well, so what you get on disk is basically an
almost-optimal tree. Of course, this will change a bit over time, but
with v4 it takes a lot longer for that to happen than with v3 afaict.
There _has_ been some worthwhile development in the meantime : )

Kind regards,
Chris




Re: Solaris ZFS on Linux [Was: Re: the 'official' point of view expressed by kernelnewbies.org regarding reiser4 inclusion]

2006-08-01 Thread Matthias Andree
Adrian Ulrich wrote on 2006-08-01:

  suspect, particularly with 7200/min (s)ATA crap. 
 
 Quoting myself (again):
  A quick'n'dirty ZFS-vs-UFS-vs-Reiser3-vs-Reiser4-vs-Ext3 'benchmark'
 
 Yeah, the test ran on a single SATA-Harddisk (quick'n'dirty).
 I'm so sorry but i don't have access to a $$$ Raid-System at home. 

I'm not asking you to perform testing on a RAID system with
SCSI or SAS, but I consider the obtained data (I am focusing on
transactions per unit of time) highly suspicious, and suspect write
caches might have contributed their share - I haven't seen a drive that
shipped with its write cache disabled in the past few years.

  sdparm --clear=WCE /dev/sda   # please.
 
 How about using /dev/emcpower* for the next benchmark?

No, it is valid to run the test on commodity hardware, but if you (or
rather the benchmark) are claiming "transactions", I tend to think
"ACID", and I highly doubt any 200 GB SATA drive manages 3000
synchronous writes per second without causing either serious
fragmentation or background block moving.

This is a figure I'd expect for synchronous random access to RAM disks
that have no seek or rotational latencies (and research on hybrid
disks with flash or other nonvolatile fast random-access media to cache
actual rotating magnetic platter access is going on elsewhere).
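
To put rough numbers on that, here is a back-of-the-envelope sketch (the 7200 rpm and 8 ms average seek figures are generic assumptions of mine, not measurements of the drive under test):

/* Back-of-the-envelope bound on truly synchronous writes per second
 * for a rotating disk; assumed generic figures, not drive specs. */
#include <stdio.h>

int main(void)
{
    double rpm         = 7200.0;
    double rev_ms      = 60000.0 / rpm;   /* one revolution: ~8.33 ms   */
    double avg_rot_ms  = rev_ms / 2.0;    /* average rotational latency */
    double avg_seek_ms = 8.0;             /* assumed average seek       */

    /* Random synchronous writes pay a seek plus rotational latency each. */
    double random_iops = 1000.0 / (avg_seek_ms + avg_rot_ms);

    /* Even sequential, cache-disabled commits commonly lose a full
     * revolution between small commits. */
    double seq_commits = 1000.0 / rev_ms;

    printf("random sync writes/s:              ~%.0f\n", random_iops);
    printf("sequential sync commits/s (1/rev): ~%.0f\n", seq_commits);
    printf("claimed in the benchmark: 3000 -> write caching or batching involved\n");
    return 0;
}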

I didn't mean to say your particular drive was crap, but 200GB SATA
drives are low end, like it or not -- still, I have one in my home
computer because these Samsung SP2004C are so nicely quiet.

 I might be able to re-run it in a few weeks if people are interested
 and if I receive constructive suggestions (e.g. Postmark parameters,
 mkfs options, etc.).

I don't know Postmark; I did suggest turning the write cache off. If
your system uses hdparm -W0 /dev/sda instead, go ahead. But you're
right to collect and evaluate suggestions first if you don't want to run
a new benchmark every day :)

-- 
Matthias Andree


Re: reiser4: maybe just fix bugs?

2006-08-01 Thread Hans Reiser
Andrew Morton wrote:

On Mon, 31 Jul 2006 10:26:55 +0100
Denis Vlasenko [EMAIL PROTECTED] wrote:

  

The reiser4 thread seems to be longer than usual.



Meanwhile here's poor old me trying to find another four hours to finish
reviewing the thing.
  

Thanks Andrew.

The writeout code is ugly, although that's largely due to a mismatch between
what reiser4 wants to do and what the VFS/MM expects it to do.

I agree --- both with it being ugly, and that being part of why.

  If it
works, we can live with it, although perhaps the VFS could be made smarter.
  

I would be curious regarding any ideas on that.  Next time I read
through that code, I will keep in mind that you are open to making VFS
changes if it improves things, and I will try to get clever somehow and
send it by you.  Our squalloc code, though, is, I must say, the most
complicated and ugliest piece of code I have ever worked on, in which every
cumulative ugliness had a substantive performance advantage requiring us
to keep it.  If you spare yourself from reading it, that is
understandable.

I'd say that reiser4's major problem is the lack of xattrs, acls and
direct-io.  That's likely to significantly limit its vendor uptake.  (As
might the copyright assignment thing, but is that a kernel.org concern?)
  

Thanks to you and the batch write code, direct io support will now be
much easier to code, and it probably will get coded the soonest of those
features.  acls are on the todo list, but doing them right might require
solving a few additional issues (finishing the inheritance code, etc.)

The plugins appear to be wildly misnamed - they're just an internal
abstraction layer which permits later feature additions to be added in a
clean and safe manner.  Certainly not worth all this fuss.

Could I suggest that further technical critiques of reiser4 include a
file-and-line reference?  That should ease the load on vger.

Thanks.


  




Re: Solaris ZFS on Linux [Was: Re: the 'official' point of view expressed by kernelnewbies.org regarding reiser4 inclusion]

2006-08-01 Thread Matthias Andree
On Tue, 01 Aug 2006, Avi Kivity wrote:

 There's no reason to repack *all* of the data.  Many workloads write and 
 delete whole files, so file data should be contiguous.  The repacker 
 would only need to move metadata and small files.

Move small files? What for?

Even if it is only moving metadata, it is not different from what ext3
or xfs are doing today (rewriting metadata from the intent log or block
journal to the final location).

The UFS+softupdates from the BSD world looks pretty good at avoiding
unnecessary writes (at the expense of a long-running but nice background
fsck after a crash, which is however easy on the I/O as of recent FreeBSD
versions).  That was their main point against logging/journaling, BTW,
but they are porting XFS as well to save those that need instant,
complete recovery.

-- 
Matthias Andree


Re: the 'official' point of view expressed by kernelnewbies.org regarding reiser4 inclusion

2006-08-01 Thread Hans Reiser
Matthias Andree wrote:


Have you ever seen VxFS or WAFL in action?



No I haven't. As long as they are commercial, it's not likely that I
will.
  

WAFL was well done.  It has several innovations that I admire,
including quota trees, non-support of fragments for performance reasons,
and the basic WAFL notion applied to the special (though important) case
of NFS over RAID.


Re: Solaris ZFS on Linux [Was: Re: the 'official' point of view expressed by kernelnewbies.org regarding reiser4 inclusion]

2006-08-01 Thread Avi Kivity

Matthias Andree wrote:


On Tue, 01 Aug 2006, Avi Kivity wrote:

 There's no reason to repack *all* of the data.  Many workloads write and
 delete whole files, so file data should be contiguous.  The repacker
 would only need to move metadata and small files.

Move small files? What for?



WAFL-style filesystems like contiguous space,  so if small files are 
scattered in otherwise free space, the repacker should free them.



Even if it is only moving metadata, it is not different from what ext3
or xfs are doing today (rewriting metadata from the intent log or block
journal to the final location).



There is no need to repack all metadata; only that which helps in 
creating free space.


For example: if you untar a source tree you'd get mixed metadata and 
small file data packed together, but there's no need to repack that data.



--
error compiling committee.c: too many arguments to function



Re: Solaris ZFS on Linux [Was: Re: the 'official' point of view expressed by kernelnewbies.org regarding reiser4 inclusion]

2006-08-01 Thread Hans Reiser
Theodore Tso wrote:

On Mon, Jul 31, 2006 at 09:41:02PM -0700, David Lang wrote:
  

just because you have redundancy doesn't mean that your data is idle enough 
for you to run a repacker with your spare cycles. to run a repacker you 
need a time when the chunk of the filesystem that you are repacking is not 
being accessed or written to. it doesn't matter if that data lives on one 
disk or 9 disks all mirroring the same data, you can't just break off 1 of 
the copies and repack that because by the time you finish it won't match 
the live drives anymore.

database servers have a repacker (vacuum), and they are under tremendous 
pressure from their users to avoid having to use it because of the 
performance hit that it generates. (the theory in the past is exactly what 
was presented in this thread, make things run faster most of the time and 
accept the performance hit when you repack). the trend seems to be for a 
repacker thread that runs continuously, causing a small impact all the time 
(that can be calculated into the capacity planning) instead of a large 
impact once in a while.



Ah, but as soon as the repacker thread runs continuously, then you
lose all or most of the claimed advantage of wandering logs.
  

"Wandering logs" is a term specific to reiser4, and I think you are making
a more general remark.

You are missing the implications of the oft-cited statistic that 80% of
files never or rarely move.  You are also missing the implications of
the repacker being able to do larger IOs than occur under a random tiny-IO
workload hitting a filesystem that is performing allocations
on the fly.

Specifically, the claim of the wandering log is that you don't have
to write your data twice --- once to the log, and once to the final
location on disk (whereas with ext3 you end up having to do double
writes).  But if the repacker is running continuously, you end up
doing double writes anyway, as the repacker moves things from a
location that is convenient for the log, to a location which is
efficient for reading.  Worse yet, if the repacker is moving disk
blocks or objects which are no longer in cache, it may end up having
to read objects in before writing them to a final location on disk.
So instead of a write-write overhead, you end up with a
write-read-write overhead.

But of course, people tend to disable the repacker when doing
benchmarks because they're trying to play the "my filesystem/database
has bigger performance numbers than yours" game.
  

When the repacker is done, we will, just for you, run one of our
benchmarks the morning after the repacker has run (and reference this
email) ;-)  That was what you wanted us to do to address your
concern, yes? ;-)

   - Ted


  




Re: Solaris ZFS on Linux [Was: Re: the 'official' point of view expressed by kernelnewbies.org regarding reiser4 inclusion]

2006-08-01 Thread Avi Kivity

Matthias Andree wrote:


No, it is valid to run the test on commodity hardware, but if you (or
rather the benchmark) are claiming "transactions", I tend to think
"ACID", and I highly doubt any 200 GB SATA drive manages 3000
synchronous writes per second without causing either serious
fragmentation or background block moving.

You are assuming 1 transaction = 1 sync write.  That's not true.  
Databases and log filesystems can get much more out of a disk write.
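
A minimal sketch of that point (my own illustration with arbitrary record sizes, not any particular database's commit path): group commit lets one physical fsync() cover many logical transactions.

/* Minimal group-commit sketch (illustration only): N transactions are
 * appended to an in-memory log buffer and persisted with a single
 * fsync(), so one physical sync write covers many logical transactions. */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

#define BATCH 64

int main(void)
{
    int fd = open("grouplog.bin", O_WRONLY | O_CREAT | O_APPEND, 0644);
    if (fd < 0) { perror("open"); return 1; }

    char buf[BATCH * 128];
    size_t used = 0;

    for (int txn = 0; txn < BATCH; txn++) {
        /* Each "transaction" is just a fixed-format record here. */
        used += (size_t)snprintf(buf + used, sizeof(buf) - used,
                                 "txn %03d: update record\n", txn);
    }

    /* One write plus one fsync makes all BATCH transactions durable. */
    if (write(fd, buf, used) != (ssize_t)used) { perror("write"); return 1; }
    if (fsync(fd) != 0) { perror("fsync"); return 1; }

    printf("%d transactions committed with a single fsync()\n", BATCH);
    close(fd);
    return 0;
}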



--
error compiling committee.c: too many arguments to function



Re: the 'official' point of view expressed by kernelnewbies.org regarding reiser4 inclusion

2006-08-01 Thread Helge Hafting
On Mon, Jul 31, 2006 at 05:59:58PM +0200, Adrian Ulrich wrote:
 Hello Matthias,
 
  This looks rather like an education issue rather than a technical limit.
 
 We aren't talking about the same issue: I was asking to do it
 on-the-fly. Umounting the filesystem, running e2fsck and resize2fs
 is something different ;-)
 
  Which is untrue at least for Solaris, which allows resizing a live file
  system. FreeBSD and Linux require an unmount.
 
 Correct: You can add more inodes to a Solaris UFS on-the-fly if you are
 lucky enough to have some free space available.
 
 A colleague of mine happened to create a ~300gb filesystem and started
 to migrate Mailboxes (Maildir-style format = many small files (1-3kb))
 to the new LUN. At about 70% the filesystem ran out of inodes; Not a
 big deal with VxFS because such a problem is fixable within seconds.
 What would have happened if he had used UFS? mkfs -G wouldn't work
 because he had no additional Diskspace left... *ouch*..
 
This case is solvable by planning.  When you know that the new fs
must be created with all inodes from the start, simply count
how many you need before migration.  (And add a decent safety margin.)
That's what I do with my home machine as disks wear out every third 
year or so.  The ext2/3 tools (df -i, dumpe2fs) tell how many inodes are
in use, and the new fs can be made accordingly.  The approach works for
bigger machines too, of course.

Helge Hafting



Re: Solaris ZFS on Linux [Was: Re: the 'official' point of view expressed by kernelnewbies.org regarding reiser4 inclusion]

2006-08-01 Thread Jan Engelhardt

I didn't mean to say your particular drive were crap, but 200GB SATA
drives are low end, like it or not --

And you think an 18 GB SCSI disk just does it better because it's SCSI?
Esp. in long sequential reads.


Jan Engelhardt
-- 


Re: Solaris ZFS on Linux [Was: Re: the 'official' point of view expressed by kernelnewbies.org regarding reiser4 inclusion]

2006-08-01 Thread Jan Engelhardt

Wandering logs is a term specific to reiser4, and I think you are making
a more general remark.

So, what is UDF's wandering log then?



Jan Engelhardt
-- 


Re: reiser4 can now bear with filled fs, looks stable to me...

2006-08-01 Thread Maciej Sołtysiak
Hello David,

Monday, July 31, 2006, 11:46:34 PM, you wrote:
 You must be new here...
;-)

I wanted to point out that because:
 Options B and C are all that ever seems to happen when reiserfs-list and
 lkml collide.

and:
   The speed of a nonworking program is irrelevant.
   The cost-effectiveness of an impossible solution is irrelevant.

maybe the more important thing is to make it easier for people to use r4 on
their own (rpms, debs, apt/gentoo repositories, etc.) than to push that hard
for kernel inclusion.

Currently the r4 patch is very easy to apply; you can apply it on top
of heavily patched kernels with little or no fuzz, which is very good.

But, as Hans wrote earlier, not every user knows how to patch. In Ubuntu,
for example, it is fairly easy (and encouraged by the official
forums/wikis) for those users to add additional repositories using Synaptic
or Adept or by editing /etc/apt/sources.list.

I mean, there were huge objections against FUSE too, remember? But Miklos
built a steady and growing userbase.

Maybe that is something to realize, Hans: we don't need kernel inclusion to
have a growing userbase (or at least a steady one).

A side note: the only time reiserfs-list and lkml did not collide (that much)
was when Andrew Morton was commenting and when Christoph made a list of
things to fix - that cost people some nerves but it nevertheless was
more productive than the usual flamewars.

-- 
Best regards,
Maciej




Re: reiser4: maybe just fix bugs?

2006-08-01 Thread Vladimir V. Saveliev
Hello

On Mon, 2006-07-31 at 20:18 -0600, Hans Reiser wrote:
 Andrew Morton wrote:

 The writeout code is ugly, although that's largely due to a mismatch between
 what reiser4 wants to do and what the VFS/MM expects it to do.

Yes. reiser4 writes out atoms. Most pages get into atoms via
sys_write, but pages dirtied via a shared mapping do not; they get into
atoms in reiser4's writepages address-space operation. That is why
reiser4_sync_inodes has two steps: the first calls
generic_sync_sb_inodes, which invokes writepages for dirty inodes to capture
pages dirtied via shared mappings into atoms; the second flushes the atoms.
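
Schematically (this is a self-contained userspace mock, not the actual reiser4 code; every name except generic_sync_sb_inodes, mentioned above, is a hypothetical stand-in), the two steps look like this:

/* Schematic userspace mock -- not the actual reiser4 code. */
#include <stdio.h>

struct super_block { const char *name; };

/* Step 1: pages dirtied via sys_write are already in atoms; pages
 * dirtied through shared mappings are captured here, the way
 * generic_sync_sb_inodes() -> writepages does in the real code. */
static void capture_mmap_dirty_pages(struct super_block *sb)
{
    printf("[%s] step 1: capture mmap-dirtied pages into atoms\n", sb->name);
}

/* Step 2: write out whole atoms so each transaction hits disk as a unit. */
static void flush_all_atoms(struct super_block *sb)
{
    printf("[%s] step 2: flush atoms\n", sb->name);
}

static void example_sync_inodes(struct super_block *sb)
{
    capture_mmap_dirty_pages(sb);
    flush_all_atoms(sb);
}

int main(void)
{
    struct super_block sb = { .name = "reiser4-mock" };
    example_sync_inodes(&sb);
    return 0;
}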

 
 I agree --- both with it being ugly, and that being part of why.
 
   If it
 works, we can live with it, although perhaps the VFS could be made smarter.
   
 
 I would be curious regarding any ideas on that.  Next time I read
 through that code, I will keep in mind that you are open to making VFS
 changes if it improves things, and I will try to get clever somehow and
 send it by you.  Our squalloc code though is I must say the most
 complicated and ugliest piece of code I ever worked on for which every
 cumulative ugliness had a substantive performance advantage requiring us
 to keep it.  If you spare yourself from reading that, it is
 understandable to do so.
 
 I'd say that reiser4's major problem is the lack of xattrs, acls and
 direct-io.  That's likely to significantly limit its vendor uptake. 

xattrs is really a problem.





Re: metadata plugins (was Re: the 'official' point of view expressed by kernelnewbies.org regarding reiser4 inclusion)

2006-08-01 Thread Christian Trefzer
On Mon, Jul 31, 2006 at 06:05:01PM +0200, Łukasz Mierzwa wrote:

 I guess that extents are much harder to reuse than normal inodes, so when you  
 have something as big as the portage tree filled with nano files which are  
 being modified all the time, then you just can't keep performance all the  
 time. You can always tar, rm -fr /usr/portage, untar, and you will probably  
 speed things up a lot.

I submitted a script to this list which takes care of everything
required to recreate your fs. It even converts between different
filesystems, for migration purposes or comparative tests, and currently
supports ext2|3, reiser3|4 and xfs.

The thing is undergoing some surgery at the moment to reduce forced disk
flushes. I already replaced the call to sync() after every operation with one
fsync() call on the archive file before the formatting takes place. What
is still missing is functionality to retrieve things like the fs label and
UUID from the existing fs and reuse them during mkfs. Testing is also
pending, so you might not want to hold your breath waiting for the funky
version, the idea of which is to leave everything as it was found,
except for a better disk layout and possibly a changed fs type.

It is a completely different approach from convertfs, which tries to do
the conversion in-place by moving the fs's contents into a new fs
created within a sparse file on the same device and relocating the
sparse file's blocks afterwards. My guess is that a failure of any kind
in the latter process will destroy your data (this was the case last
time I checked), while I do (or at least try to do) everything to ensure
that the tarball is written to the platters before mkfs occurs.

The new version will be posted to wiki.namesys.com ASAP; no timeframe
attached, though, as I have an exam on Thursday, so maybe on Friday, but who
knows. The version already posted to the list works well - I used it at
least a hundred times, even on stuff like /home and /usr (the latter
works only from a live CD or custom initramfs).

Kind regards,
Chris




Re: Solaris ZFS on Linux [Was: Re: the 'official' point of view expressed by kernelnewbies.org regarding reiser4 inclusion]

2006-08-01 Thread Matthias Andree
Jan Engelhardt wrote on 2006-08-01:

 I didn't mean to say your particular drive were crap, but 200GB SATA
 drives are low end, like it or not --
 
 And you think an 18 GB SCSI disk just does it better because it's SCSI?

18 GB SCSI disks are 1999 gear, so who cares?
Seagate didn't sell 200 GB SATA drives at that time.

 Esp. in long sequential reads.

You think SCSI drives aren't on par? Right, they're ahead.
98 MB/s for the fastest SCSI drives vs. 88 MB/s for Raptor 150 GB SATA
and 74 MB/s for the fastest other ATA drives.

(Figures obtained from StorageReview.com's Performance Database.)

-- 
Matthias Andree


Re: Solaris ZFS on Linux [Was: Re: the 'official' point of view expressed by kernelnewbies.org regarding reiser4 inclusion]

2006-08-01 Thread Jan Engelhardt
 I didn't mean to say your particular drive were crap, but 200GB SATA
 drives are low end, like it or not --
 
 And you think an 18 GB SCSI disk just does it better because it's SCSI?

18 GB SCSI disks are 1999 gear, so who cares?
Seagate didn't sell 200 GB SATA drives at that time.

 Esp. in long sequential reads.

You think SCSI drives aren't on par? Right, they're ahead.
98 MB/s for the fastest SCSI drives vs. 88 MB/s for Raptor 150 GB SATA
and 74 MB/s for the fastest other ATA drives.

Uhuh. And how do they measure that? Did they actually run something like...
  dd_rescue /dev/hda /dev/null
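
For what it's worth, the same kind of sequential-read number can be taken with a few lines of C instead of dd_rescue; this is only a sketch (the path argument is whatever device or large file you point it at), and the page cache will inflate the result unless you read the raw device as root or drop caches first:

/* Rough sequential-read throughput probe (illustration; similar in
 * spirit to timing dd_rescue). */
#include <fcntl.h>
#include <stdio.h>
#include <time.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    if (argc < 2) { fprintf(stderr, "usage: %s <device-or-file>\n", argv[0]); return 1; }

    int fd = open(argv[1], O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }

    enum { BUF = 1 << 20 };              /* 1 MiB reads */
    static char buf[BUF];
    long long total = 0;
    struct timespec t0, t1;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (;;) {
        ssize_t n = read(fd, buf, BUF);
        if (n <= 0) break;               /* EOF or error ends the run */
        total += n;
        if (total >= 1LL << 30) break;   /* stop after 1 GiB */
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
    printf("read %lld MiB in %.2f s: %.1f MB/s\n",
           total >> 20, secs, total / 1e6 / secs);
    close(fd);
    return 0;
}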




Jan Engelhardt
-- 


Re: the 'official' point of view expressed by kernelnewbies.org regarding reiser4 inclusion

2006-08-01 Thread Horst H. von Brand
Bernd Schubert [EMAIL PROTECTED] wrote:
 On Monday 31 July 2006 21:29, Jan-Benedict Glaw wrote:
  The point is that it's quite hard to really fuck up ext{2,3} with only
  some KB being written while it seems (due to the
  fragile^Wsophisticated on-disk data structures) that it's just easy to
  kill a reiser3 filesystem.

 Well, I was once very 'lucky' and after a system crash (*) e2fsck put
 all files into lost+found. Sure, I never experienced this again, but I
 also never experienced something like this with reiserfs. So please, stop
 this kind of FUD against reiser3.6.

It isn't FUD. One data point doesn't allow you to draw conclusions.

Yes, I've seen/heard of ext2/3 failures and data loss too. But at least
the same number for ReiserFS. And I know ReiserFS is outnumbered 10 to 1 or so
in my sample, so that would indicate a roughly 10-fold higher probability of
catastrophic data loss, other factors being mostly the same.

 While filesystem speed is nice, it also would be great if reiser4.x would be 
 very robust against any kind of hardware failures.

Can't have both.
-- 
Dr. Horst H. von Brand   User #22616 counter.li.org
Departamento de Informatica Fono: +56 32 654431
Universidad Tecnica Federico Santa Maria  +56 32 654239
Casilla 110-V, Valparaiso, ChileFax:  +56 32 797513


Re: reiser4: maybe just fix bugs?

2006-08-01 Thread Andrew Morton
On Tue, 01 Aug 2006 15:24:37 +0400
Vladimir V. Saveliev [EMAIL PROTECTED] wrote:

  The writeout code is ugly, although that's largely due to a mismatch 
  between
  what reiser4 wants to do and what the VFS/MM expects it to do.
 
 Yes. reiser4 writeouts atoms. Most of pages get into atoms via
 sys_write. But pages dirtied via shared mapping do not. They get into
 atoms in reiser4's writepages address space operation.

I think you mean ->writepage - reiser4 doesn't implement ->writepages().

I assume you considered hooking into ->set_page_dirty() to do the
add-to-atom thing earlier on?

We'll merge mm-tracking-shared-dirty-pages.patch into 2.6.19-rc1, which
would make that approach considerably more successful, I expect. 
->set_page_dirty() is a bit awkward because it can be called under
spinlock.

Maybe something could also be gained from the new
vm_operations_struct.page_mkwrite(), although that's less obvious...

 That is why
 reiser4_sync_inodes has two steps: on first one it calls
 generic_sync_sb_inodes to call writepages for dirty inodes to capture
 pages dirtied via shared mapping into atoms. Second step flushes atoms.
 
  
  I agree --- both with it being ugly, and that being part of why.
  
If it
  works, we can live with it, although perhaps the VFS could be made smarter.

  
  I would be curious regarding any ideas on that.  Next time I read
  through that code, I will keep in mind that you are open to making VFS
  changes if it improves things, and I will try to get clever somehow and
  send it by you.  Our squalloc code though is I must say the most
  complicated and ugliest piece of code I ever worked on for which every
  cumulative ugliness had a substantive performance advantage requiring us
  to keep it.  If you spare yourself from reading that, it is
  understandable to do so.
  
  I'd say that reiser4's major problem is the lack of xattrs, acls and
  direct-io.  That's likely to significantly limit its vendor uptake. 
 
 xattrs is really a problem.

That's not good.  The ability to properly support SELinux is likely to be
important.


Re: the 'official' point of view expressed by kernelnewbies.org regarding reiser4 inclusion

2006-08-01 Thread Adrian Ulrich

  While filesystem speed is nice, it also would be great if reiser4.x would 
  be 
  very robust against any kind of hardware failures.
 
 Can't have both.

..and some people simply don't care about this:

If you are running a 'big' storage system with battery-protected
write cache, mirroring between 2 datacenters, snapshotting.. etc..
you don't need your filesystem to be super-robust against bad sectors
and such stuff because:

 a) You've paid enough money to let the storage care about
    hardware issues.
 b) If your storage is on fire you can do a failover using the mirror.
 c) And if someone ran dd if=/dev/urandom of=/dev/sda you could
    even roll back your snapshot.
    (Btw: I did this once to a Reiser4 filesystem (overwrote about
    1.2GB); fsck.reiser4 --rebuild-sb was able to fix it.)


..but what you really need is a flexible and **fast** filesystem: like
Reiser4.

(Yeah.. yeah.. I know: ext3 is also flexible and fast.. but Reiser4
simply is *MUCH* faster than ext3 for 'my' workload/application.)

Regards,
 Adrian



Re: the 'official' point of view expressed by kernelnewbies.org regarding reiser4 inclusion

2006-08-01 Thread Alan Cox
On Tue, 2006-08-01 at 16:52 +0200, Adrian Ulrich wrote:
 WriteCache, Mirroring between 2 Datacenters, snapshotting.. etc..
 you don't need your filesystem beeing super-robust against bad sectors
 and such stuff because:

You do, it turns out. It's becoming an issue more and more that the sheer
amount of storage means that the undetected error rate from disks,
hosts, memory, cables and everything else is rising.

There has been a great deal of discussion about this at the filesystem
and kernel summits - and data integrity is getting pushed the way of
networking: end to end, not reliability in the middle.

The sort of changes this needs hit the block layer and every fs.



Re: reiser4: maybe just fix bugs?

2006-08-01 Thread Vladimir V. Saveliev
Hello

On Tue, 2006-08-01 at 07:33 -0700, Andrew Morton wrote:
 On Tue, 01 Aug 2006 15:24:37 +0400
 Vladimir V. Saveliev [EMAIL PROTECTED] wrote:
 
   The writeout code is ugly, although that's largely due to a mismatch 
   between
   what reiser4 wants to do and what the VFS/MM expects it to do.
  
  Yes. reiser4 writeouts atoms. Most of pages get into atoms via
  sys_write. But pages dirtied via shared mapping do not. They get into
  atoms in reiser4's writepages address space operation.
 
 I think you mean ->writepage - reiser4 doesn't implement ->writepages().
 

No,
there is one: reiser4/plugin/file/file.c:writepages_unix_file().

reiser4_writepage just kicks a kernel thread (entd) which works similarly to
reiser4_sync_inodes() (described earlier) and waits until several pages
(including the one reiser4_writepage is called with) are written.

 I assume you considered hooking into ->set_page_dirty() to do the
 add-to-atom thing earlier on?
 

Currently, add-to-atom is not simple. It may require memory allocations
and disk I/Os. I guess these are not supposed to happen in
->set_page_dirty(). That is why in reiser4_set_page_dirty we just mark
the page in the mapping's tree and delay adding it to atoms until flush time.


 We'll merge mm-tracking-shared-dirty-pages.patch into 2.6.19-rc1, which
 would make that approach considerably more successful, I expect. 
 ->set_page_dirty() is a bit awkward because it can be called under
 spinlock.
 
 Maybe something could also be gained from the new
 vm_operations_struct.page_mkwrite(), although that's less obvious...
 
  That is why
  reiser4_sync_inodes has two steps: on first one it calls
  generic_sync_sb_inodes to call writepages for dirty inodes to capture
  pages dirtied via shared mapping into atoms. Second step flushes atoms.
  
   
   I agree --- both with it being ugly, and that being part of why.
   
 If it
   works, we can live with it, although perhaps the VFS could be made 
   smarter.
 
   
   I would be curious regarding any ideas on that.  Next time I read
   through that code, I will keep in mind that you are open to making VFS
   changes if it improves things, and I will try to get clever somehow and
   send it by you.  Our squalloc code though is I must say the most
   complicated and ugliest piece of code I ever worked on for which every
   cumulative ugliness had a substantive performance advantage requiring us
   to keep it.  If you spare yourself from reading that, it is
   understandable to do so.
   
   I'd say that reiser4's major problem is the lack of xattrs, acls and
   direct-io.  That's likely to significantly limit its vendor uptake. 
  
  xattrs is really a problem.
 
 That's not good.  The ability to properly support SELinux is likely to be
 important.
 

Do you think that if reiser4 supported xattrs it would increase its
chances of inclusion?

PS: what exactly were you referring to when you said that the writeout code
is ugly?



Re: metadata plugins (was Re: the 'official' point of view expressed by kernelnewbies.org regarding reiser4 inclusion)

2006-08-01 Thread Łukasz Mierzwa
On Fri, 28 Jul 2006 18:33:56 +0200, Linus Torvalds [EMAIL PROTECTED]
wrote:



In other words, if a filesystem wants to do something fancy, it needs to
do so WITH THE VFS LAYER, not as some plugin architecture of its own. We
already have exactly the plugin interface we need, and it literally _is_
the VFS interfaces - you can plug in your own filesystems with
register_filesystem(), which in turn indirectly allows you to plug in
your per-file and per-directory operations for things like lookup etc.


What fancy things (besides cryptocompress) does reiser4 do now?
Can someone point me to a list of things that are required by the kernel
maintainers to merge reiser4 into vanilla?
I feel like I'm getting lost with the current reiser4 status and the things
that need to be done.


Łukasz Mierzwa


Re: the 'official' point of view expressed by kernelnewbies.org regarding reiser4 inclusion

2006-08-01 Thread David Masover

Alan Cox wrote:

On Tue, 2006-08-01 at 16:52 +0200, Adrian Ulrich wrote:

WriteCache, Mirroring between 2 Datacenters, snapshotting.. etc..
you don't need your filesystem beeing super-robust against bad sectors
and such stuff because:


You do, it turns out. It's becoming an issue more and more that the sheer
amount of storage means that the undetected error rate from disks,
hosts, memory, cables and everything else is rising.


Yikes.  Undetected.

Wait, what?  Disks, at least, would be protected by RAID.  Are you 
telling me RAID won't detect such an error?


It just seems wholly alien to me that errors would go undetected, and 
we're OK with that, so long as our filesystems are robust enough.  If 
it's an _undetected_ error, doesn't that cause way more problems 
(impossible problems) than FS corruption?  Ok, your FS is fine -- but 
now your bank database shows $1k less on random accounts -- is that ok?



There has been a great deal of discussion about this at the filesystem
and kernel summits - and data is getting kicked the way of networking -
end to end not reliability in the middle.


Sounds good, but I've never let discussions by people smarter than me 
prevent me from asking the stupid questions.



The sort of changes this needs hit the block layer and ever fs.


Seems it would need to hit every application also...


Re: reiser4: maybe just fix bugs?

2006-08-01 Thread David Masover

Vladimir V. Saveliev wrote:


Do you think that if reiser4 supported xattrs - it would increase its
chances on inclusion?


Probably the opposite.

If I understand it right, the original Reiser4 model of file metadata is
the file-as-directory stuff that caused such a furor during the last big push
for inclusion (search for "Silent semantic changes in Reiser4"):


foo.mp3/.../rwx# permissions
foo.mp3/.../artist # part of the id3 tag

So I suspect xattrs would just be a different interface to this stuff, 
maybe just a subset of it (to prevent namespace collisions):


foo.mp3/.../xattr/ # contains files representing attributes

Of course, you'd be able to use the standard interface for 
getting/setting these.  The point is, I don't think Hans/Namesys wants 
to do this unless they're going to do it right, especially because they 
already have the file-as-dir stuff somewhat done.  Note that these are 
neither mutually exclusive nor mutually dependent -- you don't have to 
enable file-as-dir to make xattrs work.


I know it's not done yet, though.  I can understand Hans dragging his 
feet here, because xattrs and traditional acls are examples of things 
Reiser4 is supposed to eventually replace.


Anyway, if xattrs were done now, the only good that would come of it is 
building a userbase outside the vanilla kernel.  I can't see it as doing 
anything but hurting inclusion by introducing more confusion about 
plugins.


I could be entirely wrong, though.  I speak for neither 
Hans/Namesys/reiserfs nor LKML.  Talk amongst yourselves...


Re: the 'official' point of view expressed by kernelnewbies.org regarding reiser4 inclusion

2006-08-01 Thread David Masover

Horst H. von Brand wrote:

Bernd Schubert [EMAIL PROTECTED] wrote:


While filesystem speed is nice, it also would be great if reiser4.x would be 
very robust against any kind of hardware failures.


Can't have both.


Why not?  I mean, other than TANSTAAFL, is there a technical reason for 
them being mutually exclusive?  I suspect it's more we haven't found a 
way yet...


Re: metadata plugins (was Re: the 'official' point of view expressed by kernelnewbies.org regarding reiser4 inclusion)

2006-08-01 Thread Vladimir V. Saveliev
Hello

On Tue, 2006-08-01 at 17:32 +0200, Łukasz Mierzwa wrote:
 On Fri, 28 Jul 2006 18:33:56 +0200, Linus Torvalds [EMAIL PROTECTED]
 wrote:
 
  In other words, if a filesystem wants to do something fancy, it needs to
  do so WITH THE VFS LAYER, not as some plugin architecture of its own. We
  already have exactly the plugin interface we need, and it literally _is_
  the VFS interfaces - you can plug in your own filesystems with
  register_filesystem(), which in turn indirectly allows you to plug in
  your per-file and per-directory operations for things like lookup etc.

 What fancy things (besides cryptocompress) does reiser4 do now?

It is supposed to provide the ability to easily modify filesystem behaviour
in various aspects without breaking compatibility.

 Can someone point me to a list of things that are required by the kernel
 maintainers to merge reiser4 into vanilla?

List of features reiser4 does not have now:
O_DIRECT support - we are working on it now
support for various block sizes
quota support
xattrs and acls

List of warnings about the reiser4 code:
I think that the last big list of useful comments (from Christoph Hellwig
[EMAIL PROTECTED]) has been addressed. Well, except for one minor (I
believe) place in file release.

Currently, Andrew is trying to find some time to review reiser4 code.

 I feel like I'm getting lost with current reiser4 status and things that  
 are need to be done.
 
 Łukasz Mierzwa
 



Re: metadata plugins (was Re: the 'official' point of view expressed by kernelnewbies.org regarding reiser4 inclusion)

2006-08-01 Thread David Masover

Christian Trefzer wrote:

On Mon, Jul 31, 2006 at 10:57:35AM -0500, David Masover wrote:

Wil Reichert wrote:


Any idea how the fragmentation resulting from re-syncing the tree
affects performance over time?
Yes, it does affect it a lot.  I have no idea how much, and I've never 
benchmarked it, but purely subjectively, my portage has gotten slower 
over time.


Delayed allocation still performs a lot better here than the v3
immediate allocation. In addition, tree balancing operations are
performed on flush as well, so what you get on disk is basically an
almost-optimal tree. Of course, this will change a bit over time, but
with v4 it takes a lot longer for that to happen than with v3 afaict.
There _has_ been some worthwile development in the meantime : )


Hmm.  The thing is, I don't remember v3 slowing down much at all, 
whereas v4 slowed down pretty dramatically after the first few weeks. 
It does seem pretty stable now, though, and it doesn't seem to be 
getting any slower.


I've had this particular FS since...  hmm...  Is there an FS tool to 
check mkfs time?  I think it's a year now, but I'd like to be sure.


If not, I'll just find the oldest file, but the clock on this machine 
isn't reliable (have to set it with NTP every boot)...


Re: the 'official' point of view expressed by kernelnewbies.org regarding reiser4 inclusion

2006-08-01 Thread Alan Cox
On Tue, 2006-08-01 at 11:44 -0500, David Masover wrote:
 Yikes.  Undetected.
 
 Wait, what?  Disks, at least, would be protected by RAID.  Are you 
 telling me RAID won't detect such an error?

Yes.

RAID deals with the case where a device fails. RAID 1 with 2 disks can
in theory detect an internal inconsistency but cannot fix it.

 we're OK with that, so long as our filesystems are robust enough.  If 
 it's an _undetected_ error, doesn't that cause way more problems 
 (impossible problems) than FS corruption?  Ok, your FS is fine -- but 
 now your bank database shows $1k less on random accounts -- is that ok?

Not really no. Your bank is probably using a machine (hopefully using a
machine) with ECC memory, ECC cache and the like. The UDMA and SATA
storage subsystems use CRC checksums between the controller and the
device. SCSI uses various similar systems - some older ones just use a
parity bit so have only a 50/50 chance of noticing a bit error.

Similarly the media itself is recorded with a lot of FEC (forward error
correction) so will spot most changes.

Unfortunately when you throw this lot together with astronomical amounts
of data you get burned now and then, especially as most systems are not
using ECC ram, do not have ECC on the CPU registers and may not even
have ECC on the caches in the disks.

  The sort of changes this needs hit the block layer and ever fs.
 
 Seems it would need to hit every application also...

Depending how far you propagate it. Some people working with huge
data sets already write and check user-level CRC values for this reason
(in fact BitKeeper does, for one example). It should be relatively
cheap to get much of that benefit without going application to
application, just as TCP gets most of its benefit without going app to
app.
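
As a sketch of what such user-level CRCs can look like (my own example using zlib's crc32(), not BitKeeper's actual scheme), an application stores a checksum next to each record and verifies it on read, so corruption anywhere along the path is caught end to end. Build with: cc crc_demo.c -lz

/* End-to-end, user-level CRC check (sketch). */
#include <stdio.h>
#include <string.h>
#include <zlib.h>

/* A record as an application might store it: payload plus its CRC. */
struct record {
    char          payload[64];
    unsigned long crc;
};

static void record_seal(struct record *r)
{
    r->crc = crc32(0L, (const Bytef *)r->payload, (uInt)strlen(r->payload));
}

static int record_ok(const struct record *r)
{
    unsigned long now = crc32(0L, (const Bytef *)r->payload,
                              (uInt)strlen(r->payload));
    return now == r->crc;
}

int main(void)
{
    struct record r;
    snprintf(r.payload, sizeof(r.payload), "balance=1000");
    record_seal(&r);

    /* Simulate silent corruption somewhere between write and read. */
    r.payload[8] ^= 0x20;

    printf("record %s end-to-end check\n", record_ok(&r) ? "passed" : "FAILED");
    return 0;
}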

Alan



Re: Solaris ZFS on Linux [Was: Re: the 'official' point of view expressed by kernelnewbies.org regarding reiser4 inclusion]

2006-08-01 Thread David Masover

Theodore Tso wrote:


Ah, but as soon as the repacker thread runs continuously, then you
lose all or most of the claimed advantage of wandering logs.

[...]

So instead of a write-write overhead, you end up with a
write-read-write overhead.


This would tend to suggest that the repacker should not run constantly, 
but also that while it's running, performance could be almost as good as 
ext3.



But of course, people tend to disable the repacker when doing
benchmarks because they're trying to play the "my filesystem/database
has bigger performance numbers than yours" game.


So you run your own benchmarks, I'll run mine...  Benchmarks for 
everyone!  I'd especially like to see what performance is like with the 
repacker not running, and during the repack.  If performance during a 
repack is comparable to ext3, I think we win, although we have to amend 
that statement to "my filesystem/database has the same or bigger 
performance numbers than yours."


Re: the 'official' point of view expressed by kernelnewbies.org regarding reiser4 inclusion

2006-08-01 Thread Gregory Maxwell

On 8/1/06, David Masover [EMAIL PROTECTED] wrote:

Yikes.  Undetected.

Wait, what?  Disks, at least, would be protected by RAID.  Are you
telling me RAID won't detect such an error?


Unless the disk ECC catches it, RAID won't know anything is wrong.

This is why ZFS offers block checksums... it can then try all the
permutations of RAID regens to find a solution which gives the right
checksum.
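
A toy sketch of that idea for the simplest case (my own illustration, not ZFS code): with two RAID-1 copies and an independently stored checksum, the copy that matches the checksum is taken as good and used to rewrite the other.

/* Toy checksum-guided mirror repair (illustration only, not ZFS): with
 * an independently stored checksum, a 2-way mirror can tell which copy
 * is the good one, which plain RAID-1 cannot. */
#include <stdio.h>
#include <string.h>

#define BLK 16

/* Stand-in checksum; a real system would store something like
 * fletcher4 or sha256 in the block pointer. */
static unsigned long toy_sum(const unsigned char *b)
{
    unsigned long s = 0;
    for (int i = 0; i < BLK; i++)
        s = s * 131 + b[i];
    return s;
}

/* Returns the index of the copy matching the stored checksum, or -1. */
static int pick_good_copy(unsigned char copies[2][BLK], unsigned long stored)
{
    for (int i = 0; i < 2; i++)
        if (toy_sum(copies[i]) == stored)
            return i;
    return -1;
}

int main(void)
{
    unsigned char copies[2][BLK];
    memset(copies[0], 0xAB, BLK);
    memcpy(copies[1], copies[0], BLK);
    unsigned long stored = toy_sum(copies[0]);  /* checksum saved at write time */

    copies[1][3] ^= 0x01;                       /* silent corruption on one leg */

    int good = pick_good_copy(copies, stored);
    if (good >= 0) {
        memcpy(copies[1 - good], copies[good], BLK);  /* self-heal the bad leg */
        printf("copy %d matched the checksum; other leg rewritten\n", good);
    } else {
        printf("no copy matches: report an unrecoverable error\n");
    }
    return 0;
}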

Every level of the system must be paranoid and take measures to avoid
corruption if the system is to avoid it... it's a tough problem. It
seems that the ZFS folks have addressed this challenge by building much
of what are classically separate layers into one part.


Re: the 'official' point of view expressed by kernelnewbies.org regarding reiser4 inclusion

2006-08-01 Thread David Masover

Alan Cox wrote:

On Tue, 2006-08-01 at 11:44 -0500, David Masover wrote:

Yikes.  Undetected.

Wait, what?  Disks, at least, would be protected by RAID.  Are you 
telling me RAID won't detect such an error?


Yes.

RAID deals with the case where a device fails. RAID 1 with 2 disks can
in theory detect an internal inconsistency but cannot fix it.


Still, if it does that, that should be enough.  The scary part wasn't 
that there's an internal inconsistency, but that you wouldn't know.


And it can fix it if you can figure out which disk went.  Or give it 3 
disks and it should be entirely automatic -- admin gets paged, admin 
hotswaps in a new disk, done.


we're OK with that, so long as our filesystems are robust enough.  If 
it's an _undetected_ error, doesn't that cause way more problems 
(impossible problems) than FS corruption?  Ok, your FS is fine -- but 
now your bank database shows $1k less on random accounts -- is that ok?


Not really no. Your bank is probably using a machine (hopefully using a
machine) with ECC memory, ECC cache and the like. The UDMA and SATA
storage subsystems use CRC checksums between the controller and the
device. SCSI uses various similar systems - some older ones just use a
parity bit so have only a 50/50 chance of noticing a bit error.

Similarly the media itself is recorded with a lot of FEC (forward error
correction) so will spot most changes.

Unfortunately when you throw this lot together with astronomical amounts
of data you get burned now and then, especially as most systems are not
using ECC ram, do not have ECC on the CPU registers and may not even
have ECC on the caches in the disks.


It seems like this is the place to fix it, not the software.  If the 
software can fix it easily, great.  But I'd much rather rely on the 
hardware looking after itself, because when hardware goes bad, all bets 
are off.


Specifically, it seems like you do mention lots of hardware solutions, 
that just aren't always used.  It seems like storage itself is getting 
cheap enough that it's time to step back a year or two in Moore's Law to 
get the reliability.



The sort of changes this needs hit the block layer and ever fs.

Seems it would need to hit every application also...


Depending how far you propagate it. Some people working with huge
data sets already write and check user-level CRC values for this reason
(in fact BitKeeper does, for one example). It should be relatively
cheap to get much of that benefit without going application to
application, just as TCP gets most of its benefit without going app to
app.


And yet, if you can do that, I'd suspect you can, should, must do it at 
a lower level than the FS.  Again, FS robustness is good, but if the 
disk itself is going, what good is having your directory (mostly) intact 
if the files themselves have random corruptions?


If you can't trust the disk, you need more than just an FS which can 
mostly survive hardware failure.  You also need the FS itself (or maybe 
the block layer) to support bad block relocation and all that good 
stuff, or you need your apps designed to do that job by themselves.


It just doesn't make sense to me to do this at the FS level.  You 
mention TCP -- ok, but if TCP is doing its job, I shouldn't also need to 
implement checksums and other robustness at the protocol layer (http, 
ftp, ssh), should I?  Because in this analogy, it looks like TCP is the 
block layer and a protocol is the fs.


As I understand it, TCP only lets the protocol/application know when 
something's seriously FUBARed and it has to drop the connection. 
Similarly, the FS (and the apps) shouldn't have to know about hardware 
problems until it really can't do anything about it anymore, at which 
point the right thing to do is for the FS and apps to go oh shit and 
drop what they're doing, and the admin replaces hardware and restores 
from backup.  Or brings a backup server online, or...




I guess my main point was that _undetected_ problems are serious, but if 
you can detect them, and you have at least a bit of redundancy, you 
should be good.  For instance, if your RAID reports errors that it can't 
fix, you bring that server down and let the backup server run.


Re: the 'official' point of view expressed by kernelnewbies.org regarding reiser4 inclusion

2006-08-01 Thread David Masover

Gregory Maxwell wrote:

On 8/1/06, David Masover [EMAIL PROTECTED] wrote:

Yikes.  Undetected.

Wait, what?  Disks, at least, would be protected by RAID.  Are you
telling me RAID won't detect such an error?


Unless the disk ECC catches it raid won't know anything is wrong.

This is why ZFS offers block checksums... it can then try all the
permutations of raid regens to find a solution which gives the right
checksum.


Isn't there a way to do this at the block layer?  Something in 
device-mapper?



Every level of the system must be paranoid and take measure to avoid
corruption if the system is to avoid it... it's a tough problem. It
seems that the ZFS folks have addressed this challenge by building as
much of what is classically separate layers into one part.


Sounds like bad design to me, and I can point to the antipattern, but 
what do I know?


Ebuild/rpm/deb repo's (was Re: reiser4 can now bear with filled fs, looks stable to me...)

2006-08-01 Thread Sander Sweers
On Tue, 2006-08-01 at 13:28 +0200, Maciej Sołtysiak wrote:
 Hello David,
 
 Monday, July 31, 2006, 11:46:34 PM, you wrote:
  You must be new here...
 ;-)
 
 I wanted to point out that because:
  Options B and C are all that ever seems to happen when reiserfs-list and
  lkml collide.
 
 and:
The speed of a nonworking program is irrelevant.
The cost-effectiveness of an impossible solution is irrelevant.
 
 maybe the more important thing is to allow people use r4 on their own
 (rpms, debs, apt/gentoo/repositories, etc.) better, than to push that hard 
 for kernel inclusion.
 

Yes, and in the case of Gentoo there are already people maintaining an
ebuild on the wiki which pulls in r4:
http://gentoo-wiki.com/HOWTO_Reiser4_With_Gentoo-Sources

When you make it easy for people to use reiser4 by providing ebuilds,
rpms or debs, more users will be tempted to try out reiser4 who would
normally not be able or willing to patch the kernel.

Maintaining an ebuild, for example, is easy, and adding another patch to
a kernel deb/rpm should also not be too difficult. It will take some
time each month, but sacrificing a few hours to keep these updated would,
to me, be worth it.

Maybe the reiser community can help out the Namesys devs?

Greets
Sander





Re: the 'official' point of view expressed by kernelnewbies.org regarding reiser4 inclusion

2006-08-01 Thread Adrian Ulrich
 You do, it turns out. It's becoming an issue more and more that the sheer
 amount of storage means that the undetected error rate from disks,
 hosts, memory, cables and everything else is rising.

IMHO the probability of hitting such a random, so-far-undetected corruption
is very low with one of the big/expensive RAID systems, as they do
fancy stuff like 'disk scrubbing' and usually fail disks
at a very early stage..

 * I've seen storage systems from a BIG vendor die due to
   firmware bugs
 * I've seen FC cards die.. SAN switches reboot.. people used
   my cables for rope skipping
 * We had fire, a non-working UPS and faulty diesel generators..

but so far the filesystems (and applications) on the storage have never
complained about corrupted data.

..YMMV..

Btw: I don't think that ReiserFS really behaves that badly
with broken hardware. So far, Reiser3 has survived 2 broken hard drives
without problems while I've seen ext2/3 die 4 times so far...
(= everything inside /lost+found). Reiser4 survived
 # mkisofs . > /dev/sda

Lucky me.. maybe..


To get back on-topic:

Some people try very hard to claim that the world doesn't need
Reiser4 and that you can do everything with ext3.

Ext3 may be fine for them, but some people (like me) really need Reiser4
because they have applications/workloads that won't work well (fast) on ext3.

Why is it such a big thing to include a filesystem?
Even if it's unstable: does anyone care? E.g. the HFS+ driver
is buggy (it corrupted the FS of my OS X installation 3 times so far), but
does this bugginess affect people *not* using it? No.

Regards,
 Adrian


Re: the 'official' point of view expressed by kernelnewbies.org regarding reiser4 inclusion

2006-08-01 Thread Adrian Ulrich

  This is why ZFS offers block checksums... it can then try all the
  permutations of raid regens to find a solution which gives the right
  checksum.
 
 Isn't there a way to do this at the block layer?  Something in 
 device-mapper?

Remember: Sun's new filesystem + Sun's new volume manager = ZFS



Re: the 'official' point of view expressed by kernelnewbies.org regarding reiser4 inclusion

2006-08-01 Thread Ric Wheeler

Alan Cox wrote:

On Tue, 2006-08-01 at 16:52 +0200, Adrian Ulrich wrote:


WriteCache, Mirroring between 2 Datacenters, snapshotting.. etc..
you don't need your filesystem beeing super-robust against bad sectors
and such stuff because:



You do, it turns out. It's becoming an issue more and more that the sheer
amount of storage means that the undetected error rate from disks,
hosts, memory, cables and everything else is rising.



I agree with Alan despite being an enthusiastic supporter of neat array 
based technologies.


Most people use absolutely giant disks in laptops and desktop systems 
(300GB and 500GB are common, 750GB on the way). File systems need to be as 
robust as possible for users of these systems as people are commonly 
storing personal critical data like photos mostly on these unprotected 
drives.


Even for the high end users, array based mirroring and so on can only do 
so much to protect you.


Mirroring a corrupt file system to a remote data center will mirror your 
corruption.


Rolling back to a snapshot typically only happens when you notice a 
corruption which can go undetected for quite a while, so even that will 
benefit from having reliability baked into the file system (i.e., it 
should grumble about corruption to let you know that you need to roll 
back or fsck or whatever).


An even larger issue is that our tools, like fsck, which are used to 
uncover these silent corruptions need to scale up to the point that they 
can uncover issues in minutes instead of days.  A lot of the focus at 
the file system workshop was around how to dramatically reduce the 
repair time of file systems.


In a way, having super reliable storage hardware is only as good as the 
file system layer on top of it - reliability needs to be baked into the 
entire IO system stack...


ric




Re: the 'official' point of view expressed by kernelnewbies.org regarding reiser4 inclusion

2006-08-01 Thread Hans Reiser
Alan, I have seen only anecdotal evidence against reiserfsck, and I have
seen formal tests from Vitaly (which it seems a user has replicated)
where our fsck did better than ext3's.  Note that these tests are of the
latest fsck from us: I am sure everyone understands that it takes time
for an fsck to mature, and that our early fscks were poor.  I will also
say that V4's fsck is more robust than V3's because we made disk-format
changes specifically to help fsck.

Now I am not dismissing your anecdotes as I will never dismiss data I
have not seen, and it sounds like you have seen more data than most
people, but I must dismiss your explanation of them. 

Being able to throw away all of the tree but the leaves and twigs with
extent pointers, and rebuild all of it, makes V4 very robust, more so than
ext3.  As for this business of inodes not moving, I don't see what the
advantage is: we can lose the directory entry and rebuild just as well
as ext3, probably better, because we can at least figure out what
directory it was in.

Vitaly can say all of this more expertly than I

Hans


Re: the 'official' point of view expressed by kernelnewbies.org regarding reiser4 inclusion

2006-08-01 Thread Hans Reiser
Ric Wheeler wrote:

 Alan Cox wrote:



 You do, it turns out. It's becoming an issue more and more that the sheer
 amount of storage means that the undetected error rate from disks,
 hosts, memory, cables and everything else is rising.



 I agree with Alan 

You will want to try our compression plugin; it has an ECC for every 64k.

Hans


Re: the 'official' point of view expressed by kernelnewbies.org regarding reiser4 inclusion

2006-08-01 Thread Hans Reiser
Gregory Maxwell wrote:

 This is why ZFS offers block checksums... it can then try all the
 permutations of raid regens to find a solution which gives the right
 checksum.

ZFS performance is pretty bad in the only benchmark I have seen of it. 
Does anyone have serious benchmarks of it?  I suspect that our
compression plugin (with ECC) will outperform it.


Re: the 'official' point of view expressed by kernelnewbies.org regarding reiser4 inclusion

2006-08-01 Thread David Masover

Ric Wheeler wrote:

Alan Cox wrote:

On Tue, 2006-08-01 at 16:52 +0200, Adrian Ulrich wrote:


WriteCache, Mirroring between 2 Datacenters, snapshotting.. etc..
you don't need your filesystem being super-robust against bad sectors
and such stuff because:



You do, it turns out. It's becoming an issue more and more that the sheer
amount of storage means that the undetected error rate from disks,
hosts, memory, cables and everything else is rising.


Most people use absolutely giant disks in laptops and desktop systems 
(300GB & 500GB are common, 750GB on the way). File systems need to be as 
robust as possible for users of these systems as people are commonly 
storing personal critical data like photos mostly on these unprotected 
drives.


Their loss.  A robust FS is good, but really, if you aren't doing backups, 
you are going to lose data.  End of story.


Even for the high end users, array based mirroring and so on can only do 
so much to protect you.


Mirroring a corrupt file system to a remote data center will mirror your 
corruption.


Assuming it's undetected.  Why would it be undetected?

Rolling back to a snapshot typically only happens when you notice a 
corruption which can go undetected for quite a while, so even that will 
benefit from having reliability baked into the file system (i.e., it 
should grumble about corruption to let you know that you need to roll 
back or fsck or whatever).


Yes, the filesystem should complain about corruption.  So should the 
block layer -- if you don't trust the FS, use a checksum at the block 
layer.  So should...


There are just so many other, better places to do this than the FS.  The 
FS should complain, yes, but if the disk is bad, there's going to be 
corruption.


An even larger issue is that our tools, like fsck, which are used to 
uncover these silent corruptions need to scale up to the point that they 
can uncover issues in minutes instead of days.  A lot of the focus at 
the file system workshop was around how to dramatically reduce the 
repair time of file systems.


That would be interesting.  I know from experience that fsck.reiser4 is 
amazing.  Blew away my data with something akin to an rm -rf, and fsck 
fixed it.  Tons of crashing/instability in the early days, but only once 
-- before they even had a version instead of a date, I think -- did I 
ever have a case where fsck couldn't fix it.


So I guess the next step would be to make fsck faster.  Someone 
mentioned a fsck that repairs the FS in the background?


In a way, having super reliable storage hardware is only as good as the 
file system layer on top of it - reliability needs to be baked into the 
entire IO system stack...


That bit makes no sense.  If you have super reliable storage hardware 
(never dies), and your FS is also reliable (never dies unless hardware 
does, but may go bat-shit insane when hardware dies), then you've got a 
super reliable system.


You're right, running Linux's HFS+ or NTFS write support is generally a 
bad idea, no matter how reliable your hardware is.  But this discussion 
was not about whether an FS is stable, but how well an FS survives 
hardware corruption.


Re: reiser4: maybe just fix bugs?

2006-08-01 Thread Nate Diller

On 8/1/06, Andrew Morton [EMAIL PROTECTED] wrote:

On Tue, 01 Aug 2006 15:24:37 +0400
Vladimir V. Saveliev [EMAIL PROTECTED] wrote:

  The writeout code is ugly, although that's largely due to a mismatch between
  what reiser4 wants to do and what the VFS/MM expects it to do.

 Yes. reiser4 writes out atoms. Most pages get into atoms via
 sys_write. But pages dirtied via shared mappings do not. They get into
 atoms in reiser4's writepages address space operation.

I think you mean ->writepage() - reiser4 doesn't implement ->writepages().

I assume you considered hooking into ->set_page_dirty() to do the
add-to-atom thing earlier on?

We'll merge mm-tracking-shared-dirty-pages.patch into 2.6.19-rc1, which
would make that approach considerably more successful, I expect.
->set_page_dirty() is a bit awkward because it can be called under
spinlock.

Maybe something could also be gained from the new
vm_operations_struct.page_mkwrite(), although that's less obvious...
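
To make that concrete, the sort of thing meant by hooking ->set_page_dirty()
might look roughly like the sketch below.  This is hypothetical, not reiser4's
actual code: reiser4_capture_page_into_atom() and reiser4_writepage() are
invented names, and whatever the hook does must not sleep, since it can be
called under spinlocks.

/* hypothetical sketch only */
static int reiser4_set_page_dirty(struct page *page)
{
	/*
	 * Called by the VM when a page becomes dirty, including pages
	 * dirtied through a shared mapping.  Only non-sleeping work is
	 * allowed here, so a real implementation would merely note the
	 * page for later capture into an atom.
	 */
	reiser4_capture_page_into_atom(page);	/* invented name */

	/* keep the generic dirty accounting */
	return __set_page_dirty_nobuffers(page);
}

static struct address_space_operations reiser4_aops = {
	.writepage	= reiser4_writepage,	/* invented name */
	.set_page_dirty	= reiser4_set_page_dirty,
	/* ... */
};

With such a hook, pages dirtied via shared mappings would join an atom at
dirty time instead of being picked up later in reiser4_sync_inodes.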

 That is why
 reiser4_sync_inodes has two steps: in the first one it calls
 generic_sync_sb_inodes to call writepages for dirty inodes, to capture
 pages dirtied via shared mappings into atoms. The second step flushes atoms.

  
  I agree --- both with it being ugly, and that being part of why.
 
If it works, we can live with it, although perhaps the VFS could be made smarter.
  
  
  I would be curious regarding any ideas on that.  Next time I read
  through that code, I will keep in mind that you are open to making VFS
  changes if it improves things, and I will try to get clever somehow and
  send it by you.  Our squalloc code, though, is, I must say, the most
  complicated and ugliest piece of code I have ever worked on, in which every
  cumulative ugliness had a substantive performance advantage requiring us
  to keep it.  If you spare yourself from reading that, it is
  understandable.
 
  I'd say that reiser4's major problem is the lack of xattrs, acls and
  direct-io.  That's likely to significantly limit its vendor uptake.

 xattrs is really a problem.

That's not good.  The ability to properly support SELinux is likely to be
important.


i disagree that it will be difficult.  unfortunately, the patch that
I am working on right now, which fixes the various reiser4 specific
functions to avoid using VFS data structures unless needed, is a
prerequisite to enabling xattrs.  creating it is a time of tedium for
me, and it will cause a bit of internal churn (1000 lines and
counting).  it's all in the fs/reiser4 directory though, and it should
cause minimal disruption as far as introduced runtime bugs go.

once that's taken care of, i will be delighted to enable xattr support
in a way that will make selinux and beagle and such run as expected,
and will have the added advantage of some major scalability
improvements for certain lookup and update operations.

NATE


Re: reiser4-2.6.18-rc2-mm1: possible circular locking dependency detected in txn_end

2006-08-01 Thread Alexander Zarochentsev
Hello Ingo,

there is a new reiser4 / lock validator problem:

On Sunday 30 July 2006 22:57, Laurent Riffard wrote:
 ===
 [ INFO: possible circular locking dependency detected ]
 ---
 mv/29012 is trying to acquire lock:
  (txnh->hlock){--..}, at: [e0c8e09b] txn_end+0x191/0x368 [reiser4]

 but task is already holding lock:
  (atom->alock){--..}, at: [e0c8a640] txnh_get_atom+0xf6/0x39e
 [reiser4]

 which lock already depends on the new lock.

it is absolutely legal in reiser4 to lock atom first, then lock 
transaction handle.

i guess the lock validator recorded a wrong dependency rule from one place 
where the spinlocks are taken in reverse order.  that place is in 
fs/reiser4/txnmgr.c:atom_begin_and_assign_to_txnh; the atom there is a new, 
just-kmalloc'ed object which is inaccessible to others, so it can't be a 
source of deadlock.

but how to explain that to the lock validator?
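
one standard answer (just a sketch, assuming the reverse-order site really is
limited to that brand-new atom) is to put that one acquisition into a separate
lockdep subclass with spin_lock_nested(), so the validator does not record the
bogus hlock -> alock rule:

/*
 * sketch for atom_begin_and_assign_to_txnh(): the atom was just
 * kmalloc'ed and is not yet visible to anyone else, so taking its
 * lock after txnh->hlock cannot deadlock.  the nested subclass keeps
 * the lock validator from learning a false dependency.
 */
spin_lock(&txnh->hlock);
spin_lock_nested(&atom->alock, SINGLE_DEPTH_NESTING);

alternatively the new atom's lock could be given its own class key with
lockdep_set_class() right after it is initialized.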


 the existing dependency chain (in reverse order) is:

 -> #1 (atom->alock){--..}:
[c012ce2f] lock_acquire+0x60/0x80
[c0292968] _spin_lock+0x19/0x28
[e0c8bbd7] try_capture+0x7cf/0x1cd7 [reiser4]
[e0c786e1] longterm_lock_znode+0x427/0x84f [reiser4]
[e0ca55dc] coord_by_handle+0x2be/0x7f7 [reiser4]
[e0ca5f89] coord_by_key+0x1e3/0x22d [reiser4]
[e0c7dbd2] insert_by_key+0x8f/0xe0 [reiser4]
[e0cbf7f1] write_sd_by_inode_common+0x361/0x61a [reiser4]
[e0cbfce4] create_object_common+0xf1/0xf6 [reiser4]
[e0cbaebf] create_vfs_object+0x51d/0x732 [reiser4]
[e0cbb1fd] mkdir_common+0x43/0x4b [reiser4]
[c015ed33] vfs_mkdir+0x5a/0x9d
[c0160f5e] sys_mkdirat+0x88/0xc0
[c0160fa6] sys_mkdir+0x10/0x12
[c0102c2d] sysenter_past_esp+0x56/0x8d

 -> #0 (txnh->hlock){--..}:
[c012ce2f] lock_acquire+0x60/0x80
[c0292968] _spin_lock+0x19/0x28
[e0c8e09b] txn_end+0x191/0x368 [reiser4]
[e0c7f97d] reiser4_exit_context+0x1c2/0x571 [reiser4]
[e0cbb091] create_vfs_object+0x6ef/0x732 [reiser4]
[e0cbb1fd] mkdir_common+0x43/0x4b [reiser4]
[c015ed33] vfs_mkdir+0x5a/0x9d
[c0160f5e] sys_mkdirat+0x88/0xc0
[c0160fa6] sys_mkdir+0x10/0x12
[c0102c2d] sysenter_past_esp+0x56/0x8d

 other info that might help us debug this:

 2 locks held by mv/29012:
  #0:  (inode->i_mutex/1){--..}, at: [c015f50b]
 lookup_create+0x1d/0x73
  #1:  (atom->alock){--..}, at: [e0c8a640]
 txnh_get_atom+0xf6/0x39e [reiser4]

 stack backtrace:
  [c0104df0] show_trace+0xd/0x10
  [c0104e0c] dump_stack+0x19/0x1d
  [c012bc62] print_circular_bug_tail+0x59/0x64
  [c012cc3e] __lock_acquire+0x814/0x9a5
  [c012ce2f] lock_acquire+0x60/0x80
  [c0292968] _spin_lock+0x19/0x28
  [e0c8e09b] txn_end+0x191/0x368 [reiser4]
  [e0c7f97d] reiser4_exit_context+0x1c2/0x571 [reiser4]
  [e0cbb091] create_vfs_object+0x6ef/0x732 [reiser4]
  [e0cbb1fd] mkdir_common+0x43/0x4b [reiser4]
  [c015ed33] vfs_mkdir+0x5a/0x9d
  [c0160f5e] sys_mkdirat+0x88/0xc0
  [c0160fa6] sys_mkdir+0x10/0x12
  [c0102c2d] sysenter_past_esp+0x56/0x8d

 (Linux antares.localdomain 2.6.18-rc2-mm1 #77 Sun Jul 30 15:09:34
 CEST 2006 i686 AMD Athlon(TM) XP 1600+ unknown GNU/Linux)

-- 
Alex.



Re: reiser4: maybe just fix bugs?

2006-08-01 Thread Nate Diller

On 8/1/06, David Masover [EMAIL PROTECTED] wrote:

Vladimir V. Saveliev wrote:

 Do you think that if reiser4 supported xattrs it would increase its
 chances of inclusion?

Probably the opposite.

If I understand it right, the original Reiser4 model of file metadata is
the file-as-directory stuff that caused such a furor during the last big push
for inclusion (search for "Silent semantic changes in Reiser4"):

foo.mp3/.../rwx    # permissions
foo.mp3/.../artist # part of the id3 tag

So I suspect xattrs would just be a different interface to this stuff,
maybe just a subset of it (to prevent namespace collisions):

foo.mp3/.../xattr/ # contains files representing attributes

Of course, you'd be able to use the standard interface for
getting/setting these.  The point is, I don't think Hans/Namesys wants
to do this unless they're going to do it right, especially because they
already have the file-as-dir stuff somewhat done.  Note that these are
neither mutually exclusive nor mutually dependent -- you don't have to
enable file-as-dir to make xattrs work.

I know it's not done yet, though.  I can understand Hans dragging his
feet here, because xattrs and traditional acls are examples of things
Reiser4 is supposed to eventually replace.

Anyway, if xattrs were done now, the only good that would come of it is
building a userbase outside the vanilla kernel.  I can't see it as doing
anything but hurting inclusion by introducing more confusion about
plugins.

I could be entirely wrong, though.  I speak for neither
Hans/Namesys/reiserfs nor LKML.  Talk amongst yourselves...


i should clarify things a bit here.  yes, hans' goal is for there to
be no difference between the xattr namespace and the readdir one.
unfortunately, this is not feasible with the current VFS, and some
major work would have to be done to enable this without some
pathological cases cropping up.  some very smart people think that it
cannot be done at all.

xattr is a separate VFS interface, which avoids those problems by
defining certain restrictions on how the 'files' which live in that
namespace can be manipulated.  for instance, hard links are
non-existent, and the 'mv' command cannot move a file between
different xattr namespaces.
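
for reference, the userspace side of that interface is just the *xattr
syscalls; a minimal example of what selinux/beagle-style consumers expect to
work (the filename and attribute name here are made up):

#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/xattr.h>

int main(void)
{
	const char *path = "foo.mp3";		/* made-up example file */
	const char *val  = "Some Artist";
	char buf[256];
	ssize_t len;

	/* user.* is the unprivileged namespace; selinux uses security.* */
	if (setxattr(path, "user.artist", val, strlen(val), 0) != 0) {
		perror("setxattr");
		return 1;
	}

	len = getxattr(path, "user.artist", buf, sizeof(buf) - 1);
	if (len < 0) {
		perror("getxattr");
		return 1;
	}
	buf[len] = '\0';
	printf("user.artist = %s\n", buf);
	return 0;
}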

enabling xattr would have no connection to the file-as-directory
stuff, and (without extra work) would not even allow access to the
things reiser4 defined in the '...' directory.  also enabling xattr in
the way i intend would in no way compromise hans' long-term vision.

HOWEVER, i *need* to point out that hans and i disagree somewhat on
the specifics here, and so i should say adamantly that i don't speak here
on behalf of hans or namesys.

that won't stop me from submitting my own patch though :)

NATE


Re: [BUG] nikita-1481, nikita-717 and nikita-373 here and there

2006-08-01 Thread Craig Shelley
On Fri, 2006-06-23 at 02:51 +0300, Jussi Judin wrote:
 After that I upgraded to Debian patched kernel 2.6.16-14 and to reiser4 
 patch 2.6.16-4 for that kernel and ran fsck.reiser4. Then I got errors 
 like this in kern.log after a while:
 
 WARNING: Error for inode 1731981 (-2)
 reiser4[nfsd(3817)]: key_warning 
 (fs/reiser4/plugin/file_plugin_common.c:513)[nikita-717]:
 WARNING: Error for inode 1703086 (-2)
 reiser4[nfsd(3818)]: key_warning 
 (fs/reiser4/plugin/file_plugin_common.c:513)[nikita-717]:
 WARNING: Error for inode 1726433 (-2)
 reiser4[nfsd(3818)]: key_warning 
 (fs/reiser4/plugin/file_plugin_common.c:513)[nikita-717]:

I too am getting these warnings:

Jul 27 06:28:15 prometheus kernel: reiser4[find(10770)]: key_warning
(fs/reiser4/plugin/file_plugin_common.c:513)[nikita-717]:
Jul 27 06:28:15 prometheus kernel: WARNING: Error for inode 3922698 (-2)
[REPEATED 17 TIMES]
Jul 27 06:28:15 prometheus kernel: reiser4[find(10770)]: key_warning
(fs/reiser4/plugin/file_plugin_common.c:513)[nikita-717]:
Jul 27 06:28:15 prometheus kernel: WARNING: Error for inode 3922697 (-2)
[REPEATED 17 TIMES]
Jul 27 06:28:16 prometheus kernel: reiser4[find(10770)]: key_warning
(fs/reiser4/plugin/file_plugin_common.c:513)[nikita-717]:
Jul 27 06:28:16 prometheus kernel: WARNING: Error for inode 3922696 (-2)
[REPEATED 17 TIMES]

...
...

Jul 27 06:28:19 prometheus kernel: reiser4[find(10770)]:
cbk_level_lookup (fs/reiser4/search.c:961)[vs-3533]:
Jul 27 06:28:19 prometheus kernel: WARNING: Keys are inconsistent. Fsck?
Jul 27 06:28:19 prometheus kernel: reiser4[find(10770)]: key_warning
(fs/reiser4/plugin/file_plugin_common.c:513)[nikita-717]:
Jul 27 06:28:19 prometheus kernel: WARNING: Error for inode 3922690 (-5)


System information:
Kernel: 2.6.16.20
Patches: reiser4-for-2.6.16-4.patch.gz
Reiser4progs: 1.0.5

This machine is used for recording TV using a DVB card; it compresses the
files and serves them via NFS and Samba.

Until recently, the system ran kernel linux-2.6.11.6, and performed
flawlessly for over a year. After upgrading the kernel, I upgraded
reiser4progs, and fscked all reiser4 partitions. No errors were found.

The system is run on a UPS, and does not have a history of memory or IO
trouble.
I am currently investigating why the samba shares have failed, and
noticed this in the log.

I believe this problem/bug is related to the kernel upgrade rather than
some random corruption, because it seems too much of a coincidence to
happen so soon after upgrading the kernel.

Any help/advice is greatly appreciated.

...back to the samba investigation.

Many Thanks,

-- 
Craig Shelley
EMail: [EMAIL PROTECTED]
Jabber: [EMAIL PROTECTED]




Re: Ebuild/rpm/deb repo's (was Re: reiser4 can now bear with filled fs, looks stable to me...)

2006-08-01 Thread Maciej Sołtysiak
Hello Sander,

Tuesday, August 1, 2006, 8:10:34 PM, you wrote:
 Yes, and in the case of Gentoo there are already people maintaining an
 ebuild which pulls in r4 on the wiki.
 http://gentoo-wiki.com/HOWTO_Reiser4_With_Gentoo-Sources
Debian has reiser4progs and kernel-patch-2.6-reiser4:
- stable: 20040813-6
- testing: 20050715-1
- unstable: 20050715-1

Very old patches. Also, the patch description says:
WARNING: this software is to be considered usable but its deployment in
production environments is still not recommended. Use at your own risk.

I know it is very easy to create Ubuntu kernel packages (I have done a few).
I might try to do one for the current Dapper kernel for i386, but it would have
to wait due to my personal time constraints (projects, etc.)

-- 
Best regards,
Maciej




reiserfs 3.6 with 2TB file size limitation!

2006-08-01 Thread Ricardo (Tru64 User)
Hi,
I read on the reiserfs site, FAQ #1, about max file sizes:
max file size 2^60 bytes = 1 Ei, 
but page cache limits this to 8 Ti on architectures
with 32 bit int for reiserfs 3.6
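
(For reference, one way to arrive at that 8 Ti figure, assuming the usual
MAX_LFS_FILESIZE definition of PAGE_CACHE_SIZE shifted left by
BITS_PER_LONG - 1 on a 32-bit machine with 4K pages:

\[
2^{12}\ \mathrm{bytes/page} \times 2^{31}\ \mathrm{pages}
  = 2^{43}\ \mathrm{bytes} = 8\ \mathrm{TiB}.
\]
)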

I do have a reiserfs 3.6 filesystem, on kernel 
2.6.12-21mdksmp (mandriva 2006) that would not take a
filesize greater than 2TB!

Filesize in question:
-rw-r--r--  1 myuser users 2147483647 Aug  1 16:41
myfile.out

Error received:
Filesize limit exceeded

This filesystem is on a scsi device connected via 
Adaptec AIC-7899P U160/m card (if it matters).

# debugreiserfs /dev/sda1
debugreiserfs 3.6.19 (2003 www.namesys.com)


Filesystem state: consistency is not checked after
last mounting

Reiserfs super block in block 16 on 0x801 of format
3.6 with standard journal
Count of blocks on the device: 292967356
Number of bitmaps: 8941
Blocksize: 4096
Free blocks (count of blocks - used [journal, bitmaps,
data, reserved] blocks): 172310677
Root block: 121672
Filesystem is NOT clean
Tree height: 5
Hash function used to sort names: r5
Objectid map size 2, max 972
Journal parameters:
Device [0x0]
Magic [0x5613f7b2]
Size 8193 blocks (including 1 for journal
header) (first block 18)
Max transaction length 1024 blocks
Max batch size 900 blocks
Max commit age 30
Blocks reserved by journal: 0
Fs state field: 0x0:
sb_version: 2
inode generation number: 1507612
UUID: 78e233e1-8210-4c9f-8f5d-7159c754db16
LABEL: 
Set flags in SB:
ATTRIBUTES CLEAN

Any idea why I would receive this limitation, with a
2.6 kernel and reiserfs 3.6? Can this be corrected to
allow the full file size?
I would appreciate any hints.
I checked the kernel sources for the kernel version I am
running, and I don't see any mention of large
file/filesystem support that I recall was available
in older kernel builds, so it is probably
integrated now?


_Thanks

Richard



Re: Ebuild/rpm/deb repo's (was Re: reiser4 can now bear with filled fs, looks stable to me...)

2006-08-01 Thread Sander Sweers
On Tue, 2006-08-01 at 23:12 +0200, Maciej Sołtysiak wrote:
 Hello Sander,
Hey
 
 Tuesday, August 1, 2006, 8:10:34 PM, you wrote:
  Yes, and in the case of Gentoo there are already people maintaining an
  ebuild which pulls in r4 on the wiki.
  http://gentoo-wiki.com/HOWTO_Reiser4_With_Gentoo-Sources
 Debian has reiser4progs and kernel-patch-2.6-reiser4:

Nice :)

 - stable: 20040813-6
 - testing: 20050715-1
 - unstable: 20050715-1

Ouch :( It is in serious need of updating.

With the approval of Namesys I would like to add a new entry to the wiki
frontpage. It would be something like "Get reiser4 now" or "Howto install
reiser4". Under that we detail the steps to get kernels for distros
which include reiser4 and how to patch it yourself.

 I know it is very easy to create Ubuntu kernel packages (I have done a few).
 I might try to do one for the current Dapper kernel for i386, but it would have
 to wait due to my personal time constraints (projects, etc.)

Great :)

Does anyone on the list know of RPMs for SUSE/Red Hat/Mandrake
that include reiser4?

Greets
Sander



Re: reiserfs 3.6 with 2TB file size limitation!

2006-08-01 Thread Edward Shishkin

Ricardo (Tru64 User) wrote:

Hi,


Hello


I read on reiserfs site, faq #1, about max file sizes:
max file size 2^60 bytes = 1 Ei, 
but page cache limits this to 8 Ti on architectures

with 32 bit int for reiserfs 3.6

I do have a reiserfs 3.6 filesystem, on kernel 
2.6.12-21mdksmp (mandriva 2006) that would not take a

filesize greater than 2TB!

Filesize in question::
-rw-r--r--  1 myuser users 2147483647 Aug  1 16:41
myfile.out


Hmm.. actually this file has size 2GB, not 2TB.
Did you try to specify O_LARGEFILE when creating a file?
(man creat)
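
For example (a minimal sketch; the filename is made up, and alternatively the
program can simply be built with -D_FILE_OFFSET_BITS=64 so that off_t and the
plain open/lseek calls become 64-bit):

#define _GNU_SOURCE		/* exposes O_LARGEFILE and lseek64 */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
	/*
	 * Without O_LARGEFILE (or -D_FILE_OFFSET_BITS=64) a 32-bit process
	 * cannot grow a file past 2GB-1; the write fails with EFBIG.
	 */
	int fd = open("myfile.out", O_WRONLY | O_CREAT | O_LARGEFILE, 0644);
	if (fd < 0) {
		perror("open");
		return 1;
	}

	/* seek past the old 2GB boundary and write a byte there */
	if (lseek64(fd, 3LL * 1024 * 1024 * 1024, SEEK_SET) < 0 ||
	    write(fd, "x", 1) != 1) {
		perror("write past 2GB");
		return 1;
	}
	close(fd);
	return 0;
}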



Error received:
Filesize limit exceeded

This filesystem is on a scsi device connected via 
Adaptec AIC-7899P U160/m card (if it matters).


# debugreiserfs /dev/sda1
debugreiserfs 3.6.19 (2003 www.namesys.com)


Filesystem state: consistency is not checked after
last mounting

Reiserfs super block in block 16 on 0x801 of format
3.6 with standard journal
Count of blocks on the device: 292967356
Number of bitmaps: 8941
Blocksize: 4096
Free blocks (count of blocks - used [journal, bitmaps,
data, reserved] blocks): 172310677
Root block: 121672
Filesystem is NOT clean
Tree height: 5
Hash function used to sort names: r5
Objectid map size 2, max 972
Journal parameters:
Device [0x0]
Magic [0x5613f7b2]
Size 8193 blocks (including 1 for journal
header) (first block 18)
Max transaction length 1024 blocks
Max batch size 900 blocks
Max commit age 30
Blocks reserved by journal: 0
Fs state field: 0x0:
sb_version: 2
inode generation number: 1507612
UUID: 78e233e1-8210-4c9f-8f5d-7159c754db16
LABEL: 
Set flags in SB:

ATTRIBUTES CLEAN

Any idea why i would receive this limitation, with a
2.6 kernel and reiserfs 3.6? Can this be corrected to
allow the full file size?
I would appreciate any hints
I check the kernel-sources for the kernel version i am
running, and i dont see any mention of large
file/filesystem support that i recall was available
in  older kernel compiles, so it is probably
integrated now?


_Thanks

Richard








Re: Solaris ZFS on Linux [Was: Re: the 'official' point of viewexpressed by kernelnewbies.org regarding reiser4 inclusion]

2006-08-01 Thread Ian Stirling

David Masover wrote:

David Lang wrote:


On Mon, 31 Jul 2006, David Masover wrote:

Oh, I'm curious -- do hard drives ever carry enough 
battery/capacitance to cover their caches?  It doesn't seem like it 
would be that hard/expensive, and if it is done that way, then I 
think it's valid to leave them on.  You could just say that other 
filesystems aren't taking as much advantage of newer drive features 
as Reiser :P



there are no drives that have the ability to flush their cache after 
they lose power.



Aha, so back to the usual argument:  UPS!  It takes a fraction of a 
second to flush that cache.


You probably don't actually want to flush the cache - but to write
to a journal.
16M of cache - split into 32000 writes to single sectors spread over
the disk - could well take several minutes to write. Slapping it onto
a journal would take well under .2 seconds.
That's a non-trivial amount of storage though - 3J or so, [EMAIL PROTECTED] -
a moderately large/expensive capacitor.
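
(Rough numbers behind that, assuming roughly 8 ms per scattered single-sector
write, ~80 MB/s of sequential bandwidth and ~15 W drawn while flushing:

\[
32000 \times 8\ \mathrm{ms} \approx 256\ \mathrm{s}, \qquad
\frac{16\ \mathrm{MB}}{80\ \mathrm{MB/s}} = 0.2\ \mathrm{s}, \qquad
15\ \mathrm{W} \times 0.2\ \mathrm{s} = 3\ \mathrm{J}.
\]
)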

And if you've got to spin the drive up, you've just added another
order of magnitude.

You can see why a flash backup of the write cache may be nicer.
You can do it if the disk isn't spinning.
It uses moderately less energy - and at a much lower rate, which
means the power supply can be _much_ cheaper. I'd guess it's the
difference between under $2 and $10.
And if you can use it as a lazy write cache for laptops - things
just got better battery life wise too.


Re: Solaris ZFS on Linux [Was: Re: the 'official' point of viewexpressed by kernelnewbies.org regarding reiser4 inclusion]

2006-08-01 Thread David Masover

Ian Stirling wrote:

David Masover wrote:

David Lang wrote:


On Mon, 31 Jul 2006, David Masover wrote:

Oh, I'm curious -- do hard drives ever carry enough 
battery/capacitance to cover their caches?  It doesn't seem like it 
would be that hard/expensive, and if it is done that way, then I 
think it's valid to leave them on.  You could just say that other 
filesystems aren't taking as much advantage of newer drive features 
as Reiser :P



there are no drives that have the ability to flush their cache after 
they lose power.



Aha, so back to the usual argument:  UPS!  It takes a fraction of a 
second to flush that cache.


You probably don't actually want to flush the cache - but to write
to a journal.
16M of cache - split into 32000 writes to single sectors spread over
the disk - could well take several minutes to write. Slapping it onto
a journal would take well under .2 seconds.
That's a non-trivial amount of storage though - 3J or so, [EMAIL PROTECTED] -
a moderately large/expensive capacitor.


Before we get ahead of ourselves, remember:  ~$200 buys you a huge 
amount of battery storage.  We're talking several minutes for several 
boxes, at the very least -- more like 10 minutes.


But yes, a journal or a software suspend.


Re: reiser4: maybe just fix bugs?

2006-08-01 Thread David Masover

Nate Diller wrote:

On 8/1/06, David Masover [EMAIL PROTECTED] wrote:

Vladimir V. Saveliev wrote:



I could be entirely wrong, though.  I speak for neither
Hans/Namesys/reiserfs nor LKML.  Talk amongst yourselves...


i should clarify things a bit here.  yes, hans' goal is for there to
be no difference between the xattr namespace and the readdir one.
unfortunately, this is not feasible with the current VFS, and some
major work would have to be done to enable this without some
pathological cases cropping up.  some very smart people think that it
cannot be done at all.


But an xattr interface should work just fine, even if the rest of the 
system is inaccessible (no readdir interface) -- preventing all these 
pathological problems, except the one where Hans implements it the way 
I'm thinking, and kernel people hate it.