Re: Trying to port data-logging to RH 2.4.18-19.7.x kernel

2003-02-03 Thread Chris Mason
On Fri, 2003-01-31 at 16:28, John Dalbec wrote:

> The immediate caller is the "ReiserFS specific hack" in 
> fs/inode.c:get_inode signed <[EMAIL PROTECTED]>.  Is the BKL supposed to be 
> held when get_inode is called?  

Traditionally, the BKL is supposed to be held when iget or iget4 is
called.  Red Hat might have patches that do away with that requirement
and simply missed reiserfs, but it is more likely they have a patch to
reduce BKL use in NFS that missed the iget4 case.

So your two basic choices are adding the BKL to reiserfs_read_inode2, or
going into the nfsd source and putting lock_kernel()/unlock_kernel()
around the iget4 call.  You might want to double check whether their
source had the BKL in reiserfs_read_inode2 before you started the data
logging port.

If not, you should be able to reproduce the oops on an unmodified Red Hat
kernel (compiled with SMP on), and I'd appreciate it if you could send
them a bug report as well.
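A user-space sketch of the two options, assuming a hypothetical single-threaded model of the BKL: lock_kernel()/unlock_kernel() below are stand-ins for the 2.4 calls of the same name, while read_stat_data(), reiserfs_read_inode2_locked() and nfsd_iget4() are invented names. The model only tracks whether the lock is held (it provides no mutual exclusion). What it shows is where the lock goes, and that because the BKL nests for the same task, taking it inside the filesystem is harmless even when a caller already holds it.

```c
#include <assert.h>

/* Hypothetical single-threaded model of the 2.4 Big Kernel Lock.
 * It only records whether the lock is held and how deeply it is
 * nested; it does NOT provide real mutual exclusion. */
static int lock_depth = -1;                 /* -1 == not held */

static void lock_kernel(void)   { ++lock_depth; }
static void unlock_kernel(void) { assert(lock_depth >= 0); --lock_depth; }
static int  kernel_locked(void) { return lock_depth >= 0; }

/* Minimal stand-in for the kernel's struct inode. */
struct inode { unsigned long ino; int valid; };

/* Hypothetical stand-in for the stat-data lookup that
 * reiserfs_read_inode2() performs; the model insists that the
 * caller holds the BKL, which is the rule described above. */
static void read_stat_data(struct inode *inode, unsigned long ino)
{
    assert(kernel_locked());
    inode->ino = ino;
    inode->valid = 1;
}

/* Option 1: the filesystem takes the BKL itself.  Because the lock
 * nests, this is harmless when a caller already holds it. */
static void reiserfs_read_inode2_locked(struct inode *inode, unsigned long ino)
{
    lock_kernel();
    read_stat_data(inode, ino);
    unlock_kernel();
}

/* Option 2: leave reiserfs alone and take the BKL in the nfsd caller
 * around the (modelled) iget4() -> read_inode2 path. */
static void nfsd_iget4(struct inode *inode, unsigned long ino)
{
    lock_kernel();
    read_stat_data(inode, ino);
    unlock_kernel();
}
```

Either option leaves the lock balanced after the call; in the real 2.4 kernel the lock_kernel()/unlock_kernel() calls come from <linux/smp_lock.h>.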

-chris





Re: when distros do not support official Marcelo kernels they are not being team players (was Re: reiserfs on redhat advanced server?)

2003-02-03 Thread Chris Mason
On Mon, 2003-02-03 at 09:20, Hans Reiser wrote:

> It is different from refusing to support the user who downloads 
> Marcelo's kernel after it does ship (after the distro CD went into the 
> stamping plant). That is what I am complaining about.  The default 
> should be to support all Marcelo kernels unless there is a motivated 
> reason not to (e.g. he ships a broken NFS kernel and the user is 
> complaining about NFS).  Users should feel that they can download any 
> latest official stable kernel (it is okay though to tell them to check a 
> website created by the distro to see if it is a known bad/unsupported 
> kernel), and everything will be fine with the distro.  When distros 
> don't do this, they are not being team players.

Hans, the vanilla kernels are lacking both bug fixes and features that
are critical to what our users are doing.  Even if the bug fixes all got
in, there are various reasons the features probably won't.  

If there was any vanilla kernel that had everything we needed, we'd
support it, and do a dance around a bonfire made from all of our patch
maintenance scripts and code.

The whole point of buying the distro is that you don't have the time and
energy to collect and compile every application and turn it into
something you can easily install on your personal machine.  The kernel
is one of those applications.  Feel free to replace it, but it doesn't 
make sense to expect us to help you fix the problems when we don't have
control over the configuration, compile or sources.

That would be like switching engines in your car and expecting the
original car company to do a warranty repair on the new engine.

-chris (speaking only for himself and not SuSE)





when distros do not support official Marcelo kernels they are not being team players (was Re: reiserfs on redhat advanced server?)

2003-02-03 Thread Hans Reiser
Juan Quintela wrote:

> "hans" == Hans Reiser <[EMAIL PROTECTED]> writes:
>
> hans> I understand and support being pissed at Linus for calling it 2.4.0
> hans> when it wasn't stable enough before 2.4.18 because VM and VFS were
> hans> still being changed, but Marcelo is pretty stable in all of his
> hans> official releases, and it is easy to get him to take good code.
>
> But it is not possible to get Marcelo to adapt his release schedule to
> the distro's release schedule :p That is one of the BIG problems.  If
> Marcelo's kernel is at pre5/pre6 when your release is about to freeze,
> what do you do:
>
> - bet that the final kernel will be out by the end of the distro release
>   and switch.  And in the process, invalidate all the testing that
>   you have done so far.
>
> - take the old known stable kernel, and adapt all the bugfixes that you
>   found in the pre series?

This is reasonable, and I am not complaining about it. 

It is different from refusing to support the user who downloads 
Marcelo's kernel after it does ship (after the distro CD went into the 
stamping plant). That is what I am complaining about.  The default 
should be to support all Marcelo kernels unless there is a motivated 
reason not to (e.g. he ships a broken NFS kernel and the user is 
complaining about NFS).  Users should feel that they can download any 
latest official stable kernel (it is okay though to tell them to check a 
website created by the distro to see if it is a known bad/unsupported 
kernel), and everything will be fine with the distro.  When distros 
don't do this, they are not being team players.


> Which strategy do you think is better?  If you bet (as almost
> everybody does) that the second one is better, you are going to end up
> with a heavily patched kernel.
>
> And that is without taking into account that a lot of the bug fixes
> that go into Marcelo's kernel follow this route:
>
> - user finds bug
> - user blames distro kernel
> - distro kernel team finds the problem (sometimes in cooperation
>   with the subsystem maintainer)


I don't see as many ReiserFS bugs found/fixed by distro kernel teams 
responding to complaints from their users as I would expect.  Perhaps we 
are unusual; I lack the perspective to know.  I would like to see more 
of them, and I don't really understand why there are so few.

--
Hans




Re: reiserfs on redhat advanced server?

2003-02-03 Thread Juan Quintela
> "hans" == Hans Reiser <[EMAIL PROTECTED]> writes:

hans> I understand and support being pissed at Linus for calling it 2.4.0
hans> when it wasn't stable enough before 2.4.18 because VM and VFS were
hans> still being changed, but Marcelo is pretty stable in all of his
hans> official releases, and it is easy to get him to take good code.

But it is not possible to get Marcelo to adapt his release schedule to
the distro's release schedule :p That is one of the BIG problems.  If
Marcelo's kernel is at pre5/pre6 when your release is about to freeze,
what do you do:

- bet that the final kernel will be out by the end of the distro release
  and switch.  And in the process, invalidate all the testing that
  you have done so far.

- take the old known stable kernel, and adapt all the bugfixes that you
  found in the pre series?

Which strategy do you think is better?  If you bet (as almost
everybody does) that the second one is better, you are going to end up
with a heavily patched kernel.

And that is without taking into account that a lot of the bug fixes
that go into Marcelo's kernel follow this route:

- user finds bug
- user blames distro kernel
- distro kernel team finds the problem (sometimes in cooperation
  with the subsystem maintainer)
- distro kernel team sends the patch to the subsystem maintainer
- subsystem maintainer sends the patch to Marcelo (perhaps after some
  local modification)

hans> I am not really opposed to vendors shipping their own kernels and
hans> supporting them, but I am opposed to them not supporting an official
hans> stable Marcelo kernel unless they have a specific reason not to.  The
hans> Marcelo kernels need to be considered the official supported ones by
hans> the entire community, regardless of what other ones might also be
hans> supported by parts of the community.

Believe me, if it were possible (not necessarily easy) to get that done,
my life would be much, much better :p

Later, Juan.

-- 
In theory, practice and theory are the same, but in practice they 
are different -- Larry McVoy



Re: reiserfsck --rebuild-tree all-in-one problem.

2003-02-03 Thread Vitaly Fertman
On Sunday 02 February 2003 21:33, Brian Chu wrote:
> Hello.
>
> Last friday when I went to upgrade my server, I noticed that there had
> been a lot of kernel messages on my server that were saying that one
> partition was spewing this:
>
> Jan  5 13:48:14 simmy kernel: hde: dma_intr: status=0x51 { DriveReady
> SeekComplete Error }
> Jan  5 13:48:14 simmy kernel: hde: dma_intr: error=0x40 {
> UncorrectableError }, LBAsect=91887, high=0, low=91887, sector=91824
> Jan  5 13:48:14 simmy kernel: end_request: I/O error, dev 21:01 (hde),
> sector 91824
> Jan  5 13:48:14 simmy kernel: vs-13070: reiserfs_read_inode2: i/o failure
> occurred trying to find stat data of [7495 7710 0x0 SD]
>
> I gave up that night, because running dd once took 7 hours and
> reiserfsck twice took 2 hours each, so the whole day was wasted.  The
> first time I ran --rebuild-tree I had read that dd_rescue was
> suggested, so I downloaded it, installed it, and ran it again (since I had
> used just plain dd the first time). I'm not sure if that made a difference
> or not.

Right, dd seems to simply skip the bad blocks, writing nothing into the
output for them.
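Why skipping matters: a filesystem image is only useful to reiserfsck if every block stays at its original offset, so a rescue copy should pad an unreadable block rather than drop it. The following is a minimal sketch of that idea, not dd_rescue's actual code; rescue_copy(), read_block_fn and the in-memory mem_disk source are hypothetical names, and a real tool would read the device with pread() instead of a callback.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical block reader: returns 0 on success, -1 on an I/O error. */
typedef int (*read_block_fn)(void *ctx, size_t blockno,
                             unsigned char *buf, size_t bs);

/* Copy nblocks blocks of bs bytes into out.  An unreadable block is
 * replaced by zeros at the SAME offset, so every later block keeps its
 * original position instead of being shifted down.
 * Returns the number of blocks that could not be read. */
static size_t rescue_copy(read_block_fn rd, void *ctx, unsigned char *out,
                          size_t nblocks, size_t bs)
{
    size_t bad = 0;
    for (size_t i = 0; i < nblocks; i++) {
        unsigned char *dst = out + i * bs;
        if (rd(ctx, i, dst, bs) != 0) {
            memset(dst, 0, bs);          /* pad, don't skip */
            bad++;
        }
    }
    return bad;
}

/* Hypothetical in-memory "disk" used to exercise rescue_copy():
 * one block index is marked unreadable. */
struct mem_disk { const unsigned char *data; size_t bad_block; };

static int mem_read_block(void *ctx, size_t blockno,
                          unsigned char *buf, size_t bs)
{
    const struct mem_disk *d = ctx;
    if (blockno == d->bad_block)
        return -1;                       /* simulated read error */
    memcpy(buf, d->data + blockno * bs, bs);
    return 0;
}
```

With a 4-block source whose block 1 is unreadable, the copy reports one bad block, block 1 comes out as zeros, and blocks 2 and 3 land exactly where they were on the source.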

> Today I started again, assuming that with dd_rescue I would have a
> greater chance of getting the filesystem recovered, but --check told me I
> had to run --rebuild-tree. This time I just used --logfile /dev/null,
> because screen dumps during the run would make it impossible to see what's
> going on. But it stopped again at the same place, Pass 2. Since the
> logfiles spit out so much STUFF, I have none at the moment (I can remake
> them if needed).
>
> Screen dump:
>
> Pass 2:
> 0%20%40%..  left 36, 0 /sec
>
> And it stops there. top indicates reiserfsck is using all of the cpu
> cycles, even after it seemingly freezes.

Looks like you built reiserfsck on another machine. Could you rebuild it 
on the same machine you run it on? It is possible to suppress the logfile 
with the -n option, but I think the logfile was so big because of this 
endless loop.

-- 

Thanks,
Vitaly Fertman