Re: Checking filesystems periodically

2012-05-07 Thread Martin Langhoff
On Fri, May 4, 2012 at 9:41 AM, Daniel Drake d...@laptop.org wrote:
 Thirdly, fsck is not magic. It cannot detect/repair all corruption. As
 far as I know, we have not yet found a case of corruption which can be
 meaningfully fixed by fsck. We did do quite a bit of testing for this
 at an earlier point.

It is worthwhile expanding on this: ext3 is a journalled fs, so it can
fix _some_ issues thanks to the journal, and does so on boot. So the
harsh environment issues are mostly handled by ext3 journal-based
recovery.

Problems that are not fixable with ext3's journal-based recovery cannot
(currently) be fixed by fsck either, as dsd writes. At least that's what
we have found so far.

Given those findings, this fell down our priority list, and there is
harsh competition there! :-)

Now, fsck and our choice of FS are not frozen in stone, so help is
welcome on this track. Improving our plymouth screens to have an "I'm
doing extra work this time, so booting slower" image would be good.
Not easy, but definitely good.

(Note that any boot time repair must be fully automated. 6 year olds
won't be telling fsck what to do with the broken inode table.)
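
As a rough illustration of what "fully automated" could mean in practice,
the Python sketch below builds an e2fsck command line that never waits for
input. The -p (preen) and -n (read-only, answer "no") options are standard
e2fsck flags; the function name and the device path are hypothetical, and
this is not a description of what any OLPC build actually does.

```python
def build_fsck_command(device, repair=True):
    """Return an e2fsck argument list that never waits for user input.

    The function name and signature are hypothetical, for illustration only.
    """
    # -p ("preen"): fix whatever can be fixed safely without asking.
    # -n: open the filesystem read-only and answer "no" to every question.
    mode = "-p" if repair else "-n"
    return ["e2fsck", mode, device]


if __name__ == "__main__":
    # "/dev/mmcblk0p2" is a placeholder device path, not a known XO layout.
    print(" ".join(build_fsck_command("/dev/mmcblk0p2")))
```

With -p, e2fsck stops and reports an error status when it meets a problem
it considers unsafe to repair unattended, so the caller still has to
decide what to do (and what to show the user) in that case.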

cheers,

m
-- 
 martin.langh...@gmail.com
 mar...@laptop.org -- Software Architect - OLPC
 - ask interesting questions
 - don't get distracted with shiny stuff  - working code first
 - http://wiki.laptop.org/go/User:Martinlanghoff
___
Devel mailing list
Devel@lists.laptop.org
http://lists.laptop.org/listinfo/devel


Checking filesystems periodically

2012-05-04 Thread Anish Mangal
Hi,

I am curious to know why we are not periodically checking file systems
after every N boots on the XO laptop.

The biggest reason I can think of is that every Nth boot will appear
very slow to the user, creating the impression that something is
wrong.

However, given that XOs often operate in harsh environments and
(probably) face a greater chance of hardware failure because of that,
would it be a good idea to do so?

Perhaps we could mount /boot and /home on different partitions and
check /boot periodically.
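
The "every N boots" idea above could be sketched roughly as follows in
Python. The counter file location and the value of N are made-up
placeholders. Note also that ext filesystems already keep a mount count
and a maximum mount count in the superblock (adjustable with tune2fs -c),
which is the usual way to get this behaviour, so a separate counter like
this is only an illustration of the logic.

```python
import os


def boot_check_due(counter_path, every_n):
    """Bump a persistent boot counter; return True on every Nth call."""
    try:
        with open(counter_path) as f:
            count = int(f.read().strip())
    except (FileNotFoundError, ValueError):
        count = 0  # missing or unreadable counter: start again from zero

    count += 1
    due = count >= every_n
    if due:
        count = 0  # a check is due, so restart the cycle

    # Write to a temporary file and rename it, so a power cut in the
    # middle of the update cannot leave a half-written counter behind.
    tmp_path = counter_path + ".tmp"
    with open(tmp_path, "w") as f:
        f.write(str(count))
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp_path, counter_path)
    return due
```

A boot script would call boot_check_due() once per boot and only start a
check (and show the "booting slower" screen) when it returns True.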

One issue that is pertinent to this discussion is [1]

Thoughts?

[1] http://dev.laptop.org.au/issues/1214

Cheers,
Anish



Re: Checking filesystems periodically

2012-05-04 Thread Daniel Drake
On Fri, May 4, 2012 at 2:55 AM, Anish Mangal an...@activitycentral.com wrote:
 Hi,

 I am curious to know why we are not periodically checking file systems
 after every N boots on the XO laptop.

Historically, we have used a filesystem without a checker (jffs2).

Now that we use ext* on newer laptops, there are a few reasons:

Firstly, it's non-trivial, due to the design/setup of our initramfs and
filesystem.

Secondly, the user experience:

 The biggest reason I can think of is that every Nth boot will appear
 very slow to the user, creating the impression that something is
 wrong.

Until F17 we haven't had a good way of communicating this via the boot
animation. Now we can do that easily but it lacks implementation.

Thirdly, fsck is not magic. It cannot detect/repair all corruption. As
far as I know, we have not yet found a case of corruption which can be
meaningfully fixed by fsck. We did do quite a bit of testing for this
at an earlier point.

 However, given that XOs often operate in harsh environments and
 (probably) face a greater chance of hardware failure because of that,
 would it be a good idea to do so?

Despite all the above, yes, it would be a good idea, and it's something
we'll hopefully get around to at some point. Your help implementing it
is welcome.

Daniel


Re: Checking filesystems periodically

2012-05-04 Thread Chris Ball
Hi,

On Fri, May 04 2012, Daniel Drake wrote:
 Until F17 we haven't had a good way of communicating this via the boot
 animation. Now we can do that easily but it lacks implementation.

 Thirdly, fsck is not magic. It cannot detect/repair all corruption. As
 far as I know, we have not yet found a case of corruption which can be
 meaningfully fixed by fsck. We did do quite a bit of testing for this
 at an earlier point.

Also, our users cannot be expected to understand (or obey) a requirement
that they not turn off the machine while it's doing something dangerous:
so if powering down half way through fsck leaves the filesystem in a
worse state than it was before fsck ran, we probably shouldn't do it
at all.

- Chris.
-- 
Chris Ball   c...@laptop.org   http://printf.net/
One Laptop Per Child


Re: Checking filesystems periodically

2012-05-04 Thread Chris Ball
Hi,

On Fri, May 04 2012, Chris Ball wrote:
 Hi,

 On Fri, May 04 2012, Daniel Drake wrote:
 Until F17 we haven't had a good way of communicating this via the boot
 animation. Now we can do that easily but it lacks implementation.

 Thirdly, fsck is not magic. It cannot detect/repair all corruption. As
 far as I know, we have not yet found a case of corruption which can be
 meaningfully fixed by fsck. We did do quite a bit of testing for this
 at an earlier point.

 Also, our users cannot be expected to understand (or obey) a requirement
 that they not turn off the machine while it's doing something dangerous:
 so if powering down half way through fsck leaves the filesystem in a
 worse state than it was before fsck ran, we probably shouldn't do it
 at all.

Another also: sometimes when fsck finds an inconsistency it asks you for
the root password, but some of our users don't have the root password,
so they might end up in a reboot loop where they can't progress.

- Chris.
-- 
Chris Ball   c...@laptop.org   http://printf.net/
One Laptop Per Child


Re: Checking filesystems periodically

2012-05-04 Thread Anish Mangal
On Fri 04 May 2012 07:57:47 PM IST, Chris Ball wrote:
 Hi,

 On Fri, May 04 2012, Chris Ball wrote:
 Hi,

 On Fri, May 04 2012, Daniel Drake wrote:
 Until F17 we haven't had a good way of communicating this via the boot
 animation. Now we can do that easily but it lacks implementation.

 Thirdly, fsck is not magic. It cannot detect/repair all corruption. As
 far as I know, we have not yet found a case of corruption which can be
 meaningfully fixed by fsck. We did do quite a bit of testing for this
 at an earlier point.

 Also, our users cannot be expected to understand (or obey) a requirement
 that they not turn off the machine while it's doing something dangerous:
 so if powering down half way through fsck leaves the filesystem in a
 worse state than it was before fsck ran, we probably shouldn't do it
 at all.


This is a valid point. _If_ there is a possibility that fsck will leave
the system in a worse state if the laptop is accidentally powered off,
this is a bad idea.

However, we can probably do a 'read-only' fsck, and provide a
notification (perhaps to contact technical support) if it finds
problems. That could be a fail-safe way of implementing this.

Am I right in assuming (with my limited knowledge in this area) that
the fsck-on-boot is by default read-only?
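
Whatever the boot-time default turns out to be, a read-only pass can be
requested explicitly. The Python sketch below runs e2fsck with -n (open
read-only, answer "no" to every question) and turns the exit status into
a simple verdict. The status bits are the ones documented in the
e2fsck(8) man page; the function names and the three verdict strings are
invented for this example. One caveat: results from checking a filesystem
that is mounted read-write are not reliable.

```python
import subprocess

# Exit status bits as documented in the e2fsck(8) man page.
ERRORS_UNCORRECTED = 4
OPERATIONAL_ERROR = 8


def classify(status):
    """Map an e2fsck exit status to 'clean', 'damaged' or 'unknown'."""
    if status == 0:
        return "clean"
    if status & ERRORS_UNCORRECTED:
        return "damaged"  # problems were found and (with -n) left alone
    return "unknown"      # e.g. operational error: the check did not run


def read_only_check(device):
    """Run a forced, read-only e2fsck on device and classify the result."""
    # -n: never modify the filesystem; -f: check even if it looks clean.
    result = subprocess.run(
        ["e2fsck", "-n", "-f", device],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return classify(result.returncode)
```

A "damaged" verdict is the point at which a notification to contact
technical support could be shown, without anything on disk having been
touched.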

 Another also: sometimes when fsck finds an inconsistency it asks you for
 the root password, but some of our users don't have the root password,
 so they might end up in a reboot loop where they can't progress.


As above, a non-intrusive way of doing this would be providing a
notification to contact technical support. The biggest caveat then
is that the problem must not happen too often ;-) or else tech
support will start cursing us!

In general, I wouldn't expect users to open a terminal and type
commands and passwords to repair their machine. If it has to be
implemented, it has to be completely handled by a GUI.

 - Chris

-- 
Anish


Re: Checking filesystems periodically

2012-05-04 Thread Mikus Grinbergs
On Fri, May 4, 2012 at 2:55 AM, Anish Mangal an...@activitycentral.com wrote:

 I am curious to know why we are not periodically checking file systems
 after every N boots on the XO laptop.


I think the question is: Should functions which affect the 'system' be
performed automatically, or should they be explicitly invoked?


The currently-existing OLPC precedent is the 'Software updates' facility 
-- which gets invoked manually to check whether the system's Activities 
are up-to-date.  My suggestion is to package the required software 
(fsck?) into a new 'Check system' facility - and add an icon for that 
new facility to the 'My Settings' panel - then the user who invokes it 
will know why his XO is tied up for a while.  [The local XO-laptop 
distributing organization can publish guidelines as to how often such a 
'Check system' facility ought to be used.]


mikus



Re: Checking filesystems periodically

2012-05-04 Thread Walter Bender
On Fri, May 4, 2012 at 2:46 PM, Mikus Grinbergs mi...@bga.com wrote:
 On Fri, May 4, 2012 at 2:55 AM, Anish Mangal an...@activitycentral.com wrote:

  I am curious to know why we are not periodically checking file systems
  after every N boots on the XO laptop.


 I think the question is: Should functions which affect the 'system' be
 performed automatically, or should they be explicitly invoked?

 The currently-existing OLPC precedent is the 'Software updates' facility --
 which gets invoked manually to check whether the system's Activities are
 up-to-date.  My suggestion is to package the required software (fsck?) into
 a new 'Check system' facility - and add an icon for that new facility to the
 'My Settings' panel - then the user who invokes it will know why his XO is
 tied up for a while.  [The local XO-laptop distributing organization can
 publish guidelines as to how often such a 'Check system' facility ought to
 be used.]


+1

 mikus





-- 
Walter Bender
Sugar Labs
http://www.sugarlabs.org