Re: [Server-devel] Issue with ds-backup in XS 0.4

2008-11-11 Thread Martin Langhoff
On Tue, Nov 11, 2008 at 12:18 AM, Bill Bogstad [EMAIL PROTECTED] wrote:
 I was just about to try to upgrade my XS 0.4 to 0.5 dev8 and noticed
 something odd concerning ds-backup.  When I originally installed 0.4,

Thanks for the report! As Douglas mentions, you can force a
re-registration; however I can't think of any good reason for the
upgraded XO to not perform its backups.

Is there any evidence of the laptop attempting the backups? Some ideas
for debugging:

on the XO
 - look for entries in the cron log
 - check that /etc/cron.d/ds-backup is in place (you can edit it to
get logs of the execution)

on the XS
 - look for entries in the logs that indicate logins via ssh
 - check for permissions/ownership issues in the homedir

 This raises an operational question.  If someone already has deployed
 XS (0.4 or earlier) against which older XO releases are registered,
 what do they do in order to take advantage of ds-backup?

It should just work - any bugs here are worthy of diagnosis.

 Should all
 the XOs re-register?

No -- an important change we want to make - and Douglas alluded to -
is that future versions of the XO sw should 're-register' regularly
and automatically. As the XS gets smarter, it wants more info about
the XO.

cheers,



m
-- 
 [EMAIL PROTECTED]
 [EMAIL PROTECTED] -- School Server Architect
 - ask interesting questions
 - don't get distracted with shiny stuff  - working code first
 - http://wiki.laptop.org/go/User:Martinlanghoff
___
Server-devel mailing list
Server-devel@lists.laptop.org
http://lists.laptop.org/listinfo/server-devel


Re: [Server-devel] Issue with ds-backup in XS 0.4

2008-11-11 Thread Bill Bogstad
On Tue, Nov 11, 2008 at 1:58 PM, Martin Langhoff
[EMAIL PROTECTED] wrote:
 On Tue, Nov 11, 2008 at 12:18 AM, Bill Bogstad [EMAIL PROTECTED] wrote:
 I was just about to try to upgrade my XS 0.4 to 0.5 dev8 and noticed
 something odd concerning ds-backup.  When I originally installed 0.4,

 Thanks for the report! As Douglas mentions, you can force a
 re-registration; however I can't think of any good reason for the
 upgraded XO to not perform its backups.

 Is there any evidence of the laptop attempting the backups? Some ideas
 for debugging:

 on the XO
  - look for entries in the cron log

On the XO which isn't being backed up:

# grep ds-backup.sh /var/log/cron | tail -10
Nov 11 15:30:01 localhost CROND[658]: (olpc) CMD ((/usr/bin/ds-backup.sh 2>&1 ) > /dev/null)
Nov 11 16:00:01 localhost CROND[816]: (olpc) CMD ((/usr/bin/ds-backup.sh 2>&1 ) > /dev/null)
Nov 11 16:30:01 localhost CROND[1051]: (olpc) CMD ((/usr/bin/ds-backup.sh 2>&1 ) > /dev/null)
Nov 11 17:00:02 localhost CROND[1228]: (olpc) CMD ((/usr/bin/ds-backup.sh 2>&1 ) > /dev/null)
Nov 11 17:30:01 localhost CROND[1421]: (olpc) CMD ((/usr/bin/ds-backup.sh 2>&1 ) > /dev/null)
Nov 11 18:00:02 localhost CROND[1582]: (olpc) CMD ((/usr/bin/ds-backup.sh 2>&1 ) > /dev/null)
Nov 11 18:30:01 localhost CROND[1749]: (olpc) CMD ((/usr/bin/ds-backup.sh 2>&1 ) > /dev/null)
Nov 11 19:00:02 localhost CROND[1909]: (olpc) CMD ((/usr/bin/ds-backup.sh 2>&1 ) > /dev/null)
Nov 11 19:30:02 localhost CROND[2067]: (olpc) CMD ((/usr/bin/ds-backup.sh 2>&1 ) > /dev/null)
Nov 11 20:00:01 localhost CROND[2228]: (olpc) CMD ((/usr/bin/ds-backup.sh 2>&1 ) > /dev/null)

Ample evidence that attempts are being made to back up.
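
For anyone who wants to check the schedule rather than eyeball it, here
is a minimal Python sketch that pulls the ds-backup.sh start times out of
cron log lines and reports the gap between runs. It is not part of
ds-backup; the function names are made up for illustration, and the year
is an assumption supplied by the caller because syslog timestamps do not
record one.

```python
import re
from datetime import datetime

# Syslog timestamp at the start of a cron log line that mentions ds-backup.sh.
CRON_LINE = re.compile(r"^(\w{3}\s+\d+\s+\d\d:\d\d:\d\d)\s.*ds-backup\.sh")


def backup_run_times(lines, year=2008):
    """Return the datetimes at which cron started ds-backup.sh.

    `year` is an assumption: syslog does not log it.
    """
    times = []
    for line in lines:
        m = CRON_LINE.match(line)
        if m:
            # Collapse padding such as "Nov  9" before parsing.
            stamp = " ".join(m.group(1).split())
            times.append(
                datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")
            )
    return times


def gaps_in_minutes(times):
    """Minutes between consecutive runs, rounded to the nearest minute."""
    return [
        round((later - earlier).total_seconds() / 60)
        for earlier, later in zip(times, times[1:])
    ]
```

Feeding it the lines of /var/log/cron and looking for gaps much larger
than 30 would show whether cron ever skipped a run. It says nothing about
whether the run succeeded.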

  - check that /etc/cron.d/ds-backup is in place (you can edit it to
 get logs of the execution)

 on the XS
  - look for entries in the logs that indicate logins via ssh

On the XS machine:

[EMAIL PROTECTED] ~]# grep Accepted /var/log/secure
Nov  9 19:02:45 schoolserver sshd[17206]: Accepted publickey for
CSN74800E35 from 10.0.0.22 port 36015 ssh2
Nov  9 19:02:45 schoolserver sshd[17211]: Accepted publickey for
CSN74800E35 from 10.0.0.22 port 36016 ssh2
Nov 10 19:08:42 schoolserver sshd[18943]: Accepted publickey for
CSN74800E35 from 10.0.0.22 port 47021 ssh2
Nov 10 19:08:42 schoolserver sshd[18948]: Accepted publickey for
CSN74800E35 from 10.0.0.22 port 47022 ssh2
Nov 10 23:30:42 schoolserver sshd[19173]: Accepted publickey for root
from 10.0.0.8 port 54741 ssh2

That CSN is for the machine that IS being backed up.

[EMAIL PROTECTED] ~]# grep 'closed' /var/log/secure | tail -10
Nov 11 10:13:30 schoolserver sshd[20289]: Connection closed by 10.0.0.24
Nov 11 10:40:44 schoolserver sshd[20312]: Connection closed by 10.0.0.24
Nov 11 11:12:24 schoolserver sshd[20334]: Connection closed by 10.0.0.24
Nov 11 11:47:46 schoolserver sshd[20354]: Connection closed by 10.0.0.24
Nov 11 12:11:03 schoolserver sshd[20397]: Connection closed by 10.0.0.24
Nov 11 12:35:45 schoolserver sshd[20462]: Connection closed by 10.0.0.24
Nov 11 13:14:50 schoolserver sshd[20486]: Connection closed by 10.0.0.24
Nov 11 13:39:57 schoolserver sshd[20504]: Connection closed by 10.0.0.24
Nov 11 14:11:49 schoolserver sshd[20533]: Connection closed by 10.0.0.24
Nov 11 14:44:29 schoolserver sshd[20553]: Connection closed by 10.0.0.24

That IP address is the one assigned by my DHCP server to the XO that
isn't getting backed up. I'm not using the schoolserver to do DHCP.
My DHCP server has the MAC addresses of my XOs hardwired to always give
the same IP address to a particular machine.  So these entries are
always from the 'bad' XO.  This would indicate to me that attempts are
reaching the XS, but are failing.
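
The same comparison can be done mechanically. Below is a minimal Python
sketch, assuming the sshd log format shown above, that counts accepted
public-key logins and closed connections per client IP and lists the IPs
that only ever show up as closed. The function names are hypothetical,
and treating "Connection closed by" as a failed attempt is a heuristic,
not something sshd guarantees.

```python
import re
from collections import Counter

ACCEPTED = re.compile(r"sshd\[\d+\]: Accepted publickey for (\S+) from (\S+)")
CLOSED = re.compile(r"sshd\[\d+\]: Connection closed by (\S+)")


def summarise_secure_log(lines):
    """Count accepted logins and closed connections per client IP."""
    accepted = Counter()
    closed = Counter()
    for line in lines:
        m = ACCEPTED.search(line)
        if m:
            accepted[m.group(2)] += 1  # group(2) is the client IP
            continue
        m = CLOSED.search(line)
        if m:
            closed[m.group(1)] += 1
    return accepted, closed


def suspect_clients(accepted, closed):
    """IPs that closed connections but never logged in successfully."""
    return sorted(ip for ip in closed if accepted[ip] == 0)
```

Run over /var/log/secure on the XS, `suspect_clients` would be expected
to name 10.0.0.24 here. It only works as a per-machine test when each XO
keeps a fixed IP address, as mine do.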

  - check for permissions/ownership issues in the homedir

The files they have in common appear to have the appropriate
ownership/Unix permissions. (I didn't check ACLs.)  The failing XO
home directory has NONE of the datastore entries.  Not the timestamped
ones, nor the -current or -latest entries.  Could this be it?

I'll do some more looking around and wait for your response before I
manually create those entries.

Bill Bogstad


Re: [Server-devel] Issue with ds-backup in XS 0.4

2008-11-11 Thread Bill Bogstad
Okay, I've found the problem with the XO that was failing to back up
and it may imply some issues with older XO releases...

Instead of enabling logging for the cron entry, I copied the
ds-backup.sh script and modified it to not delay and to run
/usr/bin/ds-backup.py explicitly.
ds-backup.py output error messages from ssh complaining about bad
permissions on the ssh key files.

Here are the permissions on the failing machine:

[EMAIL PROTECTED] default]$ ls -l ~olpc/.sugar/default/owner*
-rwxr-xr-x 1 olpc olpc 668 2007-12-26 03:01 /home/olpc/.sugar/default/owner.key
-rwxr-xr-x 1 olpc olpc 590 2007-12-26 03:01
/home/olpc/.sugar/default/owner.key.pub

And here's the working machine:
-bash-3.2# ls -l ~olpc/.sugar/default/owner*
-rw------- 1 olpc olpc 668 2008-10-15 00:07 /home/olpc/.sugar/default/owner.key
-rw-r--r-- 1 olpc olpc 590 2008-10-15 00:07
/home/olpc/.sugar/default/owner.key.pub

The failing machine shows overly permissive permissions on the key
files.  In particular, ds-backup.py generated the following message
when it failed:

__main__.TransferError: ('rsync error code 12, message:',
"@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@\r\n@
   WARNING: UNPROTECTED PRIVATE KEY FILE!
@\r\n@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@\r\nPermissions
0755 for '/home/olpc/.sugar/default/owner.key' are too open.\r\nIt is
recommended that your private key files are NOT accessible by
others.\r\nThis private key will be ignored.\r\nbad permissions:
ignore key: /home/olpc/.sugar/default/owner.key\r\nPermission denied
(publickey).\r\nrsync: connection unexpectedly closed (0 bytes
received so far) [sender]\nrsync error: error in rsync protocol data
stream (code 12) at io.c(635) [sender=3.0.3]\n")
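
Since the useful part of that message is buried in rsync noise, here is
a minimal Python sketch that picks the ssh "too open" complaint out of
captured error text and returns the reported mode and key path. This is
only an illustration; `find_unprotected_key` is a made-up name and
nothing like it exists in ds-backup.py as far as I know.

```python
import re

# ssh reports the offending mode and path on one line; \s+ tolerates
# any line wrapping introduced when the text was captured or mailed.
TOO_OPEN = re.compile(
    r"Permissions\s+(\d{3,4})\s+for\s+'([^']+)'\s+are\s+too\s+open"
)


def find_unprotected_key(error_text):
    """Return (mode, key_path) if ssh rejected a key as too open, else None.

    `mode` is the octal permission string exactly as ssh printed it.
    """
    m = TOO_OPEN.search(error_text)
    if m is None:
        return None
    return m.group(1), m.group(2)
```

A wrapper around the rsync call could use this to log one clear line
instead of the whole banner.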

I believe that ssh has long had checks which disallow use of key files
which are world readable.  If anyone could read your private key file
then they could attempt to brute force your passphrase.   In this
case, I don't think the private key file even has a passphrase which
makes it even worse.  SSH is unaware of the OLPC's single user
environment.  I changed the permission on the key files to match those
of the machine that works and was able to successfully complete a backup
to my XS schoolserver.
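
For reference, the check and the fix can be sketched in a few lines of
Python. This is a minimal sketch under my own assumptions, not anything
shipped on the XO: the function names are hypothetical, and it only
handles the private key, using the rule that ssh refuses a key file if
group or others have any access to it.

```python
import os
import stat


def key_is_too_open(path):
    """True if group or others have any access to the file."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return bool(mode & 0o077)


def tighten_key(path):
    """Restrict a private key to owner read/write (0600).

    Returns True if the permissions were changed, False if they were
    already acceptable.
    """
    if key_is_too_open(path):
        os.chmod(path, 0o600)
        return True
    return False
```

On the failing XO this would be called with
/home/olpc/.sugar/default/owner.key, run as the olpc user. The public
key is not covered; I set it to match the working machine by hand.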

The open question is how did the keyfiles get those permissions on the
bad machine? You'll note that the mod time of the file is around
Christmas 2007.  The XO in question was a gift to my daughter and it's
entirely plausible that is when it was first turned on and set up.  I'm
not 100% sure that the machine wasn't reflashed, but based on the date
I doubt it.  This would seem to indicate that somehow the permissions
were set wrong from the moment the keys were generated.   I have a
third G1G1 which is still running the 703 build (my other daughter's
machine). I just checked and the permissions on her key files are also
bad.  The modtime is within a couple of minutes of the other bad
machine.  This would strongly incline me to believe that the
permissions problem was something in the original G1G1 XO install
image.  On the other hand, I just checked trac and there have been
issues in the past with olpc-update changing permissions in ways that
ssh didn't like.

A survey of a large number of XOs in the field and/or test
installs/updates using old XO images might be a good idea.  If there
is a latent bug in many deployed machines which causes backups to
fail, it would be good to know.  I'm not inclined to sacrifice my XO installs
(particularly not my daughter's machines), but could certainly work
with people at 1CC on this.  I might even be able to stop by to help
with the test installs...

Bill Bogstad