Re: [Server-devel] Issue with ds-backup in XS 0.4
On Tue, Nov 11, 2008 at 12:18 AM, Bill Bogstad [EMAIL PROTECTED] wrote:
> I was just about to try to upgrade my XS 0.4 to 0.5 dev8 and noticed something odd concerning ds-backup. When I originally installed 0.4,

Thanks for the report! As Douglas mentions, you can force a re-registration; however, I can't think of any good reason for the upgraded XO to not perform its backups.

Is there any evidence of the laptop attempting the backups? Some ideas for debugging:

on the XO
- look for entries in the cron log
- check that /etc/cron.d/ds-backup is in place (you can edit it to get logs of the execution)

on the XS
- look for entries in the logs that indicate logins via ssh
- check for permissions/ownership issues in the homedir

> This raises an operational question. If someone already has deployed XS (0.4 or earlier) against which older XO releases are registered, what do they do in order to take advantage of ds-backup?

It should just work - any bugs here are worthy of diagnosis.

> Should all the XOs re-register?

No -- an important change we want to make - and Douglas alluded to - is that future versions of the XO sw should 're-register' regularly and automatically. As the XS gets smarter, it wants more info about the XO.

cheers,

m
--
[EMAIL PROTECTED]
[EMAIL PROTECTED] -- School Server Architect
- ask interesting questions
- don't get distracted with shiny stuff - working code first
- http://wiki.laptop.org/go/User:Martinlanghoff
___
Server-devel mailing list
Server-devel@lists.laptop.org
http://lists.laptop.org/listinfo/server-devel
Re: [Server-devel] Issue with ds-backup in XS 0.4
On Tue, Nov 11, 2008 at 1:58 PM, Martin Langhoff [EMAIL PROTECTED] wrote:
> On Tue, Nov 11, 2008 at 12:18 AM, Bill Bogstad [EMAIL PROTECTED] wrote:
>> I was just about to try to upgrade my XS 0.4 to 0.5 dev8 and noticed something odd concerning ds-backup. When I originally installed 0.4,
>
> Thanks for the report! As Douglas mentions, you can force a re-registration; however I can't think of any good reason for the upgraded XO to not perform its backups.
>
> Is there any evidence of the laptop attempting the backups? Some ideas for debugging:
>
> on the XO
> - look for entries in the cron log

On the XO which isn't being backed up:

# grep ds-backup.sh /var/log/cron | tail -10
Nov 11 15:30:01 localhost CROND[658]: (olpc) CMD ((/usr/bin/ds-backup.sh 2>&1 ) > /dev/null)
Nov 11 16:00:01 localhost CROND[816]: (olpc) CMD ((/usr/bin/ds-backup.sh 2>&1 ) > /dev/null)
Nov 11 16:30:01 localhost CROND[1051]: (olpc) CMD ((/usr/bin/ds-backup.sh 2>&1 ) > /dev/null)
Nov 11 17:00:02 localhost CROND[1228]: (olpc) CMD ((/usr/bin/ds-backup.sh 2>&1 ) > /dev/null)
Nov 11 17:30:01 localhost CROND[1421]: (olpc) CMD ((/usr/bin/ds-backup.sh 2>&1 ) > /dev/null)
Nov 11 18:00:02 localhost CROND[1582]: (olpc) CMD ((/usr/bin/ds-backup.sh 2>&1 ) > /dev/null)
Nov 11 18:30:01 localhost CROND[1749]: (olpc) CMD ((/usr/bin/ds-backup.sh 2>&1 ) > /dev/null)
Nov 11 19:00:02 localhost CROND[1909]: (olpc) CMD ((/usr/bin/ds-backup.sh 2>&1 ) > /dev/null)
Nov 11 19:30:02 localhost CROND[2067]: (olpc) CMD ((/usr/bin/ds-backup.sh 2>&1 ) > /dev/null)
Nov 11 20:00:01 localhost CROND[2228]: (olpc) CMD ((/usr/bin/ds-backup.sh 2>&1 ) > /dev/null)

Ample evidence that attempts are being made to back up.
> - check that /etc/cron.d/ds-backup is in place (you can edit it to get logs of the execution)
>
> on the XS
> - look for entries in the logs that indicate logins via ssh

On the XS machine:

[EMAIL PROTECTED] ~]# grep Accepted /var/log/secure
Nov 9 19:02:45 schoolserver sshd[17206]: Accepted publickey for CSN74800E35 from 10.0.0.22 port 36015 ssh2
Nov 9 19:02:45 schoolserver sshd[17211]: Accepted publickey for CSN74800E35 from 10.0.0.22 port 36016 ssh2
Nov 10 19:08:42 schoolserver sshd[18943]: Accepted publickey for CSN74800E35 from 10.0.0.22 port 47021 ssh2
Nov 10 19:08:42 schoolserver sshd[18948]: Accepted publickey for CSN74800E35 from 10.0.0.22 port 47022 ssh2
Nov 10 23:30:42 schoolserver sshd[19173]: Accepted publickey for root from 10.0.0.8 port 54741 ssh2

That CSN is for the machine that IS being backed up.

[EMAIL PROTECTED] ~]# grep 'closed' /var/log/secure | tail -10
Nov 11 10:13:30 schoolserver sshd[20289]: Connection closed by 10.0.0.24
Nov 11 10:40:44 schoolserver sshd[20312]: Connection closed by 10.0.0.24
Nov 11 11:12:24 schoolserver sshd[20334]: Connection closed by 10.0.0.24
Nov 11 11:47:46 schoolserver sshd[20354]: Connection closed by 10.0.0.24
Nov 11 12:11:03 schoolserver sshd[20397]: Connection closed by 10.0.0.24
Nov 11 12:35:45 schoolserver sshd[20462]: Connection closed by 10.0.0.24
Nov 11 13:14:50 schoolserver sshd[20486]: Connection closed by 10.0.0.24
Nov 11 13:39:57 schoolserver sshd[20504]: Connection closed by 10.0.0.24
Nov 11 14:11:49 schoolserver sshd[20533]: Connection closed by 10.0.0.24
Nov 11 14:44:29 schoolserver sshd[20553]: Connection closed by 10.0.0.24

That IP address is the one assigned by my DHCP server to the XO that isn't getting backed up. I'm not using the schoolserver to do DHCP. My DHCP server has the MAC addresses of my XOs hardwired so that it always gives the same IP address to a particular machine. So these entries are always from the 'bad' XO. This would indicate to me that attempts are reaching the XS, but are failing.
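The comparison made above (hosts with "Accepted publickey" entries versus hosts that only show "Connection closed by") could be automated roughly as follows. This is only a sketch: the function names and the SAMPLE list are made up for illustration, the regular expressions assume the sshd log format shown in the excerpts, and a closed connection without an accepted login is suggestive of a failed attempt rather than proof of one.

```python
import re
from collections import Counter

# Patterns assume the sshd line format seen in the /var/log/secure excerpts.
ACCEPTED = re.compile(r"sshd\[\d+\]: Accepted publickey for (\S+) from (\S+)")
CLOSED = re.compile(r"sshd\[\d+\]: Connection closed by (\S+)")


def summarize_secure_log(lines):
    """Count accepted publickey logins and closed connections per source IP."""
    accepted = Counter()
    closed = Counter()
    for line in lines:
        match = ACCEPTED.search(line)
        if match:
            accepted[match.group(2)] += 1
            continue
        match = CLOSED.search(line)
        if match:
            closed[match.group(1)] += 1
    return accepted, closed


def failing_hosts(lines):
    """Return IPs that closed connections but never had a login accepted."""
    accepted, closed = summarize_secure_log(lines)
    return sorted(ip for ip in closed if ip not in accepted)


# A few lines copied from the log excerpts, used only as sample input.
SAMPLE = [
    "Nov 9 19:02:45 schoolserver sshd[17206]: Accepted publickey for CSN74800E35 from 10.0.0.22 port 36015 ssh2",
    "Nov 10 19:08:42 schoolserver sshd[18943]: Accepted publickey for CSN74800E35 from 10.0.0.22 port 47021 ssh2",
    "Nov 10 23:30:42 schoolserver sshd[19173]: Accepted publickey for root from 10.0.0.8 port 54741 ssh2",
    "Nov 11 10:13:30 schoolserver sshd[20289]: Connection closed by 10.0.0.24",
    "Nov 11 10:40:44 schoolserver sshd[20312]: Connection closed by 10.0.0.24",
]

if __name__ == "__main__":
    print(failing_hosts(SAMPLE))
```

On a real XS one would pass the lines of /var/log/secure instead of SAMPLE; mapping the resulting IPs back to a particular XO only works if, as here, addresses are assigned statically.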
> - check for permissions/ownership issues in the homedir

The files they have in common appear to have the appropriate ownership/Unix permissions. (I didn't check ACLs.) The failing XO home directory has NONE of the datastore entries. Not the timestamped ones, nor the -current or -latest entries. Could this be it? I'll do some more looking around and wait for your response before I manually create those entries.

Bill Bogstad
Re: [Server-devel] Issue with ds-backup in XS 0.4
Okay, I've found the problem with the XO that was failing to back up, and it may imply some issues with older XO releases...

Instead of enabling logging for the cron entry, I copied the ds-backup.sh script and modified it to not delay and to run /usr/bin/ds-backup.py explicitly. ds-backup.py output error messages from ssh complaining about bad permissions on the ssh key files. Here are the permissions on the failing machine:

[EMAIL PROTECTED] default]$ ls -l ~olpc/.sugar/default/owner*
-rwxr-xr-x 1 olpc olpc 668 2007-12-26 03:01 /home/olpc/.sugar/default/owner.key
-rwxr-xr-x 1 olpc olpc 590 2007-12-26 03:01 /home/olpc/.sugar/default/owner.key.pub

And here's the working machine:

-bash-3.2# ls -l ~olpc/.sugar/default/owner*
-rw------- 1 olpc olpc 668 2008-10-15 00:07 /home/olpc/.sugar/default/owner.key
-rw-r--r-- 1 olpc olpc 590 2008-10-15 00:07 /home/olpc/.sugar/default/owner.key.pub

The failing machine shows overly permissive permissions on the key files. In particular, ds-backup.py generated the following message when it failed:

__main__.TransferError: ('rsync error code 12, message:', @@@\r\n@ WARNING: UNPROTECTED PRIVATE KEY FILE! @\r\n@@@\r\nPermissions 0755 for '/home/olpc/.sugar/default/owner.key' are too open.\r\nIt is recommended that your private key files are NOT accessible by others.\r\nThis private key will be ignored.\r\nbad permissions: ignore key: /home/olpc/.sugar/default/owner.key\r\nPermission denied (publickey).\r\nrsync: connection unexpectedly closed (0 bytes received so far) [sender]\nrsync error: error in rsync protocol data stream (code 12) at io.c(635) [sender=3.0.3]\n)

I believe that ssh has long had checks which disallow the use of key files which are world readable. If anyone could read your private key file, then they could attempt to brute force your passphrase. In this case, I don't think the private key file even has a passphrase, which makes it even worse. SSH is unaware of the OLPC's single-user environment.
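The check that ssh applies here can be approximated in a few lines of Python. This is a sketch rather than ssh's actual code: it assumes the rule is "any group or other permission bit on the private key is too open" (which matches the 0755 rejection and the 0600 acceptance seen above), it ignores ssh's file-ownership condition, and the function names are hypothetical.

```python
import os
import stat


def key_mode_too_open(mode):
    """Return True if any group/other permission bit is set in `mode`."""
    return (stat.S_IMODE(mode) & 0o077) != 0


def private_key_is_acceptable(path):
    """Approximate whether ssh would accept the private key at `path`."""
    return not key_mode_too_open(os.stat(path).st_mode)


if __name__ == "__main__":
    # Modes taken from the two listings: failing machine, then working machine.
    for mode in (0o755, 0o600):
        print(oct(mode), "too open" if key_mode_too_open(mode) else "ok")
```

Under this rule the failing machine's 0755 key is rejected and the working machine's 0600 key is accepted. The public key's mode does not matter for this particular check.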
I changed the permissions on the key files to match those of the machine that works and was able to successfully complete a backup to my XS schoolserver.

The open question is how the key files got those permissions on the bad machine. You'll note that the mod time of the file is around Christmas 2007. The XO in question was a gift to my daughter, and it's entirely plausible that that is when it was first turned on and set up. I'm not 100% sure that the machine wasn't reflashed, but based on the date I doubt it. This would seem to indicate that somehow the permissions were set wrong from the moment the keys were generated.

I have a third G1G1 which is still running the 703 build (my other daughter's machine). I just checked, and the permissions on her key files are also bad. The mod time is within a couple of minutes of the other bad machine. This would strongly incline me to believe that the permissions problem was something in the original G1G1 XO install image. On the other hand, I just checked trac, and there have been issues in the past with olpc-update changing permissions in ways that ssh didn't like.

A survey of a large number of XOs in the field and/or test installs/updates using old XO images might be a good idea. If there is a latent bug in many deployed machines which causes backups to fail, it would be good to know. I'm not inclined to sacrifice my XO installs (particularly not my daughters' machines), but could certainly work with people at 1CC on this. I might even be able to stop by to help with the test installs...

Bill Bogstad
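For anyone surveying or repairing machines, the manual fix described in this message (resetting the key files to the modes seen on the working XO) could be scripted along these lines. This is a sketch under assumptions: fix_key_permissions and demo are hypothetical names, the target modes 0600 and 0644 are simply copied from the working machine's listing, and demo exercises the function on throwaway files rather than on a real ~olpc/.sugar/default directory.

```python
import os
import stat
import tempfile

PRIVATE_MODE = 0o600  # -rw------- as on the working machine
PUBLIC_MODE = 0o644   # -rw-r--r-- as on the working machine


def fix_key_permissions(private_key, public_key=None):
    """Reset key file modes; return a list of (path, old_mode, new_mode)."""
    targets = [(private_key, PRIVATE_MODE)]
    if public_key is not None:
        targets.append((public_key, PUBLIC_MODE))
    changes = []
    for path, wanted in targets:
        current = stat.S_IMODE(os.stat(path).st_mode)
        if current != wanted:
            os.chmod(path, wanted)
            changes.append((path, current, wanted))
    return changes


def demo():
    """Create two placeholder files with mode 0755 and repair them."""
    with tempfile.TemporaryDirectory() as tmp:
        key = os.path.join(tmp, "owner.key")
        pub = key + ".pub"
        for path in (key, pub):
            with open(path, "w") as handle:
                handle.write("placeholder\n")
            os.chmod(path, 0o755)
        fix_key_permissions(key, pub)
        return [stat.S_IMODE(os.stat(p).st_mode) for p in (key, pub)]


if __name__ == "__main__":
    print([oct(mode) for mode in demo()])
```

The returned list of changes doubles as a survey record: an empty list means the machine's key files already had the expected modes.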