Re: [Lxc-users] errors
On 05/23/2013 01:46 AM, Tamas Papp wrote:
> On 05/23/2013 04:27 AM, Stéphane Graber wrote:
>> Oops, looks like I broke lxc-ls --fancy with my recent get_ips() API change.
>>
>> I'll fix it directly in staging (trivial fix) and trigger a new daily build; you should be able to update to a fixed package in the next couple of hours.
>
> hi,
>
> Although the package is not here yet, I downloaded the raw file from GitHub, and the function is indeed fixed.
>
> The FUTEX_WAIT hang is still there, though. strace -ff lxc-ls --fancy:
>
> [...]
> geteuid() = 0
> statfs("/dev/shm", {f_type=0x1021994, f_bsize=4096, f_blocks=16498192, f_bfree=16498156, f_bavail=16498156, f_files=16498192, f_ffree=16498155, f_fsid={0, 0}, f_namelen=255, f_frsize=4096}) = 0
> futex(0x7ff06b4cc31c, FUTEX_WAKE_PRIVATE, 2147483647) = 0
> open("/dev/shm/sem.lxcapi.jcb-vmc02", O_RDWR|O_NOFOLLOW) = 3
> fstat(3, {st_mode=S_IFREG|0640, st_size=32, ...}) = 0
> mmap(NULL, 32, PROT_READ|PROT_WRITE, MAP_SHARED, 3, 0) = 0x7ff06b65d000
> close(3) = 0
> stat("/var/lib/lxc/jcb-vmc02/config", {st_mode=S_IFREG|0644, st_size=1293, ...}) = 0
> futex(0x7ff06b65d000, FUTEX_WAIT, 0, NULL
>
> And it's waiting here... What is it waiting for? I'm quite lost now.
>
> Thanks,
> tamas

That looks like broken locking, though Serge would know for sure.

You may want to try clearing /dev/shm/*lxc* and see if that fixes the problem (not usually recommended, as those locks are there for a reason).

--
Stéphane Graber
Ubuntu developer
http://www.ubuntu.com

___
Lxc-users mailing list
Lxc-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/lxc-users
Re: [Lxc-users] errors
On 05/23/2013 03:35 PM, Stéphane Graber wrote:
> That looks like broken locking, though Serge would know for sure.
>
> You may want to try clearing /dev/shm/*lxc* and see if that fixes the problem (not usually recommended as those locks are there for a reason).

OK. At this moment I'm trying to reproduce the problem on a test machine, but still no luck.
I'll let you know as soon as I have something.

Thanks,
tamas
Re: [Lxc-users] errors
On 05/23/2013 03:39 PM, Tamas Papp wrote:
> On 05/23/2013 03:35 PM, Stéphane Graber wrote:
>> You may want to try clearing /dev/shm/*lxc* and see if that fixes the problem (not usually recommended as those locks are there for a reason).
>
> OK. At this moment I'm trying to reproduce the problem on a test machine, but still no luck.
> I'll let you know as soon as I have something.

Well, I still haven't succeeded in this. Now I have 3 other systems with the same symptoms (FUTEX_WAIT, segfaults).

Regarding the lock file: I renamed /dev/shm/sem.lxcapi.jcb-vmc02 and it was recreated automatically.
It is still there, no change.

tamas
Re: [Lxc-users] errors
Quoting Tamas Papp (tom...@martos.bme.hu):
> Well, I still haven't succeeded in this. Now I have 3 other systems with the same symptoms (FUTEX_WAIT, segfaults).
>
> Regarding the lock file: I renamed /dev/shm/sem.lxcapi.jcb-vmc02 and it was recreated automatically.
> It is still there, no change.

I'm in the process of replacing those locks. In the meantime, the /dev/shm/sem.* files are removed when things exit normally, but if you kill an lxc-* command while it is holding the lock, you may have to remove the file manually. If the commands are still running, then of course the files should still be there, as the intent is to keep two programs which are both doing c->set_config_item("lxc.rootfs", x) from overwriting each other.

I'll try to get the replacement for that ready for staging by tomorrow night.
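The leftover-lock case described above (an lxc-* command killed while holding the semaphore) can be spotted before removing anything by hand. The following is only a sketch, not part of LXC: the helper name find_unmapped_semaphores is hypothetical, and it assumes Linux, where a process that has a named semaphore open keeps its file mapped and therefore listed in /proc/<pid>/maps. A file that no process maps is merely a candidate for manual cleanup; the sketch cannot tell whether the semaphore inside it is actually locked.

```python
import glob
import os


def find_unmapped_semaphores(shm_dir="/dev/shm", pattern="sem.lxcapi.*",
                             proc_dir="/proc"):
    """Return semaphore files in shm_dir that no visible process has mapped.

    Hypothetical helper for illustration. Run it as root: maps files that
    cannot be read are skipped, which would make a held lock look unused.
    """
    # /proc/<pid>/maps shows resolved paths, so resolve symlinks first
    # (on some systems /dev/shm is a symlink to /run/shm).
    candidates = {os.path.realpath(p): p
                  for p in glob.glob(os.path.join(shm_dir, pattern))}
    mapped = set()
    for entry in os.listdir(proc_dir):
        if not entry.isdigit():
            continue  # not a process directory
        try:
            with open(os.path.join(proc_dir, entry, "maps")) as maps:
                for line in maps:
                    # address perms offset dev inode pathname
                    fields = line.rstrip("\n").split(None, 5)
                    if len(fields) == 6 and fields[5] in candidates:
                        mapped.add(fields[5])
        except OSError:
            continue  # process exited meanwhile, or permission denied
    return sorted(orig for real, orig in candidates.items()
                  if real not in mapped)


if __name__ == "__main__":
    for path in find_unmapped_semaphores():
        print("no process has this mapped:", path)
```

If lxc-* commands are still running, their semaphore files should show up as mapped and be left alone, which matches the intent of the lock.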
Re: [Lxc-users] errors
On 05/23/2013 07:34 PM, Serge Hallyn wrote:
> I'm in the process of replacing those locks. In the meantime, the /dev/shm/sem.* files are removed when things exit normally, but if you kill an lxc-* command while it is holding the lock, you may have to remove the file manually.
>
> I'll try to get the replacement for that ready for staging by tomorrow night.

FYI, it works fine now with 0.9.0.0~staging~20130523-0240-0ubuntu1~ppa1~precise1.
For this I had to stop all containers, then start them again.

Thanks for your help, guys,
tamas

ps.: Can I expect similar bugs from these changes, like this one was?
Re: [Lxc-users] errors
Quoting Tamas Papp (tom...@martos.bme.hu):
> FYI, it works fine now with 0.9.0.0~staging~20130523-0240-0ubuntu1~ppa1~precise1.
> For this I had to stop all containers, then start them again.

Ok - I suspect what was happening was that every lxc-ls was segfaulting while holding the semaphore, then the next lxc-* action using the api would hang.

> Thanks for your help, guys,
> tamas
>
> ps.: Can I expect similar bugs from these changes, like this one was?

Not sure what you mean.

-serge
Re: [Lxc-users] errors
On 05/23/2013 07:55 PM, Serge Hallyn wrote:
>> FYI, it works fine now with 0.9.0.0~staging~20130523-0240-0ubuntu1~ppa1~precise1.
>> For this I had to stop all containers, then start them again.
>
> Ok - I suspect what was happening was that every lxc-ls was segfaulting while holding the semaphore, then the next lxc-* action using the api would hang.
>
>> ps.: Can I expect similar bugs from these changes, like this one was?
>
> Not sure what you mean.

Are the planned (lock file) changes going to be risky, so that it may, say, start segfaulting again?

I'm just interested in whether it's safe to upgrade to the latest version in the next few days. Unfortunately there is only a daily ppa for the 0.9 tree.

Thanks,
tamas
Re: [Lxc-users] errors
On 05/23/2013 09:47 PM, Serge Hallyn wrote:
> The lxc lock had nothing to do with the segfaulting - and no, the new changes will simply swap out use of the named semaphore for a flock on an open fd (so that the locks get auto-cleaned if the process is killed). Any potential for segvs *should* be found while I test.

OK

>> I'm just interested in whether it's safe to upgrade to the latest version in the next few days. Unfortunately there is only a daily ppa for the 0.9 tree.
>
> raring and saucy are based on 0.9.
>
> For the most part the daily ppa is actually very stable. It goes through quite a bit of automated testing before it gets published. It's what I use on my main dev box, though when I'm working on something that takes a few days, I sometimes end up a few days behind while I test my feature.

Yes, I know, and I use it in production on a couple of servers.
Thanks to you!

> I think this is the second time a bug has been exposed mainly through lxc-list (because it uses quite a bit of the api), so lxc-list should be added explicitly to the testsuite.

You mean lxc-ls?

BTW, lxc-list:

    # lxc-list
    WARNING: lxc-list is deprecated, please use lxc-ls --fancy.
    This symlink will be dropped in LXC 1.0.
    NAME  STATE  IPV4  IPV6  AUTOSTART
    ----------------------------------

I use this alias as a shortcut for lxc-ls --fancy:

    alias lxc-list='lxc-ls --fancy --fancy-format name,state,ipv4,autostart'

Wouldn't it be better if lxc-list were left as it is now, or even if its behaviour were defined in /etc/lxc or /etc/default/lxc? Something like:

    LXC_LIST_SWITCHES="--fancy --fancy-format name,state,ipv4,autostart"

Just a stupid idea.

tamas
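The property Serge is after with the switch to flock (locks that disappear when the holder is killed) can be seen in a small experiment. This is only a sketch under stated assumptions, not the LXC implementation: it assumes Linux or another POSIX system, uses a throwaway temp file instead of any real LXC lock path, and the helper names lock_is_free and demo are made up for the illustration.

```python
import fcntl
import os
import signal
import tempfile


def lock_is_free(path):
    """Try a non-blocking exclusive flock; True if nobody else holds it."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        return False  # another open file description holds the lock
    finally:
        os.close(fd)  # closing the fd also drops our lock, if we got it
    return True


def demo():
    """Return (held_while_alive, free_after_kill) for a killed lock holder."""
    path = os.path.join(tempfile.mkdtemp(), "lockfile")
    ready_r, ready_w = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: take the lock, tell the parent, then wait to be killed.
        fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
        fcntl.flock(fd, fcntl.LOCK_EX)
        os.write(ready_w, b"x")
        signal.pause()
        os._exit(0)
    os.close(ready_w)
    os.read(ready_r, 1)            # wait until the child holds the lock
    os.close(ready_r)
    held_while_alive = not lock_is_free(path)
    os.kill(pid, signal.SIGKILL)   # simulate a crash while holding the lock
    os.waitpid(pid, 0)
    free_after_kill = lock_is_free(path)
    return held_while_alive, free_after_kill


if __name__ == "__main__":
    print(demo())
```

The kernel releases a flock when the last descriptor for that open file is closed, which includes process death. A POSIX named semaphore has no such owner tracking, so a holder that dies leaves it locked; that matches the hang seen in the strace output earlier in the thread.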
Re: [Lxc-users] errors
Quoting Tamas Papp (tom...@martos.bme.hu):
> BTW, lxc-list:
>
> # lxc-list
> WARNING: lxc-list is deprecated, please use lxc-ls --fancy.
> This symlink will be dropped in LXC 1.0.
>
> I use this alias as a shortcut for lxc-ls --fancy:
>
> alias lxc-list='lxc-ls --fancy --fancy-format name,state,ipv4,autostart'
>
> Wouldn't it be better if lxc-list were left as it is now, or

Yeah, I'm hoping to convince Stéphane to leave lxc-list after all :) Even if it is just shipped as a symlink, with lxc-ls deciding how to behave based on argv[0].

> even if its behaviour were defined in /etc/lxc or /etc/default/lxc? Something like:
>
> LXC_LIST_SWITCHES="--fancy --fancy-format name,state,ipv4,autostart"
>
> Just a stupid idea.

Hm, yeah. If there are other common long options that people like to use (I don't use any others) then this would be worthwhile.

-serge
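The two ideas in this exchange (a symlink that changes behaviour based on argv[0], and default switches read from /etc/default/lxc) could be combined roughly as follows. This is a hypothetical sketch, not how lxc-ls works: LXC_LIST_SWITCHES is only Tamas's proposal, and the function names read_list_switches and effective_args are invented here.

```python
import os
import shlex


def read_list_switches(defaults_path, key="LXC_LIST_SWITCHES"):
    """Return the switches from a KEY="..." line in a shell-style file."""
    try:
        with open(defaults_path) as fh:
            for line in fh:
                try:
                    words = shlex.split(line, comments=True)
                except ValueError:
                    continue  # unbalanced quotes: ignore the line
                if words and words[0].startswith(key + "="):
                    # Quoted value: everything lands in words[0].
                    # Unquoted value: the rest arrives as extra words.
                    first = words[0][len(key) + 1:]
                    return first.split() + words[1:]
    except OSError:
        pass  # no defaults file: no extra switches
    return []


def effective_args(argv, defaults_path="/etc/default/lxc"):
    """Arguments to act on, depending on the name we were invoked as."""
    if os.path.basename(argv[0]) == "lxc-list":
        # Invoked through the lxc-list symlink: prepend configured defaults.
        return read_list_switches(defaults_path) + argv[1:]
    return argv[1:]


if __name__ == "__main__":
    import sys
    print(effective_args(sys.argv))
```

With a defaults file containing the line from Tamas's mail, calling this through a symlink named lxc-list would yield --fancy, --fancy-format and name,state,ipv4,autostart ahead of any arguments given on the command line, while a call as lxc-ls would be left untouched.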
Re: [Lxc-users] errors
On 05/23/2013 01:02 AM, Tamas Papp wrote:
> hi All,
>
> # lxc-ls --fancy
> Traceback (most recent call last):
>   File "/usr/bin/lxc-ls", line 221, in <module>
>     ips = container.get_ips(protocol=protocol, timeout=1)
> TypeError: 'protocol' is an invalid keyword argument for this function
>
> # lxc-info -n sc --state-is=running
> # echo $?
> 1
>
> The container is running.
>
> ii lxc 0.9.0.0~staging~20130521-1727-0ubuntu1~ppa1~pre Linux Containers userspace tools
>
> # lsb_release -a
> No LSB modules are available.
> Distributor ID: Ubuntu
> Description:    Ubuntu 12.04.2 LTS
> Release:        12.04
> Codename:       precise
>
> Linux virt102 3.2.0-43-generic #68-Ubuntu SMP Wed May 15 03:33:33 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux
>
> I'm not sure whether it's related, but the server has just been restarted, because lxc-* commands were segfaulting.
>
> Do you have an idea?

I downgraded to 0.9.0.0~staging~20130516-1655-0ubuntu1~ppa1~precise1: segfault.

Then I rebooted, and now it's stuck at this stage:

    Process 77488 detached
    <... wait4 resumed> [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 77488
    --- SIGCHLD (Child exited) @ 0 (0) ---
    fstat(3, {st_mode=S_IFIFO|0600, st_size=0, ...}) = 0
    lseek(3, 0, SEEK_CUR) = -1 ESPIPE (Illegal seek)
    read(3, "", 6) = 0
    close(3) = 0
    geteuid() = 0
    open("/dev/shm/sem.lxcapi.bioreg-vmc01", O_RDWR|O_NOFOLLOW) = 3
    fstat(3, {st_mode=S_IFREG|0640, st_size=32, ...}) = 0
    mmap(NULL, 32, PROT_READ|PROT_WRITE, MAP_SHARED, 3, 0) = 0x7fe5c5fce000
    close(3) = 0
    stat("/var/lib/lxc/bioreg-vmc01/config", {st_mode=S_IFREG|0644, st_size=1552, ...}) = 0
    futex(0x7fe5c5fce000, FUTEX_WAIT, 0, NULL

This is the first container on the list.

Any idea?

Thanks,
tamas
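The lxc-info check quoted above is easier to reuse in scripts when wrapped in a small function. This sketch uses only the flags shown in the report (-n and --state-is) and assumes, as the report implies, that exit status 0 means the container is in the given state and non-zero means it is not. The function name state_is and the lxc_info parameter are hypothetical conveniences, the latter so a different binary path can be substituted.

```python
import subprocess


def state_is(name, state, lxc_info="lxc-info"):
    """True if `lxc-info -n NAME --state-is=STATE` exits with status 0.

    Assumption: a zero exit status means the state matches, as the report
    above implies. Output is discarded; only the exit status is used.
    """
    result = subprocess.run(
        [lxc_info, "-n", name, "--state-is=" + state],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


if __name__ == "__main__":
    # "sc" is the container name from the report above.
    print(state_is("sc", "running"))
```

In the situation reported here the function would return False for a container that is in fact running, so the exit status is a symptom worth recording next to the traceback rather than a trustworthy answer.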
Re: [Lxc-users] errors
On 05/22/2013 07:46 PM, Tamas Papp wrote:
> Sorry for the mass mail...
>
> In outline: there was a segfault.
> I rebooted the machine; then there was no segfault, but the API 'protocol' error instead.
> I downgraded lxc and rebooted, and it got stuck.
> I upgraded lxc (no reboot), and the segfault is back.
>
> I have two machines of this kind: latest lxc version from the daily ppa, zfs backend, but different kernels (3.2 vs. 3.8 - backported), 10-30 containers. Both produce the issue.
> There is a similar machine (installed a couple of days ago, with 3 simple containers) that shows no issue.
>
> I hope it helps,
> tamas

Oops, looks like I broke lxc-ls --fancy with my recent get_ips() API change.

I'll fix it directly in staging (trivial fix) and trigger a new daily build; you should be able to update to a fixed package in the next couple of hours.

Thanks for the report.

--
Stéphane Graber
Ubuntu developer
http://www.ubuntu.com
Re: [Lxc-users] errors
On 05/23/2013 04:27 AM, Stéphane Graber wrote:
> Oops, looks like I broke lxc-ls --fancy with my recent get_ips() API change.
>
> I'll fix it directly in staging (trivial fix) and trigger a new daily build; you should be able to update to a fixed package in the next couple of hours.

hi,

Although the package is not here yet, I downloaded the raw file from GitHub, and the function is indeed fixed.

The FUTEX_WAIT hang is still there, though. strace -ff lxc-ls --fancy:

    [...]
    geteuid() = 0
    statfs("/dev/shm", {f_type=0x1021994, f_bsize=4096, f_blocks=16498192, f_bfree=16498156, f_bavail=16498156, f_files=16498192, f_ffree=16498155, f_fsid={0, 0}, f_namelen=255, f_frsize=4096}) = 0
    futex(0x7ff06b4cc31c, FUTEX_WAKE_PRIVATE, 2147483647) = 0
    open("/dev/shm/sem.lxcapi.jcb-vmc02", O_RDWR|O_NOFOLLOW) = 3
    fstat(3, {st_mode=S_IFREG|0640, st_size=32, ...}) = 0
    mmap(NULL, 32, PROT_READ|PROT_WRITE, MAP_SHARED, 3, 0) = 0x7ff06b65d000
    close(3) = 0
    stat("/var/lib/lxc/jcb-vmc02/config", {st_mode=S_IFREG|0644, st_size=1293, ...}) = 0
    futex(0x7ff06b65d000, FUTEX_WAIT, 0, NULL

And it's waiting here... What is it waiting for? I'm quite lost now.

Thanks,
tamas