Re: [boinc_dev] Suspected cpid hash collisions for newly installed clients on Linux
Hi Christian,

Yes, that is exactly what I was doing initially, except:

1. the account_ file has my weak auth instead of the full auth.
2. as I have already said, the remote_hosts.cfg and gui_rpc_auth.cfg files have been customised.

But it is this setup that is causing me problems, hence my thought to introduce a minimal client state file to pre-empt the cpid.

Just to be clear, I have not given the client a minimal client_state.xml yet; it is my next thing to try, with a pre-filled cpid in it.

I still do not understand why your clusters are working, and my diskless workstations are getting the server confused...

R~~

On 24 January 2017 at 09:35, Christian Beer wrote:

> On 23.01.2017 22:27, trueriver wrote:
> > As for the network booting machines, I already have a script running
> > in the initrd that sets up the boinc directories ready for the client
> > to be started. At present I give the client exactly the same files it
> > gets from the post install trigger, plus an account_xml file so that
> > it thinks it is already connected to a project.
> >
> > Sudden thought: will the client get confused by having an account_xml
> > file but no client_state.xml ?? I have just realised I am giving it a
> > combination of files that would not occur in normal use, so even
> > though I believe that SHOULD work, it is a corner case that may not
> > have been tested in development
>
> If you want to attach to a project automatically right after startup,
> you only need an account_.xml file. The client_state.xml should not be
> transferred. This file saves the state of a specific client on a
> specific machine. There should be no information in there that you need
> to copy over to a new computer.
>
> See here: http://boinc.berkeley.edu/wiki/Creating_custom_installers and
> here http://boinc.berkeley.edu/wiki/Initialization_files on how to prime
> the Client with a project account or an account manager.
>
> Btw: In our Cluster environment we supply a basic account_*.xml right
> after installing the distribution supplied package, which works very
> nicely. The content is like this:
>
> PROJECT_MASTER_URL
> 12345678901234567890123456789012
>
> This should do the same as the project_init.xml described in the links
> above.
>
> Regards
> Christian

___
boinc_dev mailing list
boinc_dev@ssl.berkeley.edu
http://lists.ssl.berkeley.edu/mailman/listinfo/boinc_dev
To unsubscribe, visit the above URL and (near bottom of page) enter your email address.
Re: [boinc_dev] Suspected cpid hash collisions for newly installed clients on Linux
On 23.01.2017 22:27, trueriver wrote:
> As for the network booting machines, I already have a script running
> in the initrd that sets up the boinc directories ready for the client
> to be started. At present I give the client exactly the same files it
> gets from the post install trigger, plus an account_xml file so
> that it thinks it is already connected to a project.
>
> Sudden thought: will the client get confused by having an
> account_xml file but no client_state.xml ?? I have just realised I
> am giving it a combination of files that would not occur in normal
> use, so even though I believe that SHOULD work, it is a corner case
> that may not have been tested in development

If you want to attach to a project automatically right after startup, you only need an account_.xml file. The client_state.xml should not be transferred. This file saves the state of a specific client on a specific machine. There should be no information in there that you need to copy over to a new computer.

See here: http://boinc.berkeley.edu/wiki/Creating_custom_installers and here http://boinc.berkeley.edu/wiki/Initialization_files on how to prime the Client with a project account or an account manager.

Btw: In our Cluster environment we supply a basic account_*.xml right after installing the distribution supplied package, which works very nicely. The content is like this:

PROJECT_MASTER_URL
12345678901234567890123456789012

This should do the same as the project_init.xml described in the links above.

Regards
Christian
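A minimal Python sketch of priming a fresh data directory with such an account file. The element names `account`, `master_url` and `authenticator` follow the usual layout of BOINC account files and are stated here as an assumption, since the archived message shows only the two values. The function names, the example URL and the file-name rule are likewise illustrative and not taken from the BOINC sources.

```python
import re
import tempfile
from pathlib import Path
from xml.sax.saxutils import escape


def account_file_name(master_url: str) -> str:
    """Approximate the account file name for a project URL (assumed rule)."""
    # Assumption: drop the scheme, then map anything outside [A-Za-z0-9._-] to "_".
    stripped = re.sub(r"^[a-z]+://", "", master_url, flags=re.IGNORECASE)
    safe = re.sub(r"[^A-Za-z0-9._-]", "_", stripped).rstrip("_")
    return f"account_{safe}.xml"


def write_account_file(data_dir: Path, master_url: str, authenticator: str) -> Path:
    """Write a minimal account file holding only the URL and the auth key."""
    body = (
        "<account>\n"
        f"    <master_url>{escape(master_url)}</master_url>\n"
        f"    <authenticator>{escape(authenticator)}</authenticator>\n"
        "</account>\n"
    )
    path = data_dir / account_file_name(master_url)
    path.write_text(body, encoding="utf-8")
    return path


if __name__ == "__main__":
    # Write into a throwaway directory so the sketch never touches a real client.
    with tempfile.TemporaryDirectory() as tmp:
        written = write_account_file(
            Path(tmp),
            "http://www.example.org/project/",  # placeholder project URL
            "12345678901234567890123456789012",  # placeholder key from the message
        )
        print(written.name)
        print(written.read_text(encoding="utf-8"))
```

In an initrd or post-install script the same two values would simply be written into the client's data directory before the client first starts; no client_state.xml is created here, in line with the advice above.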
Re: [boinc_dev] Suspected cpid hash collisions for newly installed clients on Linux
Hi Christian,

Thanks for your reply.

I will look at the relevant threads on the devs forum, but I don't initially see how the scenarios you describe apply to either of my situations. For one thing, the CLI (Debian Jessie) boxes do not have virtual network devices and have exactly one network card, which must be running for the initrd to exit to systemd (because they are diskless, the initrd needs a working network to find its NFS root file system). This is a fairly simple boot process which I think I understand (base system, sshd, and boinc; nothing else).

I am not so confident about knowing what goes on when the Mint boxes boot up. VirtualBox is mentioned, but I believe only to say that various modules are not being loaded as we are not in a virtual environment. But thanks for the tip, it is something I must check out.

I do see the issue if huge clusters generated new host records every time they booted up: I understand now the devs' motivation for moving away from having a brand new cpid on each boot. I see why my changes are therefore not ones that you would want in the official release.

I certainly would not claim to "understand" .deb packages, but I do know enough to add to an existing post-install script.

As for the network booting machines, I already have a script running in the initrd that sets up the boinc directories ready for the client to be started. At present I give the client exactly the same files it gets from the post install trigger, plus an account_xml file so that it thinks it is already connected to a project.

Sudden thought: will the client get confused by having an account_xml file but no client_state.xml ?? I have just realised I am giving it a combination of files that would not occur in normal use, so even though I believe that SHOULD work, it is a corner case that may not have been tested in development...

You said:

> The occasional problems you see might stem from the fact that the
> network may not be working at the time when the BOINC Client tries to
> get the MAC address and uses a random host-CPID instead.

Random would work for me, as it would be different each time. To create collisions with older hosts, it must default to a _deterministic_ host-CPID. The suggestion on the PrimeGrid forum was that it hashes the data directory with the MAC address, and that if the MAC is missing for any reason then, as the data directory is constant, you always get the same hash.

Thanks for the links to the post on the dev forum, and to the formal description of the hash. I am off to read them now.

warm regards
River~~
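The hypothesis reported from the PrimeGrid forum can be illustrated with a short Python sketch. It is not the BOINC implementation: `hypothetical_cpid`, the plain concatenation of its inputs and the made-up MAC addresses are assumptions chosen only to show why a fallback built from constant inputs would give every affected host the same identifier, whereas a random fallback would not.

```python
import hashlib
import secrets


def hypothetical_cpid(data_dir: str, mac: str) -> str:
    """MD5 over data directory plus MAC (hypothesised scheme, not BOINC's code)."""
    return hashlib.md5((data_dir + mac).encode("utf-8")).hexdigest()


def random_cpid() -> str:
    """A random 32-hex-digit identifier, different on every call."""
    return secrets.token_hex(16)


if __name__ == "__main__":
    data_dir = "/var/lib/boinc"  # the same path on every cloned host

    # Two hosts with distinct MACs get distinct identifiers.
    a = hypothetical_cpid(data_dir, "00:11:22:33:44:55")
    b = hypothetical_cpid(data_dir, "66:77:88:99:aa:bb")
    print("distinct MACs collide:", a == b)

    # If the MAC lookup fails on both hosts, only the constant path is hashed.
    c = hypothetical_cpid(data_dir, "")
    d = hypothetical_cpid(data_dir, "")
    print("missing MACs collide:", c == d)

    # A random fallback avoids the collision.
    print("random fallback collides:", random_cpid() == random_cpid())
```

Whether the client really falls back this way is exactly the open question in this thread; the sketch only shows that the observed symptom is consistent with a deterministic fallback and not with a random one.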
Re: [boinc_dev] Suspected cpid hash collisions for newly installed clients on Linux
On 22.01.2017 20:24, trueriver wrote:
> I believe I am seeing hash collisions in cpid on installing the client in
> Debian and Mint. I also believe the Mint package is the unchanged Debian
> one, inherited via Ubuntu.

Hi,

there was a similar issue on the BOINC dev forums recently: https://boinc.berkeley.edu/dev/forum_thread.php?id=11397

It turned out that the user had software installed that created a virtual network device using the same MAC on two different computers. This virtual device was listed first when doing an "ipconfig" (Windows) and thus was used to create the Host-CPID. I guess something along these lines is also happening to you.

To recapitulate: the Client takes the first MAC it finds and creates the md5 of it. This is sent to the server and is used there to find out if there is an old host entry that this host belongs to.

What could help on the PG server side is this commit: https://github.com/BOINC/boinc/commit/9daab7acb1a8d0137fb9d52f87a8845d126b5bff
It makes sure that the hosts may only differ by GPU. This would create a new host entry if the old host and the new one are different.

The reason for the MAC hashing is so that in a Cluster environment where a node gets reinstalled on a regular basis (at least it gets a fresh BOINC Client directory) the server doesn't create thousands of host entries but reuses old ones. This assumes that within a Cluster each node has a unique MAC.

The idea of overriding host-CPID hashing and making it random may in fact solve your problem. For reference, here is the 7.6.33 implementation of how the host-CPID is determined: https://github.com/BOINC/boinc/blob/client_release/7/7.6/client/hostinfo_network.cpp#L127

If you are familiar with how Debian packages work, you could write a patch that you apply locally and that always creates a random string or reads the cpid from an external file (if it exists). I'm not sure if that should go upstream. I don't see this use-case being widespread, but I don't see any danger from it right now.

The occasional problems you see might stem from the fact that the network may not be working at the time when the BOINC Client tries to get the MAC address and uses a random host-CPID instead.

Regards
Christian
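The scheme described in this message, together with the suggested local patch, might look roughly like this in Python. It is a sketch under stated assumptions and not the code in hostinfo_network.cpp: the function names and the override file name `cpid_override.txt` are hypothetical, the MAC addresses are made up, and the MAC list is passed in as an argument instead of being read from the system.

```python
import hashlib
import secrets
import tempfile
from pathlib import Path
from typing import Optional, Sequence


def cpid_from_macs(macs: Sequence[str]) -> str:
    """MD5 of the first MAC found; a random string if none is available."""
    if macs:
        return hashlib.md5(macs[0].encode("ascii")).hexdigest()
    return secrets.token_hex(16)


def host_cpid(macs: Sequence[str], override_file: Optional[Path] = None) -> str:
    """Prefer an externally supplied cpid if the file exists (hypothetical patch)."""
    if override_file is not None and override_file.is_file():
        value = override_file.read_text(encoding="ascii").strip()
        if value:
            return value
    return cpid_from_macs(macs)


if __name__ == "__main__":
    # Two different machines whose first listed interface is the same virtual device.
    virtual = "0a:00:27:00:00:00"
    host_a = [virtual, "00:11:22:33:44:55"]
    host_b = [virtual, "66:77:88:99:aa:bb"]
    print("same first MAC, same cpid:", host_cpid(host_a) == host_cpid(host_b))

    # With an override file in place, the MAC no longer decides the identifier.
    with tempfile.TemporaryDirectory() as tmp:
        override = Path(tmp) / "cpid_override.txt"
        override.write_text(secrets.token_hex(16) + "\n", encoding="ascii")
        print("override used:", host_cpid(host_a, override) != host_cpid(host_a))
```

The first check reproduces the forum case, where a shared virtual device listed first makes two hosts look identical; the second shows how an external file would let a diskless node carry its own identifier regardless of what the MAC lookup returns.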
Re: [boinc_dev] Suspected cpid hash collisions for newly installed clients on Linux
Hello,

starting with your second point, rest assured that a patch that helps you and does not harm or confuse others would be well received, also by Gianfranco. We just need a very simple way to phrase that you want to present a machine as new to BOINC even though BOINC has already seen it.

I have installed BOINC on many machines and have not yet encountered any identity theft myself. On the contrary, an attempt to start BOINC as a regular user introduced a clone of the machine. I can sympathise that something is a bit weird at times, but there is nothing reproducible on my side yet.

For your doppelganger issue, I somehow feel that it is less a Debian thing and more something erratic in the boinc client code. What does "hostid" report on the two machines that are taken to be the same? I admit I do not know whether hostid is used by BOINC; it would make sense if it is not, though.

Gianfranco may have extra insights.

Best,
Steffen

On 22/01/2017 20:24, trueriver wrote:
> apols if this is a double post, I am not convinced the first copy went
> out OK.
> --
>
> hi
>
> thanks Steffen and Christian for your kind words.
>
> I believe I am seeing hash collisions in cpid on installing the client in
> Debian and Mint. I also believe the Mint package is the unchanged Debian
> one, inherited via Ubuntu.
>
> The symptoms I am seeing are that when a new computer is added to my
> little farm, it sometimes is taken by the PrimeGrid (PG) server to be an
> existing host.
>
> This is bad for two reasons, and irritating for a third.
>
> 1. I cannot rely on setting the default location for new computers,
> because the new machine will come up in whatever location the
> doppelganger had. This means that it may download and start crunching
> work that, for example, will run for longer than that host has really
> got.
>
> 2. If the doppelganger had work in progress, then that work is marked as
> abandoned. That means that a new task is sent to someone else, wasting
> the collective time of the project.
>
> (PG have installed two workarounds that ensure that if I go on crunching
> I do not lose credit. If a task completes and is shown as abandoned at
> the time of completion, it is sent for validation as if it were not
> abandoned. If a task trickles up then it reverts to being in progress or
> overdue, and then when it subsequently reports it goes for validation.
> Providing either of these happens before the WU is deleted from the
> server, the user gets credit -- neither feature is standard on other
> projects, or so I understand)
>
> With credit assured, providing I finish the work, that gives me a moral
> dilemma when the allegedly abandoned work is 10% into a 20 day task. If I
> abort it I lose credit, but if I continue it I am getting the last 80% of
> the credit for work I know is now being done by TWO other machines, which
> is a waste of the project's resources.
>
> 3. (a lesser irritation) When I am testing out different settings
> (running with and without hyperthreading, say), mixing up historic hosts
> makes it harder for me to track which host was doing what when.
>
> I have seen this happen among three laptops, running Linux Mint Mate
> 17.1, Cinnamon 18, and Cinnamon 18.1. Two of these laptops have the same
> CPU model, i7-6500U, but the third has a model number that looks rather
> different, m5 6y54. The cpus are similar in that they are all at the
> expensive end of the mobile processor range.
>
> When this has happened with these laptops, each time the respective OS
> was installed from live CD/USB, and boinc installed with synaptic,
> searching for the boinc meta package.
>
> The first time it happened, March 2016, I was told that I had provoked
> the problem by using the same usb ethernet dongle and the MAC address was
> therefore the same. So I went out and bought another couple of dongles,
> and labelled them for the respective machines. I honestly believe I have
> not swapped them around inadvertently.
>
> This week (Jan 2017) the same happened again, involving one of the
> original two laptops and one that had not been involved before. Different
> cpu, different usb dongle, even different kernel versions as I had not
> yet updated the older machine's kernel at that time. Different
> manufacturer, so different hardware on motherboard, etc etc.
>
> The oddest feature is that after updating from both laptops a number of
> times, all of a sudden the server was showing them as separate machines,
> and had correctly assigned all 8 tasks issued to the new machine to that
> machine, and correctly assigned all the historic tasks and stats to the
> old machine.
>
> So I am wondering how it did that. Perhaps it is not the cpid at all,
> perhaps it is the server software being too clever?
>
> This effect also leaves oddities on the server, like this from my first
> experience of this issue
>
> http://www.primegrid.com/show_host_detail.php?hostid=512618
>
> as you can see the computer has a different creation and last contact time,
[boinc_dev] Suspected cpid hash collisions for newly installed clients on Linux
apols if this is a double post, I am not convinced the first copy went out OK.
--

hi

thanks Steffen and Christian for your kind words.

I believe I am seeing hash collisions in cpid on installing the client in Debian and Mint. I also believe the Mint package is the unchanged Debian one, inherited via Ubuntu.

The symptoms I am seeing are that when a new computer is added to my little farm, it sometimes is taken by the PrimeGrid (PG) server to be an existing host.

This is bad for two reasons, and irritating for a third.

1. I cannot rely on setting the default location for new computers, because the new machine will come up in whatever location the doppelganger had. This means that it may download and start crunching work that, for example, will run for longer than that host has really got.

2. If the doppelganger had work in progress, then that work is marked as abandoned. That means that a new task is sent to someone else, wasting the collective time of the project.

(PG have installed two workarounds that ensure that if I go on crunching I do not lose credit. If a task completes and is shown as abandoned at the time of completion, it is sent for validation as if it were not abandoned. If a task trickles up then it reverts to being in progress or overdue, and then when it subsequently reports it goes for validation. Providing either of these happens before the WU is deleted from the server, the user gets credit -- neither feature is standard on other projects, or so I understand)

With credit assured, providing I finish the work, that gives me a moral dilemma when the allegedly abandoned work is 10% into a 20 day task. If I abort it I lose credit, but if I continue it I am getting the last 80% of the credit for work I know is now being done by TWO other machines, which is a waste of the project's resources.

3. (a lesser irritation) When I am testing out different settings (running with and without hyperthreading, say), mixing up historic hosts makes it harder for me to track which host was doing what when.

I have seen this happen among three laptops, running Linux Mint Mate 17.1, Cinnamon 18, and Cinnamon 18.1. Two of these laptops have the same CPU model, i7-6500U, but the third has a model number that looks rather different, m5 6y54. The cpus are similar in that they are all at the expensive end of the mobile processor range.

When this has happened with these laptops, each time the respective OS was installed from live CD/USB, and boinc installed with synaptic, searching for the boinc meta package.

The first time it happened, March 2016, I was told that I had provoked the problem by using the same usb ethernet dongle and the MAC address was therefore the same. So I went out and bought another couple of dongles, and labelled them for the respective machines. I honestly believe I have not swapped them around inadvertently.

This week (Jan 2017) the same happened again, involving one of the original two laptops and one that had not been involved before. Different cpu, different usb dongle, even different kernel versions as I had not yet updated the older machine's kernel at that time. Different manufacturer, so different hardware on motherboard, etc etc.

The oddest feature is that after updating from both laptops a number of times, all of a sudden the server was showing them as separate machines, and had correctly assigned all 8 tasks issued to the new machine to that machine, and correctly assigned all the historic tasks and stats to the old machine.

So I am wondering how it did that. Perhaps it is not the cpid at all, perhaps it is the server software being too clever?

This effect also leaves oddities on the server, like this from my first experience of this issue

http://www.primegrid.com/show_host_detail.php?hostid=512618

As you can see, the computer has a different creation and last contact time, so you might think it had contacted the server at least twice. But by the server's own count, it has done so zero times. Maybe you can see how that makes sense (apart from it being a tunnelling effect of your quantum computing module ;)

I am now told on the PG forum that "Linux sometimes fails to pick up the MAC address".

ALSO, I have seen this among my collection of 11 desktop machines, 2 of which are identical apart from MAC address, 1 is an NFS server, and 8 are diskless, loading their OS from the server using PXE and root=/dev/nfs. The server runs Linux Mint 18.1. The other desktop machines run a minimal Debian command line OS: netinstall plus ssh plus boinc-client. These are cloned, but the boinc directories are re-initialised each time to contain only the four config files in /etc/boinc-client and softlinks to them from /var/lib/boinc, plus a minimal account_www.primegrid.xml that provides my weak auth code. In particular, there is no contamination of the cpid value, as the file that holds that value is not cloned.

Running the diskless machines one at a time works fine, but it does