Re: [boinc_dev] Suspected cpid hash collisions for newly installed clients on Linux

2017-01-22 Thread Steffen Möller
Hello,

starting with your second point: rest assured that a patch that helps
you and does not harm or confuse others would be well received, also by
Gianfranco. We just need a very simple way to express that you want to
present a machine as new to BOINC even though BOINC has already seen it.

I have installed BOINC on many machines and have not yet encountered any
identity theft myself. On the contrary, an attempt to start BOINC as a
regular user introduced a clone of the machine. I can sympathise with
you that something is a bit weird at times, but so far nothing is
reproducible on my side.

For your doppelganger issue I suspect that it is less a Debian thing
and more something erratic in the BOINC client code. What does
"hostid" report on the two machines that think they are the same? I admit
I do not know whether hostid is used by BOINC; it would make sense if
not, though. Gianfranco may have extra insights.

Best,

Steffen


On 22/01/2017 20:24, trueriver wrote:
> apols if this is a double post, I am not convinced the first copy went out
> OK.
> --
>
> hi thanks Steffen and Christian for your kind words.
>
> I believe I am seeing hash collisions in cpid on installing the client in
> Debian and Mint. I also believe the Mint package is the unchanged Debian
> one, inherited via Ubuntu.
>
> The symptoms I am seeing are that when a new computer is added to my little
> farm, it sometimes is taken by the PrimeGrid (PG) server to be an existing
> host.
>
> This is bad for two reasons, and irritating for a third.
>
> 1. I cannot rely on setting the default location for new computers, because
> the new machine will come up in whatever location the doppelganger had.
> This means that it may download and start crunching work that, for example,
> will run for longer than that host has really got.
>
> 2. If the doppelganger had work in progress, then that work is marked as
> abandoned. That means that a new task is sent to someone else, wasting the
> collective time of the project.
>
> (PG have installed two workarounds that ensure that if I go on crunching I
> do not lose credit. If a task completes and is shown as abandoned at the
> time of completion, it is sent for validation as if it were not abandoned.
> If a task trickles up then it reverts to being in progress or overdue, and
> then when it subsequently reports it goes for validation. Providing either
> of these happens before the WU is deleted from the server, the user gets
> credit -- neither feature is standard on other projects, or so I understand)
>
> With credit assured, providing I finish the work, that gives me a moral
> dilemma when the allegedly abandoned work is 10% into a 20 day task. If I
> abort it I lose credit, but if I continue it I am getting the last 80% of
> the credit for work I know is now being done by TWO other machines, which
> is a waste of the project's resources.
>
> 3. (a lesser irritation) When I am testing out different settings (running
> with and without hyperthreading, say), the mixing up of historic hosts makes
> it harder for me to track which host was doing what when.
>
> I have seen this happen among three laptops, running Linux Mint Mate 17.1,
> Cinnamon 18, and Cinnamon 18.1. Two of these laptops have the same CPU
> model, i7-6500U, but the third has a model number that looks rather
> different, m5 6y54. The CPUs are similar in that they are all at the
> expensive end of the mobile processor range.
>
> When this has happened with these laptops, each time the respective OS was
> installed from live CD/USB, and boinc installed with synaptic, searching
> for the boinc meta package.
>
> The first time it happened, March 2016, I was told that I had provoked the
> problem by using the same USB ethernet dongle and the MAC address was
> therefore the same. So I went out and bought another couple of dongles, and
> labelled them for the respective machines. I honestly believe I have not
> swapped them around inadvertently.
>
> This week (Jan 2017) the same happened again, involving one of the original
> two laptops and one that had not been involved before. Different CPU,
> different USB dongle, even different kernel versions as I had not yet
> updated the older machine's kernel at that time. Different manufacturer, so
> different hardware on motherboard, etc etc.
>
> The oddest feature is that after updating from both laptops a number of
> times, all of a sudden the server was showing them as separate machines,
> and had correctly assigned all 8 tasks issued to the new machine to that
> machine, and correctly assigned all the historic tasks and stats to the old
> machine.
>
> So I am wondering how it did that. Perhaps it is not the cpid at all,
> perhaps it is the server software being too clever?
>
> This effect also leaves oddities on the server, like this from my first
> experience of this issue
>
> http://www.primegrid.com/show_host_detail.php?hostid=512618
>
> as you can see the computer has a different creation and last contact time,

[boinc_dev] Suspected cpid hash collisions for newly installed clients on Linux

2017-01-22 Thread trueriver
apols if this is a double post, I am not convinced the first copy went out
OK.
--

hi thanks Steffen and Christian for your kind words.

I believe I am seeing hash collisions in cpid on installing the client in
Debian and Mint. I also believe the Mint package is the unchanged Debian
one, inherited via Ubuntu.

The symptoms I am seeing are that when a new computer is added to my little
farm, it sometimes is taken by the PrimeGrid (PG) server to be an existing
host.

This is bad for two reasons, and irritating for a third.

1. I cannot rely on setting the default location for new computers, because
the new machine will come up in whatever location the doppelganger had.
This means that it may download and start crunching work that, for example,
will run for longer than that host has really got.

2. If the doppelganger had work in progress, then that work is marked as
abandoned. That means that a new task is sent to someone else, wasting the
collective time of the project.

(PG have installed two workarounds that ensure that if I go on crunching I
do not lose credit. If a task completes and is shown as abandoned at the
time of completion, it is sent for validation as if it were not abandoned.
If a task trickles up then it reverts to being in progress or overdue, and
then when it subsequently reports it goes for validation. Providing either
of these happens before the WU is deleted from the server, the user gets
credit -- neither feature is standard on other projects, or so I understand)

With credit assured, providing I finish the work, that gives me a moral
dilemma when the allegedly abandoned work is 10% into a 20 day task. If I
abort it I lose credit, but if I continue it I am getting the last 80% of
the credit for work I know is now being done by TWO other machines, which
is a waste of the project's resources.

3. (a lesser irritation) When I am testing out different settings (running
with and without hyperthreading, say), the mixing up of historic hosts makes
it harder for me to track which host was doing what when.

I have seen this happen among three laptops, running Linux Mint Mate 17.1,
Cinnamon 18, and Cinnamon 18.1. Two of these laptops have the same CPU
model, i7-6500U, but the third has a model number that looks rather
different, m5 6y54. The CPUs are similar in that they are all at the
expensive end of the mobile processor range.

When this has happened with these laptops, each time the respective OS was
installed from live CD/USB, and boinc installed with synaptic, searching
for the boinc meta package.

The first time it happened, March 2016, I was told that I had provoked the
problem by using the same USB ethernet dongle and the MAC address was
therefore the same. So I went out and bought another couple of dongles, and
labelled them for the respective machines. I honestly believe I have not
swapped them around inadvertently.

This week (Jan 2017) the same happened again, involving one of the original
two laptops and one that had not been involved before. Different CPU,
different USB dongle, even different kernel versions as I had not yet
updated the older machine's kernel at that time. Different manufacturer, so
different hardware on motherboard, etc etc.

The oddest feature is that after updating from both laptops a number of
times, all of a sudden the server was showing them as separate machines,
and had correctly assigned all 8 tasks issued to the new machine to that
machine, and correctly assigned all the historic tasks and stats to the old
machine.

So I am wondering how it did that. Perhaps it is not the cpid at all,
perhaps it is the server software being too clever?

This effect also leaves oddities on the server, like this from my first
experience of this issue

http://www.primegrid.com/show_host_detail.php?hostid=512618

as you can see the computer has a different creation and last contact time,
so you might think it had contacted the server at least twice. But by the
server's own count, it has done so zero times. Maybe you can see how that
makes sense (apart from it being a tunnelling effect of your quantum
computing module ;)


I am now told on the PG forum that "Linux sometimes fails to pick up the
MAC address".
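
To illustrate why a missing MAC address could matter, here is a minimal
sketch in Python. It is NOT the BOINC client's actual cpid code: the
function names, the choice of MD5 and the fields that are hashed are all
assumptions made for illustration. It reads the MAC addresses Linux exposes
under /sys/class/net and derives a hypothetical fingerprint from one MAC
plus an install directory. If the MAC comes back empty on two cloned
machines, the remaining inputs are identical and so is the fingerprint.

```python
import hashlib
from pathlib import Path


def read_mac_addresses(sys_net="/sys/class/net"):
    """Return {interface: MAC} for non-loopback interfaces found under sysfs."""
    macs = {}
    root = Path(sys_net)
    if not root.is_dir():
        return macs  # not Linux, or sysfs not mounted
    for iface in sorted(root.iterdir()):
        addr_file = iface / "address"
        if iface.name == "lo" or not addr_file.is_file():
            continue
        try:
            mac = addr_file.read_text().strip()
        except OSError:
            continue  # unreadable entry: treat as "no MAC found"
        if mac and mac != "00:00:00:00:00:00":
            macs[iface.name] = mac
    return macs


def fingerprint(mac, install_dir):
    """Hypothetical host fingerprint (an assumption, not BOINC's algorithm)."""
    return hashlib.md5(f"{mac}|{install_dir}".encode()).hexdigest()


if __name__ == "__main__":
    found = read_mac_addresses()
    if not found:
        print("no MAC address found; fingerprint would use an empty MAC")
    for name, mac in found.items():
        print(name, mac, fingerprint(mac, "/var/lib/boinc"))
```

Running this on each machine and comparing the output would show whether
the machines report distinct MAC addresses at all, which seems a useful
first check before blaming the hash itself.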

ALSO, I have seen this among my collection of 11 desktop machines, 2 of
which are identical apart from MAC address, and 1 is an NFS server, and 8
are diskless loading their OS from the server using PXE and root=/dev/nfs.

The server runs Linux Mint 18.1. The other desktop machines run a minimal
Debian command line OS, netinstall plus ssh plus boinc-client. These are
cloned, but the boinc directories are re-initialised each time to contain
only the four config files in /etc/boinc-client and softlinks to them from
/var/lib/boinc, plus a minimal account_www.primegrid.xml that provides my
weak auth code. In particular, there is no contamination of the 
value as the file that holds that value is not cloned.

Running the diskless machines one at a time works fine, but it does seem
random whether it picks up the history of its own hardware, or of one of
the 

Re: [boinc_dev] is this the right place to ask about...

2017-01-22 Thread Christian Beer
Hi,

if this is a packaging-related problem then it's better to contact the
package maintainer directly, but the Debian maintainer is also reading
this mailing list, so you may try it out here before opening a Debian bug
report.

Regards
Christian

On 21.01.2017 18:44, trueriver wrote:
> hi everyone,
>
> before I launch into a description and some questions, may I check this is
> the right place to ask about problems that seem to occur with running BOINC
> on multiple Linux machines?
>
> I am wondering, in particular, if the install triggers in the .deb can be
> improved to avoid a particular issue. I may be offering to assist with
> that, depending on what the issue turns out to be.
>
> So, is this the right place to ask, and if not can you kindly signpost me
> to the right place please?
>
> regards,
> River~~
> ___
> boinc_dev mailing list
> boinc_dev@ssl.berkeley.edu
> http://lists.ssl.berkeley.edu/mailman/listinfo/boinc_dev
> To unsubscribe, visit the above URL and
> (near bottom of page) enter your email address.

