Got it, thanks very much.
On Fri, Nov 8, 2013 at 2:32 AM, Samuel Merritt <s...@swiftstack.com> wrote:

> On 11/7/13 5:59 AM, Daniel Li wrote:
>
>> Thanks very much for your help, and please see my inline
>> comments/questions.
>>
>> On Thu, Nov 7, 2013 at 2:30 AM, Samuel Merritt <s...@swiftstack.com
>> <mailto:s...@swiftstack.com>> wrote:
>>
>>     On 11/6/13 7:12 AM, Daniel Li wrote:
>>
>>         Hi,
>>         I have a question about swift: what does swift do if the
>>         auditor
>>         find that all 3 replicas are corrupt?
>>         will it notify the owner of the object(email to the account
>>         owner)?
>>         what will happen if the GET request to the corrupted object?
>>         will it return a special error telling that all the replicas are
>>         corrupted?
>>         Or will it just say that the object is not exist?
>>         Or it just return one of the corrupted replica?
>>         Or something else?
>>
>>     If all 3 (or N) replicas are corrupt, then the auditors will
>>     eventually quarantine all of them, and subsequent GET requests will
>>     receive 404 responses.
>>
>>     No notifications are sent, nor is it really feasible to start
>>     sending them. "The auditor" is not a single process; there is one
>>     Swift auditor process running on each node in a cluster. Therefore,
>>     when an object is quarantined, there's no way for its auditor to
>>     know if the other copies are okay or not.
>>
>>     Note that this is highly unlikely to ever happen, at least with the
>>     default of 3 replicas. When an auditor finds a corrupt object, it
>>     quarantines it (moves it to a "quarantines" directory).
>>
>> Did you mean that when the auditor found the corruption, it did not
>> copy good replica from other object server to overwrite the corrupted
>> one, it just moved it to a quarantines directory?
>
> That is correct. The object auditors don't perform any network IO, and in
> fact do not use the ring at all. All they do is scan the filesystems and
> quarantine bad objects in an infinite loop.
>
> (Of course, there are also container and account auditors that do similar
> things, but for container and account databases.)
>
>>     Then, since that object is missing, the replication processes will
>>     recreate the object by copying it from a node with a good copy.
>>
>> When did the replication processes recreated the object by copying it
>> from a node with a good copy? Does the auditor send a message to
>> replication so the replication will do the copy immediately? And what is
>> a 'good' copy? Does the good copy's MD5 value is checked before copying?
>
> It'll happen whenever the other replicators, which are running on other
> nodes, get around to it.
>
> Replication in Swift is push-based, not pull-based; there is no receiver
> here to which a message could be sent.
>
> Currently, a "good" copy is one that hasn't been quarantined. Since
> replication uses rsync to push files around the network, there's no
> checking of MD5 at copy time. However, there is work underway to develop a
> replication protocol that avoids rsync entirely and uses the object server
> throughout the entire replication process, and that would give the object
> server a chance to check MD5 checksums on incoming writes.
>
> Note that this is only important should 2 replicas experience
> near-simultaneous bitrot; in that case, there is a chance that bad-copy A
> will get quarantined and replaced with bad-copy B. Eventually, though, a
> bad copy will get quarantined and replaced with a good copy, and then
> you've got 2 good copies and 1 bad one, which reduces to a
> previously-discussed scenario.
>
> _______________________________________________
> OpenStack-dev mailing list
> OpenStack-dev@lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
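
For anyone reading this thread later, the auditor behaviour described above (scan the local filesystem, checksum each object, move bad ones into a quarantine directory, no network IO) can be illustrated with a rough sketch. This is not Swift's actual auditor code. The names `md5_of_file`, `audit_once`, and the `expected_md5` dictionary are hypothetical, and a plain dictionary stands in for wherever the checksum recorded at write time is really kept.

```python
import hashlib
import os
import shutil
import tempfile


def md5_of_file(path, chunk_size=65536):
    """Return the hex MD5 of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def audit_once(objects_dir, quarantine_dir, expected_md5):
    """Run one local audit pass (hypothetical helper, not Swift's API).

    expected_md5 maps a file name to the checksum recorded at write time.
    Files whose current checksum differs are moved to quarantine_dir.
    Nothing here talks to other nodes; re-creating the missing copy is
    left to a separate replication process.
    """
    os.makedirs(quarantine_dir, exist_ok=True)
    quarantined = []
    for name in sorted(os.listdir(objects_dir)):
        path = os.path.join(objects_dir, name)
        if not os.path.isfile(path):
            continue
        if md5_of_file(path) != expected_md5.get(name):
            shutil.move(path, os.path.join(quarantine_dir, name))
            quarantined.append(name)
    return quarantined


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        objects = os.path.join(root, "objects")
        quarantine = os.path.join(root, "quarantined")
        os.makedirs(objects)
        with open(os.path.join(objects, "a.data"), "wb") as f:
            f.write(b"hello")
        recorded = {"a.data": hashlib.md5(b"hello").hexdigest()}
        # Simulate bitrot by changing the file after its checksum was recorded.
        with open(os.path.join(objects, "a.data"), "wb") as f:
            f.write(b"hellp")
        print(audit_once(objects, quarantine, recorded))
```

A real auditor would repeat such a pass in an endless loop. Because each pass only sees local files, it cannot tell whether the other replicas are healthy, which is why no notification is sent when a copy is quarantined.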