Ouch! Sorry to hear. :-(

We're aware that the reindexing errors aren't useful enough on failure.
I filed https://github.com/perkeep/perkeep/issues/1122 recently, and I
plan to address that when I redo the reindexer to make it faster.

But it's easy enough for us to add some --keep-going sort of flag
to perkeepd so you can at least get your server back up, even if the
index & blobs are incomplete. At least searches would work if you're
trying to find your unique Perkeep-only content.
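The behavior I have in mind is roughly: on a fetch failure, record the
missing blob and keep walking instead of aborting the whole reindex. A
minimal sketch in Python (the real reindexer is Go, and the names here
are illustrative, not Perkeep's actual API):

```python
def reindex(blob_refs, fetch, index_row, keep_going=True):
    """Walk all blob refs; index what we can, collect what's missing.

    fetch(ref) returns the blob bytes, or raises KeyError if the blob
    is gone from storage. With keep_going=True we record the failure
    and move on instead of aborting the whole reindex run.
    """
    missing = []
    for ref in blob_refs:
        try:
            data = fetch(ref)
        except KeyError:
            if not keep_going:
                raise
            missing.append(ref)
            continue
        index_row(ref, data)
    return missing
```

At the end you'd have the full list of missing refs to report in one
place, instead of dying on the first one.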

Really it should give you a path to each missing blob so you could
have some context for what it might be.

Also, do you have a backup of an old index anywhere? We might be able
to recover a bunch of the blobs just from the index, as long as you
still have your signing GPG keyring.

And if a bunch of those 0.2% blobs are JPEG data, we could trace back
from missing chunk to file to permanode to imported Google Photos
item, and just re-download the file. Of course, it'd make sha224 blobs
now, but from the original photo you could find the byte range of the
file that corresponds to the sha1 blob you're looking for.
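Locating that range is mechanical once you have the original bytes:
compute sha1 over candidate windows of the re-downloaded file until one
matches the missing blobref. A brute-force sketch (in practice you'd
replay Perkeep's rolling-checksum chunker rather than try every window,
since this is O(n²)):

```python
import hashlib

def find_chunk(data, target_hex, min_len=1, max_len=None):
    """Return (offset, length) of the window of `data` whose SHA-1
    hex digest equals target_hex, or None if no window matches.
    """
    n = len(data)
    max_len = max_len or n
    for off in range(n):
        for end in range(off + min_len, min(n, off + max_len) + 1):
            if hashlib.sha1(data[off:end]).hexdigest() == target_hex:
                return off, end - off
    return None
```

For a single photo that's tolerable; for anything big you'd want to
replay the chunker instead of scanning.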

Which step would be most helpful first? A --keep-going flag for the reindexer?


On Sun, May 6, 2018 at 11:50 AM, stephen.searles
<[email protected]> wrote:
> So, a couple months ago, I made a mistake and deleted some data... I'm going
> to share the experience here, and outline some of the plans I've got to move
> on. Help would be welcome, but mostly my aim here is to provide some insight
> as to how things go when they go wrong, for considerations on improvements.
>
> I was working on testing out Digital Ocean's new Spaces product as a
> backing store for Perkeep (well, a version that was still
> camlistore). At some point during the process, I must have deleted some
> blobpacked files on my server. I know I was making odd rsync commands at one
> point that day, but I don't know what did it. Whatever it was, it deleted an
> early segment of blobs starting with "sha1-000" up through about "sha1-004".
> At the time, I didn't notice. Perkeep must have had blobs cached, and so
> things were fine in the UI. I eventually set aside work on that Spaces
> implementation because of DO's performance problems. Then, the other day,
> I went to update to a more recent version (to Perkeep!). After a little
> config updating, I eventually got it running, but the UI wasn't showing my
> content properly. It just looked like a huge list of folders, with no names
> and no meaningful contents (just poking at a few). Searches I used to run in
> the UI no longer return any results. I ran reindexing and eventually did both
> recovery modes. That's when I started seeing the errors about missing blobs,
> all in that early range of sha1s.
>
>> May  6 10:51:22 new perkeepd[2237]: 2018/05/06 10:51:22 Error reindexing
>> sha1-0009743ea7ead6126510ad334cfe4199c2c383f7: index: failed to fetch
>> sha1-0009743ea7ead6126510ad334cfe4199c2c383f7 for reindexing: file does not
>> exist
>
>
> So my first observation here: problems on the backend storage can easily go
> unseen. Earlier detection of the problem may have allowed me to recover from
> cache. The second observation: recovery/reindex errors cause the instance to
> fail to start, with limited info for repair (what's telling it to reindex
> something that doesn't exist?).
>
> So the situation I find myself in now: I deleted about 0.2% of my data, but
> the instance is more or less hosed. (And the data is just gone: this is how
> I realized my backups aren't covering the block storage I moved my perkeep
> data off to... oops). I have a few years of data in that instance. I have a
> few importers: google photos and rss feeds. I have a few attempts at syncing
> my music and whole home directories up to it. Then there's just some odds
> and ends uploaded via the UI. Those are the important bits that would be
> great to recover, mostly because I'm not sure what might be there, and the
> rest is recoverable elsewhere.
>
> The crossroads is between: cleaning up the surviving data to repair the
> instance, or searching for non-re-importable data and starting with a fresh
> instance. I'm not sure what the process for repairing the instance would
> be... when the index complains about blobs being missing, what causes it to
> expect those blobs? Is it possible to look up blobs which reference other
> specific (but non-existent) blobs? I'm not sure what schema relations to
> interrogate, so searching for non-re-importable data sounds easier. I can
> start to build up a search query full of exclusions until I've gotten rid of
> all the imported data. That's what I intend to try first.
>
> That said, my third observation: there doesn't seem to be a good way to
> analyze a perkeep instance's data in aggregate without a lot of manual
> labor. (Or I just haven't seen it yet?)
>
> I'm not sure what, if any, good improvements we could make to perkeep based
> on this information, but I'm happy to keep discussing or share more as my
> discovery/recovery process continues.
>
> --
> You received this message because you are subscribed to the Google Groups
> "Perkeep" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> For more options, visit https://groups.google.com/d/optout.
