On Sat, 10 Nov 2018 at 16:55, Ralph Corderoy <[email protected]> wrote:
>
> Hi,

Hi,

> I've just watched Brad and Mathieu's LinuxFest Northwest 2018 talk on
> Perkeep, https://youtu.be/PlAU_da_U4s and have a couple of questions.
>
> Say a `pk get $hash1' shows some Perkeep metadata as JSON.  If I were to
> `pk put' some text that was valid JSON Perkeep metadata then I assume
> Perkeep initially treats it as if it were genuine when re-building the
> index from just the blobs.  Can problems be caused by it being faulty
> metadata, e.g. an incorrect schema, or referring to blobs that don't
> exist?  If not, because those problems are ignored on the assumption it
> wasn't real Perkeep-authored metadata after all, that would mean genuine
> problems, e.g. caused by a bug, might go undetected at this stage?

AFAIR, when the index receives a blob, it checks whether the blob is a
valid claim. If not, the blob is simply ignored. Then, to some extent, it
also checks whether the mutation introduced by the claim makes sense. If
not, the claim is ignored.
Does that answer your question?
If not, could you propose a concrete example to demonstrate?
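
To make the first step concrete, here is a rough Python sketch of that
kind of filtering. It is not Perkeep's actual indexer code (which is
written in Go): the names `looks_like_claim' and `index_blobs' are made up,
and the set of required fields is my assumption for illustration, based on
the claim fields described in the schema docs. Signature verification and
the "does the mutation make sense" check are left out.

```python
import json

# Fields assumed (for illustration only) to be required on a claim blob.
REQUIRED_CLAIM_FIELDS = ("camliSigner", "claimDate", "claimType", "camliSig")


def looks_like_claim(blob: bytes) -> bool:
    """Return True if blob parses as JSON and has the shape of a claim."""
    try:
        obj = json.loads(blob.decode("utf-8"))
    except (UnicodeDecodeError, ValueError):
        return False  # not JSON at all: just an opaque data blob
    if not isinstance(obj, dict):
        return False
    if obj.get("camliVersion") != 1 or obj.get("camliType") != "claim":
        return False
    return all(field in obj for field in REQUIRED_CLAIM_FIELDS)


def index_blobs(blobs):
    """Yield only the blobs a (hypothetical) indexer would act on."""
    for blob in blobs:
        if looks_like_claim(blob):
            yield blob
        # Anything else is silently skipped, not treated as an error.
```

With this shape, a `pk put' of arbitrary text, or of JSON that merely
resembles metadata, falls through the checks and is kept as plain data.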

> I understand the rolling-checksum deduplication that Perkeep already
> does.  Are the resulting 0-16 MiB blobs ever compressed when stored?

Not that I know of. Well, it all depends on which blobserver
implementation you use. For example, the blobpacked implementation
stores blobs pretty much like in a zip file, so I can imagine
compression could be enabled for these.
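
As a sketch of what I mean: in a zip-like pack, each blob could be stored
deflated while its blobref is still computed over the uncompressed bytes,
so content addressing is unaffected. The Python below is purely
illustrative and is not how blobpacked is implemented; `blobref',
`pack_blobs' and `read_blob' are names I made up, and I'm assuming
sha224-style blobrefs.

```python
import hashlib
import io
import zipfile


def blobref(data: bytes) -> str:
    """Content address over the *uncompressed* bytes (sha224 assumed)."""
    return "sha224-" + hashlib.sha224(data).hexdigest()


def pack_blobs(blobs) -> bytes:
    """Write blobs into an in-memory zip, deflating each entry."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for data in blobs:
            # The entry name is the blobref, so lookups stay by content hash.
            zf.writestr(blobref(data), data)
    return buf.getvalue()


def read_blob(pack: bytes, ref: str) -> bytes:
    """Read one blob back and verify it still matches its blobref."""
    with zipfile.ZipFile(io.BytesIO(pack)) as zf:
        data = zf.read(ref)
    if blobref(data) != ref:
        raise ValueError("blob does not match its blobref")
    return data
```

For example, `read_blob(pack_blobs([b"hello"]), blobref(b"hello"))' gives
back the original bytes, whatever compression the pack used internally.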

> Has any thought been given to deduplication at other granularities?
> Given,
>
>     foo.png
>     bar.pdf has foo.png within it
>     xyzzy.mbox has a base64'd bar.pdf within it
>     xyzzy.mbox.gz is exactly a gzip'd xyzzy.mbox
>     all.tar has all the above
>
> it's conceivable that some background process can continually look over
> the blobs for dedupe opportunities.  Is this something that could fit in
> with Perkeep's model, or does the default lack of blob deletion (for
> good reasons) get in the way?

I don't know.

> Lastly, https://perkeep.org could benefit from having an up to date
> `here's some of the things you could use it for' on the front page.
> I can find https://perkeep.org/doc/uses but it's probably out of date
> and doesn't touch on the tantalisingly interesting answers in the Q&A at
> the end of the presentation.  Read from three PKs, write to the one with
> space, etc.  An up to date list of importers would be good too as many
> might arrive with a social-media site in mind, e.g. Google+ given its
> declared demise.  Why I'd be interested in using it could be better sold
> on the first page I reach.

Sure. Contributions are welcome. :-)

> --
> Cheers, Ralph.
> https://plus.google.com/+RalphCorderoy
>
> --
> You received this message because you are subscribed to the Google Groups 
> "Perkeep" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to [email protected].
> For more options, visit https://groups.google.com/d/optout.
