I believe the rolling hash produces small chunks intentionally; the Perkeep
source code mentions 16 MB as the maximum chunk size in some places.
Splitting the file is the intended behavior, and it has advantages over
hashing a complete file. The most important one is deduplication: if you
upload, for example, multiple virtual machine images that differ only in a
few places, the rolling checksum should ensure that the parts common to the
different files are stored only once on the underlying storage backend,
saving disk space.
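To illustrate why the common parts line up, here is a minimal sketch of content-defined chunking in Go. This is NOT Perkeep's actual splitter (its rollsum package uses a different rolling checksum plus minimum/maximum chunk-size limits); the window size, mask, and `chunk` function below are made up for the example.

```go
package main

import "fmt"

const (
	window = 32    // size of the sliding window in bytes (assumed)
	mask   = 0x3FF // emit a boundary when the low 10 bits are all set
)

// chunk splits data wherever a simple rolling sum over the last
// `window` bytes matches the boundary mask. Because boundaries depend
// only on nearby content, an insertion early in the file shifts at
// most a few chunks; later chunks realign and can be deduplicated.
func chunk(data []byte) [][]byte {
	var chunks [][]byte
	var sum uint32
	start := 0
	for i, b := range data {
		sum += uint32(b)
		if i >= window {
			sum -= uint32(data[i-window]) // slide the window forward
		}
		if sum&mask == mask {
			chunks = append(chunks, data[start:i+1])
			start = i + 1
		}
	}
	if start < len(data) { // trailing bytes form the final chunk
		chunks = append(chunks, data[start:])
	}
	return chunks
}

func main() {
	data := make([]byte, 8192)
	for i := range data {
		data[i] = byte(i * 31 % 251)
	}
	orig := chunk(data)
	// Prepend one byte: once the inserted byte leaves the rolling
	// window, boundaries fall at the same content positions again,
	// so most chunks of the edited file match the original's.
	edited := chunk(append([]byte{0xFF}, data...))
	fmt.Println("chunks:", len(orig), "vs", len(edited))
}
```

A fixed-offset splitter (every N bytes) would not have this property: one inserted byte shifts every later boundary, so no chunk after the edit would dedupe.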
On 02.04.20 22:01, Joe Moore wrote:
> I'm new here, but interested.
>
> So if your file is small enough (i.e. it doesn't trip the rolling-hash
> blob-breaking function), the reference will be to the file on disk.
> Can you just tune the hash function so that it never sees the need to
> split your blob? I don't know whether different storage backends would
> be able to have different splitting behaviors.
>
> --Joe
--
You received this message because you are subscribed to the Google Groups
"Perkeep" group.
To unsubscribe from this group and stop receiving emails from it, send an email
to [email protected].
To view this discussion on the web visit
https://groups.google.com/d/msgid/perkeep/cc1d6768-13fa-68d6-4ccd-4cb1da9a9932%40gmail.com.