Re: Merkle trees in svn [was: Quick question about the sha1-checksum for directories in svn.]

2017-10-10 Thread Paul Hammant
Yikes! I'll have to run through the build setup for Subversion - 52 pages
of requirements *before* launching ./config, right? - including
wind-of-newt, left handed screwdriver, *two* four-leaf clovers ;) :-P

Seriously: sounds good - and I appreciate it :)


Re: Merkle trees in svn [was: Quick question about the sha1-checksum for directories in svn.]

2017-10-10 Thread Branko Čibej
On 10.10.2017 16:02, Paul Hammant wrote:
> As my forthcoming multi-user app that uses Subversion as a backing
> store is going to kill Svn it with Depth-∞ PROPFINDs from root, I
> really want to see this implemented.  
>
> Because I can't wait

... I will implement a patch that exposes directory checksums in
Subversion. How 'bout that instead? :)

-- Brane



Re: Merkle trees in svn [was: Quick question about the sha1-checksum for directories in svn.]

2017-10-10 Thread Paul Hammant
As my forthcoming multi-user app that uses Subversion as a backing store is
going to kill Svn it with Depth-∞ PROPFINDs from root, I really want to see
this implemented.

Because I can't wait I will implement something that calculates SHA1s for
the directory in question, and drops it into a '.sha1' file in the folder
in question. It'll be a job that runs perpetually, and keeps committing as
things change.

To cut through the permissions based quandary that y'all are having, how
about ignore the subsetting that's implicit from permissions, and just calc
the SHA1 for the 'all' situation, requiring the client-side end-user of the
this setup to correlate a localally calculated SHA1 for the permitted set
and the remote canonical SHA1 for the potential superset. If the same
algorithm is used on the client and the server, then all the end user can
determine is that they are not (for some directories) permitted to read all
files.

Crude example, using command line unix tools:

$ cat foo
2013.json 1674790a70b984c9041ab86c370f942861ead004
2016.json 194f6519cd60b773a82857cf1aeba8dad4a223ed
2020.json 20e3ff1ade2385c593f73fd44fd157391d2424e7
2050.json 19b2da433a273840deddb7a46b16891acab16e3f
2060.json 45418423999c155abc434e175d42ccf6534bee6d
2068.json 69576d3632c7ce8b0b2a42d87e9e75049bdaff9d
$ cat foo | sha1sum
c1874a0ba80cb51245cb78f567dbbd46271d7ba1  -
$ head -3 foo | sponge foo
$ cat foo
2013.json 1674790a70b984c9041ab86c370f942861ead004
2016.json 194f6519cd60b773a82857cf1aeba8dad4a223ed
2020.json 20e3ff1ade2385c593f73fd44fd157391d2424e7
$ cat foo | sha1sum
1cbbec9747ab817dac26892cfd437f72d7366858  -

^ the server side says c1874a... for the 'directory' that foo represents
and passes that to all clients that ask (provided they are permitted to the
directory).  The client side holds that ref, *and* 1cbbec... maybe in a
database of some sort. This way, it can track whether it has kept abreast
of the Svn server or not. It may determine that in needs to do a depth-1
PROPFIND and because the server side SHA1 for the directory changed, and in
the not-permitted-to-see-all situation may determine that that was
ultimately unnecessary.

Of course I can alternatively correlate subversion revision numbers right
now with locally calc'd SHA1, but it doesn't feel very Merkle-like.


RE: Merkle trees in svn [was: Quick question about the sha1-checksum for directories in svn.]

2017-10-05 Thread Bert Huijben


> -Original Message-
> From: Branko Čibej [mailto:br...@apache.org]
> Sent: donderdag 5 oktober 2017 19:29
> To: dev@subversion.apache.org
> Subject: Re: Merkle trees in svn [was: Quick question about the sha1-
> checksum for directories in svn.]
> 
> On 05.10.2017 19:12, Daniel Shahaf wrote:
> > Branko Čibej wrote on Thu, 05 Oct 2017 18:44 +0200:
> >> On 05.10.2017 16:46, Julian Foad wrote:
> >>> Calculation of a directory's hash would have to happen for each
> >>> directory where the user has mixed access to the immediate children,
> >>> and for all parents of such a directory up to the root.
> >> And /that/ is the painful part: the fact that you need a depth-first
> >> traversal of the tree in order to calculate the hash for the root
> >> directory. And the reason why we're not exposing the directory hash,
> >> even if the FS stores it.
> > What if we only returned a checksum for nodes to which the user had
> > full recursive access?  E.g., with "[/A/B] *=", the caller would be
> > able to retrieve checksums for /A/C, /A/D, /A/mu, and /A's property
> > hash, and for descendants of the first two, but that's it.
> 
> That would leak permission settings. A user would know that she only sees a
> partial directory merely by checking for the presence of the directory
> checksum.

We already explicitly leak that there are server excluded subtrees in/for our 
delta editor / reporter design, so this would not be a security regression.

Bert
> 
> -- Brane



Re: Merkle trees in svn [was: Quick question about the sha1-checksum for directories in svn.]

2017-10-05 Thread Branko Čibej
On 05.10.2017 19:12, Daniel Shahaf wrote:
> Branko Čibej wrote on Thu, 05 Oct 2017 18:44 +0200:
>> On 05.10.2017 16:46, Julian Foad wrote:
>>> Calculation of a directory's hash would have to happen for each
>>> directory where the user has mixed access to the immediate children,
>>> and for all parents of such a directory up to the root.
>> And /that/ is the painful part: the fact that you need a depth-first
>> traversal of the tree in order to calculate the hash for the root
>> directory. And the reason why we're not exposing the directory hash,
>> even if the FS stores it.
> What if we only returned a checksum for nodes to which the user had full
> recursive access?  E.g., with "[/A/B] *=", the caller would be able to 
> retrieve
> checksums for /A/C, /A/D, /A/mu, and /A's property hash, and for descendants
> of the first two, but that's it.

That would leak permission settings. A user would know that she only
sees a partial directory merely by checking for the presence of the
directory checksum.

-- Brane


Re: Merkle trees in svn [was: Quick question about the sha1-checksum for directories in svn.]

2017-10-05 Thread Daniel Shahaf
Branko Čibej wrote on Thu, 05 Oct 2017 18:44 +0200:
> On 05.10.2017 16:46, Julian Foad wrote:
> > Calculation of a directory's hash would have to happen for each
> > directory where the user has mixed access to the immediate children,
> > and for all parents of such a directory up to the root.
> 
> And /that/ is the painful part: the fact that you need a depth-first
> traversal of the tree in order to calculate the hash for the root
> directory. And the reason why we're not exposing the directory hash,
> even if the FS stores it.

What if we only returned a checksum for nodes to which the user had full
recursive access?  E.g., with "[/A/B] *=", the caller would be able to retrieve
checksums for /A/C, /A/D, /A/mu, and /A's property hash, and for descendants
of the first two, but that's it.


Re: Merkle trees in svn [was: Quick question about the sha1-checksum for directories in svn.]

2017-10-05 Thread Branko Čibej
On 05.10.2017 16:46, Julian Foad wrote:
> Branko Čibej wrote:
>> On 05.10.2017 16:19, Paul Hammant wrote:
>>> Not that my vote counts for much, but I'd prefer w/o props, obeying
>>> read permissions.
>>
>> "Obeying read permissions" means that the directory hashes would have to
>> be computed dynamically for each user.
>
> Correct, but let's not imply that's a showstopper.

I never said it was.

> Calculation of a directory's hash would have to happen for each
> directory where the user has mixed access to the immediate children,
> and for all parents of such a directory up to the root.

And /that/ is the painful part: the fact that you need a depth-first
traversal of the tree in order to calculate the hash for the root
directory. And the reason why we're not exposing the directory hash,
even if the FS stores it.

> For any subtree where the user has full access, we can use a stored
> value.

Assuming we know that said user has full access ... which we might,
depending on where the hash calculation would take place, i.e., whether
it would have useful access to the authz info.


-- Brane


Re: Merkle trees in svn [was: Quick question about the sha1-checksum for directories in svn.]

2017-10-05 Thread Paul Hammant
Agree.

For those that are super interested in this, I did a series of four blog
entries on Merkle trees and Blockchains.


   - Sept 28th » Choosing Between Blockchains And Vanilla Merkle Trees
   

   - Sept 28th » Blockchains in pictures
   
   - Sept 17th » 'Old-School' Merkle Trees Rock
   
   - Sept 17th » Merkle Trees In Pictures
   


Re: Merkle trees in svn [was: Quick question about the sha1-checksum for directories in svn.]

2017-10-05 Thread Julian Foad

Branko Čibej wrote:

On 05.10.2017 16:19, Paul Hammant wrote:

Not that my vote counts for much, but I'd prefer w/o props, obeying
read permissions.


"Obeying read permissions" means that the directory hashes would have to
be computed dynamically for each user.


Correct, but let's not imply that's a showstopper.

Calculation of a directory's hash would have to happen for each 
directory where the user has mixed access to the immediate children, and 
for all parents of such a directory up to the root. For any subtree 
where the user has full access, we can use a stored value.


That means in typical authz patterns (a few subtrees excluded) there is 
very little calculation required, as long as the authz subsystem can 
efficiently tell us whether the user has full or mixed or no access to a 
given subtree.


- Julian


Re: Merkle trees in svn [was: Quick question about the sha1-checksum for directories in svn.]

2017-10-05 Thread Paul Hammant
> "Obeying read permissions" means that the directory hashes would have to
> be computed dynamically for each user.

Yes I know.

I've a frankenstein project over here -
https://github.com/paul-hammant/merkle-rust - There's a solid Rust merkle
tree that watches for changes at nodes then re-calcs the SHA1 of the parent
dir (and the parent of that, and all the way back to root).

Because I can't actually write Rust yet, I've made a Python webserver, and
Python test that shows what nodes should report on the wire (plain text in
this demo). Here's is the GET result for node A/AK/A/Alaska/ ( a list of
resources and their SHA1s) ->

96b55f350ccb1911d7f9d7e7645d6d11178ad753
2013.json 1674790a70b984c9041ab86c370f942861ead004
2016.json 194f6519cd60b773a82857cf1aeba8dad4a223ed
2020.json 20e3ff1ade2385c593f73fd44fd157391d2424e7
2050.json 19b2da433a273840deddb7a46b16891acab16e3f
2060.json 45418423999c155abc434e175d42ccf6534bee6d
2068.json 69576d3632c7ce8b0b2a42d87e9e75049bdaff9d
2070.json 2d50d3100744fb7c4c5ead72a2896909fbe2ba6a
2090.json 434e72590bdaa03176ebabc18e52dc0a24918da9
2100.json a884cf405af5a20276b3b1cc72885833905ffa97
2105.json f7bfd6be756c919e4ea69cb466f31ae0b2fd213d
2110.json c88ed14784907db94b11941b760892742bd043f3
2122.json e5247216a7a1f32851807d4b4b9cd1cf152cd57d
2130.json 375fc259d26710afe31e1367a59de3d5dea729b1
2150.json b5c17ce941a9647fbc179cb1b769348cfdaebb98
2164.json ab2f71a47910463bdde92a77313f7e5edba00063
2170.json c1d0842e2c77d53bee5c684e0e8ad580a2fff05f
2180.json 9759646b5a2811efc6bfaee84a2b813f85cb1e1a
2185.json e2beddc3f245ca79908d9a1590aa177151e66e4d
2188.json 5881cdc3a15eac0cba15a3e07ad54176cabeeedf
2195.json b7085173d8685deecf70c6efb63d92ec6db2cf21
2198.json 78a399df3eeb4f696812d6d46882e4029190cf67
2220.json 594a20ec3d550eae2ad848fa8c0e08d50bd4e7fa
2230.json ec4c2c8dfc3cf4c1e6719de0544aaeba7731fc9c
2240.json 798b13c1c4aa7ee2e5744f8b8a2b553c1447229b
2261.json 0cd460d99cc7a871230cbe0f6f2ab703339e1630
2270.json 4e77074d4adc2ac7bb486408b00bda26f98820ea
2275.json 91ce27fc8b643540dda88afdc5b5b62d97a026fa
2282.json 3bec0f1f9e3ae818dd096303f796ce28e2fe6d08
2290.json fd4014808dd77351b939060f1283d395deb0cea5

Here's the GET result for the SHA1 of that A/AK/A/Alaska/.sha1 ->


96b55f350ccb1911d7f9d7e7645d6d11178ad753


You know already, Brane, but others might not: I take the first text (the
list) run it through 'sha1sum' to make the second text (the single SHA1).
That the relatively dynamic SHA1s that would obey read permissions for the
user in question, only need to SHA1 a small amount of data, not whole files
again. Of course that is recursive so it would need to be maintained once
per user group, or user where there's different permissions.

- Paul


Re: Merkle trees in svn [was: Quick question about the sha1-checksum for directories in svn.]

2017-10-05 Thread Branko Čibej
On 05.10.2017 16:19, Paul Hammant wrote:
> Not that my vote counts for much, but I'd prefer w/o props, obeying
> read permissions.

"Obeying read permissions" means that the directory hashes would have to
be computed dynamically for each user.

-- Brane


Re: Merkle trees in svn [was: Quick question about the sha1-checksum for directories in svn.]

2017-10-05 Thread Paul Hammant
Not that my vote counts for much, but I'd prefer w/o props, obeying read
permissions.


Merkle trees in svn [was: Quick question about the sha1-checksum for directories in svn.]

2017-10-05 Thread Julian Foad

Paul Hammant wrote:
Observation: If [a dir had a SHA1] then Subversion would qualify as a *full* Merkle 
tree implementation. Albeit through APIs designed for other purposes :-p ;)


There's a bit more to it than just adding a SHA1 for a directory:

  * The currently implemented SHA1 of a file doesn't include its 
properties, for a start. That's easy to address: add another that does.


  * We might want one Merkle tree that includes svn:mergeinfo values 
and another that doesn't, so that branches can compare equal even though 
the physical storage representation of their mergeinfo differs. That 
could be said to be a deficiency of the way mergeinfo is currently 
stored, and an alternative solution would be to change that.


  * We might also want a Merkle tree excluding all properties, so that 
the tree can be compared with a plain unversioned tree.


  * How to deal with server authz restrictions. Should a Merkle tree 
represent only that part of the tree that the current user is allowed to 
see, and/or the "whole" tree regardless of authz restrictions? Perhaps 
both kinds are useful.


- Julian


Re: Quick question about the sha1-checksum for directories in svn.

2017-10-05 Thread Paul Hammant
> Correct.

Thanks Julian.

Observation: If it did then Subversion would qualify as a *full* Merkle
tree implementation. Albeit through APIs designed for other purposes :-p ;)


Re: Quick question about the sha1-checksum for directories in svn.

2017-10-05 Thread Julian Foad

Paul Hammant wrote:

While a _directory_ has a revision number, it does not have a SHA1 does it?

If so there's no point asking how it's calculated, is there ?


Correct.

- Julian


Quick question about the sha1-checksum for directories in svn.

2017-10-05 Thread Paul Hammant
While a _directory_ has a revision number, it does not have a SHA1 does it?

If so there's no point asking how it's calculated, is there ?

- Paul