Re: New SHA1 property for nodes returned 'svn ls --xml' invocations.

2016-11-01 Thread Paul Hammant
With a maintained server-side merkle tree for repos/users y'all would be a 
little closer to the 'have set' fu of Perforce :)

Sent from my iPhone

> On Nov 1, 2016, at 6:29 AM, Julian Foad  wrote:
> 
> Branko Čibej wrote:
>> Paul Hammant wrote:
>>> I still think it would be good for sha1's to be calculated for directories 
>>> *too*. Better still if those obeyed user permissions too.
>> 
>> Server CPU is a limited resource. File content hashes are not affected
>> by user permissions, so they can be computed once and reused. A hash of
>> a user-specific view of directory contents would have to be recalculated
>> at every access.
> 
> It would only have to be recalculated when the user's access has
> changed. More precisely, when the user's access doesn't match that of
> any already-cached result. That can be more efficient.
> 
> And server network IO is a limited resource too. Any server-side
> calculation required might in any case be preferable to sending all
> the data out so that each client can calculate the result itself.
> 
> - Julian


Re: New SHA1 property for nodes returned 'svn ls --xml' invocations.

2016-11-01 Thread Julian Foad
Branko Čibej wrote:
> Paul Hammant wrote:
>> I still think it would be good for sha1's to be calculated for directories 
>> *too*. Better still if those obeyed user permissions too.
>
> Server CPU is a limited resource. File content hashes are not affected
> by user permissions, so they can be computed once and reused. A hash of
> a user-specific view of directory contents would have to be recalculated
> at every access.

It would only have to be recalculated when the user's access has
changed. More precisely, when the user's access doesn't match that of
any already-cached result. That can be more efficient.

And server network IO is a limited resource too. Any server-side
calculation required might in any case be preferable to sending all
the data out so that each client can calculate the result itself.

- Julian
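
The caching idea above can be sketched roughly as follows. This is only an illustration under assumptions: the cache key, the function name `directory_view_hash` and the byte encoding of a directory view are all made up here and are not anything Subversion implements.

```python
import hashlib

# Hypothetical cache: (directory path, revision, readable child names) -> hash.
_cache = {}
computations = 0  # counts real hash computations, for illustration only


def directory_view_hash(path, revision, children, readable):
    """Hash the subset of `children` (name -> content SHA-1 hex) a user may read."""
    global computations
    visible = frozenset(name for name in children if name in readable)
    key = (path, revision, visible)
    if key not in _cache:
        computations += 1
        h = hashlib.sha1()
        for name in sorted(visible):
            h.update(name.encode("utf-8") + b"\0" + children[name].encode("ascii") + b"\n")
        _cache[key] = h.hexdigest()
    return _cache[key]


children = {
    "a.txt": hashlib.sha1(b"alpha").hexdigest(),
    "b.txt": hashlib.sha1(b"beta").hexdigest(),
    "secret.txt": hashlib.sha1(b"gamma").hexdigest(),
}
alice = directory_view_hash("/trunk", 64, children, {"a.txt", "b.txt"})
bob = directory_view_hash("/trunk", 64, children, {"a.txt", "b.txt"})  # same view: cache hit
carol = directory_view_hash("/trunk", 64, children, set(children))     # different view
```

Two users with the same effective access share one cached result; only a view that has not been seen before costs a new computation.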


Re: New SHA1 property for nodes returned 'svn ls --xml' invocations.

2016-10-31 Thread Branko Čibej
On 31.10.2016 13:31, Paul Hammant wrote:
> Thanks everyone - I was able to progress with this and pluck out SHA1s in a 
> single operation.
>
> I still think it would be good for sha1's to be calculated for directories 
> *too*. Better still if those obeyed user permissions too.

Server CPU is a limited resource. File content hashes are not affected
by user permissions, so they can be computed once and reused. A hash of
a user-specific view of directory contents would have to be recalculated
at every access.

-- Brane



Re: New SHA1 property for nodes returned 'svn ls --xml' invocations.

2016-10-31 Thread Paul Hammant
Thanks everyone - I was able to progress with this and pluck out SHA1s in a 
single operation.

I still think it would be good for sha1's to be calculated for directories 
*too*. Better still if those obeyed user permissions too.

- Paul 

Sent from my iPhone

> On Oct 12, 2016, at 9:05 AM, Ivan Zhakov  wrote:
> 
>> On 12 October 2016 at 14:03, Paul Hammant  wrote:
>> It's very exciting to hear that Subversion already calculates shas somewhere
>> in the backend :)
> I noted this multiple times on this thread: SHAs are optional at the
> repository backend layer.
> 
> -- 
> Ivan Zhakov


Re: New SHA1 property for nodes returned 'svn ls --xml' invocations.

2016-10-12 Thread Ivan Zhakov
On 12 October 2016 at 14:03, Paul Hammant  wrote:
> It's very exciting to hear that Subversion already calculates shas somewhere
> in the backend :)
I noted this multiple times on this thread: SHAs are optional at the
repository backend layer.

-- 
Ivan Zhakov


RE: New SHA1 property for nodes returned 'svn ls --xml' invocations.

2016-10-12 Thread Bert Huijben
This doesn’t look like some kind of update request (more like a commit). We use 
many different propfind requests, which usually only return the requested 
information as that is far more efficient than requesting all properties.

 

I don’t see why we would need it on commit, so I’m not surprised that we don’t 
request it from the server just to slow the request down.

 

 

In the implementation I see that we declare sha1-checksum as a live 
property, so you should be able to request it if you construct an 
appropriate PROPFIND request. The ‘cadaver’ project, built on top of the neon 
library, might be an easy way to construct such a request.
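
As a rough sketch of such a request, built with the Python standard library rather than cadaver: the namespace URI below is my assumption for mod_dav_svn's live properties and should be checked against a real server, and the URL is a made-up one modelled on the test repository in this thread. The request is only constructed here, not sent.

```python
import urllib.request
import xml.etree.ElementTree as ET

DAV_NS = "DAV:"
# Assumed namespace for mod_dav_svn's live properties; verify against your server.
SVN_DAV_NS = "http://subversion.tigris.org/xmlns/dav/"


def build_sha1_propfind(url):
    """Build (but do not send) a PROPFIND request asking only for sha1-checksum."""
    propfind = ET.Element(f"{{{DAV_NS}}}propfind")
    prop = ET.SubElement(propfind, f"{{{DAV_NS}}}prop")
    ET.SubElement(prop, f"{{{SVN_DAV_NS}}}sha1-checksum")
    body = ET.tostring(propfind, encoding="utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PROPFIND",
        headers={"Depth": "0", "Content-Type": "text/xml; charset=utf-8"},
    )


# Hypothetical URL, modelled on the test repository mentioned in this thread.
request = build_sha1_propfind("http://localhost:32768/svn/testrepo/TestFile")
```

Passing `request` to `urllib.request.urlopen` would send it; authentication handling is left out of the sketch.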

 

Bert

 

 

 

From: Paul Hammant [mailto:p...@hammant.org] 
Sent: woensdag 12 oktober 2016 12:55
To: Subversion Development <dev@subversion.apache.org>
Subject: Re: New SHA1 property for nodes returned 'svn ls --xml' invocations.

 

You're right - and in the fullness of time, I'll replace all the Svn uses with 
their wire equivalents.  If Shas were implemented at some future date, I'd be 
happy for them to be available via PROPFIND. I'd be even more happy for them to 
be passed back to me in the response of a PUT.  

 

Or are you saying that shas are presently implemented but I have missed it?

 

Cranking up the proxy server, Charles, is a great way to see what Svn is doing 
on the wire. Here is svnmucc pushing up a new resource to Svn (no working copy):

 

1. ROOT 401 OPTIONS 0.0.0.0:32768 /svn/testrepo Tue Oct 11 22:58:41 EDT 2016 49 1244 Complete
2. ROOT 200 OPTIONS 0.0.0.0:32768 /svn/testrepo Tue Oct 11 22:58:43 EDT 2016 28 2404 Complete
3. ROOT 200 OPTIONS 0.0.0.0:32768 /svn/testrepo Tue Oct 11 22:58:43 EDT 2016 14 1470 Complete
4. ROOT 200 OPTIONS 0.0.0.0:32768 /svn/testrepo Tue Oct 11 22:58:43 EDT 2016 15 2332 Complete
5. ROOT/!svn/rvr/64/TestFile 404 PROPFIND 0.0.0.0:32768 /svn/testrepo/!svn/rvr/64/TestFile Tue Oct 11 22:58:43 EDT 2016 14 858 Complete
6. ROOT/!svn/rvr/64/TestFile 404 PROPFIND 0.0.0.0:32768 /svn/testrepo/!svn/rvr/64/TestFile Tue Oct 11 22:58:43 EDT 2016 9 858 Complete
7. ROOT/!svn/rvr/64 207 PROPFIND 0.0.0.0:32768 /svn/testrepo/!svn/rvr/64 Tue Oct 11 22:58:43 EDT 2016 8 857 Complete
8. ROOT/!svn/me 201 POST 0.0.0.0:32768 /svn/testrepo/!svn/me Tue Oct 11 22:58:43 EDT 2016 179 711 Complete
9. ROOT/TestFile 404 HEAD 0.0.0.0:32768 /svn/testrepo/TestFile Tue Oct 11 22:58:43 EDT 2016 22 334 Complete
10. ROOT/!svn/txr/64-30/TestFile 201 PUT 0.0.0.0:32768 /svn/testrepo/!svn/txr/64-30/TestFile Tue Oct 11 22:58:43 EDT 2016 86 20391 Complete
11. ROOT 200 MERGE 0.0.0.0:32768 /svn/testrepo Tue Oct 11 22:58:43 EDT 2016 526 1898 Complete

 

Doing a second svnmucc operation on the same (now existing) resource, I can see 
via Charles that the second PROPFIND is returning 'rvr/65' for the now-existing 
resource (now a 207 response status). That's certainly not a sha, so I think 
you mistyped.



Re: New SHA1 property for nodes returned 'svn ls --xml' invocations.

2016-10-12 Thread Paul Hammant
You're right - and in the fullness of time, I'll replace all the Svn uses
with their wire equivalents.  If Shas were implemented at some future date,
I'd be happy for them to be available via PROPFIND. I'd be even more happy
for them to be passed back to me in the response of a PUT.

Or are you saying that shas are presently implemented but I have missed it?

Cranking up the proxy server, Charles, is a great way to see what Svn is
doing on the wire. Here is svnmucc pushing up a new resource to Svn (no
working copy):

1. ROOT 401 OPTIONS 0.0.0.0:32768 /svn/testrepo Tue Oct 11 22:58:41 EDT
2016 49 1244 Complete
2. ROOT 200 OPTIONS 0.0.0.0:32768 /svn/testrepo Tue Oct 11 22:58:43 EDT
2016 28 2404 Complete
3. ROOT 200 OPTIONS 0.0.0.0:32768 /svn/testrepo Tue Oct 11 22:58:43 EDT
2016 14 1470 Complete
4. ROOT 200 OPTIONS 0.0.0.0:32768 /svn/testrepo Tue Oct 11 22:58:43 EDT
2016 15 2332 Complete
5. ROOT/!svn/rvr/64/TestFile 404 PROPFIND 0.0.0.0:32768
/svn/testrepo/!svn/rvr/64/TestFile Tue Oct 11 22:58:43 EDT 2016 14 858
Complete
6. ROOT/!svn/rvr/64/TestFile 404 PROPFIND 0.0.0.0:32768
/svn/testrepo/!svn/rvr/64/TestFile Tue Oct 11 22:58:43 EDT 2016 9 858
Complete
7. ROOT/!svn/rvr/64 207 PROPFIND 0.0.0.0:32768 /svn/testrepo/!svn/rvr/64
Tue Oct 11 22:58:43 EDT 2016 8 857 Complete
8. ROOT/!svn/me 201 POST 0.0.0.0:32768 /svn/testrepo/!svn/me Tue Oct 11
22:58:43 EDT 2016 179 711 Complete
9. ROOT/TestFile 404 HEAD 0.0.0.0:32768 /svn/testrepo/TestFile Tue Oct 11
22:58:43 EDT 2016 22 334 Complete
10. ROOT/!svn/txr/64-30/TestFile 201 PUT 0.0.0.0:32768
/svn/testrepo/!svn/txr/64-30/TestFile Tue Oct 11 22:58:43 EDT 2016 86 20391
Complete
11. ROOT 200 MERGE 0.0.0.0:32768 /svn/testrepo Tue Oct 11 22:58:43 EDT 2016
526 1898 Complete

Doing a second svnmucc operation on the same (now existing) resource, I can
see via Charles that the second PROPFIND is returning 'rvr/65' for the
now-existing resource (now a 207 response status). That's certainly not a
sha, so I think you mistyped.


RE: New SHA1 property for nodes returned 'svn ls --xml' invocations.

2016-10-12 Thread bert
If you are using dav autoversioning, then why do you want to obtain the sha 
using 'svn'?

You should be able to obtain the sha using a PROPFIND request against the 
server.

We use that checksum from there to avoid downloading the same file multiple 
times in our streamlined v2 http protocol.
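
For illustration, pulling the checksum out of a PROPFIND reply might look like the sketch below. The sample body is hand-written, not captured from a server, so real responses may differ in detail. The namespace URI is an assumption, and the checksum shown is simply the SHA-1 of empty content.

```python
import xml.etree.ElementTree as ET

SVN_DAV_NS = "http://subversion.tigris.org/xmlns/dav/"  # assumed namespace

# Hand-written sample of a 207 Multi-Status body; real servers may differ in detail.
# The checksum value is the SHA-1 of empty content, used as a placeholder.
SAMPLE_RESPONSE = """<D:multistatus xmlns:D="DAV:" xmlns:V="http://subversion.tigris.org/xmlns/dav/">
  <D:response>
    <D:href>/svn/testrepo/TestFile</D:href>
    <D:propstat>
      <D:prop><V:sha1-checksum>da39a3ee5e6b4b0d3255bfef95601890afd80709</V:sha1-checksum></D:prop>
      <D:status>HTTP/1.1 200 OK</D:status>
    </D:propstat>
  </D:response>
</D:multistatus>"""


def extract_sha1(multistatus_xml):
    """Return {href: sha1} for every response that carries a sha1-checksum."""
    result = {}
    root = ET.fromstring(multistatus_xml)
    for response in root.iter("{DAV:}response"):
        href = response.findtext("{DAV:}href")
        sha1 = response.findtext(f".//{{{SVN_DAV_NS}}}sha1-checksum")
        if href and sha1:
            result[href] = sha1.strip()
    return result


checksums = extract_sha1(SAMPLE_RESPONSE)
```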

Bert

Sent from my Windows 10 phone

From: Paul Hammant
Sent: dinsdag 11 oktober 2016 14:09
To: Subversion Development
Subject: Re: New SHA1 property for nodes returned 'svn ls --xml' invocations.

Considering ..

     svn info --xml 
https://svn.apache.org/repos/asf/subversion/trunk/subversion/mod_dav_svn/mod_dav_svn.c

I would hope for a content-sha1 element at root level:




https://svn.apache.org/repos/asf/subversion/trunk/subversion/mod_dav_svn/mod_dav_svn.c
^/subversion/trunk/subversion/mod_dav_svn/mod_dav_svn.c

https://svn.apache.org/repos/asf
13f79535-47bb-0310-9956-ffa450edef68


3bc64b30547e9a0448feba6c6af48447dff2b980
ivan
2016-01-08T12:28:35.243550Z




Considering ..

svn ls --xml 
https://svn.apache.org/repos/asf/subversion/trunk/subversion/mod_dav_svn/

Similarly resulting in the insertion of content-sha1:



https://svn.apache.org/repos/asf/subversion/trunk/subversion/mod_dav_svn;>

...


mod_dav_svn.c
42444

3bc64b30547e9a0448feba6c6af48447dff2b980
ivan
2016-01-08T12:28:35.243550Z



...




svn-ls doesn't have an entry for "." of course. Its parent has that node, and 
svn-ls works on directories just fine.

For the entry of directory that contains mod_dav_svn.c, I'd hope for the SHA1 
to be a function of the SHA1s of the files within. That's Merkle-tree style - a 
super important feature generally as well as specifically to my use-case.

For my use-case to work, I need to have a reasonable chance of recalculating 
the SHA1 on the client file system without access to the remote repo, or the 
presence of a .svn directory. That's why I'm calling the element content-sha1. 
There could be a sibling element complete-sha1 which is the content-sha1 and 
whatever properties should be included too. I would not use that element, but 
properties were mentioned before.
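
To make the Merkle-tree idea concrete, here is a minimal sketch of a client-side calculation over a plain directory, with no .svn directory and no repository access. The way a directory's entries are encoded before hashing is invented for this example; any real content-sha1 would need an agreed canonical form.

```python
import hashlib
import os
import tempfile


def file_sha1(path, chunk_size=1 << 20):
    """SHA-1 of a file's bytes, read in chunks so large files fit in memory."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def tree_sha1(path):
    """Merkle-style hash; the directory encoding here is made up for illustration."""
    if os.path.isfile(path):
        return file_sha1(path)
    h = hashlib.sha1()
    for name in sorted(os.listdir(path)):
        if name == ".svn":
            continue  # never hash working-copy metadata
        child = os.path.join(path, name)
        kind = "dir" if os.path.isdir(child) else "file"
        h.update(f"{kind} {tree_sha1(child)} {name}\n".encode("utf-8"))
    return h.hexdigest()


with tempfile.TemporaryDirectory() as root:
    os.mkdir(os.path.join(root, "A"))
    with open(os.path.join(root, "iota"), "wb") as f:
        f.write(b"This is the file 'iota'.\n")
    with open(os.path.join(root, "A", "mu"), "wb") as f:
        f.write(b"This is the file 'mu'.\n")
    before = tree_sha1(root)
    with open(os.path.join(root, "A", "mu"), "ab") as f:
        f.write(b"changed\n")
    after = tree_sha1(root)
```

A change to one leaf changes the hash of every directory above it, which is what lets a client compare just the root first.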

I don't have an opinion about symlinks, or experience of them with Svn. I'm 
unfamiliar with the hat-syntax wc-centric use of svn-ls. Therefore I don't know 
what to say about it. 

I've read the ?kw=1 section of the release notes. My use case would not need 
keyword replacement. In fact it would need it to be off.

Something about something Greek in 
https://svn.apache.org/repos/asf/subversion/tree/readme ? - I'm lost and need 
further guidance as to reading materials, please.

Regards,

- Paul


On Tue, Oct 11, 2016 at 2:40 AM, Daniel Shahaf <danie...@apache.org> wrote:
Paul Hammant wrote on Mon, Oct 10, 2016 at 22:23:25 -0400:
> In that page, there is a mention of 'ModMimeUsePathInfo' that can add
> properties transparently. One like it could optionally add a sha1 as a
> property and that be transient like svn:log, svn:date and svn:author.
>

Please don't worry about implementation details at this stage.  Adding
a per-file attribute is easy.  (It won't be like svn:log, however,
because that is a revprop, as opposed to a nodeprop.)

The real question is, what information you are asking to be provided.
Given the standard Greek tree (see subversion/tests/README), what would
be the outputs of «svn ls --xml ^/iota» and «svn ls --xml ^/A/»?

Are you asking for information to be provided for plain files?  For
symlinks?  For directories?  What is the value of the new attribute in
each of those cases?  If it's a checksum, is it the repository-normal
version or the keywords-expanded version (like ?kw=1 in mod_dav_svn, see
1.8 release notes)?

Don't worry about how the information would be encoded on the wire; just
about what information you would like to have on the client.

Cheers,

Daniel

> Re the commands svn-ls and svn-info. They have an --xml flag already, and
> it would be cool if there was a way of adding select properties to that.
> Note that --xml and --show-item fight each other presently (and are
> singular).




Re: New SHA1 property for nodes returned 'svn ls --xml' invocations.

2016-10-11 Thread Paul Hammant
>
> > As a test, I'm using openssl to make huge files that
> > change wholly with every revision, and trying to find the top limits of
> > Subversion. Sadly I've only found the top limits of Docker on the mac so
> > far - 60GB.
>
> 60GB being the size of each revision of a single versioned file?
>

40-ish revisions of that 1.3GB file. Turns out grow/shrink of a docker
container after creation is still on their todo list. I've a 1TB SSD in my
(work) Mac and it has 700GB space which I was intending to use for a test.
Renting a huge Amazon machine with terabytes of storage for a few hours
would be next up, if I wanted to continue the size experiments.

-ph


Re: New SHA1 property for nodes returned 'svn ls --xml' invocations.

2016-10-11 Thread Daniel Shahaf
Paul Hammant wrote on Tue, Oct 11, 2016 at 20:14:02 -0400:
> > > I've read the ?kw=1 section of the release notes. My use case would not
> > > need keyword replacement. In fact it would need it to be off.
> 
> 
> > Are you sure?  The only situations in which you'd need keywords
> > expansion *off* is if your files *do* have svn:keywords set, but you
> > used neither 'svn export' nor 'svn checkout' to extract the tree in the
> > first place.
> >
> 
> Files will come up and down to Svn with curl (not the svn client).
> I'm taking advantage of auto-versioning (SVNAutoversioning == on).
> They'll be binary, and potentially large too - movie sized. Definitely
> need keyword expansion to be off.

Keyword expansion is an opt-in feature.  Subversion defaults to treating
file contents as opaque binary blobs to be preserved verbatim.

Anyway, in your situation, keywords aren't and won't be a factor.
Consequently you can use the server's precomputed checksum of the
content, which makes the implementation an order of magnitude easier.

> As a test, I'm using openssl to make huge files that
> change wholly with every revision, and trying to find the top limits of
> Subversion. Sadly I've only found the top limits of Docker on the mac so
> far - 60GB.

60GB being the size of each revision of a single versioned file?

Daniel


Re: New SHA1 property for nodes returned 'svn ls --xml' invocations.

2016-10-11 Thread Paul Hammant
>
> Thanks for these details, they clarify the picture.
>

:)


> > I've read the ?kw=1 section of the release notes. My use case would not
> > need keyword replacement. In fact it would need it to be off.


> Are you sure?  The only situations in which you'd need keywords
> expansion *off* is if your files *do* have svn:keywords set, but you
> used neither 'svn export' nor 'svn checkout' to extract the tree in the
> first place.
>

Files will come up and down to Svn with curl (not the svn client). I'm
taking advantage of auto-versioning (SVNAutoversioning == on). They'll be
binary, and potentially large too - movie sized. Definitely need keyword
expansion to be off.  As a test, I'm using openssl to make huge files that
change wholly with every revision, and trying to find the top limits of
Subversion. Sadly I've only found the top limits of Docker on the mac so
far - 60GB.


> I gave you the wrong filename earlier, sorry; the correct one is
> https://svn.apache.org/repos/asf/subversion/trunk/
> subversion/tests/greek-tree.txt
>
> But it's not important; your examples with subversion/mod_dav_svn/
> sufficed.
>
>
I read it anyway, thanks :)

- ph


Re: New SHA1 property for nodes returned 'svn ls --xml' invocations.

2016-10-11 Thread Daniel Shahaf
Paul Hammant wrote on Tue, Oct 11, 2016 at 08:09:06 -0400:
> kind="file">
> 3bc64b30547e9a0448feba6c6af48447dff2b980
⋮
> For the entry of directory that contains mod_dav_svn.c, I'd hope for the
> SHA1 to be a function of the SHA1s of the files within.
⋮
> For my use-case to work, I need to have a reasonable chance of
> recalculating the SHA1 on the client file system without access to the
> remote repo, or the presence of a .svn directory.

Thanks for these details, they clarify the picture.

> I don't have an opinion about symlinks, or experience of them with Svn. I'm
> unfamiliar with the hat-syntax wc-centric use of svn-ls. Therefore I don't
> know what to say about it.

^/foo is shorthand for the URL "${REPOS_ROOT_URL}/foo".  I used that
colloquially; didn't mean to imply existence of a wc.
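
As a small illustration of that shorthand (the helper name `expand_caret` is made up; the svn client does this expansion itself):

```python
def expand_caret(target, repos_root_url):
    """Expand '^/path' relative to the repository root URL; other targets pass through."""
    if target == "^" or target.startswith("^/"):
        return repos_root_url.rstrip("/") + target[1:]
    return target


expanded = expand_caret("^/subversion/trunk", "https://svn.apache.org/repos/asf")
```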

> I've read the ?kw=1 section of the release notes. My use case would not
> need keyword replacement. In fact it would need it to be off.
> 

Are you sure?  The only situations in which you'd need keywords
expansion *off* is if your files *do* have svn:keywords set, but you
used neither 'svn export' nor 'svn checkout' to extract the tree in the
first place.

> Something about something Greek in
> https://svn.apache.org/repos/asf/subversion/tree/readme ? - I'm lost and
> need further guidance as to reading materials, please.
> 

I gave you the wrong filename earlier, sorry; the correct one is
https://svn.apache.org/repos/asf/subversion/trunk/subversion/tests/greek-tree.txt

But it's not important; your examples with subversion/mod_dav_svn/ sufficed.

Daniel


Re: New SHA1 property for nodes returned 'svn ls --xml' invocations.

2016-10-11 Thread Paul Hammant
Personally and especially for my use case, I am not interested in the sha1 of 
properties. Others might be though. Therefore two sha1 hashes - one without 
props, and one with.

Sent from my iPhone

> On Oct 11, 2016, at 8:13 AM, Branko Čibej  wrote:
> 
>> On 11.10.2016 14:09, Paul Hammant wrote:
>> For the entry of directory that contains mod_dav_svn.c, I'd hope for the
>> SHA1 to be a function of the SHA1s of the files within. That's Merkle-tree
>> style - a super important feature generally as well as specifically to my
>> use-case.
> 
> 
> But are you only interested in file contents or in files? The difference
> being that files have contents /and/ properties, and we only
> (optionally) store the SHA1 of the contents.
> 
> -- Brane


Re: New SHA1 property for nodes returned 'svn ls --xml' invocations.

2016-09-28 Thread Vincent Lefevre
On 2016-09-26 10:04:50 +0100, Julian Foad wrote:
> Daniel Shahaf wrote:
> > What would content hashes provide that comparing node-rev id's would not?
> 1. A node-rev id only exists for a tree that has been committed to
> the repository: there is no way to generate a node-rev id for an
> external tree of content client-side. Note what Paul Hammant wrote
> about the use case:
> 
> "I need to compare to a *local* representation of the same tree
> that's not under subversion control"

But what should the behavior be if there are keywords?

-- 
Vincent Lefèvre
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)


Re: New SHA1 property for nodes returned 'svn ls --xml' invocations.

2016-09-26 Thread Paul Hammant
Merkle trees / hashes can help a server-maintained graph of objects survive 
"bitrot" 
(http://arstechnica.com/information-technology/2014/01/bitrot-and-atomic-cows-inside-next-gen-filesystems/): 
data on SSD or HD being corrupted by (say) neutrinos over time. See also a 
guy/gal lamenting their corrupted photo collection: 
https://blog.barthe.ph/2014/06/10/hfs-plus-bit-rot/
 
A svn server, in some background activity could detect that bitrot has happened 
by calculating the hashes afresh, then ask a replica for its version of the 
same file@revision to heal it silently.
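
A toy version of that background check, assuming (purely for illustration) that blobs, their recorded hashes and a replica are all available as in-memory dictionaries keyed by (path, revision):

```python
import hashlib


def scrub(store, recorded, replica):
    """Re-hash every blob in `store`; restore from `replica` where the hash no longer matches."""
    healed = []
    for key, expected in recorded.items():
        if hashlib.sha1(store[key]).hexdigest() != expected:
            candidate = replica[key]
            # Only accept the replica's copy if it matches the recorded hash.
            if hashlib.sha1(candidate).hexdigest() == expected:
                store[key] = candidate
                healed.append(key)
    return healed


good = b"holiday photo bytes"
recorded = {("photo.jpg", 3): hashlib.sha1(good).hexdigest()}
store = {("photo.jpg", 3): b"holiday phot\x00 bytes"}  # simulated corruption
replica = {("photo.jpg", 3): good}
healed = scrub(store, recorded, replica)
```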
 
Of course you could take the policy view that subversion should rely on a file 
system that can deliberately repair corruptions -  
https://blogs.oracle.com/timc/entry/demonstrating_zfs_self_healing - though 
BTRFS may also repair corruptions.  If you do it in Svn itself, you could take 
advantage of *ordinary* file systems, and the fact that people wanting their 
(say) photos to survive a house fire, should probably have their in-house 
file-sync server (say Raspberry Pi + SDCard) replicating to something outside 
the house too.

RE: New SHA1 property for nodes returned 'svn ls --xml' invocations.

2016-09-26 Thread Bert Huijben


> -----Original Message-----
> From: Daniel Shahaf [mailto:d...@daniel.shahaf.name]
> Sent: maandag 26 september 2016 09:09
> To: Julian Foad <julianf...@apache.org>
> Cc: Paul Hammant <p...@hammant.org>; Subversion Development
> <dev@subversion.apache.org>
> Subject: Re: New SHA1 property for nodes returned 'svn ls --xml' invocations.


> What would content hashes provide that comparing node-rev id's would not?

Just one example:
Stability over dump-load and different subversion filesystems (bdb, fsfs, fsx 
and different versions of those)

If we don't need it for very specific new features I think we should keep 
node-rev id's strictly server/repository side, as the moment we expose them we 
have to promise them to be stable over dump-load, which would probably imply 
another level of redirection. And at that point those IDs might not be the most 
efficient format any more.

Bert



Re: New SHA1 property for nodes returned 'svn ls --xml' invocations.

2016-09-26 Thread Paul Hammant
Replying to various.  I'm making a Dropbox-a-like client that uses
Svn/WebDav/AutoIncrement as the server. Critical design goal - to *not*
have a classic Svn working tree locally.  Think 50GB of binary files sync'd
down to a client, and a wish to not have that take 100GB of local storage.

> What would content hashes provide that comparing node-rev id's would not?

I can client side detect change to a file, without a subversion working
tree. I store the Sha1 as the server had it. I would calculate that for
every file changed via an inotify/FSEvents/ReadDirectoryChangesW
notification mechanism, before pushing up to the svn server (curl push).
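
Roughly, the check described above could look like this sketch. The names `file_sha1` and `needs_push` are illustrative only, and the notification hook itself is left out.

```python
import hashlib
import os
import tempfile


def file_sha1(path, chunk_size=1 << 20):
    """SHA-1 of a file's bytes, read in chunks so movie-sized files are fine."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def needs_push(path, known_sha1):
    """True if the local file no longer matches the SHA-1 stored from the server."""
    return file_sha1(path) != known_sha1


with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "movie.bin")
    with open(path, "wb") as f:
        f.write(b"original bytes")
    known = file_sha1(path)          # what the client stored at sync time
    unchanged = needs_push(path, known)
    with open(path, "wb") as f:
        f.write(b"edited bytes")     # simulated local edit
    changed = needs_push(path, known)
```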

I can't calculate a node id on the client side. That's a function of an
actual commit.  I'd need double the storage to maintain a checkout's
working-copy/tree and that defeats a design goal.

Regardless of you folks implementing the server-side hashes or not, I'm
close to completion of a Python3 script that does all the above. It just
has to do calculations as soon as items come down from svn to the client.

> Node-rev id's get changed on every text change, property change, or copy
> of the node itself, but aren't changed when a parent of the node
> gets copied.

If you implement Sha1 merkle-trees for items held in Svn, please exclude
properties related to merge-tracking from the amalgam of what you're
hashing.


As an aside, there's a technology called 'Sparkleshare' for Git (& a Git
remote) that does file sync, that I *also* have a pull request in that
introduces Svn as a backend (svn client required; uses Svn working copy) -
https://github.com/hbons/SparkleShare/pull/1721. For extra shits and
giggles I have a Perforce capability under development too -
https://github.com/ph-hs/SparkleShare/tree/perforce.

Note too, I would love it if y'all would circle back to
https://issues.apache.org/jira/browse/SVN-4454 for an implementation.

- Paul


Re: New SHA1 property for nodes returned 'svn ls --xml' invocations.

2016-09-26 Thread Julian Foad

Daniel Shahaf wrote:
> Julian Foad wrote:
>> Hi Paul. I'm +1 on the concept that implementing content hashes in
>> Subversion would be useful. I think if we were designing Subversion today,
>> the question would be "Why on earth wouldn't we design in a Merkle tree
>> content hash?" as it is obviously (to those who have already thought about
>> it) useful for these sorts of operation, for people building functionality
>> on top of Subversion.
>
> I appreciate that that's your opinion, but I'm going to play devil's
> advocate and question it.
>
> The only operation one can do with a content hash is compare it to
> another content hash.  Our API already has an object with this property:
> svn_fs_id_t.  The equality relation of node-rev id's is a refinement of
> the equality relation of content hashes: equal node-rev id's imply equal
> content hashes, but the converse is not true.
>
> What would content hashes provide that comparing node-rev id's would not?

1. A node-rev id only exists for a tree that has been committed to the 
repository: there is no way to generate a node-rev id for an external tree of 
content client-side. Note what Paul Hammant wrote about the use case:

"I need to compare to a *local* representation of the same tree that's not under 
subversion control"

2. As you point out, equal content does not imply equal node-rev-ids. The large 
doc-string above svn_fs_id_t says it this way:

"note: Commonly, a node revision will have the same content as some other node 
revisions in the same node and in different nodes. [...]"

Thanks for questioning. That draws out some important points.

Regards,
- Julian



Re: New SHA1 property for nodes returned 'svn ls --xml' invocations.

2016-09-26 Thread Daniel Shahaf
Julian Foad wrote on Sun, Sep 25, 2016 at 22:30:54 +0100:
> Daniel Shahaf wrote:
> >Paul Hammant wrote:
> >>[...]  It is easiest to
> >>hit up the root node and ask for a sha1, [...]
> >
> >Can you explain more about your use-case?  [...]
> 
> Hi Paul. I'm +1 on the concept that implementing content hashes in
> Subversion would be useful. I think if we were designing Subversion today,
> the question would be "Why on earth wouldn't we design in a Merkle tree
> content hash?" as it is obviously (to those who have already thought about
> it) useful for these sorts of operation, for people building functionality
> on top of Subversion.
> 

I appreciate that that's your opinion, but I'm going to play devil's
advocate and question it.

The only operation one can do with a content hash is compare it to
another content hash.  Our API already has an object with this property:
svn_fs_id_t.  The equality relation of node-rev id's is a refinement of
the equality relation of content hashes: equal node-rev id's imply equal
content hashes, but the converse is not true.

What would content hashes provide that comparing node-rev id's would not?

Cheers,

Daniel
(Node-rev id's get changed on every text change, property change, or
copy of the node itself, but aren't changed when a parent of the node
gets copied.)


Re: New SHA1 property for nodes returned 'svn ls --xml' invocations.

2016-09-25 Thread Julian Foad

Daniel Shahaf wrote:
> Paul Hammant wrote:
>> [...]  It is easiest to
>> hit up the root node and ask for a sha1, [...]
>
> Can you explain more about your use-case?  [...]


Hi Paul. I'm +1 on the concept that implementing content hashes in 
Subversion would be useful. I think if we were designing Subversion 
today, the question would be "Why on earth wouldn't we design in a 
Merkle tree content hash?" as it is obviously (to those who have already 
thought about it) useful for these sorts of operation, for people 
building functionality on top of Subversion.


I think your email subject line misses the point, though, and implies we 
already have a content hash defined and available. We don't. The key 
thing needed is to design and implement content hashes in Subversion, 
rather than to present the hash in a specific command.


(Note that any SHA1 hash available in the client-side APIs today is only 
of a file's 'text' content, not of the whole node including its 
properties, and certainly not of a whole tree.)


So I think a good way forward would be to start a new thread with a 
draft proposal for the main feature which is supporting content hashes. 
Give at least one real-world example use case, like Daniel asks, so 
people can see the point. Then propose exactly how a hash could be 
defined on a tree: the property names and values of a node are 
represented in form X in the order Y, and the node text is in its 
repository form, not 'translated' to WC newline style; and so on. Then 
consider any other major issues about the design. One issue I've briefly 
discussed before is that repository authorization controls may give a 
particular user no read access to part of the tree. Then the canonical hash 
for the repository's copy of that tree won't match the version of the 
tree that that user will see. There are various ways that issue could be 
addressed; so propose one.
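
Purely as a sketch of what "form X in order Y" might mean, here is one made-up encoding: every field is length-prefixed, properties are taken in sorted name order, and the text is hashed in repository form. None of this is an agreed Subversion format.

```python
import hashlib


def _field(data):
    # Length-prefix each field so that adjacent fields cannot run together.
    return str(len(data)).encode("ascii") + b":" + data


def node_sha1(text, props):
    """Hash node text plus properties; this encoding is a made-up 'form X, order Y'."""
    h = hashlib.sha1()
    h.update(_field(text))
    for name in sorted(props):
        h.update(_field(name.encode("utf-8")))
        h.update(_field(props[name]))
    return h.hexdigest()


text = b"line one\nline two\n"  # repository form: LF newlines, keywords unexpanded
content_only = node_sha1(text, {})
with_props = node_sha1(text, {"svn:eol-style": b"native", "svn:mime-type": b"text/plain"})
reordered = node_sha1(text, {"svn:mime-type": b"text/plain", "svn:eol-style": b"native"})
```

Because of the length prefix, `content_only` is not the same value as a bare SHA-1 of the text; a real design would have to decide whether the two should coincide.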


- Julian


Re: New SHA1 property for nodes returned 'svn ls --xml' invocations.

2016-09-25 Thread Stefan Sperling
On Sat, Sep 24, 2016 at 08:34:05PM -0400, Paul Hammant wrote:
> Can I put the item in Jira and y'all mark it as a 2.0 feature?

Just filing a ticket in the issue tracker won't lead to any sort of progress.

You'll need to convince us why *we* want this feature and do the work,
or why we should spend time reviewing someone else's work on this.

That's why we ask everyone to raise a discussion on the mailing list
instead of filling our project's ticket database with their ideas.


Re: New SHA1 property for nodes returned 'svn ls --xml' invocations.

2016-09-25 Thread Daniel Shahaf
Paul Hammant wrote on Sat, Sep 24, 2016 at 08:36:37 -0400:
> More info: the technology I'm playing with doesn't do a svn checkout, but
> instead monitors the repo via 'svn ls' (via polling). It is easiest to
> hit up the root node and ask for a sha1, then walk the tree (remotely) to
> get the actually changed nodes deeper in the tree. Sure, the revision
> integer is there too - but I need to compare to a *local* representation of
> the same tree that's not under subversion control, and I'll have to
> calculate SHA1 of the resource immediately after bringing it down from the
> server (rather than just trusting the server's version).
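
The walk described in the quoted text can be sketched as below: compare root hashes first, then descend only into directories whose hashes differ. The tree representation, a (sha1, children) pair per node, is an assumption for the example. In practice each side would be filled in from a remote listing and a local scan.

```python
def changed_paths(local, remote, prefix=""):
    """Yield paths whose hashes differ, descending only into directories that differ.

    Each tree node is a (sha1, children) pair; children is None for a file,
    or a dict of name -> node for a directory.
    """
    local_hash, local_children = local
    remote_hash, remote_children = remote
    if local_hash == remote_hash:
        return  # identical subtree: nothing below needs visiting
    if local_children is None or remote_children is None:
        yield prefix or "/"
        return
    for name in sorted(set(local_children) | set(remote_children)):
        path = f"{prefix}/{name}"
        if name not in local_children or name not in remote_children:
            yield path  # added or deleted
        else:
            yield from changed_paths(local_children[name], remote_children[name], path)


# Placeholder strings stand in for real SHA-1 values.
local = ("r1", {"iota": ("i1", None), "A": ("a1", {"mu": ("m1", None), "B": ("b1", {})})})
remote = ("r2", {"iota": ("i1", None), "A": ("a2", {"mu": ("m2", None), "B": ("b1", {})})})
diff = list(changed_paths(local, remote))
```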

Can you explain more about your use-case?  We already have a solution
for monitoring repository-side changes (svnpubsub, in tools/) and for
determining which files are different in a local worktree to in the
repository ('svn diff -r HEAD', 'svn status -u').

Cheers,

Daniel



Re: New SHA1 property for nodes returned 'svn ls --xml' invocations.

2016-09-24 Thread Paul Hammant
I can pull down files from the Svn server (any version) and calc SHA1s on
the client side in only a few lines of Python. I'd keep a local database to
correlate Svn revision integers.
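
A sketch of what such a local database might look like, using sqlite3 from the standard library. The table layout and function names are invented for illustration, and the file contents are passed in directly rather than fetched from a server.

```python
import hashlib
import sqlite3

db = sqlite3.connect(":memory:")  # a real client would use a file on disk
db.execute(
    "CREATE TABLE seen (path TEXT, revision INTEGER, sha1 TEXT, PRIMARY KEY (path, revision))"
)


def record(path, revision, content):
    """Store the SHA-1 of `content` against (path, revision) and return it."""
    sha1 = hashlib.sha1(content).hexdigest()
    db.execute("INSERT OR REPLACE INTO seen VALUES (?, ?, ?)", (path, revision, sha1))
    return sha1


def revisions_with(sha1):
    """List every (path, revision) whose content hashed to `sha1`."""
    rows = db.execute(
        "SELECT path, revision FROM seen WHERE sha1 = ? ORDER BY path, revision", (sha1,)
    )
    return rows.fetchall()


first = record("/TestFile", 64, b"version one")
record("/TestFile", 65, b"version two")
record("/Copy", 66, b"version one")
matches = revisions_with(first)
```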

Can I put the item in Jira and y'all mark it as a 2.0 feature?  As with
Merkle trees, you'd want it to extend from leaf-most to root-most
meaningfully (it should work for directories too) and while that is easy
enough, it'd get more complicated if you factor in different read
permissions for different directories (and different people).

- Paul


On Sat, Sep 24, 2016 at 6:06 PM, Ivan Zhakov  wrote:
On 24 September 2016 at 15:36, Paul Hammant  wrote:
> In order to be able to do some Merkle-tree style functions on sets of files
> canonically held in Subversion, it would be great to ask Svn for a SHA1 for
> the files, or collections thereof from that node downwards.
>
> I would raise a new feature request direct into Svn, but the JIRA notes say
> to not do that, and instead to come here to discuss.
>
> More info: the technology I'm playing with doesn't do a svn checkout, but
> instead monitors the repo via 'svn ls' (via polling). It is easiest to
> hit up the root node and ask for a sha1, then walk the tree (remotely) to
> get the actually changed nodes deeper in the tree. Sure, the revision
> integer is there too - but I need to compare to a *local* representation of
> the same tree that's not under subversion control, and I'll have to
> calculate SHA1 of the resource immediately after bringing it down from the
> server (rather than just trusting the server's version).
>
> Of course, I'm focussed on 'svn ls' and I am sure there are other functions
> that could report the SHA1 too.
>
> Someone else might say SHA-2 or 3, and I'm happy to bow to their expertise.
>
The problem is that the SHA-1 checksum for files is optional: older
repositories/servers may not have this information stored.

--
Ivan Zhakov


Re: New SHA1 property for nodes returned 'svn ls --xml' invocations.

2016-09-24 Thread Ivan Zhakov
On 24 September 2016 at 15:36, Paul Hammant  wrote:
> In order to be able to do some Merkle-tree style functions on sets of files
> canonically held in Subversion, it would be great to ask Svn for a SHA1 for
> the files, or collections thereof from that node downwards.
>
> I would raise a new feature request direct into Svn, but the JIRA notes say
> to not do that, and instead to come here to discuss.
>
> More info: the technology I'm playing with doesn't do a svn checkout, but
> instead monitors the repo via 'svn ls' (via polling). It is easiest to
> hit up the root node and ask for a sha1, then walk the tree (remotely) to
> get the actually changed nodes deeper in the tree. Sure, the revision
> integer is there too - but I need to compare to a *local* representation of
> the same tree that's not under subversion control, and I'll have to
> calculate SHA1 of the resource immediately after bringing it down from the
> server (rather than just trusting the server's version).
>
> Of course, I'm focussed on 'svn ls' and I am sure there are other functions
> that could report the SHA1 too.
>
> Someone else might say SHA-2 or 3, and I'm happy to bow to their expertise.
>
The problem is that the SHA-1 checksum for files is optional: older
repositories/servers may not have this information stored.

-- 
Ivan Zhakov


Re: New SHA1 property for nodes returned 'svn ls --xml' invocations.

2016-09-24 Thread Paul Hammant
Isn't 'ls --xml' a machine interface?




>
> Using the Subversion API directly would be the best way to do this. The
> checksum is available at the API level, but wouldn't serve any useful
> purpose in the output of 'svn ls' -- which is, after all, a user, not
> machine interface.
>
> -- Brane
>