Re: A two-part vision for Subversion and large binary objects.

2022-04-07 Thread Julian Foad
Julian Foad wrote:
> Pristines (#525):
>  - #4888 authz denied during textbase sync
>    (an edge case issue, not sure if it's a blocker)
>  - #4889 per-WC config
>    (wanted)
>  - #4891 fix disabled tests
>    (a few different edge cases; much of the analysis is posted in the issue)
> 
> Getting multi-wc-format ready for release (#4883):
>  - #4885 WC upgraded and not-upgraded notifications
>    (still open for some nice-to-haves, but probably done enough for MVP)
>  - #4886 config for default WC version for checkout & upgrade
>    ()
>  - #4887 clarify/unify option names for compatible-version
>    (perhaps change '--compatible-version' to '--wc-compatible-version'
>    or '--min-compatible-client')
>  - API review; thread: "multi-wc-format review"
>    (state is APIs are mostly private and a bit messy; not clear what,
>    if anything, we would want to change)

Further updates:

#4888: demoted to non-blocker
#4889: blocker, in progress
#4891: blocker, in progress (I've processed a bunch of the sub-issues in it)
#4885: done enough (now non-blocker)
#4886: not sure (currently marked non-blocker)
#4887: not sure (currently marked blocker)
API review: not sure
Merge to trunk: new thread "Pristines-on-demand: OK to merge to trunk?"

In #4891 "fix disabled tests", the remaining sub-issues don't look like
show-stoppers. Likely we will soon demote it to non-blocker.

Issue #4889 "per-WC config" is the subject of Johan's new dev@ post
"Pristines-on-demand=enabled == format 32?". We already concurred that
it's wise to decouple "pristines-on-demand mode is enabled in this WC"
from "the WC format is (at least) 32 so can support that mode". This may
be considered higher priority than fixing the remaining tests. I
previously drafted a proof-of-concept for such a config setting. I'm going
to spend two or three hours and see if I can complete an acceptable
minimal version of it.

This (#4889) conceptually also relates to #4886 "config for default WC
version for checkout & upgrade"; I am not yet sure if both are
separately necessary.

Two other issues Karl and I discussed were:

* regression tests:
  -> current status: devs need to run the test suite both with and
     without the new '--wc-format-version=1.15' knob;
  -> this adds another knob to the several existing knobs;
  -> the resulting exponential increase in test runs is a concern, but
     not a new problem in itself;
  -> we should make build bots run that combination.
  -> Filed as #4898 "Pristines-on-demand: make buildbots test it"

* simplified user documentation:
  -> not sure, maybe the existing docs are sufficient initially (they
     just need to be put where users can find them?);
  -> maybe someone else will be able to rewrite them into a simpler,
     more digestible form?



Re: A two-part vision for Subversion and large binary objects.

2022-04-05 Thread Julian Foad
A status review from my P.O.V.


Pristines (#525):

* issues filed (potential blockers):

  - #4888 authz denied during textbase sync
    (an edge case issue, not sure if it's a blocker)
  - #4889 per-WC config
    (wanted)
  - #4891 fix disabled tests
    (a few different edge cases; much of the analysis is posted in the issue)

* next milestone: merge to trunk

  - Are any of the issues blockers for merge to trunk? I would suggest not.


Getting multi-wc-format ready for release (#4883):

* issues filed (potential blockers):

  - #4885 WC upgraded and not-upgraded notifications
    (still open for some nice-to-haves, but probably done enough for MVP)
  - #4886 config for default WC version for checkout & upgrade
    ()
  - #4887 clarify/unify option names for compatible-version
    (perhaps change '--compatible-version' to '--wc-compatible-version'
    or '--min-compatible-client')

* not filed:

  - API review; thread: "multi-wc-format review"
    (state is APIs are mostly private and a bit messy; not clear what,
    if anything, we would want to change)


Release notes are drafted:




Testing:

Some testing by devs has been done of both multi-wc-format and
pristines-on-demand. It seems to be generally in good shape; no glaring
issues found.

Next steps I suggest:

  - propose merge to trunk
  - review the issues mentioned: fix or decide to postpone each

- Julian



Re: A two-part vision for Subversion and large binary objects.

2022-03-15 Thread Daniel Shahaf
Karl Fogel wrote on Tue, Mar 08, 2022 at 17:59:20 -0600:
> There are reasonable arguments both ways for
> shipping MVP with/without x-hydrate functionality.
> 
> What do others think?

Just bumping Karl's question.

Cheers,

Daniel


Re: A two-part vision for Subversion and large binary objects.

2022-03-15 Thread Daniel Shahaf
Karl Fogel wrote on Tue, Mar 08, 2022 at 17:59:20 -0600:
> On 08 Mar 2022, Daniel Shahaf wrote:
> > Sure.  I was asking whether by "once the user has a local pristine" you
> > meant a pristine — as in, a file under .svn/pristine/ that .svn/wc.db
> > knows about and uses — or Alice making a local copy of the contents of
> > file@BASE somewhere libsvn doesn't know about.
> 
> Well, depending on the context, I may be using the word "pristine" flexibly.
> Sometimes I mean a literal integrated-into-wc-metadata pristine, and
> sometimes I just mean "an extra copy of the file, that the user has made
> locally".
> 

I see.

> (It's possible that the degree of precision you would like in this
> sub-discussion is not one I'm willing to adhere to consistently :-).  I
> can't always predict what will matter to a given interlocutor.  But I'll try
> to be sufficiently precise in my responses below at least.)

Thanks, Karl.  I hope I'm not frustrating you.  I do try to be
interoperable with as many interlocutors as possible, but using "foo" to
sometimes mean "bar" and sometimes mean "poor man's alternative to bar"
does in fact create ambiguities.

> > A manual copy of the BASE revision would "serve for local diffs and
> > reverts", indeed, but I would hesitate to recommend this, because diff
> > and revert are both core operations.  If users need to reinvent these
> > two wheels, then:
> > 
> > - All the advantages of having just that one well-known «svn revert»
> >  button that all the users' GUI clients and scripts can press are lost
> > 
> > - The local disk storage cost will be paid, but without all the
> >  benefits: e.g., commit will use a self-delta rather than a delta
> >  against BASE even if the file format does lend itself to binary diffs;
> >  ra_serf's ability to not download a file if the wc has another file
> >  with the same sha1 won't be used; the keyword-contraction and
> >  diff-ignore-content-type features of «svn diff» will need to be
> >  reimplemented; etc.
> > 
> > - We might leave a bad impression on potential users
> > 
> > As an MVP alternative, some sort of command to hydrate a single file,
> > perhaps, as you have proposed?  CLI-wise, I'll just say we might want to
> > mark such a command as experimental (name it "x-foo" and document it has
> > reduced forward compatibility promises).  Backend-wise, we'll want to
> > ensure a manually-hydrated file doesn't get dehydrated too soon.
> > 
> > What's "too soon"?  Until the user explicitly requests or permits
> > dehydration.  If hydration was manual, so should dehydration be.
> > 
> > Makes sense?
> 
> Yes, thanks for the suggestion, and I agree.  I would love for MVP or MVP+1
> to have an explicit "rehydrate" UI.  I think there *might* be some value to
> shipping MVP without such a feature, in order to first get some real-world
> experience with how people use pristine-less working copies, before we make
> long-lasting UI decisions.
> 
> But anyway, +1 to the general idea.

Filed: https://issues.apache.org/jira/browse/SVN-4894

> > The context of all this is whether 'update' should fetch pristines for
> > modified files.  I guess it should not do so by default (there's no
> > reason to incur the costs, and the user has opted in to
> > pristines-on-demand),
> > but I don't think we should tell users to keep pristines _and not tell
> > libsvn_wc about them_.  The cost of implementing «svn x-hydrate»
> > (however named) is smaller than the cost of asking users to reimplement
> > core version control functionality.
> 
> Users can already copy files behind Subversion's back, of course.
> 
> I'm worried that implementing 'svn x-hydrate' commands now would be
> premature -- we don't know enough about real-world usage yet. I'd feel more
> comfortable putting out one release (of x-hydrate-less MVP) to get feedback
> on pristine-less working copies.  We could even say that we're considering
> adding x-hydrate commands but that we're waiting until the next release so
> we can make sure our UI ideas match people's actual needs.
> 
> Anyone else have thoughts on this?
> 

Just to make sure you noticed I'm proposing this as an x-* command,
i.e., without promising it'll behave in 1.16 as it does in 1.15, or even
exist at all in 1.16?

We could write a Python script to explicitly hydrate something, even
after 1.15.0-GA, to let people experiment with that to some degree.  (It
won't preserve hydration through commits, of course.)

> > This way, by default «commit» will send self-deltas, but if the user
> > wants a pristine for diffs or reverts, then reverts, diffs, and commits
> > will all use the pristine.  There shouldn't be any need for the user to
> > reimplement their own pristine store and their own diff and revert
> > operations.
> > 
> > And yes, commit might not want to use pristines this way, but that's
> > actually a separate feature request: a request to change the "When
> > committing a change to a pristineful file, send a delta against BASE 

Re: A two-part vision for Subversion and large binary objects.

2022-03-11 Thread Daniel Sahlberg
On Fri, 11 Mar 2022 at 13:05, Julian Foad wrote:

> Here is an approach that does *not* satisfy both sides of this argument:
>
> [[[
> svn propset "svn:no-pristines" "*" doc/
>
> cat >> ~/.subversion/config <<-EOF
> [auto-props]
> src/**/*.exe = "svn:no-pristines = *"
> EOF
> ]]]
>
> and we make standard Subversion control its pristine storage based on
> looking for the versioned property "svn:no-pristines".
>
> Here is an approach that *does* satisfy both sides of this argument. It
> is indirect.
>
> First step, we introduce one level of indirection, so that client
> behaviour knobs are not attached directly to the versioned data:
>
> [[[
> svn propset "danielsahlberg:no-pristines" "*" doc/
>
> cat >> ~/.subversion/config <<-EOF
> [auto-props]
> src/**/*.exe = "danielsahlberg:no-pristines = *"
> [working-copy]
> omit-pristines-where-this-prop-is-set = "danielsahlberg:no-pristines"
> EOF
> ]]]
>
> and we make standard Subversion control its pristine storage in
> accordance with the config option "omit-pristines-where-this-prop-is-set".
>
> Second step, we provide server-side configuration for automating those
> config settings, as follows:
>
> [[[
> svn propset --revprop -r0 \
> "svn:server-dictated-config:config:auto-props:src/**/*.exe" \
> "danielsahlberg:no-pristines = *"
> svn propset --revprop -r0 \
> "svn:server-dictated-config:config:working-copy:omit-pristines-where-this-prop-is-set"
> \
> "danielsahlberg:no-pristines"
> ]]]
>
> and we make standard Subversion read config options from the
> repository's r0 revprops, and use those as default values for local
> config options.
>
> This is not a concrete proposal, just trying to make a clear explanation.
>

Sorry for bringing auto-props into the discussion. I forgot this was a
client side configuration and I realise now that I can't have that cookie
without setting up a system for distributing client side settings. I
believe this is out of scope for #525.

Agreed, those are two possible solutions and the second one would solve
both the "managing pristines centrally" and "setting auto-props config". It
also seems to me that the second option would include significantly more
code (fetching client-side settings both from the config file and from -r0
revprops as well as checking for the name of the pristine-controlling
property). I could live with just an "svn:no-pristines=*" and setting up
some script to automatically add these properties when things are committed
to the repository.

/Daniel


Re: A two-part vision for Subversion and large binary objects.

2022-03-11 Thread Mark Phippard
On Fri, Mar 11, 2022 at 5:16 AM Daniel Sahlberg wrote:
>
> On Fri, 11 Mar 2022 at 11:04, Julian Foad wrote:
>>
>> Daniel Sahlberg wrote:
>> > I'm taking an opposite position with regard to where this should be
>> > administered. [...] I would prefer a multi-level approach where the
>> > repository (through svn:foo properties) could suggest pristine-less WC
>>
>> I understand completely your case, but the solution you need is a way to
>> configure your client's behaviour remotely, and that is not necessarily
>> best done by Subversion versioned properties. Do you see the
>> distinction? Rather, what you need is for client configuration to be
>> managed centrally and obeyed by your clients. The server and clients
>> involved *could* be your Subversion repository server and Subversion
>> clients, but could alternatively be some other mechanism. You just need
>> some mechanism that works and is easy enough to deploy.
>
>
> I do see that distinction and I completely agree with your analysis.
>
> My position is that svn properties is the easiest way for me to distribute 
> this kind of client configuration (we could call it "client hints"). If there 
> is a majority that Subversion should not provide that, then I won't stand in 
> the way of consensus.
>
> There are a lot of other options as well to configure the clients, AD group 
> policies probably being the most common in a corporate environment but these 
> have a higher bar to get started.


I agree with Daniel completely ... including not wanting to stand in
the way of consensus. I think it just depends if you are more used to
supporting "users" in the corporate world vs thinking like a
super-experienced *nix hacker like Karl and Julian.

I also think that the primary use case for this feature is to offer
better handling for large binary files. And regardless of whether you
are a corporate user or an experienced hacker there is going to be
very little use for storing a second copy of those files in the
pristines. So I have always thought that a svn: property based
approach makes the most sense for distributing this information to the
clients.

I would favor making it simple for the user and if you really have
strong beliefs that the client should have full control then allow a
power-user to have options to override those defaults.

Again ... I do not want to stand in the way of consensus or alter the
MVP. Like Daniel, I am just saying let's not shut down the possibility
of this approach in the future.

Mark


Re: A two-part vision for Subversion and large binary objects.

2022-03-11 Thread Julian Foad
Here is an approach that does *not* satisfy both sides of this argument:

[[[
svn propset "svn:no-pristines" "*" doc/

cat >> ~/.subversion/config <<-EOF
[auto-props]
src/**/*.exe = "svn:no-pristines = *"
EOF
]]]

and we make standard Subversion control its pristine storage based on
looking for the versioned property "svn:no-pristines".

Here is an approach that *does* satisfy both sides of this argument. It
is indirect.

First step, we introduce one level of indirection, so that client
behaviour knobs are not attached directly to the versioned data:

[[[
svn propset "danielsahlberg:no-pristines" "*" doc/

cat >> ~/.subversion/config <<-EOF
[auto-props]
src/**/*.exe = "danielsahlberg:no-pristines = *"
[working-copy]
omit-pristines-where-this-prop-is-set = "danielsahlberg:no-pristines"
EOF
]]]

and we make standard Subversion control its pristine storage in
accordance with the config option "omit-pristines-where-this-prop-is-set".

Second step, we provide server-side configuration for automating those
config settings, as follows:

[[[
svn propset --revprop -r0 \
"svn:server-dictated-config:config:auto-props:src/**/*.exe" \
"danielsahlberg:no-pristines = *"
svn propset --revprop -r0 \
"svn:server-dictated-config:config:working-copy:omit-pristines-where-this-prop-is-set"
 \
"danielsahlberg:no-pristines"
]]]

and we make standard Subversion read config options from the
repository's r0 revprops, and use those as default values for local
config options.

This is not a concrete proposal, just trying to make a clear explanation.
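
To illustrate the second step, here is a minimal Python sketch of how r0
revprops could be layered underneath local config. It is only a sketch
of the idea described above, not existing Subversion behaviour: the
function names are hypothetical, config is modelled as a flat dict keyed
by (file, section, option), and the revprop naming follows the
"svn:server-dictated-config:<file>:<section>:<option>" pattern used in
the example commands.

```python
PREFIX = "svn:server-dictated-config:"


def server_defaults(revprops):
    """Map r0 revprops to {(file, section, option): value} defaults."""
    defaults = {}
    for name, value in revprops.items():
        if not name.startswith(PREFIX):
            continue
        # Split into file, section, option; the option keeps any
        # remaining ':' characters.
        parts = name[len(PREFIX):].split(":", 2)
        if len(parts) != 3:
            continue  # malformed name: ignore it
        defaults[tuple(parts)] = value
    return defaults


def effective_config(revprops, local_config):
    """Server-dictated values are defaults; local settings win."""
    merged = server_defaults(revprops)
    merged.update(local_config)
    return merged


r0 = {
    "svn:server-dictated-config:config:auto-props:src/**/*.exe":
        "danielsahlberg:no-pristines = *",
    "svn:server-dictated-config:config:working-copy:"
    "omit-pristines-where-this-prop-is-set":
        "danielsahlberg:no-pristines",
}
key = ("config", "working-copy", "omit-pristines-where-this-prop-is-set")
print(effective_config(r0, {})[key])
```

A user who sets the same option locally overrides the server-dictated
default, which is the property that keeps the final decision on the
client side.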

- Julian



Re: A two-part vision for Subversion and large binary objects.

2022-03-11 Thread Daniel Sahlberg
On Fri, 11 Mar 2022 at 11:28, Vincent Lefevre wrote:

> On 2022-03-11 10:04:36 +, Julian Foad wrote:
> > Daniel Sahlberg wrote:
> > > I'm taking an opposite position with regard to where this should be
> > > administered. [...] I would prefer a multi-level approach where the
> > > repository (through svn:foo properties) could suggest pristine-less WC
> >
> > I understand completely your case, but the solution you need is a way to
> > configure your client's behaviour remotely, and that is not necessarily
> > best done by Subversion versioned properties. Do you see the
> > distinction? Rather, what you need is for client configuration to be
> > managed centrally and obeyed by your clients. The server and clients
> > involved *could* be your Subversion repository server and Subversion
> > clients, but could alternatively be some other mechanism. You just need
> > some mechanism that works and is easy enough to deploy.
>
> If I understand correctly, what Daniel Sahlberg means is that the
> property would actually tell the client what to do *by default*,
> removing the need to configure the client. But I suppose that its
> use would be very uncommon (say, for a repository storing only
> big binary files, the main goal being to keep the history, but
> where "svn diff" would never be done in practice).
>

Correct!

> Having such a property on directories and/or individual files would
> be much more interesting, but in such a case, there should be more
> than 2 levels of suggestion.
>

In our case we have directories with binary blobs of documentation and I
would like to set it on that directory, but not on the directories
containing source code. We also commit compiled code (identified by file
name extension) and I would like to set it (via auto-props) on these files.

Again: I'm not suggesting that Subversion should set such settings *by
default* but that it should provide a mechanism for the committers to set
them.

Kind regards,
Daniel Sahlberg


Re: A two-part vision for Subversion and large binary objects.

2022-03-11 Thread Vincent Lefevre
On 2022-03-11 10:04:36 +, Julian Foad wrote:
> Daniel Sahlberg wrote:
> > I'm taking an opposite position with regard to where this should be
> > administered. [...] I would prefer a multi-level approach where the
> > repository (through svn:foo properties) could suggest pristine-less WC
> 
> I understand completely your case, but the solution you need is a way to
> configure your client's behaviour remotely, and that is not necessarily
> best done by Subversion versioned properties. Do you see the
> distinction? Rather, what you need is for client configuration to be
> managed centrally and obeyed by your clients. The server and clients
> involved *could* be your Subversion repository server and Subversion
> clients, but could alternatively be some other mechanism. You just need
> some mechanism that works and is easy enough to deploy.

If I understand correctly, what Daniel Sahlberg means is that the
property would actually tell the client what to do *by default*,
removing the need to configure the client. But I suppose that its
use would be very uncommon (say, for a repository storing only
big binary files, the main goal being to keep the history, but
where "svn diff" would never be done in practice).

Having such a property on directories and/or individual files would
be much more interesting, but in such a case, there should be more
than 2 levels of suggestion.

-- 
Vincent Lefèvre  - Web: 
100% accessible validated (X)HTML - Blog: 
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)


Re: A two-part vision for Subversion and large binary objects.

2022-03-11 Thread Daniel Sahlberg
On Fri, 11 Mar 2022 at 11:04, Julian Foad wrote:

> Daniel Sahlberg wrote:
> > I'm taking an opposite position with regard to where this should be
> > administered. [...] I would prefer a multi-level approach where the
> > repository (through svn:foo properties) could suggest pristine-less WC
>
> I understand completely your case, but the solution you need is a way to
> configure your client's behaviour remotely, and that is not necessarily
> best done by Subversion versioned properties. Do you see the
> distinction? Rather, what you need is for client configuration to be
> managed centrally and obeyed by your clients. The server and clients
> involved *could* be your Subversion repository server and Subversion
> clients, but could alternatively be some other mechanism. You just need
> some mechanism that works and is easy enough to deploy.
>

I do see that distinction and I completely agree with your analysis.

My position is that svn properties is the easiest way for me to distribute
this kind of client configuration (we could call it "client hints"). If
there is a majority that Subversion should not provide that, then I won't
stand in the way of consensus.

There are a lot of other options as well to configure the clients, AD group
policies probably being the most common in a corporate environment but
these have a higher bar to get started.

Kind regards,
Daniel


Re: A two-part vision for Subversion and large binary objects.

2022-03-11 Thread Julian Foad
Daniel Sahlberg wrote:
> I'm taking an opposite position with regard to where this should be
> administered. [...] I would prefer a multi-level approach where the
> repository (through svn:foo properties) could suggest pristine-less WC

I understand completely your case, but the solution you need is a way to
configure your client's behaviour remotely, and that is not necessarily
best done by Subversion versioned properties. Do you see the
distinction? Rather, what you need is for client configuration to be
managed centrally and obeyed by your clients. The server and clients
involved *could* be your Subversion repository server and Subversion
clients, but could alternatively be some other mechanism. You just need
some mechanism that works and is easy enough to deploy.


Re: A two-part vision for Subversion and large binary objects.

2022-03-11 Thread Daniel Sahlberg
On Thu, 10 Mar 2022 at 18:48, Karl Fogel wrote:

> On 10 Mar 2022, Lorenz wrote:
> >Daniel Sahlberg wrote:
> >
> >>On Tue, 8 Mar 2022 at 14:17, Daniel Shahaf wrote:
> >>
> >>> An alternative is to require the user to let svn know before
> >>> they're
> >>> starting to edit a file, so we can create a pristine off the
> >>> on-disk
> >>> file.  This way we won't have pristineless modified files in
> >>> the first
> >>> place.
> >>>
> >>
> >>Not "require". It might be interesting for some use cases to
> >>have "svn
> >>create-pristine-from-wc" as a manual step, but not adding this
> >>as part of
> >>the normal workflow. I have some wc's that might benefit from
> >>being
> >>pristine-less, but I'm not prepared to pay the extra cost
> >>(time-wise) of an
> >>svn:needs-locking-like step for every file I need to modify. I
> >>don't think
> >>this new command (or option) is MVP.
> >
> >maybe something like svn:needs-prestine-for-edit similar to
> >svn:needs-lock?
> >
> >Or, when we finally get a file-specific configuration for pristine
> >handling, that case could be included there?
>
> There's one principle I'm pretty firmly convinced about (I mean,
> of course everything is open for discussion here, I'm just saying
> where I'm starting from):
>
> Everything to do with pristines is a matter of *local
> configuration* ("configuration" interpreted broadly -- it includes
> local run-time options, as well as stuff in config files).
>
> In other words, it would be a mistake to create new svn:foo
> properties that indicate what the local pristine behavior should
> be, because the user's needs are inherently local and specific to
> that user's situation (how fast is their network, how much disk
> space do they have).  In other words, those needs are *not* about
> the file itself, but rather are solely about the constraints of
> the local (client-side) environment.
>
> Now, local configuration could look at *existing* svn:foo
> properties that serve other purposes (e.g., svn:mime-type), in
> order to make decisions about pristines, the same way local config
> can look at file size to make such decisions.  And if some
> organization wants to set their own custom non-svn:foo properties
> and have local config look at those custom properties for
> guidance, that's fine -- that's their business.
>
> But SVN should not be building in such things itself.  Pristines
> are a purely local phenomenon.  An svn:foo property whose purpose
> is to give guidance about pristines would be a directional
> mistake, IMHO.
>

I'm taking an opposite position with regard to where this should be
administered. My primary use case is with users who manage their own
computers (i.e., I have no simple way of pushing settings) but who are not
interested in configuring a lot of client-side options. I know their use
case well enough to know that they would benefit from pristine-less WCs (99%
of the work is done while connected on a fast network connection and svn
diff (et al.) is a relatively uncommon operation).

I would prefer a multi-level approach where the repository (through svn:foo
properties) could suggest pristine-less WC (even better, to have that
property on directories and on individual files) but the client could
override this suggestion (either through general config in .svn/ or through
cmdline options).

With certain repositories (like the ASF repo) this knowledge does not exist
and I would expect the property isn't set. With other repositories
(internal corporate repositories) let the administrator handle things and
powerusers could overrule.

That might not be MVP, but at least let's not rule it out for the future.
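
As a rough illustration of the multi-level approach described above,
here is a small Python sketch of the precedence I have in mind: a
command-line option beats client config, which beats the repository's
suggestion, which beats the built-in default of keeping pristines. The
function names are hypothetical, the suggestions are modelled as a plain
dict of path to boolean rather than real properties, and the "nearest
ancestor directory wins" rule is an assumption of this sketch.

```python
from pathlib import PurePosixPath


def repo_suggestion(path, suggestions):
    """Nearest suggestion set on the path or an ancestor, else None."""
    p = PurePosixPath(path)
    for candidate in (p, *p.parents):
        key = str(candidate)
        if key in suggestions:
            return suggestions[key]
    return None


def omit_pristine(path, suggestions, client_config=None, cmdline=None):
    """First explicit choice wins: cmdline, client config, repository."""
    for choice in (cmdline, client_config,
                   repo_suggestion(path, suggestions)):
        if choice is not None:
            return choice
    return False  # built-in default: keep pristines


# The repository suggests pristine-less storage for the doc/ directory.
suggestions = {"doc": True}
print(omit_pristine("doc/manual.pdf", suggestions))
print(omit_pristine("doc/manual.pdf", suggestions, client_config=False))
```

With these inputs the first call follows the repository's suggestion and
the second shows a power user overruling it through local config.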

Kind regards,
Daniel Sahlberg


Re: A two-part vision for Subversion and large binary objects.

2022-03-10 Thread Karl Fogel

On 10 Mar 2022, Lorenz wrote:
>Daniel Sahlberg wrote:
>
>>On Tue, 8 Mar 2022 at 14:17, Daniel Shahaf wrote:
>>
>>> An alternative is to require the user to let svn know before they're
>>> starting to edit a file, so we can create a pristine off the on-disk
>>> file.  This way we won't have pristineless modified files in the first
>>> place.
>>
>>Not "require". It might be interesting for some use cases to have "svn
>>create-pristine-from-wc" as a manual step, but not adding this as part of
>>the normal workflow. I have some wc's that might benefit from being
>>pristine-less, but I'm not prepared to pay the extra cost (time-wise) of an
>>svn:needs-locking-like step for every file I need to modify. I don't think
>>this new command (or option) is MVP.
>
>maybe something like svn:needs-prestine-for-edit similar to
>svn:needs-lock?
>
>Or, when we finally get a file-specific configuration for pristine
>handling, that case could be included there?


There's one principle I'm pretty firmly convinced about (I mean, 
of course everything is open for discussion here, I'm just saying 
where I'm starting from):


Everything to do with pristines is a matter of *local 
configuration* ("configuration" interpreted broadly -- it includes 
local run-time options, as well as stuff in config files).


In other words, it would be a mistake to create new svn:foo 
properties that indicate what the local pristine behavior should 
be, because the user's needs are inherently local and specific to 
that user's situation (how fast is their network, how much disk 
space do they have).  In other words, those needs are *not* about 
the file itself, but rather are solely about the constraints of 
the local (client-side) environment.


Now, local configuration could look at *existing* svn:foo 
properties that serve other purposes (e.g., svn:mime-type), in 
order to make decisions about pristines, the same way local config 
can look at file size to make such decisions.  And if some 
organization wants to set their own custom non-svn:foo properties 
and have local config look at those custom properties for 
guidance, that's fine -- that's their business.


But SVN should not be building in such things itself.  Pristines 
are a purely local phenomenon.  An svn:foo property whose purpose 
is to give guidance about pristines would be a directional 
mistake, IMHO.
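
To show what I mean by local configuration consulting an existing
property and the file size, here is a minimal Python sketch. Nothing in
it is an existing Subversion option or API: the function name, the size
threshold and the "text only" switch are all hypothetical, and treating
a missing svn:mime-type or a text/* value as text is a simplification.

```python
def keep_pristine(size_bytes, props,
                  max_size=10 * 1024 * 1024, text_only=True):
    """Decide locally whether to store a pristine for one file.

    Both max_size and text_only stand for settings that would live in
    the user's own config; neither is dictated by the repository.
    """
    mime = props.get("svn:mime-type", "")
    is_text = (not mime) or mime.startswith("text/")
    if text_only and not is_text:
        return False
    return size_bytes <= max_size


print(keep_pristine(2048, {}))
print(keep_pristine(2048, {"svn:mime-type": "application/octet-stream"}))
```

The property is only an input here; the thresholds that turn it into a
decision stay on the client, where disk space and network speed are
known.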


Best regards,
-Karl


Re: A two-part vision for Subversion and large binary objects.

2022-03-08 Thread Karl Fogel

On 08 Mar 2022, Daniel Shahaf wrote:
> Sure.  I was asking whether by "once the user has a local pristine" you
> meant a pristine — as in, a file under .svn/pristine/ that .svn/wc.db
> knows about and uses — or Alice making a local copy of the contents of
> file@BASE somewhere libsvn doesn't know about.


Well, depending on the context, I may be using the word "pristine" 
flexibly.  Sometimes I mean a literal integrated-into-wc-metadata 
pristine, and sometimes I just mean "an extra copy of the file, 
that the user has made locally".


(It's possible that the degree of precision you would like in this 
sub-discussion is not one I'm willing to adhere to consistently 
:-).  I can't always predict what will matter to a given 
interlocutor.  But I'll try to be sufficiently precise in my 
responses below at least.) 

> A manual copy of the BASE revision would "serve for local diffs and
> reverts", indeed, but I would hesitate to recommend this, because diff
> and revert are both core operations.  If users need to reinvent these
> two wheels, then:
> 
> - All the advantages of having just that one well-known «svn revert»
>  button that all the users' GUI clients and scripts can press are lost
> 
> - The local disk storage cost will be paid, but without all the
>  benefits: e.g., commit will use a self-delta rather than a delta
>  against BASE even if the file format does lend itself to binary diffs;
>  ra_serf's ability to not download a file if the wc has another file
>  with the same sha1 won't be used; the keyword-contraction and
>  diff-ignore-content-type features of «svn diff» will need to be
>  reimplemented; etc.
> 
> - We might leave a bad impression on potential users
> 
> As an MVP alternative, some sort of command to hydrate a single file,
> perhaps, as you have proposed?  CLI-wise, I'll just say we might want to
> mark such a command as experimental (name it "x-foo" and document it has
> reduced forward compatibility promises).  Backend-wise, we'll want to
> ensure a manually-hydrated file doesn't get dehydrated too soon.
> 
> What's "too soon"?  Until the user explicitly requests or permits
> dehydration.  If hydration was manual, so should dehydration be.
> 
> Makes sense?


Yes, thanks for the suggestion, and I agree.  I would love for MVP 
or MVP+1 to have an explicit "rehydrate" UI.  I think there 
*might* be some value to shipping MVP without such a feature, in 
order to first get some real-world experience with how people use 
pristine-less working copies, before we make long-lasting UI 
decisions.


But anyway, +1 to the general idea.

> The context of all this is whether 'update' should fetch pristines for
> modified files.  I guess it should not do so by default (there's no
> reason to incur the costs, and the user has opted in to
> pristines-on-demand),
> but I don't think we should tell users to keep pristines _and not tell
> libsvn_wc about them_.  The cost of implementing «svn x-hydrate»
> (however named) is smaller than the cost of asking users to reimplement
> core version control functionality.


Users can already copy files behind Subversion's back, of course.

I'm worried that implementing 'svn x-hydrate' commands now would 
be premature -- we don't know enough about real-world usage yet. 
I'd feel more comfortable putting out one release (of 
x-hydrate-less MVP) to get feedback on pristine-less working 
copies.  We could even say that we're considering adding x-hydrate 
commands but that we're waiting until the next release so we can 
make sure our UI ideas match people's actual needs.


Anyone else have thoughts on this?


> If we think there are use-cases in which users will want to have
> a pristine for a modified file, whether those use-cases involve «commit»
> or «diff» or «revert» or whatever else, then that pristine shouldn't be
> just the user's private copy of BASE; it should be a real pristine, live
> in .svn/pristine/ and be known to wc.db, and used for all svn operations,
> not just those the user has reimplemented.


I understand the motivation.  There are reasonable arguments both 
ways for shipping MVP with/without x-hydrate functionality.


What do others think?

> This way, by default «commit» will send self-deltas, but if the user
> wants a pristine for diffs or reverts, then reverts, diffs, and commits
> will all use the pristine.  There shouldn't be any need for the user to
> reimplement their own pristine store and their own diff and revert
> operations.
>
> And yes, commit might not want to use pristines this way, but that's
> actually a separate feature request: a request to change the "When
> committing a change to a pristineful file, send a delta against BASE or
> a self-delta, whichever is smaller" logic, which IIRC works by computing
> a delta against BASE and comparing its length to the repository-normal
> filesize, to something that doesn't compute a delta against BASE in the
> first place.


Yes, that's a good point (in that last paragraph there), and we 
should take it into account when (re)implementing commit 

Re: A two-part vision for Subversion and large binary objects.

2022-03-08 Thread Daniel Shahaf
Karl Fogel wrote on Tue, Mar 08, 2022 at 14:01:22 -0600:
> On 08 Mar 2022, Daniel Shahaf wrote:
> > Karl Fogel:
> > > Hmm, I don't see where I was assuming that the pristine would be
> > > needed exactly once, though.  Once the user has a local pristine
> > > (by whatever means),
> > 
> > To be clear, we're only talking about pristines that libsvn_wc knows
> > about, right?  As opposed to Alice running «svn cat iota@BASE» and
> > saving the output somewhere.
> 
> Hmm, I don't think I understand the question here.  Can you ask it with more
> details / context?

Sure.  I was asking whether by "once the user has a local pristine" you
meant a pristine — as in, a file under .svn/pristine/ that .svn/wc.db
knows about and uses — or Alice making a local copy of the contents of
file@BASE somewhere libsvn doesn't know about.

> > > if she wants to keep that local pristine after committing its
> > > corresponding working file, then she could do so or not do so,
> > > depending on
> > > whether she wants to continue paying the local storage cost for it.
> > 
> > How would Alice keep iota's pristine after committing iota?  «svn commit
> > iota» deletes iota's pristine.
> 
> Like I said, I wasn't going into UI details.

Sure.  Neither was I.

> But if Subversion wants to offer a way for commit to keep the
> post-commit pristine around (in circumstances where that file would
> otherwise be pristine-less), it can do so.  This wouldn't be for MVP,
> of course; I'm just saying it's a conceivable feature and maybe some
> day we'll offer it.

+1

> For now, the way Alice would keep an "informal pristine" would be to simply
> copy the file manually.  That's not a pristine in the full sense of the
> word, but it will serve for local diffs and reverts, of course.

A manual copy of the BASE revision would "serve for local diffs and
reverts", indeed, but I would hesitate to recommend this, because diff
and revert are both core operations.  If users need to reinvent these
two wheels, then:

- All the advantages of having just that one well-known «svn revert»
  button that all the users' GUI clients and scripts can press are lost

- The local disk storage cost will be paid, but without all the
  benefits: e.g., commit will use a self-delta rather than a delta
  against BASE even if the file format does lend itself to binary diffs;
  ra_serf's ability to not download a file if the wc has another file
  with the same sha1 won't be used; the keyword-contraction and
  diff-ignore-content-type features of «svn diff» will need to be
  reimplemented; etc.

- We might leave a bad impression on potential users

As an MVP alternative, some sort of command to hydrate a single file,
perhaps, as you have proposed?  CLI-wise, I'll just say we might want to
mark such a command as experimental (name it "x-foo" and document it has
reduced forward compatibility promises).  Backend-wise, we'll want to
ensure a manually-hydrated file doesn't get dehydrated too soon.

What's "too soon"?  Until the user explicitly requests or permits
dehydration.  If hydration was manual, so should dehydration be.

Makes sense?



The context of all this is whether 'update' should fetch pristines for
modified files.  I guess it should not do so by default (there's no
reason to incur the costs, and the user has opted in to pristines-on-demand),
but I don't think we should tell users to keep pristines _and not tell
libsvn_wc about them_.  The cost of implementing «svn x-hydrate»
(however named) is smaller than the cost of asking users to reimplement
core version control functionality.

If we think there are use-cases in which users will want to have
a pristine for a modified file, whether those use-cases involve «commit»
or «diff» or «revert» or whatever else, then that pristine shouldn't be
just the user's private copy of BASE; it should be a real pristine, live
in .svn/pristine/ and be known to wc.db, and used for all svn operations,
not just those the user has reimplemented.

This way, by default «commit» will send self-deltas, but if the user
wants a pristine for diffs or reverts, then reverts, diffs, and commits
will all use the pristine.  There shouldn't be any need for the user to
reimplement their own pristine store and their own diff and revert
operations.

And yes, commit might not want to use pristines this way, but that's
actually a separate feature request: a request to change the "When
committing a change to a pristineful file, send a delta against BASE or
a self-delta, whichever is smaller" logic, which IIRC works by computing
a delta against BASE and comparing its length to the repository-normal
filesize, to something that doesn't compute a delta against BASE in the
first place.
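
To make the quoted commit logic concrete, here is a minimal sketch. It is not Subversion's code: zlib with a preset dictionary stands in for svndiff's delta against BASE, plain zlib compression stands in for the self-delta, and the names `delta_against`, `self_delta` and `choose_commit_payload` are hypothetical. The size comparison follows the "IIRC" description above (delta length versus file size), and a pristineless file skips the delta computation entirely.

```python
import zlib


def delta_against(base: bytes, working: bytes) -> bytes:
    """Toy 'delta against BASE': deflate `working` using `base` as a preset dictionary."""
    comp = zlib.compressobj(zdict=base)
    return comp.compress(working) + comp.flush()


def self_delta(working: bytes) -> bytes:
    """Toy 'self-delta': the file compressed against nothing but itself."""
    return zlib.compress(working)


def choose_commit_payload(working: bytes, base=None):
    """Return (kind, payload) for a commit of `working`.

    base is None models a pristineless file: no delta against BASE is
    computed at all, and a self-delta is sent.
    """
    if base is None:
        return "self-delta", self_delta(working)
    delta = delta_against(base, working)
    # Assumption taken from the description above: the delta's length is
    # compared with the file size, not with the self-delta's length.
    if len(delta) < len(working):
        return "delta-vs-base", delta
    return "self-delta", self_delta(working)
```

With a pristine present, a small edit to a file yields the "delta-vs-base" branch; without one, the function never touches BASE, which is the behaviour the separate feature request would ask for.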

Cheers,

Daniel


Re: A two-part vision for Subversion and large binary objects.

2022-03-08 Thread Karl Fogel

On 08 Mar 2022, Daniel Shahaf wrote:
> > Hmm, I don't see where I was assuming that the pristine would be
> > needed exactly once, though.  Once the user has a local pristine
> > (by whatever means),
>
> To be clear, we're only talking about pristines that libsvn_wc knows
> about, right?  As opposed to Alice running «svn cat iota@BASE» and
> saving the output somewhere.


Hmm, I don't think I understand the question here.  Can you ask it 
with more details / context?



> > if she wants to keep that local pristine after committing its
> > corresponding working file, then she could do so or not do so,
> > depending on whether she wants to continue paying the local storage
> > cost for it.
>
> How would Alice keep iota's pristine after committing iota?  «svn commit
> iota» deletes iota's pristine.


Like I said, I wasn't going into UI details.  But if Subversion 
wants to offer a way for commit to keep the post-commit pristine 
around (in circumstances where that file would otherwise be 
pristine-less), it can do so.  This wouldn't be for MVP, of 
course; I'm just saying it's a conceivable feature and maybe some 
day we'll offer it.


For now, the way Alice would keep an "informal pristine" would be
to simply copy the file manually.  That's not a pristine in the full
sense of the word, but it will serve for local diffs and reverts,
of course.


Best regards,
-Karl


Re: A two-part vision for Subversion and large binary objects.

2022-03-08 Thread Daniel Shahaf
Karl Fogel wrote on Tue, Mar 08, 2022 at 12:32:38 -0600:
> On 08 Mar 2022, Daniel Shahaf wrote:
> > Karl Fogel wrote on Mon, Mar 07, 2022 at 13:44:03 -0600:
> > > And in the absence of fancy cross-network common-prefix detection
> > > code that we're not going to write, this would just be
> > > cost-shifting anyway.  Whatever commit-time improvement one would
> > > gain from having the pristine locally would be offset by the extra
> > > time spent fetching the pristine to make that commit-time
> > > improvement possible.
> > 
> > What assumptions is this conclusion valid under?  It seems that this
> > conclusion assumes, at least, that the uplink and downlink bandwidths
> > are equal and that the pristine will be needed exactly once (i.e.,
> > a hydrate-commit-dehydrate sequence).
> 
> I was assuming up and down speeds are roughly the same, yes.
> 
> Hmm, I don't see where I was assuming that the pristine would be needed
> exactly once, though.  Once the user has a local pristine (by whatever
> means),

To be clear, we're only talking about pristines that libsvn_wc knows
about, right?  As opposed to Alice running «svn cat iota@BASE» and
saving the output somewhere.

> if she wants to keep that local pristine after committing its
> corresponding working file, then she could do so or not do so, depending on
> whether she wants to continue paying the local storage cost for it.

How would Alice keep iota's pristine after committing iota?  «svn commit
iota» deletes iota's pristine.

Cheers,

Daniel


Re: A two-part vision for Subversion and large binary objects.

2022-03-08 Thread Karl Fogel

On 08 Mar 2022, Daniel Shahaf wrote:
> I wasn't proposing we require such a step.  I was merely saying that was
> one of several possible solutions to the "How to commit a pristineless
> file" question.  Here they are again:
>
> 1. Download the pristine and then send a regular delta
> 2. Send a self-delta
> 3. rsync the file
> 4. Avoid getting into this situation in the first place
>
> I guess we'll be happy with (2) for the MVP.


Very happy with (2) for MVP, and possibly for all time :-).


Re: A two-part vision for Subversion and large binary objects.

2022-03-08 Thread Karl Fogel

On 08 Mar 2022, Daniel Shahaf wrote:

> Karl Fogel wrote on Mon, Mar 07, 2022 at 13:44:03 -0600:
> > And in the absence of fancy cross-network common-prefix detection code that
> > we're not going to write, this would just be cost-shifting anyway.  Whatever
> > commit-time improvement one would gain from having the pristine locally
> > would be offset by the extra time spent fetching the pristine to make that
> > commit-time improvement possible.
>
> What assumptions is this conclusion valid under?  It seems that this
> conclusion assumes, at least, that the uplink and downlink bandwidths
> are equal and that the pristine will be needed exactly once (i.e.,
> a hydrate-commit-dehydrate sequence).


I was assuming up and down speeds are roughly the same, yes.

Hmm, I don't see where I was assuming that the pristine would be 
needed exactly once, though.  Once the user has a local pristine 
(by whatever means), if she wants to keep that local pristine 
after committing its corresponding working file, then she could do 
so or not do so, depending on whether she wants to continue paying 
the local storage cost for it.


Best regards,
-Karl


Re: A two-part vision for Subversion and large binary objects.

2022-03-08 Thread Karl Fogel

On 08 Mar 2022, Daniel Shahaf wrote:

> Karl Fogel wrote on Sun, Mar 06, 2022 at 22:19:50 -0600:
> > b) The failure mode of unnecessary fetching and storing is much worse than
> > the failure mode of not having fetched a pristine that someone might turn
> > out to want (there are workarounds for the latter);
>
> What are some of those workarounds?


One can make a copy of a file before modifying it, if one thinks 
one might need to revert or do a local diff (not necessarily 
limited to regular plain-text 'diff' program, of course -- some 
binary file formats have corresponding custom diff tools).


And if one forgets to make a copy first, one can *still* fetch the 
base file manually from the server using 'svn cat', thus paying 
the network-time cost and the local-storage cost by choice.


Finally, eventually we may have an 'svn rehydrate' command (or 
'svn update --rehydrate' or whatever -- I'm not worrying about the 
UI details here, just positing that there *is* a UI).  That would 
do basically what 'svn cat' does, but in addition would integrate 
the result into the working copy as a pristine base.


Best regards,
-Karl


Re: A two-part vision for Subversion and large binary objects.

2022-03-08 Thread Daniel Shahaf
Daniel Sahlberg wrote on Tue, Mar 08, 2022 at 14:34:06 +0100:
> On Tue, 8 Mar 2022 at 14:17, Daniel Shahaf wrote:
> 
> > An alternative is to require the user to let svn know before they're
> > starting to edit a file, so we can create a pristine off the on-disk
> > file.  This way we won't have pristineless modified files in the first
> > place.
> >
> 
> Not "require". It might be interesting for some use-cases to have "svn
> create-pristine-from-wc" as a manual step, but not adding this as part of
> the normal workflow. I have some wc's that might benefit from being
> pristine-less, but I'm not prepared to pay the extra cost (time-wise) of an
> svn:needs-locking-like step for every file I need to modify. I don't think
> this new command (or option) is MVP.

I wasn't proposing we require such a step.  I was merely saying that was
one of several possible solutions to the "How to commit a pristineless
file" question.  Here they are again:

1. Download the pristine and then send a regular delta
2. Send a self-delta
3. rsync the file
4. Avoid getting into this situation in the first place

I guess we'll be happy with (2) for the MVP.

Cheers,

Daniel


Re: A two-part vision for Subversion and large binary objects.

2022-03-08 Thread Daniel Sahlberg
On Tue, 8 Mar 2022 at 14:17, Daniel Shahaf wrote:

> An alternative is to require the user to let svn know before they're
> starting to edit a file, so we can create a pristine off the on-disk
> file.  This way we won't have pristineless modified files in the first
> place.
>

Not "require". It might be interesting for some use-cases to have "svn
create-pristine-from-wc" as a manual step, but not adding this as part of
the normal workflow. I have some wc's that might benefit from being
pristine-less, but I'm not prepared to pay the extra cost (time-wise) of an
svn:needs-locking-like step for every file I need to modify. I don't think
this new command (or option) is MVP.

Kind regards,
Daniel Sahlberg


Re: A two-part vision for Subversion and large binary objects.

2022-03-08 Thread Daniel Shahaf
Karl Fogel wrote on Mon, Mar 07, 2022 at 13:44:03 -0600:
> On 07 Mar 2022, Mark Phippard wrote:
> > > I do understand the reasons why Evgeny thought pre-fetching
> > > pristines for modified files as part of an 'update' could be a
> > > good idea.
> > 
> > My recollection of the first version of this patch, commit needed the
> > pristine and so had to fetch it before the commit happened. This may
> > have been a reason it seemed like a good idea at the time for update
> > to get the pristine.
> 
> Ah, maybe so; I didn't realize that.
> 
> If that was the motivation, then there's even less reason for 'update' to
> fetch pristines for modified files.  Having the pristine is not only
> unnecessary for the commit, in most cases having the pristine is not even
> particularly *useful* to the commit.  These types of files tend to be
> non-diffable anyway (i.e., not even binary diffable), broadly speaking and
> with occasional exceptions of course.  For example, a common such file is a
> gigantic gzipped blob.  Tiny changes in the uncompressed text will lead to a
> completely different gzipped blob.

And «update» could send a self-compressed delta anyway.

> (I suppose it might be the case that if the first change is made very late
> in the uncompressed text, then the revised gzipped blob can, under some
> real-world circumstances, actually be bit-for-bit the same as the original
> for a long initial prefix before showing any difference.  But this is a rare
> enough case that I don't think Subversion should be trying to detect it and
> support it.  We'd essentially have to incorporate the rsync rolling-checksum
> algorithm, or something like it, into our diff negotiation to even get any
> advantage.)

This use-case may be a rare one, but rsync _was_ in fact designed to
solve precisely the problem that «svn commit» of a pristineless file
needs to solve.  So, suppose we did use the rsync algorithm, would this
benefit any other use-cases other than the "first change is at the end
of the file" use-case you describe here?  Is it faster to commit a file
by sending a self-delta of it or by rsync'ing it?

Furthermore, the user may be able to deliberately create the huge file
in a way that makes it rsync-friendly: for instance, `svnadmin dump`
emits hashes in sorted order, which has the side-effect of making dump
files rsync-friendly.  For gzip files there is «gzip --rsyncable».

None of this is needed for the MVP, of course, but I do think the basic
principle of using rsync is in fact sound.
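
For readers unfamiliar with the rolling checksum mentioned above, here is a minimal sketch of the weak checksum from the rsync algorithm, assuming a modulus of 2^16. It only illustrates why a sliding window is cheap to re-checksum; nothing here exists in Subversion, and the function names are hypothetical.

```python
MOD = 1 << 16  # assumed modulus for both halves of the weak checksum


def weak_checksum(block: bytes):
    """Compute the (a, b) pair of the rsync-style weak checksum for one block."""
    n = len(block)
    a = sum(block) % MOD
    # The first byte is weighted n, the last byte is weighted 1.
    b = sum((n - i) * x for i, x in enumerate(block)) % MOD
    return a, b


def roll(a: int, b: int, out_byte: int, in_byte: int, block_len: int):
    """Slide the window one byte to the right in O(1).

    out_byte leaves at the left edge, in_byte enters at the right edge.
    """
    a = (a - out_byte + in_byte) % MOD
    b = (b - block_len * out_byte + a) % MOD
    return a, b
```

Because each one-byte shift costs constant time, one side can look for matching blocks at every byte offset of a large file, which is what lets rsync find unchanged regions without having the other side's copy locally.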

An alternative is to require the user to let svn know before they're
starting to edit a file, so we can create a pristine off the on-disk
file.  This way we won't have pristineless modified files in the first
place.

> And in the absence of fancy cross-network common-prefix detection code that
> we're not going to write, this would just be cost-shifting anyway.  Whatever
> commit-time improvement one would gain from having the pristine locally
> would be offset by the extra time spent fetching the pristine to make that
> commit-time improvement possible.

What assumptions is this conclusion valid under?  It seems that this
conclusion assumes, at least, that the uplink and downlink bandwidths
are equal and that the pristine will be needed exactly once (i.e.,
a hydrate-commit-dehydrate sequence).

I'm not objecting to making assumptions; we aren't going to address all
use-cases in 1.15.  I'm just asking that we make our assumptions explicit.
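
As one way of making those assumptions explicit, here is a back-of-the-envelope sketch. It is a hypothetical model, not a measurement: it ignores latency, compression and CPU time, and the parameter `commits_per_fetch` is an invented knob for "how many commits reuse one fetched pristine".

```python
def commit_time_self_delta(file_size: float, uplink: float) -> float:
    """Seconds to commit a pristineless file by uploading it whole."""
    return file_size / uplink


def commit_time_with_hydration(file_size: float, delta_size: float,
                               uplink: float, downlink: float,
                               commits_per_fetch: int = 1) -> float:
    """Seconds per commit when the pristine is fetched first.

    The download cost is spread over `commits_per_fetch` commits; each
    commit then uploads only a delta of `delta_size`.
    """
    fetch = file_size / downlink / commits_per_fetch
    return fetch + delta_size / uplink
```

For a 1000 MB file with a 100 MB delta and 10 MB/s in both directions, the self-delta commit takes 100 s and hydrate-then-commit takes 110 s, so fetching is pure cost-shifting. With a 100 MB/s downlink, or with the pristine reused across ten commits, the second figure drops to 20 s, which is why both assumptions matter.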

Cheers,

Daniel

> So... yeah.  Let's not do that :-).
> 
> Best regards,
> -Karl


Re: A two-part vision for Subversion and large binary objects.

2022-03-08 Thread Daniel Shahaf
Karl Fogel wrote on Sun, Mar 06, 2022 at 22:19:50 -0600:
> b) The failure mode of unnecessary fetching and storing is much worse than
> the failure mode of not having fetched a pristine that someone might turn
> out to want (there are workarounds for the latter);

What are some of those workarounds?

+1 to everything else.  API design is just a game of Simon Says :)

Thanks,

Daniel


Re: A two-part vision for Subversion and large binary objects.

2022-03-07 Thread Karl Fogel

On 07 Mar 2022, Mark Phippard wrote:

> > I do understand the reasons why Evgeny thought pre-fetching
> > pristines for modified files as part of an 'update' could be a
> > good idea.
>
> My recollection of the first version of this patch, commit needed the
> pristine and so had to fetch it before the commit happened. This may
> have been a reason it seemed like a good idea at the time for update
> to get the pristine.


Ah, maybe so; I didn't realize that.

If that was the motivation, then there's even less reason for 
'update' to fetch pristines for modified files.  Having the 
pristine is not only unnecessary for the commit, in most cases 
having the pristine is not even particularly *useful* to the 
commit.  These types of files tend to be non-diffable anyway 
(i.e., not even binary diffable), broadly speaking and with 
occasional exceptions of course.  For example, a common such file 
is a gigantic gzipped blob.  Tiny changes in the uncompressed text 
will lead to a completely different gzipped blob.


(I suppose it might be the case that if the first change is made 
very late in the uncompressed text, then the revised gzipped blob 
can, under some real-world circumstances, actually be bit-for-bit 
the same as the original for a long initial prefix before showing 
any difference.  But this is a rare enough case that I don't think 
Subversion should be trying to detect it and support it.  We'd 
essentially have to incorporate the rsync rolling-checksum 
algorithm, or something like it, into our diff negotiation to even 
get any advantage.)


And in the absence of fancy cross-network common-prefix detection 
code that we're not going to write, this would just be 
cost-shifting anyway.  Whatever commit-time improvement one would 
gain from having the pristine locally would be offset by the extra 
time spent fetching the pristine to make that commit-time 
improvement possible.


So... yeah.  Let's not do that :-).

Best regards,
-Karl


Re: A two-part vision for Subversion and large binary objects.

2022-03-07 Thread Mark Phippard
On Sun, Mar 6, 2022 at 11:19 PM Karl Fogel wrote:

[snipped]

Agree with everything you have said.

> I do understand the reasons why Evgeny thought pre-fetching
> pristines for modified files as part of an 'update' could be a
> good idea.

My recollection of the first version of this patch, commit needed the
pristine and so had to fetch it before the commit happened. This may
have been a reason it seemed like a good idea at the time for update
to get the pristine.

Mark


Re: A two-part vision for Subversion and large binary objects.

2022-03-06 Thread Karl Fogel

On 04 Mar 2022, Julian Foad wrote:
> I had a talk with Karl about this, and now I understand the concern much
> better.
>
> (Karl, please correct anything I misrepresent.)


You've described it well, Julian.  Thank you (and thank you also 
for your patience in explaining to me the current State Of The 
Onion in a phone call, when I was still behind on reading dev@ 
posts -- I'm caught up now).


The one thing I would add to your summary below is that the 
concern on the client side is not just about wasted time (that is, 
the time spent fetching pristines for files that won't, in the 
end, actually need pristines locally).


The concern is also local *space*.  It's not unusual for one of 
these working copies to bring a local disk to within a few 
enormous files of full disk usage -- in other words, to be in a 
situation where fetching a certain number of pristines could 
result in the disk running out of space.  So if one has modified N 
of the large versioned files, and then an update brings down N 
correspondingly large pristines, well, hilarity could ensue :-).


But even beyond my experience with particular use cases, I think 
we should aim for the simplicity of a principle here:


Principle: When a file is checked out without its pristine, then 
SVN should never fetch that pristine unless we actually need to.


(Naturally, this principle applies, via the distributive property, 
to all the files in a fully pristine-less working copy.  Since in 
the future we may offer UI to allow working copies in which some 
files are checked out with pristine and some without, I am being 
careful to articulate the principle here as being about files 
rather than about working copies.)


The justification for this principle is that there's presumably a 
*reason* why the user requested that there be no pristine for that 
file.  Whatever that reason is, we have no reason to think we know 
better than the user does. 

The most likely reason is that the file is huge and the user 
doesn't want to pay the disk-space cost, nor the network-time cost 
in the case of updates for which the file hasn't changed in the 
repository.  But maybe the reason is something else.  Who knows? 
Not our business.  The user told SVN what they wanted, and SVN 
should do that thing.


Now, if the user runs an operation that requires a pristine, 
that's different -- then they're effectively notifying us that 
they're changing their decision.  We should obey the user in that 
case too.  It's just that it would be bad form for us to go 
fetching a pristine when a) the user already said they don't want 
it and b) SVN has no identifiable need for it in this operation.
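
The principle can be sketched as a small decision function. This is only an illustration under stated assumptions: `FileState`, `NEEDS_BASE_TEXT` and `should_fetch_pristine` are hypothetical names, not libsvn_wc APIs, and it assumes commit sends a self-delta so it never needs the pristine.

```python
from dataclasses import dataclass


@dataclass
class FileState:
    has_pristine: bool            # pristine already present locally
    locally_modified: bool        # working file differs from BASE
    changed_in_repo: bool = False  # this update brings a new version


# Operations that cannot work on a modified file without its base text.
NEEDS_BASE_TEXT = {"diff", "revert"}


def should_fetch_pristine(operation: str, f: FileState) -> bool:
    """Fetch a pristine only when this operation has an identifiable need."""
    if f.has_pristine:
        return False
    if operation in NEEDS_BASE_TEXT:
        # The user asked for something that needs the base text.
        return f.locally_modified
    if operation == "update":
        # Only files actually being updated get new content.
        return f.changed_in_repo
    # Commit (self-delta) and everything else: no identifiable need.
    return False
```

Under this sketch, running 'update' with a locally modified large file that has not changed in the repository fetches nothing, while 'revert' on the same file does fetch.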


I do understand the reasons why Evgeny thought pre-fetching 
pristines for modified files as part of an 'update' could be a 
good idea.  There would surely be _some_ occasions where a user 
would be pleasantly surprised to find that they have that pristine 
locally just when they need it.  But in the end, I believe that


a) In the most common use cases, it's probably not what the user 
wants anyway;


b) The failure mode of unnecessary fetching and storing is much 
worse than the failure mode of not having fetched a pristine that 
someone might turn out to want (there are workarounds for the 
latter);


c) It's generally better if we have a simple and comprehensible 
principle, like the one I articulated above.


Best regards,
-Karl

> He shares the view that it would be unacceptable for 'svn update' to
> fetch pristines of files that have become locally modified since the
> previous fetch opportunity, but that are not actually being updated by
> this update.
>
> In his use cases a developer locally modifies some large files. The
> developer also modifies some small files (such as 'readme' files
> describing the large files). The developer doesn't need to diff or
> revert the large files, and so chooses the checkout mode which doesn't
> keep the pristines initially.
>
> Before committing, the developer runs 'update', expecting to fetch any
> remote changes to the small files (and not large files, not in this
> case), expecting it to be quick, and then the developer continues work
> and eventually commits.
>
> The time taken to fetch the pristines of the large, modified files would
> be long (for example, ten minutes). Taking a long time for the commit is
> acceptable because the commit is the end of the work flow (and the
> developer can go away or move on to something else while it proceeds).
> The concern is that taking a long time at the update stage would be too
> disruptive.
>
> It wouldn't be a problem for an operation that really needs the
> pristines taking a long time. (Revert, for example.) The perception is
> that update doesn't really need them. That is, while it obviously needs
> in principle to fetch the new pristines of the files that need updating
> to a new version from the server (or fetch a delta and so be able to
> generate the pristine), it doesn't, in principle, need pristines of
> files that it

Re: A two-part vision for Subversion and large binary objects.

2022-03-04 Thread Mark Phippard
On Fri, Mar 4, 2022 at 3:52 PM Julian Foad wrote:
>
> > Mark Phippard wrote:
> >> [...] For an update, I think it is unexpected and undesirable. [...]
>
> I had a talk with Karl about this, and now I understand the concern much 
> better.
>
> (Karl, please correct anything I misrepresent.)
>
> He shares the view that it would be unacceptable for 'svn update' to
> fetch pristines of files that have become locally modified since the
> previous fetch opportunity, but that are not actually being updated by
> this update.
>
> In his use cases a developer locally modifies some large files. The
> developer also modifies some small files (such as 'readme' files
> describing the large files). The developer doesn't need to diff or
> revert the large files, and so chooses the checkout mode which doesn't
> keep the pristines initially.
>
> Before committing, the developer runs 'update', expecting to fetch any
> remote changes to the small files (and not large files, not in this
> case), expecting it to be quick, and then the developer continues work
> and eventually commits.
>
> The time taken to fetch the pristines of the large, modified files would
> be long (for example, ten minutes). Taking a long time for the commit is
> acceptable because the commit is the end of the work flow (and the
> developer can go away or move on to something else while it proceeds).
> The concern is that taking a long time at the update stage would be too 
> disruptive.
>
> It wouldn't be a problem for an operation that really needs the
> pristines taking a long time. (Revert, for example.) The perception is
> that update doesn't really need them. That is, while it obviously needs
> in principle to fetch the new pristines of the files that need updating
> to a new version from the server (or fetch a delta and so be able to
> generate the pristine), it doesn't, in principle, need pristines of
> files that it isn't going to update. In this use case, it isn't going to
> update the large, locally modified files. And fetching their pristines
> wouldn't massively benefit the commit either, because they are poorly
> diffable kinds of files. So it is wasted time.
>
> If the implementation currently requires these pristines, that would
> seem to be an implementation detail and we would seek to change that.
>
> So my task now is to investigate for any way we can eliminate or
> optimise the unnecessary fetching, at least in this specific case.
>
> Filed as https://subversion.apache.org/issue/4892 .
>
> I will investigate this issue next week.

Thanks Julian. I think your summary here matches what I was
saying/thinking well.

Hope it works out.

Mark


Re: A two-part vision for Subversion and large binary objects.

2022-03-04 Thread Julian Foad
> Mark Phippard wrote:
>> [...] For an update, I think it is unexpected and undesirable. [...]

I had a talk with Karl about this, and now I understand the concern much better.

(Karl, please correct anything I misrepresent.)

He shares the view that it would be unacceptable for 'svn update' to
fetch pristines of files that have become locally modified since the
previous fetch opportunity, but that are not actually being updated by
this update.

In his use cases a developer locally modifies some large files. The
developer also modifies some small files (such as 'readme' files
describing the large files). The developer doesn't need to diff or
revert the large files, and so chooses the checkout mode which doesn't
keep the pristines initially.

Before committing, the developer runs 'update', expecting to fetch any
remote changes to the small files (and not large files, not in this
case), expecting it to be quick, and then the developer continues work
and eventually commits.

The time taken to fetch the pristines of the large, modified files would
be long (for example, ten minutes). Taking a long time for the commit is
acceptable because the commit is the end of the work flow (and the
developer can go away or move on to something else while it proceeds).
The concern is that taking a long time at the update stage would be too 
disruptive.

It wouldn't be a problem for an operation that really needs the
pristines taking a long time. (Revert, for example.) The perception is
that update doesn't really need them. That is, while it obviously needs
in principle to fetch the new pristines of the files that need updating
to a new version from the server (or fetch a delta and so be able to
generate the pristine), it doesn't, in principle, need pristines of
files that it isn't going to update. In this use case, it isn't going to
update the large, locally modified files. And fetching their pristines
wouldn't massively benefit the commit either, because they are poorly
diffable kinds of files. So it is wasted time.

If the implementation currently requires these pristines, that would
seem to be an implementation detail and we would seek to change that.

So my task now is to investigate for any way we can eliminate or
optimise the unnecessary fetching, at least in this specific case.

Filed as https://subversion.apache.org/issue/4892 .

I will investigate this issue next week.

- Julian



Re: A two-part vision for Subversion and large binary objects.

2022-03-02 Thread Julian Foad
Mark Phippard wrote:
> That comment specifically talks about diff. [...] For an update, I
> think it is unexpected and undesirable. [...]

You're right, the comment I pointed to doesn't do anything to justify
why 'update' should fetch it. And I agree it would be better if it
didn't. Maybe we will be able to optimise that case. I don't know.

- Julian



Re: A two-part vision for Subversion and large binary objects.

2022-03-01 Thread Mark Phippard
On Tue, Mar 1, 2022 at 10:34 AM Julian Foad wrote:
>
> On Feb 18 2022, Mark Phippard wrote:
> >> [It fetches and stores pristines of modified files;] it doesn't mean
> >> "store no pristines" in that WC.
> >
> > I am curious what Karl thinks given that he is living this scenario
> > today and wants the feature. I would think that having update create
> > pristines for any modified file taints its usefulness. That said, it
> > is probably still better than what they have today and if the user is
> > on a fast network and disk space is not too big of an issue it might
> > not matter too much. I personally think this is the biggest issue to
> > solve though, more so than selectively choosing pristines for
> > different files. I think the feature just really does not "work as
> > advertised" if it is going to behave this way.
>
> Hello, Mark. Maybe Karl will yet answer, but I didn't want to leave this
> hanging any longer.
>
> This design was anticipated as far back as a 2006-06-09 comment on #525
> by Oswald Buddenhagen [1], where it is described as one of the
> possibilities among variations and alternatives. I'm not saying that
> justifies choosing it as the best solution, just that it's not arriving
> now from off the radar.

That comment specifically talks about diff. I think it is entirely
reasonable that for diff the feature works the way it does (fetch and
keep the pristine).

For an update, I think it is unexpected and undesirable. At least if
the HEAD revision of the file on the server is still the same as what
I had in my WC.

> We've already discussed how there are certainly scenarios where it won't
> be greatly helpful as well as scenarios where it will, and several
> people seem to think there are enough of the latter.
>
> Maybe, don't knock it till you've tried it?

I am really not knocking the overall feature. I am just saying that in
the scenario I described there is no way I would expect svn up to
fetch the pristines for files just because I have local mods. I think
for users with really large files ... which I assume are the main
target user ... it will make the feature less useful than it would be
if this behavior did not exist.

I am not this user. I am just projecting what I think they would want.
I was hoping Karl might chime in and/or interview his users about what
they might think. I personally think finding a solution to this would
be valuable.

Mark


Re: A two-part vision for Subversion and large binary objects.

2022-03-01 Thread Julian Foad
Summary of status of #525
=========================

Currently on the 'pristines-on-demand-on-mwf' branch.

Dev tasks in progress or outstanding:
-------------------------------------

* Multi-WC-format dependency (https://subversion.apache.org/issue/4883):

- is merged to trunk and reviewed;

- some outstanding items from review like UI tweaks, API
private-vs-public choices (in dev@ emails, and #4884 through #4887);

- some docs probably needed (release notes, help text).

* Per-WC config (https://subversion.apache.org/issue/4889):

- Quotes from #4889: "not strictly needed for MVP... MIGHT be good
to add the low level flagging mechanism now... not clear which would be
more effort."

- Last thing I wrote on 2022-02-16 [1]: "I do think we need an
explicit option to enable the feature by name, not just a WC version
number. I haven't yet worked out whether it must also be possible to
upgrade to 1.15 format without enabling the feature, and thus need to
store the feature-enable flag in the WC somewhere separate from the
format version number. For future developments of other wc features,
that will be needed; I just haven't finalised yet if it's essential for
MVP. Might be, in order to not cause compatibility issues for those
future scenarios."

* Issues arising in existing regression tests (#4888 and others):

- authz denied during textbase sync (#4888) -- in progress [2]

- about 12 other tests that were disabled or modified -- I have
started investigating and patching; need some further attention.


Community tasks outstanding:
----------------------------

* initiate a merge to trunk

* decide on a name for the feature


- Julian

[1] dev@ thread "A two-part vision for Subversion and large binary objects."
[2] dev@ thread "Pristines-on-demand: authz denied during textbase sync"



Re: A two-part vision for Subversion and large binary objects.

2022-03-01 Thread Julian Foad
On Feb 18 2022, Mark Phippard wrote:
>> [It fetches and stores pristines of modified files;] it doesn't mean
>> "store no pristines" in that WC.
> 
> I am curious what Karl thinks given that he is living this scenario
> today and wants the feature. I would think that having update create
> pristines for any modified file taints its usefulness. That said, it
> is probably still better than what they have today and if the user is
> on a fast network and disk space is not too big of an issue it might
> not matter too much. I personally think this is the biggest issue to
> solve though, more so than selectively choosing pristines for
> different files. I think the feature just really does not "work as
> advertised" if it is going to behave this way.

Hello, Mark. Maybe Karl will yet answer, but I didn't want to leave this
hanging any longer.

This design was anticipated as far back as a 2006-06-09 comment on #525
by Oswald Buddenhagen [1], where it is described as one of the
possibilities among variations and alternatives. I'm not saying that
justifies choosing it as the best solution, just that it's not arriving
now from off the radar.

We've already discussed how there are certainly scenarios where it won't
be greatly helpful as well as scenarios where it will, and several
people seem to think there are enough of the latter.

Maybe, don't knock it till you've tried it?

As for "as advertised", then surely the take-away message is we need to
be careful how we describe it; never call it "without pristines".

- Julian


[1] 
https://issues.apache.org/jira/browse/SVN-525?focusedCommentId=14911121&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-14911121




Linking to the archives (was: Re: A two-part vision for Subversion and large binary objects.)

2022-02-19 Thread Daniel Shahaf
Julian Foad wrote on Fri, Feb 18, 2022 at 09:01:27 +:
> I asked about this in this thread a few weeks ago; you could see there
> for further discussion. (I tried to dig up a link but having trouble
> finding myself in the archives.)

FWIW, if you have a mail locally, you can pipe it to

to get a working archive URL for it.  (The first run will be a little
slow since it will do a network access to seed a local cache.)

And if you don't find a mail locally, you can always point to it by its
From/To/Cc/Subject/Date/Message-ID tuple.

Cheers,

Daniel


Re: A two-part vision for Subversion and large binary objects.

2022-02-18 Thread Mark Phippard
On Fri, Feb 18, 2022 at 3:21 PM Julian Foad wrote:
>
> Karl Fogel wrote:
> > Is the above happening in MVP?
>
> Yes. I was describing what Evgeny created last year in the 
> 'pristines-on-demand' branch.
>
> > I ask because my understanding of
> > MVP was that it's not doing this opportunistic fetching/discarding
> > of bases, but rather that it's a simple per-WC setting that means
> > "store no pristines in this WC" and that's that.
>
> MVP is what Evgeny created, but selectively turned on, or not, per WC. When 
> turned on, it doesn't mean "store no pristines" in that WC.

I am curious what Karl thinks given that he is living this scenario
today and wants the feature. I would think that having update create
pristines for any modified file taints its usefulness. That said, it
is probably still better than what they have today and if the user is
on a fast network and disk space is not too big of an issue it might
not matter too much. I personally think this is the biggest issue to
solve though, more so than selectively choosing pristines for
different files. I think the feature just really does not "work as
advertised" if it is going to behave this way.

Mark


Re: A two-part vision for Subversion and large binary objects.

2022-02-18 Thread Julian Foad
Karl Fogel wrote: 
> Is the above happening in MVP? 

Yes. I was describing what Evgeny created last year in the 
'pristines-on-demand' branch.

> I ask because my understanding of 
> MVP was that it's not doing this opportunistic fetching/discarding 
> of bases, but rather that it's a simple per-WC setting that means 
> "store no pristines in this WC" and that's that.

MVP is what Evgeny created, but selectively turned on, or not, per WC. When 
turned on, it doesn't mean "store no pristines" in that WC.

I'm thinking two things would make the explanation more accessible.

1. Docs: release notes is a good start (thanks Nathan) but somewhere more (svn 
help?) too.

2. Feedback: svn should print progress notifications (maybe gated on 
'--verbose'), that make clear when and what it is doing with the pristines.

Number 2 would make a great volunteer contribution if anyone's willing to dip 
in to the branch code. It's just a matter of extending our existing notifier 
callback and making the hydrate/dehydrate call back to it.
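
As a rough illustration of item 2, the shape of such a notification hook
might look like the Python sketch below. Everything in it is hypothetical:
the `PristineNotification` type, the "hydrate"/"dehydrate" action names, the
`sync_pristines` function and the message wording are assumptions made for
the example, not the branch's actual notifier API.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class PristineNotification:
    action: str  # "hydrate" or "dehydrate" (hypothetical action names)
    path: str


Notifier = Callable[[PristineNotification], None]


def sync_pristines(to_fetch: Iterable[str], to_remove: Iterable[str],
                   notify: Notifier, verbose: bool = False) -> None:
    """Report each hydrate/dehydrate step through the notifier callback."""
    for path in to_fetch:
        # A real client would download the text base here.
        if verbose:
            notify(PristineNotification("hydrate", path))
    for path in to_remove:
        # A real client would delete the cached text base here.
        if verbose:
            notify(PristineNotification("dehydrate", path))


def print_notifier(n: PristineNotification) -> None:
    """Example receiver: turn a notification into a progress line."""
    verb = "Fetching" if n.action == "hydrate" else "Removing"
    print(f"{verb} text base for '{n.path}'")


if __name__ == "__main__":
    sync_pristines(["big.bin"], ["old.bin"], print_notifier, verbose=True)
```

Gating on `verbose` mirrors the "maybe gated on '--verbose'" suggestion;
whether notifications should be on by default is left open here.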

- Julian


Re: A two-part vision for Subversion and large binary objects.

2022-02-18 Thread Karl Fogel

On 18 Feb 2022, Julian Foad wrote:
> To understand, we need to recap that this design is based around a
> simple invariant: whenever a file is seen to be locally modified, at the
> next convenient opportunity we will download its base; and when seen to
> be not-modified we will discard its base. It is not a
> fetch-at-point-of-access design.


This seems like a good principle for the long run (and 
well-articulated above, thank you!).


Is the above happening in MVP?  I ask because my understanding of 
MVP was that it's not doing this opportunistic fetching/discarding 
of bases, but rather that it's a simple per-WC setting that means 
"store no pristines in this WC" and that's that.


That means it would be up to the user to use the available 
workarounds if they need to do things (like local diff) that would 
require a pristine.  Fortunately those workarounds are easy: they 
just involve copying a file sometimes before you start working on 
it :-).


Just to be super clear: my question here is solely about MVP -- 
about the first released version of a usable 525-enabled 
Subversion -- not about the longer term plans, which I agree are 
excellent and will make the feature even better.


Best regards,
-Karl


Re: A two-part vision for Subversion and large binary objects.

2022-02-18 Thread Julian Foad
Mark Phippard wrote:
>> Update starts by hydrating. That means it WILL download any missing
>> pristines of modified files, regardless whether any newer revision
>> will be found.
> 
> Does the possibility exist to optimize this at all? [...]

To understand, we need to recap that this design is based around a
simple invariant: whenever a file is seen to be locally modified, at the
next convenient opportunity we will download its base; and when seen to
be not-modified we will discard its base. It is not a
fetch-at-point-of-access design.

This design uses a "wrapper" structure, localising the hydration and
dehydration steps at the top level of subcommands, outside the command
logic, not inside the command logic. It does not know which text bases
the command will need, and instead works on the principle that it will
fetch all that might be needed. Evgeny and others suggested that
localizing the network access to the start of each subcommand was better
(for client software and users) than introducing the possibility for
network access to be required at arbitrary points inside the command logic.
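
The invariant and the "wrapper" structure described above can be modelled
with a toy example. This is a sketch under stated assumptions, not the
branch's code: the `WorkingCopy` class, its fields and `run_subcommand` are
hypothetical names, and the "fetch" is just an entry appended to a log.

```python
from typing import Callable, List, Set


class WorkingCopy:
    """Toy model: which files are modified, which bases are cached."""

    def __init__(self, modified: Set[str]) -> None:
        self.modified: Set[str] = set(modified)
        self.bases: Set[str] = set()
        self.fetch_log: List[str] = []

    def hydrate(self) -> None:
        # Fetch the base of every modified file that lacks one.
        for path in sorted(self.modified - self.bases):
            self.fetch_log.append(path)  # stands in for a network fetch
            self.bases.add(path)

    def dehydrate(self) -> None:
        # Discard the base of every file that is no longer modified.
        self.bases &= self.modified


def run_subcommand(wc: WorkingCopy,
                   command: Callable[[WorkingCopy], None]) -> None:
    """Wrapper: sync before and after, outside the command logic."""
    wc.hydrate()
    command(wc)
    wc.dehydrate()


def diff(wc: WorkingCopy) -> None:
    pass  # reads bases; leaves files modified


def revert(wc: WorkingCopy) -> None:
    wc.modified.clear()  # files become unmodified


if __name__ == "__main__":
    wc = WorkingCopy({"big.bin"})
    run_subcommand(wc, diff)
    print(wc.bases, wc.fetch_log)
    run_subcommand(wc, revert)
    print(wc.bases, wc.fetch_log)
```

Note that the wrapper never asks the command which bases it will use; that
is the property that makes the optimisation discussed in this thread hard
to add without pushing the fetch down into the command logic.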

The rationale is that in the design use cases, fetching is acceptably
cheap both in network speed and in the availability of sufficient
storage space for text bases for the files that are locally modified.

Possibilities for optimisation may exist but the kind of optimisation
you are asking about would require a different design, one in which the
fetcher knows what is really needed by the current operation, so one
where the fetch is pushed down in the logic to nearer the point of
access. There is limited scope for it in this design.

Personally I would like us to explore the fetch-at-point-of-access
design alternative. While introducing one kind of complexity (network
access requested at arbitrary points during a command) I feel it would
reduce the kind of complexity we're discussing here (unnecessary fetches
and consequent attempts at optimisation). But that's not what we're
exploring right now.

I asked about this in this thread a few weeks ago; you could see there
for further discussion. (I tried to dig up a link but having trouble
finding myself in the archives.)

- Julian



Re: A two-part vision for Subversion and large binary objects.

2022-02-17 Thread Mark Phippard
On Thu, Feb 17, 2022 at 5:52 PM Julian Foad wrote:
>
> Mark Phippard wrote:
> >> | update/switch | Always | Always + Hydrate |
> >
> >Can you expand on this one a bit? I presume what you mean is if you
> >have local mods to a file and run update/switch and there is a newer
> >revision of the file in the repository it will hydrate that new
> >version? But otherwise, I assume just running update does not create
> >pristines.
>
> Let's delete the word "Always" from both columns.
>
> Update starts by hydrating. That means it WILL download any missing pristines 
> of modified files, regardless whether any newer revision will be found.

Does the possibility exist to optimize this at all? Say I am working
on a large binary that I have edited. My colleague checks in a
different file that I want to look at so I run update. Now SVN is
going to unnecessarily download another copy of that large binary.

The WC knows the revision, I do not see why it would download the file
from the server on update if it is not a newer version.

Mark


Re: A two-part vision for Subversion and large binary objects.

2022-02-17 Thread Nathan Hartman
On Thu, Feb 17, 2022 at 5:09 PM Julian Foad wrote:
>
> Awesome, Nathan! I was going to say this is clearly a priority. Thanks
> so much for writing that. It is so much easier to iterate on it now you
> have begun. At first glance there is not much I would add or change.

Glad to help!

> Not sure about the word "bare"; for now I'll read it as a place-holder
> for whatever term we agree on.

Agreed.

> Suggested edits below.

Thanks for the feedback!

I think it will be best if I go ahead and create the template 1.15
release notes on site/staging and add there the text we have so far
(including your suggested edits). Then we can hack on it as much as we
want. That will be easier than accumulating too many suggested
improvements in the mailing list, where it is easy to lose track of
them...

Cheers,
Nathan


Re: A two-part vision for Subversion and large binary objects.

2022-02-17 Thread Julian Foad
Mark Phippard wrote:
>> | update/switch | Always | Always + Hydrate |
>
>Can you expand on this one a bit? I presume what you mean is if you
>have local mods to a file and run update/switch and there is a newer
>revision of the file in the repository it will hydrate that new
>version? But otherwise, I assume just running update does not create
>pristines.

Let's delete the word "Always" from both columns.

Update starts by hydrating. That means it WILL download any missing pristines 
of modified files, regardless whether any newer revision will be found.

- Julian


Re: A two-part vision for Subversion and large binary objects.

2022-02-17 Thread Mark Phippard
On Thu, Feb 17, 2022 at 5:09 PM Julian Foad wrote:
>
> Awesome, Nathan!

I agree.

> | update/switch | Always | Always + Hydrate |

Can you expand on this one a bit? I presume what you mean is if you
have local mods to a file and run update/switch and there is a newer
revision of the file in the repository it will hydrate that new
version? But otherwise, I assume just running update does not create
pristines.

Mark


Re: A two-part vision for Subversion and large binary objects.

2022-02-17 Thread Julian Foad
Awesome, Nathan! I was going to say this is clearly a priority. Thanks
so much for writing that. It is so much easier to iterate on it now you
have begun. At first glance there is not much I would add or change.

Not sure about the word "bare"; for now I'll read it as a place-holder
for whatever term we agree on.

Suggested edits below.

Nathan Hartman wrote:
[...]
> An initial, very-rough, draft, of what might go in the 1.15 release
> notes, at least as I see this feature. Obviously, let me know if I'm
> totally off here!
> 
> New Client Features -> Bare Working Copies
> 
> All Subversion working copies require extra storage space in addition
> to the size of the checked out files.
> 
> By default, the total storage space required is slightly more than
> double the size of the checked out files. Subversion uses most of that
> extra space to cache each file's BASE revision so that many operations
> can work faster and offline.

"so that operations such as diff and revert can work offline, and
commit can send just the modified portions of a file to the repository
server rather than the whole file. This optimises the speed and
availability of these operations, on the assumption that network
throughput to the server is often a bottleneck."

> Starting in 1.15, users can check out a bare working copy to cut the
> storage requirement by up to 50%. Instead of caching the BASE revision
> of all files all the time, Subversion will only fetch and cache those
> of individual files when needed, and will eliminate them when no
> longer needed. The space savings come at a tradeoff of requiring a
> connection to the repository for more operations as compared to a
> normal working copy and may, depending on network speeds and file
> sizes, introduce a perceptible delay when a BASE file is downloaded.
> 
> This feature is motivated by use cases involving very large versioned
> files that change infrequently, where keeping the cached BASE copy
> wastes space and provides little or no benefit. This feature may also
> be useful in other scenarios, such as where a very fast connection to
> the server is available, the repository is local, available storage
> space is very limited, etc.
> 
> To check out a bare working copy:
> 
> $ svn checkout --foo --bar $REPO $WC
> 
> The command to check out a normal working copy is unchanged.

The following table lists the Subversion commands that behave
differently in a bare working copy. For each command, it shows the
difference in how that command accesses the repository.

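> +------------------------------------------+-------------------------------+
> |                                          |       Working Copy Type       |
> +------------------------------------------+------------+------------------+
> | Command                                  | Normal     | Bare             |
> +------------------------------------------+------------+------------------+

| cat (BASE)                               | No         | Hydrate          |
| commit                                   | Send-Delta | Send-Full        |
| conflict resolving (resolve/merge/up/sw) | Sometimes  | Sometimes (...)  |
| diff (BASE)                              | No         | Hydrate          |
| revert                                   | No         | Hydrate          |
| update/switch                            | Always     | Always + Hydrate |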

> Legend:

* Hydrate: this operation downloads and keeps the file's base revision,
for each file that has a local content modification ('svn status' shows
'M' in the 1st column) and its base is not already stored in the working
copy [1][2].

* Send-Delta: sends just the locally modified parts of each file's content.

* Send-Full: sends the complete content of each locally modified file.

* No: does not contact the server.

* Always: always contacts the server.

Once downloaded, Subversion keeps a file's base locally cached in the
working copy, so that further operations on the file will not download
the base from the repository again. It keeps the base in this way until
one of these operations either restores the file to an unmodified state
or detects that the file is no longer modified. For example, "commit"
and "revert" will immediately discard the base of each file they
operated on, because that file will no longer be locally modified,
whereas "diff" will discard the base only if it finds there are no differences.


[1] At the beginning of a given operation, Subversion will download
missing bases of *at least* the files that this particular operation
will use. It may download those of other files too, that this particular
operation will not use. For example, in the initial implementation of
this feature, Subversion considers all potential files in the smallest subtree
that spans all the target files of the operation. The details of this
behaviour are subject to change before and after the feature is released.
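
The "smallest subtree that spans all the target files" in footnote [1] can
be illustrated with a short Python sketch. The function names and the
sample paths are hypothetical, and the real client works on its own
working-copy paths rather than plain strings; this only shows why files
outside the targets can end up being considered.

```python
import posixpath
from typing import Iterable, List


def spanning_subtree(targets: Iterable[str]) -> str:
    """Deepest directory (or file) path that contains every target."""
    return posixpath.commonpath(list(targets))


def files_considered(all_files: Iterable[str],
                     targets: Iterable[str]) -> List[str]:
    """Files inside the spanning subtree, whether targeted or not."""
    root = spanning_subtree(targets)
    return [f for f in all_files
            if posixpath.commonpath([root, f]) == root]


if __name__ == "__main__":
    files = ["wc/a/x.bin", "wc/a/sub/y.bin", "wc/a/other.txt", "wc/b/z.bin"]
    targets = ["wc/a/x.bin", "wc/a/sub/y.bin"]
    print(spanning_subtree(targets))
    print(files_considered(files, targets))
```

Here 'wc/a/other.txt' is not a target but lies inside the spanning
subtree, so if it were locally modified its base could be downloaded too.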

[2] In evaluating differences between a file's working text and its base
text, Subversion takes into account the "EOL style" and "keywords"
settings. (See the 'svn:eol-style' and 'svn:keywords' properties.)
Just as 'svn status' does not show 'M' in the first column for such
differences, neither will these cause the base to be downloaded from the 
repository.
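
A minimal sketch of the idea in footnote [2], assuming a simplified
normalisation: line endings are folded to LF and expanded keywords are
contracted before comparing. The function names are hypothetical, the
keyword list is a subset chosen for illustration, and unlike Subversion
this sketch applies the normalisation unconditionally instead of
consulting each file's 'svn:eol-style' and 'svn:keywords' properties.

```python
import re

# Subset of keyword names, for illustration only.
_KEYWORD = re.compile(
    rb"\$(Id|Rev|Revision|Date|Author|HeadURL|URL)(?::[^$\n]*)?\$"
)


def normalise(text: bytes) -> bytes:
    """Fold CRLF/CR to LF and contract '$Rev: 42 $' to '$Rev$'."""
    text = text.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
    return _KEYWORD.sub(rb"$\1$", text)


def is_content_modified(working: bytes, base: bytes) -> bool:
    """True only if the texts differ after normalisation."""
    return normalise(working) != normalise(base)


if __name__ == "__main__":
    base = b"line\n$Rev$\n"
    working = b"line\r\n$Rev: 42 $\r\n"
    print(is_content_modified(working, base))
    print(is_content_modified(b"line\nedited\n", base))
```

Under this model, a file whose only differences are line endings or
keyword expansion is not treated as modified, so it would not trigger a
download of its base.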




Re: A two-part vision for Subversion and large binary objects.

2022-02-17 Thread Nathan Hartman
On Thu, Feb 17, 2022 at 9:41 AM Julian Foad wrote:
>
> Mark Phippard wrote:
> [...]
> >currently there is no pristine. Now I run svn diff and I see the
> >result. The command ends ... there is still no pristine. [...] If, when the
> >command finishes, there are no pristines stored on disk ... then there
> >are no pristines.
>
> The bit you are missing is, at the end of a diff command, the diffed file is
> still locally modified so the pristine is not deleted then; it is kept in the
> store indefinitely. Only when some subsequent command finds it unmodified, 
> only then will it be removed. (No matter whether it became unmodified by an 
> svn operation such as revert or commit, in which case the command that did 
> that would have removed the pristine; or if it was reverted by the user 
> without involving svn.)
>
> I agree about the insufficiency of current docs and how the commands could be 
> listed in full.

An initial, very-rough, draft, of what might go in the 1.15 release
notes, at least as I see this feature. Obviously, let me know if I'm
totally off here!

New Client Features -> Bare Working Copies

All Subversion working copies require extra storage space in addition
to the size of the checked out files.

By default, the total storage space required is slightly more than
double the size of the checked out files. Subversion uses most of that
extra space to cache each file's BASE revision so that many operations
can work faster and offline.

Starting in 1.15, users can check out a bare working copy to cut the
storage requirement by up to 50%. Instead of caching the BASE revision
of all files all the time, Subversion will only fetch and cache those
of individual files when needed, and will eliminate them when no
longer needed. The space savings come at a tradeoff of requiring a
connection to the repository for more operations as compared to a
normal working copy and may, depending on network speeds and file
sizes, introduce a perceptible delay when a BASE file is downloaded.

This feature is motivated by use cases involving very large versioned
files that change infrequently, where keeping the cached BASE copy
wastes space and provides little or no benefit. This feature may also
be useful in other scenarios, such as where a very fast connection to
the server is available, the repository is local, available storage
space is very limited, etc.

To check out a bare working copy:

$ svn checkout --foo --bar $REPO $WC

The command to check out a normal working copy is unchanged.

The following table lists all Subversion commands and whether they
need to access the repository:

+----------+-------------------------------+
|          |       Working Copy Type       |
+----------+---------------+---------------+
| Command  | Normal        | Bare          |
+----------+---------------+---------------+
| add      | Never         | Never         |
| .        |               |               |
| .        |               |               |
| .        |               |               |
| checkout | Always        | Always        |
| .        |               |               |
| .        |               |               |
| .        |               |               |
| diff     | Remote URL    | When modified |
| .        |               |               |
| .        |               |               |
| .        |               |               |
| revert   | Never         | Always        |
| .        |               |               |
| .        |               |               |
| .        |               |               |
+----------+---------------+---------------+

Legend:

* Never: This operation never contacts the repository.

* Remote URL: This operation contacts the repository only if given a
  repository path. It does not contact the repository when operating
  on a local path.

* When Modified: This operation contacts the repository when the path
  in question is locally modified ('svn status' shows 'M' in the 1st
  column) or is provided with a repository URL.

* Always: This operation always contacts the repository.

Additional Details:

When operating on a bare working copy, the Subversion client will
download the BASE revision of a file when it detects that the file is
locally modified and an operation involving that file requires the
BASE revision.

Once downloaded, the BASE revision will remain locally cached until a
further operation either restores the file to an unmodified state or
detects that the file is no longer modified.

Cheers,
Nathan


Re: A two-part vision for Subversion and large binary objects.

2022-02-17 Thread Mark Phippard
On Thu, Feb 17, 2022 at 9:41 AM Julian Foad wrote:
>
> Mark Phippard wrote:
> [...]
> >currently there is no pristine. Now I run svn diff and I see the
> >result. The command ends ... there is still no pristine. [...] If, when the
> >command finishes, there are no pristines stored on disk ... then there
> >are no pristines.
>
> The bit you are missing is, at the end of a diff command, the diffed file is
> still locally modified so the pristine is not deleted then; it is kept in the
> store indefinitely. Only when some subsequent command finds it unmodified, 
> only then will it be removed. (No matter whether it became unmodified by an 
> svn operation such as revert or commit, in which case the command that did 
> that would have removed the pristine; or if it was reverted by the user 
> without involving svn.)
>
> I agree about the insufficiency of current docs and how the commands could be 
> listed in full.

Thanks Julian, that is clearer. FWIW, this is the wording that confused me:

"The operations also include a final step during which the no longer
required text-bases are removed from disk"

The use of "the operations" made me think it was the same operation.
IOW, I read this as the "diff operation" includes a final step that
does this cleanup. What you seem to be saying is some other future
operation like revert or commit is what would clean it up.

I agree this is a reasonable way for the feature to behave BTW. At
least assuming there are no "surprises" about what operations trigger
the need to fetch the pristine.

Mark


Re: A two-part vision for Subversion and large binary objects.

2022-02-17 Thread Julian Foad
Mark Phippard wrote:
[...]
>currently there is no pristine. Now I run svn diff and I see the
>result. The command ends ... there is still no pristine. [...] If, when the
>command finishes, there are no pristines stored on disk ... then there
>are no pristines.

The bit you are missing is, at the end of a diff command, the diffed file is
still locally modified so the pristine is not deleted then; it is kept in the
store indefinitely. Only when some subsequent command finds it unmodified, only 
then will it be removed. (No matter whether it became unmodified by an svn 
operation such as revert or commit, in which case the command that did that 
would have removed the pristine; or if it was reverted by the user without 
involving svn.)

I agree about the insufficiency of current docs and how the commands could be 
listed in full.

- Julian


Re: A two-part vision for Subversion and large binary objects.

2022-02-17 Thread Mark Phippard
On Wed, Feb 16, 2022 at 9:07 AM Mark Phippard wrote:

> > FWIW, I just assumed that this *isn't* the intended entry point to
> > the feature.  That is, it's just how things happen to be on the
> > branch right now, but (presumably) Julian isn't saying that he
> > thinks this is how users should access the feature in real life.
>
> I also assume that to be the case but want to confirm.
>
> My "assumption" is that the 1.15 WC format includes some new database
> indicator(s) that specify whether or not pristines are being stored
> but the default 1.15 format would include pristines. There will be
> some other option that creates the 1.15 format but with the database
> indicator(s) set to indicate that pristines are NOT being stored.
>
> Presumably there will be some new UX as being discussed that
> implicitly creates a 1.15 format WC with these indicators set.
>
> So really the only use case for creating a 1.15 format using this more
> generic syntax is based on some future version of SVN that lets you
> selectively change this setting after a WC is created? Perhaps on a
> file/folder by file/folder basis.

Setting aside the bikeshedding on what we call this new feature ...
this is the behavior I would expect:

$ svn checkout  ==OR==
$ svn checkout --compatible-version=1.14

Creates a 1.14 compatible WC with pristines

$ svn checkout --compatible-version=1.15

Creates a 1.15 compatible WC with pristines ... there is currently no
reason for a user to do this but it leaves open the option for future
commands and options to selectively hydrate/dehydrate on a file by
file basis.

$ svn checkout --bare  ==OR==
$ svn checkout --compatible-version=1.15 --bare

Bikeshedding aside ... this creates a 1.15 compatible WC without pristines
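
The option combinations listed above can be summarised as a small decision
function. This is a Python sketch of the expected behaviour only: the
`--bare` name is the placeholder used in this thread, `resolve_checkout`
and `WcSetup` are hypothetical, and rejecting `--bare` together with a
pre-1.15 version is an assumption added for the example rather than
something stated in this message.

```python
from typing import NamedTuple, Optional, Tuple


class WcSetup(NamedTuple):
    compatible_version: str
    keeps_all_pristines: bool


def _as_tuple(version: str) -> Tuple[int, ...]:
    return tuple(int(part) for part in version.split("."))


def resolve_checkout(compatible_version: Optional[str] = None,
                     bare: bool = False) -> WcSetup:
    """Map the proposed checkout options to the resulting WC setup."""
    if not bare:
        # Plain checkout stays 1.14-compatible and keeps pristines.
        return WcSetup(compatible_version or "1.14", True)
    version = compatible_version or "1.15"  # --bare implies 1.15
    if _as_tuple(version) < (1, 15):
        # Assumption: an older format cannot record the bare setting.
        raise ValueError("--bare requires --compatible-version >= 1.15")
    return WcSetup(version, False)


if __name__ == "__main__":
    print(resolve_checkout())
    print(resolve_checkout(compatible_version="1.15"))
    print(resolve_checkout(bare=True))
```

The middle case, a 1.15-compatible working copy that still keeps all
pristines, is the one that keeps the format number separate from the
feature being enabled.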

Mark


Re: A two-part vision for Subversion and large binary objects.

2022-02-17 Thread Mark Phippard
On Wed, Feb 16, 2022 at 9:59 AM Julian Foad wrote:
>
> Mark Phippard wrote:
> >> "The core idea is that we start to maintain the following invariant: only 
> >> the modified files have their pristine text-base files available on the 
> >> disk."
> >> (https://svn.apache.org/repos/asf/subversion/branches/pristines-on-demand-on-mwf/BRANCH-README)
> >
> >That was where I read it! thanks
>
> (The Readme is Evgeny's text AFAIK.)
>
> >So this text confuses me and makes me assume I am not reading it
> >correctly. Suppose I use this new feature to checkout a new WC without
> >any pristines. I make edits to a large binary file using some tool. At
> >this point, SVN does not even know I have done anything so I still
> >have no pristines.
>
> Correct.
>
> >If I run svn status it will show me the file is modified. Are you
> >saying that when I do this, SVN is going to pull down a pristine from
> >the server?
>
> Not for "status". Does the further description from the readme help?:
> """
>   - To get into the appropriate state at the beginning of the operation, we 
> walk through the current text-base info in the db and check if the 
> corresponding working files are modified. The missing text-bases are fetched 
> using the svn_ra layer.

Yes and No. To truly understand this the way it is written requires a
lot of internal knowledge about how SVN works. As someone whose
knowledge is more as a user I am not clear what situations would
require the missing text-bases to be fetched. I think that has been
the heart of my question all along. I certainly understand they are
needed for diff and revert, for example. I would want to know if the
more day to day update/commit cycle needs them. If they do not, then
it sounds good to me.

> The operations also include a final step during which the no longer required 
> text-bases are removed from disk.

This is slightly confusing. I assume it means that the pristines are
not truly stored on disk in that by the time the command finishes they
are gone again? So if I run svn diff, it will fetch the pristines, use
them to complete the operation, and then discard them? FWIW, if this
is how it works I think that is good.

>   - The operations that don't need to access the text-bases (such as "svn ls" 
> or the updated "svn st") do not perform this walk and do not synchronize the 
> text-base state.
> """

So I would just reiterate the earlier comment that it is unclear which
commands need to fetch pristines. It seems like it would be relatively
easy to just spell it out in the docs? It has to be a smallish number
of commands.

>
> > [...] My assumption is that if I have one of these new types of WC's that I
> > will NEVER have any "pristines".
>
> Not correct, for this design.

If the pristines are discarded at the end of a command, as I noted
above, then I think it would be fair to simply describe this feature
as a "bare WC" without pristines. The fact that the pristines
temporarily exist during certain commands is an irrelevant detail.
After all, they also exist during checkout before the final version of
the file is written to disk with line endings and keywords expanded.

>
> >Please enlighten me as to when pristines will be created and stored
> >and why I would want SVN to do that when I asked for no pristines? I
> >think I must be overlooking something obvious.
>
> In my own words now: In this design, pristines are kept locally for modified 
> files (not never). During the new pristines-sync phase at the beginning of 
> any operation that might[1] want the pristine (e.g. diff, but *not* status), 
> if a file is detected to be locally modified and has no pristine locally, 
> then the pristine is fetched ("hydrated") and then kept locally... until, 
> during a sync pass at the end of any such operation, if any file is detected 
> to be not-modified and its pristine is present locally: then it is cleaned up 
> (dehydrated).
>
> Hope that's getting clearer.

This all sounds super internal. It might be good for developer level
explanation but not user level. If I have modified a file and
currently there is no pristine. Now I run svn diff and I see the
result. The command ends ... there is still no pristine. The fact that
something existed during the operation in order to produce the result
is uninteresting to me and it confuses the explanation. If, when the
command finishes, there are no pristines stored on disk ... then there
are no pristines.

Does this make sense?

Mark


Re: A two-part vision for Subversion and large binary objects.

2022-02-16 Thread Julian Foad
Mark Phippard wrote:
>> "The core idea is that we start to maintain the following invariant: only 
>> the modified files have their pristine text-base files available on the 
>> disk."
>> (https://svn.apache.org/repos/asf/subversion/branches/pristines-on-demand-on-mwf/BRANCH-README)
>
>That was where I read it! thanks

(The Readme is Evgeny's text AFAIK.)

>So this text confuses me and makes me assume I am not reading it
>correctly. Suppose I use this new feature to checkout a new WC without
>any pristines. I make edits to a large binary file using some tool. At
>this point, SVN does not even know I have done anything so I still
>have no pristines.

Correct.

>If I run svn status it will show me the file is modified. Are you
>saying that when I do this, SVN is going to pull down a pristine from
>the server?

Not for "status". Does the further description from the readme help?:
"""
  - To get into the appropriate state at the beginning of the operation, we 
walk through the current text-base info in the db and check if the 
corresponding working files are modified. The missing text-bases are fetched 
using the svn_ra layer. The operations also include a final step during which 
the no longer required text-bases are removed from disk.
  - The operations that don't need to access the text-bases (such as "svn ls" 
or the updated "svn st") do not perform this walk and do not synchronize the 
text-base state. 
"""

> [...] My assumption is that if I have one of these new types of WC's that I
> will NEVER have any "pristines".

Not correct, for this design.

>Please enlighten me as to when pristines will be created and stored
>and why I would want SVN to do that when I asked for no pristines? I
>think I must be overlooking something obvious.

In my own words now: In this design, pristines are kept locally for modified 
files (not never). During the new pristines-sync phase at the beginning of any 
operation that might[1] want the pristine (e.g. diff, but *not* status), if a 
file is detected to be locally modified and has no pristine locally, then the 
pristine is fetched ("hydrated") and then kept locally... until, during a sync 
pass at the end of any such operation, if any file is detected to be 
not-modified and its pristine is present locally: then it is cleaned up 
(dehydrated).

Hope that's getting clearer.

[1] "might want": false positives exist, as noted in this thread a few weeks 
ago.
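
To make the two sync passes concrete, here is a minimal sketch in Python. It is
only an illustration of the behaviour described above, not the branch's actual
code: the names FileState, fetch_pristine and run_operation are hypothetical,
the set of operations is limited to the examples named in this thread, and
"revert" is simplified to revert every file.

```python
from dataclasses import dataclass

# Operations named in this thread as wanting pristines; illustrative, not exhaustive.
NEEDS_PRISTINES = {"diff", "revert"}


@dataclass
class FileState:
    modified: bool        # working file differs from its base
    has_pristine: bool    # pristine text-base is present on disk


def fetch_pristine(path):
    """Hypothetical stand-in for a fetch through the svn_ra layer."""
    return True


def run_operation(op, wc):
    """Run `op` over `wc` (dict of path -> FileState); return (hydrated, dehydrated)."""
    hydrated, dehydrated = [], []
    if op not in NEEDS_PRISTINES:
        return hydrated, dehydrated   # e.g. status: no walk, no sync

    # Sync pass at the beginning: hydrate modified files that lack a pristine.
    for path, st in wc.items():
        if st.modified and not st.has_pristine:
            st.has_pristine = fetch_pristine(path)
            hydrated.append(path)

    # The operation itself would run here; revert (simplified) clears local mods.
    if op == "revert":
        for st in wc.values():
            st.modified = False

    # Sync pass at the end: dehydrate unmodified files whose pristine is present.
    for path, st in wc.items():
        if not st.modified and st.has_pristine:
            st.has_pristine = False
            dehydrated.append(path)
    return hydrated, dehydrated
```

In this sketch, running "diff" on a modified file leaves its pristine on disk
afterwards, because the file is still modified at the final pass; the pristine
only goes away once the file is unmodified again (for example after a revert).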

- Julian


Re: A two-part vision for Subversion and large binary objects.

2022-02-16 Thread Mark Phippard
On Wed, Feb 16, 2022 at 2:53 AM Karl Fogel  wrote:

> >Are you saying this is how you would activate this no-pristines
> >feature? If so, that sounds like a poor UX. As a user, I would not
> >expect the version number to be connected to a feature like that. Or
> >more accurately, I could understand if you need a 1.15 working copy to
> >enable a WC format that tracks whether or not pristines are available
> >but I would not expect the version number alone to be the factor that
> >makes this decision. Does that make sense?
>
> FWIW, I just assumed that this *isn't* the intended entry point to
> the feature.  That is, it's just how things happen to be on the
> branch right now, but (presumably) Julian isn't saying that he
> thinks this is how users should access the feature in real life.

I also assume that to be the case but want to confirm.

My "assumption" is that the 1.15 WC format includes some new database
indicator(s) that specify whether or not pristines are being stored
but the default 1.15 format would include pristines. There will be
some other option that creates the 1.15 format but with the database
indicator(s) set to indicate that pristines are NOT being stored.

Presumably there will be some new UX as being discussed that
implicitly creates a 1.15 format WC with these indicators set.

So really the only use case for creating a 1.15 format using this more
generic syntax is based on some future version of SVN that lets you
selectively change this setting after a WC is created? Perhaps on a
file/folder by file/folder basis.

Mark


Re: A two-part vision for Subversion and large binary objects.

2022-02-16 Thread Mark Phippard
On Wed, Feb 16, 2022 at 4:16 AM Julian Foad  wrote:

> "The core idea is that we start to maintain the following invariant: only the 
> modified files have their pristine text-base files available on the disk."
> (https://svn.apache.org/repos/asf/subversion/branches/pristines-on-demand-on-mwf/BRANCH-README)

That was where I read it! thanks

So this text confuses me and makes me assume I am not reading it
correctly. Suppose I use this new feature to checkout a new WC without
any pristines. I make edits to a large binary file using some tool. At
this point, SVN does not even know I have done anything so I still
have no pristines.

If I run svn status it will show me the file is modified. Are you
saying that when I do this, SVN is going to pull down a pristine from
the server? That seems very unlikely to me but I cannot otherwise
imagine what your wording in the BRANCH-README would be describing. My
assumption is that if I have one of these new types of WC's that I
will NEVER have any "pristines".

Please enlighten me as to when pristines will be created and stored
and why I would want SVN to do that when I asked for no pristines? I
think I must be overlooking something obvious.

Mark


Re: A two-part vision for Subversion and large binary objects.

2022-02-16 Thread Julian Foad
Re. being unsure exactly what the feature behaviour is: we should have a clear 
description somewhere permanent. Release notes for a start. Help text as well? 
For now, the BRANCH-README says it this way:

"The core idea is that we start to maintain the following invariant: only the 
modified files have their pristine text-base files available on the disk."
(https://svn.apache.org/repos/asf/subversion/branches/pristines-on-demand-on-mwf/BRANCH-README)

And it gives more detail.

Re. intended UI for enabling it: I do think we need an explicit option to 
enable the feature by name, not just a WC version number. I haven't yet worked 
out whether it must also be possible to upgrade to 1.15 format without enabling 
the feature, and thus need to store the feature-enable flag in the WC somewhere 
separate from the format version number. For future developments of other wc 
features, that will be needed; I just haven't finalised yet if it's essential 
for MVP. Might be, in order to not cause compatibility issues for those future 
scenarios.
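
As a rough illustration of keeping the feature-enable flag separate from the
format number, here is a minimal Python sketch. Everything in it is an
assumption for illustration: the WcSettings type, the pristines_on_demand field
and the validate helper are hypothetical, and the minimum format number 32 is
borrowed from the related "format 32" discussion rather than from any settled
design.

```python
from dataclasses import dataclass

# Assumed minimum WC format able to support the mode (from the "format 32" thread).
MIN_FORMAT_FOR_ON_DEMAND = 32


@dataclass
class WcSettings:
    wc_format: int                      # the working copy's format number
    pristines_on_demand: bool = False   # hypothetical per-WC feature flag


def validate(settings):
    """Reject a WC that enables the feature on a format too old to support it."""
    if settings.pristines_on_demand and settings.wc_format < MIN_FORMAT_FOR_ON_DEMAND:
        raise ValueError("pristines-on-demand needs a newer WC format")
    return settings
```

With the flag stored separately, a WC can be at the new format with the feature
off (the default here), which is the case the version number alone cannot
express.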

- Julian


Re: A two-part vision for Subversion and large binary objects.

2022-02-15 Thread Karl Fogel

On 15 Feb 2022, Nathan Hartman wrote:
> Possibly bikeshedding a bit, but this seems to return to the idea of
> "turning on" what we are (tentatively) calling "local base"... IMHO it
> would be better if it were reversed to "--remote-base=yes" to convey
> that this is non-default and opt-in. (Or possibly allow both.)


The reason I shy away from the "--remote-base=foo" name is that 
there is *always* a remote base anyway.  Even when one has 
pristines locally, there is also a remote pristine available (and 
indeed the server makes use of it sometimes).  So that name would 
be misleading, and for more knowledgeable users, even confusing.



> Alternatively...
>
> As a command line switch, how about:
>
> "svn checkout --base=local $REPO $WC"
> or
> "svn checkout --base=remote $REPO $WC"


This implies a symmetry between control of local-base presence and 
control of remote-base presence, but there is no such symmetry. 
The only thing this feature can ever control is the presence of 
local bases, so I think it would be a mistake to say anything 
about remote bases when addressing it.


Best regards,
-Karl


Re: A two-part vision for Subversion and large binary objects.

2022-02-15 Thread Karl Fogel

On 15 Feb 2022, Mark Phippard wrote:
> On Tue, Feb 15, 2022 at 12:00 PM Julian Foad wrote:
>> Currently: "svn checkout --compatible-version=1.15". No feature name
>> involved. Not saying that's good, just that's the current state.
>
> Are you saying this is how you would activate this no-pristines
> feature? If so, that sounds like a poor UX. As a user, I would not
> expect the version number to be connected to a feature like that. Or
> more accurately, I could understand if you need a 1.15 working copy to
> enable a WC format that tracks whether or not pristines are available
> but I would not expect the version number alone to be the factor that
> makes this decision. Does that make sense?


FWIW, I just assumed that this *isn't* the intended entry point to 
the feature.  That is, it's just how things happen to be on the 
branch right now, but (presumably) Julian isn't saying that he 
thinks this is how users should access the feature in real life.


> I have not followed every email on this topic but feel like I have
> lost understanding of what the feature will do. I thought the original
> goal was "I have a lot of large binaries, I would like to have a WC
> with no pristines in it"


AFAIU, that's the "MVP" goal here, yup.


> Assuming I have a WC with large binaries:
>
> * I am not going to use diff
> * If I commit a change, I would like to just send the new file to the
> server and let it figure it all out
> * If I revert, yeah I will need a new copy sent to me
> * If I update, and do not have local mods, I will just get a new copy
> of file that replaces what I have
>
> * If I update, and have local mods ... less sure what should happen.
> Is this the scenario where you create a pristine? If it is not a file
> where we can do a text merge, then I guess I would just want my
> version of the file to remain and ideally not even get the new file
> sent to me. If I change my mind, I will do a revert.
>
> Personally, I think a toggle for the whole WC would be fine. And even
> if I have text files too, the handful of operations like diff where it
> downloads a pristine to do the diff might be fine as long as it works.
> Performance might even be fine.
>
> It sounds like this is not how it works?


I'm not sure that the "fetch base from server if necessary" 
behavior is part of the MVP (it might be, I'm just not sure -- 
there are decent workarounds if it's not, after all).


Best regards,
-Karl


Re: A two-part vision for Subversion and large binary objects.

2022-02-15 Thread Nathan Hartman
On Tue, Feb 15, 2022 at 2:22 PM Karl Fogel  wrote:
> As a command-line option for per-WC behavior, it might be
> something like this on checkout:
>
>   --local-base=no
>
> When the option is not provided, the default would be "yes" of
> course (in a sense, it's been defaulting to "yes" for decades :-)
> ).
>
> As a configuration option, it would be something like this:

Possibly bikeshedding a bit, but this seems to return to the idea of
"turning on" what we are (tentatively) calling "local base"... IMHO it
would be better if it were reversed to "--remote-base=yes" to convey
that this is non-default and opt-in. (Or possibly allow both.)

Alternatively...

As a command line switch, how about:

"svn checkout --base=local $REPO $WC"
or
"svn checkout --base=remote $REPO $WC"

The default of "--base=local" would be, as Karl points out, the same
behavior as in past releases, unless the user configures otherwise.

When checking out, it would only be necessary to specify
"--base=local" or "--base=remote" if that differs from the configured
(or implied) default.
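
A minimal Python sketch of that precedence (command-line option first, then the
configured default, then the implied default of "local") might look like the
following. The "--base" option is only the proposal under discussion here, not
an existing svn option, and effective_base is a hypothetical helper name.

```python
import argparse


def effective_base(argv, configured_default=None):
    """Resolve the base mode: explicit option, else configured default, else "local"."""
    parser = argparse.ArgumentParser(prog="checkout-sketch")
    # Proposed option from this thread; not part of any released svn client.
    parser.add_argument("--base", choices=["local", "remote"])
    # Ignore everything else on the command line (URL, path, other options).
    args, _rest = parser.parse_known_args(argv)
    return args.base or configured_default or "local"
```

For example, effective_base(["--base=local"], "remote") gives "local", since an
explicit option overrides the configured default.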

A possible future way to change pristine storage in an existing
working copy:

"svn update --set-base=local"
or
"svn update --set-base=remote"

That conceptually mirrors the current "svn checkout --depth=xxx" and
"svn update --set-depth=xxx", (modulo the fact that --depth=xxx has
another meaning for operations beside checkout).

Hopefully this doesn't cause confusion with, e.g., "--accept=base".

No opinion yet regarding configuration option naming.

Cheers,
Nathan


Re: A two-part vision for Subversion and large binary objects.

2022-02-15 Thread Mark Phippard
On Tue, Feb 15, 2022 at 12:00 PM Julian Foad  wrote:
>
> Karl Fogel wrote:
> > [...] there has to be some way for the user to specify at checkout
> > time [...]
>
> Currently: "svn checkout --compatible-version=1.15". No feature name
> involved. Not saying that's good, just that's the current state.

Are you saying this is how you would activate this no-pristines
feature? If so, that sounds like a poor UX. As a user, I would not
expect the version number to be connected to a feature like that. Or
more accurately, I could understand if you need a 1.15 working copy to
enable a WC format that tracks whether or not pristines are available
but I would not expect the version number alone to be the factor that
makes this decision. Does that make sense?

I have not followed every email on this topic but feel like I have
lost understanding of what the feature will do. I thought the original
goal was "I have a lot of large binaries, I would like to have a WC
with no pristines in it"

I thought that was also what the original PoC that Evgeny created did.
He would fetch a pristine if he needed to (such as for revert) but he
would not otherwise store it. Again ... just my understanding and
recollection.

One of your earlier responses seemed to indicate we are creating
pristines and storing them at some point. What would be the scenario
where the original user request would want that?

Assuming I have a WC with large binaries:

* I am not going to use diff
* If I commit a change, I would like to just send the new file to the
server and let it figure it all out
* If I revert, yeah I will need a new copy sent to me
* If I update, and do not have local mods, I will just get a new copy
of file that replaces what I have

* If I update, and have local mods ... less sure what should happen.
Is this the scenario where you create a pristine? If it is not a file
where we can do a text merge, then I guess I would just want my
version of the file to remain and ideally not even get the new file
sent to me. If I change my mind, I will do a revert.

Personally, I think a toggle for the whole WC would be fine. And even
if I have text files too, the handful of operations like diff where it
downloads a pristine to do the diff might be fine as long as it works.
Performance might even be fine.

It sounds like this is not how it works?

Mark


Re: A two-part vision for Subversion and large binary objects.

2022-02-15 Thread Karl Fogel

On 15 Feb 2022, Nathan Hartman wrote:
> How about:
>
> Remote BASE
>
> (as opposed to Local BASE).
>
> The idea here being that BASE is a concept with which users should be
> familiar, while pristines are part of Subversion's implementation
> under the hood.


Getting closer, I think!  "base" seems like a good word -- more 
familiar to most users than "pristine" would be, and the meaning 
really is pretty spot-on, since we've been supporting "-rBASE" 
since forever.


As a command-line option for per-WC behavior, it might be 
something like this on checkout:


 --local-base=no

When the option is not provided, the default would be "yes" of 
course (in a sense, it's been defaulting to "yes" for decades :-) 
).


As a configuration option, it would be something like this:

 ### Section for configuring working copies.
 [working-copy]
 no-local-base-FOO = SOME_VALUE

Now, I don't know what FOO and SOME_VALUE are yet -- they will 
vary, because we'll want various behaviors.  Sometimes you'll want 
to say "no local base when checking out from this particular 
repository".  Later, when we support finer-grained local base 
control than just per-WC, we'll want to be able to say "no local 
base for files larger than size X".  And maybe we'll want to say 
"no local base for files that have the following property or 
prop/val combination".  E.g.,


 ### Section for configuring working copies.
 [working-copy]
 no-local-base-repositories = [LIST OF REGEXPS TO MATCH AGAINST]
 no-local-base-properties = BLAH BLAH
 no-local-base-size-threshold = 1GB
 no-local-base-FOO = etc, etc

We don't have to figure out that config UI right now.  I'm just 
trying to figure out the primary user-facing terminology for the 
feature, and maybe "local base" is it.
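
Purely to illustrate how a size-threshold rule like the one above could be
evaluated, here is a small Python sketch. It assumes the hypothetical
"no-local-base-size-threshold" key exactly as written in the example, treats
"GB" as 1024**3 bytes (an assumption; the thread does not say), and uses the
made-up helper names parse_size and keeps_local_base.

```python
import configparser

SAMPLE = """
[working-copy]
no-local-base-size-threshold = 1GB
"""

# Assumed binary units; the thread does not define what "1GB" means exactly.
UNITS = {"KB": 1024, "MB": 1024**2, "GB": 1024**3}


def parse_size(text):
    """Turn a string such as "1GB" or "500" into a number of bytes."""
    text = text.strip().upper()
    for suffix, factor in UNITS.items():
        if text.endswith(suffix):
            return int(text[:-len(suffix)]) * factor
    return int(text)


def keeps_local_base(file_size, config_text):
    """True if a file of `file_size` bytes would keep its local base."""
    cp = configparser.ConfigParser()
    cp.read_string(config_text)
    raw = cp.get("working-copy", "no-local-base-size-threshold", fallback=None)
    if raw is None:
        return True                      # no rule configured: keep the local base
    return file_size <= parse_size(raw)  # "no local base for files larger than X"
```

With the sample text, a 2 GiB file would get no local base while a small text
file would keep one.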


Thoughts?

Best regards,
-Karl


Re: A two-part vision for Subversion and large binary objects.

2022-02-15 Thread Julian Foad
Karl Fogel wrote:
> [...] there has to be some way for the user to specify at checkout
> time [...]

Currently: "svn checkout --compatible-version=1.15". No feature name
involved. Not saying that's good, just that's the current state.

> [...] Those are *descriptions* [...]

Yes; hoping to inspire ideas.

Nathan Hartman wrote:
> Remote BASE

That's not bad.

- Julian



Re: A two-part vision for Subversion and large binary objects.

2022-02-15 Thread Nathan Hartman
On Mon, Feb 14, 2022 at 4:54 PM Karl Fogel  wrote:

>
> ROTFL :-).  I'll take #2 with a side of onion rings, please.
>
> Those are *descriptions*, for the release notes and other
> documentation, but we will still need a *name* too, to use in the
> command-line flag (or config option, whatever).



How about:

Remote BASE

(as opposed to Local BASE).

The idea here being that BASE is a concept with which users should be
familiar, while pristines are part of Subversion's implementation under the
hood.

Cheers
Nathan


Re: A two-part vision for Subversion and large binary objects.

2022-02-14 Thread Karl Fogel

On 14 Feb 2022, Julian Foad wrote:
> Karl, thanks for bringing a user-focused perspective to the naming. In
> Subversion's UI we will not necessarily expose any name for the feature,
> but we might, e.g. in a configuration file or in help text. In
> describing what's new in 1.15 people will certainly start using some
> short name for the feature and it will be helpful if we pick a memorable
> and user-comprehensible one to start with.


Agree with all your concerns below, but we do have to pick a name 
soon, because there's going to be *some* UI for accessing this 
feature, right? 

That is, even in MVP where the feature is just a non-changeable 
per-WC-at-checkout-time decision about whether pristines are 
cached or not, there has to be some way for the user to specify at 
checkout time what that decision actually *is* for a given WC. 
That could be done via a command-line option, or via something in 
a config file, but whatever it is, it's going to involve having a 
name by which to call the feature.


> In this case the term "binary, large object" is an apt portrayal of the
> main use case. (I'll set aside my distaste for such inelegant
> backronyms.) However, the term is used (I think) primarily in software
> developer circles, which covers a significant portion but perhaps not
> the vast majority of Subversion users. So it might not be the best.
> Similarly, casual Subversion users don't really need to know the terms
> "pristine" and "text base" before they become "power users", although if
> they think about this feature at all then they will obviously become
> aware of the concept, if not those names.


Agreed.


> Johan, I like and agree with your perspective of this pristines
> management as one aspect of caching repository content in general, with
> scope for further variations. I am not sure if that is the best way to
> present it to users at this stage. Perhaps this perspective fits better
> in a "road map" that devs and users interested in further development
> can read.
>
> Ideas:
>
> - "Option to optimize a checkout for minimal disk space rather than
> minimal network traffic."
>
> - "50% off. Unlimited offer. Buy it now. Shrink your checkouts to
> half the size.*  (Small print: *Compared to our previous checkouts which
> cost double their effective size. Network subscription required.)"


ROTFL :-).  I'll take #2 with a side of onion rings, please.

Those are *descriptions*, for the release notes and other 
documentation, but we will still need a *name* too, to use in the 
command-line flag (or config option, whatever).


Best regards,
-Karl


Re: A two-part vision for Subversion and large binary objects.

2022-02-14 Thread Nathan Hartman
On Mon, Feb 14, 2022 at 6:07 AM Julian Foad  wrote:
> Ideas:
>
> - "Option to optimize a checkout for minimal disk space rather than
> minimal network traffic."
>
> - "50% off. Unlimited offer. Buy it now. Shrink your checkouts to
> half the size.*  (Small print: *Compared to our previous checkouts which
> cost double their effective size. Network subscription required.)"

You know, I was going to offer some tongue-in-cheek ones myself...

Honey I Shrunk The Working Copy
Half Price Working Copy
Working Copy 4 Less

But I'm still trying to think of a name we could actually use.

Personally I'm not very partial to the BLOB idea. It feels like more
of a SQL database centric concept, and that makes it seem kind of
dated in my mind. We need something that sounds new and cool. :-)

The feature does trade off higher network access for lower storage
consumption, and that is an aspect the user would need to know to make
an educated decision, so we'll probably want to bear that in mind.

Cheers,
Nathan


Re: A two-part vision for Subversion and large binary objects.

2022-02-14 Thread Julian Foad
Karl, thanks for bringing a user-focused perspective to the naming. In
Subversion's UI we will not necessarily expose any name for the feature,
but we might, e.g. in a configuration file or in help text. In
describing what's new in 1.15 people will certainly start using some
short name for the feature and it will be helpful if we pick a memorable
and user-comprehensible one to start with.

In this case the term "binary, large object" is an apt portrayal of the
main use case. (I'll set aside my distaste for such inelegant
backronyms.) However, the term is used (I think) primarily in software
developer circles, which covers a significant portion but perhaps not
the vast majority of Subversion users. So it might not be the best.
Similarly, casual Subversion users don't really need to know the terms
"pristine" and "text base" before they become "power users", although if
they think about this feature at all then they will obviously become
aware of the concept, if not those names.

Johan, I like and agree with your perspective of this pristines
management as one aspect of caching repository content in general, with
scope for further variations. I am not sure if that is the best way to
present it to users at this stage. Perhaps this perspective fits better
in a "road map" that devs and users interested in further development
can read.

Ideas:

- "Option to optimize a checkout for minimal disk space rather than
minimal network traffic."

- "50% off. Unlimited offer. Buy it now. Shrink your checkouts to
half the size.*  (Small print: *Compared to our previous checkouts which
cost double their effective size. Network subscription required.)"

- Julian



Re: A two-part vision for Subversion and large binary objects.

2022-02-14 Thread Johan Corveleyn
On Mon, Feb 14, 2022 at 11:13 AM Ivan Zhakov  wrote:
> On Mon, 14 Feb 2022 at 01:39, Karl Fogel  wrote:
>> On 12 Feb 2022, Mark Phippard wrote:
...
>> In any case, the branch name doesn't matter too much here,
>> especially since it's going to get merged soon.  However, for the
>> user-facing name of the feature, we should pick a name based on
>> the essence of the feature, not on a not-yet-fully-implemented
>> optional enhancement to the feature, discussed further below.
>>
>> On 13 Feb 2022, Julian Foad wrote:
>> >That name came, as far as I am aware, from Evgeny's branch which
>> >implements the latter.
>> >
>> >This may be a case where the public facing name for the feature
>> >ought to differ from the internal development name.
>> >
>> >Any ideas for a good public name?
>> >
>> >Pristines on Subversion's demand?
>> >Dehydrated WC?
>>
>> I kind of like the dehydration/rehydration theme -- it's certainly
>> memorable!  Other possibilities:
>>
>>   - blob-optimized checkouts
>>
>>   - "blobtimized" checkouts (okay, kidding there... :-) )
>>
> I would suggest:
> - optional pristines

As I tried to explain before, I think it makes more sense (also to new
users who have never used pre-1.15) to try to expose the feature as a
knob for the pristine storing (or caching) strategy. Because,
effectively, the pristine store is just a cache, right? All the
information is there on the server, and the client simply duplicates /
caches that information locally to make some operations more
efficient. Up until now, the pristine caching strategy was fixed:
"cache them all, all the time, forever".

So now we're working on a very lazy or minimal type of pristine
caching strategy (or "no caching", if you will -- we might consider it
an implementation detail that a pristine is fetched in the "regular
pristine store" for a moment, and cleaned up after the operation -- it
might just as well have been spooled to a tmp location, or in memory,
or ... during the operation).

To expose this to users, I would take a step back, and open the door
for other types of pristine caching strategy in the future. So I'd
say:

"New feature in 1.15: Configurable Pristine Caching", or "Flexible
Pristine Caching" or "Pristine Caching Options". Where it was
previously a fixed strategy, you now have some choice. In 1.15 we
introduce the "lazy" (or "short-lived", or "minimal") pristine caching
strategy. Apart from that we still have the (default, old) "full" /
"complete" caching strategy. In the future we might introduce
additional (more flexible) strategies, such as those dictated by some
rules, potentially with a repos-side suggestion (like with
svn:auto-props).
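
One way to picture the "strategy knob" is as an enumeration with a single
decision function. This Python sketch is only an illustration: the names
PristineCaching and keep_pristine_after_operation are hypothetical, and the
"lazy" rule follows the BRANCH-README invariant quoted elsewhere in this thread
(only modified files keep their pristine), which is one reading of the design.

```python
from enum import Enum


class PristineCaching(Enum):
    FULL = "full"   # the old, default behaviour: cache every pristine
    LAZY = "lazy"   # the new minimal behaviour discussed for 1.15


def keep_pristine_after_operation(strategy, modified):
    """Decide whether a file's pristine stays on disk once an operation ends."""
    if strategy is PristineCaching.FULL:
        return True
    # LAZY: keep the pristine only while the working file is locally modified.
    return modified
```

Further strategies (for example rule-based ones) would then be additional enum
members with their own branch in the decision function.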

Instead of talking about "Pristine Caching Strategy", we could also
talk about the "Pristine Storage Strategy" or "Storing Strategy"
('storing' instead of 'fetching', as the former is the more permanent
effect; fetching might be seen as an implementation detail on what
subversion needs to do when it runs into a non-stored pristine).

-- 
Johan


Re: A two-part vision for Subversion and large binary objects.

2022-02-14 Thread Ivan Zhakov
On Mon, 14 Feb 2022 at 01:39, Karl Fogel  wrote:

> On 12 Feb 2022, Mark Phippard wrote:
> >Just to offer a counterpoint Karl, I always assumed the goal of the
> >branch was to have no pristines in the WC and the "on-demand" aspect
> >was referring to an internal SVN detail that it would have to fetch
> >pristines when they were needed to complete a command that I have
> >executed like diff or revert.
> >
> >I know we discussed whether the entire WC, or individual files would
> >not have pristines but I never considered the "on-demand" aspect to be
> >about my ability to decide this. It was about SVN just doing what it
> >needed to when it needed to.
>
> Ah, I see.  That might be where the branch name came from, yeah.
> But the key (necessary) part of the feature is the absence of
> pristines, whereas the restoration of some pristines on demand is
> an optional enhancement (and one we're not even doing in the first
> MVP version).
>
> In fact, selected rehydration is not necessarily even the first
> enhancement we might make after MVP.  There's an argument for
> prioritizing flexible client-side configuration specs first, so
> that all the diffable files get pristines on checkout while all
> the big binary blob files get no pristines.  IOW, if we get the
> checkout right the first time, then selected rehydration becomes
> less important to have; also, there is an easy workaround for it;
> just make a copy of the working file :-).
>
> (I still think selected rehydration would be good to have, of
> course; I'm just pointing out that we haven't really discussed
> where it sits relative to other possible things.)
>
> In any case, the branch name doesn't matter too much here,
> especially since it's going to get merged soon.  However, for the
> user-facing name of the feature, we should pick a name based on
> the essence of the feature, not on a not-yet-fully-implemented
> optional enhancement to the feature, discussed further below.
>
> On 13 Feb 2022, Julian Foad wrote:
> >That name came, as far as I am aware, from Evgeny's branch which
> >implements the latter.
> >
> >This may be a case where the public facing name for the feature
> >ought to differ from the internal development name.
> >
> >Any ideas for a good public name?
> >
> >Pristines on Subversion's demand?
> >Dehydrated WC?
>
> I kind of like the dehydration/rehydration theme -- it's certainly
> memorable!  Other possibilities:
>
>   - blob-optimized checkouts
>
>   - "blobtimized" checkouts (okay, kidding there... :-) )
>

I would suggest:
- optional pristines

Just my two cents.

-- 
Ivan Zhakov


Re: A two-part vision for Subversion and large binary objects.

2022-02-13 Thread Karl Fogel

On 12 Feb 2022, Mark Phippard wrote:
> Just to offer a counterpoint Karl, I always assumed the goal of the
> branch was to have no pristines in the WC and the "on-demand" aspect
> was referring to an internal SVN detail that it would have to fetch
> pristines when they were needed to complete a command that I have
> executed like diff or revert.
>
> I know we discussed whether the entire WC, or individual files would
> not have pristines but I never considered the "on-demand" aspect to be
> about my ability to decide this. It was about SVN just doing what it
> needed to when it needed to.


Ah, I see.  That might be where the branch name came from, yeah. 
But the key (necessary) part of the feature is the absence of 
pristines, whereas the restoration of some pristines on demand is 
an optional enhancement (and one we're not even doing in the first 
MVP version).


In fact, selected rehydration is not necessarily even the first 
enhancement we might make after MVP.  There's an argument for 
prioritizing flexible client-side configuration specs first, so 
that all the diffable files get pristines on checkout while all 
the big binary blob files get no pristines.  IOW, if we get the 
checkout right the first time, then selected rehydration becomes 
less important to have; also, there is an easy workaround for it; 
just make a copy of the working file :-).


(I still think selected rehydration would be good to have, of 
course; I'm just pointing out that we haven't really discussed 
where it sits relative to other possible things.)


In any case, the branch name doesn't matter too much here, 
especially since it's going to get merged soon.  However, for the 
user-facing name of the feature, we should pick a name based on 
the essence of the feature, not on a not-yet-fully-implemented 
optional enhancement to the feature, discussed further below.


On 13 Feb 2022, Julian Foad wrote:
>That name came, as far as I am aware, from Evgeny's branch which
>implements the latter.
>
>This may be a case where the public facing name for the feature
>ought to differ from the internal development name.
>
>Any ideas for a good public name?
>
>Pristines on Subversion's demand?
>Dehydrated WC?


I kind of like the dehydration/rehydration theme -- it's certainly 
memorable!  Other possibilities:


 - blob-optimized checkouts

 - "blobtimized" checkouts (okay, kidding there... :-) )

The first one is actually a serious suggestion, though.  It's more 
helpful for users if we frame the feature in terms of what it 
enables than in terms of back-end implementation.  What issue #525 
is about is optimizing for checkouts with lots of Binary Large 
OBjects -- things that don't generally receive mergeable changes 
and that one rarely if ever diffs.  Hence "blob-optimized 
checkouts" as the tag line (and then in the feature description we 
explain the details).


Anyway, that's one idea, but the floor is open...

Best regards,
-Karl


Re: A two-part vision for Subversion and large binary objects.

2022-02-13 Thread Julian Foad
>> The name of the "pristines-on-demand" branch implies a certain
>> behavior -- namely, that pristines can, via some UI, be fetched on
>> demand :-).  [...]
>
>Just to offer a counterpoint Karl, I always assumed the goal of the
>branch was to have no pristines in the WC and the "on-demand" aspect
>was referring to an internal SVN detail that it would have to fetch
>pristines when they were needed [...]

That name came, as far as I am aware, from Evgeny's branch which implements the 
latter.

This may be a case where the public facing name for the feature ought to differ 
from the internal development name.

Any ideas for a good public name?

Pristines on Subversion's demand?
Dehydrated WC? 

- Julian


Re: A two-part vision for Subversion and large binary objects.

2022-02-12 Thread Mark Phippard
On Sat, Feb 12, 2022 at 3:15 PM Karl Fogel  wrote:

> The name of the "pristines-on-demand" branch implies a certain
> behavior -- namely, that pristines can, via some UI, be fetched on
> demand :-).  But in the MVP we're talking about, pristines in a
> given WC are either all present or all absent, and, at least for
> MVP, that per-WC state is not changeable, right?

Just to offer a counterpoint Karl, I always assumed the goal of the
branch was to have no pristines in the WC and the "on-demand" aspect
was referring to an internal SVN detail that it would have to fetch
pristines when they were needed to complete a command that I have
executed like diff or revert.

I know we discussed whether the entire WC, or individual files would
not have pristines but I never considered the "on-demand" aspect to be
about my ability to decide this. It was about SVN just doing what it
needed to when it needed to.

Mark


Re: A two-part vision for Subversion and large binary objects.

2022-02-12 Thread Karl Fogel

I wrote:
>...does the "pristines-on-demand" branch name still accurately reflect
>what the state of the onion will be after that branch is merged?


Ah, I'll retroactively update my question to now be about the new 
"pristines-on-demand-on-mwf" branch, of course.


Best regards,
-Karl



Re: A two-part vision for Subversion and large binary objects.

2022-02-12 Thread Karl Fogel

On 10 Feb 2022, Julian Foad wrote:

>My current plan:
>
>* multi-wc-format is, I consider, ready for merge to trunk. See
>thread [1].
>   -> Please review it.
>   - I can post a diff and a summary log message to help reviewers.
>
>* Make pristines-on-demand behaviour conditional on WC format.
>   - The changes are mostly simple if a bit fiddly. In libsvn_wc we
>will need a bit of futzing around with two variants of some existing
>wc-db queries, one with and one without the extra 'hydrated' column, to
>work with both DB formats.
>
>* Re-base pristines-on-demand on top of multi-wc-format.
>   - I have this ready in a working copy.
>   - The one significant change is to remove the new bits from the main
>DB schema statements, so that it will create format 31 not 32, as the
>new way (multi-wc-format) is to always create the baseline (lowest
>supported) format first (which will be 31) and then run the statements
>that upgrade it to any higher requested format (specifically, when 32
>is requested).
>
>* Finish the per-WC configuration. See thread [2].
>   -> Please review the plan there.
>
>At that point I would consider the feature a minimum viable product
>(MVP), ready to merge and ready for use.

Please do speak up with any comments.


Agree about MVP being ready by that point.  My only question is 
about a matter of intra-dev communications:


The name of the "pristines-on-demand" branch implies a certain 
behavior -- namely, that pristines can, via some UI, be fetched on 
demand :-).  But in the MVP we're talking about, pristines in a 
given WC are either all present or all absent, and, at least for 
MVP, that per-WC state is not changeable, right?  (That is, MVP 
doesn't include dehydration or rehydration, IIUC.)


Again, just to be clear: I think that's fine and the MVP will be 
very useful, even before any [rd]ehydration feature is available. 
But does the "pristines-on-demand" branch name still accurately 
reflect what the state of the onion will be after that branch is 
merged?


Best regards,
-Karl


Re: A two-part vision for Subversion and large binary objects.

2022-02-11 Thread Julian Foad
Julian Foad wrote:
> * Re-base pristines-on-demand on top of multi-wc-format.

Done: branch 'pristines-on-demand-on-mwf'.

> * Make pristines-on-demand behaviour conditional on WC format.

Mostly done in r1897977.

I could do with some help on a SQL test failure (below), please.

Quoting from that log message:

  - With 'make check WC_FORMAT_VERSION=1.15': test suite still passes.
  - With 'make check [WC_FORMAT_VERSION=1.8]': some tests FAIL or XPASS:

XPASS: authz_tests.py 31: remove a subdir with authz file
XPASS: basic_tests.py 8: basic corruption detection on commit
   [[Relies on wc.text_base_path()]]
XPASS: revert_tests.py 2: revert reexpands manually contracted keyword
XPASS: trans_tests.py 1: commit new files with keywords active from birth
   [[Relies on wc.text_base_path()]]
XPASS: trans_tests.py 3: committing eol-style change forces text send
   [[Relies on wc.text_base_path()]]
XPASS: update_tests.py 83: missing tmp update caused segfault
   [[The error message has changed]]
XPASS: upgrade_tests.py 16: upgrade with base and working replaced files
   [[Can't fetch pristines: the working copy points to file:///tmp/repo]]
XPASS: upgrade_tests.py 34: automatic SQLite ANALYZE
FAIL:  wc-queries-test 3: test query expectations

From a quick look, the XPASSes may be just a matter of '@Wimp'
annotations that had been added on one of these branches, now being out
of date, or something like that. I will check them.

I have posted separately asking for help with the FAIL in 
test_query_expectations().

- Julian



Re: A two-part vision for Subversion and large binary objects.

2022-02-10 Thread Julian Foad
My current plan:

* multi-wc-format is, I consider, ready for merge to trunk. See thread [1].
-> Please review it.
- I can post a diff and a summary log message to help reviewers.

* Make pristines-on-demand behaviour conditional on WC format.
- The changes are mostly simple if a bit fiddly. In libsvn_wc we
will need a bit of futzing around with two variants of some existing
wc-db queries, one with and one without the extra 'hydrated' column, to
work with both DB formats.

* Re-base pristines-on-demand on top of multi-wc-format.
- I have this ready in a working copy.
- The one significant change is to remove the new bits from the main
DB schema statements, so that it will create format 31 not 32, as the
new way (multi-wc-format) is to always create the baseline (lowest
supported) format first (which will be 31) and then run the statements
that upgrade it to any higher requested format (specifically, when 32 is 
requested).

* Finish the per-WC configuration. See thread [2].
-> Please review the plan there.

At that point I would consider the feature a minimum viable product
(MVP), ready to merge and ready for use.

Please do speak up with any comments.
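
To make the two middle bullets above concrete, here is a minimal sketch in Python with sqlite3. It is an illustration only, not the libsvn_wc code: the table layout, the upgrade statement and the function names are assumptions, and storing the format number in SQLite's user_version is assumed here for simplicity. It shows (a) always creating the baseline format 31 and then running upgrade statements up to the requested format, and (b) choosing between two query variants, with and without the 'hydrated' column.

```python
import sqlite3

BASELINE_FORMAT = 31  # lowest supported format, always created first

# Hypothetical upgrade statements, keyed by the format they produce.
UPGRADES = {
    32: ["ALTER TABLE pristine ADD COLUMN hydrated INTEGER NOT NULL DEFAULT 1"],
}


def create_wc_db(conn, target_format):
    """Create the baseline schema, then upgrade one format at a time."""
    conn.execute(
        "CREATE TABLE pristine (checksum TEXT PRIMARY KEY, size INTEGER)")
    fmt = BASELINE_FORMAT
    while fmt < target_format:
        fmt += 1
        for stmt in UPGRADES[fmt]:
            conn.execute(stmt)
    conn.execute(f"PRAGMA user_version = {fmt}")
    return fmt


def select_pristines(conn):
    """Run the query variant that matches this DB's format."""
    fmt = conn.execute("PRAGMA user_version").fetchone()[0]
    if fmt >= 32:
        sql = "SELECT checksum, hydrated FROM pristine ORDER BY checksum"
    else:
        # Format 31 has no 'hydrated' column: report every pristine as present.
        sql = "SELECT checksum, 1 FROM pristine ORDER BY checksum"
    return conn.execute(sql).fetchall()
```

Callers of select_pristines() see the same result shape for either format, which keeps the format check in one place.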


[1] "Multi-WC-format branch: preparing for merge to trunk"
[2] "[PATCH] Sketch of per-user/per-wc config for pristines-mode"

- Julian



Re: A two-part vision for Subversion and large binary objects.

2022-02-03 Thread Julian Foad
On 2022-01-28 Evgeny Kotkov wrote:
> Julian Foad writes:
>> We could swap the scanning logic around to do the (quick) check for
>> missing pristines before deciding whether a (slower) file "stat" is
>> necessary. [...]
> 
> I might be missing something, but I don't yet see how we could save a stat().

I have now responded to this in the thread "[PATCH] Sketch of
per-user/per-wc config for pristines-mode", in a paragraph beginning "3.
Another alternative ...".

- Julian



Re: A two-part vision for Subversion and large binary objects.

2022-02-01 Thread Nathan Hartman
On Mon, Jan 31, 2022 at 6:41 AM Johan Corveleyn  wrote:
>
> Replying to a few different points in this thread.
>
> On Jan 27, Julian Foad wrote:
> > The user can choose one mode, per WC, from a list of options that may 
> > include:
> >
> >   - off: as in previous versions, no checking, just assume all pristines
> > are present
> >   - pristines-on-demand: fetch "wanted" pristines; discard others
> >   - fetch-only: fetch any pristine that's absent; do not discard
>
> I think, whatever the name of the property here, "off" is confusing
> for wanting all pristines (the back-compat / old / default (?)
> behaviour).
>
> To me it sounds like I am setting the feature "do not fetch all
> pristines" to off, so "please fetch all pristines" with a double
> negation.

+1 to avoiding the double negative.

I did previously speak in terms of the (pristines-on-demand) feature
being turned on or off, but I was thinking from the perspective of a
proposed new option which shouldn't take effect unless the user opts
in. Imagining it from the perspective of a future user after the
feature exists, this is a double negative and is indeed confusing.

More inline responses below...

> Hmm. I think the pristine-fetching strategy that is chosen for a
> particular working copy should be a property of that working copy. That's
> because it has a "persistent" impact on that working copy. Changing
> that strategy (if we would support that) severely impacts the disk
> layout of that particular working copy. It's not just a runtime thing,
> like using "exclusive sqlite locks" or some such (leaves no trace for
> the next user).
>
> If it would be a runtime setting, and Alice and Bob would both work on
> the same working copy, and the former has "pristine-fetching=full" and
> the latter "pristine-fetching=lazy" (or some detailed strategy with
> patterns, whatever), the working copy would be changed severely every
> time one or the other touches it.

+1. Since pristines are part of a particular working copy, the
variables that indicate whether they are present and how/when they are
fetched should be part of the working copy as well.

> So I think the chosen pristine-fetching strategy for a working copy
> should be stored in the WC itself, probably in wc.db.
>
> However, we would still need related runtime config options. But I see
> them as "defaults" for when creating a new working copy. Perhaps these
> belong in a [working-copy defaults] or [working-copy creation]
> section, as opposed to the [working-copy] section which is more about
> runtime behaviour.

That would be convenient; if a user wants pristines-on-demand all or
most of the time, this would save having to specify that for every
checkout. Is it mandatory for the first iteration? I don't know, but I
do think it would be very nice to have.

What *is* mandatory in my mind is an option for 'svn checkout'. The
user should not be locked in to a configured default because they may
(and perhaps are even likely to) have a mix of different repos they
check out, perhaps from faster or slower servers, or perhaps some repos
with seldom-changing files and others with frequently changing ones, etc.
The user knows which is the more common case and should configure that
as the default, then override it at checkout time for the less common
case.

Cheers,
Nathan


Re: A two-part vision for Subversion and large binary objects.

2022-02-01 Thread Daniel Shahaf
Daniel Shahaf wrote on Mon, 31 Jan 2022 11:42 +00:00:
> Furthermore, whenever we have some sort of server-recommended
> configuration, having some syntax to show where the wc differs from the
> recommendation will make sense.  For instance, for depth I do
> .
> svn info -R | grep-dctrl -F Depth '' -s Path,Depth
> .
> to print all local files that have a depth other than the default — but
> having some syntactic sugar (e.g., a tools/ script) to do this query
> would make sense.

That's «svn info --x-viewspec=svn11».

Cheers,

Daniel

> This goes not only for server-recommended depth
> configuration but also for server-recommended pristinefulness
> configuration, if we have that.
>
> Looking forward to hearing about your use-case.
>
> Cheers,
>
> Daniel


Re: A two-part vision for Subversion and large binary objects.

2022-02-01 Thread Daniel Shahaf
Lorenz wrote on Tue, Feb 01, 2022 at 08:07:43 +:
> Daniel Shahaf wrote:
> 
> >Lorenz wrote on Mon, Jan 31, 2022 at 07:13:46 +:
> >> Karl Fogel wrote:
> >> >Hi, everyone.  I'd like feedback on an idea that I've had for some 
> >> >years now but never written up before.
> >> >
> >> >Subversion can already be used to manage large (usually binary) 
> >> >files.  In fact, we use SVN for this at my company and it works 
> >> >decently.  However, there are two possible features that would 
> >> >make Subversion go beyond "decent" all the way to "quite good" at 
> >> >this :-).  They are:
> >> >
> >> >1) Make pristine text-base files optional. [...]
> >> 
> >> I've been following the optional pristines debate for a while now, but
> >> can't remember a properties-based configuration having been discussed.
> >> 
> >> So here is what I would like to see:
> >> 
> >
> >*Why* would you like to see a properties-based design?  Could you please
> >describe the use-case, constraints, business needs, etc., you're
> >designing for?  We shouldn't be discussing concrete solutions/designs
> >until we have a common understanding of what use-case they are designed
> >to solve.
> >
> >What does this design achieve or enable that other proposals do not?
> 
> And there I thought I was asking if I overlooked some discussion about
> this variant 8-)
> 

We have discussed making it possible to enable/disable pristinelessness
on a finer granularity than per-working-copy.  See, for instance,

in the early part of this thread.

We've also discussed passing a file's properties hash to a per-file
callback predicate that decides whether or not the file would be
pristineful; see, e.g.,

and 
.

And it turns out I actually proposed earlier in this thread to use
properties both for storing the server-recommended configuration and for
storing the client settings.
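
For readers who have not followed those subthreads, a per-file predicate of the kind mentioned above might look roughly like this. It is a sketch only: the function name, the size parameter and the default threshold are made up for illustration and do not correspond to any API on the branch.

```python
def wants_pristine(props, size, max_size=1_000_000):
    """Hypothetical per-file predicate: keep a pristine only for files
    that look diffable, judging by the file's properties hash.

    props    -- dict of the file's versioned properties
    size     -- file size in bytes
    max_size -- client-chosen threshold (an assumption of this sketch)
    """
    # A file without svn:mime-type is treated as text in this sketch.
    mime = props.get("svn:mime-type", "text/plain")
    if not mime.startswith("text/"):
        return False
    return size <= max_size
```

The threshold is an argument rather than a stored property, so each client can make its own time-space trade-off.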

> Mainly a property based configuration allows storing a default
> configuration based on knowledge about the intended use of files in
> the repository. This can't be done by a pure client side (per
> workstation or WC wise) approach.

Let's distinguish the two uses of properties here: using them as
a vehicle for the server to propose configuration to the client with,
and using them for storing client settings.

> [Server uses properties to communicate recommended configuration to
> the client]

The server doesn't know the client's use-case.  See, for instance,
.
In that sense, having some sort of server-proposed default would be
a secondary priority, something less important than implementing some
fully client-side way to select which subset of files should have
pristines.  That's just like how we have depth and even viewspecs but
don't have a way for the server to distribute viewspecs to client.

If we were to have a way for the server to distribute recommended
viewspecs (whether for depth or the equivalent for pristinefulness),
then:

- We might want the server to offer the client more than one
  viewspec, so the client will be able to choose.  E.g., the server
  could offer a "qa" preset and a "dev" preset, and each client would
  choose what they need.  Or our site/publish/ tree could have a "ja"
  preset that includes index.ja.html and excludes index.zh.html.
  (Besides, if a single preset would work for everyone, we could just
  make it the default and let people use --depth=infinity to override.
  [I'm stating this in terms of depth, but the situation is similar for
  pristines, I think.])

- How _else_ could the recommendations be transferred, if not via node
  properties?  The contents of «svnadmin dump» and locks are virtually
  the only things a client can fetch in-band, and node properties and
  in-tree .svnfoo files are the only things in that set that are subject
  to authz.
  
  Moreover, it can be argued that "How much of the code does a user
  need" is a property of the code that changes with time (and if someone
  wants to change the value retroactively, that's impossible like it's
  impossible to fix bugs in old revisions retroactively). 
  
  I guess the alternative to node properties are (1) invent a new RA API
  (that does authz as needed); (2) let admins come up with out-of-band
  solutions (that use svnauthz(1) if necessary).

> [Storing client settings in properties]

This means «svn propset» will have two different meanings: both "deal
with versioned data" and "deal with wc configuration".  On the one hand,
I somewhat hesitate to overload «propset» this way; but on the other

Re: Sharing .svn directories across wc's (was: Re: A two-part vision for Subversion and large binary objects.)

2022-01-31 Thread Branko Čibej

On 24.01.2022 14:39, Daniel Shahaf wrote:
> Daniel Sahlberg wrote on Mon, Jan 24, 2022 at 14:13:26 +0100:
>> Den mån 24 jan. 2022 kl 14:02 skrev Daniel Shahaf:
>>> As to what it'll take to actually implement this… I'm not sure.  If
>>> someone went in and changed «mkdir(".svn")» to «symlink("/well/known/path",
>>> ".svn")», would things Just Work™, or?
>>
>> There are OSes where the support for symlinks is, let's say, less than
>> perfect.
>
> And?

WC_ID is hardcoded to 1 pretty much everywhere. There'd be a bit of work
to make WC identification explicit.

Also the meaning of 'svn upgrade' in this context becomes ... interesting.

> I didn't propose symlinks as a solution.  I only asked about them in
> order to identify the blockers to implementing shared .svn dirs: whether
> it's as simple as needing to invent a «--dotsvn-dir» option, or is there
> more work needed on, say, identifying SQLite queries that compare
> LOCAL_RELPATH without also comparing WC_ID.

That would be most of the queries.

>> See for example issue SVN-3570. Of course, when we solve #3570,
>> then the shared wc.db would be an easy fix ;-)
>
> Or we could deal with the symlink on Windows the same way we deal with
> versioned symlinks on Windows: by creating a file ".svn" with content
> "link /path/to/some/where".

Or we could try not to invent platform-specific solutions for something
like this. *shudder*


-- Brane


Re: A two-part vision for Subversion and large binary objects.

2022-01-31 Thread Daniel Shahaf
Julian Foad wrote on Fri, Jan 28, 2022 at 12:30:01 +:
> We could fetch pristines in a background thread, making the "foreground"
> operation thread wait, just in time, for each pristine before accessing
> it. Positives: the end result is efficient. Negatives: we don't have
> precedent for threaded operations,

We have the svn_task_* API (subversion/include/private/svn_task.h:38),
although nothing seems to use it yet, on any branch.

We have subversion/include/private/svn_thread_cond.h to support it.

We use threads in the test suite:
subversion/tests/svn_test_main.c:do_tests_concurrently()

> and they can be tricky, so unknown and potentially large effort to
> complete it.

Even with the API building blocks in place, there's still effort
involved, of course.
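
To illustrate the shape of the idea (not the svn_task_* API, which is C and which I am not sketching here), this is a small Python model of "fetch in the background, wait just in time". The names fetch_pristine and process are hypothetical, and the fetch is a stand-in for a network round trip.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_pristine(checksum):
    """Stand-in for fetching one pristine from the repository."""
    return f"contents-of-{checksum}"


def process(checksums, use):
    """Start all fetches in the background, then consume them in order.

    The foreground loop blocks only when it reaches a pristine that has
    not arrived yet; fetches for later items keep running meanwhile.
    """
    with ThreadPoolExecutor(max_workers=4) as pool:
        futures = {c: pool.submit(fetch_pristine, c) for c in checksums}
        results = []
        for c in checksums:
            # result() waits, just in time, for this one pristine.
            results.append(use(c, futures[c].result()))
        return results
```

Error handling and cancellation are where most of the "tricky" effort would go; the sketch leaves both out.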

Cheers,

Daniel


Re: A two-part vision for Subversion and large binary objects.

2022-01-31 Thread Daniel Shahaf
Evgeny Kotkov wrote on Fri, Jan 28, 2022 at 01:58:43 +0300:
> Also, I tend to think that DRY doesn't really apply here, because a status
> walk and a textbase sync are essentially different operations that just
> happen to have something in common internally.  For example, a textbase
> sync doesn't have to follow the tree structure and can be implemented
> with an arbitrarily ordered walk over NODES.

If we iterate NODES in an arbitrary order, we'll lose the benefit of
cache locality of the OS filesystem's page cache.  To avoid that, we can
do a depth-first walk.  Is there a reason not to?

% sqlite3 .svn/wc.db "SELECT local_relpath FROM nodes WHERE checksum IS NOT NULL ORDER BY checksum;" | head
subversion/libsvn_diff/diff4.c
subversion/bindings/javahl/src/org/tigris/subversion/javahl/CommitMessage.java
notes/subversion-diagram.graffle
tools/dist/release.py
tools/hook-scripts/validate-extensions.py
subversion/mod_dav_svn/reports/log.c
tools/dev/iz/ff2csv.py
subversion/libsvn_fs_x/util.c
subversion/libsvn_ra/util.c
subversion/bindings/javahl/native/ExternalItem.hpp
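
As the listing shows, checksum order scatters the walk across the whole tree. A sketch of the alternative ordering, in Python for brevity (the function name is made up, and this only models the ordering, not the textbase sync itself):

```python
def depth_first_order(relpaths):
    """Order paths so that everything under one directory is contiguous.

    Sorting on the list of path components, rather than on the raw
    string, keeps a directory's entries together even when sibling names
    contain characters that sort before '/'.
    """
    return sorted(relpaths, key=lambda p: p.split("/"))


paths = [
    "subversion/libsvn_ra/util.c",
    "tools/dist/release.py",
    "subversion/libsvn_diff/diff4.c",
    "tools/dev/iz/ff2csv.py",
]
ordered = depth_first_order(paths)
```

With this ordering, consecutive files tend to live in the same directory, which is what gives the page cache a chance to help.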

Cheers,

Daniel


Re: A two-part vision for Subversion and large binary objects.

2022-01-31 Thread Daniel Shahaf
Lorenz wrote on Mon, Jan 31, 2022 at 07:13:46 +:
> Karl Fogel wrote:
> >Hi, everyone.  I'd like feedback on an idea that I've had for some 
> >years now but never written up before.
> >
> >Subversion can already be used to manage large (usually binary) 
> >files.  In fact, we use SVN for this at my company and it works 
> >decently.  However, there are two possible features that would 
> >make Subversion go beyond "decent" all the way to "quite good" at 
> >this :-).  They are:
> >
> >1) Make pristine text-base files optional. [...]
> 
> I've been following the optional pristines debate for a while now, but
> can't remember a properties-based configuration having been discussed.
> 
> So here is what I would like to see:
> 

*Why* would you like to see a properties-based design?  Could you please
describe the use-case, constraints, business needs, etc., you're
designing for?  We shouldn't be discussing concrete solutions/designs
until we have a common understanding of what use-case they are designed
to solve.

What does this design achieve or enable that other proposals do not?

As you may have seen elsethread, Julian has already begun to implement
a single per-WC toggle design (as a first iteration of the new
functionality).  If you see any conflict between that work and your
use-case, please say so sooner rather than later: it's easier to pivot
before work has been done than after.

> 1) an inheritable svn:pristine (on, off, on-demand) property on files
> and folders.
> On folders that could be extended to handle thresholds (size, age).

For files, this could make sense.  "This file will not need to be diffed
by most users" does sound like information the user might have that we
can't determine otherwise.  Perhaps it's a generated file (that gets put
in version control for whatever reason; e.g., our dist/ repository, and
the Apache CMS websites/ tree).  Perhaps it's a file that only one or
two users will be modifying.

For folders, however, I don't see how this makes sense.  Size and age
thresholds are not an intrinsic property of an inode in the versioned
filesystem; they are time-space trade-offs that each client makes.
Different clients could make different trade-offs, and clients that
checkout today's HEAD in the future (using the «svn checkout $URL@$peg»
syntax) might have different needs than clients that checkout today's
HEAD today.

>

What do you think of using an r0 revision property for storing
information about what files typically don't need their pristines?

This could get interesting if some of the files involved are protected
by "no access" authz.

> 2) an inheritable svn:pristine-wc property for local override. This
> property would be WC-only, not to be stored in the repository.
> 

There is such a thing as WC-only properties (see SVN_PROP_WC_PREFIX and
svn_prop_wc_kind).  The existing ones are deliberately not shown in the
UI.  IIRC, they were used as the precursor of today's NODES.dav_cache
column, as a place where the RA layer can store per-file information.

In any case, properties are for attributes of the versioned filesystem's
inodes.  They are not for local configuration.  It wouldn't make sense
for «svn diff» to show both changes to, say, a file's encoding (in
svn:mime-type) and to a file's pristinefulness, because those are
different kinds-of-things: one describes the versioned inode, like its
charset (which can be stored in svn:mime-type), and another is
a property of the user's working copy [literally] of that versioned
inode, like its depth.

If it's not committable, it shouldn't be shown by «svn diff».

> 3) optional but not necessary: a command line option for svn checkout
> to set svn:pristine-wc to off.
> Optional because one can always restrict the initial checkout
> depth, set the property, then update.
> 
> 4) for workstation global settings an entry in the config
> corresponding to the svn:pristine-wc would be needed.

If there's a "typical" configuration that many clients will want to use,
having some way for the server to advise them about it would make sense,
as would letting clients decide whether or not to honour the advice
(both by default and _ad hoc_ for a particular working copy).

Furthermore, whenever we have some sort of server-recommended
configuration, having some syntax to show where the wc differs from the
recommendation will make sense.  For instance, for depth I do
.
svn info -R | grep-dctrl -F Depth '' -s Path,Depth
.
to print all local files that have a depth other than the default — but
having some syntactic sugar (e.g., a tools/ script) to do this query
would make sense.  This goes not only for server-recommended depth
configuration but also for server-recommended pristinefulness
configuration, if we have that.
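
As a rough idea of what such a tools/ script could do, here is a Python sketch that performs the same filtering as the grep-dctrl command above on text captured from 'svn info -R'. It assumes the output consists of blank-line-separated stanzas of "Key: Value" lines and that a "Depth:" line appears only where the depth differs from the default; the function name is made up.

```python
def non_default_depths(info_text):
    """Return (path, depth) for every stanza that carries a Depth field."""
    found = []
    for stanza in info_text.strip().split("\n\n"):
        fields = {}
        for line in stanza.splitlines():
            key, sep, value = line.partition(": ")
            if sep:
                fields[key] = value
        if "Path" in fields and "Depth" in fields:
            found.append((fields["Path"], fields["Depth"]))
    return found
```

A pristinefulness equivalent would need 'svn info' to report that state per path, which is not something the current output is known to include.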

Looking forward to hearing about your use-case.

Cheers,

Daniel


Re: A two-part vision for Subversion and large binary objects.

2022-01-31 Thread Johan Corveleyn
Replying to a few different points in this thread.

On Jan 27, Julian Foad wrote:
> The user can choose one mode, per WC, from a list of options that may include:
>
>   - off: as in previous versions, no checking, just assume all pristines
> are present
>   - pristines-on-demand: fetch "wanted" pristines; discard others
>   - fetch-only: fetch any pristine that's absent; do not discard

I think, whatever the name of the property here, "off" is confusing
for wanting all pristines (the back-compat / old / default (?)
behaviour).

To me it sounds like I am setting the feature "do not fetch all
pristines" to off, so "please fetch all pristines" with a double
negation.

Maybe we should go for something like:
pristine-fetching = full (or "eager", or "all", i.e. default) | lazy
(or "on-demand")

Perhaps with a third option "lazy-keep" (like your "fetch-only"),
indicating on-demand, but not immediately cleaning it after use (don't
know if this would be useful -- could be added later of course). Or
"lazy-transient" for the "lazy with immediate cleaning after use" as
opposed to "lazy" (which keeps fetched pristines once fetched).


On Sat, Jan 29, 2022 at 9:22 AM Julian Foad  wrote:
>
> Vincent Lefevre wrote:
> >> [...] Specifying a pattern to match the WC path [or] per repository [...]
> >
> >But what if a WC can be accessed from different machines [...]?
>
> Then:
> - The config option should be designed never to assume or depend on the 
> pristine store being in a particular state (such as fully populated).
> - The user might want different behaviour on different machines, or the same 
> on all.
> - The patch I posted yesterday in a separate thread allows the user to set 
> the config option in the user config or per-wc config.
> - I noticed we already have some other config options in the '[working-copy]' 
> config section. We probably should allow the user to set those per-wc too.
> - Julian

Hmm. I think the pristine-fetching strategy that is chosen for a
particular working copy should be a property of that working copy. That's
because it has a "persistent" impact on that working copy. Changing
that strategy (if we would support that) severely impacts the disk
layout of that particular working copy. It's not just a runtime thing,
like using "exclusive sqlite locks" or some such (leaves no trace for
the next user).

If it would be a runtime setting, and Alice and Bob would both work on
the same working copy, and the former has "pristine-fetching=full" and
the latter "pristine-fetching=lazy" (or some detailed strategy with
patterns, whatever), the working copy would be changed severely every
time one or the other touches it.

So I think the chosen pristine-fetching strategy for a working copy
should be stored in the WC itself, probably in wc.db.

However, we would still need related runtime config options. But I see
them as "defaults" for when creating a new working copy. Perhaps these
belong in a [working-copy defaults] or [working-copy creation]
section, as opposed to the [working-copy] section which is more about
runtime behaviour.

Basically:
  - The chosen pristine-fetching strategy should be a persistent
property of the WC, to be chosen at creation time.
  - Defaults for this should be part of our runtime-config area (and
perhaps also options for 'svn checkout').
  - We might introduce ways to change the setting of a given WC (but
it's not a must have for the first iteration, I guess)
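
A small model of the three points above, in Python. Everything here is hypothetical: the setting name and values are the ones floated in this mail, the dict stands in for whatever wc.db would store, and none of the function names exist in Subversion.

```python
VALID_STRATEGIES = ("full", "lazy")  # names proposed in this thread


def choose_strategy(checkout_option=None, config_default=None):
    """A checkout option beats the runtime-config default, which beats 'full'."""
    for value in (checkout_option, config_default, "full"):
        if value is not None:
            if value not in VALID_STRATEGIES:
                raise ValueError(f"unknown pristine-fetching value: {value!r}")
            return value


def create_wc(checkout_option=None, config_default=None):
    """Record the choice in the (hypothetical) per-WC settings at creation."""
    return {"pristine-fetching": choose_strategy(checkout_option, config_default)}


def strategy_for(wc_settings):
    """Later operations consult only the WC, never the caller's config.

    A working copy without the setting behaves as 'full', like today.
    """
    return wc_settings.get("pristine-fetching", "full")
```

Because strategy_for() ignores the runtime config, Alice and Bob sharing one working copy would get the same behaviour regardless of their personal defaults.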


On Fri, Jan 28, 2022 at 6:11 PM Evgeny Kotkov wrote:
>
> Julian Foad  writes:
>
> > We could swap the scanning logic around to do the (quick) check for
> > missing pristines before deciding whether a (slower) file "stat" is
> > necessary. Positives: eliminates the redundant "stat" overhead which may
> > be significant in working trees containing many files. Negatives: some
> > re-work needed in the current implementation.
> >
> > Of these, the last one currently looks viable and useful.
> >
> > Does that one look promising to you?
>
> I might be missing something, but I don't yet see how we could save a stat().
>
> Currently, a pristine is hydrated if and only if the corresponding working
> file is modified.  Let's say we check if a pristine is hydrated beforehand.
> If we find out that pristine is dehydrated, we have to stat(), because if the
> file is modified, then we need to hydrate.  If we find out that pristine is
> hydrated, we still have to stat(), because if the file is no longer modified,
> then we need to dehydrate.

What seems important to me is that, if a WC is set to
pristine-fetching=full (or if SVN 1.15 is working with an older
working copy that lacks the "pristine-fetching" property, say because
no upgrade of the wc format was needed), the code here can assume
(like the old code) that the pristine is present. No need for an
extra stat: just assume it is there and, if not, error out like
today (incidentally, this is a very annoying error to run into (if a
pristine is accidentally deleted for some 

Re: A two-part vision for Subversion and large binary objects.

2022-01-29 Thread Julian Foad
Vincent Lefevre wrote:
>> [...] Specifying a pattern to match the WC path [or] per repository [...]
>
>But what if a WC can be accessed from different machines [...]?

Then:
- The config option should be designed never to assume or depend on the 
pristine store being in a particular state (such as fully populated).
- The user might want different behaviour on different machines, or the same on 
all.
- The patch I posted yesterday in a separate thread allows the user to set the 
config option in the user config or per-wc config.
- I noticed we already have some other config options in the '[working-copy]' 
config section. We probably should allow the user to set those per-wc too.
- Julian
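
A rough illustration of "user config or per-wc config" layering, not the posted patch: the '[working-copy]' section name is from the message above, while the option name "pristines-mode", its values and the effective_option() helper are assumptions made up for this sketch. Layers are read in order and a later layer overrides an earlier one.

```python
import configparser

# Hypothetical option name and values; only the section name is from the thread.
USER_CONFIG = """
[working-copy]
pristines-mode = off
"""

PER_WC_CONFIG = """
[working-copy]
pristines-mode = pristines-on-demand
"""


def effective_option(name, *layers):
    """Read config layers in order; a later layer overrides an earlier one."""
    parser = configparser.ConfigParser()
    for text in layers:
        parser.read_string(text)
    return parser.get("working-copy", name, fallback=None)


print(effective_option("pristines-mode", USER_CONFIG))
print(effective_option("pristines-mode", USER_CONFIG, PER_WC_CONFIG))
```

Because the per-WC layer travels with the working copy, a WC on a USB disk or an NFS mount would carry its own setting to every machine, while the user-level layer can still differ per machine.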


Re: A two-part vision for Subversion and large binary objects.

2022-01-28 Thread Vincent Lefevre
On 2022-01-27 17:21:42 +, Julian Foad wrote:
> This setting doesn't have to be persistent in the WC. It could be
> configured in client run-time config instead (e.g.
> ~/.subversion/config), as we previously mentioned.
> 
> If it's stored in the WC then we need to create some new UI to control
> the setting. I am not sure we want to do so just now. It does seem, if
> we were designing svn from scratch, such a setting would ideally be
> remembered in the WC and there would be UI to control it, analogous to
> "git config --system|--global|--local", but we are not there.
> 
> When we were thinking the setting would be of the form "on for all files
> larger than X" then the runtime config seemed more appropriate, as that
> form might be expected to apply to many WCs, possibly adding conditions
> such as "and path to WC matches Y" or "repository matches Z". Specifying
> the WC path is ugly as WCs can be moved and we haven't ever exposed any
> other identifier for a WC. Specifying a pattern to match the WC path is
> better. Specifying it per repository is very logical because the
> behavior is so dependent on the repo connection.

But what if a WC can be accessed from different machines (e.g. via
NFS or SSHFS), so potentially with different ~/.subversion/config
files? And what if a WC is stored on a USB drive/disk, which can
move to various machines?

-- 
Vincent Lefèvre
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)


Re: A two-part vision for Subversion and large binary objects.

2022-01-28 Thread Evgeny Kotkov
Julian Foad writes:

> SEMANTICS
>
> The user can choose one mode, per WC, from a list of options that may
> include:
>
>   - off: as in previous versions, no checking, just assume all pristines
> are present
>   - pristines-on-demand: fetch "wanted" pristines; discard others
>   - fetch-only: fetch any pristine that's absent; do not discard

My two cents on this would be that it might be easier to start with an on/off
property that gets stored when a working copy is created and doesn't change
afterwards, at least for now.

> Help please: Where should we properly store this setting in the WC?
>
> - in '.svn/entries' or '.svn/format'?
>   (Both currently contain a single line saying "12". We could add an
> extra line, or in general N extra lines each with one setting, for example.)
> - in a new file such as '.svn/pristines-on-demand'?
> - in the wc.db somewhere?

Thinking out loud, this sounds like a property associated with a specific
wc_id in the database.  I would say that this pretty much rules out options
of storing it outside the wc.db.
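
To make the "property associated with a specific wc_id" idea concrete, here is a small sketch using an in-memory SQLite database as a stand-in for wc.db. The schema is an assumption for illustration only: the wcroot table is a simplified stand-in, and the wc_settings table, its column and both helper functions are hypothetical, not part of any existing wc.db format.

```python
import sqlite3

db = sqlite3.connect(":memory:")  # stands in for .svn/wc.db
db.executescript("""
    -- Simplified stand-in for the table that assigns each WC root its wc_id.
    CREATE TABLE wcroot (id INTEGER PRIMARY KEY, local_abspath TEXT UNIQUE);
    -- Hypothetical table: one on/off flag per wc_id, written at creation.
    CREATE TABLE wc_settings (
        wc_id INTEGER PRIMARY KEY REFERENCES wcroot (id),
        pristines_on_demand INTEGER NOT NULL
            CHECK (pristines_on_demand IN (0, 1))
    );
""")


def create_wc(abspath, pristines_on_demand):
    """Register a WC root and store its flag in the same transaction."""
    with db:
        wc_id = db.execute(
            "INSERT INTO wcroot (local_abspath) VALUES (?)", (abspath,)
        ).lastrowid
        db.execute(
            "INSERT INTO wc_settings VALUES (?, ?)",
            (wc_id, int(pristines_on_demand)),
        )
    return wc_id


def pristines_on_demand(wc_id):
    """Look up the flag; a missing row is treated like an older WC (off)."""
    row = db.execute(
        "SELECT pristines_on_demand FROM wc_settings WHERE wc_id = ?", (wc_id,)
    ).fetchone()
    return bool(row[0]) if row else False


wc_id = create_wc("/home/user/project", True)
print(pristines_on_demand(wc_id))
```

Keeping the flag next to the wc_id means it is written atomically with the rest of the WC metadata, which a separate file under '.svn/' could not guarantee.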


Thanks,
Evgeny Kotkov


Re: A two-part vision for Subversion and large binary objects.

2022-01-28 Thread Evgeny Kotkov
Julian Foad writes:

> We could swap the scanning logic around to do the (quick) check for
> missing pristines before deciding whether a (slower) file "stat" is
> necessary. Positives: eliminates the redundant "stat" overhead which may
> be significant in working trees containing many files. Negatives: some
> re-work needed in the current implementation.
>
> Of these, the last one currently looks viable and useful.
>
> Does that one look promising to you?

I might be missing something, but I don't yet see how we could save a stat().

Currently, a pristine is hydrated if and only if the corresponding working
file is modified.  Let's say we check if a pristine is hydrated beforehand.
If we find out that pristine is dehydrated, we have to stat(), because if the
file is modified, then we need to hydrate.  If we find out that pristine is
hydrated, we still have to stat(), because if the file is no longer modified,
then we need to dehydrate.
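
The argument above can be written out as a small decision table. This is only a restatement of the rule "a pristine is hydrated if and only if the working file is modified"; the function name and the action strings are made up for the sketch.

```python
def sync_action(hydrated, modified):
    """What a sync must do for one file under the current rule:
    a pristine is kept if and only if the working file is modified."""
    if modified and not hydrated:
        return "hydrate"
    if hydrated and not modified:
        return "dehydrate"
    return "nothing"


for hydrated in (False, True):
    outcomes = {sync_action(hydrated, modified) for modified in (False, True)}
    # Two distinct outcomes mean the answer depends on 'modified',
    # which is only known after a stat().
    print(hydrated, sorted(outcomes))
```

For either value of "hydrated" there are two possible outcomes, so knowing the pristine's state in advance never removes the need to find out whether the file is modified.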


Thanks,
Evgeny Kotkov


Re: A two-part vision for Subversion and large binary objects.

2022-01-28 Thread Julian Foad
Thanks for your replies, Evgeny. Replying to the "status walk" part.

(TL;DR: Could we optimise by doing the db scan before the stat?)

I think we want to ensure, as far as possible,

  - no significant performance degradation if user does not opt in to
the new feature;
  - overhead when enabled should not be disproportionate to the operation.

We can achieve the former simply by skipping the scanning/syncing steps
entirely when the user has not chosen pristines-on-demand mode. That is fine.

About the overhead, you suggest the overhead does not seem too high, at
least for now, and maybe you are right. But it can be subjective: some
people have considered the status walk too costly in the past and have
dedicated a lot of effort to reducing it. I have some ideas about
reducing this overhead anyway.

Some ways we could potentially optimise when it's enabled are:

  - Fetch pristines in a background thread;
  - Further limit the scan (by depth etc.);
  - Scan for missing pristines (quick db op) before statting files for mods.

We could fetch pristines in a background thread, making the "foreground"
operation thread wait, just in time, for each pristine before accessing
it. Positives: the end result is efficient. Negatives: we don't have
precedent for threaded operations, and they can be tricky, so unknown
and potentially large effort to complete it.
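
As a rough picture of the background-thread idea (in Python rather than Subversion's C, and with no claim about how it would fit the real code), the sketch below queues all fetches on a worker thread and has the foreground loop block only when it reaches a pristine that has not arrived yet. fetch_pristine() is a placeholder for a network fetch and the checksum strings are invented.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_pristine(checksum):
    """Placeholder for a network fetch; returns fake pristine contents."""
    return f"contents of {checksum}".encode()


def run_operation(wanted_checksums):
    results = []
    with ThreadPoolExecutor(max_workers=1) as pool:
        # Queue every fetch up front; the worker thread starts immediately.
        pending = {c: pool.submit(fetch_pristine, c) for c in wanted_checksums}
        for checksum in wanted_checksums:
            # Just-in-time wait: block only until *this* pristine has arrived.
            pristine = pending[checksum].result()
            results.append((checksum, len(pristine)))
    return results


print(run_operation(["aaa", "bbb", "ccc"]))
```

The foreground work on the first file overlaps with the download of the later ones, which is where the efficiency would come from; error handling and cancellation, the tricky parts mentioned above, are left out entirely.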

We could further limit the scan (by depth etc.). Positives: easy to
implement some steps in this direction. Negatives: only ever gets
us closer, but never all the way, towards fetching only what is really
needed; and risk of introducing buggy cases where it fetches too little.

We could swap the scanning logic around to do the (quick) check for
missing pristines before deciding whether a (slower) file "stat" is
necessary. Positives: eliminates the redundant "stat" overhead which may
be significant in working trees containing many files. Negatives: some
re-work needed in the current implementation.

Of these, the last one currently looks viable and useful.

Does that one look promising to you?

- Julian



Re: A two-part vision for Subversion and large binary objects.

2022-01-27 Thread Evgeny Kotkov
Julian Foad writes:

> Scanning with 'stat'
>
> I'm concerned about the implementation scanning the whole subtree,
> calling 'stat' on every file to determine whether the file is "changed"
> (locally modified). This is done in svn_wc__textbase_sync() with its
> textbase_walk_cb().
>
> It does this scan on every sync, which is twice on every syncing
> operation such as diff.
>
> Don't we already have an optimised scan for local modifications
> implemented in the "status" code? Could we re-use this?

In a few of my experiments, performance of textbase_sync() was more or
less comparable to a status walk.  So maybe it's not actually worthwhile
spending time on improving this part, at least for now.

Also, I tend to think that DRY doesn't really apply here, because a status
walk and a textbase sync are essentially different operations that just
happen to have something in common internally.  For example, a textbase
sync doesn't have to follow the tree structure and can be implemented
with an arbitrarily ordered walk over NODES.

> Premature Hydrating
>
> The present implementation "hydrates" (fetches missing pristines) every
> file within the whole subtree the operation targets. This is done by
> every major client operation calling svn_client__textbase_sync() before
> and afterwards.
>
> That is pessimistic: the operation may not actually touch all these
> files if limited in any way such as by
>
>   - depth filtering
>   - other filtering (changelist, properties-only, ...)
>   - terminating early (e.g. output piped to 'head')
>
> That introduces all the fetching overhead for the given subtree as a
> latency before the operation shows its results, which for something
> small at the root of the tree such as "svn diff --depth=empty
> --properties-only ./" may make a significant usability impact.
>
> Presumably we could add the depth and some other kinds of filtering to
> the tree walk. But that will always leave terminating early, and
> possibly other cases, sub-optimal.
>
> I would prefer a solution that defers the hydrating until closer to the
> moment of demand.

I think that fetching the pristine contents at the moment of demand is a
particularly problematic concept to pursue, because it implies that there is
a network request that can now happen at an unpredictable moment of time.
So any operation that may access the pristine contents has to be ready for
a network fetch.  Compared to that, fetching the required pristines before
the operation does not impose that kind of requirement on the existing code.
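
A toy model of the "fetch before, clean up after" structure may help here. Everything in it is an assumption for illustration: the dictionary-based working copy, the textbase_sync() stand-in and its allow_hydrate/allow_dehydrate switches are simplified inventions, not the branch's actual code.

```python
from contextlib import contextmanager

log = []


def textbase_sync(wc, allow_hydrate, allow_dehydrate):
    """Stand-in for a sync pass: fetch pristines of modified files,
    drop pristines of unmodified ones."""
    for path, modified in wc["files"].items():
        if modified and allow_hydrate and path not in wc["pristines"]:
            log.append(f"fetch {path}")      # the only network access
            wc["pristines"].add(path)
        elif not modified and allow_dehydrate and path in wc["pristines"]:
            log.append(f"discard {path}")
            wc["pristines"].discard(path)


@contextmanager
def synced(wc):
    textbase_sync(wc, allow_hydrate=True, allow_dehydrate=False)
    try:
        yield wc   # the operation body runs with pristines already local
    finally:
        textbase_sync(wc, allow_hydrate=False, allow_dehydrate=True)


wc = {"files": {"a.bin": True, "b.txt": False}, "pristines": {"b.txt"}}
with synced(wc):
    log.append("diff a.bin")   # no fetch can happen in here
print(log)
```

All network activity is confined to the first sync, so the code inside the "with" block needs no changes to cope with a fetch at an unpredictable moment. The cost, as discussed in this thread, is that the first sync cannot know which pristines the operation will really touch.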


Thanks,
Evgeny Kotkov


Re: A two-part vision for Subversion and large binary objects.

2022-01-27 Thread Karl Fogel

On 27 Jan 2022, Daniel Shahaf wrote:
> Hang on.  Why do you assume that if someone has big files, then they're
> necessarily all out in a one directory and all the accompanying texty
> (or otherwise diffable) files are all in another directory?  Sure,
> that's exactly kfogel's use-case (described upthread), but it's not the
> only way to structure a repository.

FYI, that's not our company's use case.

In fact, we have large non-diffable files spread all over in many
directories, and we have small-texty-diffable files also spread
out over many directories, and both kinds of file often co-exist
within the same directory.

Just correcting the record -- I don't think this one particular use case
is necessarily definitive for the feature or anything.  I just
wanted to make sure that you have accurate information.


Best regards,
-Karl


Re: A two-part vision for Subversion and large binary objects.

2022-01-27 Thread Julian Foad
Julian Foad wrote:
> I'm writing an initial patch to let the user control pristines-on-demand
> on and off per WC.

This setting doesn't have to be persistent in the WC. It could be
configured in client run-time config instead (e.g.
~/.subversion/config), as we previously mentioned.

If it's stored in the WC then we need to create some new UI to control
the setting. I am not sure we want to do so just now. It does seem, if
we were designing svn from scratch, such a setting would ideally be
remembered in the WC and there would be UI to control it, analogous to
"git config --system|--global|--local", but we are not there.

When we were thinking the setting would be of the form "on for all files
larger than X" then the runtime config seemed more appropriate, as that
form might be expected to apply to many WCs, possibly adding conditions
such as "and path to WC matches Y" or "repository matches Z". Specifying
the WC path is ugly as WCs can be moved and we haven't ever exposed any
other identifier for a WC. Specifying a pattern to match the WC path is
better. Specifying it per repository is very logical because the
behavior is so dependent on the repo connection.

Any good ideas anyone?

- Julian



Re: A two-part vision for Subversion and large binary objects.

2022-01-27 Thread Julian Foad
I'm writing an initial patch to let the user control pristines-on-demand
on and off per WC.

Here I will assume:

  - The user needs a way to populate all pristines throughout the whole
WC when they want to, for example before they go offline.

At first I thought we would have an option value that represents the
state where all pristines are definitely present, as in older formats.
But we want a setting that is under direct control of the user, not a
state that Subversion reports from an API (although we might want that
as well). The user may change the setting at any time, when some
pristines may be absent, so there is no "all pristines are present right
now" mode setting.


SEMANTICS

The user can choose one mode, per WC, from a list of options that may include:

  - off: as in previous versions, no checking, just assume all pristines
are present
  - pristines-on-demand: fetch "wanted" pristines; discard others
  - fetch-only: fetch any pristine that's absent; do not discard

If the user wishes to ensure all pristines in the WC are present, they
can set the "fetch-only" mode and then run some svn command that fetches
all missing pristines.

If the user wishes to change a WC from having all pristines present to
pristines-on-demand, keeping pristines only for files that are currently
modified and discarding the rest, they can set the "pristines-on-demand"
mode and then run some svn command that discards all "unwanted" pristines.

In both of those cases, it doesn't particularly matter which command the
user has to run, just so long as we ensure there is one that we can recommend.

Additional options are possible such as:

  - off-line: use existing pristines; do not fetch or discard

The "off" and "off-line" modes imply basically the same behaviour; where
they differ is in the expectation that all pristines are present when we
choose "off" mode. I am not yet sure if we will want to keep such a
distinction in the end.
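
The proposed modes can be tabulated by what each one permits. The mode names are those listed above; the ModeBehaviour type and its three field names are hypothetical, chosen only to make the comparison explicit.

```python
from typing import NamedTuple


class ModeBehaviour(NamedTuple):
    fetch_missing: bool      # may contact the repository for absent pristines
    discard_unwanted: bool   # may delete pristines of unmodified files
    expect_complete: bool    # expects every pristine to be present already


MODES = {
    "off":                 ModeBehaviour(False, False, True),
    "pristines-on-demand": ModeBehaviour(True,  True,  False),
    "fetch-only":          ModeBehaviour(True,  False, False),
    "off-line":            ModeBehaviour(False, False, False),
}

for name, behaviour in MODES.items():
    print(f"{name:20} {behaviour}")
```

Written this way, "off" and "off-line" differ only in the expectation column, which matches the doubt expressed above about whether the distinction is worth keeping.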


PERFORMANCE

The current 'pristines-on-demand' branch implementation does two scans of
the given WC subtree, one before and one after certain operations, as I
mentioned before.

  - "pristines-on-demand" mode: these scans are needed.

  - "off" and "off-line" modes: these can be skipped entirely.

  - "fetch-only" mode: the scan after the operation can be skipped,
while the scan before it will still be performed, even when all the
pristines (at least in the subtree) are already present.

Are we going to need to optimise until the cost is negligible at least
when pristines are all present, so that the user would never have need
to turn the feature "off" completely to match current performance?
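
The per-mode scan requirements listed above fit in one small function. This is a sketch of that list only; scans_needed() is a hypothetical helper, not a function in the branch.

```python
def scans_needed(mode):
    """Return (scan_before, scan_after) for one operation in the given mode."""
    if mode in ("off", "off-line"):
        return (False, False)    # nothing to fetch or discard
    if mode == "fetch-only":
        return (True, False)     # may need to fetch; never discards afterwards
    if mode == "pristines-on-demand":
        return (True, True)      # fetch before, discard after
    raise ValueError(f"unknown mode: {mode}")


for mode in ("off", "off-line", "fetch-only", "pristines-on-demand"):
    print(mode, scans_needed(mode))
```

Note that "fetch-only" still pays for the scan before the operation even when nothing is missing, which is the cost the question above is about.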


IMPLEMENTATION

My patch initially uses a file '.svn/pristines-on-demand' as a
place-holder for wherever we might choose to store the setting properly
(in wc.db for example).

Help please: Where should we properly store this setting in the WC?

- in '.svn/entries' or '.svn/format'?
  (Both currently contain a single line saying "12". We could add an
extra line, or in general N extra lines each with one setting, for example.)
- in a new file such as '.svn/pristines-on-demand'?
- in the wc.db somewhere?

Do we have any precedent of user controlled settings in the WC? I can't
find any.

- Julian



Re: A two-part vision for Subversion and large binary objects.

2022-01-27 Thread Julian Foad
In the messages I'm now replying to, basically we were debating details
of some potential use cases, in the context of how far a per-WC control
might or might not be adequate over the range of possible cases.

I started drafting a point-by-point reply but I think we may better use
our time agreeing that some cases would benefit from finer grained
control but we're not yet in a position to quantify it.

Just some selected inline responses:

Daniel Shahaf wrote:
> [...]
> Haven't you just moved your goalposts?  I quote:

I don't think so. I could have been unclear. The phrases "not 'huge'"
and "minority of the pristine space" do not appear to conflict. The
double negatives can be confusing.

[...]

>> I don't dispute that some cases exist where it would be nice to have
>> per-file control. [...]
> 
> Hang on.  Why do you assume that if someone has big files, then they're
> necessarily all out in a one directory and [...]

I don't, and can't see where I implied that.

- Julian



Re: A two-part vision for Subversion and large binary objects.

2022-01-27 Thread Daniel Shahaf
Julian Foad wrote on Wed, Jan 26, 2022 at 14:53:24 +:
> Daniel Shahaf wrote:
> > Julian Foad wrote on Fri, Jan 21, 2022 at 11:15:04 +:
> >> Premature Hydrating
> >>  
> >> The present implementation "hydrates" (fetches missing pristines) every
> >> file within the whole subtree the operation targets. [...]
> >  
> > Does it?  Looking at textbase_walk_cb(), it only sets REFERENCED to TRUE
> > for modified files.  [...]
> 
> I meant it fetches missing pristines that are deemed *wanted*, for all
> files within the tree. That limits it to modified files only, but all
> modified files (that don't yet have their pristines) not just those that
> will be touched by the operation.

Ah, it hydrates before depth and changelist filtering?

> > However, in cases such as «svn diff --diff-cmd=», fetching the
> > pristines (too) close to the time they are needed could result in having
> > to reopen RA sessions.
> 
> What would be a problem with that?

More user-visible delays, need to authenticate again, …

> How is it different from existing long-running diff scenarios?

What scenarios do you have in mind? «svn diff --diff-cmd=
URL1 URL2»?

In any case, the question isn't "Are we introducing a problem that
already exists [avoidably or otherwise] in other use-cases" but "Are we
introducing a problem that we can avoid introducing".

Cheers,

Daniel


Re: A two-part vision for Subversion and large binary objects.

2022-01-27 Thread Daniel Shahaf
Julian Foad wrote on Tue, Jan 25, 2022 at 21:43:44 +:
> Daniel Shahaf wrote:
> > Julian Foad wrote on Thu, Jan 20, 2022 at 21:03:02 +:
> >> The only case in which a simple per-WC setting might be unsatisfactory
> >> is the following combination:
> >  
> > Why would it be the only case?
> 
> I assert that per-WC control suffices if any of the conditions I listed
> is false.

I understood the form of your argument; I just didn't understand why the
argument was correct.  Saying that the case you've outlined is the
_only_ one, that there isn't _any_ exception, is a non-trivial claim.
(For instance, that's exactly the claim to fame of the Ω(n log n) lower
bound on comparison-based sorting.)

> > I agree that that subset's pristines are necessarily able to be stored
> > locally at least from time to time, but no more than that.  It's not
> > _necessarily_ possible to store those files' pristines permanently [...]
> 
> You rightly point out that cases may exist where the pristines-wanted
> subset is only needed some of the time, and the rest of the time it's
> important to recover that space for other uses. That implies the
> pristines-wanted subset is "huge" -- otherwise by definition the space
> they occupy would not be unacceptable to store permanently.
> 
> When you need those pristines, it would therefore be OK to disable
> pristines-on-demand for the whole WC, because that isn't hugely worse
> than if you could choose just the subset. (Saving a minority of
> the pristines space is not a driving requirement for this feature, even
> if it would be nice to have.)

Haven't you just moved your goalposts?  I quote:

> > >- the WC data set is "huge" […] in total; and
> > >- there is a subset of files […]
> > >- that subset of files is not "huge" in total; and

The subset of the files was "not 'huge' in total" upthread and is
responsible for "a minority of the pristine space" here.  Which is it?
We can't agree on handling this use-case until we agree on what this
use-case is.

> In those cases, switching the WC between pristines-present and
> pristines-on-demand would be necessary. Such "switching" is probably a
> strong requirement anyway, even outside this case, as I should think it
> would be considered poor UX if it were not possible to change one's mind
> without a re-checkout.

Even if it's poor UX, we should still ask whether this poor UX would or
wouldn't be a good exchange for the engineering effort of implementing
toggleability.  That's comparable to how having «svn upgrade» and
«svnadmin upgrade» at all means there could be bugs that affect upgraded
wc's/repositories but not new ones.  (For extra fun, the bug could be
latent and only surface after a further in-place upgrade.  Debian has
had such bugs, e.g., 
.)

> > Let me try to sketch a use-case for wanting only _some_ files to be
> > pristineless. [...]
> 
> I don't dispute that some cases exist where it would be nice to have
> per-file control. I still see it as merely "nice" and still do not see
> how it could be considered essential or very important.
> 

Hang on.  Why do you assume that if someone has big files, then they're
necessarily all out in a one directory and all the accompanying texty
(or otherwise diffable) files are all in another directory?  Sure,
that's exactly kfogel's use-case (described upthread), but it's not the
only way to structure a repository.

For instance, take our own /repos/asf/subversion/site/publish/download
and /repos/dist/release/subversion.  Those are separate repositories,
but Subversion (the software) does not dictate that.  If Infra had
decided to do things differently, to put the artifacts in the /site
directory, then only dev@ subscribers who participate in "… up for
signing/testing" threads would have had a reason to download the full
/site tree; everyone else (say, translators) would have needed only the
texty bits.  [That's actually a use-case for server-provided viewspecs;
and «svn checkout --depth=infinity» would override them…]

> It's not completely clear to me what you mean to draw out in your
> 'libsvn*.so' example. It seems to be a case where the user wants
> efficient 'commit' of a few files which are large enough to care about
> that operation (let's assume they are diffable enough for their
> pristines to be useful) -- but make up only a small subset of the total
> WC size so omitting pristines of the majority of the WC, which is huge,
> would be important to save space. Yes, that's a case where subset
> control would be nice.
> 
> But I would argue that, for that case, there are alternative and even better
> solutions than managing pristines. The user could make the WC shallow
> instead, omitting the pristines *and* working files of release branches
> they don't currently need to work on while behind the narrow downlink.

My use-case involved a user who wished to have 1.13's binaries available
to them offline (so they 

Re: A two-part vision for Subversion and large binary objects.

2022-01-26 Thread Julian Foad
Daniel Shahaf wrote:
> Julian Foad wrote on Fri, Jan 21, 2022 at 11:15:04 +:
>> I'm concerned about the implementation scanning the whole subtree,
>> calling 'stat' on every file [...]
>> Don't we already have an optimised scan for local modifications
>> implemented in the "status" code?
>  
> This? — [subversion/libsvn_wc/status.c]
> [...]
>  
> This still does a stat() on every file; how else would it obtain
> dirent->mtime?  It doesn't do open()/read()/memcmp().

The main point is DRY: maintain exactly one implementation so that both
the precise functionality and the performance optimisations are shared.

An example of the sort of thing the optimised code *could* potentially
do would be to obtain the mtimes of all the files in a directory in one
single read-dir call, depending on APR/OS/FS details.
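
As an illustration of the "one directory pass" idea outside APR, Python's os.scandir() behaves in just this OS-dependent way: on Windows the size and mtime come with the directory listing, while on POSIX systems each entry.stat() call still costs a system call. The helper name below is made up, and nothing here says what APR itself can or cannot do.

```python
import os
import tempfile


def dir_sizes_and_mtimes(path):
    """Map each regular file in 'path' to (size, mtime) in one directory pass."""
    result = {}
    with os.scandir(path) as entries:
        for entry in entries:
            if entry.is_file(follow_symlinks=False):
                # On Windows this data comes from the directory listing itself;
                # on POSIX systems each call still costs one stat.
                st = entry.stat(follow_symlinks=False)
                result[entry.name] = (st.st_size, st.st_mtime)
    return result


with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "a.txt"), "w") as f:
        f.write("hello")
    print(dir_sizes_and_mtimes(tmp))
```

A single shared walker would be the natural place to put such a platform-specific shortcut, so that both the status code and the textbase sync benefit from it.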

TODO: deduplicate this walker.


>> Premature Hydrating
>>  
>> The present implementation "hydrates" (fetches missing pristines) every
>> file within the whole subtree the operation targets. [...]
>  
> Does it?  Looking at textbase_walk_cb(), it only sets REFERENCED to TRUE
> for modified files.  [...]

I meant it fetches missing pristines that are deemed *wanted*, for all
files within the tree. That limits it to modified files only, but all
modified files (that don't yet have their pristines) not just those that
will be touched by the operation.

> However, in cases such as «svn diff --diff-cmd=», fetching the
> pristines (too) close to the time they are needed could result in having
> to reopen RA sessions.

What would be a problem with that? How is it different from existing
long-running diff scenarios?

- Julian



Re: A two-part vision for Subversion and large binary objects.

2022-01-25 Thread Nathan Hartman
On Mon, Jan 24, 2022 at 11:50 AM Karl Fogel wrote:

> On 24 Jan 2022, Daniel Shahaf wrote:
>
> >Which brings me to a less contrived / more general point: What if
> >the user _knows in advance_ they'll need a pristine?  Shouldn't
> >there be:
> >
> >- a way to say "I'm about to change a large, diffable file;
> >  detranslate it into the pristine store before I touch it"?
> >  Perhaps even make files read-only at the OS level (as with
> >  svn:needs-lock) so the user doesn't modify the file accidentally
> >  until its pristine has been set aside?
>
> 'svn hydrate'?  (I can't even tell if I'm joking.)



Suppose that in the future we get a "history depth" feature. I vaguely
recall a discussion about that. It is possible we even have a feature
request filed. In any event the idea of history depth is like that of
directory tree depth, but for locally cached history. To illustrate, SVN
today always has a history depth of one, meaning that pristines serve as a
local cache of BASE. Pristines-on-demand can be seen as a history depth of
zero, meaning no local cache (until needed for specific files). A future
history depth feature may make it possible to locally cache the last n
revisions, or even infinity (full DVCS).

Perhaps the command to hydrate/dehydrate pristines should be designed with
this possibility in mind, i.e., not limit it only to a true/false value.

E.g.,

# "Normal" pristines
$ svn update --set-history-depth=immediates .

# Pristines-on-demand:
$ svn update --set-history-depth=none .

With the limitation, currently, that it may be applied only to the wc root
since the underlying logic is currently a wc-wide on/off switch.

Cheers,
Nathan


Re: A two-part vision for Subversion and large binary objects.

2022-01-25 Thread Julian Foad
Replying to selected points from the last few messages.

Daniel Shahaf wrote:
> Julian Foad wrote on Thu, Jan 20, 2022 at 21:03:02 +:
>> The only case in which a simple per-WC setting might be unsatisfactory
>> is the following combination:
>  
> Why would it be the only case?

I assert that per-WC control suffices if any of the conditions I listed
is false.

> [...]
>> - there is a subset of files on which the user needs to work
>> (requiring diffs, etc.) often enough that fetching their pristines "on
>> demand" is a problem; and
>  
> Disagree.  Why would fetching on-demand being a problem _necessarily_ be
> caused by an "often enough" need to work on some files?  Why couldn't
> on-demand fetching pristines be a problem for files that change once in
> a blue moon?

We agree. If even once is a problem in a particular case, then once
qualifies as "often enough" in that case. Maybe my wording or my lower
limit wasn't clear.

> For example, [...] some files that are large and diffable and
> may need to be edited and diffed while behind a narrow downlink.

Yes, this is an example of the case I am describing where a simple
per-WC setting might be unsatisfactory.

>> - that subset of files is not "huge" in total; and
>  
> I agree that that subset's pristines are necessarily able to be stored
> locally at least from time to time, but no more than that.  It's not
> _necessarily_ possible to store those files' pristines permanently [...]

You rightly point out that cases may exist where the pristines-wanted
subset is only needed some of the time, and the rest of the time it's
important to recover that space for other uses. That implies the
pristines-wanted subset is "huge" -- otherwise by definition the space
they occupy would not be unacceptable to store permanently.

When you need those pristines, it would therefore be OK to disable
pristines-on-demand for the whole WC, because that isn't hugely worse
than if you could choose just the subset. (Saving a minority of
the pristines space is not a driving requirement for this feature, even
if it would be nice to have.)

In those cases, switching the WC between pristines-present and
pristines-on-demand would be necessary. Such "switching" is probably a
strong requirement anyway, even outside this case, as I should think it
would be considered poor UX if it were not possible to change one's mind
without a re-checkout.

>> - that subset of files can be distinguished from the rest by metadata.
>  
> Why is this necessarily the case [...] this seems to rule
> out solutions that involve hardcoded lists (à la svn:ignore [...]

I meant any sort of metadata including such lists, but basically you're
right that this is not really relevant to describing the use case.

> Let me try to sketch a use-case for wanting only _some_ files to be
> pristineless. [...]

I don't dispute that some cases exist where it would be nice to have
per-file control. I still see it as merely "nice" and still do not see
how it could be considered essential or very important.

It's not completely clear to me what you mean to draw out in your
'libsvn*.so' example. It seems to be a case where the user wants
efficient 'commit' of a few files which are large enough to care about
that operation (let's assume they are diffable enough for their
pristines to be useful) -- but make up only a small subset of the total
WC size so omitting pristines of the majority of the WC, which is huge,
would be important to save space. Yes, that's a case where subset
control would be nice.

But I would argue that, for that case, there are alternative and even better
solutions than managing pristines. The user could make the WC shallow
instead, omitting the pristines *and* working files of release branches
they don't currently need to work on while behind the narrow downlink.
Or they could have their main WC pristine-less and check out a separate
WC, with pristines, containing just the minority parts that they need offline.

> Which brings me to a less contrived / more general point: What if the
> user _knows in advance_ they'll need a pristine?  Shouldn't there be: —
>  
> - a way to say "I'm about to change a large, diffable file; detranslate
>  it into the pristine store before I touch it"?  Perhaps even make
>  files read-only at the OS level (as with svn:needs-lock) [...]?

> - a way to say "[...] download a pristine for this file now"?

> - «svn commit --keep-pristines» [...]?

At one level these are some logical extensions to the control that users
would have over the pristine-management process. These additional
controls might be valuable in certain cases.

In the context of the main driving use cases (fast connectivity to the
repo) these would be marginal tweaks with no real benefit. They could
have real benefits in the scenarios that we looked at above where there
is neither plenty of space nor plenty of connectivity, and when per-file
control of pristines is available.

We should consider making sure the API 

Re: A two-part vision for Subversion and large binary objects.

2022-01-25 Thread Daniel Shahaf
Julian Foad wrote on Fri, Jan 21, 2022 at 11:15:04 +:
> Scanning with 'stat'
> 
> I'm concerned about the implementation scanning the whole subtree,
> calling 'stat' on every file to determine whether the file is "changed"
> (locally modified). This is done in svn_wc__textbase_sync() with its 
> textbase_walk_cb().
> 
> It does this scan on every sync, which is twice on every syncing
> operation such as diff.
> 
> Don't we already have an optimised scan for local modifications
> implemented in the "status" code?

This? —

   [subversion/libsvn_wc/status.c]
   /* If the on-disk dirent exactly matches the expected state
      skip all operations in svn_wc__internal_text_modified_p()
      to avoid an extra filestat for every file, which can be
      expensive on network drives as a filestat usually can't
      be cached there */
   if (!info->has_checksum)
     text_modified_p = TRUE; /* Local addition -> Modified */
   else if (ignore_text_mods
            ||(dirent
               && info->recorded_size != SVN_INVALID_FILESIZE
               && info->recorded_time != 0
               && info->recorded_size == dirent->filesize
               && info->recorded_time == dirent->mtime))
     text_modified_p = FALSE;

This still does a stat() on every file; how else would it obtain
dirent->mtime?  It doesn't do open()/read()/memcmp().

> Could we re-use this?

textbase_walk_cb() calls check_file_modified() which has a very similar
size-and-mtime check right at the top.  So, we already repeat the logic;
we just implement it twice.

Reuse would be nice, of course.  If nothing else, we could at least add
comments to the two locations cross-referencing them.
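
To make the reuse idea concrete, here is a rough sketch of a single size-and-mtime helper that both call sites could share. It is a Python model, not the real C code: `Recorded`, `record()`, `unchanged_by_stat()` and `INVALID_FILESIZE` are made-up names for illustration, and the real check lives in the functions named above.

```python
import os
from dataclasses import dataclass

INVALID_FILESIZE = -1  # hypothetical stand-in for SVN_INVALID_FILESIZE


@dataclass
class Recorded:
    """Size and mtime as they were when the file was last known clean."""
    size: int
    mtime_ns: int


def record(path):
    """Capture the current size and mtime of PATH (one stat call)."""
    st = os.stat(path)
    return Recorded(st.st_size, st.st_mtime_ns)


def unchanged_by_stat(path, rec):
    """Return True if one stat() is enough to call PATH unmodified.

    False means "cannot tell from stat alone": the caller would then
    have to fall back to comparing contents against the pristine.
    """
    try:
        st = os.stat(path)
    except FileNotFoundError:
        return False
    return (rec.size != INVALID_FILESIZE
            and rec.mtime_ns != 0
            and rec.size == st.st_size
            and rec.mtime_ns == st.st_mtime_ns)
```

Note that the helper still costs one stat() per file; what it avoids is open()/read() and, in the pristines-on-demand case, a fetch.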

> Premature Hydrating
> 
> The present implementation "hydrates" (fetches missing pristines) every
> file within the whole subtree the operation targets. This is done by
> every major client operation calling svn_client__textbase_sync() before
> and afterwards.
> 

Does it?  Looking at textbase_walk_cb(), it only sets REFERENCED to TRUE
for modified files.  If I understand correctly textbase_walk_cb() and
the docstring of svn_wc__db_textbase_walk(), something along the lines of
.
svn revert -R ./
echo foo > subversion/tests/README
svn diff
.
would fetch the pristine only for that one file, wouldn't it?

Sorry, I haven't got time to test this right now.
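
As a toy model of that reading (untested against the real code, and with hypothetical names throughout), the walk would select pristines roughly like this:

```python
from dataclasses import dataclass


@dataclass
class Node:
    path: str
    modified: bool      # outcome of the size-and-mtime / content check
    has_pristine: bool  # pristine already present in the local store


def pristines_to_fetch(nodes):
    """Return the paths whose pristines a sync would need to download.

    Assumption: a pristine is "referenced" only when its file is
    locally modified, so unmodified files never trigger a fetch.
    """
    to_fetch = []
    for node in nodes:
        referenced = node.modified
        if referenced and not node.has_pristine:
            to_fetch.append(node.path)
    return to_fetch


# After "svn revert -R ./" nothing is modified; then one file is edited.
wc = [
    Node("subversion/tests/README", modified=True, has_pristine=False),
    Node("subversion/libsvn_wc/status.c", modified=False, has_pristine=False),
    Node("INSTALL", modified=False, has_pristine=False),
]
needed = pristines_to_fetch(wc)
```

Under that assumption `needed` contains only the README, which is the behaviour I would expect from the commands above.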

> That is pessimistic: the operation may not actually touch all these
> files if limited in any way such as by
> 
>   - depth filtering
>   - other filtering (changelist, properties-only, ...)
>   - terminating early (e.g. output piped to 'head')
> 
> That introduces all the fetching overhead for the given subtree as a
> latency before the operation shows its results, which for something
> small at the root of the tree such as "svn diff --depth=empty
> --properties-only ./" may make a significant usability impact.
> 
> Presumably we could add the depth and some other kinds of filtering to
> the tree walk. But that will always leave terminating early, and
> possibly other cases, sub-optimal.
> 
> I would prefer a solution that defers the hydrating until closer to the
> moment of demand.

Agree that from a UX perspective, it would be nice to avoid a long delay
at the start of an operation.

However, in cases such as «svn diff --diff-cmd=», fetching the
pristines (too) close to the time they are needed could result in having
to reopen RA sessions.  In this case, perhaps it would make sense to
download the pristines in the background in a separate thread (at least
in case APR_HAS_THREADS)?
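
A minimal sketch of the background-download idea, in Python rather than APR threads. `fetch_pristine()` is a hypothetical stand-in for a download over an RA session, and the single worker thread models keeping one session open for the whole operation:

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_pristine(path):
    """Hypothetical stand-in for fetching one pristine from the server."""
    return f"pristine of {path}".encode()


def diff_with_prefetch(paths, fetch=fetch_pristine):
    """Yield (path, pristine) in order while fetching ahead in one thread."""
    # One worker: all fetches share a single (modelled) session.
    pool = ThreadPoolExecutor(max_workers=1)
    try:
        futures = [(p, pool.submit(fetch, p)) for p in paths]
        for path, future in futures:
            # Blocks only if this particular pristine has not arrived yet.
            yield path, future.result()
    finally:
        # If the consumer stops early (e.g. output piped to 'head'),
        # drop the fetches that have not started.  Needs Python >= 3.9.
        pool.shutdown(wait=True, cancel_futures=True)
```

The consumer sees the first result as soon as the first pristine arrives, instead of waiting for the whole subtree, and an early exit cancels whatever is still queued.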

> Evgeny, have you looked into these possibilities at all? What are your
> thoughts about these?

Cheers,

Daniel


Re: A two-part vision for Subversion and large binary objects.

2022-01-24 Thread Daniel Shahaf
Karl Fogel wrote on Tue, Jan 25, 2022 at 01:22:07 -0600:
> On 25 Jan 2022, Daniel Shahaf wrote:
> > We _could_ make them in a way that doesn't require us to provide
> > compatibility for them forever, such as by releasing them as
> > "experimental" (cf.
> > https://subversion.apache.org/docs/release-notes/1.10#shelving),
> > by releasing an alpha or a nightly and soliciting feedback for
> > that, or by prototyping in Python what can be so prototyped.
> 
> We could, but I also have the feeling that after a while (a few months?)  of
> usage of the basic implementation, we'll all have a pretty good idea of what
> improvements would be most helpful.

Sure.  I was just thinking that getting the functionality into a tarball
would mean more people would be able to test it.

> > (Aside: "explicitly-hydrated" is a bit of a mouthful.  I considered just
> > referring to these as "somatic" and "autonomous" pristines…)
> 
> Yes... that would be so... clarifying...:-)

I know; that's why I didn't use those terms.  But they _would_ have been
very greppable :)

Cheers,

Daniel


Re: A two-part vision for Subversion and large binary objects.

2022-01-24 Thread Karl Fogel

On 25 Jan 2022, Daniel Shahaf wrote:
> Karl Fogel wrote on Mon, Jan 24, 2022 at 12:35:10 -0600:
> > I'm partly just thinking out loud here, to stimulate us all to think.  None
> > of this affects the initial, whole-WC implementation, and of course let's
> > keep in mind that the *main* use case will be already well-served by that
> > initial implementation.  These further improvements are for the future, and
> > perhaps we shouldn't even make them until we've all had some experience with
> > the initial simple UI.
> 
> +1 to every single sentence of this paragraph.

*whew*

> We _could_ make them in a way that doesn't require us to provide
> compatibility for them forever, such as by releasing them as
> "experimental" (cf.
> https://subversion.apache.org/docs/release-notes/1.10#shelving),
> by releasing an alpha or a nightly and soliciting feedback for
> that, or by prototyping in Python what can be so prototyped.

We could, but I also have the feeling that after a while (a few 
months?)  of usage of the basic implementation, we'll all have a 
pretty good idea of what improvements would be most helpful.

> (Aside: "explicitly-hydrated" is a bit of a mouthful.  I considered just
> referring to these as "somatic" and "autonomous" pristines…)

Yes... that would be so... clarifying...:-)

Best regards,
-Karl


Re: A two-part vision for Subversion and large binary objects.

2022-01-24 Thread Daniel Shahaf
Karl Fogel wrote on Mon, Jan 24, 2022 at 12:35:10 -0600:
> On 24 Jan 2022, Daniel Shahaf wrote:
> > > > - «svn commit --keep-pristines», in case Alice has two logical
> > > > changes that she'd like to make in separate commits?
> > > 
> > > Maybe, or maybe one just uses 'svn dehydrate' ('svn hydrate
> > > --dehydrate' :-) ) when one is done working on the file.
> > 
> > I figured «svn commit --keep-pristines» could be used not only after
> > manually hydrating but also after implicit hydrating (e.g., after
> > «echo foo >> iota && svn diff iota»).
> 
> Before implementing these options (which obviously won't happen in the first
> iteration anyway), we should think carefully about naming and about how much
> of the underlying implementation detail we want to expose.

+1; I thought so too, just didn't say so explicitly.

> And there are broader UI/UX questions:
> 
> Maybe once someone starts working on a (large) file, they are likely to
> continue working on it.  In which case, we should keep the pristine until
> told to drop it.  Then 'svn cleanup' could have an additional behavior:
> "remove pristine for any unmodified file that would *normally* not have a
> pristine (except that the user manually caused it to have a pristine due to
> some special action or circumstance)".
> 
> This is a bit different from the current '--vacuum-pristines' option of 'svn
> cleanup', by the way, though maybe there should be some connection.

Let's disentangle this.

In 1.14, «svn cleanup --vacuum-pristines» is used to garbage collect
(GC) unreferenced pristines.  The need to run it manually has been
documented as a bug since 
https://subversion.apache.org/docs/release-notes/1.7#wc-pristines
(10 years ago).

In pristines-on-demand (as it stands / as currently envisioned)
a pristine may be either absent altogether, present because the file
_had been_ locally modified ("implicitly hydrated"), or present because
the user advised us that the file is _about to become_ locally modified
("explicitly hydrated").

All three cases fall under "Return to the state just after a fresh
checkout".  However, as you say, removing implicitly-downloaded
pristines is closer to GC'ing pristines, because both of these cases
dehydrate pristines that had been hydrated by the library's logic,
whereas removal of an explicitly-downloaded pristine undoes an explicit
user action.

As to UI…

- The GC case has no downsides unless the user downdates or switches the
  wc, in which case the unreferenced pristines might be used.

  [Remind me: is this reuse possible over all RA layers?]

- In the other two cases, the user is making an informed choice to
  dehydrate.  That is similar to «revert» in that an accidental use may
  have non-trivial costs, so we can consider adopting some of «revert»'s
  behaviour: defaulting to --depth=empty and not using
  svn_opt_push_implicit_dot_target(), particularly if the file in
  question has local mods.

  I don't immediately see a reason to distinguish between explicitly-
  and implicitly-downloaded files at the UI level in this context.
  However, right now the question is whether we should make this
  distinction at the library implementation level.  (For instance, the
  "keep the pristine until told to drop it" scenario implies being able
  to make such a distinction.)

  More precisely, the question is whether our design permits us to add
  this functionality to the library if and when a UI that needs it
  comes to be implemented.

  I think it does.  Suppose we release 1.15 without library support for
  distinguishing implicitly/explicitly-hydrated pristines, and then want
  to add such support in 1.16.  I think 1.16 will be able to implement
  this without a format bump if it adds to the PRISTINE table a column
  declared as «manually_hydrated INTEGER NOT NULL DEFAULT ( 0 )»,
  provided 1.16 will handle the possibility that an old, 1.15 client
  will dehydrate the pristine in spite of the user's instruction.  (The
  DEFAULT constraint on column definitions is supported by the oldest
  SQLite version supported by 1.14.)

  So, unrolling the chain of logic, I think we'll be able to teach the
  backend to distinguish explicitly/implicitly hydrated pristines if and
  when the UI requires this.
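
A small sketch of that schema argument, using Python's sqlite3 module and an in-memory database. The two-column PRISTINE table here is a simplified assumption (the real table has more columns), and the default is spelled as a bare literal for simplicity:

```python
import sqlite3

db = sqlite3.connect(":memory:")

# Simplified stand-in for the PRISTINE table as a "1.15" client creates it.
db.execute(
    "CREATE TABLE PRISTINE ("
    " checksum TEXT NOT NULL PRIMARY KEY,"
    " size INTEGER NOT NULL)"
)
db.execute("INSERT INTO PRISTINE (checksum, size) VALUES ('sha1-aaa', 10)")

# A "1.16" client adds the column in place; existing rows get the default.
db.execute(
    "ALTER TABLE PRISTINE"
    " ADD COLUMN manually_hydrated INTEGER NOT NULL DEFAULT 0"
)

# An old client that has never heard of the column can still insert rows.
db.execute("INSERT INTO PRISTINE (checksum, size) VALUES ('sha1-bbb', 20)")

# A new client records an explicit hydration.
db.execute(
    "INSERT INTO PRISTINE (checksum, size, manually_hydrated)"
    " VALUES ('sha1-ccc', 30, 1)"
)

rows = db.execute(
    "SELECT checksum, manually_hydrated FROM PRISTINE ORDER BY checksum"
).fetchall()
```

Rows written before the change or by the old client read back with manually_hydrated = 0, so both clients can share the table. What the sketch cannot show is the caveat above: the old client may still delete a row that the new client marked as explicitly hydrated.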

>

Aside: Can we have two working copies share _only_ their pristine
stores?  That is, continue to have separate wc.db files, but use the
same on-disk pristine store?  That might be easier to implement than
shared wc.db's, and would be useful if multiple wc's need the same file
hydrated.

(Or sharing could happen at a lower level, as with
http://scord.sourceforge.net/, mentioned on issue #525 — although this
particular solution doesn't support wc-ng (≥1.7).)

> And maybe a --interactive option would be good, so the user can
> interactively choose which pristines to drop and which not!

Perhaps; but for now, we can let people write scripts around svn to
achieve this, like how svn-bisect(1) and backport.pl are external.
That's 

Re: A two-part vision for Subversion and large binary objects.

2022-01-24 Thread Karl Fogel

On 24 Jan 2022, Daniel Shahaf wrote:
> Sure!  And a script for running «hydrate» automatically could be called
> "submerge". :)
> 
> And I guess we'll want `svn info` to grow a "Last watered at:" line.

As long as we alias 'svn mop' for 'svn cleanup', it's fine with me :-).

> Agreed, but perhaps have a --offline-only option to let people say
> "Error out if you can't complete the operation without contacting the
> server".  That might be useful for «revert», «diff», etc., as well.

Yep.

> Use-case: to request "fail fast" behaviour rather than commence
> a (known to the user to be) long/expensive network retrieval.

Indeed, other tools implement that option, for that exact purpose. 
I think it's quite reasonable.
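
A minimal sketch of the "fail fast" behaviour, assuming a dict-like local pristine store and a caller-supplied fetch function. `OfflineError`, `get_pristine()` and the `offline_only` parameter are hypothetical names, not existing Subversion APIs:

```python
class OfflineError(Exception):
    """Raised when an operation would need the server but may not use it."""


def get_pristine(checksum, store, fetch, offline_only=False):
    """Return the pristine for CHECKSUM, contacting the server only if allowed.

    STORE maps checksums to locally available pristine contents.
    FETCH is called to download a missing pristine.
    """
    if checksum in store:
        return store[checksum]
    if offline_only:
        # Fail before starting a possibly long or expensive retrieval.
        raise OfflineError(f"pristine {checksum} is not available locally")
    data = fetch(checksum)
    store[checksum] = data  # keep it for later operations
    return data
```

The same guard would sit in front of any operation that may hydrate, which is why one option could serve «revert», «diff» and the rest.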


> > > - «svn commit --keep-pristines», in case Alice has two logical changes
> > >   that she'd like to make in separate commits?
> > 
> > Maybe, or maybe one just uses 'svn dehydrate' ('svn hydrate
> > --dehydrate' :-) ) when one is done working on the file.
> 
> I figured «svn commit --keep-pristines» could be used not only after
> manually hydrating but also after implicit hydrating (e.g., after
> «echo foo >> iota && svn diff iota»).


Before implementing these options (which obviously won't happen in 
the first iteration anyway), we should think carefully about 
naming and about how much of the underlying implementation detail 
we want to expose.


And there are broader UI/UX questions:

Maybe once someone starts working on a (large) file, they are 
likely to continue working on it.  In which case, we should keep 
the pristine until told to drop it.  Then 'svn cleanup' could have 
an additional behavior: "remove pristine for any unmodified file 
that would *normally* not have a pristine (except that the user 
manually caused it to have a pristine due to some special action 
or circumstance)".


This is a bit different from the current '--vacuum-pristines' 
option of 'svn cleanup', by the way, though maybe there should be 
some connection.


And maybe a --interactive option would be good, so the user can 
interactively choose which pristines to drop and which not!


I'm partly just thinking out loud here, to stimulate us all to 
think.  None of this affects the initial, whole-WC implementation, 
and of course let's keep in mind that the *main* use case will be 
already well-served by that initial implementation.  These further 
improvements are for the future, and perhaps we shouldn't even 
make them until we've all had some experience with the initial 
simple UI.


Best regards,
-Karl

