Re: svn commit: r1915519 [1/4] - in /subversion/branches/pristine-checksum-salt: ./ build/ build/ac-macros/ build/generator/ build/generator/swig/ contrib/client-side/svn_load_dirs/ contrib/hook-scrip

2024-02-03 Thread Evgeny Kotkov via dev
Daniel Sahlberg  writes:

> Log:
> On branch pristine-checksum-salt:
>
> Catchup merge with trunk

`pristine-checksum-salt` is branched off from branches/pristine-checksum-kind
(rather than from trunk), so in this case it should probably be better to
merge the trunk changes into the parent branch first, and propagate them
into the `pristine-checksum-salt` branch from parent.


Thanks,
Evgeny Kotkov


Re: Changing the permission checks in libsvn_subr/io.c

2024-02-03 Thread Evgeny Kotkov via dev
Daniel Sahlberg  writes:

>> Index: subversion/libsvn_subr/io.c
>> ===
>> --- subversion/libsvn_subr/io.c (revision 1915064)
>> +++ subversion/libsvn_subr/io.c (working copy)
>> @@ -2535,7 +2535,14 @@ svn_io__is_finfo_read_only(svn_boolean_t *read_onl
>>apr_uid_t uid;
>>apr_gid_t gid;
>>
>> -  *read_only = FALSE;
>> +  *read_only = (access(file_info->fname, W_OK) != 0);

There are a few problems, because this approach adds a I/O syscall (that
checks access to a file by its name) into a fairly low-level function.

Previously this function was analyzing the file information from the passed-in
apr_finfo_t structure.  The new version, however, performs an I/O call, and
that leads to a couple of problems:

1) Potential performance regression, not limited to just the revert.
   For example, all calls to open_pack_or_rev_file() in libsvn_fs_fs would
   now start to make additional access() calls, thus slowing down the
   repository operations.

2) This adds an opportunity for a desynchronization/race between the
   information in apr_finfo_t and the result of access(), because access()
   works by a filename.  For example, in a case where a file is recreated
   between the two calls, its result will correspond to a completely
   different file, compared to the one that produced the apr_finfo_t.

Overall, I have a feeling that solving this specific edge-case might not be
worth introducing these regressions (which look significant, at least to me).


Thanks,
Evgeny Kotkov


Re: Switching from SHA1 to a checksum type without known collisions in 1.15 working copy format

2024-01-18 Thread Evgeny Kotkov via dev
Daniel Sahlberg  writes:

> As far as I understand, the point of multi-hash is to keep the WC format
> between versions (so older clients can continue to use the WC).

Just as a minor note, the working copies created using the implementation
on the `pristine-checksum-salt` branch don't multi-hash the contents, but
rather make the [single] used checksum kind configurable and persist it at
the moment when a working copy is created or upgraded.

> I need some help to understand how that would work in practice. Let's say
> that 1.15 adds SHAABC, 1.16 adds SHAXYZ. Then 1.17 drops SHA1. But...
> - A 1.17 client will only use SHAABC or SHAXYZ hashes.
> - A 1.16 client can use SHA1, SHAABC and SHAXYZ hashes.
> - A 1.15 client can only use SHA1 and SHAABC hashes.
>
> How can these work together? A WC created in 1.17 can't be used by a 1.15
> client and a WC created in 1.15 (with SHA1) can't be used by a 1.17 client.
> How is this different from bumping the format? How do we detect this?

In the current design available on the `pristine-checksum-salt` branch, the
supported checksum kinds are tied to a working copy format, and any supported
checksum kind may additionally use a dynamic salt.  For example, format 33
supports only SHA-1 (regular or dynamically salted), but a newer format 34
can add support for another checksum kind such as SHA-2 if necessary.

When an existing working copy is upgraded to a newer format, its current
checksum kind is retained as is (we can't rehash the content in a
`--store-pristine=no` case because the pristines are not available).

I don't know if we'll find ourselves having to forcefully phase out SHA-1
*even* for such working copies that retain an older checksum kind, i.e.,
it might be enough to use the new checksum kind only for freshly created
working copies.  However, there would be a few options to consider:

I think that milder options could include warning the user to check out a
new working copy (that would use a different checksum kind), and a harsher
option could mean adding a new format that doesn't support SHA-1 under
any circumstances, and declaring all previously available working copy
formats unsupported.


Regards,
Evgeny Kotkov


Re: Getting to first release of pristines-on-demand feature (#525).

2024-01-18 Thread Evgeny Kotkov via dev
Evgeny Kotkov  writes:

> Merged in https://svn.apache.org/r1905955
>
> I'm going to respond on the topic of SHA1 a bit later.

For the history: thread [1] proposes the `pristine-checksum-salt` branch that
adds the infrastructure to support new pristine checksum kinds in the working
copy and makes a switch to the dynamically-salted SHA1.

>From the technical standpoint, I think that it would be better to release
the first version of the pristines-on-demand feature having this branch
merged, because now we rely on the checksum comparison to determine if a
file has changed — and currently it's a checksum kind with known collisions.

At the same time, having that branch merged probably isn't a formal release
blocker for the pristines-on-demand feature.  Also, considering that the
`pristine-checksum-salt` branch is currently vetoed by danielsh (presumably,
for an indefinite period of time), I'd like to note that personally I have
no objections to proceeding with a release of the pristines-on-demand
feature without this branch.

[1] https://lists.apache.org/thread/xmd7x6bx2mrrbw7k5jr1tdmhhrlr9ljc


Regards,
Evgeny Kotkov


Re: Switching from SHA1 to a checksum type without known collisions in 1.15 working copy format

2024-01-18 Thread Evgeny Kotkov via dev
Daniel Shahaf  writes:

> Procedurally, the long hiatus is counterproductive.

This reminds me that the substantive discussion of your veto ended with my
email from 8 Feb 2023 that had four direct questions to you and was left
without an answer:

``
  > That's not how design discussions work.  A design discussion doesn't go
  > "state decision; state pros; implement"; it goes "state problem; discuss
  > potential solutions, pros, cons; decide; implement" (cf. [4, 5, 6]).

  Well, I think it may not be as simple as it seems to you.  Who decided that
  we should follow the process you're describing?  Is there a thread with a
  consensus on this topic?  Or do you insist on using this specific process
  because it's the only process that seems obvious to you?  What alternatives
  to it have been considered?

  As far as I can tell, the process you're suggesting is effectively a
  waterfall-like process, and there are quite a lot of concerns about its
  effectiveness, because the decisions have to be made in the conditions of
  a lack of information.
``

It's been more than 11 months since that email, and those questions still
don't have an answer.  So if we are to resume this discussion, let's do it
from the proper point.

> You guys are welcome to try to /convince/ me to change my opinion, or to
> have the veto invalidated.  In either case, you will be more likely to
> succeed should your arguments relate not only to the veto's implications
> but also to its /sine qua non/ component: its rationale.

Just in case, my personal opinion here is that the veto is invalid.

Firstly, based on my understanding, the ASF rules prohibit casting a veto
without an appropriate technical justification (see [1], which I personally
agree with).  Secondly, it seems that the process you are imposing hasn't been
accepted in this community.  As far as I know, this topic was tangentially
discussed before (see [2], for example), and it looks like there hasn't been
a consensus to change our current Commit-Then-Review process into some
sort of Review-Then-Commit.

(At the same time I won't even try to /convince/ you, sorry.)

[1] https://www.apache.org/foundation/voting.html
[2] https://lists.apache.org/thread/ow2x68g2k4lv2ycr81d14p8r8w2jj1xl


Regards,
Evgeny Kotkov


Re: E160056 Item index too large in revision

2023-04-12 Thread Evgeny Kotkov via dev
Daniel Sahlberg  writes:

> In the TortoiseSVN-dev group it is claimed to be because of APR 1.7.3 and
> that reverting back to 1.7.2 restores normal operation, I havn't yet
> verified nor tried to figure out which of the changes between 1.7.2
> and 1.7.3 may be responsible for this issue. As I know some of the
> APR committers are also here, I'm hoping for some help.

The problem seems to be caused by a change in APR 1.7.3, authored by me
in an attempt to fix wrong file offsets reported after opening files on
Windows and Unix:

 *) Don't seek to the end when opening files with APR_FOPEN_APPEND on Windows.

Unfortunately, I missed a case where flushing a APR_APPEND | APR_BUFFERED
file was implicitly relying on that specific file offset, and that is what
caused the regression you're seeing.

I have now reverted this change in https://svn.apache.org/r1909088 and
https://svn.apache.org/r1909089, so hopefully this regression will be fixed
in the next APR 1.7.x patch release.


Thanks,
Evgeny Kotkov


Re: Switching from SHA1 to a checksum type without known collisions in 1.15 working copy format

2023-04-01 Thread Evgeny Kotkov via dev
Daniel Shahaf  writes:

> What's the question or action item to/for me?  Thanks.

I'm afraid I don't fully understand your question.  As you probably remember,
the change is blocked by your veto.  To my knowledge, this veto hasn't been
revoked as of now, and I simply mentioned that in my email.  It is entirely
your decision whether or not to take any action regarding this matter.


Thanks,
Evgeny Kotkov


Re: [PROPOSAL] Allow plaintext passwords again.

2023-03-29 Thread Evgeny Kotkov via dev
Nathan Hartman  writes:

> I think a good middle ground is:
>
> * Build with --enable-plaintext-password-storage by default; users who
>   want to harden their system can do so, but will need to build their
>   own client.

+1.

> * Set the default run-time config to store-plaintext-passwords = no
>   (if it isn't already; haven't checked) and instruct users on how to
>   change it. This makes the decision to store in plaintext an explicit
>   one that the user can opt into. (I appreciate that this could be
>   changed without the user's knowledge; perhaps the systemwide config
>   should always take precedence over the user-controlled one for this
>   setting?)

So, apparently, the current default is "ask".

I haven't checked all the details, but I think that defaulting to "ask"
already makes the user decision explicit and allows it to happen naturally,
without requiring any additional instructions or knowledge.

If we change the default to "no", this part of the experience could be worse,
because for the end users it might look like the credentials aren't being
stored for unknown reasons / a bug in the software.


Thanks,
Evgeny Kotkov


Re: Switching from SHA1 to a checksum type without known collisions in 1.15 working copy format

2023-03-22 Thread Evgeny Kotkov via dev
Evgeny Kotkov  writes:

> > Now, how hard would this be to actually implement?
>
> To have a more or less accurate estimate, I went ahead and prepared the
> first-cut implementation of an approach that makes the pristine checksum
> kind configurable in a working copy.
>
> The current implementation passes all tests in my environment and seems to
> work in practice.  It is available on the branch:
>
>   https://svn.apache.org/repos/asf/subversion/branches/pristine-checksum-kind
>
> The implementation on the branch allows creating working copies that use a
> checksum kind other than SHA-1.

I extended the current implementation to use a dynamically salted SHA-1
checksum, rather than a SHA-1 with a statically hardcoded salt.
The dynamic salt is generated during the creation of a wc.db.

The implementation is available on a separate branch:

  https://svn.apache.org/repos/asf/subversion/branches/pristine-checksum-salt

The change is a bit massive, but in the meantime I think that it should solve
the potential problem without any practical drawbacks, except for the lack
of the mentioned ra_serf fetch optimization.

So overall I'd propose to bring this change to trunk, to improve the current
state around checksum collisions in the working copy, and to also have the
infrastructure for supporting different checksum kinds in place, in case
we need it in the future.

This change is still being blocked by a veto, but if danielsh changes his
mind and if there won't be other objections, I'm ready to complete the few
remaining bits and merge it to trunk.


Thanks,
Evgeny Kotkov


Re: svn commit: r1907965 - in /subversion/trunk/subversion: include/ include/private/ libsvn_client/ libsvn_wc/ svn/ tests/cmdline/

2023-03-03 Thread Evgeny Kotkov via dev
Jun Omae  writes:

> After r1907965, build on Windows is failing with the following errors.
>
> [[[
> svn_client-1.lib(upgrade.obj) : error LNK2001: unresolved external symbol 
> svn_wc__version_string_from_format 
> [D:\a\subversion\subversion\build\win32\vcnet-vcproj\libsvn_client_dll.vcxproj]
> D:\a\subversion\subversion\Release\subversion\libsvn_client\libsvn_client-1.dll
>  : fatal error LNK1120: 1 unresolved externals 
> [D:\a\subversion\subversion\build\win32\vcnet-vcproj\libsvn_client_dll.vcxproj]
> ]]]

Thanks, should be fixed in r1908042.


Regards,
Evgeny Kotkov


Re: Initial patch for storing the pristines-on-demand setting in the working copy (Issue 4889)

2023-03-02 Thread Evgeny Kotkov via dev
Evgeny Kotkov  writes:

> While working on the patch, I have stumbled across a couple of issues:
>
> A) `svn upgrade` without arguments fails for a working copy with latest format
>
>   $ svn checkout --compatible-version=1.15 wc
>   $ svn upgrade wc
>   $ svn: E155021: Working copy '…' is already at version 1.15 (format 32)
> and cannot be downgraded to version 1.8 (format 31)

While the work on pristine checksum kinds is blocked by a veto, I took a
look at other things we'd probably want to handle before the release.

I think that the above case qualifies as such, so I made a related improvement
by introducing a new config option and also fixed the described error:

- r1907964 introduces a new `compatible-version` config setting, to allow
  configuring the desired default wc compatibility level globally.

- r1907965 fixes an error when `svn upgrade` is called without any arguments
  for a working copy of the newer format.


Thanks,
Evgeny Kotkov


Re: Switching from SHA1 to a checksum type without known collisions in 1.15 working copy format

2023-02-07 Thread Evgeny Kotkov via dev
Daniel Shahaf  writes:

> Look, it's pretty simple.  You said "We should do Y because it
> addresses X".  You didn't explain why X needs to be addressed, didn't
> consider what alternatives there are to Y, didn't consider any cons that
> Y may have… and when people had questions, you just began to
> implement Y, without responding to or even acknowledging those
> questions.
>
> That's not how design discussions work.  A design discussion doesn't go
> "state decision; state pros; implement"; it goes "state problem; discuss
> potential solutions, pros, cons; decide; implement" (cf. [4, 5, 6]).

Well, I think it may not be as simple as it seems to you.  Who decided that
we should follow the process you're describing?  Is there a thread with a
consensus on this topic?  Or do you insist on using this specific process
because it's the only process that seems obvious to you?  What alternatives
to it have been considered?

As far as I can tell, the process you're suggesting is effectively a
waterfall-like process, and there are quite a lot of concerns about its
effectiveness, because the decisions have to be made in the conditions of
a lack of information.

Personally, I prefer an alternative process that starts from finding out
all available bits of information, which are then used to make informed
decisions.  The unfortunate reality, however, is that the only guaranteed
way of collecting all information means implementing all (or almost all)
significant parts in code.  Roughly speaking, this process looks like a
research project that gets completed by trial and error.

Based on what you've been saying so far, I wouldn't be surprised if you
disagree.  But I still think that forcing the others to follow a certain
process by such means as vetoing a code change is maybe a bit over the
top.  (In the meantime, I certainly won't object if you're going to use this
waterfall-like process for the changes that you implement yourself.)


Regards,
Evgeny Kotkov


Re: Switching from SHA1 to a checksum type without known collisions in 1.15 working copy format

2023-01-29 Thread Evgeny Kotkov via dev
Daniel Shahaf  writes:

> > (I'm not saying that the above rules have to be used in this particular case
> >  and that a veto is invalid, but still thought it’s worth mentioning.)
> >
>
> I vetoed the change because it hadn't been designed on the dev@ list,
> had not garnered dev@'s consensus, and was being railroaded through.
> (as far as I could tell)

I have *absolutely* no idea where "being railroaded through" comes from.
Really, it's a wrong way of portraying and thinking about the events that have
happened so far.

Reiterating over those events: I wrote an email containing my thoughts
and explaining the motivation for such change.  I didn't reply to some of
the questions (including some tricky questions, such as the one featuring
a theoretical hash function), because they have been at least partly
answered by others in the thread, and I didn't have anything valuable
to add at that time.

During that time, I was actively coding the core part of the change,
to check if it's possible technically.  Which is important, as far as
I believe, because not all theoretically possible solutions can be implemented
without facing significant practical or implementation-related issues, and
it seems to me that you significantly undervalue such an approach.

I do not say my actions were exemplary, but as far as I can tell, they're
pretty much in line with how svn-dev has been operating so far.  But, it all
resulted in an unclear veto without any _technical_ arguments, where what's
being vetoed is unclear as well, because the change was not ready at the
moment veto got casted.

And because your veto goes in favor of a specific process (considering that
no other arguments were given), the only thing that's *actually* being
railroaded is an odd form of an RTC (review-then-commit) process that is
against our usual CTR (commit-then-review) [1,2].  That's railroading,
because it hasn't been explicitly discussed anywhere and a consensus
on it has not been reached.

[1] https://www.apache.org/foundation/glossary.html#CommitThenReview
[2] https://www.apache.org/foundation/glossary.html#ReviewThenCommit


Regards,
Evgeny Kotkov


Re: Switching from SHA1 to a checksum type without known collisions in 1.15 working copy format

2023-01-29 Thread Evgeny Kotkov via dev
Daniel Shahaf  writes:

> > That could happen after a public disclosure of a pair of executable
> > files/scripts where the forged version allows for remote code execution.
> > Or maybe something similar with a file format that is often stored in
> > repositories and that can be executed or used by a build script, etc.
> >
>
> Err, hang on.  Your reference described a chosen-prefix attack, while
> this scenario concerns a single public collision.  These are two
> different things.

A chosen-prefix attack allows finding more meaningful collisions such as
working executables/scripts.  When such collisions are made public, they
would have a greater exploitation potential than just a random collision.

> Disclosure of of a pair of executable files/scripts isn't by itself
> a problem unless one of the pair ("file A") is in a repository
> somewhere.  Now, was the colliding file ("file B") generated _before_ or
> _after_ file A was committed?
>
> - If _before_, then it would seem Mallory had somehow managed to:
>
>   1. get a file of his choosing committed to Alice's repository; and
>
>   2. get a wc of Alice's repository into one of the codepaths that
>  assume SHA-1 is one-to-one / collission-free (currently that's the
>  ra_serf optimization and the 1.15 wc status).

Not only.  There are cases when the working copy itself installs the working
file with a hash lookup in the pristine store.  This is more true for 1.14
than trunk, because in trunk we have the streamy checkout/update that avoid
such lookups by writing straight to the working file.  However, some of
the code paths still install the contents from the pristine store by hash.
Examples include reverting a file, copying an unmodified file, switching
a file with keywords, the mentioned ra_serf optimization, and etc.

>   Now, step #1 seems plausible enough.  As to step #2, it's not clear to
>   me how file B would reach the wc in step #2…

If Mallory has write access, she could commit both files, thus arranging for
a possible content change if both files are checked out to a single working
copy.  This isn't the same as just directly modifying the target file, because
file content isn't expected to change due to changes in other files (that can
be of any type), so this attack has much better chances of being unnoticed.

If Mallory doesn't have write access, there should be other vectors, such
as distributing a pair of files (harmless in the context of their respective
file formats) separately via two upstream channels.  Then, if both of the
upstream distributions are committed into a repository and their files are
checked out together, the content will change, allowing for a malicious
action.


Regards,
Evgeny Kotkov


Re: Switching from SHA1 to a checksum type without known collisions in 1.15 working copy format

2023-01-22 Thread Evgeny Kotkov via dev
Daniel Shahaf  writes:

> > I can complete the work on this branch and bring it to a production-ready
> > state, assuming there are no objections.
>
> Your assumption is counterfactual:
>
> https://mail-archives.apache.org/mod_mbox/subversion-dev/202301.mbox/%3C20230119152001.GA27446%40tarpaulin.shahaf.local2%3E
>
> https://mail-archives.apache.org/mod_mbox/subversion-dev/202212.mbox/%3CCAMHy98NqYBLZaTL5-FAbf24RR6bagPN1npC5gsZenewZb0-EuQ%40mail.gmail.com%3E

I don't see any explicit objections in these two emails (here I assume that
if something is not clear to a PMC member, it doesn't automatically become
an objection).  If the "why?" question is indeed an objection, then I would
say it has already been discussed and responded to in the thread.

Now, returning to the problem:

As described in the advisory [1], we have a supported configuration that
makes data forgery possible:

- A repository with disabled rep-sharing allows storing different files with
  colliding SHA-1 values.
- Having a repository with disabled rep-sharing is a supported configuration.
  There may be a certain number of such repositories in the wild
  (for example, created with SVN < 1.6 and not upgraded afterwise).
- A working copy uses an assumption that the pristine contents are equal if
  their SHA-1 hashes are equal.
- So committing different files with colliding SHA-1 values makes it possible
  to forge the contents of a file that will be checked-out and used by the
  client.

I would say that this state is worrying just by itself.

However, with the feasibility of chosen-prefix attacks on SHA-1 [2], it's
probably only a matter of time until the situation becomes worse.

That could happen after a public disclosure of a pair of executable
files/scripts where the forged version allows for remote code execution.
Or maybe something similar with a file format that is often stored in
repositories and that can be executed or used by a build script, etc.

[1] https://subversion.apache.org/security/sha1-advisory.txt
[2] https://sha-mbles.github.io/


Speaking of the proposed switch to SHA-256 or a different checksum, there's
an argument by contradiction: if we were designing the pristineless working
copy from scratch today, would we choose SHA-1 as the best available hash
that can be used to assert content equality?  If yes, how can one prove that?

> Objections have been raised, been left unanswered, and now implementation
> work has commenced following the original design.  That's not acceptable.
> I'm vetoing the change until a non-rubber-stamp design discussion has
> been completed on the public dev@ list.

I would like to note that vetoing a code modification should be accompanied
with a technical justification, and I have certain doubts that the above
arguments qualify as such:

https://www.apache.org/foundation/voting.html
[[[
To prevent vetoes from being used capriciously, the voter must provide
with the veto a technical justification showing why the change is bad
(opens a security exposure, negatively affects performance, etc. ).
A veto without a justification is invalid and has no weight.
]]]

(I'm not saying that the above rules have to be used in this particular case
 and that a veto is invalid, but still thought it’s worth mentioning.)

Anyway, I'll stop working on the branch, because a veto has been casted.


Regards,
Evgeny Kotkov


Re: Switching from SHA1 to a checksum type without known collisions in 1.15 working copy format

2023-01-19 Thread Evgeny Kotkov via dev
Karl Fogel  writes:

> Now, how hard would this be to actually implement?

To have a more or less accurate estimate, I went ahead and prepared the
first-cut implementation of an approach that makes the pristine checksum
kind configurable in a working copy.

The current implementation passes all tests in my environment and seems to
work in practice.  It is available on the branch:

  https://svn.apache.org/repos/asf/subversion/branches/pristine-checksum-kind

The implementation on the branch allows creating working copies that use a
checksum kind other than SHA-1.

The checksum kind is persisted in the settings table.  Upgraded working copies
of the older formats will have SHA-1 recorded as their pristine checksum kind
and will continue to use it for compatibility.  Newly created working copies
of the latest format (with --compatible-version=1.15 or --store-pristine=no),
as currently implemented, will use the new pristine checksum kind.

Currently, as a proof-of-concept, the branch uses salted SHA-1 as the new
pristine checksum kind.  For the production-ready state, I plan to support
using multiple new checksum types such as SHA-256.  I think that it would
be useful for future compatibility, because if we encounter any issues with
one checksum kind, we could then switch to a different kind without having
to change the working copy format.

One thing worth noting is that ra_serf contains a specific optimization for
the skelta-style updates that allows skipping a GET request if the pristine
store already contains an entry with the specified SHA-1 checksum.  Switching
to a different checksum type for the pristine entries is going to disable
that specific optimization.  Re-enabling it would require an update of the
server-side.  I consider this to be out of scope for this branch.

I can complete the work on this branch and bring it to a production-ready
state, assuming there are no objections.


Thanks,
Evgeny Kotkov


Re: Switching from SHA1 to a checksum type without known collisions in 1.15 working copy format

2022-12-29 Thread Evgeny Kotkov via dev
Karl Fogel  writes:

> Now, how hard would this be to actually implement?

I plan to take a more detailed look at that, but I'm currently on vacation
for the New Year holidays.


Thanks,
Evgeny Kotkov


Switching from SHA1 to a checksum type without known collisions in 1.15 working copy format (was: Re: Getting to first release of pristines-on-demand feature (#525).)

2022-12-20 Thread Evgeny Kotkov via dev
Karl Fogel  writes:

> > While here, I would like to raise a topic of incorporating a switch from
> > SHA1 to a different checksum type (without known collisions) for the new
> > working copy format.  This topic is relevant to the pristines-on-demand
> > branch, because the new "is the file modified?" check relies on the
> > checksum comparison, instead of comparing the contents of working and
> > pristine files.
> >
> > And so while I consider it to be out of the scope of the pristines-on-
> > demand branch, I think that we might want to evaluate if this is something
> > that should be a part of the next release.
>
> Good point.  Maybe worth a new thread?

[Moving discussion to a new thread]

We currently have a problem that a working copy relies on the checksum type
with known collisions (SHA1).  A solution to that problem is to switch to a
different checksum type without known collisions in one of the newer working
copy formats.

Since we plan on shipping a new working copy format in 1.15, this seems to
be an appropriate moment of time to decide whether we'd also want to switch
to a checksum type without known collisions in that new format.

Below are the arguments for including a switch to a different checksum type
in the working copy format for 1.15:

1) Since the "is the file modified?" check now compares checksums, leaving
   everything as-is may be considered a regression, because it would
   introduce additional cases where a working copy currently relies on
   comparing checksums with known collisions.

2) We already need a working copy format bump for the pristines-on-demand
   feature.  So using that format bump to solve the SHA1 issue might reduce
   the overall number of required bumps for users (assuming that we'll still
   need to switch from SHA1 at some point later).

3) While the pristines-on-demand feature is not released, upgrading with a
   switch to the new checksum type seems to be possible without requiring a
   network fetch.  But if some of the pristines are optional, we lose the
   possibility to rehash all contents in place.  So we might find ourselves
   having to choose between two worse alternatives of either requiring a
   network fetch during upgrade or entirely prohibiting an upgrade of
   working copies with optional pristines.

Thoughts?


Thanks,
Evgeny Kotkov


Re: Getting to first release of pristines-on-demand feature (#525).

2022-12-13 Thread Evgeny Kotkov via dev
Evgeny Kotkov  writes:

> I think that the `pristines-on-demand-on-mwf` branch is now ready for a
> merge to trunk.  I could do that, assuming there are no objections.

Merged in https://svn.apache.org/r1905955

I'm going to respond on the topic of SHA1 a bit later.


Thanks,
Evgeny Kotkov


Re: Getting to first release of pristines-on-demand feature (#525).

2022-12-07 Thread Evgeny Kotkov via dev
Evgeny Kotkov  writes:

> > IMHO, once the tests are ready, we could merge it and release
> > it to the world.
>
> Apart from the required test changes, there are some technical
> TODOs that remain from the initial patch and should be resolved.
> I'll try to handle them as well.

I think that the `pristines-on-demand-on-mwf` branch is now ready for a
merge to trunk.  I could do that, assuming there are no objections.

  
https://svn.apache.org/repos/asf/subversion/branches/pristines-on-demand-on-mwf

The branch includes the following:
– Core implementation of the new mode where required pristines are fetched
  at the beginning of the operation.
– A new --store-pristine=yes/no option for `svn checkout` that is persisted
  as a working copy setting.
– An update for `svn info` to display the value of this new setting.
– A standalone test harness that tests main operations in both
  --store-pristine modes and gets executed on every test run.
– A new --store-pristine=yes/no option for the test suite that forces all
  tests to run with a specific pristine mode.

The branch passes all tests in my Windows and Linux environments, in both
--store-pristine=yes and =no modes.


While here, I would like to raise a topic of incorporating a switch from
SHA1 to a different checksum type (without known collisions) for the new
working copy format.  This topic is relevant to the pristines-on-demand
branch, because the new "is the file modified?" check relies on the checksum
comparison, instead of comparing the contents of working and pristine files.

And so while I consider it to be out of the scope of the pristines-on-demand
branch, I think that we might want to evaluate if this is something that
should be a part of the next release.


Thanks,
Evgeny Kotkov


Re: svn commit: r1905663 - in /subversion/branches/pristines-on-demand-on-mwf/subversion: include/ include/private/ libsvn_client/ libsvn_wc/

2022-12-02 Thread Evgeny Kotkov via dev
Bert Huijben  writes:

> All the now deprecated functions now fail unconditionally when the setting
> is enabled. Isn’t it possible to do this more graceful whenever a file is
> encountered which misses it’s prisite version?

The problem with this approach is that the functions are going to work, but
only *sometimes*, and will fail unpredictably, depending on whether a pristine
was fetched or removed by any previous API call.

With that in mind, failing consistently seems to be a more appropriate choice
for a deprecated function than failing randomly.

> As far as I know it is expected that some of the files do have pristines,
> while others don’t… That would allow things like diffs on old clients that
> didn’t switch apis yet.

Thinking about this, the currently documented assumption is that a file has
no pristine if and only if it's being added/replaced without a copyfrom.
So maybe we cannot really extend that to "any file might not have a pristine"
without it being an incompatible change.


Thanks,
Evgeny Kotkov


Re: svn commit: r1905663 - in /subversion/branches/pristines-on-demand-on-mwf/subversion: include/ include/private/ libsvn_client/ libsvn_wc/

2022-12-01 Thread Evgeny Kotkov via dev
Daniel Sahlberg  writes:

>> +  /** @since New in 1.15 */
>> +  SVN_ERRDEF(SVN_ERR_WC_DEPRECATED_API_STORE_PRISTINE,
>> + SVN_ERR_WC_CATEGORY_START + 43,
>> + "This client was not updated to support working copies "
>> + "without local pristines")
>> +
>>/* fs errors */
>
> Is it really "This client"? It looks more to be based on the WC setting.

This new error should only occur when someone (a 3rd party client) uses the
deprecated API function for a working copy without pristines.

>From that point of view, I would say that the problem is in the client that is
still using the deprecated API and thus cannot properly handle such working
copies.

Also, the current message aims to be somewhat consistent with another existing
error for a case when a client is too old to read a newer working copy format:
 "This client is too old to work with the working copy […] (format 32)"

(After writing this, I realized that the "was not updated" part of the message
 may be misinterpreted as a requirement to update to the newer version of the
 software.  In r1905682, I slightly rephrased that part so that it would say:
   "This client uses a deprecated API that does not support working copies
without local pristines")


Thanks,
Evgeny Kotkov


Re: Getting to first release of pristines-on-demand feature (#525).

2022-11-16 Thread Evgeny Kotkov via dev
Karl Fogel  writes:

> Thank you, Evgeny!  Just to make sure I understand correctly --
> the status now on the 'pristines-on-demand-on-mwf' branch is:
>
> 1) One can do 'svn checkout --store-pristines=no' to get an
> entirely pristine-less working copy.  In that working copy,
> individual files will be hydrated/dehydrated automagically on an
> as-needed basis.
>
> 2) There is no command to hydrate or dehydrate a particular file.
> Hydration and dehydration only happen as a side effect of other
> regular Subversion operations.
>
> 3) There is no way to rehydrate the entire working copy.  E.g.,
> something like 'svn update --store-pristines=yes' or 'svn hydrate
> --depth=infinity' does not exist yet.
>
> 4) Likewise, there is no way to dehydrate an existing working copy
> that currently has its pristines (even if that working copy is at
> a high-enough version format to support pristinelessness).  E.g.,
> something like 'svn update --store-pristines=no' or 'svn dehydrate
> --depth=infinity' does not exist yet.
>
> Is that all correct?

Yes, I believe that is correct.

> By the way, I do not think (2), (3), and (4) are blockers.  Just
> (1) by itself is a huge step forward and solves issue #525;

+1 on keeping the scope of the feature to just (1) for now.

> IMHO, once the tests are ready, we could merge it and release
> it to the world.

Apart from the required test changes, there are some technical
TODOs that remain from the initial patch and should be resolved.
I'll try to handle them as well.


Thanks,
Evgeny Kotkov


Re: Getting to first release of pristines-on-demand feature (#525).

2022-11-15 Thread Evgeny Kotkov via dev
Evgeny Kotkov  writes:

> Perhaps we could transition into that state by committing the patch
> and maybe re-evaluate things from there.  I could do that, assuming
> no objections, of course.

Committed the patch in https://svn.apache.org/r1905324

I'll try to handle the related tasks in the near future.


Thanks,
Evgeny Kotkov


Re: Getting to first release of pristines-on-demand feature (#525).

2022-11-08 Thread Evgeny Kotkov via dev
Karl Fogel  writes:

> By the way, in that thread, Evgeny Kotkov -- whose initial work
> much of this is based on -- follows up with a patch that does a
> first-pass implementation of 'svn checkout --store-pristines=no'
> (by implementing a new persistent setting in wc.db).

Perhaps we could transition into that state by committing the patch
and maybe re-evaluate things from there.  I could do that, assuming
no objections, of course.


Thanks,
Evgeny Kotkov