Re: Preferred form of modification for binary data used in unit testing?

2020-07-18 Thread Marvin Renich
* s...@debian.org  [200717 17:51]:
> On Fri, 17 Jul 2020 at 10:44:24 -0400, Marvin Renich wrote:
> > The intended purpose is to ensure that the recipient has every
> > reasonable opportunity to modify the software in any reasonable way the
> > recipient desires.  The sole purpose of the requirement for source is to
> > protect this freedom, and the requirement should not be applied
> > independently from this purpose.
> 
> I mostly agree, and I do agree with the resulting conclusion, but I
> don't think this is *quite* the whole story. What you said here maps
> to the FSF's "Freedom 3" and half of "Freedom 1", and also matches the
> justification given for the source code requirement in the annotated
> Open Source Definition.
> 
> In addition to freedom to modify, I think we also want to make sure a
> sufficiently knowledgeable recipient can inspect the unmodified software;
> that's the other half of the FSF's "Freedom 1" (freedom to study).
> However, I don't think considering freedom-to-study actually changes
> the conclusion in this case.

Thanks, Simon.  This is very instructive, and I agree.

...Marvin



Re: Preferred form of modification for binary data used in unit testing?

2020-07-18 Thread Christian Kastner
On 2020-07-17 18:30, Pirate Praveen wrote:
> On 2020, July 17 8:14:24 PM IST, Marvin Renich  wrote:
>> The intended purpose is to ensure that the recipient has every
>> reasonable opportunity to modify the software in any reasonable way the
>> recipient desires.  The sole purpose of the requirement for source is to
>> protect this freedom, and the requirement should not be applied
>> independently from this purpose.
>>
>> So the question becomes: how does the inclusion or exclusion of the
>> binary blob, without inclusion of the full source and build process of
>> the broken version of the software used to produce the binary blob,
>> enhance or detract from the recipient's ability to produce a modified
>> version of the current, good, distributed software?
> 
> Very well put. Many times I see blind application of rules without any other 
> consideration. The rules serve a purpose, our purpose is not to blindly serve 
> the rules. If the rules are stopping us, we need to change them, not just 
> adjust ourselves to the rules once written.

I fully concur with your opinions, however I'm not sure that they are
universally shared and/or clear. Otherwise, this thread wouldn't exist.

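Some norms avoid the risk of "blindly serving the rules" by expressly
rejecting that. For example, Article 11 GDPR [1] holds that the GDPR is
not self-serving: a controller need not obtain additional information to
identify a data subject for the sole purpose of complying with the
Regulation.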

[1]
https://gdpr.eu/article-11-what-personal-data-can-a-controller-process-without-identification/



Re: Preferred form of modification for binary data used in unit testing?

2020-07-17 Thread smcv
On Fri, 17 Jul 2020 at 10:44:24 -0400, Marvin Renich wrote:
> I think, instead of pedantically applying the wording of the DFSG, we
> should be pedantically applying the intended purpose of the DFSG.

I think this is a good way to frame questions about the DFSG, and
particularly the requirement for source code. The DFSG is a set of
guidelines, not a deterministic algorithm for mapping inputs to their
freedom status, and the reasons why we want source code are important.

Also note that "preferred form for modification" does not appear
anywhere in the DFSG: that wording is specific to the *GPL family of
licenses. However, we often find it a useful tool for interpreting and
applying the DFSG, because the DFSG and the *GPL licenses are trying to
achieve the same or similar goals, so what's good for one is often good
for the other.

(We do need to be a bit more careful with preferred forms for modification
when we are assessing whether a work under a *GPL license is compliant
or non-compliant with that license, because that's about whether we are
behaving in a way that is legally allowed, not just about whether we
are following our own self-imposed guidelines.)

> The intended purpose is to ensure that the recipient has every
> reasonable opportunity to modify the software in any reasonable way the
> recipient desires.  The sole purpose of the requirement for source is to
> protect this freedom, and the requirement should not be applied
> independently from this purpose.

I mostly agree, and I do agree with the resulting conclusion, but I
don't think this is *quite* the whole story. What you said here maps
to the FSF's "Freedom 3" and half of "Freedom 1", and also matches the
justification given for the source code requirement in the annotated
Open Source Definition.

As with the *GPL licenses, the FSF's four freedoms and Free Software
definition and the OSI's Open Source Definition are not part of the DFSG,
but they can be useful tools for interpreting and applying the DFSG,
because we're trying to achieve the same or similar goals, so what's
desirable for them is probably also desirable for us.

In addition to freedom to modify, I think we also want to make sure a
sufficiently knowledgeable recipient can inspect the unmodified software;
that's the other half of the FSF's "Freedom 1" (freedom to study).
However, I don't think considering freedom-to-study actually changes
the conclusion in this case.

For a generated or hand-crafted binary blob that is used to reproduce
a specific bug or test a particular error-recovery path, inspecting
it would tend to consist of noting that it resembles a keepassx vault
(or whatever the binary blob is in this case); that, as intended, it has
one of the required patterns that reproduces that bug or triggers that
error-recovery; and that it doesn't have lots of unexplained content
that is not required for its purpose. Confirming that this is the case
might require a specialized program (keepassx or whatever), a hex-editor,
or even single-stepping in a debugger; I don't see that as a problem,
and I certainly wouldn't expect maintainers to do that work proactively
(other than checking that it isn't excessively large and isn't obviously
non-Free).
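For readers who want to do this kind of inspection without a dedicated
hex editor, a hex dump takes only a few lines of Python. This is a minimal
sketch: the `hexdump` helper and its offset/hex/ASCII layout are my own
illustrative choices, and it is not keepassx tooling.

```python
def hexdump(data: bytes, width: int = 16) -> str:
    """Render bytes as offset / hex / printable-ASCII columns (hypothetical helper)."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{byte:02x}" for byte in chunk)
        # Non-printable bytes are shown as "." in the text column.
        text = "".join(chr(byte) if 32 <= byte < 127 else "." for byte in chunk)
        lines.append(f"{offset:08x}  {hex_part:<{width * 3 - 1}}  {text}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Placeholder bytes; in practice you would read the test blob from disk.
    print(hexdump(b"example\x00\x01\x02 test data"))
```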

Note that I'm not saying that it would be OK for test data to contain
copyrightable works that are not freely licensed or have undergone a
lossy transformation from a source form. For example, test data for a
tar implementation shouldn't be a tar file containing object code that
was compiled from C source, without that source also being included;
it would usually be better to use a tar file containing some zeroes,
or some random numbers, or something that meets whatever other
requirements the test has (for example size or level of compressibility)
while being Freely licensed and obviously its own preferred form for
modification.
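The next example shows "some zeroes, or some random numbers" chosen to
meet a size or compressibility requirement. It is a short Python sketch:
the `filler` helper and the fixed seed are assumptions made for the
example, and `random.Random.randbytes` needs Python 3.9 or later. The
Python documentation does not promise that `randbytes` gives the same
output across Python versions, so treat the random variant as regenerable
on the same interpreter only.

```python
import random
import zlib


def filler(size: int, compressible: bool, seed: int = 0) -> bytes:
    """Trivially free test content of a given size (hypothetical helper)."""
    if compressible:
        return bytes(size)  # all zero bytes: compresses extremely well
    # Seeded pseudo-random bytes: close to incompressible, but regenerable.
    return random.Random(seed).randbytes(size)


if __name__ == "__main__":
    for compressible in (True, False):
        data = filler(4096, compressible)
        print(compressible, len(data), len(zlib.compress(data)))
```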

More generally, it's best if test data is either so trivial that
questions of copyright and preferred forms are somewhat irrelevant,
or is clearly Free.

As an example of trivial test data, the pre-generated valid and invalid
D-Bus messages in the GLib test suite consist of just enough of a message
to make them suitable for the test in question, with the parts that are
not fixed by the test's requirements taking short non-meaningful values
like /foo.
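The Python sketch below makes "just enough of a message" concrete. It
builds a minimal little-endian D-Bus method call with only the required
PATH and MEMBER header fields, short placeholder values, and an empty body.
I wrote it from my reading of the marshalling rules in the D-Bus
specification; it is not taken from GLib's test suite.
`minimal_method_call` and the member name `Foo` are hypothetical.

```python
import struct


def align(buf: bytearray, n: int) -> None:
    # D-Bus alignment is measured from the start of the message.
    buf.extend(b"\0" * (-len(buf) % n))


def minimal_method_call(path: str = "/foo", member: str = "Foo", serial: int = 1) -> bytes:
    """Hypothetical helper: smallest method-call message with placeholder values."""
    buf = bytearray(b"l\x01\x00\x01")  # little-endian, METHOD_CALL, no flags, version 1
    buf += struct.pack("<II", 0, serial)  # body length 0, serial number
    len_pos = len(buf)
    buf += b"\0\0\0\0"  # placeholder for the header-field array length
    align(buf, 8)  # array elements (STRUCTs) are 8-aligned
    start = len(buf)
    # Header fields: (code, variant signature, value); 1 = PATH, 3 = MEMBER.
    for code, sig, value in ((1, b"o", path), (3, b"s", member)):
        align(buf, 8)
        buf += bytes([code, len(sig)]) + sig + b"\0"
        align(buf, 4)  # object paths and strings are 4-aligned
        data = value.encode()
        buf += struct.pack("<I", len(data)) + data + b"\0"
    # The array length excludes the padding before the first element.
    struct.pack_into("<I", buf, len_pos, len(buf) - start)
    align(buf, 8)  # the header ends on an 8-byte boundary; the body is empty
    return bytes(buf)
```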

As an example of non-trivial Free test data, the rgain3 source package
needs non-trivial sound files with known/fixed content in a supported
format for its autopkgtest, so I included some short sound clips taken
from sound-theme-freedesktop (which are compressed, but would be easy to
modify by decompressing, editing and re-compressing, and do not appear
to have a separate lossless source form available).

On Wed, 15 Jul 2020 at 09:45:18 +0200, Philipp Hahn wrote:
> PS: This question arose while working on a private build of
> > E: keepassxc source: source-is-missing tests/data/keepassxc.opvault/default

Lintian cannot judge context or intent, and most 

Re: Preferred form of modification for binary data used in unit testing?

2020-07-17 Thread Bastian Blank
On Thu, Jul 16, 2020 at 05:27:40PM -0700, Sean Whitton wrote:
> On Thu 16 Jul 2020 at 05:19PM -07, Sean Whitton wrote:
> > You would need the buggy version of the software if you wanted to
> > make modified versions of the binary data to test for closely related
> > bugs, for example.

And there the problems begin.  All software has bugs, and compilers are
especially good at finding them.  So even if you store the software, you
can't be sure it will keep producing the same output over time.  Sure,
you could store the checksum, but that has the same problem: it is not
the preferred form of modification.

> It seems that there is not a general answer to the question.  The binary
> test data may or may not be in its preferred form for modification,
> depending on how one would want to go about preparing other pieces of
> test data.

You are right, it depends.

Another data point: our own logo.  It is generated using an algorithm.
So if someone wants to be really strict about it, the algorithm and the
parameters would be the source, not the resulting vector image we use
all the time.

Regards,
Bastian

-- 
No one may kill a man.  Not for any purpose.  It cannot be condoned.
-- Kirk, "Spock's Brain", stardate 5431.6



Re: Preferred form of modification for binary data used in unit testing?

2020-07-17 Thread Pirate Praveen



On 2020, July 17 8:14:24 PM IST, Marvin Renich  wrote:
>[This was just a convenient point in the thread to which to reply; it's
>not really a reply to Sean's specific message.]
>
>I think, instead of pedantically applying the wording of the DFSG, we
>should be pedantically applying the intended purpose of the DFSG.  The
>legal profession has proven, time and time again, that no written
>language can perfectly express any sufficiently complex idea (nor can it
>express perfectly many very simple ideas).
>
>The intended purpose is to ensure that the recipient has every
>reasonable opportunity to modify the software in any reasonable way the
>recipient desires.  The sole purpose of the requirement for source is to
>protect this freedom, and the requirement should not be applied
>independently from this purpose.
>
>So the question becomes: how does the inclusion or exclusion of the
>binary blob, without inclusion of the full source and build process of
>the broken version of the software used to produce the binary blob,
>enhance or detract from the recipient's ability to produce a modified
>version of the current, good, distributed software?

Very well put. Many times I see blind application of rules without any other 
consideration. The rules serve a purpose, our purpose is not to blindly serve 
the rules. If the rules are stopping us, we need to change them, not just 
adjust ourselves to the rules once written.



Re: Preferred form of modification for binary data used in unit testing?

2020-07-17 Thread Thomas Goirand
On 7/16/20 5:00 PM, Johannes Schauer wrote:
> Hi,
> 
> Quoting Christian Kastner (2020-07-16 14:08:34)
>> On 2020-07-16 12:53, Pirate Praveen wrote:
>>>> Generally speaking, I think it's a mistake to apply the question of
>>>> "preferred form for modification" to unit test payloads. Unit tests are
>>>> purely about functionality. The original source to a payload is an
>>>> arbitrary choice (possibly even randomly generated), and could be
>>>> replaced with any other appropriate arbitrary choice at no detriment to
>>>> the software or the user.
>>> I think this needs to be clearly documented in policy. I don't think
>>> this interpretation is generally accepted. I have seen many cases where
>>> tests are disabled for this reason.
>> Perhaps I spoke too generally. For example, I can see, as one of
>> probably many counter-examples, the case where the input is not
>> completely arbitrary (eg: input is a captured stream).
>>
>> But to take the other extreme, using completely arbitrary data, as an
>> example: say my code implements a ROT13 function and I create a test for
>> it using a blob of random data as well as the expected output.
>>
>> That random data was generated somehow, eg: using Python's random
>> module, and could therefore be regenerated given the correct program and
>> seed. However, I did not include the code to generate that data.
>>
>> Would we really reasonably expect anyone to act upon that random blob in
>> any way?
> 
> I have another data point with one of my packages (genext2fs) where I made a
> contribution to upstream. Their unit tests execute the program with some input
> and a given set of parameters and then check that the md5sum of the created
> ext2 filesystem image matches the expected value. Without thinking, I added the
> following into their test script:
> 
> H4sIA+3WTW6DMBAF4Fn3FD6B8fj3PKAqahQSSwSk9vY1uKssGiJliFretzECJAYeY1s3JM4UKYRlLG7H5ZhdTIHZGevK+ZTYkgrypRFN17EdlKIh5/G3++5d/6N004qbA47er8/fWVduV2aLD7D7/A85C88Ba/ufA/sQIhk25VdA/2+h5t+1gx4/pd7vfv+Hm/ytmfNH/8vr+ql7e3UR8DK6uUx9L/uMtev/3P8p+KX/oyHlZMuqntX/9T34Z9yk9Gco8//xkGWf8Uj+Mbpl/Y+JVJQtq9r5/K+bj3Z474+Xk9wG4JH86/rvyzxAirfYnOw+/+vXWTb+uv9PaV3+JfiSv/WOlJVPf/f5AwAAAMD/9A0cPbO/ACgAAA==
> 
> This is a base64-encoded gzipped tarball with a few test files in it. I
> generated it using GNU tar, but since I found it likely that a future (or
> past) GNU tar version would produce a slightly different tarball, and
> because I needed fixed input that yields the same output on systems without
> GNU tar (like BSD or macOS) as well as on older or future systems, I just
> dumped that binary blob into the upstream software. In the meantime, that
> binary blob has even made it into the Debian package:
> 
> https://sources.debian.org/src/genext2fs/1.5.0-1/test.sh/#L89
> 
> The curious thing for me personally is that I didn't feel bad about this at
> all, and at no point, from writing the code up to packaging and uploading the
> Debian package containing the blob, did I think even twice about whether it
> is DFSG compliant or not. Only now, after having read this thread, do I start
> wondering whether I have actually created an RC bug myself. Did I? I love the
> principles of the DFSG, and it really surprises me that despite my love for
> these freedoms I didn't think twice about including that binary blob instead
> of generating it on the fly. Was my mind fooled by how short the blob is? A
> Perl script generating the tarball such that it's bit-by-bit identical across
> all platforms would be longer than this blob.
> 
> What do you guys think? Should I put work into writing a script which produces
> above binary blob as part of the test suite to avoid having my package be RC
> buggy? I would love to get some guidance.
> 
> Thanks!
> 
> cheers, josch

I'm not sure whether this is DFSG compliant or not. Still, it feels
like bad practice anyway, because:

- It won't be obvious what you're doing, and someone will have to
reverse-engineer what you did to figure things out.
- There were probably ways around the "tar doesn't always produce
the same output" problem, like building the tarball and computing its sum
during the test (I don't know enough of the details to be sure of this, though...).
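Here is one possible reading of that suggestion, sketched in Python with
only the standard library. The tarball is built at test time with every
variable header field pinned: sorted names, zero mtime, fixed owner and
mode, and ustar format. The test then compares a digest of the
*uncompressed* tar, because gzip output can still vary between zlib
versions. The helper names and the use of SHA-256 are illustrative
assumptions. This is not what genext2fs's test.sh does, and I have not
verified that Python's tar output is byte-identical across all Python
releases.

```python
import hashlib
import io
import tarfile


def deterministic_tar(files: dict[str, bytes]) -> bytes:
    """Build an uncompressed ustar archive whose bytes depend only on `files`."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tar:
        for name in sorted(files):  # fixed member order
            info = tarfile.TarInfo(name)
            info.size = len(files[name])
            info.mtime = 0  # no build timestamps
            info.mode = 0o644
            info.uid = info.gid = 0
            info.uname = info.gname = "root"
            tar.addfile(info, io.BytesIO(files[name]))
    return buf.getvalue()


def tar_digest(files: dict[str, bytes]) -> str:
    """Digest to compare against in the test (hypothetical helper)."""
    return hashlib.sha256(deterministic_tar(files)).hexdigest()
```

Sorting the names makes the archive independent of the order in which
files are supplied. Pinning the metadata removes the host-dependent fields
that would otherwise change the digest.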

Cheers,

Thomas Goirand (zigo)



Re: Preferred form of modification for binary data used in unit testing?

2020-07-17 Thread Marvin Renich
[This was just a convenient point in the thread to which to reply; it's
not really a reply to Sean's specific message.]

I think, instead of pedantically applying the wording of the DFSG, we
should be pedantically applying the intended purpose of the DFSG.  The
legal profession has proven, time and time again, that no written
language can perfectly express any sufficiently complex idea (nor can it
express perfectly many very simple ideas).

The intended purpose is to ensure that the recipient has every
reasonable opportunity to modify the software in any reasonable way the
recipient desires.  The sole purpose of the requirement for source is to
protect this freedom, and the requirement should not be applied
independently from this purpose.

So the question becomes: how does the inclusion or exclusion of the
binary blob, without inclusion of the full source and build process of
the broken version of the software used to produce the binary blob,
enhance or detract from the recipient's ability to produce a modified
version of the current, good, distributed software?

First, recognize that in this case, the software may be built with or
without the binary blob present, and the resulting software will be the
same.  The blob is only used to allow the person modifying the software
to check for mistakes made during modification.  My opinion would very
likely be different if this were not the case.

In what way does the absence of the blob's source limit the recipient's
ability to modify the current version of the software?  What real,
reasonable (as opposed to hypothetical and unreasonable) kind of
brokenness of the _current_ version of the software would you want to
produce a test for, that not having the old broken version of the source
code would hinder?  The real answer is that the programmer is much more
likely to base such tests on slight modifications of the _current_ code
rather than the _old_ code that had one specific bug that was the
impetus for producing the blob as a test case.

So the reduction of freedom by including the blob without its original
source is infinitesimally small (if not zero).  It is made even smaller
in this specific case by the fact that the source is available from an
older Debian distribution, though this is really beside the point, as
the current application of the DFSG treats one version of Debian as a
stand-alone entity that cannot depend on software in other versions of
Debian.

On the other hand, by including the blob, the test suite used to prevent
regressions due to modifications is significantly more robust, which is
a huge increase in the recipient's ability to modify the software
without unintended consequences.

So, in my opinion, the inclusion of the blob provides a significant
increase in the freedoms that the DFSG is intended to protect, without
any real decrease.

...Marvin



Re: Preferred form of modification for binary data used in unit testing?

2020-07-16 Thread Holger Levsen
hi,

doesn't the subject already say that we are not talking about software
and its freeness, but rather...

something else, something...  important?

(not that I'd know more, here & now.)

my point is: i do think this is out of scope for policy as it is. and
rightfully so.


-- 
cheers,
Holger

---
   holger@(debian|reproducible-builds|layer-acht).org
   PGP fingerprint: B8BF 5413 7B09 D35C F026 FE9D 091A B856 069A AA1C




Re: Preferred form of modification for binary data used in unit testing?

2020-07-16 Thread Sean Whitton
Hello,

On Thu 16 Jul 2020 at 05:19PM -07, Sean Whitton wrote:

> You would need the buggy version of the software if you wanted to
> make modified versions of the binary data to test for closely related
> bugs, for example.

Hmm, perhaps this is not true.  Perhaps for making closely related
broken data, you would instead want to directly modify the binary blob.

It seems that there is not a general answer to the question.  The binary
test data may or may not be in its preferred form for modification,
depending on how one would want to go about preparing other pieces of
test data.

-- 
Sean Whitton




Re: Preferred form of modification for binary data used in unit testing?

2020-07-16 Thread Sean Whitton
Hello,

On Thu 16 Jul 2020 at 07:42PM +02, Bastian Blank wrote:

> On Thu, Jul 16, 2020 at 08:42:24AM -0700, Sean Whitton wrote:
>> I would remove the test data because it does not seem DFSG-conformant.
>
> Care to explain?  You can't claim DFSG violation without showing which
> part.

That was a bit unclear -- I meant that it seems like a DFSG violation to
include the binary data but not the source code for the program that
generates that data, not that the binary data is inherently unfree.

You would need the buggy version of the software if you wanted to
make modified versions of the binary data to test for closely related
bugs, for example.

> Also please explain how you would make sure the code is tested.

I don't have a good answer for you, but whether or not something is
DFSG-free is not dependent on what purposes it serves.

-- 
Sean Whitton




Re: Preferred form of modification for binary data used in unit testing?

2020-07-16 Thread Steve McIntyre
Hey Philipp,

Philipp Hahn wrote:
>
>if a *previous* version of a software generated a *buggy* binary
>database, that bug got fixed in a *newer* version and also some
>*recovery* mechanism was added to allow reading that broken format
>*once*, but there is no code to write the *broken* file again. For
>*unit testing* the upstream developers added an *example* of such a
>broken database to their test data.
>What's the preferred form of modification for that data set?
>
>* Should I include a copy of the *broken code* to generate that data?
>* Declare that there is no preferred form for modification, as an
>"open-save" cycle with the current code will not re-create the
>bit-identical file again.
>* Remove the test data because it is not DFSG conformant and hope the
>Debian build will never break the recovery code.
>* Include instructions on how to re-build the broken version and give
>instructions on how to maybe rebuild a similar broken file.

Firstly, removing the test data would be absurd - less-tested code
does not serve us or our users well. If it happens to be a binary
artifact that cannot be easily recreated, then explain that. The
binary artifact has become the preferred form for modification once
you have it.

In some cases you won't be able to sensibly reproduce the artifact
that causes a problem, but you keep it around to ensure test
coverage. IMHO there is no issue here.
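Steve's suggestion (keep the artifact and explain it) can be sketched as a
test that carries the broken bytes verbatim, together with a note on where
they came from. Everything in this sketch is hypothetical: the `DEMO`
format, the "record count off by one" bug, and `read_records` were
invented for illustration and have nothing to do with keepassxc's real
file format.

```python
import struct
import unittest

# Hypothetical fixture, kept verbatim as committed test data.
# Provenance note: written by an (imaginary) old release whose writer stored
# a record count one higher than the number of records it actually wrote.
# That writer no longer exists, so these bytes are what one would edit.
BROKEN_FIXTURE = b"DEMO" + struct.pack("<I", 2) + struct.pack("<H", 5) + b"hello"


def read_records(data: bytes) -> tuple[list[bytes], bool]:
    """Return (records, recovered) for the hypothetical DEMO format."""
    if data[:4] != b"DEMO":
        raise ValueError("bad magic")
    (count,) = struct.unpack_from("<I", data, 4)
    offset, records = 8, []
    for _ in range(count):
        if offset + 2 > len(data):
            # Recovery path: tolerate the old writer's inflated count.
            return records, True
        (length,) = struct.unpack_from("<H", data, offset)
        offset += 2
        records.append(data[offset:offset + length])
        offset += length
    return records, False


class RecoveryTest(unittest.TestCase):
    def test_broken_fixture_is_recovered(self):
        records, recovered = read_records(BROKEN_FIXTURE)
        self.assertEqual(records, [b"hello"])
        self.assertTrue(recovered)
```

Run it with `python -m unittest` on the file. The point is the comment
next to the bytes, which records how they came to exist.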

-- 
Steve McIntyre, Cambridge, UK.    st...@einval.com
"You can't barbecue lettuce!" -- Ellie Crane



Re: Preferred form of modification for binary data used in unit testing?

2020-07-16 Thread Russ Allbery
Philipp Hahn  writes:

> * Declare that there is no preferred form for modification, as an
> "open-save" cycle with the current code will not re-create the
> bit-identical file again.

This is my gut reaction.  Modifying this piece of testing data is mostly
pointless.  It's kind of like asking what the preferred form of
modification of a PGP public key is.

One might want to generate *more* testing data, I guess, but is that worth
keeping the old code around forever?  I'm dubious the benefit is worth it.

-- 
Russ Allbery (r...@debian.org)  



Re: Preferred form of modification for binary data used in unit testing?

2020-07-16 Thread Bastian Blank
On Thu, Jul 16, 2020 at 08:42:24AM -0700, Sean Whitton wrote:
> I would remove the test data because it does not seem DFSG-conformant.

Care to explain?  You can't claim DFSG violation without showing which
part.

Also please explain how you would make sure the code is tested.

Bastian

-- 
Killing is wrong.
-- Losira, "That Which Survives", stardate unknown



Re: Preferred form of modification for binary data used in unit testing?

2020-07-16 Thread Sean Whitton
Hello Philipp,

On Wed 15 Jul 2020 at 09:45AM +02, Philipp Hahn wrote:

> Hi,
>
> if a *previous* version of a software generated a *buggy* binary
> database, that bug got fixed in a *newer* version and also some
> *recovery* mechanism was added to allow reading that broken format
> *once*, but there is no code to write the *broken* file again. For
> *unit testing* the upstream developers added an *example* of such a
> broken database to their test data.
> What's the preferred form of modification for that data set?
>
> * Should I include a copy of the *broken code* to generate that data?
> * Declare that there is no preferred form for modification, as an
> "open-save" cycle with the current code will not re-create the
> bit-identical file again.
> * Remove the test data because it is not DFSG conformant and hope the
> Debian build will never break the recovery code.
> * Include instructions on how to re-build the broken version and give
> instructions on how to maybe rebuild a similar broken file.

I would remove the test data because it does not seem DFSG-conformant.

-- 
Sean Whitton




Re: Preferred form of modification for binary data used in unit testing?

2020-07-16 Thread Johannes Schauer
Hi,

Quoting Christian Kastner (2020-07-16 14:08:34)
> On 2020-07-16 12:53, Pirate Praveen wrote:
> >> Generally speaking, I think it's a mistake to apply the question of
> >> "preferred form for modification" to unit test payloads. Unit tests are
> >> purely about functionality. The original source to a payload is an
> >> arbitrary choice (possibly even randomly generated), and could be
> >> replaced with any other appropriate arbitrary choice at no detriment to
> >> the software or the user.
> > I think this needs to be clearly documented in policy. I don't think
> > this interpretation is generally accepted. I have seen many cases where
> > tests are disabled for this reason.
> Perhaps I spoke too generally. For example, I can see, as one of
> probably many counter-examples, the case where the input is not
> completely arbitrary (eg: input is a captured stream).
> 
> But to take the other extreme, using completely arbitrary data, as an
> example: say my code implements a ROT13 function and I create a test for
> it using a blob of random data as well as the expected output.
> 
> That random data was generated somehow, eg: using Python's random
> module, and could therefore be regenerated given the correct program and
> seed. However, I did not include the code to generate that data.
> 
> Would we really reasonably expect anyone to act upon that random blob in
> any way?

I have another data point with one of my packages (genext2fs) where I made a
contribution to upstream. Their unit tests execute the program with some input
and a given set of parameters and then check that the md5sum of the created
ext2 filesystem image matches the expected value. Without thinking, I added the
following into their test script:

H4sIA+3WTW6DMBAF4Fn3FD6B8fj3PKAqahQSSwSk9vY1uKssGiJliFretzECJAYeY1s3JM4UKYRlLG7H5ZhdTIHZGevK+ZTYkgrypRFN17EdlKIh5/G3++5d/6N004qbA47er8/fWVduV2aLD7D7/A85C88Ba/ufA/sQIhk25VdA/2+h5t+1gx4/pd7vfv+Hm/ytmfNH/8vr+ql7e3UR8DK6uUx9L/uMtev/3P8p+KX/oyHlZMuqntX/9T34Z9yk9Gco8//xkGWf8Uj+Mbpl/Y+JVJQtq9r5/K+bj3Z474+Xk9wG4JH86/rvyzxAirfYnOw+/+vXWTb+uv9PaV3+JfiSv/WOlJVPf/f5AwAAAMD/9A0cPbO/ACgAAA==

This is a base64-encoded gzipped tarball with a few test files in it. I
generated it using GNU tar, but since I found it likely that a future (or
past) GNU tar version would produce a slightly different tarball, and
because I needed fixed input that yields the same output on systems without
GNU tar (like BSD or macOS) as well as on older or future systems, I just
dumped that binary blob into the upstream software. In the meantime, that
binary blob has even made it into the Debian package:

https://sources.debian.org/src/genext2fs/1.5.0-1/test.sh/#L89
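Anyone who wants to study such an embedded blob can decode it in a few
lines. The minimal Python sketch below uses the standard `base64`, `gzip`
and `tarfile` modules. `list_embedded_tarball` and `make_blob` are
hypothetical helper names, and the usage example builds its own tiny blob
rather than assuming anything about the contents of the genext2fs one.

```python
import base64
import gzip
import io
import tarfile


def list_embedded_tarball(blob_b64: str) -> list[tuple[str, int]]:
    """Return (name, size) for each member of a base64-encoded .tar.gz."""
    raw_tar = gzip.decompress(base64.b64decode(blob_b64))
    with tarfile.open(fileobj=io.BytesIO(raw_tar), mode="r:") as tar:
        return [(member.name, member.size) for member in tar.getmembers()]


def make_blob(files: dict[str, bytes]) -> str:
    """Build a small base64-encoded .tar.gz in memory, for demonstration."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return base64.b64encode(gzip.compress(buf.getvalue())).decode("ascii")


if __name__ == "__main__":
    print(list_embedded_tarball(make_blob({"hello.txt": b"hello"})))
```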

The curious thing for me personally is that I didn't feel bad about this at
all, and at no point, from writing the code up to packaging and uploading the
Debian package containing the blob, did I think even twice about whether it
is DFSG compliant or not. Only now, after having read this thread, do I start
wondering whether I have actually created an RC bug myself. Did I? I love the
principles of the DFSG, and it really surprises me that despite my love for
these freedoms I didn't think twice about including that binary blob instead
of generating it on the fly. Was my mind fooled by how short the blob is? A
Perl script generating the tarball such that it's bit-by-bit identical across
all platforms would be longer than this blob.

What do you guys think? Should I put work into writing a script which produces
above binary blob as part of the test suite to avoid having my package be RC
buggy? I would love to get some guidance.

Thanks!

cheers, josch



Re: Preferred form of modification for binary data used in unit testing?

2020-07-16 Thread Christian Kastner
On 2020-07-16 12:53, Pirate Praveen wrote:
>> Generally speaking, I think it's a mistake to apply the question of
>> "preferred form for modification" to unit test payloads. Unit tests are
>> purely about functionality. The original source to a payload is an
>> arbitrary choice (possibly even randomly generated), and could be
>> replaced with any other appropriate arbitrary choice at no detriment to
>> the software or the user.
>>
> 
> I think this needs to be clearly documented in policy. I don't think
> this interpretation is generally accepted. I have seen many cases where
> tests are disabled for this reason.

Perhaps I spoke too generally. For example, I can see, as one of
probably many counter-examples, the case where the input is not
completely arbitrary (eg: input is a captured stream).

But to take the other extreme, using completely arbitrary data, as an
example: say my code implements a ROT13 function and I create a test for
it using a blob of random data as well as the expected output.

That random data was generated somehow, eg: using Python's random
module, and could therefore be regenerated given the correct program and
seed. However, I did not include the code to generate that data.

Would we really reasonably expect anyone to act upon that random blob in
any way?
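Christian's ROT13 scenario fits in a few lines of Python. The helper names
and the seed are hypothetical. Note that the `random` documentation only
guarantees a stable sequence from `random()` for a given seed, so
regenerating the exact blob with `choice()` on a different Python version
is not guaranteed. That is one more reason the stored blob, rather than
the generator, ends up being what people actually edit.

```python
import codecs
import random
import string


def make_payload(seed: int, length: int = 64) -> str:
    """Pseudo-random ASCII letters; the seed is the only "source"."""
    rng = random.Random(seed)
    return "".join(rng.choice(string.ascii_letters) for _ in range(length))


def rot13(text: str) -> str:
    return codecs.encode(text, "rot13")


PAYLOAD = make_payload(seed=20200716)
# In a real test the expected output would be stored alongside the payload,
# not computed with the function under test as done here for brevity.
EXPECTED = rot13(PAYLOAD)
```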





Re: Preferred form of modification for binary data used in unit testing?

2020-07-16 Thread Pirate Praveen
On Thu, Jul 16, 2020 at 12:28, Christian Kastner  wrote:
> On 2020-07-15 09:45, Philipp Hahn wrote:
>> if a *previous* version of a software generated a *buggy* binary
>> database, that bug got fixed in a *newer* version and also some
>> *recovery* mechanism was added to allow reading that broken format
>> *once*, but there is no code to write the *broken* file again. For
>> *unit testing* the upstream developers added an *example* of such a
>> broken database to their test data.
>> What's the preferred form of modification for that data set?
>>
>> * Should I include a copy of the *broken code* to generate that data?
>> * Declare that there is no preferred form for modification, as an
>> "open-save" cycle with the current code will not re-create the
>> bit-identical file again.
>> * Remove the test data because it is not DFSG conformant and hope the
>> Debian build will never break the recovery code.
>> * Include instructions on how to re-build the broken version and give
>> instructions on how to maybe rebuild a similar broken file.
>
> Personally, I would do nothing at all. At most, I would choose the last
> of the above options (include instructions).
>
> This is about the payload to a particular decoding unit test. It's a
> common pattern to generate such payloads without storing the original
> source or even intermediate steps -- which, unless I'm mistaken, would
> imply that the final result has become the preferred form for
> modification. The expectation is simply for a particular chunk of data
> to produce a particular output.
>
> I think it is reasonable to assume that upstream generated the broken
> file with the old code, implemented the unit test, and discarded the
> broken code. So given the current (shipped) version of the software,
> even upstream couldn't recreate the broken file.
>
> Generally speaking, I think it's a mistake to apply the question of
> "preferred form for modification" to unit test payloads. Unit tests are
> purely about functionality. The original source to a payload is an
> arbitrary choice (possibly even randomly generated), and could be
> replaced with any other appropriate arbitrary choice at no detriment to
> the software or the user.

I think this needs to be clearly documented in policy. I don't think 
this interpretation is generally accepted. I have seen many cases where 
tests are disabled for this reason.





Re: Preferred form of modification for binary data used in unit testing?

2020-07-16 Thread Christian Kastner
On 2020-07-15 09:45, Philipp Hahn wrote:
> if a *previous* version of a software generated a *buggy* binary
> database, that bug got fixed in a *newer* version and also some
> *recovery* mechanism was added to allow reading that broken format
> *once*, but there is no code to write the *broken* file again. For
> *unit testing* the upstream developers added an *example* of such a
> broken database to their test data.
> What's the preferred form of modification for that data set?
> 
> * Should I include a copy of the *broken code* to generate that data?
> * Declare that there is no preferred form for modification, as an
> "open-save" cycle with the current code will not re-create the
> bit-identical file again.
> * Remove the test data because it is not DFSG conformant and hope the
> Debian build will never break the recovery code.
> * Include instructions on how to re-build the broken version and give
> instructions on how to maybe rebuild a similar broken file.

Personally, I would do nothing at all. At most, I would choose the last
of the above options (include instructions).

This is about the payload to a particular decoding unit test. It's a
common pattern to generate such payloads without storing the original
source or even intermediate steps -- which, unless I'm mistaken, would
imply that the final result has become the preferred form for
modification. The expectation is simply for a particular chunk of data
to produce a particular output.

I think it is reasonable to assume that upstream generated the broken
file with the old code, implemented the unit test, and discarded the
broken code. So given the current (shipped) version of the software,
even upstream couldn't recreate the broken file.

Generally speaking, I think it's a mistake to apply the question of
"preferred form for modification" to unit test payloads. Unit tests are
purely about functionality. The original source to a payload is an
arbitrary choice (possibly even randomly generated), and could be
replaced with any other appropriate arbitrary choice at no detriment to
the software or the user.



Preferred form of modification for binary data used in unit testing?

2020-07-15 Thread Philipp Hahn
Hi,

if a *previous* version of a software generated a *buggy* binary
database, that bug got fixed in a *newer* version and also some
*recovery* mechanism was added to allow reading that broken format
*once*, but there is no code to write the *broken* file again. For
*unit testing* the upstream developers added an *example* of such a
broken database to their test data.
What's the preferred form of modification for that data set?

* Should I include a copy of the *broken code* to generate that data?
* Declare that there is no preferred form for modification, as an
"open-save" cycle with the current code will not re-create the
bit-identical file again.
* Remove the test data because it is not DFSG conformant and hope the
Debian build will never break the recovery code.
* Include instructions on how to re-build the broken version and give
instructions on how to maybe rebuild a similar broken file.

Philipp

PS: This question arose while working on a private build of
> E: keepassxc source: source-is-missing tests/data/keepassxc.opvault/default