Re: Re: autopkgtest requiring large data sets (pique, hinge)

2021-12-22 Thread Nilesh Patra
Hello,

On 22 December 2021 8:06:57 pm IST, Lance Lin wrote:
>> No, not really. autopkgtest has a `needs-internet` restriction, so you can 
>> access internet to get stuff. See here:
>>
>> https://people.debian.org/~eriberto/README.package-tests.html
>>
>> But yeah, this is usually better, since the server you fetch data from might 
>> choke someday, or might turn unresponsive or maybe block IPs if you do 
>> several `get` requests to it (which the CI machines would do) and so on, 
>> then that's a problem.
>
>Would it be acceptable to create salsa repos that hold the test data for 
>various medical packages (pique-data, hinge-data)? After ensuring that the 
>data sets are public domain with appropriate credit given, we could then 
>reference a fixed salsa repo. It would still require the 'needs-internet' 
>restriction but would ensure the data is available.

We had that discussion many months ago, and for several reasons I think it's a 
bad idea.
I've listed the reasons here [1]; please consider giving it a read.

We eventually reached a consensus to embed test data, which I later added to 
our policy as well [2].

This solved our problem for test data of up to a few MB, which is fine for us.
But gigabyte-sized data is in nobody's interest, since it puts a high load on 
us as contributors and on the CI machines as well.

In fact, if the size of what you're pulling/testing exceeds many gigabytes, an 
RC bug will be filed against the package. One prominent example that I remember 
is tiddit; take a look here [3].

[1]: https://lists.debian.org/debian-med/2020/09/msg00365.html
[2]: https://med-team.pages.debian.net/policy/#embedding-large-test-data
[3]: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=964101

Hope that helps clarify things a bit,
Nilesh



Re: Re: autopkgtest requiring large data sets (pique, hinge)

2021-12-22 Thread Lance Lin
Nilesh, Pierre,

Thank you for the response.

> > Yes please, making efforts to write tests is definitely worth it. From my 
> > experience, you might contact upstream developers to ask them for 
> > meaningful commands requiring no more data than the ones that are in the 
> > source tree. Friendly upstreams usually
>
> I would second that. If possible, ask upstream for sensible data size that is 
> manageable under a few MBs.

I will reach out to the upstreams to see if they have any smaller test cases.

> No, not really. autopkgtest has a `needs-internet` restriction, so you can 
> access internet to get stuff. See here:
>
> https://people.debian.org/~eriberto/README.package-tests.html
>
> But yeah, this is usually better, since the server you fetch data from might 
> choke someday, or might turn unresponsive or maybe block IPs if you do 
> several `get` requests to it (which the CI machines would do) and so on, then 
> that's a problem.

Would it be acceptable to create salsa repos that hold the test data for 
various medical packages (pique-data, hinge-data)? After ensuring that the data 
sets are public domain with appropriate credit given, we could then reference a 
fixed salsa repo. It would still require the 'needs-internet' restriction but 
would ensure the data is available.
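
For reference, a test that declares this restriction might look roughly like 
the stanza below in debian/tests/control. This is only a minimal sketch: the 
test name `run-unit-test` and the `Depends: @` line are assumptions chosen for 
illustration, not taken from the pique or hinge packaging.

```
# Hypothetical stanza; the test name and Depends value are assumptions.
Tests: run-unit-test
Depends: @
Restrictions: needs-internet
```

With `needs-internet` set, the test is only expected to work where network 
access is available, so the availability of the remote data remains a point of 
failure for the test.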

Based on Tony's response in the thread, perhaps the data sets for this type of 
processing are large out of necessity? This is what led me to think of the 
above solution.

Lance Lin 
GPG Fingerprint:  8CAD 1250 8EE0 3A41 7223  03EC 7096 F91E D75D 028F
